Visual Regression Gate
Committed, theme-aware screenshot baselines that block PRs on unintended visual changes.
The visual gate snapshots a fixture story in both the default and docs themes and
diffs each render against a screenshot committed in the repo
(apps/storybook/src/__screenshots__/). It catches unintended visual changes — a
shifted token, a broken layout — that lint, types, and axe cannot
see. There's no cloud snapshot service: baselines are reviewed in the PR diff like
any other artifact.
Why it's a separate command
Every other gate (axe, keyboard) inherits through the bare pnpm gate. The visual
gate is the one exception, for a single reason: a committed screenshot is only
stable within one render environment. Font hinting and antialiasing differ between
macOS and Linux, and even Linux runner images drift over time. Vitest encodes
platform and browser into the baseline filename precisely because the bytes
aren't portable. A baseline made on a Mac would never match Linux CI.
So baselines are Linux-only, generated and compared inside a pinned
Playwright container (mcr.microsoft.com/playwright, pinned to the installed
playwright version). That needs Docker, which pnpm gate deliberately avoids —
hence a sibling pnpm gate:visual. CI runs the identical test:visual natively
in the same image, so the Docker command reproduces CI byte-for-byte.
The complete pre-push check is therefore:
pnpm gate && pnpm gate:visual
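As a rough illustration of what "pinned to the installed playwright version" could mean in practice, the sketch below builds the image reference and the docker arguments for running a pnpm script inside it. The helper names, the `noble` distro suffix, and the `/work` mount point are assumptions made for this example; the real gate:visual script may be wired differently.

```typescript
// Sketch only: the tag suffix, mount path, and helper names are assumptions.
const PLAYWRIGHT_IMAGE = "mcr.microsoft.com/playwright";

/** Build an image reference pinned to an exact Playwright version. */
function pinnedImage(playwrightVersion: string, distro = "noble"): string {
  // Reject ranges such as "^1.49.0": a floating version would let the
  // render environment drift away from the committed baselines.
  if (!/^\d+\.\d+\.\d+$/.test(playwrightVersion)) {
    throw new Error(`expected an exact version, got "${playwrightVersion}"`);
  }
  return `${PLAYWRIGHT_IMAGE}:v${playwrightVersion}-${distro}`;
}

/** Assemble the docker arguments that run a pnpm script inside that image. */
function dockerArgs(
  playwrightVersion: string,
  repoDir: string,
  script: string,
): string[] {
  return [
    "run",
    "--rm",
    "-v",
    `${repoDir}:/work`, // mount the repo so regenerated baselines land in the working tree
    "-w",
    "/work",
    pinnedImage(playwrightVersion),
    "pnpm",
    script,
  ];
}
```

Passing an exact version (a placeholder such as 1.49.0) and the script name test:visual yields the arguments for a single `docker run`. Rejecting version ranges up front keeps the local container on the same image CI uses.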
Use it
pnpm gate:visual # compare against committed baselines (Docker)
pnpm gate:visual:update # intentionally regenerate + review new baselines
Updating baselines is a deliberate act. When a visual change is intended, run
pnpm gate:visual:update, then review the changed PNGs in the diff exactly like
code before committing them. Never hand-edit baselines or regenerate them outside
the container — a host-platform screenshot will not match CI.
When the gate fails in CI, the dedicated visual job uploads the captured
diff/actual images as a downloadable artifact on the workflow run, so the
regression is reviewable without reproducing locally.
How it works
- Vitest 4 toMatchScreenshot, in the existing browser harness. The same @vitest/browser (Playwright/Chromium) stack that renders stories for axe takes the snapshots and diffs them (pixelmatch) against __screenshots__. No second runner.
- A dedicated, static fixture. visual.stories.tsx renders only theme-reactive --ib-* tokens (no animation, caret, time, or randomness), so the only legitimate reason a baseline changes is a token/theme change. (No real component exists yet, #4; the gate self-tests against this fixture, as axe and keyboard do theirs.)
- Both themes via the toolbar's contract. baseline.visual.test.ts composes the fixture with the real preview annotations (loading the scoped theme CSS) and snapshots each theme, driving data-theme on <html> from each story's globals.theme, the same attribute the theme toolbar writes.
- Strict pixel budget. A near-zero tolerance (allowedMismatchedPixelRatio: 0.01) on a fixed viewport, affordable only because the pinned container removes cross-machine antialiasing variance, so real regressions can't hide under loose slack.
- A dedicated CI job, the conscious exception to "no workflow edits". A containerized visual job in ci.yml runs test:visual and uploads diff artifacts on failure; the bare gate job is untouched. Containerization and artifact upload are genuinely new infra the inheritance model never covered.
See ADR 0012 for the full rationale.
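To make the pixel budget concrete, the arithmetic behind allowedMismatchedPixelRatio can be sketched as below. This assumes the ratio is applied to the total pixel count of the compared image and that a diff exactly at the limit still passes. The 1280x720 size used in the note is a hypothetical example, not the project's actual viewport, and the function names are illustrative only.

```typescript
// Sketch only: assumes the ratio applies to the image's total pixel count.
const ALLOWED_MISMATCHED_PIXEL_RATIO = 0.01; // the value quoted in the list above

/** Largest number of differing pixels that still fits within the ratio. */
function maxMismatchedPixels(
  width: number,
  height: number,
  ratio: number = ALLOWED_MISMATCHED_PIXEL_RATIO,
): number {
  return Math.floor(width * height * ratio);
}

/** True when a diff of `mismatched` pixels stays within the budget. */
function withinBudget(
  mismatched: number,
  width: number,
  height: number,
  ratio: number = ALLOWED_MISMATCHED_PIXEL_RATIO,
): boolean {
  return mismatched <= maxMismatchedPixels(width, height, ratio);
}
```

For a hypothetical 1280x720 capture, the image has 921,600 pixels, so a ratio of 0.01 leaves room for at most 9,216 differing pixels before the comparison fails.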
Maintain it
- Add coverage (from #4) with a *.visual.test.ts next to a component's stories that composes them and calls toMatchScreenshot, the same per-component model as the keyboard gate's play functions.
- Re-baseline only on purpose via pnpm gate:visual:update, then review the PNG diff before committing.
- Keep snapshot subjects static (tokens only, no motion/time) so red always means a real regression.
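When adding a new *.visual.test.ts, the per-theme handling could be factored into small helpers along these lines. This is a minimal sketch, not the project's code: the theme ids `default` and `docs` are assumed to be the literal data-theme values, and `applyTheme`, `screenshotNames`, and the `fixture` test id in the comment are hypothetical names.

```typescript
// Sketch only: theme ids and helper names are assumptions for illustration.
type Theme = "default" | "docs";

/** Minimal slice of the DOM Element API, so the helper works without a browser. */
interface AttributeTarget {
  setAttribute(name: string, value: string): void;
}

/** Write the same attribute the theme toolbar writes, so the scoped theme CSS applies. */
function applyTheme(root: AttributeTarget, theme: Theme): void {
  root.setAttribute("data-theme", theme);
}

/** One screenshot name per theme, so each theme keeps its own baseline. */
function screenshotNames(subject: string, themes: readonly Theme[]): string[] {
  return themes.map((theme) => `${subject}-${theme}`);
}

// Inside a browser test, usage might look like:
//   applyTheme(document.documentElement, theme);
//   await expect(page.getByTestId("fixture")).toMatchScreenshot(name);
```

Keeping the attribute write in one helper means every component test drives themes through the same contract as the toolbar, rather than each test inventing its own switch.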