The Problem with UI Changes
Unit tests are great at telling you that a function still returns the right number. They are terrible at telling you that a button still looks like a button.
Most UI regressions are not logic bugs. They are a tweaked Tailwind class, a font stack that fell back, a flexbox rule that pushed an icon one pixel to the left. Every one of these is invisible to expect(value).toBe(...) — but the moment a real person loads the page, they see it.
Visual Regression Testing (VRT) fixes this by comparing actual pixels. You take a screenshot of a component, commit it to the repo as a baseline, and on every future run you compare against it. If pixels drift, the test fails.
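To make "comparing actual pixels" concrete, here is a minimal sketch of the idea. The names `diffPixelRatio` and `hasRegression` are hypothetical helpers I made up for illustration, not part of any library, and real comparators usually add a per-pixel color threshold instead of demanding exact equality.

```typescript
// Hypothetical helper, not any tool's real implementation: returns the
// fraction of pixels that differ between two RGBA buffers of equal size.
function diffPixelRatio(baseline: Uint8Array, actual: Uint8Array): number {
  if (baseline.length !== actual.length || baseline.length % 4 !== 0) {
    throw new Error('buffers must hold RGBA data of the same size')
  }
  const pixelCount = baseline.length / 4
  let changed = 0
  // Walk the buffers one pixel (four channels) at a time.
  for (let i = 0; i < baseline.length; i += 4) {
    if (
      baseline[i] !== actual[i] ||
      baseline[i + 1] !== actual[i + 1] ||
      baseline[i + 2] !== actual[i + 2] ||
      baseline[i + 3] !== actual[i + 3]
    ) {
      changed++
    }
  }
  return pixelCount === 0 ? 0 : changed / pixelCount
}

// The test fails when more pixels drift than the tolerance allows.
function hasRegression(
  baseline: Uint8Array,
  actual: Uint8Array,
  tolerance: number,
): boolean {
  return diffPixelRatio(baseline, actual) > tolerance
}

// Four white pixels versus the same image with the last pixel turned black.
const baselinePixels = new Uint8Array([
  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255,
])
const actualPixels = new Uint8Array([
  255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 255, 0, 0, 0, 255,
])

console.log(diffPixelRatio(baselinePixels, actualPixels)) // 0.25
console.log(hasRegression(baselinePixels, actualPixels, 0.005)) // true
```

One changed pixel out of four is a ratio of 0.25, far above any sensible tolerance, so this comparison would count as a regression.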
Why Storybook + Playwright + Docker
There are three problems you have to solve before VRT becomes useful:
- What to screenshot. You need each component rendered in isolation, in a state you control.
- How to screenshot it. You need a headless browser that drives real rendering.
- Where to screenshot it. Fonts, sub-pixel anti-aliasing, and GPU rasterization all differ between machines. A baseline taken on my laptop will fail on yours.
Each piece of the stack solves one of these problems:
| Tool | Category | Role in VRT |
|---|---|---|
| Storybook | Component workshop | Renders each component in a fixed, isolated state |
| Playwright | Browser automation | Navigates to stories and captures screenshots |
| Docker | Sandboxed environment | Makes the render identical on every machine |
Storybook tells you what to shoot, Playwright tells you how, and Docker tells you where — in a container that looks the same on my laptop, your laptop, and CI.
How It Works on This Site
This blog is where I dog-food all of this. Every component has a .stories.tsx file, and a single Playwright test iterates over every story and takes a screenshot.
The whole setup is about fifty lines of configuration. Here is the heart of it:
```typescript
import { expect, test } from '@playwright/test'
import storybook from '../../storybook-static/index.json' with { type: 'json' }

// index.json also lists docs pages; keep only the stories.
const stories = Object.values(storybook.entries).filter(
  (entry) => entry.type === 'story',
)

for (const story of stories) {
  test(`${story.title} ${story.name} should not have visual regressions`, async ({
    page,
  }) => {
    // Load the story on its own, without the Storybook manager UI.
    const params = new URLSearchParams({ id: story.id, viewMode: 'story' })
    await page.goto(`/iframe.html?${params.toString()}`)
    await page.waitForSelector('#storybook-root')

    await expect(page).toHaveScreenshot(
      [
        ...story.title.toLowerCase().split('/'),
        // Global regex: replace every space, not just the first one.
        `${story.name.toLowerCase().replace(/ /g, '-')}.png`,
      ],
      { fullPage: true, animations: 'disabled', maxDiffPixelRatio: 0.005 },
    )
  })
}
```
A few details worth pointing out:
- `storybook-static/index.json` is produced by `npm run build-storybook`. It contains every story id, so the test suite grows automatically as I add stories, with no hand-maintained list.
- `animations: 'disabled'` freezes CSS animations; otherwise a shimmer or fade would fail the diff at random.
- `maxDiffPixelRatio: 0.005` gives a tiny tolerance (0.5% of pixels) for sub-pixel noise that slips past even Docker.
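Two of these details are easier to see with numbers and names in hand. The sketch below uses hypothetical helpers (`snapshotSegments` and `allowedDiffPixels` are my names for illustration, not Playwright or Storybook APIs). It follows the naming scheme from the test, under the assumption that every space in a story name becomes a hyphen, and it works out the pixel budget for an assumed 1280×720 screenshot.

```typescript
// Hypothetical helper: turns a story title and name into the path segments
// that would be passed to toHaveScreenshot.
function snapshotSegments(title: string, name: string): string[] {
  const folders = title.toLowerCase().split('/')
  // A global regex replaces every space; a plain string pattern would
  // replace only the first one.
  const file = `${name.toLowerCase().replace(/ /g, '-')}.png`
  return [...folders, file]
}

// Hypothetical helper: how many pixels may differ for a given image size
// and ratio. Rounded for display; this is not a claim about how Playwright
// rounds internally.
function allowedDiffPixels(width: number, height: number, ratio: number): number {
  return Math.round(width * height * ratio)
}

console.log(snapshotSegments('Components/Button', 'With Long Label').join('/'))
// components/button/with-long-label.png

console.log(allowedDiffPixels(1280, 720, 0.005))
// 4608
```

So a 0.5% tolerance on a 1280×720 image still lets a few thousand pixels move. That is plenty for anti-aliasing noise, and small enough that a shifted icon or a changed font will normally trip it.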
The Docker Layer
Here is where most VRT setups fall over. If I take a baseline on macOS and you run the test on Linux, fonts render differently and every single screenshot fails. The fix is to never take screenshots on the host. Always take them inside the same Linux container.
The Dockerfile for the visual tests is tiny:
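```dockerfile
FROM mcr.microsoft.com/playwright:v1.59.1-noble
WORKDIR /app

# Install dependencies first so this layer is cached until the lockfile changes.
COPY package.json package-lock.json .npmrc ./
RUN npm ci

# Copy only the test files; storybook-static/ and the baselines are mounted at run time.
COPY .storybook/modes.ts ./.storybook/
COPY tests/visual/visual.spec.ts ./tests/visual/
COPY tests/visual/playwright.config.ts ./tests/visual/

CMD ["npx", "playwright", "test", "--config=tests/visual/playwright.config.ts"]
```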
Two things to note:
- The base image is Microsoft's official `playwright` image, which already has the exact browser binaries Playwright expects.
- The built Storybook (`storybook-static/`) and the screenshots directory (`tests/visual/__screenshots__/`) are mounted at run time rather than baked into the image. Baselines stay in git, the container stays cacheable.
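The Dockerfile copies a `tests/visual/playwright.config.ts` that is not shown here. For readers wiring this up themselves, the following is a hypothetical sketch of what such a config could look like, not the file from this repo. The port, the `http-server` package (which would need to be installed as a dev dependency), and the snapshot path template are all assumptions chosen to match the paths used in this post.

```typescript
// tests/visual/playwright.config.ts (hypothetical sketch, not the real file)
import { defineConfig } from '@playwright/test'

export default defineConfig({
  // Resolved relative to this config file, i.e. tests/visual/.
  testDir: '.',
  // Assumption: store baselines under tests/visual/__screenshots__/, the
  // directory that gets mounted into the container.
  snapshotPathTemplate: '{testDir}/__screenshots__/{arg}{ext}',
  use: {
    // Assumed port; lets page.goto('/iframe.html?...') resolve against the
    // static Storybook build.
    baseURL: 'http://localhost:6006',
  },
  webServer: {
    // Assumption: any static file server would do; http-server is one option.
    // The command runs from the config file's directory by default.
    command: 'npx http-server ../../storybook-static --port 6006 --silent',
    url: 'http://localhost:6006/iframe.html',
    reuseExistingServer: false,
  },
})
```

The template leaves out any platform token on purpose: every screenshot comes from the same Linux container, so one set of baselines is enough.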
The npm scripts tie it together:
```json
{
  "test:visual:build": "docker build -f tests/visual/Dockerfile -t michalkolacz.com-visual .",
  "test:visual": "npm run build-storybook && docker run --rm -v ./storybook-static:/app/storybook-static -v ./tests/visual/__screenshots__:/app/tests/visual/__screenshots__ michalkolacz.com-visual",
  "test:visual:update": "npm run build-storybook && docker run --rm -v ./storybook-static:/app/storybook-static -v ./tests/visual/__screenshots__:/app/tests/visual/__screenshots__ michalkolacz.com-visual npx playwright test --update-snapshots"
}
```
Three scripts, three jobs:
- `test:visual:build`: build the container once.
- `test:visual`: run the suite and fail on drift.
- `test:visual:update`: intentionally overwrite baselines after a UI change I actually want.
The Workflow
Day to day, it feels like this:
- Change a component
- Run `npm run test:visual`
- Open the HTML diff report if anything fails
- If the change was intentional, run `test:visual:update`
- Commit the new baseline alongside the code change
The baselines sitting in git are the part that makes this whole thing work. They turn "does this component still look right?" — a subjective, human question — into "does this PR change any pixels?" — a flat yes/no the CI pipeline can answer.
What VRT Won't Do For You
VRT is not a replacement for unit tests or accessibility tests. It won't tell you that a button is missing a label, or that a form submits the wrong payload. It will only tell you that something looks different.
Treat it as a safety net under the rest of your test suite. Combined with Storybook stories — which you probably already have if you care about a component library at all — it is one of the highest-leverage pieces of tooling I have added to this project.
If you already have Storybook set up, you are maybe a hundred lines of code away from having this too. Steal my config.
