Open-source AI regression testing

The open-source AI framework for regression testing.

Built on Playwright. Describe tests in plain English. AI runs them once, caches every action, and self-heals when your UI changes. Your regression suite, on autopilot.

Simpler than hand-written Playwright. More reliable than real-time AI.


~/your-project

➤ npx passmark init

✓ Created passmark.config.ts

✓ Connected to Redis

✓ Ready to write tests

➤ npx passmark test

• Running 12 tests across 3 suites...

✓ 9 replayed from cache (~4ms each)

✓ 3 healed and re-cached (~28s each)

12 passed · 0 failed · 1.4s total


The problem

Your test suite can't keep up with your team.

Your engineers ship 50 PRs a week. AI coding tools made them faster. But testing? Still manual. Still brittle. Still someone else's problem. Most teams live in one of three places:

No tests.

You skipped e2e testing because nobody had time to write and maintain a Playwright suite. Regressions hit production. Users find the bugs.

Broken tests.

You wrote tests six months ago. Half are skipped. The rest flake on every CI run. Your team ignores the failures. The suite is dead weight.

AI tools too slow.

AI testing tools promise to fix this. But they run every action through an LLM, every time. That works for checking a single PR. It doesn't work for 500 regression tests running on every commit. Too slow. Too expensive. Too unpredictable for CI.

Passmark takes a different approach.

How it works

AI once. Playwright speed forever.

Describe

Write test steps in plain English. No selectors. No page objects. No boilerplate.

steps: [
  { description: "Add the first product to cart" },
  { description: "Open cart and proceed to checkout" },
  { description: 'Fill shipping address with zip "90210"' },
  { description: "Complete payment with test card" },
]

Execute

On first run, AI agents navigate your app using accessibility snapshots and screenshots. Every action is cached to Redis.

First run:  AI agents → ~30s per step

Replay

Every run after uses cached Playwright actions. Native speed. No LLM calls. No API costs. When a cached action fails because the UI changed, AI re-engages only for that step and re-caches.

First run:        AI agents → ~30s per step
Every run after:  Cached Playwright → ~ms per step
Auto-heal:        Cached action fails → AI re-discovers → re-caches
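
The replay-then-heal loop above can be sketched in a few lines. This is a minimal illustration, not Passmark's implementation: `runStep`, `CachedAction`, and the in-memory `actionCache` (standing in for Redis) are hypothetical names, and the functions are synchronous only to keep the sketch short. Real browser actions would be async.

```typescript
// Hypothetical shape of a cached browser action.
type CachedAction = { selector: string; op: "click" | "fill"; value?: string };
type StepOutcome = "discovered" | "replayed" | "healed";

// Stand-in for Redis: an in-memory store keyed by step description.
const actionCache: Record<string, CachedAction> = {};

function runStep(
  description: string,
  execute: (action: CachedAction) => boolean, // false when the action fails on the current UI
  discover: (description: string) => CachedAction, // stand-in for the slow AI path
): StepOutcome {
  const cached = actionCache[description];

  // Fast path: replay the cached action without calling the model.
  if (cached !== undefined && execute(cached)) {
    return "replayed";
  }

  // Slow path: no cache entry, or the cached action no longer works.
  const fresh = discover(description);
  if (!execute(fresh)) {
    throw new Error(`Step failed after AI discovery: ${description}`);
  }
  actionCache[description] = fresh; // re-cache for the next run
  return cached === undefined ? "discovered" : "healed";
}

// Simulated UI: only one selector is valid at any time.
let currentSelector = "#add-to-cart";
const execute = (action: CachedAction): boolean => action.selector === currentSelector;
const discover = (): CachedAction => ({ selector: currentSelector, op: "click" });
```

With this sketch, the first call for a step returns "discovered", later calls return "replayed", and a call after `currentSelector` changes returns "healed" for that step only.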

Code

This is a Passmark test.

8 lines of test logic. Plain English. Runs on Playwright. Self-heals when your UI changes. Cached after first run.

Compare that to the 40-line Playwright test with data-testid selectors, waitForSelector calls, and hardcoded timeouts you'd write by hand. Then imagine maintaining 200 of those.

purchase.test.ts

import { test, expect } from "@playwright/test";
import { runSteps } from "@bug0/ai";

test("user can complete purchase", async ({ page }) => {
  await runSteps({
    page,
    expect,
    userFlow: "purchase flow",
    steps: [
      { description: "Navigate to the store" },
      { description: "Add the first available item to cart" },
      { description: "Proceed to checkout and complete purchase" },
    ],
    assertions: [
      { assertion: "Confirmation page shows an order number" },
    ],
  });
});

Comparison

Where Passmark sits.

Hand-written Playwright

Plain English test authoring
Runs at native Playwright speed
Self-heals on UI changes
Built for CI (500+ test suites)
Cross-test state sharing: Manual wiring
Email and OTP testing: External tools
Multi-model consensus assertions
$0 AI cost on repeat runs
Open-source

Passmark

Recommended

Plain English test authoring
Runs at native Playwright speed: Yes (cached)
Self-heals on UI changes
Built for CI (500+ test suites)
Cross-test state sharing: Built-in (Redis)
Email and OTP testing: Built-in
Multi-model consensus assertions
$0 AI cost on repeat runs
Open-source

Real-time AI tools

Plain English test authoring
Runs at native Playwright speed
Self-heals on UI changes: Partial
Built for CI (500+ test suites)
Cross-test state sharing
Email and OTP testing
Multi-model consensus assertions
$0 AI cost on repeat runs
Open-source: Some

Built for regression

Everything you need at test 200 that you don't think about at test 1.

Cross-test state

// Test 1: Sign up
data: { email: "{{global.dynamicEmail}}" }

// Test 2: Log in (same execution, same email)
data: { email: "{{global.dynamicEmail}}" }

Built-in email testing

Disposable inboxes. OTP extraction. Verification link clicks. No Mailosaur. No third-party infra.

{ description: "Sign up", data: { email: "{{run.dynamicEmail}}" } },
{ description: "Enter OTP", data: { otp: "{{email.otp:Extract the 6-digit code}}" } }
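
To make the `{{email.otp:...}}` placeholder concrete, here is a rough sketch of how such a token could be split into a field and an instruction, with a regex standing in for the extraction step. `parseEmailPlaceholder` and `extractSixDigitCode` are hypothetical helpers, and the regex is an assumption for illustration. How Passmark actually reads the inbox and extracts the code is not shown on this page.

```typescript
// Hypothetical parsed form of a placeholder like {{email.otp:Extract the 6-digit code}}.
type EmailPlaceholder = { field: string; instruction: string };

// Split the placeholder into the field name and the plain-English instruction.
function parseEmailPlaceholder(text: string): EmailPlaceholder | null {
  const match = /^\{\{email\.(\w+):([^}]+)\}\}$/.exec(text);
  if (match === null) return null;
  return { field: match[1] || "", instruction: (match[2] || "").trim() };
}

// Regex stand-in for extraction: the first standalone run of exactly six digits.
function extractSixDigitCode(emailBody: string): string | null {
  const match = /\b(\d{6})\b/.exec(emailBody);
  return match === null ? null : match[1] || null;
}
```

The word boundaries keep the regex from matching part of a longer number, such as an eight-digit order ID.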

Consensus assertions

Every assertion is evaluated by Claude and Gemini independently. A third model breaks ties. Passes only on consensus. Returns a confidence score (0-100).

assertions: [
  { assertion: "Dashboard shows a welcome message for the new user" }
]
// Evaluated by multiple models. Passes only on agreement.
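
The voting rule described above (two independent evaluations, a third only on disagreement) might look like the sketch below. `evaluateWithConsensus`, `Judge`, and `Verdict` are hypothetical names, and averaging the agreeing confidences is an assumption: the page does not say how the 0-100 score is computed.

```typescript
// Hypothetical verdict returned by one model for one assertion.
type Verdict = { pass: boolean; confidence: number }; // confidence in 0-100
type Judge = (assertion: string) => Verdict;

function evaluateWithConsensus(
  assertion: string,
  first: Judge,
  second: Judge,
  tieBreaker: Judge,
): Verdict {
  const a = first(assertion);
  const b = second(assertion);

  // Both models agree: no third call is needed.
  if (a.pass === b.pass) {
    return { pass: a.pass, confidence: Math.round((a.confidence + b.confidence) / 2) };
  }

  // Disagreement: the third model sides with one of the two.
  const c = tieBreaker(assertion);
  const agreeing = c.pass === a.pass ? a : b;
  // Assumption: report the mean confidence of the two agreeing models.
  return { pass: c.pass, confidence: Math.round((agreeing.confidence + c.confidence) / 2) };
}
```

Calling the tie-breaker only on disagreement keeps the common case at two model calls per assertion.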

Dynamic test data

Every run gets unique IDs, emails, names, phone numbers. Parallel tests never collide. Three scopes: {{run.*}} (per test), {{global.*}} (per suite), {{data.*}} (per project).

// Per test run
data: { email: "{{run.dynamicEmail}}" }

// Per suite (shared across tests)
data: { email: "{{global.dynamicEmail}}" }

// Per project
data: { apiKey: "{{data.stagingApiKey}}" }
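
One way to picture the three scopes is as three lookup tables consulted when a placeholder is resolved. The sketch below is illustrative only: `resolvePlaceholders` and the `Scopes` type are hypothetical, and the sample values are made up.

```typescript
// Hypothetical container for the three scopes named above.
type Scopes = {
  run: Record<string, string>; // fresh values for each test run
  global: Record<string, string>; // shared by every test in the suite
  data: Record<string, string>; // fixed project-level values
};

// Replace each {{scope.key}} token with its value; leave unknown tokens untouched.
function resolvePlaceholders(template: string, scopes: Scopes): string {
  return template.replace(
    /\{\{(run|global|data)\.(\w+)\}\}/g,
    (token: string, scope: string, key: string): string => {
      const values = scopes[scope as keyof Scopes];
      return Object.prototype.hasOwnProperty.call(values, key) ? values[key] || "" : token;
    },
  );
}
```

Two parallel runs that share the same `global` and `data` tables but carry different `run` tables resolve `{{run.dynamicEmail}}` to different addresses, which is what keeps them from colliding.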

CI integration

npx passmark test runs Playwright.

Your existing Playwright config, fixtures, and CI setup work. Passmark doesn't replace Playwright. It sits on top of it.

GitHub Actions, GitLab CI, Jenkins, any CI runner
Parallel execution with zero shared-state collisions
OpenTelemetry tracing for every step, cache hit, and assertion
Redis for action caching and cross-test state

test.yml

# .github/workflows/test.yml
- name: Run regression tests
  run: npx passmark test

About

What is Passmark?

Passmark is an open-source AI regression testing framework built on Playwright. Developers write end-to-end tests in plain English. AI agents execute tests on first run and cache every browser action to Redis. Subsequent runs replay cached actions at native Playwright speed with zero LLM calls. When UI changes cause a cached action to fail, the AI automatically re-discovers the correct interaction and updates the cache. Passmark includes built-in email testing, cross-test state management, dynamic test data generation, and multi-model consensus assertions. It requires Node.js 18+, Playwright 1.57+, Redis, and API keys for Anthropic and Google AI.

Team

Built by the team behind Bug0 and Hashnode.

Passmark is the open-source core of Bug0, an AI-native QA platform trusted by 200+ engineering teams. Built by Fazle Rahman and Sandeep Panda, who previously created Hashnode, a developer blogging platform with millions of monthly readers.

Backed by Accel, Guillermo Rauch (Vercel), and Naval Ravikant.


Go managed

Love the engine? Let someone else run it.

Passmark is the open-source core. Bug0 Managed is the service built on top of it. A dedicated QA engineer writes your tests, triages every failure, and gates your releases. Same engine. Zero maintenance on your end.

Open-source

Passmark

Test engine: Passmark
Who writes tests: You
Who triages failures: You
Infrastructure: Yours
Price: Free


Service

Bug0 Managed

Test engine: Passmark
Who writes tests: Dedicated QA engineer
Who triages failures: Human-verified reports
Infrastructure: Bug0 AI agents for testing
Price: From $2,500/mo

Learn more at bug0.com