
Testing with AI

AI doesn't just write code faster — it writes tests faster too. Unit tests, integration tests, edge cases, mocks, and TDD workflows — all generated with the right prompts.


Why Testing Is Different with AI

Testing is where most developers cut corners — not because they don't value it, but because writing tests is tedious. Describing what should happen, again, in a different syntax, for every path through every function. It's the part that feels like paperwork.

AI removes that friction almost entirely. Given a function, AI can generate a full test suite in seconds. Given a description of what could go wrong, it generates edge cases you wouldn't have thought of. Given a failing test, it explains why it's failing and suggests a fix.

The result isn't just faster testing. It's better testing — because the cost of writing an additional test case drops to near zero, developers stop skipping the awkward paths. Coverage improves not because it's enforced, but because writing tests is no longer a burden.

The Core Shift

With AI, the bottleneck in testing moves from writing tests to reviewing tests. Your job becomes checking that the AI understood the intent — not typing boilerplate. That's a much better use of your judgment.


Part 1: Unit Tests

Unit tests are AI's strongest suit. The context is small and self-contained: a function goes in, a test suite comes out. The pattern is so reliable that it should be the first thing you reach for.

The Basic Prompt

Start with the simplest possible ask: paste the function, ask for tests.

You

Write unit tests for this function using Vitest:

export function slugify(text: string): string {
  return text
    .toLowerCase()
    .trim()
    .replace(/[^\w\s-]/g, '')
    .replace(/[\s_-]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

Cover: normal inputs, edge cases, and anything that could realistically go wrong.

AI produces a complete suite. Here's what good output looks like:

import { describe, it, expect } from 'vitest';
import { slugify } from './slugify';

describe('slugify', () => {
  // Normal inputs
  it('converts a basic phrase to a slug', () => {
    expect(slugify('Hello World')).toBe('hello-world');
  });

  it('handles multiple words', () => {
    expect(slugify('The Quick Brown Fox')).toBe('the-quick-brown-fox');
  });

  // Whitespace handling
  it('trims leading and trailing whitespace', () => {
    expect(slugify('  hello world  ')).toBe('hello-world');
  });

  it('collapses multiple spaces into one hyphen', () => {
    expect(slugify('hello   world')).toBe('hello-world');
  });

  // Special characters
  it('removes special characters', () => {
    expect(slugify('Hello, World!')).toBe('hello-world');
  });

  it('removes characters that are not word chars, spaces, or hyphens', () => {
    expect(slugify('café & crème')).toBe('caf-crme');
  });

  // Hyphen handling
  it('replaces underscores with hyphens', () => {
    expect(slugify('hello_world')).toBe('hello-world');
  });

  it('removes leading and trailing hyphens', () => {
    expect(slugify('-hello world-')).toBe('hello-world');
  });

  it('collapses consecutive hyphens', () => {
    expect(slugify('hello---world')).toBe('hello-world');
  });

  // Edge cases
  it('returns empty string for empty input', () => {
    expect(slugify('')).toBe('');
  });

  it('returns empty string for whitespace-only input', () => {
    expect(slugify('   ')).toBe('');
  });

  it('returns empty string for special characters only', () => {
    expect(slugify('!@#$%')).toBe('');
  });

  it('handles a single word', () => {
    expect(slugify('Hello')).toBe('hello');
  });

  it('preserves numbers', () => {
    expect(slugify('chapter 1 intro')).toBe('chapter-1-intro');
  });
});

Why "anything that could realistically go wrong" matters

Without that phrase, AI generates happy-path tests and a token edge case. With it, you get empty strings, Unicode characters, whitespace-only inputs, and consecutive delimiters — the cases that surface real bugs.

Generating Edge Cases Explicitly

For complex functions, ask for edge cases as a separate step before writing tests. This surfaces assumptions you didn't know you had.

You

Before writing tests for this function, list every edge case you can think of — inputs that might behave unexpectedly, boundary conditions, and failure modes. Don't write code yet.

export function calculateDiscount(
  price: number,
  quantity: number,
  couponCode?: string
): number

A good AI response surfaces cases like: negative price, zero quantity, floating point precision at exact tier boundaries, an empty string vs. undefined coupon, expired coupons if the function checks dates, integer overflow for very large quantities. Many of these you'd miss in a first pass.

Review the List, Then Generate Tests

The two-step approach — edge cases first, code second — keeps you in control of what matters. You can delete irrelevant cases, add business-specific ones, and make the list your test plan before a line of test code exists.

Testing Pure vs. Impure Functions

Pure functions (same input → same output, no side effects) test cleanly with the pattern above. Impure functions — ones that read files, make network calls, write to a database — need mocks. Tell AI which category you're in:

You

Write Vitest unit tests for this function. It calls an external service — mock sendEmail so tests don't send real emails. Verify it's called with the right arguments.

import { sendEmail } from './email';

export async function notifyUserOnSignup(email: string, name: string) {
  if (!email.includes('@')) throw new Error('Invalid email');
  await sendEmail({
    to: email,
    subject: 'Welcome to the platform',
    body: `Hi ${name}, thanks for signing up!`,
  });
}
A good response spies on the module rather than sending anything real:

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { notifyUserOnSignup } from './notify';
import * as emailModule from './email';

describe('notifyUserOnSignup', () => {
  beforeEach(() => {
    vi.restoreAllMocks();
  });

  it('calls sendEmail with correct arguments', async () => {
    const sendEmailSpy = vi.spyOn(emailModule, 'sendEmail').mockResolvedValue(undefined);

    await notifyUserOnSignup('user@example.com', 'Alice');

    expect(sendEmailSpy).toHaveBeenCalledOnce();
    expect(sendEmailSpy).toHaveBeenCalledWith({
      to: 'user@example.com',
      subject: 'Welcome to the platform',
      body: 'Hi Alice, thanks for signing up!',
    });
  });

  it('throws on invalid email', async () => {
    await expect(notifyUserOnSignup('notanemail', 'Alice'))
      .rejects.toThrow('Invalid email');
  });

  it('does not call sendEmail when email is invalid', async () => {
    const sendEmailSpy = vi.spyOn(emailModule, 'sendEmail').mockResolvedValue(undefined);

    await expect(notifyUserOnSignup('notanemail', 'Alice')).rejects.toThrow();
    expect(sendEmailSpy).not.toHaveBeenCalled();
  });

  it('propagates errors from sendEmail', async () => {
    vi.spyOn(emailModule, 'sendEmail').mockRejectedValue(new Error('SMTP timeout'));

    await expect(notifyUserOnSignup('user@example.com', 'Alice'))
      .rejects.toThrow('SMTP timeout');
  });
});

Part 2: Integration Tests

Integration tests verify that multiple pieces work together — a route handler reading from a real database, an API endpoint returning the right status code, a service composing two repositories. They catch the bugs unit tests miss.

They're also the tests developers skip most often, because they require setup. AI makes that setup fast.

Testing an API Route

The most common integration test pattern: HTTP request in, database state or response body out. Give AI the route, the database schema, and your test framework:

You

Write integration tests for this Express route using Vitest and Supertest. Use an in-memory SQLite database for tests — set it up fresh before each test, tear it down after.

Route: POST /api/users/register

Behavior:

  • Creates a user with email and hashed password
  • Returns 201 with { id, email, createdAt }
  • Returns 400 if email is already registered
  • Returns 422 if email format is invalid
  • Returns 422 if password is under 8 characters

Schema: [paste users table schema]

AI generates both the setup and the suite:

import { describe, it, expect, beforeEach, afterEach } from 'vitest';
import request from 'supertest';
import Database from 'better-sqlite3';
import { createApp } from '../src/app';
import { migrate } from '../src/db/migrations';

let db: Database.Database;
let app: ReturnType<typeof createApp>;

beforeEach(() => {
  db = new Database(':memory:');
  migrate(db);
  app = createApp(db);
});

afterEach(() => {
  db.close();
});

describe('POST /api/users/register', () => {
  const validPayload = {
    email: 'alice@example.com',
    password: 'securepassword',
  };

  it('creates a user and returns 201 with user data', async () => {
    const res = await request(app).post('/api/users/register').send(validPayload);

    expect(res.status).toBe(201);
    expect(res.body).toMatchObject({
      email: 'alice@example.com',
    });
    expect(res.body.id).toBeDefined();
    expect(res.body.createdAt).toBeDefined();
    expect(res.body.password).toBeUndefined(); // never leak password
  });

  it('returns 400 when email is already registered', async () => {
    await request(app).post('/api/users/register').send(validPayload);
    const res = await request(app).post('/api/users/register').send(validPayload);

    expect(res.status).toBe(400);
    expect(res.body.error).toMatch(/already/i);
  });

  it('returns 422 when email format is invalid', async () => {
    const res = await request(app)
      .post('/api/users/register')
      .send({ email: 'notanemail', password: 'securepassword' });

    expect(res.status).toBe(422);
    expect(res.body.error).toBeDefined();
  });

  it('returns 422 when password is too short', async () => {
    const res = await request(app)
      .post('/api/users/register')
      .send({ email: 'alice@example.com', password: 'short' });

    expect(res.status).toBe(422);
    expect(res.body.error).toBeDefined();
  });

  it('does not store plaintext password in database', async () => {
    await request(app).post('/api/users/register').send(validPayload);

    const user = db.prepare('SELECT * FROM users WHERE email = ?').get(validPayload.email) as any;
    expect(user.password_hash).not.toBe(validPayload.password);
    expect(user.password_hash).toMatch(/^\$2[aby]\$/); // bcrypt prefix
  });
});

In-Memory Databases for Tests

Using new Database(':memory:') gives you a fresh, isolated database for every test. No cleanup scripts, no shared state, no order-dependent failures. Each test is a clean slate. For PostgreSQL, the equivalent is creating a test schema per test run and dropping it after.
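For the PostgreSQL route, the core of it is generating a unique schema name per run plus the matching setup and teardown SQL. Here's a hedged sketch — the SQL builders are shown as pure functions so they're testable without a server; wiring them to a client (pg, postgres.js, etc.) is driver-specific and left out:

```typescript
// Sketch: per-run schema isolation for PostgreSQL tests.
// Pure string builders — run each statement via your driver's query method
// in beforeAll/afterAll. Function names here are illustrative, not a standard API.

export function testSchemaName(runId: string): string {
  // Schema names must be valid identifiers — replace anything unsafe.
  return `test_${runId.replace(/[^a-z0-9_]/gi, '_').toLowerCase()}`;
}

export function setupSql(schema: string): string[] {
  return [
    `CREATE SCHEMA "${schema}"`,
    `SET search_path TO "${schema}"`,
    // ...then run your migrations against this schema
  ];
}

export function teardownSql(schema: string): string {
  return `DROP SCHEMA "${schema}" CASCADE`;
}
```

With pg, for example, you'd loop over setupSql(...) with client.query in beforeAll and run teardownSql(...) in afterAll — each test run gets its own namespace, dropped wholesale when it finishes.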

Writing Test Helpers with AI

Repeated setup code belongs in a helper. Ask AI to generate it:

You

Create a test helper file for our API tests. It should export:

  • createTestDb() — returns a fresh in-memory SQLite db with migrations run
  • createTestApp(db) — returns the Express app wired to the test db
  • registerUser(app, overrides?) — registers a default test user, returns the response body
  • loginUser(app, email, password) — logs in and returns the JWT token
  • authHeader(token) — returns the Authorization header object
AI produces the helper module:

import Database from 'better-sqlite3';
import request from 'supertest';
import { createApp } from '../src/app';
import { migrate } from '../src/db/migrations';

export function createTestDb() {
  const db = new Database(':memory:');
  migrate(db);
  return db;
}

export function createTestApp(db: Database.Database) {
  return createApp(db);
}

const defaultUser = {
  email: 'test@example.com',
  password: 'testpassword123',
  name: 'Test User',
};

export async function registerUser(
  app: Express.Application,
  overrides: Partial<typeof defaultUser> = {}
) {
  const payload = { ...defaultUser, ...overrides };
  const res = await request(app).post('/api/users/register').send(payload);
  return res.body;
}

export async function loginUser(
  app: Express.Application,
  email = defaultUser.email,
  password = defaultUser.password
): Promise<string> {
  const res = await request(app).post('/api/auth/login').send({ email, password });
  return res.body.token;
}

export function authHeader(token: string) {
  return { Authorization: `Bearer ${token}` };
}

With helpers in place, subsequent test files become clean and expressive:

import { beforeEach, afterEach, describe, it, expect } from 'vitest';
import request from 'supertest';
import { createTestDb, createTestApp, registerUser, loginUser, authHeader } from './helpers';

let db, app, token;

beforeEach(async () => {
  db = createTestDb();
  app = createTestApp(db);
  await registerUser(app);
  token = await loginUser(app);
});

afterEach(() => db.close());

describe('POST /api/bookmarks', () => {
  it('creates a bookmark for the authenticated user', async () => {
    const res = await request(app)
      .post('/api/bookmarks')
      .set(authHeader(token))
      .send({ url: 'https://example.com', title: 'Example' });

    expect(res.status).toBe(201);
    expect(res.body.data.url).toBe('https://example.com');
  });

  it('returns 401 without a token', async () => {
    const res = await request(app)
      .post('/api/bookmarks')
      .send({ url: 'https://example.com', title: 'Example' });

    expect(res.status).toBe(401);
  });
});

Part 3: TDD with AI

Test-Driven Development — write the test before the implementation — pairs surprisingly well with AI. The classic complaint about TDD is that writing tests for non-existent code is mentally exhausting. With AI, you describe the behavior in a prompt, get the test, and then ask for the implementation that passes it.

The Red-Green-Refactor Loop with AI

Here's the workflow. You're building a rate limiter:

Step 1 — Describe the behavior, get the test:

You

Write a Vitest test for a RateLimiter class that doesn't exist yet. It should:

  • Accept a limit (max requests) and a window (time in ms)
  • Track requests per key (e.g., IP address)
  • Return { allowed: true } when under the limit
  • Return { allowed: false, retryAfter: number } when over the limit
  • Reset the count after the window expires

Use vi.useFakeTimers() to control time. Don't implement the class yet — just the tests.

import { describe, it, expect, beforeEach, vi, afterEach } from 'vitest';
import { RateLimiter } from './rateLimiter';

describe('RateLimiter', () => {
  beforeEach(() => vi.useFakeTimers());
  afterEach(() => vi.useRealTimers());

  it('allows requests under the limit', () => {
    const limiter = new RateLimiter({ limit: 3, windowMs: 1000 });

    expect(limiter.check('user-1')).toEqual({ allowed: true });
    expect(limiter.check('user-1')).toEqual({ allowed: true });
    expect(limiter.check('user-1')).toEqual({ allowed: true });
  });

  it('blocks when the limit is reached', () => {
    const limiter = new RateLimiter({ limit: 2, windowMs: 1000 });

    limiter.check('user-1');
    limiter.check('user-1');
    const result = limiter.check('user-1');

    expect(result.allowed).toBe(false);
    expect((result as any).retryAfter).toBeGreaterThan(0);
  });

  it('tracks different keys independently', () => {
    const limiter = new RateLimiter({ limit: 1, windowMs: 1000 });

    expect(limiter.check('user-1')).toEqual({ allowed: true });
    expect(limiter.check('user-2')).toEqual({ allowed: true }); // different key
  });

  it('resets count after the window expires', () => {
    const limiter = new RateLimiter({ limit: 2, windowMs: 1000 });

    limiter.check('user-1');
    limiter.check('user-1');
    expect(limiter.check('user-1').allowed).toBe(false);

    vi.advanceTimersByTime(1001);

    expect(limiter.check('user-1')).toEqual({ allowed: true });
  });

  it('returns retryAfter in milliseconds', () => {
    const limiter = new RateLimiter({ limit: 1, windowMs: 5000 });

    limiter.check('user-1');
    vi.advanceTimersByTime(2000);
    const result = limiter.check('user-1') as { allowed: false; retryAfter: number };

    expect(result.allowed).toBe(false);
    expect(result.retryAfter).toBeLessThanOrEqual(3000);
    expect(result.retryAfter).toBeGreaterThan(0);
  });
});

Step 2 — Run the tests (they fail — that's correct).

Step 3 — Ask for the implementation that passes them:

You

Implement the RateLimiter class to make these tests pass: [paste tests above]

TypeScript. No external dependencies.

interface RateLimiterOptions {
  limit: number;
  windowMs: number;
}

interface RateLimiterEntry {
  count: number;
  windowStart: number;
}

type CheckResult =
  | { allowed: true }
  | { allowed: false; retryAfter: number };

export class RateLimiter {
  private store = new Map<string, RateLimiterEntry>();

  constructor(private options: RateLimiterOptions) {}

  check(key: string): CheckResult {
    const now = Date.now();
    const entry = this.store.get(key);

    if (!entry || now - entry.windowStart >= this.options.windowMs) {
      this.store.set(key, { count: 1, windowStart: now });
      return { allowed: true };
    }

    if (entry.count < this.options.limit) {
      entry.count++;
      return { allowed: true };
    }

    const retryAfter = this.options.windowMs - (now - entry.windowStart);
    return { allowed: false, retryAfter };
  }
}

Step 4 — Run the tests again. They pass. If they don't, paste the failure output back into AI and ask it to fix the implementation.

Tests as a Spec

The TDD loop with AI works because the tests are the spec. When you describe desired behavior clearly enough to write tests, you've already done the hard design work. The implementation becomes a filling-in exercise — which is exactly what AI is good at.


Part 4: Mocking and Isolation

Good tests are isolated — they test one thing and don't break because something unrelated changed. Mocking is what makes isolation possible. It's also the part developers spend the most time on. AI handles it well if you're explicit about what you're mocking and why.

Mocking Modules

You

Write tests for this function. Mock the ../db/userRepo module — I don't want a real database in these tests. Verify the function calls the repo with the right arguments and handles the case where the user isn't found.

import { userRepo } from '../db/userRepo';

export async function getUserProfile(userId: string) {
  const user = await userRepo.findById(userId);
  if (!user) throw new Error('User not found');
  return { id: user.id, name: user.name, email: user.email };
}
The generated tests replace the module wholesale:

import { describe, it, expect, vi, beforeEach } from 'vitest';
import { getUserProfile } from './getUserProfile';

vi.mock('../db/userRepo', () => ({
  userRepo: {
    findById: vi.fn(),
  },
}));

import { userRepo } from '../db/userRepo';

describe('getUserProfile', () => {
  beforeEach(() => {
    vi.clearAllMocks();
  });

  it('returns formatted profile when user exists', async () => {
    const mockUser = { id: '123', name: 'Alice', email: 'alice@example.com', passwordHash: 'x' };
    vi.mocked(userRepo.findById).mockResolvedValue(mockUser);

    const result = await getUserProfile('123');

    expect(result).toEqual({
      id: '123',
      name: 'Alice',
      email: 'alice@example.com',
    });
  });

  it('calls userRepo.findById with the correct userId', async () => {
    vi.mocked(userRepo.findById).mockResolvedValue(
      { id: '456', name: 'Bob', email: 'bob@example.com', passwordHash: 'x' }
    );

    await getUserProfile('456');

    expect(userRepo.findById).toHaveBeenCalledWith('456');
  });

  it('throws when user is not found', async () => {
    vi.mocked(userRepo.findById).mockResolvedValue(null);

    await expect(getUserProfile('nonexistent')).rejects.toThrow('User not found');
  });

  it('does not expose passwordHash in the result', async () => {
    vi.mocked(userRepo.findById).mockResolvedValue(
      { id: '123', name: 'Alice', email: 'alice@example.com', passwordHash: 'secret' }
    );

    const result = await getUserProfile('123');

    expect(result).not.toHaveProperty('passwordHash');
  });
});

Generating a Full Mock from a Type

When you have a complex interface and need a realistic-looking fake for tests, paste the type and ask for a factory:

You

Write a createMockUser factory function for tests. It should return a valid user object matching this interface, with sensible defaults, and accept an optional overrides parameter:

interface User {
  id: string;
  email: string;
  name: string;
  role: 'admin' | 'member' | 'viewer';
  createdAt: Date;
  lastLoginAt: Date | null;
  isEmailVerified: boolean;
}
A typical generated factory:

import { User } from '../src/types';

let idCounter = 0;

export function createMockUser(overrides: Partial<User> = {}): User {
  idCounter++;
  return {
    id: `user-${idCounter}`,
    email: `user${idCounter}@example.com`,
    name: `Test User ${idCounter}`,
    role: 'member',
    createdAt: new Date('2024-01-01T00:00:00Z'),
    lastLoginAt: null,
    isEmailVerified: true,
    ...overrides,
  };
}

// Reset between test runs if needed
export function resetFactoryCounters() {
  idCounter = 0;
}

Usage in tests becomes readable and explicit:

const admin = createMockUser({ role: 'admin', isEmailVerified: true });
const unverified = createMockUser({ isEmailVerified: false });
const neverLoggedIn = createMockUser({ lastLoginAt: null });

Part 5: Coverage and Review

Coverage tools tell you which lines were executed. They don't tell you whether the tests were any good. Use AI for both: generating tests toward coverage gaps, and reviewing what was generated.
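If you're on Vitest, coverage reporting is a config block away. A minimal sketch — this assumes recent Vitest (1.x+) with the @vitest/coverage-v8 package installed, and the threshold numbers are placeholders, not recommendations:

```typescript
// vitest.config.ts
import { defineConfig } from 'vitest/config';

export default defineConfig({
  test: {
    coverage: {
      provider: 'v8',
      reporter: ['text', 'html'],
      // Fail the run if coverage drops below these floors.
      thresholds: {
        lines: 80,
        branches: 70,
      },
    },
  },
});
```

Run with `vitest run --coverage` and paste the uncovered-lines output into the prompt below.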

Filling Coverage Gaps

You

Here's a function and its current tests. The coverage report shows lines 12–18 are uncovered (the error handling branch). Write additional tests that cover the missing lines.

[paste function + existing tests + coverage output]

This is more useful than chasing 100% coverage blindly. You're targeting specific gaps with tests that have a reason to exist — not padding for the metric.

Asking AI to Review Its Own Tests

AI-generated tests can have blind spots — testing implementation details, writing assertions that always pass, or missing the actual behavior under test. Build review into the workflow:

You

Review these tests critically. Look for:

  • Tests that would pass even if the implementation were completely wrong
  • Missing assertions (tests that call the function but don't assert enough about the output)
  • Tests that depend on implementation details instead of behavior
  • Edge cases that aren't covered
  • Duplicate tests that cover the same path

[paste generated tests]

A good AI response to this prompt will find at least one real issue in most generated test suites. Common findings: an assertion like expect(result).toBeDefined() that passes even when result is wrong; a test that mocks so much there's nothing real being tested; a missing assertion on the return value shape.

The Mutation Testing Prompt

For critical functions, ask AI to mentally mutate the implementation and check if the tests would catch it: "If I changed this condition from > to >=, which of these tests would fail? If none would, what test should I add?" This surfaces tests that provide false confidence.


Part 6: Prompt Patterns That Work

The quality of AI-generated tests depends heavily on the prompt. Here are the patterns that consistently produce better output.

Specify the Framework Explicitly

Don't assume AI knows your stack. Always name the framework and version if it matters:

// Good
"Write tests using Vitest and Testing Library."
"Write tests using Jest with ts-jest. We're on Jest 29."
"Write integration tests using Supertest. The app is Express 4."

// Too vague
"Write tests for this function."

Describe What Counts as Success

"Test this function" produces generic tests. "Test that it returns X when Y, throws Z when W, and calls Q with R" produces useful ones:

You

Write tests for applyPromoCode(cart, code). Success means:

  • Valid SAVE20 code applies a 20% discount to cart total
  • Discount is applied to items, not shipping
  • Already-discounted items don't get double-discounted
  • Expired codes throw PromoExpiredError
  • Unknown codes throw PromoNotFoundError
  • Codes are case-insensitive

Use "Given / When / Then" Language

Structured descriptions map cleanly to test cases:

You

Write tests in Given/When/Then structure:

  • Given an authenticated user, when they delete their own post, then the post is removed and they receive 204
  • Given an authenticated user, when they try to delete another user's post, then they receive 403
  • Given an unauthenticated request, when they try to delete any post, then they receive 401
  • Given an authenticated user, when they try to delete a post that doesn't exist, then they receive 404

Ask for a Test Plan Before the Tests

For complex areas, generate a list of what should be tested first, confirm it looks right, then ask for the code:

You

Before writing tests for the checkout function, list every scenario that should be tested. Include happy paths, failure modes, and edge cases. Don't write code yet.

Review the list. Add anything domain-specific the AI missed. Delete anything irrelevant. Then: "Now write the tests for items 1, 2, 3, 5, 7, and 9."


Testing as a First-Class Workflow

The developers who get the most from AI-assisted testing aren't the ones who use it to skip writing tests — they're the ones who use it to make testing genuinely fast for the first time. The result is codebases with better coverage, more edge cases caught, and test suites shaped by someone thinking about behavior rather than line counts.

Write the prompt, review the tests, ship with confidence.

