The AI Verification Ladder: Confidence Levels Before Merge

TL;DR

Treat verification as levels, not vibes. Tiny mechanical edits can pass with static checks. Behavior changes need targeted tests. Cross-boundary or high-impact changes need integration evidence and rollback confidence. Define the minimum level before AI writes code.

Why Teams Miss Risk With AI Diffs

AI often makes code look finished quickly: neat structure, comments, and plausible tests. That visual polish can push reviewers into approving too fast.

The real question is not "Does this look good?" but "What confidence level does this change require?" Without that frame, teams under-verify risky diffs and over-verify harmless ones, so risk slips through while review effort is spent where it matters least.

The Five Levels of the Verification Ladder

Level 0 to Level 4

Level 0 - Static only: lint, typecheck, formatting, and build must pass.
Level 1 - Local behavior check: one focused test or manual path proves intended behavior.
Level 2 - Boundary check: verify API contracts, data shapes, permissions, and error handling at interfaces.
Level 3 - Regression sweep: related flows and edge cases are exercised, not only the happy path.
Level 4 - Release confidence: rollback plan, observability hooks, and post-deploy checks are ready.

Not every change needs Level 4. But every change should have a declared minimum level before implementation starts.
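One way to make the declared level concrete is to encode the ladder as an ordered type. The following is a minimal sketch, not a prescribed implementation: the names `VerificationLevel` and `levels_required` are hypothetical, and it assumes the levels are cumulative, so declaring Level 2 also implies Levels 0 and 1.

```python
from enum import IntEnum


class VerificationLevel(IntEnum):
    """The five rungs of the ladder, ordered so a higher value means more evidence."""

    STATIC = 0       # lint, typecheck, formatting, build
    LOCAL = 1        # one focused test or manual path
    BOUNDARY = 2     # API contracts, data shapes, permissions, error handling
    REGRESSION = 3   # related flows and edge cases
    RELEASE = 4      # rollback plan, observability hooks, post-deploy checks


def levels_required(minimum: VerificationLevel) -> list[VerificationLevel]:
    """Return every rung from Level 0 up to and including the declared minimum.

    Assumption: the ladder is cumulative, so no lower rung is skipped.
    """
    return [level for level in VerificationLevel if level <= minimum]


print([level.name for level in levels_required(VerificationLevel.BOUNDARY)])
```

Using an ordered enum keeps comparisons such as `level <= minimum` readable and prevents a task from being assigned a level outside 0 to 4.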

What evidence looks like

Level 0: command output showing that lint, typecheck, test discovery, and build all pass.

Level 1: one named path exercised locally, with the expected result written down.

Level 2: proof that the changed boundary still accepts, rejects, and reports data correctly.

Level 3: a focused regression list that includes at least one failure or edge case.

Level 4: release notes for rollback, monitoring, owner, and the first post-deploy check.
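The evidence list above can be turned into a checklist that is pasted into a pull request description. This is a rough sketch under stated assumptions: `EVIDENCE_BY_LEVEL` and `evidence_checklist` are hypothetical names, the descriptions are paraphrased from the list, and the checklist assumes every level below the declared minimum is also required.

```python
# Evidence expected at each level, paraphrased from the list above.
EVIDENCE_BY_LEVEL = {
    0: "command output from the static checks (lint, typecheck, build)",
    1: "one named path exercised locally, with the expected result written down",
    2: "proof that the changed boundary still accepts, rejects, and reports data correctly",
    3: "focused regression list that includes at least one failure or edge case",
    4: "release notes for rollback, monitoring, owner, and the first post-deploy check",
}


def evidence_checklist(minimum_level: int) -> str:
    """Render an unchecked checklist covering Level 0 through the declared minimum."""
    if minimum_level not in EVIDENCE_BY_LEVEL:
        raise ValueError(f"verification level must be 0-4, got {minimum_level}")
    lines = [
        f"[ ] Level {level}: {EVIDENCE_BY_LEVEL[level]}"
        for level in range(minimum_level + 1)
    ]
    return "\n".join(lines)


print(evidence_checklist(2))
```

Generating the checklist before implementation starts gives the author and the reviewer the same list of boxes to fill in.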

How to Choose the Minimum Level

Use the smallest level that still covers the impact. A typo fix in user-facing copy might stop at Level 0. A DB query change with billing impact starts at Level 3 at minimum and, because it touches money, will usually need Level 4.

Fast decision rule

One file, no behavior change: Level 0-1

Behavior change in one layer: Level 2

Cross-layer change (UI/API/DB/auth): Level 3

User data, money, or access control: Level 4

This simple mapping prevents arguments after the diff appears. You already know what evidence is required before merge.
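The fast decision rule is small enough to write down as a function. The sketch below is illustrative only: the `Change` fields and the `minimum_level` name are assumptions, and the rule's "Level 0-1" range is collapsed to 0 as a floor that a reviewer can still raise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical description of a proposed change; field names are illustrative."""

    changes_behavior: bool
    layers: frozenset[str]             # e.g. {"ui", "api", "db", "auth"}
    touches_sensitive: bool = False    # user data, money, or access control


def minimum_level(change: Change) -> int:
    """Apply the fast decision rule, checking the highest-impact condition first."""
    if change.touches_sensitive:
        return 4
    if len(change.layers) > 1:
        return 3
    if change.changes_behavior:
        return 2
    # The rule allows Level 0-1 here; 0 is returned as the floor, and a
    # reviewer can still ask for a Level 1 check.
    return 0


typo_fix = Change(changes_behavior=False, layers=frozenset({"ui"}))
billing_query = Change(
    changes_behavior=True, layers=frozenset({"db"}), touches_sensitive=True
)
print(minimum_level(typo_fix))       # 0
print(minimum_level(billing_query))  # 4
```

Checking the most severe condition first means a change that matches several rows of the rule always lands on the highest applicable level.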

A Prompt You Can Reuse

Copy-paste prompt

"Before editing files, assign this task a verification level from 0 to 4 and explain why. Then propose the smallest implementation that can pass that level. Include exact validation steps and stop after the plan. Do not write code yet."

This forces the assistant to reason about risk first, not after it has already produced a broad patch.

Worked Example: Email Update Flow

Weak verification (common)

"Tests pass locally, looks good."

Ladder-based verification

Level selected: 3 (cross-layer behavior change).
Level 0: lint, types, and build pass.
Level 1: manual check that the form keeps its state on a server error.
Level 2: confirm API returns consistent error shape for duplicate email.
Level 3: run related account tests and a regression pass for verification email copy and session display.

Same feature, much clearer confidence. The ladder turns "seems fine" into explicit evidence.

Common Failure Patterns the Ladder Catches

Level skipping: jumping from compile success directly to merge.
Happy-path bias: only testing success, not failures and empty states.
Boundary blindness: verifying UI text while missing API/auth/data contract drift.
No rollback story: shipping risky changes without a reversible path.

Practical rule

If review discussion starts with opinions, return to the declared verification level and ask: which required evidence is still missing?
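The question "which required evidence is still missing?" can be answered mechanically once evidence is recorded per level. This is a minimal sketch, assuming evidence is kept as a simple mapping from level number to a short note; `missing_evidence` is a hypothetical helper, and the sample data restates the email update example with the Level 3 regression pass not yet done.

```python
def missing_evidence(declared_level: int, collected: dict[int, str]) -> list[int]:
    """Return the levels up to the declared minimum that have no evidence recorded."""
    return [
        level
        for level in range(declared_level + 1)
        if not collected.get(level, "").strip()
    ]


# Evidence gathered so far for the email update example (declared Level 3).
collected = {
    0: "lint, types, and build pass",
    1: "form keeps state on server error",
    2: "API returns consistent error shape for duplicate email",
}

gaps = missing_evidence(3, collected)
print(gaps)  # [3]
```

An empty result means the declared level is satisfied; anything else is the concrete list to bring back to the review thread.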

Pair It With Scope Control

The ladder works best with small diffs. If a change cannot realistically pass its target level in one review, split it. For example: schema safety first, behavior second, cleanup third.

For adjacent practices, combine this with The AI Change Budget, The AI Plan Before Code, and The AI Code Ownership Checklist. Together they define scope, intent, and confidence before merge.
