AI Change Risk Matrix: Scope, Tests, Rollback Before Merge

TL;DR

Score AI changes by blast radius and reversibility. Low-risk edits need narrow checks. Cross-layer, data, money, auth, or irreversible changes need stronger tests, explicit rollback, and a named owner before merge.

Why AI Diffs Need a Risk Matrix

AI can make a risky change look calm: clean formatting, confident naming, a plausible test, and no obvious red flags. But a tidy diff says nothing about production safety.

A risk matrix gives reviewers one shared question before style or taste: what could break if this is wrong, and how quickly can we recover?

The Matrix

Four risk bands

Low: one file, no behavior change, easy revert. Example: copy, comments, small cleanup.
Medium: behavior changes in one layer. Example: UI validation, one endpoint, focused bug fix.
High: cross-layer behavior or shared contracts. Example: UI plus API, auth checks, billing-adjacent logic.
Critical: user data, money, access control, migrations, or hard-to-reverse changes.

If a change fits two bands, choose the higher one. AI-generated changes often hide coupling in files the reviewer did not expect, so conservative classification saves time.
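The "choose the higher one" rule is easy to encode if the bands are ordered. The sketch below is one possible representation, not a prescribed one: the `RiskBand` enum and the `resolve_band` helper are hypothetical names introduced here for illustration.

```python
from enum import IntEnum


class RiskBand(IntEnum):
    """The four bands, ordered so that a higher value means higher risk."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


def resolve_band(*candidates: RiskBand) -> RiskBand:
    """Return the highest band among all the bands a change plausibly fits."""
    if not candidates:
        raise ValueError("at least one candidate band is required")
    return max(candidates)


# A UI validation tweak (medium) that also touches an auth check (high).
band = resolve_band(RiskBand.MEDIUM, RiskBand.HIGH)
print(band.name)  # HIGH
```

Using an ordered enum means the tie-break is a plain `max`, so nobody has to remember which band wins.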

What Each Band Requires

Low risk

Scope: small diff, no hidden behavior change.

Tests: formatting, lint, typecheck, or build as appropriate.

Rollback: ordinary revert is enough.

Medium risk

Scope: one feature path or one layer.

Tests: one focused automated test or a documented manual pass/fail path.

Rollback: revert plan and confirmation that no data shape changed.

High risk

Scope: multiple layers, shared contracts, or permission-sensitive behavior.

Tests: boundary tests, regression checks, and at least one failure case.

Rollback: feature flag, config switch, or tested revert path.

Critical risk

Scope: user data, payments, access control, migrations, or irreversible side effects.

Tests: integration evidence, rollback rehearsal, monitoring, and owner sign-off.

Rollback: explicit runbook with data safety notes and first post-deploy check.
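The per-band requirements can also be kept as data, so a script or PR template can report what is still missing. This is a minimal sketch under assumptions: the evidence labels are hypothetical shorthand for the items listed above, and each band is treated as its own list rather than inheriting from the bands below it.

```python
# Hypothetical evidence labels, one set per band, mirroring the lists above.
REQUIRED_EVIDENCE = {
    "low": {"lint_or_build"},
    "medium": {"focused_test_or_manual_pass", "revert_plan", "no_data_shape_change"},
    "high": {"boundary_tests", "regression_checks", "failure_case", "rollback_mechanism"},
    "critical": {
        "integration_evidence",
        "rollback_rehearsal",
        "monitoring",
        "owner_signoff",
        "rollback_runbook",
        "post_deploy_check",
    },
}


def missing_evidence(band: str, provided: set) -> list:
    """Return the required evidence labels that have not been provided, sorted."""
    return sorted(REQUIRED_EVIDENCE[band] - set(provided))


# A high-risk change with boundary tests and a failure case, but nothing else yet.
print(missing_evidence("high", {"boundary_tests", "failure_case"}))
# ['regression_checks', 'rollback_mechanism']
```

An empty list means the band's stated requirements are covered; it does not replace the reviewer's judgment about whether the evidence is any good.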

Fast Classification Questions

Blast radius: can this break one screen, one workflow, or many customers?
Reversibility: can we undo it with a normal revert, or does data/state persist?
Coupling: does it cross UI, API, database, auth, queue, or provider boundaries?
Visibility: would monitoring or support quickly show the failure?
Ownership: does one reviewer understand the affected path end to end?

Practical rule

If you cannot explain rollback in one sentence, the change is not low risk.
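To make the questions repeatable, the answers can be fed into a small function. The sketch below is one possible mapping, not a definitive scoring scheme: the `Answers` fields, the thresholds, and the choice to bump the band by one level when visibility or ownership is weak are all assumptions made for illustration. The only rule taken directly from the text is that a change with no one-sentence rollback cannot be low risk.

```python
from dataclasses import dataclass

BANDS = ["low", "medium", "high", "critical"]


@dataclass
class Answers:
    blast_radius: str          # "screen", "workflow", or "customers"
    state_persists: bool       # True if a normal revert does not undo the effect
    crosses_boundaries: bool   # UI, API, database, auth, queue, or provider
    failure_visible: bool      # would monitoring or support show it quickly?
    owner_end_to_end: bool     # does one reviewer understand the whole path?
    rollback_sentence: str = ""  # rollback explained in one sentence, if possible


def classify(a: Answers) -> str:
    # Start from blast radius (assumed mapping).
    level = {"screen": 0, "workflow": 1, "customers": 2}[a.blast_radius]
    # Crossing a boundary is treated as at least high.
    if a.crosses_boundaries:
        level = max(level, 2)
    # Persisted data or state is treated as critical.
    if a.state_persists:
        level = 3
    # Weak visibility or ownership raises the band by one (assumption).
    if not (a.failure_visible and a.owner_end_to_end):
        level = min(level + 1, 3)
    # Practical rule: no one-sentence rollback means it is not low risk.
    if not a.rollback_sentence.strip():
        level = max(level, 1)
    return BANDS[level]


label_change = Answers("screen", False, False, True, True, "Revert the commit.")
print(classify(label_change))  # low
```

Teams will want to tune the thresholds; the value is in answering the same five questions every time, not in these particular numbers.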

Worked Examples

Example 1: Button label change

Risk band: Low. The change is visible, reversible, and isolated. Run the relevant UI check or inspect the screen manually.

Example 2: New API validation rule

Risk band: Medium or High. If one endpoint starts rejecting input that no existing client should be sending, medium may be enough. If existing clients might already send that input, treat it as high and verify contract compatibility.
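One way to check whether existing clients are affected is to replay recorded request payloads through the old and new rules and list the ones that only the new rule rejects. The sketch below assumes such samples are available; `old_rule`, `new_rule`, and the `quantity` field are hypothetical stand-ins for a real validator.

```python
def old_rule(payload: dict) -> bool:
    """Hypothetical existing rule: quantity must be an integer."""
    return isinstance(payload.get("quantity"), int)


def new_rule(payload: dict) -> bool:
    """Hypothetical new rule: quantity must be a positive integer."""
    return isinstance(payload.get("quantity"), int) and payload["quantity"] > 0


def newly_rejected(samples, old, new):
    """Return payloads the old rule accepted but the new rule rejects."""
    return [p for p in samples if old(p) and not new(p)]


# Stand-in for payloads captured from real client traffic.
recorded = [{"quantity": 3}, {"quantity": 0}, {"quantity": "2"}]

broken = newly_rejected(recorded, old_rule, new_rule)
band = "high" if broken else "medium"
print(band, broken)  # high [{'quantity': 0}]
```

The string `"2"` is not counted because the old rule already rejected it; only inputs that used to pass are a contract risk.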

Example 3: Auth permission fix

Risk band: High. The fix may close access correctly while accidentally blocking legitimate users. Verify allowed, denied, unauthenticated, and wrong-role paths.
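The four paths fit naturally into a table-driven check. In this sketch the `User` type, the `can_view_report` rule, and the role names are hypothetical; the point is that each path (allowed, denied, unauthenticated, wrong role) gets its own explicit case.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class User:
    id: int
    role: str


def can_view_report(user: Optional[User], owner_id: int) -> bool:
    """Hypothetical rule: admins see every report; members see only their own."""
    if user is None:
        return False
    if user.role == "admin":
        return True
    return user.role == "member" and user.id == owner_id


# (case name, user, report owner id, expected result)
CASES = [
    ("allowed: owner", User(1, "member"), 1, True),
    ("allowed: admin", User(9, "admin"), 1, True),
    ("denied: other member", User(2, "member"), 1, False),
    ("unauthenticated", None, 1, False),
    ("wrong role", User(1, "guest"), 1, False),
]


def failing_cases() -> list:
    """Return the names of cases whose result differs from the expectation."""
    return [
        name
        for name, user, owner_id, expected in CASES
        if can_view_report(user, owner_id) != expected
    ]


print(failing_cases())  # []
```

Keeping both allowed and denied rows in the same table guards against a fix that closes the hole by locking out legitimate users.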

Example 4: Migration touching customer data

Risk band: Critical. Test migration up and down on realistic data, document rollback, and add a post-deploy query or dashboard check.
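An up/down rehearsal can be scripted so it produces evidence a reviewer can read. The sketch below uses an in-memory SQLite database as a stand-in for a copy of realistic data; the `customers` and `customer_contacts` tables and the migration itself are hypothetical.

```python
import sqlite3


def migrate_up(conn):
    """Hypothetical migration: copy normalized emails into a new table."""
    conn.execute(
        "CREATE TABLE customer_contacts (customer_id INTEGER PRIMARY KEY, email TEXT NOT NULL)"
    )
    conn.execute(
        "INSERT INTO customer_contacts SELECT id, lower(trim(email)) FROM customers"
    )


def migrate_down(conn):
    conn.execute("DROP TABLE customer_contacts")


def snapshot(conn):
    return conn.execute("SELECT id, email FROM customers ORDER BY id").fetchall()


def post_deploy_check(conn) -> int:
    """Count customers with no migrated contact row; expected to be 0."""
    return conn.execute(
        "SELECT COUNT(*) FROM customers c "
        "LEFT JOIN customer_contacts k ON k.customer_id = c.id "
        "WHERE k.customer_id IS NULL"
    ).fetchone()[0]


def rehearse(conn) -> dict:
    """Run up, check, run down, and confirm the original data is untouched."""
    before = snapshot(conn)
    migrate_up(conn)
    unmigrated = post_deploy_check(conn)
    migrate_down(conn)
    tables = {r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")}
    return {
        "unmigrated": unmigrated,
        "data_unchanged": snapshot(conn) == before,
        "cleaned_up": "customer_contacts" not in tables,
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, " Ana@Example.com "), (2, "bo@example.com")],
)
print(rehearse(conn))
```

The query in `post_deploy_check` doubles as the post-deploy check: run against production after the real migration, it should also return zero.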

A Prompt You Can Reuse

Copy-paste prompt

"Before editing files, classify this task as low, medium, high, or critical risk. Explain blast radius, reversibility, coupling, required tests, and rollback. If the risk is high or critical, propose a smaller first change. Stop after the plan."

This pairs naturally with The AI Change Budget: the higher the risk band, the smaller the allowed AI diff should be.

Before You Merge

Low: reviewer confirms the diff is small and validation passed.
Medium: reviewer sees one behavior check and one edge case.
High: reviewer sees boundary evidence and rollback notes.
Critical: reviewer sees owner, monitoring, rollback runbook, and post-deploy check.

For more confidence, combine the matrix with The AI Verification Ladder, The AI Code Ownership Checklist, and 15 Acceptance Criteria Examples for AI Coding Tasks. To scope what to test at each risk band, use the AI Regression Test Plan Template.
