Why AI Changes Code Review
Traditional code review has a structural problem: reviewers see finished code, not the decisions made along the way. By the time a PR lands in someone's queue, the author has already forgotten half the ambiguity they navigated. The reviewer has no context about what was considered and rejected.
AI-assisted review works best earlier in that cycle — before the PR, when you still remember the choices you made and can give the model exactly the context it needs to find what you missed.
Use AI to review your own work first. By the time a human reviewer sees your PR, you've already caught the obvious issues — and the review conversation starts at a higher level.
This guide treats AI review as a structured checklist, not a magic wand. You give it the right context, ask specific questions, and interpret the results. It covers six areas where AI adds genuine value: self-review, supplying context, regression risk, security, test gaps, and review comment quality.
Part 1: The Self-Review Habit
Most developers skip self-review because it feels redundant — you just wrote the code, you know what it does. But that's the problem. You know what you intended. AI reads what's actually there.
The workflow is simple: run `git diff main`, paste the output, give the model the task description, and ask for a fresh read. The constraint is the context window — for larger PRs, review by file or logical chunk rather than dumping the entire diff at once.
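If the diff is too large to paste in one go, a small script can split it into per-file chunks. This is a hypothetical helper, not part of the guide's workflow; the base branch (`main`) and the output directory are assumptions you'd adjust for your repo.

```typescript
// split-diff.ts - split `git diff main` into per-file chunks for pasting.
// Hypothetical helper: branch name and output directory are assumptions.
import { execSync } from "node:child_process";
import { mkdirSync, writeFileSync } from "node:fs";

const diff = execSync("git diff main", {
  encoding: "utf8",
  maxBuffer: 64 * 1024 * 1024,
});

// Each file's changes start with a "diff --git" header line.
const chunks = diff.split(/^(?=diff --git )/m).filter((c) => c.trim().length > 0);

mkdirSync("diff-chunks", { recursive: true });
chunks.forEach((chunk, i) => {
  // Derive a readable filename from the "diff --git a/... b/..." header.
  const header = chunk.match(/^diff --git a\/(\S+)/);
  const name = header ? header[1].replace(/[\\/]/g, "__") : `chunk-${i}`;
  writeFileSync(`diff-chunks/${name}.diff`, chunk);
});

console.log(`Wrote ${chunks.length} per-file chunks to ./diff-chunks`);
```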
I'm about to open a PR. Here's what I was trying to do: [describe the task in one paragraph].
Here's the diff:
[paste git diff output]
Review this as a careful senior engineer. Flag anything that looks incomplete, risky, or likely to break under non-obvious conditions. Don't summarize what the code does — focus on what could go wrong.
The model will surface edge cases you skipped, implicit assumptions, missing error handling, and places where the intent and the implementation diverge. Take everything seriously but verify before acting — AI occasionally flags correct code as suspicious.
What to ask about specifically
- Error paths: Does every failure mode produce a useful error, or does it silently succeed?
- Null and empty states: What happens if the list is empty, the user is unauthenticated, or the API returns nothing? (See the sketch after this list.)
- Naming: Does the name of this function still match what it actually does?
- Leftovers: Debug logging, commented-out code, TODO comments that were meant to be temporary.
- Scope creep: Did you change anything that wasn't in the task? Flag it, document it, or revert it.
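To make the null-and-empty-states item concrete, here's a hypothetical example of the kind of gap a self-review pass tends to surface. The function names are illustrative, not from any real PR.

```typescript
// Hypothetical example: the empty state silently produces a bad value.
function getAverageScore(scores: number[]): number {
  // With an empty array this returns NaN (0 / 0), which propagates quietly
  // into whatever renders or stores the result.
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}

// One possible fix: make the empty state explicit so callers must handle it.
function getAverageScoreSafe(scores: number[]): number | null {
  if (scores.length === 0) return null;
  return scores.reduce((sum, s) => sum + s, 0) / scores.length;
}
```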
Keep a self-review template
Create a short prompt template that includes your project conventions — framework, error handling patterns, naming rules. Paste it at the start of every self-review session so the model knows what "correct" looks like for your codebase.
Part 2: Giving AI Enough Context
AI review fails when it has to guess. The model doesn't know your codebase, your team conventions, or why this change is happening. The more of that context you supply, the more specific and useful the feedback will be.
What context to include
- The task: What was the PR meant to accomplish? One paragraph is enough.
- The constraints: Were there specific things you had to avoid? Performance requirements? Backwards-compatibility concerns?
- The key files: For large PRs, include the interfaces or types that the changed code depends on — even if they weren't modified.
- Your conventions: How does your team handle errors? What's your logging standard? What patterns are preferred over others?
Context:
- Framework: Express + TypeScript, Node 22
- Error handling: always throw typed errors that extend AppError; never swallow exceptions silently
- Auth: all protected routes use requireAuth() middleware; permissions checked via can(user, action, resource)
- Database: Prisma ORM; always use transactions for multi-table writes
- This PR adds a password reset flow. It was not in the original spec — we added it after user feedback.
Diff:
[paste diff here]
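The context above names an `AppError` convention. If your project doesn't already have one written down, a minimal sketch might look like the following; the exact fields are an assumption for illustration, not something the guide prescribes.

```typescript
// A minimal sketch of a typed-error base class matching the convention above.
// The statusCode and code fields are assumptions for illustration.
export class AppError extends Error {
  constructor(
    message: string,
    public readonly statusCode: number = 500,
    public readonly code: string = "INTERNAL_ERROR",
  ) {
    super(message);
    this.name = new.target.name;
  }
}

// Example subclass: thrown instead of silently swallowing a missing record.
export class NotFoundError extends AppError {
  constructor(resource: string) {
    super(`${resource} not found`, 404, "NOT_FOUND");
  }
}
```

Spelling the convention out this concretely gives the model a pattern to check deviations against, rather than leaving "typed errors" open to interpretation.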
When to break up the diff
For PRs with more than 200–300 lines of changes, reviewing the full diff in one shot produces generic feedback. A better approach:
- Review each file independently with its local context.
- Then ask separately: "Looking at these files together, what integration risks do you see?"
- Finally, ask about the PR as a whole: "Is there anything about the approach itself that concerns you?"
Part 3: Catching Regression Risk
The hardest bugs to catch in review are the ones that don't show up in the changed code — they show up in code that calls it. A function signature change, a renamed field, a reordered parameter: each one can silently break callers the reviewer never thought to check.
AI is good at spotting these risks when you ask directly. The key is to frame the question around downstream impact, not just local correctness.
Here's the diff for a function that's called in several places across the codebase:
[paste diff]
What are the most likely ways this change could break callers that weren't modified? Look for: changed return shapes, removed fields, new required parameters, changed error types, altered side effects, or changed timing/async behavior.
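To see why that framing matters, here's a hypothetical before/after of the kind of change that passes local review but still breaks an unmodified caller. The names (`getUser`, `sendWelcomeEmail`, `onSignup`) are illustrative.

```typescript
type User = { id: string; email: string };

// Before (for comparison): callers received the user object directly.
// async function getUser(id: string): Promise<User> { ... }

// After: the return shape now wraps the user and allows a null result.
async function getUser(id: string): Promise<{ user: User | null }> {
  return { user: { id, email: "user@example.com" } }; // stand-in implementation
}

async function sendWelcomeEmail(email: string): Promise<void> {
  console.log(`sending welcome email to ${email}`);
}

// Unmodified caller elsewhere in the codebase: in a loosely typed call site
// this still runs, but user.email is now undefined because the user lives
// under the .user key. Nothing throws; the behavior just quietly changes.
async function onSignup(id: string) {
  const user: any = await getUser(id);
  await sendWelcomeEmail(user.email);
}
```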
High-risk patterns to always check
- Function signature changes: New parameters (especially required ones), reordered parameters, changed return types.
- Database schema changes: Renamed columns, dropped fields, altered constraints — will existing queries still work?
- Event or message shape changes: If the code emits events, publishes to a queue, or calls webhooks, what consumes them?
- Environment dependencies: New environment variables, changed config keys, new external services that aren't in the local setup.
- Async behavior changes: Moving from synchronous to async, adding or removing awaits, changed Promise resolution order.
AI can only see what you show it. If callers exist in parts of the codebase you didn't paste, the model will tell you it can't assess impact. That's the right answer — take it as a cue to grep for the changed symbol and check those files manually.
Using AI to find affected callers
Before running an AI review of a shared function, find all callers first. Run a search for the function name, collect the relevant call sites, and paste them alongside the diff. The model can then give you a concrete list of what might break rather than a generic warning.
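A plain text search is usually enough. If you want the call sites pre-formatted for pasting, a small script like this hypothetical one can collect them; the default symbol name and the `src/` root are assumptions, and it won't catch calls split across lines.

```typescript
// find-callers.ts - print call sites of a symbol in the "file (line N)" format
// used in the prompt below. Hypothetical helper; adjust paths for your repo.
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

const symbol = process.argv[2] ?? "processOrder";
const root = process.argv[3] ?? "src";

function walk(dir: string): string[] {
  return readdirSync(dir).flatMap((entry) => {
    const full = join(dir, entry);
    return statSync(full).isDirectory() ? walk(full) : [full];
  });
}

for (const file of walk(root).filter((f) => /\.(ts|tsx|js)$/.test(f))) {
  readFileSync(file, "utf8")
    .split("\n")
    .forEach((line, i) => {
      if (line.includes(`${symbol}(`)) {
        console.log(`--- ${file} (line ${i + 1}) ---`);
        console.log(line.trim());
      }
    });
}
```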
The function `processOrder` is being changed. Here are the places in the codebase that call it:
--- src/api/orders.ts (line 84) ---
const result = await processOrder(userId, cart, { notify: true });
--- src/jobs/retry-failed.ts (line 31) ---
await processOrder(order.userId, order.cart);
--- src/admin/manual-orders.ts (line 17) ---
processOrder(userId, cart, options).catch(logger.error);
Here's the diff for processOrder:
[paste diff]
Which callers are likely to break, and how?
Part 4: Security-Sensitive Changes
AI is not a security scanner. It doesn't have access to your full dependency tree, it can't run static analysis, and it won't catch everything. But it's genuinely useful for a fast first pass on the categories of security issues that most commonly appear in application code.
The key is to ask about specific categories rather than asking generically whether the code is "secure". Generic questions produce generic answers.
Review this diff for security issues. Focus specifically on:
- Input validation: is user-controlled data validated before use?
- Authorization: are permissions checked before accessing or modifying data?
- SQL/query injection: is any user input interpolated into queries?
- Secrets exposure: are any credentials, tokens, or keys visible in responses or logs?
- Insecure direct object references: can a user access another user's data by changing an ID?
[paste diff]
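The last item in that prompt, insecure direct object references, is worth illustrating because the vulnerable and fixed versions look almost identical. This is a hypothetical Express handler; `requireAuth` and `db` stand in for whatever auth middleware and data layer your project uses.

```typescript
import express, { Request, RequestHandler, Response } from "express";

// Illustrative stubs so the sketch is self-contained; their real shapes
// depend on your project.
declare function requireAuth(): RequestHandler;
declare const db: {
  invoices: { findOne(where: Record<string, string>): Promise<object | null> };
};

const app = express();

// Vulnerable: any authenticated user can read any invoice by changing the ID.
app.get("/invoices/:id", requireAuth(), async (req: Request, res: Response) => {
  const invoice = await db.invoices.findOne({ id: req.params.id });
  res.json(invoice);
});

// Fixed (shown separately for comparison; in practice it replaces the handler
// above): scope the lookup to the authenticated caller so changing the ID
// can't reach another user's data. Assumes the middleware attaches the
// authenticated user to the request.
app.get("/invoices/:id", requireAuth(), async (req: Request, res: Response) => {
  const invoice = await db.invoices.findOne({
    id: req.params.id,
    userId: (req as any).user.id,
  });
  if (!invoice) return res.status(404).json({ error: "Invoice not found" });
  res.json(invoice);
});
```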
Categories worth a dedicated pass
- Authentication changes: New login flows, token handling, session management, or "remember me" features. These are high-stakes and should always get a separate review pass.
- New API endpoints: Does every endpoint check the caller's identity and permissions? Are there rate limits?
- File uploads or user-generated content: Where does the file go? Is the type validated? Can path traversal reach sensitive locations?
- New dependencies: If the diff adds a new package, check it briefly — name, maintainer, download count. Supply chain issues often start with a suspicious new import.
- Logging changes: Is anything being logged that shouldn't be — passwords, tokens, full request bodies, PII?
Scope the security question to the change
Asking "is this code secure?" for a diff that adds a button label is noise. Reserve security review prompts for diffs that touch auth, data access, input handling, external calls, or user-generated content. AI review works best when the question matches the risk surface.
Part 5: Generating Test Ideas from a Diff
The best time to write tests is when you understand the risks in the code. A diff encodes exactly that information: here's what changed, here's what could go wrong. AI can read a diff and produce a specific list of test scenarios — not generic ones, but cases derived from what this change actually does.
The key difference from "write tests for this code" is that you're asking for risk-driven test ideas, not coverage-driven ones. Coverage can be satisfied by testing the happy path twice. Risk-driven tests target the places most likely to fail in production.
Here's a diff for a change to the checkout flow:
[paste diff]
What test cases should exist that probably don't yet? Focus on: edge cases in the new logic, error paths that could be silently ignored, race conditions or timing issues, behavior at the boundaries of any new conditions, and scenarios where the before-and-after behavior differs subtly.
Give me a list of test descriptions, not code. I'll write them myself.
You'll get a list like: "Empty cart at checkout start", "Payment succeeds but inventory update fails", "User submits twice before first response", "Discount code applied to already-discounted item". These are concrete test titles you can take straight to your test file.
Turning test ideas into tests
Once you have the list, a second prompt can write the tests — but the list itself is the valuable output. Review it first. Delete any that don't reflect real risk. Add any that the model missed. Then ask for the implementation with the curated list as input.
Here are the test scenarios I want to cover:
1. Checkout with empty cart returns 400 with message "Cart is empty"
2. Payment API returns timeout — order is not created, user sees retry option
3. Discount code applied twice to the same order is rejected on the second application
4. Concurrent checkouts from two sessions with the same cart item — only one succeeds
Write these as Jest integration tests for the checkout route at src/api/checkout.ts.
Use the existing test helpers in tests/helpers/db.ts and tests/helpers/auth.ts.
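Here's a hedged sketch of what the first scenario might look like as a Jest integration test. It assumes supertest for HTTP calls and guesses at the exports of the helper files (`resetDb`, `authHeaderFor`); your actual helpers will differ.

```typescript
// Sketch of scenario 1: checkout with an empty cart returns 400.
// supertest, the app export, and the helper function names are assumptions.
import request from "supertest";
import { app } from "../../src/app";
import { resetDb } from "../helpers/db";
import { authHeaderFor } from "../helpers/auth";

describe("POST /api/checkout", () => {
  beforeEach(async () => {
    await resetDb();
  });

  it("returns 400 with 'Cart is empty' when the cart has no items", async () => {
    const res = await request(app)
      .post("/api/checkout")
      .set(await authHeaderFor("user-with-empty-cart"))
      .send({});

    expect(res.status).toBe(400);
    expect(res.body.error).toBe("Cart is empty");
  });
});
```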
Pairing this approach with the Testing with AI guide gives you a complete workflow from diff to test suite.
Part 6: Writing Better Review Comments
AI is also useful on the other side of the review — when you're reviewing someone else's code and want to give feedback that's precise, constructive, and easy to act on.
The challenge with written review comments is tone. Text without voice reads harsher than intended, and vague comments ("this seems off") create work for the author without helping them understand what to fix. AI can help you turn an observation into a well-framed suggestion.
The three-part comment structure
Effective review comments tend to have three parts: what you noticed, why it matters, and a concrete suggestion. AI can help you fill in whichever part is hardest to articulate.
I want to leave a review comment on this code:
const user = await db.users.findOne({ id: req.params.userId });
await updateUserProfile(user, req.body);
My concern is that there's no null check — if the user doesn't exist, updateUserProfile will throw an unhandled error. Help me write a comment that explains the risk clearly and suggests the fix without sounding critical of the author.
"Worth adding a null check here — if userId doesn't match an existing user, findOne returns null and updateUserProfile will throw. Something like if (!user) return res.status(404).json({ error: 'User not found' }) before the update call would handle it cleanly."
When to use AI for review comments
- Complex technical issues where you want to explain the full reasoning without writing an essay.
- Sensitive feedback where tone matters and you want to phrase something constructively.
- Alternative approaches — "I'd consider X instead, here's why" is more useful than "this is wrong".
- Documenting patterns — if the same issue appears in multiple places, AI can help you write a concise note that names the pattern and explains it once.
Prefer suggestions over corrections
Frame review comments as "I'd consider..." or "One option would be..." rather than "This is wrong" or "You should...". It invites discussion rather than creating defensiveness. AI defaults to this framing naturally — let it.
Prompt Library
Copy-paste prompts for the six scenarios above, plus a full PR review pass.
1. Self-review before opening a PR
I'm about to open a PR. Here's what I was trying to accomplish: [describe task].
Here's the diff:
[paste diff]
Review this as a careful senior engineer who hasn't seen this code before. Flag anything that looks incomplete, risky, or likely to break under non-obvious conditions. Don't summarize what the code does — focus on what could go wrong.
2. Context-first review (with conventions)
Review this diff with the following project context:
- [Framework + language + version]
- [Error handling convention]
- [Auth/permissions pattern]
- [Any other relevant rules]
Task this PR is solving: [describe]
Diff:
[paste diff]
Flag issues, edge cases, and deviations from the conventions above.
3. Regression risk — callers
The function [name] is being changed. Here are its call sites:
[paste relevant callers with file and line]
Here's the diff:
[paste diff]
Which callers are likely to break, and how? Look for: changed signatures, altered return shapes, removed fields, new required parameters, changed error types, altered async behavior.
4. Security pass
Review this diff for security issues. Focus on:
- Input validation: is user-controlled data validated before use?
- Authorization: are permissions checked before data access or modification?
- Query injection: is any user input interpolated into queries or commands?
- Secrets exposure: could credentials or tokens appear in responses or logs?
- Insecure direct object references: can a user access another user's data by changing an ID?
Diff:
[paste diff]
5. Test ideas from a diff
Here's a diff for [describe what changed]:
[paste diff]
What test cases should exist that probably don't yet? Focus on: edge cases in the new logic, error paths that could be silently swallowed, race conditions, boundary conditions, and scenarios where before/after behavior differs subtly.
Give me test descriptions only — not code.
6. Write a constructive review comment
Help me write a code review comment for this snippet:
[paste code]
My concern: [describe what you noticed and why it matters].
Write the comment so it clearly explains the risk and suggests a fix, without sounding critical of the author. Keep it concise — one short paragraph.
7. Full PR review pass
Perform a complete review of this PR.
Task: [describe]
Key constraints: [list any]
Diff:
[paste diff]
Structure your response as:
1. Summary — what does this change actually do?
2. Issues — anything broken, risky, or incomplete (be specific)
3. Test gaps — what scenarios aren't covered?
4. Minor notes — style, naming, small suggestions
5. Verdict — ready to merge / needs changes / needs discussion
Related Guides
Debugging with AI
The investigation workflow for when review found something broken — reading errors, isolating failures, and working systematically to a fix.
Testing with AI
From test ideas to a complete test suite — unit tests, integration tests, TDD workflows, and mocking patterns with real prompts.
AI-Assisted CI/CD
Automate code review checks in CI, generate GitHub Actions workflows, and build deployment pipelines with working examples.
When AI Gets It Wrong: A Field Guide
Nine failure modes with real examples and detection checklists — a useful companion to any review workflow.