Before AI implementation, define acceptance criteria in plain language: required behavior, non-goals, edge cases, verification steps, and stop conditions. Review then becomes objective: criteria met or not met.
Why Reviews Drift Without Criteria
AI can produce code quickly, but speed can hide ambiguity. If the task is "improve checkout" or "clean up auth," each reviewer interprets success differently.
The result is predictable: long comment threads, mid-review scope changes, and merges that technically pass tests but miss the real outcome users needed.
Acceptance criteria give you one contract before implementation. The model knows what must be true. The reviewer knows what to check. The team knows when to stop.
What Good Acceptance Criteria Look Like
- Outcome: one specific behavior that will be true after the change.
- Non-goals: what this task must not change.
- Edge cases: at least two failure or boundary scenarios.
- Verification: exact command(s) or manual path(s) to validate.
- Stop condition: clear rule for when implementation is complete.
Keep criteria short enough to scan. If the list is too broad, split the task before AI writes code.
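One way to keep the five parts scannable is to hold them in a small structure that reports its own gaps. The sketch below is illustrative only: the `AcceptanceCriteria` class and its `problems` method are hypothetical names, not part of any tool, and the thresholds simply mirror the list above.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceCriteria:
    """Hypothetical container for the five parts listed above."""
    outcome: str
    non_goals: list[str]
    edge_cases: list[str]
    verification: list[str]
    stop_condition: str

    def problems(self) -> list[str]:
        """Return the gaps found; an empty list means every part is filled in."""
        issues = []
        if not self.outcome.strip():
            issues.append("outcome is missing")
        if not self.non_goals:
            issues.append("no non-goals listed")
        if len(self.edge_cases) < 2:
            issues.append("fewer than two edge cases")
        if not self.verification:
            issues.append("no verification steps")
        if not self.stop_condition.strip():
            issues.append("stop condition is missing")
        return issues


# An incomplete draft: one edge case, no verification, no stop condition.
draft = AcceptanceCriteria(
    outcome="Signed-in users can submit a new email and see a success state.",
    non_goals=["password reset flow"],
    edge_cases=["duplicate email"],
    verification=[],
    stop_condition="",
)
print(draft.problems())
```

A draft that returns a non-empty list is a signal to fill the gaps, or to split the task, before any code is written.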
"Done when tests pass" is not an acceptance criterion. It is a verification method. Criteria should describe behavior and boundaries, not only tooling.
A Reusable Prompt Template
"Before editing code, write acceptance criteria for this task. Include: required outcome, non-goals, at least two edge cases, exact verification steps, and stop condition. Keep it under 10 bullets. Ask for missing context if any criterion cannot be made testable. Do not write code yet."
This prompt forces the model to expose uncertainty early. If criteria cannot be made testable, you have found missing context before any diff exists.
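If the template is reused often, it can live in code next to a quick check of the model's reply. This is a minimal sketch under stated assumptions: `build_prompt` and `reply_follows_rules` are hypothetical helpers, and the check covers only the two mechanical rules (under 10 bullets, no code yet), not whether each criterion is testable.

```python
# The prompt text is quoted from the template above; the helpers are illustrative.
CRITERIA_PROMPT = (
    "Before editing code, write acceptance criteria for this task. "
    "Include: required outcome, non-goals, at least two edge cases, "
    "exact verification steps, and stop condition. Keep it under 10 bullets. "
    "Ask for missing context if any criterion cannot be made testable. "
    "Do not write code yet."
)

FENCE = "`" * 3  # Markdown code-fence marker


def build_prompt(task: str) -> str:
    """Prefix the reusable template with the concrete task description."""
    return f"Task: {task.strip()}\n\n{CRITERIA_PROMPT}"


def reply_follows_rules(reply: str) -> bool:
    """Check the mechanical rules only: 1 to 9 bullets and no fenced code."""
    bullets = [
        line for line in reply.splitlines()
        if line.lstrip().startswith(("- ", "* "))
    ]
    return 0 < len(bullets) < 10 and FENCE not in reply


prompt = build_prompt("Let users change email from settings.")
reply = (
    "- Outcome: signed-in users can submit a new email.\n"
    "- Non-goals: no changes to password reset flow.\n"
    "- Edge case 1: duplicate email.\n"
    "- Edge case 2: malformed email.\n"
    "- Verification: run account settings tests.\n"
    "- Stop condition: all checks pass."
)
print(reply_follows_rules(reply))
```

A reply that fails this check is worth sending back before reading it closely; a reply that passes still needs a human to judge whether each bullet is testable.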
Worked Example: Profile Email Update
Vague request
"Let users change email from settings."
Acceptance criteria version
- Outcome: signed-in users can submit a new email and receive a visible success state.
- Non-goals: no changes to password reset flow, session model, or verification policy.
- Edge case 1: duplicate email shows server error message without clearing form state.
- Edge case 2: malformed email fails client-side validation with existing UI pattern.
- Verification: run account settings tests and perform three manual checks (valid, duplicate, malformed).
- Stop condition: all checks pass and no files outside settings/API boundary were changed.
Now a review can fail for concrete reasons instead of taste. If the form state clears on duplicate email, criteria are unmet regardless of code style.
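The stop condition in this example has a part that can be checked mechanically: no files outside the settings/API boundary were changed. The sketch below assumes hypothetical directory names (`src/settings`, `src/api/account`) and a hypothetical helper `files_outside_boundary`; substitute the real paths of your repository.

```python
from pathlib import PurePosixPath

# Hypothetical boundary for the email-update task; adjust to the real repo layout.
ALLOWED_DIRS = ("src/settings", "src/api/account")


def files_outside_boundary(changed_files, allowed_dirs=ALLOWED_DIRS):
    """Return the changed files that do not live under any allowed directory."""
    outside = []
    for name in changed_files:
        path = PurePosixPath(name)
        # is_relative_to compares whole path components (Python 3.9+),
        # so "src/settings_old" does not count as inside "src/settings".
        if not any(path.is_relative_to(d) for d in allowed_dirs):
            outside.append(name)
    return outside


changed = [
    "src/settings/email_form.py",
    "src/api/account/email.py",
    "src/auth/session.py",  # touches the session model, a listed non-goal
]
print(files_outside_boundary(changed))  # → ['src/auth/session.py']
```

The list of changed files could come from something like `git diff --name-only` against the base branch. Any non-empty result means the stop condition is unmet, regardless of how reasonable the extra change looks.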
How Criteria Improve AI Collaboration
- Less scope drift: non-goals reduce opportunistic refactors during implementation.
- Faster reviews: reviewers map changes to explicit checks, not assumptions.
- Better handoffs: another engineer can continue from criteria without replaying prior context.
- Safer merges: stop conditions block "almost done" merges that create hidden cleanup debt.
Criteria do not slow teams down. They remove rework loops caused by unclear task boundaries.
Use Criteria During Review
Once the diff exists, keep the review anchored to the original criteria. Do not ask "Do I like this implementation?" first. Ask whether the change proves the agreed behavior without crossing the agreed boundaries.
- Behavior: does the diff make the required outcome true in the product, API, or workflow?
- Boundaries: did the assistant avoid the listed non-goals and unrelated cleanup?
- Evidence: do the verification notes cover each edge case, not only the happy path?
If any answer is weak, ask the assistant for a smaller follow-up that targets the missing criterion. That keeps review comments tied to observable gaps instead of broad preference.
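The review loop described above can be sketched as a mapping from each criterion to its evidence, with a follow-up request generated only for the gaps. The `Check` class and both helper functions are hypothetical names for illustration, and the sample criteria are adapted from the email-update example.

```python
from dataclasses import dataclass


@dataclass
class Check:
    """One acceptance criterion plus what the review found (hypothetical shape)."""
    criterion: str
    evidence: str = ""    # verification note, test name, or manual result
    passed: bool = False


def unmet(checks):
    """A criterion counts as met only if it passed and has recorded evidence."""
    return [c.criterion for c in checks if not (c.passed and c.evidence.strip())]


def follow_up_request(checks):
    """Build a narrow follow-up that targets only the missing criteria."""
    gaps = unmet(checks)
    if not gaps:
        return "All criteria met."
    lines = ["Address only these unmet criteria; do not change anything else:"]
    lines += [f"- {gap}" for gap in gaps]
    return "\n".join(lines)


review = [
    Check("Valid email shows a visible success state", "manual check: valid", True),
    Check("Duplicate email keeps form state", "manual check: form cleared", False),
    Check("Malformed email fails client-side validation"),  # no evidence recorded
]
print(follow_up_request(review))
```

Treating a criterion with no recorded evidence as unmet is a deliberate choice here: it keeps happy-path-only verification from passing review by default.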
Combine It With Related Practices
Acceptance criteria work best together with The AI Plan Before Code, The AI Change Budget, The AI Verification Ladder, The AI Code Ownership Checklist, and AI PR Review Checklist Template. The plan defines approach, the change budget defines scope, criteria define done, verification defines confidence, the ownership checklist covers what to inspect at merge time, and the PR template turns all of it into a repo-level artifact.
If your team debates whether a diff is "good enough," you likely have a criteria problem, not a coding problem.