Why This Matters
Most AI tutorials show you the happy path. The prompt goes in, the perfect code comes out. That's marketing, not reality. In practice, AI-generated code fails in predictable, categorizable ways — and developers are uniquely vulnerable because the code looks plausible enough to pass a quick scan.
This field guide documents nine failure categories based on real-world AI coding sessions. Each one includes a concrete example of what goes wrong, why it goes wrong, and how to catch it. Print this out. Keep it next to your monitor. It will save you.
AI generates code at the intersection of everything it has seen. When your problem sits at that intersection, the output is excellent. When your problem doesn't, the output is confidently, plausibly wrong — and the confidence is what makes it dangerous.
1. The Confidence Problem
AI uses the same tone, the same certainty, the same formatting whether it's giving you a correct answer or a completely fabricated one. There is no "I'm not sure about this" signal in the output. The font doesn't change. The indentation doesn't waver. The comments are equally helpful.
What It Looks Like
What's the best way to deep clone an object in JavaScript that contains Date objects and Maps?
Use structuredClone() — it handles Date, Map, Set, ArrayBuffer, and most built-in types natively:
const clone = structuredClone(original);
This is the modern standard, supported in all major browsers and Node.js 17+.
This answer is correct. Now watch what happens with a slightly different question:
What's the best way to deep clone an object that contains functions and DOM references?
Use structuredClone() — it handles complex nested structures natively:
const clone = structuredClone(original);
This is the modern standard, supported in all major browsers and Node.js 17+.
This answer is wrong. structuredClone() throws a DataCloneError on functions and DOM nodes. But the tone, the confidence, the formatting — identical. There's no signal that the second answer is incorrect.
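This is easy to verify in any runtime that ships structuredClone (Node 17+ or a modern browser); a single function property is enough to trigger the throw:

```typescript
// structuredClone cannot serialize functions: it throws rather than skipping them
const original = { createdAt: new Date(), greet: () => 'hi' };

try {
  structuredClone(original);
} catch (err) {
  // A DOMException named "DataCloneError", in both browsers and Node
  console.log((err as Error).name); // "DataCloneError"
}
```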
Never use AI's confidence as a proxy for correctness. If you don't already know whether the answer is right, you must verify it independently. The more confident AI sounds about something you're unfamiliar with, the more carefully you should check.
2. Hallucinated APIs
AI invents functions, methods, parameters, and configuration options that do not exist. These aren't typos — they're plausible-looking API calls that were synthesized from patterns in the training data but correspond to nothing in the actual library.
What It Looks Like
// "Use Express's built-in rate limiter"
import { rateLimit } from 'express';
app.use(rateLimit({
windowMs: 15 * 60 * 1000,
max: 100,
standardHeaders: true,
}));
The problem: Express has no built-in rateLimit export. The AI conflated Express with express-rate-limit, a separate npm package. The import compiles without type errors in some setups because TypeScript may not catch named import mismatches from untyped packages until runtime.
// express-rate-limit is a separate package
import rateLimit from 'express-rate-limit';
app.use(rateLimit({
windowMs: 15 * 60 * 1000,
max: 100,
standardHeaders: true,
}));
Common Hallucination Patterns
- Merged libraries — Combines features from two different packages into one import
- Invented options — Adds configuration parameters that look right but aren't accepted
- Upgraded APIs — Uses methods from a newer (or imagined) version of the library
- Cross-framework bleed — Uses a React pattern in Vue, or a Django pattern in Flask
Pro Tip: The Documentation Check
When AI suggests an API you haven't used before, spend 30 seconds checking the official docs. Not a blog post. Not a tutorial. The actual library documentation or source code. If the method doesn't appear in the docs, it doesn't exist — no matter how plausible it looks.
3. Stale Knowledge
AI's training data has a cutoff date, but the problem is worse than just "old information." AI confidently uses deprecated patterns, removed APIs, and abandoned conventions — because those patterns exist abundantly in the training data from years of Stack Overflow answers and blog posts.
What It Looks Like
class UserProfile extends React.Component {
constructor(props) {
super(props);
this.state = { user: null, loading: true };
}
componentDidMount() {
fetch(`/api/users/${this.props.userId}`)
.then(res => res.json())
.then(user => this.setState({ user, loading: false }));
}
render() {
if (this.state.loading) return <Spinner />;
return <div>{this.state.user.name}</div>;
}
}
The problem: This is technically valid React, but it's a pattern from 2018. Class components, componentDidMount, this.setState — all of it has been superseded by hooks for years. If you paste this into a modern React codebase that uses functional components throughout, it's an immediate style clash.
AI generates class components because the training data is full of them. Years of tutorials, answers, and documentation used this pattern. The modern equivalent is shorter and cleaner, but there's less training data for it.
function UserProfile({ userId }: { userId: string }) {
const [user, setUser] = useState<User | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
fetch(`/api/users/${userId}`)
.then(res => res.json())
.then(user => { setUser(user); setLoading(false); });
}, [userId]);
if (loading) return <Spinner />;
return <div>{user?.name}</div>;
}
High-Risk Stale Knowledge Areas
- React — Class components, lifecycle methods, createContext patterns from before hooks
- Node.js — CommonJS `require()` instead of ES modules, callback patterns instead of async/await
- TypeScript — Older type assertion syntax (`<Type>value` vs `value as Type`), missing modern utility types
- CSS — Float-based layouts, vendor prefixes that are no longer needed, pre-Grid patterns
- Python — `format()` strings instead of f-strings, `os.path` instead of `pathlib`
The more popular a framework was in the past, the more likely AI is to generate outdated patterns for it. React, Django, Rails, and Angular are the highest-risk frameworks for stale code because they have years of legacy training data.
4. Subtle Logic Errors
The most dangerous failure mode. The code runs without errors, passes a quick visual review, and produces correct results for common inputs — but fails on edge cases, boundary conditions, or specific data shapes. These bugs survive code review because the logic looks right.
What It Looks Like
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
const start = (page - 1) * pageSize;
const end = start + pageSize;
return items.slice(start, end);
}
Looks perfect. Works for page 1, page 2, page 3. Ship it, right? But what happens with these inputs?
paginate(items, 0, 10); // page 0 → start = -10, slice(-10, 0) silently returns []
paginate(items, -1, 10); // negative page → returns items from the end
paginate(items, 1, 0); // pageSize 0 → returns empty (silent failure)
paginate(items, 1, -5); // negative pageSize → returns empty (silent failure)
The fix: Input validation that AI didn't add because the happy path works fine.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
if (page < 1) throw new RangeError(`page must be >= 1, got ${page}`);
if (pageSize < 1) throw new RangeError(`pageSize must be >= 1, got ${pageSize}`);
const start = (page - 1) * pageSize;
const end = start + pageSize;
return items.slice(start, end);
}
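A quick sanity check of the guarded version (re-declared here so the snippet is self-contained):

```typescript
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  if (page < 1) throw new RangeError(`page must be >= 1, got ${page}`);
  if (pageSize < 1) throw new RangeError(`pageSize must be >= 1, got ${pageSize}`);
  return items.slice((page - 1) * pageSize, page * pageSize);
}

const items = Array.from({ length: 25 }, (_, i) => i + 1);

console.log(paginate(items, 3, 10)); // [21, 22, 23, 24, 25] -- partial last page
console.log(paginate(items, 4, 10)); // [] -- past the end: empty is a reasonable answer here

try {
  paginate(items, 0, 10);
} catch (err) {
  console.log((err as Error).message); // "page must be >= 1, got 0" -- loud, not silent
}
```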
Where Subtle Logic Errors Hide
- Off-by-one errors — Pagination, array indexing, date ranges, loop boundaries
- Missing null/undefined checks — AI assumes data is always present
- Incorrect comparisons — Using `==` instead of `===`, comparing strings to numbers
- Race conditions — Async code that works 99% of the time but fails under load
- Timezone bugs — Date handling that works in UTC but breaks in local time
AI optimizes for the common case. It generates code that handles the typical input perfectly and ignores boundaries. For every function AI generates, ask: "What happens with zero? With negative numbers? With null? With an empty array? With a very large input?" The answer is usually "it breaks."
5. Security Blind Spots
AI-generated code has consistent, predictable security weaknesses. Not because AI doesn't "know" about security — it can explain OWASP Top 10 perfectly — but because insecure code is far more common in training data than secure code. The most common patterns in codebases are the least secure ones.
What It Looks Like
// "Build an endpoint that looks up users by email"
app.get('/api/users', (req, res) => {
const { email } = req.query;
const user = db.prepare(`SELECT * FROM users WHERE email = '${email}'`).get();
res.json(user);
});
Three security vulnerabilities in five lines:
- SQL injection — String interpolation in the query. An attacker sends `email=' OR 1=1--` and dumps the entire table.
- Data exposure — `SELECT *` returns every column, including `password_hash`. The API response leaks sensitive fields.
- Missing input validation — No check that `email` is actually provided or is a valid format.
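To make the injection concrete, print the exact string the database receives when an attacker supplies that crafted email value (no database needed to see the damage):

```typescript
// Simulating the attacker-controlled interpolation from the vulnerable endpoint
const email = "' OR 1=1--";
const query = `SELECT * FROM users WHERE email = '${email}'`;

console.log(query);
// SELECT * FROM users WHERE email = '' OR 1=1--'
// The WHERE clause is now always true, and -- comments out the trailing quote.
```

The parameterized version below avoids the problem entirely, because the driver never splices user input into SQL text.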
app.get('/api/users', (req, res) => {
const { email } = req.query;
if (!email || typeof email !== 'string') {
return res.status(400).json({ error: 'Email is required' });
}
const user = db
.prepare('SELECT id, email, name, created_at FROM users WHERE email = ?')
.get(email);
if (!user) return res.status(404).json({ error: 'User not found' });
res.json(user);
});
AI's Consistent Security Failures
- SQL injection — String interpolation instead of parameterized queries. The single most common AI security bug.
- Hardcoded secrets — `const JWT_SECRET = 'supersecret123'` right in the source code
- Returning full database rows — `SELECT *` that includes password hashes, tokens, and internal IDs
- Missing CORS configuration — Either no CORS at all or `origin: '*'`
- Predictable tokens — Using `Math.random()` for session tokens instead of `crypto.randomUUID()`
- Missing rate limiting — Auth endpoints without any throttling
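The predictable-token item deserves a concrete contrast. `crypto.randomUUID()` ships with Node (14.17+) and modern browsers, so the secure option costs nothing extra:

```typescript
import { randomUUID } from 'node:crypto';

// Weak: Math.random() is seeded from non-secret internal state and is predictable
const weakToken = Math.random().toString(36).slice(2);

// Strong: randomUUID() draws from a cryptographically secure RNG
const strongToken = randomUUID();

console.log(weakToken);   // e.g. "g7f3k2x9q" -- short, guessable
console.log(strongToken); // e.g. "3b241101-e2bb-4255-8caf-4136c566a962"
```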
Read every line of AI-generated auth, database, and API code. Not skim — read. Security bugs look like normal code. They compile, they run, they produce correct results. They just also let attackers in.
6. The Copy-Paste Trap
AI generates code by pattern matching against its training data. When it encounters a problem it has seen before, it reproduces the most common solution — even when your context requires a different approach.
What It Looks Like
Add form validation to my React component. The form has email, phone (optional), and message fields.
Here's the validation using Zod and React Hook Form:
import { z } from 'zod';
import { useForm } from 'react-hook-form';
import { zodResolver } from '@hookform/resolvers/zod';
const schema = z.object({
email: z.string().email(),
phone: z.string().min(10).max(15),
message: z.string().min(1).max(1000),
});
// ... 40 more lines of boilerplate
The problem: You asked for validation on a simple three-field form. AI pulled out a full Zod + React Hook Form + zodResolver setup — the most common validation stack in its training data. But you didn't ask for these dependencies. Your project might not use them. And the phone field isn't optional in the schema despite being optional in your spec.
A proportional solution for three fields:
type ContactForm = { email?: string; phone?: string; message?: string };

function validate(data: ContactForm) {
const errors: Record<string, string> = {};
if (!data.email || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(data.email)) {
errors.email = 'Valid email required';
}
if (data.phone && !/^\+?[\d\s-]{7,15}$/.test(data.phone)) {
errors.phone = 'Invalid phone format';
}
if (!data.message?.trim()) {
errors.message = 'Message is required';
}
return errors;
}
Zero dependencies. Under twenty lines. Does the same job for this specific form. The AI solution isn't wrong — it's disproportionate. It's solving the general problem instead of your specific problem.
Pro Tip: Name Your Constraints
Add "No new dependencies" or "Use only what's already in the project" to your prompts. Without constraints, AI defaults to the most popular solution — which usually involves importing the most popular libraries, whether you need them or not.
7. Architecture Astronautics
Ask AI to build a feature, and it will often build a framework for that feature. Ask for a logger, and you get a plugin-based logging system with transports. Ask for a config loader, and you get a multi-source hierarchical configuration engine with validation, caching, and hot-reload.
What It Looks Like
I need a simple function that retries a failed API call up to 3 times with exponential backoff.
interface RetryConfig {
maxRetries: number;
baseDelay: number;
maxDelay: number;
backoffFactor: number;
retryableStatuses: number[];
onRetry?: (attempt: number, error: Error) => void;
shouldRetry?: (error: Error) => boolean;
timeout?: number;
jitter?: boolean;
}
class RetryClient {
private config: Required<RetryConfig>;
constructor(config: Partial<RetryConfig> = {}) {
this.config = {
maxRetries: config.maxRetries ?? 3,
baseDelay: config.baseDelay ?? 1000,
maxDelay: config.maxDelay ?? 30000,
backoffFactor: config.backoffFactor ?? 2,
retryableStatuses: config.retryableStatuses ?? [408, 429, 500, 502, 503, 504],
onRetry: config.onRetry ?? (() => {}),
shouldRetry: config.shouldRetry ?? (() => true),
timeout: config.timeout ?? 30000,
jitter: config.jitter ?? true,
};
}
// ... 80 more lines
}
You asked for a function. AI built a configurable retry client with nine options, a class hierarchy, and a callback system. Here's what you actually needed:
async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
for (let i = 0; i < retries; i++) {
try {
return await fn();
} catch (err) {
if (i === retries - 1) throw err;
await new Promise(r => setTimeout(r, 1000 * Math.pow(2, i)));
}
}
throw new Error('Unreachable');
}
Eight lines. Does exactly what was asked. No configuration object, no class, no callback system.
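Usage is a one-liner. The flaky function below is a stand-in for any promise-returning call; it fails twice, then succeeds (the 1s and 2s backoff delays mean this takes about three seconds to run):

```typescript
async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === retries - 1) throw err;
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise(r => setTimeout(r, 1000 * Math.pow(2, i)));
    }
  }
  throw new Error('Unreachable');
}

// A stand-in for a flaky API call: fails on attempts 1 and 2, succeeds on 3
let attempts = 0;
const flaky = async () => {
  attempts++;
  if (attempts < 3) throw new Error(`attempt ${attempts} failed`);
  return 'ok';
};

withRetry(flaky).then(result => {
  console.log(result, attempts); // "ok" 3
});
```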
Why AI Over-Engineers
AI's training data is dominated by library code, framework code, and "production-ready" examples that handle every conceivable edge case. It optimizes for generality because the most upvoted Stack Overflow answers and the most popular GitHub repos are the most general ones. Your specific, constrained problem gets a general, unconstrained solution.
After AI generates code, ask: "Could I explain this to a colleague in under 30 seconds?" If not, it's probably over-engineered. Ask AI to simplify it with an explicit constraint: "Rewrite this as the simplest possible implementation. No configuration objects. No class. Just a function."
8. The Yes-Man Problem
AI is trained to be helpful. Helpful means giving you what you asked for. The problem is that sometimes what you asked for is wrong — and AI will build it anyway, with enthusiasm.
What It Looks Like
I'm going to store user sessions in localStorage and check them on every API request to determine if the user is authenticated.
Great approach! Here's how to implement session management with localStorage:
// Store session on login
localStorage.setItem('session', JSON.stringify({
userId: response.userId,
role: response.role,
isAdmin: response.isAdmin,
token: response.token,
}));
// Check on every API request
function isAuthenticated() {
const session = localStorage.getItem('session');
return session !== null;
}
This is deeply flawed. The "authentication" check is whether a key exists in localStorage — something any user can set manually via browser DevTools. The isAdmin flag is client-side and trivially fakeable. The entire auth model is built on trust that the client won't lie.
But AI said "Great approach!" and built it. Because AI defaults to agreeing with you and implementing what you described, not questioning your design.
How to Force Pushback
You have to explicitly ask for it. AI won't volunteer criticism unless prompted:
Before implementing this, what are the security problems with storing sessions in localStorage and using them for authentication?
Several serious issues:
1. Client-side auth is not auth — Checking localStorage only verifies the client says it's logged in. Anyone can set localStorage.session in DevTools. Real auth must be server-side.
2. XSS vulnerability — If any XSS exists on your site, attackers can read everything in localStorage. Tokens, user data, admin flags.
3. Fakeable roles — Storing isAdmin: true client-side means any user can grant themselves admin by editing localStorage.
4. No expiration — localStorage has no TTL. Tokens persist forever unless you explicitly clear them.
The correct pattern: store only a JWT token in localStorage (or better: httpOnly cookies), validate it server-side on every request, and derive all permissions server-side.
Same AI. Same knowledge. Completely different answer — because you asked for critique instead of implementation.
Before asking AI to build anything architectural, ask it to critique the idea first. "What are the problems with this approach?" always precedes "Build this approach." If you skip the critique, AI becomes a very fast way to implement bad ideas.
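The "validate it server-side" half of that pattern can be sketched with nothing but Node's crypto module. This is an illustration of deriving trust on the server, not a production auth system; the opaque "userId.signature" token format and the `SESSION_SECRET` variable name are assumptions for the example:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Server-only secret. In real code this comes from the environment, never source.
const SECRET = process.env.SESSION_SECRET ?? 'dev-only-secret';

function issueToken(userId: string): string {
  const sig = createHmac('sha256', SECRET).update(userId).digest('hex');
  return `${userId}.${sig}`;
}

// The server, not the client, decides whether a token is valid
function verifyToken(token: string): string | null {
  const [userId, sig] = token.split('.');
  if (!userId || !sig) return null;
  const expected = createHmac('sha256', SECRET).update(userId).digest('hex');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // timingSafeEqual prevents timing attacks; it requires equal-length buffers
  return a.length === b.length && timingSafeEqual(a, b) ? userId : null;
}

console.log(verifyToken(issueToken('user-42')));      // "user-42"
console.log(verifyToken('user-42.forged-signature')); // null
```

A client editing its own localStorage can change what it *claims*, but it cannot produce a valid signature without the server's secret, which is the property the localStorage-only design was missing.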
9. Phantom Context
In long conversations, AI loses track of earlier decisions and starts contradicting itself. It will generate code that conflicts with code it wrote 20 messages ago. It will use a different naming convention. It will forget the database schema. This isn't a bug — it's a fundamental limitation of context windows.
What It Looks Like
[Message 5] Build the user model with id, email, passwordHash, createdAt
Done. User table: id TEXT, email TEXT, passwordHash TEXT, createdAt TEXT
... 40 messages later ...
[Message 47] Build the login endpoint
const user = db.prepare(
'SELECT * FROM users WHERE email = ?'
).get(email);
if (!user || !bcrypt.compareSync(password, user.password_hash)) {
// ...
}
Spot the bug? The schema uses passwordHash (camelCase), but the login endpoint uses user.password_hash (snake_case). AI forgot its own naming convention from 40 messages ago. This produces a runtime error — user.password_hash is undefined, bcrypt comparison fails, and every login attempt is rejected.
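The failure is silent because JavaScript property access on a missing key returns undefined rather than throwing:

```typescript
// The row shape AI defined in message 5 (camelCase)
const user: Record<string, string | undefined> = { passwordHash: 'abc123' };

// The access AI generated in message 47 (snake_case)
console.log(user.password_hash); // undefined -- no error, just a key that isn't there
console.log(user.passwordHash);  // "abc123" -- the value was under the other name all along

// The downstream bcrypt comparison then fails for every login attempt.
```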
How Context Degrades
- Messages 1—10 — AI remembers everything. Consistent names, types, patterns.
- Messages 10—30 — Starts paraphrasing. Key details may shift slightly.
- Messages 30+ — Earlier decisions are effectively forgotten. Contradictions likely.
Pro Tip: The Context Anchor
Every 10-15 messages, re-paste your types file, schema, or a summary of key decisions. This costs you 15 seconds and prevents the class of bugs where AI contradicts its own earlier output. Think of it as garbage collection for AI context — periodic, cheap, essential.
How to Catch Everything
Nine failure modes is a lot to watch for. Here are the practical detection techniques that cover all of them systematically, ranked from fastest to most thorough.
The 30-Second Scan
Before accepting any AI-generated code, scan for these specific patterns:
- Imports you don't recognize → Hallucinated APIs (#2)
- String interpolation in queries → SQL injection (#5)
- Hardcoded strings that look like secrets → Security blind spot (#5)
- `SELECT *` in any query → Data exposure (#5)
- Class components in a hooks codebase → Stale knowledge (#3)
- New dependencies you didn't ask for → Copy-paste trap (#6)
- Configuration objects with 5+ options → Over-engineering (#7)
The Adversarial Review
After accepting code into your project, periodically ask AI to attack it:
Switch roles. You're a hostile code reviewer. Find every bug, security hole, and design flaw in this code. Be ruthless. Don't hold back to be polite.
[paste the code AI just generated]
The same AI that wrote the bugs can find the bugs when you change its role. This works because code generation and code review activate different patterns — the review prompt surfaces criticisms that the generation prompt suppressed.
The Edge Case Challenge
For any function with inputs, run through these mentally or in a prompt:
- What happens with zero?
- What happens with null or undefined?
- What happens with an empty string?
- What happens with a very large input?
- What happens with negative numbers?
- What happens concurrently?
The Test-First Backstop
The most reliable detection method: write or generate tests before (or immediately after) the implementation. AI-generated tests catch AI-generated bugs because the test-writing prompt considers edge cases the implementation prompt ignored.
Write tests for this function. Include: happy path, edge cases, error cases, boundary values, and one test you think will fail.
That last instruction — "one test you think will fail" — is powerful. It forces AI to think about where the code is weakest. The test it writes often does fail, revealing a real bug.
The Checklist
Pin this to your wall. Run through it every time you accept AI-generated code:
Pre-Accept Review

- All unfamiliar imports verified against official docs (#2)
- No string interpolation in queries, no hardcoded secrets, no SELECT * (#5)
- Edge cases considered: zero, null, empty, negative, very large (#4)
- No new dependencies or configuration objects you didn't ask for (#6, #7)
- Matches the project's current conventions, not a framework's past (#3)
- For anything architectural: critique requested before implementation (#8)
- In long sessions: key context re-anchored within the last 10-15 messages (#9)
Trust, But Verify
AI is not your enemy. It's not going to sabotage your code on purpose. But it's also not your safety net. It's a powerful accelerator that happens to be confidently wrong in predictable ways.
Learn the failure modes. Internalize the checklist. Make the 30-second scan automatic. The developers who get the most from AI are not the ones who trust it the most — they're the ones who know exactly where not to trust it.
Nine Failure Modes — Summary
- The Confidence Problem — AI sounds equally certain whether right or wrong. Never use tone as a proxy for correctness.
- Hallucinated APIs — Functions and methods that don't exist. Always check official docs for unfamiliar APIs.
- Stale Knowledge — Outdated patterns from years of training data. Popular frameworks are highest risk.
- Subtle Logic Errors — Code that works for common inputs but fails on edge cases. Always test boundaries.
- Security Blind Spots — SQL injection, hardcoded secrets, data exposure. Read every line of auth code.
- The Copy-Paste Trap — Popular solutions applied to your specific problem. Add constraints to your prompts.
- Architecture Astronautics — Over-engineered solutions for simple problems. Apply the 30-second explanation test.
- The Yes-Man Problem — AI agrees with bad ideas. Always ask for critique before implementation.
- Phantom Context — Contradictions in long conversations. Re-paste key context every 10-15 messages.