Why This Matters
Most AI tutorials show you the happy path. The prompt goes in, the perfect code comes out. That's marketing, not reality. In practice, AI-generated code fails in predictable, categorizable ways — and developers are uniquely vulnerable because the code looks plausible enough to pass a quick scan.
This field guide documents nine failure categories based on real-world AI coding sessions. Each one includes a concrete example of what goes wrong, why it goes wrong, and how to catch it. Print this out. Keep it next to your monitor. It will save you.
AI generates code at the intersection of everything it has seen. When your problem sits at that intersection, the output is excellent. When your problem doesn't, the output is confidently, plausibly wrong — and the confidence is what makes it dangerous.
1. The Confidence Problem
AI uses the same tone, the same certainty, the same formatting whether it's giving you a correct answer or a completely fabricated one. There is no "I'm not sure about this" signal in the output. The font doesn't change. The indentation doesn't waver. The comments are equally helpful.
What It Looks Like
What's the best way to deep clone an object in JavaScript that contains Date objects and Maps?
Use structuredClone() — it handles Date, Map, Set, ArrayBuffer, and most built-in types natively:
const clone = structuredClone(original);
This is the modern standard, supported in all major browsers and Node.js 17+.
This answer is correct. Now watch what happens with a slightly different question:
What's the best way to deep clone an object that contains functions and DOM references?
Use structuredClone() — it handles complex nested structures natively:
const clone = structuredClone(original);
This is the modern standard, supported in all major browsers and Node.js 17+.
This answer is wrong. structuredClone() throws a DataCloneError on functions and DOM nodes. But the tone, the confidence, the formatting — identical. There's no signal that the second answer is incorrect.
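This is easy to verify in any runtime that ships structuredClone (Node 17+ or a modern browser); a single function property is enough to trigger the throw:

```typescript
// structuredClone cannot serialize functions: it throws rather than skipping them
const original = { createdAt: new Date(), greet: () => 'hi' };

try {
  structuredClone(original);
} catch (err) {
  // A DOMException named "DataCloneError", in both browsers and Node
  console.log((err as Error).name); // "DataCloneError"
}
```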
Never use AI's confidence as a proxy for correctness. If you don't already know whether the answer is right, you must verify it independently. The more confident AI sounds about something you're unfamiliar with, the more carefully you should check.
2. Hallucinated APIs
AI invents functions, methods, parameters, and configuration options that do not exist. These aren't typos — they're plausible-looking API calls that were synthesized from patterns in the training data but correspond to nothing in the actual library.
What It Looks Like
// "Use Express's built-in rate limiter"
import { rateLimit } from 'express';
app.use(rateLimit({
windowMs: 15 * 60 * 1000,
max: 100,
standardHeaders: true,
}));
The problem: Express has no built-in rateLimit export. The AI conflated Express with express-rate-limit, a separate npm package. The import compiles without type errors in some setups because TypeScript may not catch named import mismatches from untyped packages until runtime.
// express-rate-limit is a separate package
import rateLimit from 'express-rate-limit';
app.use(rateLimit({
windowMs: 15 * 60 * 1000,
max: 100,
standardHeaders: true,
}));
Common Hallucination Patterns
- Merged libraries — Combines features from two different packages into one import
- Invented options — Adds configuration parameters that look right but aren't accepted
- Upgraded APIs — Uses methods from a newer (or imagined) version of the library
- Cross-framework bleed — Uses a React pattern in Vue, or a Django pattern in Flask
Pro Tip: The Documentation Check
When AI suggests an API you haven't used before, spend 30 seconds checking the official docs. Not a blog post. Not a tutorial. The actual library documentation or source code. If the method doesn't appear in the docs, it doesn't exist — no matter how plausible it looks.
3. Stale Knowledge
AI's training data has a cutoff date, but the problem is worse than just "old information." AI confidently uses deprecated patterns, removed APIs, and abandoned conventions — because those patterns exist abundantly in the training data from years of Stack Overflow answers and blog posts.
What It Looks Like
class UserProfile extends React.Component {
constructor(props) {
super(props);
this.state = { user: null, loading: true };
}
componentDidMount() {
fetch(`/api/users/${this.props.userId}`)
.then(res => res.json())
.then(user => this.setState({ user, loading: false }));
}
render() {
if (this.state.loading) return <Spinner />;
return <div>{this.state.user.name}</div>;
}
}
The problem: This is technically valid React, but it's a pattern from 2018. Class components, componentDidMount, this.setState — all of it has been superseded by hooks for years. If you paste this into a modern React codebase that uses functional components throughout, it's an immediate style clash.
AI generates class components because the training data is full of them. Years of tutorials, answers, and documentation used this pattern. The modern equivalent is shorter and cleaner, but there's less training data for it.
function UserProfile({ userId }: { userId: string }) {
const [user, setUser] = useState<User | null>(null);
const [loading, setLoading] = useState(true);
useEffect(() => {
fetch(`/api/users/${userId}`)
.then(res => res.json())
.then(user => { setUser(user); setLoading(false); });
}, [userId]);
if (loading) return <Spinner />;
return <div>{user?.name}</div>;
}
High-Risk Stale Knowledge Areas
- React — Class components, lifecycle methods, createContext patterns from before hooks
- Node.js — CommonJS `require()` instead of ES modules, callback patterns instead of async/await
- TypeScript — Older type assertion syntax (`<Type>value` vs `value as Type`), missing modern utility types
- CSS — Float-based layouts, vendor prefixes that are no longer needed, pre-Grid patterns
- Python — `format()` strings instead of f-strings, `os.path` instead of `pathlib`
The more popular a framework was in the past, the more likely AI is to generate outdated patterns for it. React, Django, Rails, and Angular are the highest-risk frameworks for stale code because they have years of legacy training data.
4. Subtle Logic Errors
The most dangerous failure mode. The code runs without errors, passes a quick visual review, and produces correct results for common inputs — but fails on edge cases, boundary conditions, or specific data shapes. These bugs survive code review because the logic looks right.
What It Looks Like
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
const start = (page - 1) * pageSize;
const end = start + pageSize;
return items.slice(start, end);
}
Looks perfect. Works for page 1, page 2, page 3. Ship it, right? But what happens with these inputs?
paginate(items, 0, 10); // page 0 → start = -10, slice(-10, 0) silently returns []
paginate(items, -1, 10); // negative page → returns items from the end
paginate(items, 1, 0); // pageSize 0 → returns empty (silent failure)
paginate(items, 1, -5); // negative pageSize → returns empty (silent failure)
The fix: Input validation that AI didn't add because the happy path works fine.
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
if (page < 1) throw new RangeError(`page must be >= 1, got ${page}`);
if (pageSize < 1) throw new RangeError(`pageSize must be >= 1, got ${pageSize}`);
const start = (page - 1) * pageSize;
const end = start + pageSize;
return items.slice(start, end);
}
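A quick sanity check of the guarded version (re-declared here so the snippet is self-contained):

```typescript
function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  if (page < 1) throw new RangeError(`page must be >= 1, got ${page}`);
  if (pageSize < 1) throw new RangeError(`pageSize must be >= 1, got ${pageSize}`);
  return items.slice((page - 1) * pageSize, page * pageSize);
}

const items = Array.from({ length: 25 }, (_, i) => i + 1);

console.log(paginate(items, 3, 10)); // [21, 22, 23, 24, 25] -- partial last page
console.log(paginate(items, 4, 10)); // [] -- past the end: empty is a reasonable answer here

try {
  paginate(items, 0, 10);
} catch (err) {
  console.log((err as Error).message); // "page must be >= 1, got 0" -- loud, not silent
}
```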
Where Subtle Logic Errors Hide
- Off-by-one errors — Pagination, array indexing, date ranges, loop boundaries
- Missing null/undefined checks — AI assumes data is always present
- Incorrect comparisons — Using `==` instead of `===`, comparing strings to numbers
- Race conditions — Async code that works 99% of the time but fails under load
- Timezone bugs — Date handling that works in UTC but breaks in local time
AI optimizes for the common case. It generates code that handles the typical input perfectly and ignores boundaries. For every function AI generates, ask: "What happens with zero? With negative numbers? With null? With an empty array? With a very large input?" The answer is usually "it breaks."
5. Security Blind Spots
AI-generated code has consistent, predictable security weaknesses. Not because AI doesn't "know" about security — it can explain OWASP Top 10 perfectly — but because insecure code is far more common in training data than secure code. The most common patterns in codebases are the least secure ones.
What It Looks Like
// "Build an endpoint that looks up users by email"
app.get('/api/users', (req, res) => {
const { email } = req.query;
const user = db.prepare(`SELECT * FROM users WHERE email = '${email}'`).get();
res.json(user);
});
Three security vulnerabilities in five lines:
- SQL injection — String interpolation in the query. An attacker sends `email=' OR 1=1--` and dumps the entire table.
- Data exposure — `SELECT *` returns every column, including `password_hash`. The API response leaks sensitive fields.
- Missing input validation — No check that `email` is actually provided or is a valid format.
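To make the injection concrete, print the exact string the database receives when an attacker supplies that crafted email value (no database needed to see the damage):

```typescript
// Simulating the attacker-controlled interpolation from the vulnerable endpoint
const email = "' OR 1=1--";
const query = `SELECT * FROM users WHERE email = '${email}'`;

console.log(query);
// SELECT * FROM users WHERE email = '' OR 1=1--'
// The WHERE clause is now always true, and -- comments out the trailing quote.
```

The parameterized version below avoids the problem entirely, because the driver never splices user input into SQL text.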
app.get('/api/users', (req, res) => {
const { email } = req.query;
if (!email || typeof email !== 'string') {
return res.status(400).json({ error: 'Email is required' });
}
const user = db
.prepare('SELECT id, email, name, created_at FROM users WHERE email = ?')
.get(email);
if (!user) return res.status(404).json({ error: 'User not found' });
res.json(user);
});
AI's Consistent Security Failures
- SQL injection — String interpolation instead of parameterized queries. The single most common AI security bug.
- Hardcoded secrets — `const JWT_SECRET = 'supersecret123'` right in the source code
- Returning full database rows — `SELECT *` that includes password hashes, tokens, and internal IDs
- Missing CORS configuration — Either no CORS at all or `origin: '*'`
- Predictable tokens — Using `Math.random()` for session tokens instead of `crypto.randomUUID()`
- Missing rate limiting — Auth endpoints without any throttling
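The predictable-token item deserves a concrete contrast. `crypto.randomUUID()` ships with Node (14.17+) and modern browsers, so the secure option costs nothing extra:

```typescript
import { randomUUID } from 'node:crypto';

// Weak: Math.random() is seeded from non-secret internal state and is predictable
const weakToken = Math.random().toString(36).slice(2);

// Strong: randomUUID() draws from a cryptographically secure RNG
const strongToken = randomUUID();

console.log(weakToken);   // e.g. "g7f3k2x9q" -- short, guessable
console.log(strongToken); // e.g. "3b241101-e2bb-4255-8caf-4136c566a962"
```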
Read every line of AI-generated auth, database, and API code. Not skim — read. Security bugs look like normal code. They compile, they run, they produce correct results. They just also let attackers in.
6. The Copy-Paste Trap
AI generates code by pattern matching against its training data. When it encounters a problem it has seen before, it reproduces the most common solution — even when your context requires a different approach.
What It Looks Like
Add form validation to my React component. The form has email, phone (optional), and message fields.
Here's the validation using Zod and React Hook Form:
import { z } from 'zod';
import { useForm } from 'react-hook-form';
import { zodResolver } from '@hookform/resolvers/zod';
const schema = z.object({
email: z.string().email(),
phone: z.string().min(10).max(15),
message: z.string().min(1).max(1000),
});
// ... 40 more lines of boilerplate
The problem: You asked for validation on a simple three-field form. AI pulled out a full Zod + React Hook Form + zodResolver setup — the most common validation stack in its training data. But you didn't ask for these dependencies. Your project might not use them. And the phone field isn't optional in the schema despite being optional in your spec.
A proportional solution for three fields:
type ContactForm = { email?: string; phone?: string; message?: string };

function validate(data: ContactForm) {
const errors: Record<string, string> = {};
if (!data.email || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(data.email)) {
errors.email = 'Valid email required';
}
if (data.phone && !/^\+?[\d\s-]{7,15}$/.test(data.phone)) {
errors.phone = 'Invalid phone format';
}
if (!data.message?.trim()) {
errors.message = 'Message is required';
}
return errors;
}
Zero dependencies. Under twenty lines. Does the same job for this specific form. The AI solution isn't wrong — it's disproportionate. It's solving the general problem instead of your specific problem.
Pro Tip: Name Your Constraints
Add "No new dependencies" or "Use only what's already in the project" to your prompts. Without constraints, AI defaults to the most popular solution — which usually involves importing the most popular libraries, whether you need them or not.
7. Architecture Astronautics
Ask AI to build a feature, and it will often build a framework for that feature. Ask for a logger, and you get a plugin-based logging system with transports. Ask for a config loader, and you get a multi-source hierarchical configuration engine with validation, caching, and hot-reload.
What It Looks Like
I need a simple function that retries a failed API call up to 3 times with exponential backoff.
interface RetryConfig {
maxRetries: number;
baseDelay: number;
maxDelay: number;
backoffFactor: number;
retryableStatuses: number[];
onRetry?: (attempt: number, error: Error) => void;
shouldRetry?: (error: Error) => boolean;
timeout?: number;
jitter?: boolean;
}
class RetryClient {
private config: Required<RetryConfig>;
constructor(config: Partial<RetryConfig> = {}) {
this.config = {
maxRetries: config.maxRetries ?? 3,
baseDelay: config.baseDelay ?? 1000,
maxDelay: config.maxDelay ?? 30000,
backoffFactor: config.backoffFactor ?? 2,
retryableStatuses: config.retryableStatuses ?? [408, 429, 500, 502, 503, 504],
onRetry: config.onRetry ?? (() => {}),
shouldRetry: config.shouldRetry ?? (() => true),
timeout: config.timeout ?? 30000,
jitter: config.jitter ?? true,
};
}
// ... 80 more lines
}
You asked for a function. AI built a configurable retry client with nine options, a class hierarchy, and a callback system. Here's what you actually needed:
async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
for (let i = 0; i < retries; i++) {
try {
return await fn();
} catch (err) {
if (i === retries - 1) throw err;
await new Promise(r => setTimeout(r, 1000 * Math.pow(2, i)));
}
}
throw new Error('Unreachable');
}
Eight lines. Does exactly what was asked. No configuration object, no class, no callback system.
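Usage is a one-liner. The flaky function below is a stand-in for any promise-returning call; it fails twice, then succeeds (the 1s and 2s backoff delays mean this takes about three seconds to run):

```typescript
async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === retries - 1) throw err;
      // Exponential backoff: 1s, 2s, 4s, ...
      await new Promise(r => setTimeout(r, 1000 * Math.pow(2, i)));
    }
  }
  throw new Error('Unreachable');
}

// A stand-in for a flaky API call: fails on attempts 1 and 2, succeeds on 3
let attempts = 0;
const flaky = async () => {
  attempts++;
  if (attempts < 3) throw new Error(`attempt ${attempts} failed`);
  return 'ok';
};

withRetry(flaky).then(result => {
  console.log(result, attempts); // "ok" 3
});
```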
Why AI Over-Engineers
AI's training data is dominated by library code, framework code, and "production-ready" examples that handle every conceivable edge case. It optimizes for generality because the most upvoted Stack Overflow answers and the most popular GitHub repos are the most general ones. Your specific, constrained problem gets a general, unconstrained solution.
After AI generates code, ask: "Could I explain this to a colleague in under 30 seconds?" If not, it's probably over-engineered. Ask AI to simplify it with an explicit constraint: "Rewrite this as the simplest possible implementation. No configuration objects. No class. Just a function."
8. The Yes-Man Problem
AI is trained to be helpful. Helpful means giving you what you asked for. The problem is that sometimes what you asked for is wrong — and AI will build it anyway, with enthusiasm.
What It Looks Like
I'm going to store user sessions in localStorage and check them on every API request to determine if the user is authenticated.
Great approach! Here's how to implement session management with localStorage:
// Store session on login
localStorage.setItem('session', JSON.stringify({
userId: response.userId,
role: response.role,
isAdmin: response.isAdmin,
token: response.token,
}));
// Check on every API request
function isAuthenticated() {
const session = localStorage.getItem('session');
return session !== null;
}
This is deeply flawed. The "authentication" check is whether a key exists in localStorage — something any user can set manually via browser DevTools. The isAdmin flag is client-side and trivially fakeable. The entire auth model is built on trust that the client won't lie.
But AI said "Great approach!" and built it. Because AI defaults to agreeing with you and implementing what you described, not questioning your design.
How to Force Pushback
You have to explicitly ask for it. AI won't volunteer criticism unless prompted:
Before implementing this, what are the security problems with storing sessions in localStorage and using them for authentication?
Several serious issues:
1. Client-side auth is not auth — Checking localStorage only verifies the client says it's logged in. Anyone can set localStorage.session in DevTools. Real auth must be server-side.
2. XSS vulnerability — If any XSS exists on your site, attackers can read everything in localStorage. Tokens, user data, admin flags.
3. Fakeable roles — Storing isAdmin: true client-side means any user can grant themselves admin by editing localStorage.
4. No expiration — localStorage has no TTL. Tokens persist forever unless you explicitly clear them.
The correct pattern: store only a JWT token in localStorage (or better: httpOnly cookies), validate it server-side on every request, and derive all permissions server-side.
Same AI. Same knowledge. Completely different answer — because you asked for critique instead of implementation.
Before asking AI to build anything architectural, ask it to critique the idea first. "What are the problems with this approach?" always precedes "Build this approach." If you skip the critique, AI becomes a very fast way to implement bad ideas.
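The "validate it server-side" half of that pattern can be sketched with nothing but Node's crypto module. This is an illustration of deriving trust on the server, not a production auth system; the opaque "userId.signature" token format and the `SESSION_SECRET` variable name are assumptions for the example:

```typescript
import { createHmac, timingSafeEqual } from 'node:crypto';

// Server-only secret. In real code this comes from the environment, never source.
const SECRET = process.env.SESSION_SECRET ?? 'dev-only-secret';

function issueToken(userId: string): string {
  const sig = createHmac('sha256', SECRET).update(userId).digest('hex');
  return `${userId}.${sig}`;
}

// The server, not the client, decides whether a token is valid
function verifyToken(token: string): string | null {
  const [userId, sig] = token.split('.');
  if (!userId || !sig) return null;
  const expected = createHmac('sha256', SECRET).update(userId).digest('hex');
  const a = Buffer.from(sig);
  const b = Buffer.from(expected);
  // timingSafeEqual prevents timing attacks; it requires equal-length buffers
  return a.length === b.length && timingSafeEqual(a, b) ? userId : null;
}

console.log(verifyToken(issueToken('user-42')));      // "user-42"
console.log(verifyToken('user-42.forged-signature')); // null
```

A client editing its own localStorage can change what it *claims*, but it cannot produce a valid signature without the server's secret, which is the property the localStorage-only design was missing.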
9. Phantom Context
In long conversations, AI loses track of earlier decisions and starts contradicting itself. It will generate code that conflicts with code it wrote 20 messages ago. It will use a different naming convention. It will forget the database schema. This isn't a bug — it's a fundamental limitation of context windows.
What It Looks Like
[Message 5] Build the user model with id, email, passwordHash, createdAt
Done. User table: id TEXT, email TEXT, passwordHash TEXT, createdAt TEXT
... 40 messages later ...
[Message 47] Build the login endpoint
const user = db.prepare(
'SELECT * FROM users WHERE email = ?'
).get(email);
if (!user || !bcrypt.compareSync(password, user.password_hash)) {
// ...
}
Spot the bug? The schema uses passwordHash (camelCase), but the login endpoint uses user.password_hash (snake_case). AI forgot its own naming convention from 40 messages ago. This produces a runtime error — user.password_hash is undefined, bcrypt comparison fails, and every login attempt is rejected.
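The failure is silent because JavaScript property access on a missing key returns undefined rather than throwing:

```typescript
// The row shape AI defined in message 5 (camelCase)
const user: Record<string, string | undefined> = { passwordHash: 'abc123' };

// The access AI generated in message 47 (snake_case)
console.log(user.password_hash); // undefined -- no error, just a key that isn't there
console.log(user.passwordHash);  // "abc123" -- the value was under the other name all along

// The downstream bcrypt comparison then fails for every login attempt.
```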
How Context Degrades
- Messages 1—10 — AI remembers everything. Consistent names, types, patterns.
- Messages 10—30 — Starts paraphrasing. Key details may shift slightly.
- Messages 30+ — Earlier decisions are effectively forgotten. Contradictions likely.
Pro Tip: The Context Anchor
Every 10-15 messages, re-paste your types file, schema, or a summary of key decisions. This costs you 15 seconds and prevents the class of bugs where AI contradicts its own earlier output. Think of it as garbage collection for AI context — periodic, cheap, essential.
How to Catch Everything
Nine failure modes is a lot to watch for. Here are the practical detection techniques that cover all of them systematically, ranked from fastest to most thorough.
The 30-Second Scan
Before accepting any AI-generated code, scan for these specific patterns:
- Imports you don't recognize → Hallucinated APIs (#2)
- String interpolation in queries → SQL injection (#5)
- Hardcoded strings that look like secrets → Security blind spot (#5)
- `SELECT *` in any query → Data exposure (#5)
- Class components in a hooks codebase → Stale knowledge (#3)
- New dependencies you didn't ask for → Copy-paste trap (#6)
- Configuration objects with 5+ options → Over-engineering (#7)
The Adversarial Review
After accepting code into your project, periodically ask AI to attack it:
Switch roles. You're a hostile code reviewer. Find every bug, security hole, and design flaw in this code. Be ruthless. Don't hold back to be polite.
[paste the code AI just generated]
The same AI that wrote the bugs can find the bugs when you change its role. This works because code generation and code review activate different patterns — the review prompt surfaces criticisms that the generation prompt suppressed.
The Edge Case Challenge
For any function with inputs, run through these mentally or in a prompt:
- What happens with zero?
- What happens with null or undefined?
- What happens with an empty string?
- What happens with a very large input?
- What happens with negative numbers?
- What happens concurrently?
The Test-First Backstop
The most reliable detection method: write or generate tests before (or immediately after) the implementation. AI-generated tests catch AI-generated bugs because the test-writing prompt considers edge cases the implementation prompt ignored.
Write tests for this function. Include: happy path, edge cases, error cases, boundary values, and one test you think will fail.
That last instruction — "one test you think will fail" — is powerful. It forces AI to think about where the code is weakest. The test it writes often does fail, revealing a real bug.
The Checklist
Pin this to your wall. Run through it every time you accept AI-generated code:
Pre-Accept Review

- All unfamiliar imports verified against official docs (#2)
- No string interpolation in queries, no hardcoded secrets, no SELECT * (#5)
- Edge cases considered: zero, null, empty, negative, very large (#4)
- No new dependencies or configuration objects you didn't ask for (#6, #7)
- Matches the project's current conventions, not a framework's past (#3)
- For anything architectural: critique requested before implementation (#8)
- In long sessions: key context re-anchored within the last 10-15 messages (#9)
Trust, But Verify
AI is not your enemy. It's not going to sabotage your code on purpose. But it's also not your safety net. It's a powerful accelerator that happens to be confidently wrong in predictable ways.
Learn the failure modes. Internalize the checklist. Make the 30-second scan automatic. The developers who get the most from AI are not the ones who trust it the most — they're the ones who know exactly where not to trust it.
Nine Failure Modes — Summary
- The Confidence Problem — AI sounds equally certain whether right or wrong. Never use tone as a proxy for correctness.
- Hallucinated APIs — Functions and methods that don't exist. Always check official docs for unfamiliar APIs.
- Stale Knowledge — Outdated patterns from years of training data. Popular frameworks are highest risk.
- Subtle Logic Errors — Code that works for common inputs but fails on edge cases. Always test boundaries.
- Security Blind Spots — SQL injection, hardcoded secrets, data exposure. Read every line of auth code.
- The Copy-Paste Trap — Popular solutions applied to your specific problem. Add constraints to your prompts.
- Architecture Astronautics — Over-engineered solutions for simple problems. Apply the 30-second explanation test.
- The Yes-Man Problem — AI agrees with bad ideas. Always ask for critique before implementation.
- Phantom Context — Contradictions in long conversations. Re-paste key context every 10-15 messages.