An Honest Field Guide

When AI Gets It Wrong

AI is confidently wrong with the same tone it uses when it's right. This guide catalogs every major failure mode — with real examples, real fixes, and practical techniques for catching errors before they ship.


Why This Matters

Most AI tutorials show you the happy path. The prompt goes in, the perfect code comes out. That's marketing, not reality. In practice, AI-generated code fails in predictable, categorizable ways — and developers are uniquely vulnerable because the code looks plausible enough to pass a quick scan.

This field guide documents nine failure categories based on real-world AI coding sessions. Each one includes a concrete example of what goes wrong, why it goes wrong, and how to catch it. Print this out. Keep it next to your monitor. It will save you.

The Core Problem

AI generates code at the intersection of everything it has seen. When your problem sits at that intersection, the output is excellent. When your problem doesn't, the output is confidently, plausibly wrong — and the confidence is what makes it dangerous.


1. The Confidence Problem

AI uses the same tone, the same certainty, the same formatting whether it's giving you a correct answer or a completely fabricated one. There is no "I'm not sure about this" signal in the output. The font doesn't change. The indentation doesn't waver. The comments are equally helpful.

What It Looks Like

You

What's the best way to deep clone an object in JavaScript that contains Date objects and Maps?

AI

Use structuredClone() — it handles Date, Map, Set, ArrayBuffer, and most built-in types natively:

const clone = structuredClone(original);

This is the modern standard, supported in all major browsers and Node.js 17+.

This answer is correct. Now watch what happens with a slightly different question:

You

What's the best way to deep clone an object that contains functions and DOM references?

AI

Use structuredClone() — it handles complex nested structures natively:

const clone = structuredClone(original);

This is the modern standard, supported in all major browsers and Node.js 17+.

This answer is wrong. structuredClone() throws a DataCloneError on functions and DOM nodes. But the tone, the confidence, the formatting — identical. There's no signal that the second answer is incorrect.
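You can verify the claim in a few lines of Node.js (17+) or any modern browser console. This sketch runs both cases side by side; cloning the function is what triggers the error:

```javascript
// structuredClone handles Dates and Maps natively...
const ok = structuredClone({ when: new Date(0), tags: new Map([["a", 1]]) });
console.log(ok.tags.get("a"));        // 1
console.log(ok.when instanceof Date); // true

// ...but throws DataCloneError on functions.
let error = null;
try {
  structuredClone({ handler: () => {} });
} catch (err) {
  error = err;
}
console.log(error.name); // "DataCloneError"
```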

Detection Rule

Never use AI's confidence as a proxy for correctness. If you don't already know whether the answer is right, you must verify it independently. The more confident AI sounds about something you're unfamiliar with, the more carefully you should check.


2. Hallucinated APIs

AI invents functions, methods, parameters, and configuration options that do not exist. These aren't typos — they're plausible-looking API calls that were synthesized from patterns in the training data but correspond to nothing in the actual library.

What It Looks Like

// "Use Express's built-in rate limiter"
import { rateLimit } from 'express';

app.use(rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
  standardHeaders: true,
}));

The problem: Express has no built-in rateLimit export. The AI conflated Express with express-rate-limit, a separate npm package. The import compiles without type errors in some setups because TypeScript may not catch named import mismatches from untyped packages until runtime.

// express-rate-limit is a separate package
import rateLimit from 'express-rate-limit';

app.use(rateLimit({
  windowMs: 15 * 60 * 1000,
  max: 100,
  standardHeaders: true,
}));

Common Hallucination Patterns

Pro Tip: The Documentation Check

When AI suggests an API you haven't used before, spend 30 seconds checking the official docs. Not a blog post. Not a tutorial. The actual library documentation or source code. If the method doesn't appear in the docs, it doesn't exist — no matter how plausible it looks.


3. Stale Knowledge

AI's training data has a cutoff date, but the problem is worse than just "old information." AI confidently uses deprecated patterns, removed APIs, and abandoned conventions — because those patterns exist abundantly in the training data from years of Stack Overflow answers and blog posts.

What It Looks Like

class UserProfile extends React.Component {
  constructor(props) {
    super(props);
    this.state = { user: null, loading: true };
  }

  componentDidMount() {
    fetch(`/api/users/${this.props.userId}`)
      .then(res => res.json())
      .then(user => this.setState({ user, loading: false }));
  }

  render() {
    if (this.state.loading) return <Spinner />;
    return <div>{this.state.user.name}</div>;
  }
}

The problem: This is technically valid React, but it's a pattern from 2018. Class components, componentDidMount, this.setState — all of it has been superseded by hooks for years. If you paste this into a modern React codebase that uses functional components throughout, it's an immediate style clash.

AI generates class components because the training data is full of them. Years of tutorials, answers, and documentation used this pattern. The modern equivalent is shorter and cleaner, but there's less training data for it.

function UserProfile({ userId }: { userId: string }) {
  const [user, setUser] = useState<User | null>(null);
  const [loading, setLoading] = useState(true);

  useEffect(() => {
    fetch(`/api/users/${userId}`)
      .then(res => res.json())
      .then(user => { setUser(user); setLoading(false); });
  }, [userId]);

  if (loading) return <Spinner />;
  return <div>{user?.name}</div>;
}

High-Risk Stale Knowledge Areas

The Staleness Rule

The more popular a framework was in the past, the more likely AI is to generate outdated patterns for it. React, Django, Rails, and Angular are the highest-risk frameworks for stale code because they have years of legacy training data.


4. Subtle Logic Errors

The most dangerous failure mode. The code runs without errors, passes a quick visual review, and produces correct results for common inputs — but fails on edge cases, boundary conditions, or specific data shapes. These bugs survive code review because the logic looks right.

What It Looks Like

function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  const start = (page - 1) * pageSize;
  const end = start + pageSize;
  return items.slice(start, end);
}

Looks perfect. Works for page 1, page 2, page 3. Ship it, right? But what happens with these inputs?

paginate(items, 0, 10);    // page 0 → start = -10, returns wrong slice
paginate(items, -1, 10);   // negative page → returns items from the end
paginate(items, 1, 0);     // pageSize 0 → returns empty (silent failure)
paginate(items, 1, -5);    // negative pageSize → returns empty (silent failure)

The fix: Input validation that AI didn't add because the happy path works fine.

function paginate<T>(items: T[], page: number, pageSize: number): T[] {
  if (page < 1) throw new RangeError(`page must be >= 1, got ${page}`);
  if (pageSize < 1) throw new RangeError(`pageSize must be >= 1, got ${pageSize}`);

  const start = (page - 1) * pageSize;
  const end = start + pageSize;
  return items.slice(start, end);
}
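Exercising the validated version (retyped here in plain JavaScript so it runs standalone) confirms that the silent failures become loud ones:

```javascript
// Plain-JS copy of the validated paginate above.
function paginate(items, page, pageSize) {
  if (page < 1) throw new RangeError(`page must be >= 1, got ${page}`);
  if (pageSize < 1) throw new RangeError(`pageSize must be >= 1, got ${pageSize}`);
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

const items = Array.from({ length: 25 }, (_, i) => i + 1);
console.log(paginate(items, 3, 10)); // [21, 22, 23, 24, 25]

// Page 0 used to return a silent wrong slice; now it throws immediately.
let caught = null;
try {
  paginate(items, 0, 10);
} catch (err) {
  caught = err;
}
console.log(caught instanceof RangeError); // true
```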

Where Subtle Logic Errors Hide

The Edge Case Rule

AI optimizes for the common case. It generates code that handles the typical input perfectly and ignores boundaries. For every function AI generates, ask: "What happens with zero? With negative numbers? With null? With an empty array? With a very large input?" The answer is usually "it breaks."


5. Security Blind Spots

AI-generated code has consistent, predictable security weaknesses. Not because AI doesn't "know" about security — it can explain OWASP Top 10 perfectly — but because insecure code is far more common in training data than secure code. The most common patterns in codebases are the least secure ones.

What It Looks Like

// "Build an endpoint that looks up users by email"
app.get('/api/users', (req, res) => {
  const { email } = req.query;
  const user = db.prepare(`SELECT * FROM users WHERE email = '${email}'`).get();
  res.json(user);
});
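To see why the interpolated query is exploitable, trace what the SQL string becomes when the attacker controls email. No database is needed; the string itself tells the story:

```javascript
// The attacker-supplied value rewrites the query's logic, not just its data.
const email = "x' OR '1'='1";
const query = `SELECT * FROM users WHERE email = '${email}'`;

console.log(query);
// SELECT * FROM users WHERE email = 'x' OR '1'='1'
// The WHERE clause is now always true: every row in the table matches.
```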

Three security vulnerabilities in five lines: user input interpolated directly into the SQL string (injection), SELECT * returning every column including the password hash, and no validation that email is even a string. The fixed version addresses all three:

app.get('/api/users', (req, res) => {
  const { email } = req.query;

  if (!email || typeof email !== 'string') {
    return res.status(400).json({ error: 'Email is required' });
  }

  const user = db
    .prepare('SELECT id, email, name, created_at FROM users WHERE email = ?')
    .get(email);

  if (!user) return res.status(404).json({ error: 'User not found' });

  res.json(user);
});

AI's Consistent Security Failures

The Security Rule

Read every line of AI-generated auth, database, and API code. Not skim — read. Security bugs look like normal code. They compile, they run, they produce correct results. They just also let attackers in.


6. The Copy-Paste Trap

AI generates code by pattern matching against its training data. When it encounters a problem it has seen before, it reproduces the most common solution — even when your context requires a different approach.

What It Looks Like

You

Add form validation to my React component. The form has email, phone (optional), and message fields.

AI

Here's the validation using Zod and React Hook Form:

import { z } from 'zod';
import { useForm } from 'react-hook-form';
import { zodResolver } from '@hookform/resolvers/zod';

const schema = z.object({
  email: z.string().email(),
  phone: z.string().min(10).max(15),
  message: z.string().min(1).max(1000),
});

// ... 40 more lines of boilerplate

The problem: You asked for validation on a simple three-field form. AI pulled out a full Zod + React Hook Form + zodResolver setup — the most common validation stack in its training data. But you didn't ask for these dependencies. Your project might not use them. And the phone field isn't optional in the schema despite being optional in your spec.

A proportional solution for three fields:

function validate(data: FormData) {
  const errors: Record<string, string> = {};

  if (!data.email || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(data.email)) {
    errors.email = 'Valid email required';
  }

  if (data.phone && !/^\+?[\d\s-]{7,15}$/.test(data.phone)) {
    errors.phone = 'Invalid phone format';
  }

  if (!data.message?.trim()) {
    errors.message = 'Message is required';
  }

  return errors;
}

Zero dependencies. Twenty lines. Does the same job for this specific form. The AI solution isn't wrong — it's disproportionate. It's solving the general problem instead of your specific problem.
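Exercising the proportional validator (retyped as plain JavaScript, taking a plain object rather than a FormData instance) shows it covering both the happy path and the failure cases:

```javascript
// Plain-object version of the dependency-free validator above.
function validate(data) {
  const errors = {};
  if (!data.email || !/^[^\s@]+@[^\s@]+\.[^\s@]+$/.test(data.email)) {
    errors.email = "Valid email required";
  }
  if (data.phone && !/^\+?[\d\s-]{7,15}$/.test(data.phone)) {
    errors.phone = "Invalid phone format";
  }
  if (!data.message || !data.message.trim()) {
    errors.message = "Message is required";
  }
  return errors;
}

console.log(validate({ email: "dev@example.com", message: "hi" })); // {}
console.log(validate({ email: "nope", phone: "abc", message: "  " }));
// { email: 'Valid email required', phone: 'Invalid phone format', message: 'Message is required' }
```

Note that the optional phone field is only checked when present, which is exactly the requirement the Zod schema got wrong.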

Pro Tip: Name Your Constraints

Add "No new dependencies" or "Use only what's already in the project" to your prompts. Without constraints, AI defaults to the most popular solution — which usually involves importing the most popular libraries, whether you need them or not.


7. Architecture Astronautics

Ask AI to build a feature, and it will often build a framework for that feature. Ask for a logger, and you get a plugin-based logging system with transports. Ask for a config loader, and you get a multi-source hierarchical configuration engine with validation, caching, and hot-reload.

What It Looks Like

You

I need a simple function that retries a failed API call up to 3 times with exponential backoff.

AI
interface RetryConfig {
  maxRetries: number;
  baseDelay: number;
  maxDelay: number;
  backoffFactor: number;
  retryableStatuses: number[];
  onRetry?: (attempt: number, error: Error) => void;
  shouldRetry?: (error: Error) => boolean;
  timeout?: number;
  jitter?: boolean;
}

class RetryClient {
  private config: Required<RetryConfig>;

  constructor(config: Partial<RetryConfig> = {}) {
    this.config = {
      maxRetries: config.maxRetries ?? 3,
      baseDelay: config.baseDelay ?? 1000,
      maxDelay: config.maxDelay ?? 30000,
      backoffFactor: config.backoffFactor ?? 2,
      retryableStatuses: config.retryableStatuses ?? [408, 429, 500, 502, 503, 504],
      onRetry: config.onRetry ?? (() => {}),
      shouldRetry: config.shouldRetry ?? (() => true),
      timeout: config.timeout ?? 30000,
      jitter: config.jitter ?? true,
    };
  }

  // ... 80 more lines
}

You asked for a function. AI built a configurable retry client with nine options, a class hierarchy, and a callback system. Here's what you actually needed:

async function withRetry<T>(fn: () => Promise<T>, retries = 3): Promise<T> {
  for (let i = 0; i < retries; i++) {
    try {
      return await fn();
    } catch (err) {
      if (i === retries - 1) throw err;
      await new Promise(r => setTimeout(r, 1000 * Math.pow(2, i)));
    }
  }
  throw new Error('Unreachable');
}

Eleven lines. Does exactly what was asked. No configuration object, no class, no callback system.
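The backoff schedule hiding in that one setTimeout line is easy to verify by hand: the delay after failed attempt i is 1000 * 2 ** i milliseconds. With retries = 3, only the first two delays are ever used, because the third failure rethrows:

```javascript
// Delay scheduled after failed attempt i in the compact version above.
const delayAfter = i => 1000 * 2 ** i;

console.log([0, 1, 2].map(delayAfter)); // [ 1000, 2000, 4000 ]
```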

Why AI Over-Engineers

AI's training data is dominated by library code, framework code, and "production-ready" examples that handle every conceivable edge case. It optimizes for generality because the most upvoted Stack Overflow answers and the most popular GitHub repos are the most general ones. Your specific, constrained problem gets a general, unconstrained solution.

The Simplicity Test

After AI generates code, ask: "Could I explain this to a colleague in under 30 seconds?" If not, it's probably over-engineered. Ask AI to simplify it with an explicit constraint: "Rewrite this as the simplest possible implementation. No configuration objects. No class. Just a function."


8. The Yes-Man Problem

AI is trained to be helpful. Helpful means giving you what you asked for. The problem is that sometimes what you asked for is wrong — and AI will build it anyway, with enthusiasm.

What It Looks Like

You

I'm going to store user sessions in localStorage and check them on every API request to determine if the user is authenticated.

AI

Great approach! Here's how to implement session management with localStorage:

// Store session on login
localStorage.setItem('session', JSON.stringify({
  userId: response.userId,
  role: response.role,
  isAdmin: response.isAdmin,
  token: response.token,
}));

// Check on every API request
function isAuthenticated() {
  const session = localStorage.getItem('session');
  return session !== null;
}

This is deeply flawed. The "authentication" check is whether a key exists in localStorage — something any user can set manually via browser DevTools. The isAdmin flag is client-side and trivially fakeable. The entire auth model is built on trust that the client won't lie.

But AI said "Great approach!" and built it. Because AI defaults to agreeing with you and implementing what you described, not questioning your design.

How to Force Pushback

You have to explicitly ask for it. AI won't volunteer criticism unless prompted:

You

Before implementing this, what are the security problems with storing sessions in localStorage and using them for authentication?

AI

Several serious issues:

1. Client-side auth is not auth — Checking localStorage only verifies the client says it's logged in. Anyone can set localStorage.session in DevTools. Real auth must be server-side.

2. XSS vulnerability — If any XSS exists on your site, attackers can read everything in localStorage. Tokens, user data, admin flags.

3. Fakeable roles — Storing isAdmin: true client-side means any user can grant themselves admin by editing localStorage.

4. No expiration — localStorage has no TTL. Tokens persist forever unless you explicitly clear them.

The correct pattern: store only a JWT token in localStorage (or better: httpOnly cookies), validate it server-side on every request, and derive all permissions server-side.

Same AI. Same knowledge. Completely different answer — because you asked for critique instead of implementation.

The Critique-First Rule

Before asking AI to build anything architectural, ask it to critique the idea first. "What are the problems with this approach?" always precedes "Build this approach." If you skip the critique, AI becomes a very fast way to implement bad ideas.


9. Phantom Context

In long conversations, AI loses track of earlier decisions and starts contradicting itself. It will generate code that conflicts with code it wrote 20 messages ago. It will use a different naming convention. It will forget the database schema. This isn't a bug — it's a fundamental limitation of context windows.

What It Looks Like

You

[Message 5] Build the user model with id, email, passwordHash, createdAt

AI

Done. User table: id TEXT, email TEXT, passwordHash TEXT, createdAt TEXT

... 40 messages later ...

You

[Message 47] Build the login endpoint

AI
const user = db.prepare(
  'SELECT * FROM users WHERE email = ?'
).get(email);

if (!user || !bcrypt.compareSync(password, user.password_hash)) {
  // ...
}

Spot the bug? The schema uses passwordHash (camelCase), but the login endpoint uses user.password_hash (snake_case). AI forgot its own naming convention from 40 messages ago. This produces a runtime error — user.password_hash is undefined, bcrypt comparison fails, and every login attempt is rejected.
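The failure is silent at the language level, which is why it survives until runtime (the hash value here is a placeholder):

```javascript
// Reading a misspelled property is not an error in JavaScript; it's undefined.
const user = { id: 1, email: "a@example.com", passwordHash: "$2b$10$..." };

console.log(user.password_hash);      // undefined: the snake_case key never existed
console.log("password_hash" in user); // false
console.log(Object.keys(user));       // [ 'id', 'email', 'passwordHash' ]
```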

How Context Degrades

Pro Tip: The Context Anchor

Every 10-15 messages, re-paste your types file, schema, or a summary of key decisions. This costs you 15 seconds and prevents the class of bugs where AI contradicts its own earlier output. Think of it as garbage collection for AI context — periodic, cheap, essential.


How to Catch Everything

Nine failure modes is a lot to watch for. Here are the practical detection techniques that cover all of them systematically, ranked from fastest to most thorough.

The 30-Second Scan

Before accepting any AI-generated code, scan for these specific patterns:

Imports you haven't verified against the package's real exports
String interpolation inside SQL queries
SELECT * where explicit columns belong
Hardcoded secrets instead of environment variables
No validation of user-supplied inputs
Dependencies or abstractions the task never asked for

The Adversarial Review

After accepting code into your project, periodically ask AI to attack it:

You

Switch roles. You're a hostile code reviewer. Find every bug, security hole, and design flaw in this code. Be ruthless. Don't hold back to be polite.

[paste the code AI just generated]

The same AI that wrote the bugs can find the bugs when you change its role. This works because code generation and code review activate different patterns — the review prompt surfaces criticisms that the generation prompt suppressed.

The Edge Case Challenge

For any function with inputs, run through these mentally or in a prompt: zero, negative numbers, null and undefined, empty arrays and strings, and very large inputs.

The Test-First Backstop

The most reliable detection method: write or generate tests before (or immediately after) the implementation. AI-generated tests catch AI-generated bugs because the test-writing prompt considers edge cases the implementation prompt ignored.

You

Write tests for this function. Include: happy path, edge cases, error cases, boundary values, and one test you think will fail.

That last instruction — "one test you think will fail" — is powerful. It forces AI to think about where the code is weakest. The test it writes often does fail, revealing a real bug.


The Checklist

Pin this to your wall. Run through it every time you accept AI-generated code:

Pre-Accept Review

Do I understand what every line does? (If no → don't accept) [CRITICAL]
Are all imports from real packages I've verified? [HIGH]
Are database queries parameterized (no string interpolation)? [CRITICAL]
Are secrets loaded from environment variables? [CRITICAL]
Does SELECT specify columns (not SELECT *)? [HIGH]
Are patterns consistent with the existing codebase? [HIGH]
Is the solution proportional to the problem? [MEDIUM]
Did I ask for critique before implementation? [MEDIUM]
What happens with zero, null, empty, and negative inputs? [HIGH]
If this is auth code — have I read every line? [CRITICAL]

Trust, But Verify

AI is not your enemy. It's not going to sabotage your code on purpose. But it's also not your safety net. It's a powerful accelerator that happens to be confidently wrong in predictable ways.

Learn the failure modes. Internalize the checklist. Make the 30-second scan automatic. The developers who get the most from AI are not the ones who trust it the most — they're the ones who know exactly where not to trust it.


Nine Failure Modes — Summary

1. The Confidence Problem: wrong answers sound exactly like right ones.
2. Hallucinated APIs: plausible functions and options that don't exist.
3. Stale Knowledge: deprecated patterns from years of legacy training data.
4. Subtle Logic Errors: the happy path works, the boundaries break.
5. Security Blind Spots: injection, leaked columns, missing validation.
6. The Copy-Paste Trap: the most popular solution, not your solution.
7. Architecture Astronautics: a framework when you asked for a function.
8. The Yes-Man Problem: enthusiastic implementation of flawed designs.
9. Phantom Context: code that contradicts its own earlier decisions.
