Massive context windows are a safety net, not a primary tool. Sending irrelevant code to an LLM dilutes its attention, increases the odds that it copies deprecated patterns, and drives up API costs. The most effective developers curate context surgically: they give the AI exactly what it needs, and nothing more.
The Million-Token Illusion
In 2026, frontier LLMs routinely accept a million tokens or more — large enough to swallow huge slices of a repository. Tools like Cursor, Copilot, and Claude Code make it frictionless to pull in broad workspace context with one command.
It feels like magic. Why spend time identifying the five relevant files when the model can read all 5,000?
Because context dumping is fundamentally flawed. While some models can accept extremely large prompts, reliability still depends on signal quality. The larger the haystack, the more deliberate you must be about where the needle sits. In software work, that shows up in subtle and expensive ways.
The Three Costs of Context Dumping
1. Attention Dilution and "Lost in the Middle"
When you send 50 files to an LLM to fix a bug in one React component, the model has to weigh the relevance of every single line. Long-context evaluations have shown a "lost in the middle" effect: models overweight the beginning and end of their context while struggling to use facts buried in the middle. Follow-up research through 2025–2026 indicates the effect persists even on million-token models. It appears to be tied to how attention and positional encodings treat distant tokens, so simply widening the context window does not eliminate it.
If the crucial interface definition is file #25 out of 50, the AI may invent a structure that looks plausible based on the surrounding noise. The result is familiar: type errors, broken builds, and a fix that seems confident until you run it.
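One way to act on this is sketched below. If several files must go into the prompt, put the file that defines the key contract first and the file being edited last, so neither lands in the middle. The function name and file names are hypothetical, and whether ordering helps with a particular model is an assumption to verify empirically.

```python
def order_for_position_bias(file_names, contract, target):
    """Return file names with the contract first and the edit target last.

    Assumes `contract` and `target` are both present in `file_names`.
    Everything else goes in the middle, sorted for a stable order.
    """
    middle = sorted(n for n in file_names if n not in (contract, target))
    if contract == target:
        # One file plays both roles; keep it at the end, next to the request.
        return [*middle, target]
    return [contract, *middle, target]


# Hypothetical file names for illustration.
ordered = order_for_position_bias(
    ["utils.ts", "types.ts", "Button.tsx", "api.ts"],
    contract="types.ts",
    target="Button.tsx",
)
print(ordered)
```

Ordering is a mitigation, not a substitute for sending fewer files: with five files there is very little "middle" to get lost in.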
2. Pattern Pollution from Legacy Code
Your codebase is not uniform. It contains the clean module you wrote yesterday and the deeply coupled legacy system from three years ago. When you use global context, the AI sees everything.
If you ask it to generate a new data-fetching hook, and it sees 40 examples of deprecated Redux boilerplate beside 2 examples of your newer React Query setup, which pattern do you think it will confidently reproduce? Context dumping can fight your attempts to modernize a codebase by anchoring the model to technical debt.
3. The Literal Financial Cost
Even with lower API prices, large prompts cost money and time. Processing a 500k-token prompt takes much longer than a 5k-token prompt — delays that compound quickly when you are iterating.
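A back-of-the-envelope sketch puts rough numbers on this. The price of $3 per million input tokens and the 20 iterations are illustrative assumptions, not any provider's quoted rate, so substitute your own figures.

```python
# Illustrative only: an assumed rate, not a quoted price from any provider.
PRICE_PER_MILLION_USD = 3.00


def input_cost_usd(prompt_tokens: int, turns: int = 1) -> float:
    """Input-token cost of sending a prompt of this size for `turns` turns."""
    return prompt_tokens / 1_000_000 * PRICE_PER_MILLION_USD * turns


dumped = input_cost_usd(500_000, turns=20)  # whole-workspace prompt
curated = input_cost_usd(5_000, turns=20)   # a handful of relevant files

print(f"dumped:  ${dumped:.2f}")
print(f"curated: ${curated:.2f}")
print(f"ratio:   {dumped / curated:.0f}x")
```

The ratio between the two depends only on prompt size, so it holds regardless of the rate you plug in. Output tokens and latency are not modeled here.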
Prompt caching, now standard across major providers, changes the cost story for stable, repeated context: Anthropic advertises up to 90% off cached reads, and OpenAI 50% or more on cached prefixes depending on the model. It does not help much with context dumping. Caches match on an exact prompt prefix, so if you send a different set of files each turn, everything after the first change misses the cache and is billed at the full input price. Caching rewards structured, deliberate context; dumping undermines it.
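The caching argument can be sketched as simple arithmetic. The base price and the 90% discount below are assumptions chosen for illustration. The model also ignores cache-write surcharges and cache expiry, which real providers may apply.

```python
def cost_per_turn_usd(
    total_tokens: int,
    cached_tokens: int,
    price_per_million: float = 3.00,  # assumed base input price
    cache_discount: float = 0.90,     # assumed discount on cache hits
) -> float:
    """Input cost for one turn when `cached_tokens` of the prompt hit the cache."""
    uncached_tokens = total_tokens - cached_tokens
    cached_cost = cached_tokens * price_per_million * (1 - cache_discount)
    uncached_cost = uncached_tokens * price_per_million
    return (cached_cost + uncached_cost) / 1_000_000


# Stable, curated prefix: 90k of 100k tokens are identical every turn.
stable = cost_per_turn_usd(100_000, cached_tokens=90_000)

# Dumped context that changes each turn: nothing matches the cached prefix.
churning = cost_per_turn_usd(100_000, cached_tokens=0)

print(f"stable prefix:  ${stable:.3f} per turn")
print(f"churning files: ${churning:.3f} per turn")
```

Same prompt size, very different bill: the discount only applies to the portion of the prompt that stays byte-for-byte identical at the front.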
The Art of Context Curation
The alternative to context dumping is context curation: acting as an editorial director for your AI assistant and defining the exact boundaries of the problem space.
- Targeted Files: Include only the specific files being modified and their immediate dependencies, such as the component, its CSS module, and the shared types file.
- The "One Good Example" Rule: Instead of letting the AI hunt for a pattern, provide the single best modern example in your codebase: "Follow the pattern used in AuthService.ts."
- Exclude the Noise: Leave out package-lock.json, generated files, massive test fixtures, compiled assets, and old migration snapshots unless they are truly relevant.
- Provide Intent, Not Just Code: The best context is often plain English. Explain why the code is structured a certain way rather than making the AI infer intent from implementation details.
- Lock Scope Per Turn: End each prompt with explicit boundaries: "Do not change files outside X, Y, and Z." This reduces accidental broad edits.
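The curation rules above can be sketched as a small prompt builder. Everything here is hypothetical: the exclusion patterns, the function names, and the file paths are assumptions to adapt to your own repository, not part of any tool's API.

```python
from fnmatch import fnmatch

# Hypothetical exclusion patterns; adjust to your repository layout.
NOISE_PATTERNS = [
    "package-lock.json",
    "*.generated.*",
    "dist/*",
    "*/fixtures/*",
    "migrations/snapshots/*",
]


def is_noise(path: str) -> bool:
    """True if the path matches any exclusion pattern."""
    return any(fnmatch(path, pattern) for pattern in NOISE_PATTERNS)


def build_prompt(task: str, intent: str, files: list[str], example: str) -> str:
    """Assemble a prompt with targeted files, one example, intent, and a scope lock."""
    kept = [f for f in files if not is_noise(f)]
    lines = [
        task,
        f"Intent: {intent}",
        "Files in scope:",
        *[f"- {f}" for f in kept],
        f"Follow the pattern used in {example}.",
        "Do not change files outside: " + ", ".join(kept) + ".",
    ]
    return "\n".join(lines)


prompt = build_prompt(
    task="Fix the hover state on the primary button.",
    intent="Styles live in CSS modules so components stay themeable.",
    files=[
        "src/Button.tsx",
        "src/Button.module.css",
        "src/types.ts",
        "package-lock.json",
        "dist/bundle.js",
    ],
    example="AuthService.ts",
)
print(prompt)
```

The file contents themselves would be attached separately. The point of the sketch is that the scope lock is generated from the same curated list, so the two cannot drift apart.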
A 30-Second Context Filter
Before you send a prompt, run this quick filter: Which files will change? Which file defines the key contract or type? Which single modern example should the model imitate? If you cannot answer all three, do not send yet.
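If it helps to make the filter mechanical, the three questions can be written as a small checklist. This is a minimal sketch, and the class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ContextPlan:
    """Answers to the three pre-send questions."""
    files_to_change: List[str] = field(default_factory=list)
    contract_file: Optional[str] = None
    example_file: Optional[str] = None

    def missing(self) -> List[str]:
        """Return a description of each question that is still unanswered."""
        gaps = []
        if not self.files_to_change:
            gaps.append("files that will change")
        if not self.contract_file:
            gaps.append("file defining the key contract or type")
        if not self.example_file:
            gaps.append("modern example to imitate")
        return gaps

    def ready_to_send(self) -> bool:
        """True only when all three questions have an answer."""
        return not self.missing()


plan = ContextPlan(
    files_to_change=["src/Button.tsx"],
    contract_file="src/types.ts",
)
print(plan.ready_to_send(), plan.missing())
```

Here the plan is not ready because no example file has been chosen yet, which is exactly the case where the model would otherwise go hunting for a pattern on its own.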
When to Use Global Context
Does this mean you should never use global context features? No. Global context is powerful for discovery and analysis, rather than synthesis.
Use @codebase when you are asking:
- "Where in the codebase do we handle Stripe webhooks?"
- "What is the impact of changing the User interface on our database migrations?"
- "Find all instances where we are using the deprecated LegacyButton component."
Once you have your answers, switch back to curated context for the actual implementation.
Conclusion
As AI models become more capable, the differentiator will not be raw generation speed. It will be how well you direct model attention. Stop throwing the haystack at the model. Hand it the needle.
Google Gemini's long-context documentation describes 1M+ token context windows and the latency tradeoffs that come with very large prompts. The paper Lost in the Middle (Liu et al., 2023) remains a foundational reference — and follow-up research through 2025–2026 confirms the position bias it identified persists on current frontier models, even those with million-token windows. Widening the window does not neutralise the effect.