Claude Opus 4.8: 1M Context, Adaptive Thinking, Dreaming Agents

TL;DR

Opus 4.8 is the first Anthropic model where 1M context is the default, not a preview. Adaptive thinking cuts wasted reasoning tokens. Dynamic workflows let a single agent spawn and coordinate 10–100 subagents for large tasks. Dreaming is scheduled background memory consolidation — agents that improve between sessions without you writing memory code. Browser agent accuracy hits 84%, which is production territory.

1M Token Context: Default, Not Preview

Previous Opus models offered extended context as an opt-in feature with caveats. Opus 4.8 ships with 1M tokens as the standard context window on the Claude API, Amazon Bedrock, and Vertex AI (200k on Microsoft Foundry). Max output is 128k tokens.

What this changes practically: you can now pass an entire large codebase in a single call without hitting limits or switching to a chunking strategy. A 50,000-line Python project fits comfortably. A full API specification plus all its consumer code fits. This is the context window that makes true "understand the whole codebase" workflows viable.

import anthropic

client = anthropic.Anthropic()

# Read the entire codebase dump (one concatenated text file)
with open("codebase_dump.txt", encoding="utf-8") as f:
    full_codebase = f.read()  # Can be 500k+ tokens

response = client.messages.create(
    model="claude-opus-4-8-20260501",
    max_tokens=8192,
    messages=[{
        "role": "user",
        "content": f"{full_codebase}\n\nFind all places where database connections are not properly closed."
    }]
)
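
Before sending a whole repository in one call, it helps to check that it is likely to fit. The following is a minimal sketch, not part of any SDK: `build_dump`, `estimate_tokens`, and `fits_in_context` are hypothetical helpers, and the 4-characters-per-token ratio is a rough rule of thumb rather than the model's real tokenizer. The sketch also assumes the output reserve counts against the same window.

```python
from pathlib import Path

# Assumption: ~4 characters per token is a rough heuristic for text and code;
# the real ratio depends on the tokenizer and on the content.
CHARS_PER_TOKEN = 4
CONTEXT_LIMIT = 1_000_000  # context window size quoted in this article


def build_dump(root: str, suffixes: tuple[str, ...] = (".py",)) -> str:
    """Concatenate matching files under root, each preceded by a path header."""
    parts = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"=== {path.relative_to(root)} ===\n{text}")
    return "\n\n".join(parts)


def estimate_tokens(text: str) -> int:
    """Very rough token estimate derived from the character count."""
    return len(text) // CHARS_PER_TOKEN


def fits_in_context(text: str, reserve_for_output: int = 8192) -> bool:
    """True if the estimated prompt plus an output reserve stays under the limit."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMIT
```

The path headers let the model cite file names in its answer. For a decision that matters, replace the heuristic with an exact count from the provider's token-counting facility.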

Adaptive Thinking: Reasoning That Knows When to Stop

Previous extended thinking models triggered deep reasoning on every turn, even simple ones. Opus 4.8's adaptive thinking evaluates whether a turn actually needs extended reasoning before spending tokens on it.

The result: at the same effort level, Opus 4.8 uses fewer tokens than Opus 4.7 to solve the same problems. For production workloads with mixed complexity, this compounds into significant cost reduction without changing the output quality on hard tasks.

You can still set an explicit thinking budget if your task always warrants deep reasoning. But for most agentic workflows — where agents mix trivial tool calls with complex planning steps — adaptive mode is the right default.

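# Reuses `client` from the first example
task = "Summarize the open TODOs in this repository."  # placeholder prompt

response = client.messages.create(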
    model="claude-opus-4-8-20260501",
    max_tokens=16000,
    thinking={
        "type": "auto",  # Adaptive — triggers only when needed
        # "type": "enabled", "budget_tokens": 10000  # Force extended thinking
    },
    messages=[{"role": "user", "content": task}]
)
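
The choice between adaptive mode and a fixed budget can be made per request. Below is a small sketch of that decision, assuming the `"auto"` and `"enabled"` values shown in the snippet above. `thinking_config` is a hypothetical helper, and the two validation rules (a minimum budget of 1024 tokens, and a budget smaller than `max_tokens`) are assumptions to confirm against the API reference.

```python
MIN_BUDGET_TOKENS = 1024  # assumption: smallest accepted explicit budget


def thinking_config(always_deep: bool, max_tokens: int, budget_tokens: int = 10_000) -> dict:
    """Build the `thinking` parameter for one request.

    Adaptive mode is the default; an explicit budget is used only for
    tasks known to need deep reasoning on every call.
    """
    if not always_deep:
        return {"type": "auto"}  # value as given in this article
    if budget_tokens < MIN_BUDGET_TOKENS:
        raise ValueError(f"budget_tokens must be at least {MIN_BUDGET_TOKENS}")
    if budget_tokens >= max_tokens:
        raise ValueError("budget_tokens must be smaller than max_tokens")
    return {"type": "enabled", "budget_tokens": budget_tokens}
```

Passing `thinking=thinking_config(False, 16000)` reproduces the adaptive call above, while `thinking_config(True, 16000)` yields the forced variant from the commented-out line.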

Dynamic Workflows: One Agent Spawning Dozens

Dynamic workflows let a single Opus 4.8 agent orchestrate 10–100 subagents in the background for tasks that are too large for a single context. The orchestrating agent breaks down the problem, assigns subtasks to subagents, tracks progress, and assembles the result.

This is the architecture that was previously custom infrastructure work — you had to build the task decomposition, queue, and aggregation yourself. Dynamic workflows surface it as a first-class API feature.

response = client.messages.create(
    model="claude-opus-4-8-20260501",
    max_tokens=4096,
    # Dynamic workflows enabled via beta header
    extra_headers={"anthropic-beta": "dynamic-workflows-2026-05"},
    messages=[{
        "role": "user",
        "content": """
          Audit this entire 200-file codebase for:
          1. Security vulnerabilities (OWASP Top 10)
          2. Performance bottlenecks
          3. Test coverage gaps

          Produce a prioritized report for each category.
        """
    }]
)
# Claude decomposes and orchestrates subagents automatically
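
For comparison, here is roughly what the hand-built version of this pattern looks like: decomposition, a worker pool, and aggregation. It is an illustrative sketch using only the standard library. `decompose`, `run_subtask`, and `orchestrate` are hypothetical names, and the worker is a stub where a real one would call a model.

```python
from concurrent.futures import ThreadPoolExecutor


def decompose(files: list[str], categories: list[str]) -> list[tuple[str, str]]:
    """Break the audit into one subtask per (category, file) pair."""
    return [(category, path) for category in categories for path in files]


def run_subtask(subtask: tuple[str, str]) -> dict:
    """Stub worker: a real one would send the file to a model and parse findings."""
    category, path = subtask
    return {"category": category, "file": path, "findings": []}


def orchestrate(files: list[str], categories: list[str], max_workers: int = 8) -> dict:
    """Run all subtasks concurrently and group the results by category."""
    subtasks = decompose(files, categories)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_subtask, subtasks))  # map preserves input order
    report = {category: [] for category in categories}
    for result in results:
        report[result["category"]].append(result)
    return report
```

Even this toy version leaves out retries, progress tracking, and rate limiting, which is where most of the custom infrastructure effort tends to go.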

Fast mode is available for dynamic workflows at 2× the standard rate, delivering approximately 2.5× the speed — useful when a multi-agent task is on the critical path.
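
The trade-off is easy to put in numbers. The sketch below applies the two multipliers quoted above to a hypothetical job. The $10 and 30-minute inputs are made-up values, and the calculation assumes the job consumes the same number of tokens in both modes.

```python
FAST_PRICE_MULTIPLIER = 2.0  # "2x the standard rate", from the article
FAST_SPEED_MULTIPLIER = 2.5  # "approximately 2.5x the speed", from the article


def fast_mode_estimate(standard_cost: float, standard_minutes: float) -> tuple[float, float]:
    """Return (cost, minutes) for the same job run in fast mode."""
    return (
        standard_cost * FAST_PRICE_MULTIPLIER,
        standard_minutes / FAST_SPEED_MULTIPLIER,
    )


# Hypothetical job: $10 and 30 minutes at the standard rate
cost, minutes = fast_mode_estimate(standard_cost=10.0, standard_minutes=30.0)
print(f"fast mode: ${cost:.2f}, {minutes:.0f} min")  # fast mode: $20.00, 12 min
```

In this example the extra $10 buys back 18 minutes, which is the figure to weigh against whatever else is waiting on the result.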

Dreaming: Agents That Improve Between Sessions

Dreaming is a scheduled background process — not a real-time feature — that runs between agent sessions and consolidates memory. It reviews past sessions, surfaces recurring patterns, pulls out reusable workflows, and updates the agent's memory store with curated context.

The practical effect: an agent you deploy today and use for a week will, by end of week, have distilled your codebase conventions, your team's preferences, and common error patterns into its memory without you explicitly managing that memory yourself.

This is the first time persistent agent improvement has been built into the API layer rather than something you bolt on. For production agents running on a regular schedule — nightly code review, daily dependency audits, weekly test runs — Dreaming gives them a memory that compounds over time.
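
To make the idea of consolidation concrete, here is a toy illustration. It is not Anthropic's implementation and not an API call. `consolidate` is a hypothetical function that keeps only the observations recurring across several sessions, which is one simple reading of "surfaces recurring patterns".

```python
from collections import Counter


def consolidate(sessions: list[list[str]], min_sessions: int = 2) -> list[str]:
    """Keep observations that appear in at least `min_sessions` distinct sessions."""
    seen_in = Counter()
    for notes in sessions:
        for note in set(notes):  # count each note at most once per session
            seen_in[note] += 1
    return sorted(note for note, count in seen_in.items() if count >= min_sessions)


sessions = [
    ["uses pytest", "prefers type hints"],
    ["uses pytest", "flaky CI on Mondays"],
    ["prefers type hints", "uses pytest"],
]
memory = consolidate(sessions)  # one-off observations are dropped
```

Here the one-off note about flaky CI is discarded while the two recurring conventions survive, mirroring the distinction between session noise and durable context.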

Browser Agents Hit 84%: Production Territory

Opus 4.8 scores 84% on Online-Mind2Web, a benchmark for browser agent accuracy on real-world tasks. Separately, Opus 4.8 is roughly four times less likely than Opus 4.7 to let code flaws pass unremarked, a code-review figure that should not be read as part of the browsing score.

Why this matters practically: 84% on a diverse real-world benchmark suggests that a browser-controlling agent (running end-to-end tests, filling forms, scraping live data) is reliable enough to run in production without constant human babysitting. A failure rate of about 16% is low enough to build a retry strategy around rather than a human-in-the-loop strategy.
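
The arithmetic behind a retry strategy is short. If each attempt succeeds with probability 0.84 and attempts are independent, the chance that at least one of n attempts succeeds is 1 - 0.16^n: about 97.4% for two attempts and about 99.6% for three. Independence is a strong assumption, since a page that defeats the agent once may defeat it every time, and a benchmark score is not a per-task production rate. The helpers below are a hypothetical sketch of the calculation and of a plain retry wrapper.

```python
def success_within(attempts: int, per_attempt_success: float = 0.84) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - per_attempt_success) ** attempts


def run_with_retries(task, max_attempts: int = 3):
    """Call task() until it returns without raising, up to max_attempts times."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return task()
        except Exception as error:  # broad on purpose: any failure triggers a retry
            last_error = error
    raise RuntimeError(f"failed after {max_attempts} attempts") from last_error
```

Retries are only safe for tasks that can be repeated without side effects. A form submission that may already have gone through needs a check before the second attempt.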

One caveat on all the figures above: benchmark numbers are a release snapshot, not a standing ranking. They reflect a specific model version, harness, and date, and they shift as newer models and evaluations arrive — treat them as a directional signal and validate against your own workloads.

Opus 4.8 — Migration Checklist

Update model ID — switch from claude-opus-4-7-* to claude-opus-4-8-20260501 in your API calls. No other changes required for basic usage.
Switch thinking to "auto" — if you have "type": "enabled" for all calls, switch to "type": "auto" and let adaptive thinking pick its budget. Monitor token usage for 48h.
Evaluate dynamic workflows — for any task that currently requires custom multi-agent orchestration, test the dynamic workflows beta. It may replace significant infrastructure code.
Enable Dreaming for production agents — any agent running on a schedule longer than a week is a candidate. The memory consolidation is automatic once enabled.
Remove manual context chunking — if you built chunking logic to stay under previous context limits, test removing it. 1M default context makes many of these strategies obsolete.
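
The monitoring step in the checklist can be as simple as logging output tokens per thinking mode and comparing averages. This is a minimal sketch. `UsageLog` is a hypothetical helper, and it assumes that thinking tokens are reported as part of each response's output token count.

```python
from collections import defaultdict
from statistics import mean


class UsageLog:
    """Collect output-token counts per thinking mode and compare averages."""

    def __init__(self) -> None:
        self._output_tokens = defaultdict(list)

    def record(self, mode: str, output_tokens: int) -> None:
        """Store the output-token count of one response under its mode."""
        self._output_tokens[mode].append(output_tokens)

    def average(self, mode: str) -> float:
        """Mean output tokens for a mode; raises if nothing was recorded."""
        return mean(self._output_tokens[mode])
```

In use, each call would be followed by something like `log.record("auto", response.usage.output_tokens)`, with the averages for the two modes compared once the 48-hour window has passed.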
