Tools · LLMs · Tutorials

Claude Managed Agents Dreaming Explained (2026)

May 7, 2026 · 13 min read

Your Claude agent finished a 4-hour legal document review session. It made three mistakes, developed a workaround for a tricky filetype bug, and learned that you prefer bullet summaries over prose. Then the session ended — and every single one of those lessons vanished.

That was the state of AI agents until May 6, 2026. At its Code with Claude developer event in San Francisco, Anthropic launched dreaming — a scheduled background process that reviews past agent sessions, extracts patterns, and curates memory stores so agents improve between runs. Alongside it: outcomes (public beta), multiagent orchestration for up to 20 parallel specialists (public beta), and webhooks (also public beta). Together, these are the biggest infrastructure upgrades Claude Managed Agents has shipped since launch.

Here's exactly what dreaming does, how it differs from memory, how outcomes and multiagent orchestration work, and what this means for developers building production agents.

What Is Claude Dreaming?

Claude dreaming is a scheduled process that runs between agent sessions, reviewing past conversation transcripts, identifying recurring patterns, and curating the agent's memory stores — without touching the original session data.

Think of it this way: while you sleep, your brain consolidates the day's experiences into long-term memory, discards noise, and surfaces what actually mattered. Claude dreaming does the same thing for AI agents. It looks across multiple sessions, finds what the agent consistently got wrong, what workflows it converged on, and what preferences were shared across a team — then writes structured updates to memory.

Developers control how much autonomy dreaming gets. You can set it to update memory automatically after each session, or you can require human review before any changes land. Either way, dreaming never modifies the original session transcripts — it only updates memory stores.
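As a hedged sketch of what that control might look like in the agent configuration (every field name below is an illustrative assumption, not the confirmed schema; check the Managed Agents docs for the real format):

```yaml
# Hypothetical dreaming configuration. Field names are illustrative
# assumptions, not the confirmed Managed Agents schema.
agent:
  id: support-triage-agent
  model: claude-sonnet-4-6          # assumed model ID format

dreaming:
  schedule: "0 3 * * *"             # cron-style: nightly, between sessions
  autonomy: review-required         # or "automatic" to apply updates unreviewed
  lookback: 30d                     # how far back the transcript review reaches
  memory_store: team-support-notes  # the store dreaming is allowed to curate
```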

This is especially powerful for long-running agentic workflows where agents handle the same category of task repeatedly. A customer support agent, a document review pipeline, a code review bot — these are exactly the use cases where an agent that gets smarter every session is worth dramatically more than one that resets.

Dreaming is currently in research preview. You need to request access via the Claude Platform to use it.

Dreaming vs Memory: What Is Actually Different?

Memory and dreaming solve related but distinct problems. Memory is what happens during a session — the agent captures context, writes notes, stores learnings in real time as it works. Dreaming is what happens after the session ends.

Here is the most important thing to understand: dreaming surfaces patterns that a single agent running a single session cannot see. A customer support agent in session 47 does not know it has made the same classification error 12 times over the past month. Dreaming does. It reads across all those sessions, detects the pattern, and writes a targeted memory update: "when the customer mentions X, do Y."

If you've already built on Claude Managed Agents Memory, dreaming is the layer that makes that memory progressively more accurate over time. Memory is the storage system. Dreaming is the curation engine. You need both for a genuinely self-improving agent.

The practical difference matters most in multiagent environments. When 20 subagents are all working in the same domain, dreaming can aggregate what they collectively learned and publish shared insights to a team-wide memory store — something no individual agent session could produce on its own.

How Outcomes Work: Rubric-Driven Self-Correction

Outcomes is the other major capability that moved to public beta alongside multiagent orchestration on May 6. The concept is straightforward: you write a rubric describing what success looks like, and the agent works toward it.

What makes outcomes different from just writing a better prompt is the grader architecture. A separate Claude instance evaluates the agent's output against your rubric in its own context window — meaning it is not influenced by the agent's reasoning or the trajectory of how the output was produced. If the output fails the rubric, the grader identifies exactly what needs to change and the agent takes another pass. This loop continues until the output meets the bar.
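As a sketch of what such a rubric might look like for a document workflow (the structure below is an assumption for illustration; the actual outcomes rubric format is in the platform docs):

```yaml
# Hypothetical outcomes rubric. The structure is illustrative, not the
# confirmed format from the Managed Agents docs.
outcome:
  description: Produce a client-ready contract summary
  criteria:
    - Every obligation in the source contract appears in the summary
    - The summary uses bullet points, not prose paragraphs
    - No speculative language or legal advice
  grader:
    model: claude-sonnet-4-6   # separate instance, isolated context window
    max_iterations: 5          # loop until the rubric passes or the cap hits
```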

In Anthropic's internal benchmarks, outcomes improved task success rates by up to 10 percentage points over a standard prompting loop, with the largest gains on the hardest tasks. File generation specifically saw +8.4 points on .docx outputs and +10.1 points on .pptx — which matters enormously for enterprise document workflows.

Outcomes works for subjective quality too. Spiral by Every uses it to enforce their editorial voice: each AI-generated draft is scored against a rubric of their editorial principles and the user's writing style pulled from memory. Only drafts that clear that bar are returned. For a deeper look at how multi-agent Claude Code review systems use similar evaluation architectures, the Claude Code review guide linked under Recommended Blogs covers the five-agent parallel evaluation pattern.

Hot take: outcomes is the feature that should finally put to rest the idea that you need to iterate on prompts manually to improve agent output quality. Define the rubric once. Let the agent iterate. That's the right division of labor.

Multiagent Orchestration: Up to 20 Specialists in Parallel

Multiagent orchestration is the third major capability now in public beta. The architecture is a coordinator-subagent model: a lead agent decomposes a complex task, delegates pieces to up to 20 specialist subagents running in parallel, and synthesizes their outputs.

Each subagent runs in its own isolated session thread with its own context window and conversation history. They share a common filesystem, which means a security agent and a documentation agent can both read and write to the same codebase without stepping on each other. The coordinator can send follow-up messages to any subagent mid-workflow — and that subagent retains everything from its previous turns, so context is not lost between exchanges.

The full trace is visible in the Claude Console: which agent did what, in what order, and why. That level of observability is what separates a production multiagent system from an experimental one.

The YAML configuration is concise. You declare the coordinator model, set the multiagent.agents property with a list of up to 20 subagent IDs, and the coordinator decides at runtime when to delegate and to whom. If you want to run the patterns yourself, the LangGraph multi-agent swarm cookbook in the gen-ai-experiments repository covers the equivalent orchestration architecture — useful context before migrating to the native Managed Agents API.
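As for the Managed Agents config itself, here is a minimal sketch. Only the multiagent.agents property is named in the announcement; the surrounding keys are assumptions for illustration:

```yaml
# Minimal multiagent sketch. Only multiagent.agents is named in the
# announcement; the other keys are illustrative assumptions.
coordinator:
  model: claude-haiku-4-5   # hypothetical ID; Spiral pairs a Haiku coordinator
                            # with Opus subagents (see the case study below)
multiagent:
  agents:                   # up to 20 unique subagent IDs
    - security-reviewer
    - docs-writer
    - test-generator
  # Depth is capped at one level: these subagents cannot declare their own
  # multiagent.agents block, as the next paragraph explains.
```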

One architecture detail worth highlighting: the coordinator can only delegate to one level of subagents. Depth greater than 1 is ignored. This is a deliberate constraint that keeps the system predictable and traceable. If your workflow genuinely requires hierarchical sub-orchestrators, you'll need to design around it.

Real-World Results from Harvey, Netflix, Wisedocs, and Spiral

Anthropic shared four production case studies at the Code with Claude event. These are worth examining closely because they make the abstract capabilities concrete.

Harvey (Legal AI)

Harvey uses Managed Agents to coordinate complex legal work including long-form drafting and document creation. With dreaming enabled, their agents remember filetype workarounds and tool-specific patterns between sessions. Completion rates went up approximately 6x in their internal tests — not from a model change, but purely from the agents carrying institutional knowledge across sessions.

Netflix

Netflix's platform team built an analysis agent that processes logs from hundreds of builds across different sources. Their problem was signal-to-noise: with changes affecting thousands of applications, what matters is the patterns that recur across many builds, not individual failures. Multiagent orchestration lets the agent analyze batches in parallel and surface only the recurring patterns worth acting on.

Wisedocs

Wisedocs built a document quality check agent using outcomes to grade each review against their internal guidelines. Reviews now run 50% faster while remaining aligned with team standards. This is the clearest demonstration that outcomes is not just about accuracy — it is also about throughput. For more on how Claude agents compare to frameworks like LangGraph and CrewAI for this kind of enterprise workflow, the frameworks comparison guide under Recommended Blogs covers the decision framework.

Spiral by Every

Spiral uses Haiku as coordinator and Opus as the writing subagents for parallel draft generation. When a user requests multiple drafts, subagents run in parallel. Each draft is then scored by the outcomes grader against a rubric of Every's editorial principles and the user's writing voice — both pulled from memory. Only drafts that clear the rubric are returned to the user.

Who Should Use Claude Dreaming Right Now?

Not every agent workload benefits equally. Here is an honest breakdown:

Dreaming is worth prioritizing if your agent runs the same category of task repeatedly — document review, customer support, code analysis, content generation pipelines. Agents that run once and are done do not accumulate enough session history for dreaming to add much.

Multiagent orchestration is worth it when your tasks genuinely benefit from parallel specialization — security + documentation + test generation running simultaneously, or log analysis across hundreds of sources. For a single-pass task that fits in one context window, a single agent with a well-designed prompt is cheaper and simpler.

Outcomes is worth enabling for any task where quality is subjective or where you have well-defined acceptance criteria. If you can write a rubric — and for most enterprise workflows, you can — you should be using outcomes.

If you are just getting started with the platform, the Claude Managed Agents complete review covers the full pricing breakdown ($0.08/runtime hour + model costs), setup, and which early adopters have shipped production systems.

Honest caveat: dreaming is in research preview with gated access. If you are planning a production deployment around it, build the memory architecture first (which is in public beta), and plan dreaming as the upgrade layer once you have access.

How to Get Access

The access path depends on which capability you want:

  • Dreaming: Research preview. Request access at claude.com/form/claude-managed-agents. Gated — not immediately available.
  • Outcomes, multiagent orchestration, memory: Public beta. Available to all developers via the Claude Platform API with the managed-agents-2026-04-01 beta header (a request sketch follows this list). No separate access request required.
  • Webhooks: Public beta. Available alongside outcomes and multiagent orchestration.
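For the public beta features, a session request presumably looks something like the sketch below. It assumes the beta header follows Anthropic's existing anthropic-beta header convention and that the endpoint path mirrors the docs URL; both are assumptions, so verify against the API reference:

```python
import os
import requests

# Hypothetical request sketch: the endpoint path, payload shape, and agent ID
# are assumptions. Only the beta header value comes from the announcement.
resp = requests.post(
    "https://api.anthropic.com/v1/managed-agents/sessions",  # assumed path
    headers={
        "x-api-key": os.environ["ANTHROPIC_API_KEY"],
        "anthropic-beta": "managed-agents-2026-04-01",  # public beta opt-in
        "content-type": "application/json",
    },
    json={
        "agent_id": "support-triage-agent",  # hypothetical agent ID
        "input": "Triage the attached ticket batch.",
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json())
```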

The Claude Platform documentation at platform.claude.com/docs/en/managed-agents has the full API reference including the YAML config for multiagent sessions, dreaming schedule configuration, and outcomes rubric format. For a hands-on implementation starting point, the Claude-powered RAG from scratch cookbook in the gen-ai-experiments repository demonstrates the Claude API integration patterns that transfer directly to Managed Agents builds.

Frequently Asked Questions

What exactly does Claude dreaming do to memory?

Dreaming reads across multiple past agent sessions, identifies recurring patterns — repeated mistakes, converging workflows, shared preferences — and writes structured updates to memory stores. It merges duplicate entries, removes outdated context, and restructures memory to stay high-signal as it grows. It does not modify original session transcripts; it only updates the memory layer.

Is Claude dreaming the same as Claude's memory feature?

No. Memory captures what an agent learns during a session, in real time, as it works. Dreaming is a separate scheduled process that runs after sessions end. It reads across sessions, surfaces cross-session patterns, and curates the memory stores that in-session memory creates. You need both: memory is the write layer, dreaming is the curation layer.

How many agents can I run in parallel with multiagent orchestration?

The Claude Platform supports up to 20 unique agent IDs in the multiagent.agents coordinator configuration. The coordinator can call multiple copies of each agent, so the total number of active agent instances can exceed 20 — but the roster of distinct agent types is capped there. Orchestration depth is limited to one level; coordinators cannot spawn sub-orchestrators.

How much does Claude Managed Agents cost?

Managed Agents bills at $0.08 per agent runtime hour on top of standard Claude model usage costs. A 10-hour session costs $0.80 in infrastructure fees plus model tokens consumed. Claude Sonnet 4.6 runs at approximately $3 per million input tokens and $15 per million output tokens. There is no separate cost for dreaming, outcomes, or webhooks beyond the runtime hour billing.
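That arithmetic in a few lines of Python, using the rates above (the token volumes are made-up inputs for illustration):

```python
# Back-of-envelope Managed Agents cost estimate using the rates above.
RUNTIME_RATE = 0.08           # $ per agent runtime hour
INPUT_RATE = 3 / 1_000_000    # $ per input token (Claude Sonnet 4.6)
OUTPUT_RATE = 15 / 1_000_000  # $ per output token

def session_cost(hours: float, input_tokens: int, output_tokens: int) -> float:
    return hours * RUNTIME_RATE + input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical 10-hour session consuming 2M input / 400K output tokens:
print(f"${session_cost(10, 2_000_000, 400_000):.2f}")  # 0.80 + 6.00 + 6.00 = $12.80
```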

What is the outcomes loop and how is it different from just prompting the agent twice?

Outcomes uses a dedicated grader agent that evaluates output against your rubric in a completely separate context window. Unlike asking the same agent to self-critique (which is influenced by how it produced the output), the grader has no knowledge of the agent's reasoning path. This independence is what drives the 10-point improvement in task success — it is a genuinely different evaluation, not a rephrasing of the same context.

Do I need dreaming to use multiagent orchestration?

No. They are independent features. Multiagent orchestration is in public beta and available now via the standard API header. Dreaming is in separate research preview. You can build and run full multiagent pipelines today without dreaming access.

Which use cases benefit most from Claude dreaming?

Dreaming provides the most value for agents running the same task category repeatedly over many sessions — document review pipelines, customer support bots, code review systems, content generation agents. One-off or low-frequency agents do not accumulate enough session history to benefit significantly. The Harvey legal AI result (6x completion rate improvement) is representative of high-frequency, high-stakes repetitive workflows.

Recommended Blogs

  • Claude Managed Agents Memory: Build Agents That Learn — buildfastwithai.com
  • Claude Managed Agents Review: Is It Worth It? (2026) — buildfastwithai.com
  • Best AI Agent Frameworks 2026: LangGraph, CrewAI, AutoGen — buildfastwithai.com
  • Is Claude Code Review Worth $15–25 Per PR? (2026 Verdict) — buildfastwithai.com
  • Claude AI 2026: Models, Features, Desktop & More — buildfastwithai.com
  • GPT-5.3-Codex vs Claude Opus 4.6 vs Kimi K2.5 (2026) — buildfastwithai.com

References

  • Anthropic — New in Claude Managed Agents: dreaming, outcomes, and multiagent orchestration
  • Anthropic — Higher usage limits for Claude and a compute deal with SpaceX
  • Claude Platform Docs — Managed Agents Overview
  • Claude Platform Docs — Multiagent Sessions
  • Technobezz — Anthropic Introduces Dreaming Feature for Claude Agents
  • The New Stack — Anthropic Will Let Its Managed Agents Dream
  • Simon Willison — Live Blog: Code with Claude 2026
  • Wisedocs — Building Managed Agents for Document Verification