Multi-Agent AI Systems: Why Memory Matters in Production

❝

Welcome to Grind Engineer, your guide to becoming a better software engineer! No fluff. Pure engineering insights.

❝

Job openings added at the end of the article!

Claude Code does not have to do everything with one agent. It can delegate work to subagents: one to read your codebase, one to plan the changes, one to write the code, one to run the tests, one to review the diff. Each agent is specialized, and an orchestrator decides who works on what and when.

This is not unique to Anthropic. Salesforce Agentforce, Intercom Fin, and Zendesk AI all run multi agent systems in production. The pattern is the same everywhere: one agent cannot do everything well, so you split the work across specialists and coordinate them.

❝

TL;DR: Single agents hit a wall on complex tasks. Multi agent systems solve this by splitting work across specialized agents coordinated by an orchestrator. This article covers the three main patterns (supervisor, router, swarm), how agent handoff works, why shared memory is the hardest problem, and a real world example of a coding agent team.

Why One Agent Hits a Wall

A single agent works fine for simple tasks. "Summarize this document." "Search the web for X." "Write a function that does Y." One perceive-act loop. One set of tools. Done.

Complex tasks break this model in three ways.

Context overload. A single agent doing research, coding, testing, and review needs all the tools and all the context in one prompt. The prompt gets enormous. The model starts losing track of instructions buried in the middle.

Conflicting objectives. The agent writing code wants to move fast. The agent reviewing code wants to find bugs. These goals conflict. One agent trying to do both produces mediocre results at each.

No specialization. A generalist prompt cannot match a specialist one. A code review prompt tuned with specific review criteria will catch more bugs than a general purpose agent asked to "also review the code when you are done."

Problem	Single Agent	Multi Agent
Context overload	One massive prompt with everything	Each agent gets focused context
Conflicting goals	Compromises between objectives	Each agent optimizes for one goal
Specialization	Generic instructions for all tasks	Tuned prompts per agent role
Debugging	One opaque reasoning chain	Trace each agent separately
Failure recovery	Entire task fails	Only the failing agent retries

❝

💡 Key Insight: Multi agent systems are not about making agents smarter. They are about making each agent's job simpler. On a complex task, a specialist with a narrow focus usually beats a generalist with a broad one.

The Three Orchestration Patterns

Microsoft documents agent orchestration patterns like these in its Azure Architecture Center. Variants of them show up in most production multi agent systems.

[DIAGRAM 2: Three Orchestration Patterns]

Pattern 1: Supervisor. One coordinator agent receives the task, breaks it into subtasks, routes each subtask to a specialist agent, and aggregates the results. The supervisor controls the flow. Specialists never talk to each other directly.

This is the most common pattern. It works because the supervisor has a clear, simple job: decide what needs doing, who should do it, and when it is done. Adding a new specialist means registering a new worker without modifying the orchestrator.
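A minimal sketch of the supervisor loop, assuming workers are plain callables and the plan is hard-coded. In a real system `plan` would be an LLM call; the names `Supervisor`, `register`, and `plan` are hypothetical and not taken from any framework.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical worker type: takes a subtask string, returns a result string.
Worker = Callable[[str], str]


class Supervisor:
    """Toy supervisor: plans subtasks, routes each to a specialist, aggregates."""

    def __init__(self) -> None:
        self.workers: Dict[str, Worker] = {}

    def register(self, role: str, worker: Worker) -> None:
        # New specialists are added here; run() does not change.
        self.workers[role] = worker

    def plan(self, task: str) -> List[Tuple[str, str]]:
        # Stand-in for an LLM planning call: a fixed list of (role, subtask).
        return [
            ("research", f"gather facts for: {task}"),
            ("write", f"draft answer for: {task}"),
        ]

    def run(self, task: str) -> List[str]:
        results = []
        for role, subtask in self.plan(task):
            if role not in self.workers:
                raise KeyError(f"No specialist registered for role {role!r}")
            # Specialists only ever receive work from the supervisor.
            results.append(self.workers[role](subtask))
        return results


supervisor = Supervisor()
supervisor.register("research", lambda subtask: f"[research] {subtask}")
supervisor.register("write", lambda subtask: f"[write] {subtask}")
outputs = supervisor.run("explain rate limiting")
```

The point of the sketch is the shape: specialists never call each other, and adding one is a `register` call rather than a change to `run`.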

Pattern 2: Router. The router classifies the incoming request and sends it to the right specialist. Unlike the supervisor, the router does not plan or aggregate. It just decides who handles the request, and that specialist handles it end to end.

Customer support systems use this heavily. "Is this a billing question, a technical issue, or a feature request?" Route to the billing agent, tech support agent, or product agent. Each specialist has its own tools and context.
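The routing step can be sketched in a few lines. This version assumes a keyword table in place of an LLM or trained classifier, and the specialist functions are hypothetical stubs.

```python
from typing import Callable, Dict, Tuple

# Hypothetical keyword table standing in for a real classifier.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "billing": ("invoice", "refund", "charge"),
    "technical": ("error", "crash", "bug"),
}


def classify(request: str) -> str:
    text = request.lower()
    for label, words in KEYWORDS.items():
        if any(word in text for word in words):
            return label
    return "product"  # fallback: treat everything else as a feature request


# Each specialist handles its request end to end (stubbed here).
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "billing": lambda r: f"billing agent handling: {r}",
    "technical": lambda r: f"tech support agent handling: {r}",
    "product": lambda r: f"product agent handling: {r}",
}


def route(request: str) -> str:
    # The router only picks a specialist; it does not plan or aggregate.
    return SPECIALISTS[classify(request)](request)
```

Unlike the supervisor, nothing comes back to the router after the handoff.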

Pattern 3: Parallel Swarm. The coordinator splits a large task into independent subtasks, dispatches them to multiple agents simultaneously, and merges the outputs. Instead of processing 50 documents sequentially, fan out to 50 agents and merge in a fraction of the time.
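A sketch of the fan out and merge step using Python's standard `concurrent.futures`. The `summarize` function is a hypothetical stand-in for one agent's LLM call, and threads are an assumption that suits I/O-bound work like API requests.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def summarize(document: str) -> str:
    # Stand-in for one agent's LLM call on a single document.
    return document.split(".")[0].strip()


def fan_out(documents: List[str], max_workers: int = 8) -> List[str]:
    # pool.map returns results in input order, which keeps the merge simple.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(summarize, documents))


def merge(summaries: List[str]) -> str:
    return "\n".join(f"- {s}" for s in summaries)


docs = [f"Document {i} headline. Body text follows." for i in range(50)]
report = merge(fan_out(docs))
```

This only works cleanly when the subtasks are independent. If one document's summary depends on another's, you are back to a supervisor.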

Pattern	Control Flow	Agents Talk To	Best For
Supervisor	Centralized	Only the supervisor	Complex tasks needing coordination
Router	Single handoff	Only the user	Request classification and routing
Parallel Swarm	Fan out, merge	Only the coordinator	Batch processing, parallel work

Most production systems use the supervisor pattern. It gives you predictable control flow, centralized observability, and clean separation of concerns.

How Agent Handoff Works (and How It Breaks)

Agent handoff is when one agent passes a task to another. The handoff must transfer three things: the task description, the context accumulated so far, and any constraints on what the receiving agent should do.
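One way to make those three things explicit is a small payload type. This is a sketch, assuming a hypothetical `Handoff` dataclass; the field names mirror the sentence above and are not from any particular framework.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass(frozen=True)
class Handoff:
    """Everything the receiving agent needs, in one immutable object."""

    task: str  # what to do
    context: Dict[str, Any] = field(default_factory=dict)  # what is known so far
    constraints: List[str] = field(default_factory=list)  # what not to do

    def validate(self) -> None:
        if not self.task.strip():
            raise ValueError("handoff needs a non-empty task description")


handoff = Handoff(
    task="Add pagination to GET /orders",
    context={"auth": "OAuth 2.0"},
    constraints=["do not change the response schema"],
)
handoff.validate()
```

Passing one typed object instead of loose strings makes it harder to forget the context or the constraints on the way to the next agent.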

The most common failure mode? Infinite handoff loops. Agent A passes to Agent B. Agent B disagrees with A's approach and passes back. Agent A adjusts and passes forward again. This continues until your token budget runs out or an alert fires.

The fixes are simple but easy to forget:

Cap handoffs per run. Set a hard limit (12 is a reasonable ceiling). When the cap trips, return the best result so far with a confidence flag.
Make handoffs one directional when possible. Agent A passes to Agent B. Agent B finishes or escalates to a human. Never back to A.
Add a third agent as an arbiter. When two agents disagree, a referee agent makes the final call.

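class Orchestrator:
    """Routes a task between agents and caps the number of handoffs."""

    def __init__(self, agents: dict, max_handoffs: int = 12):
        # agents maps an agent name to an object exposing execute(context)
        self.agents = agents
        self.max_handoffs = max_handoffs

    def run(self, task: str) -> str:
        current_agent = "planner"
        context = {"task": task, "history": []}
        result = None  # stays None if max_handoffs is 0 and the loop never runs

        # A bounded loop guarantees termination even if agents keep handing off
        for _ in range(self.max_handoffs):
            agent = self.agents[current_agent]
            result = agent.execute(context)

            if result.status == "done":
                return result.output

            if result.status != "handoff":
                # Fail loudly instead of spinning on an unknown status
                raise ValueError(f"Unexpected status from {current_agent}: {result.status!r}")

            # Record this agent's output so the next agent can see it
            context["history"].append({
                "agent": current_agent,
                "output": result.output,
            })
            current_agent = result.next_agent

        best = result.output if result is not None else None
        return f"Max handoffs reached. Best result: {best}"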
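The second fix in the list, one directional handoffs, can be enforced with an allow-list of transitions. A minimal sketch, assuming a hypothetical forward-only graph and a `"human"` escalation target:

```python
from typing import Dict, Set

# Hypothetical forward-only handoff graph: no edge points back to an earlier agent.
ALLOWED: Dict[str, Set[str]] = {
    "planner": {"coder"},
    "coder": {"reviewer"},
    "reviewer": {"human"},  # reviewer finishes or escalates, never returns to coder
}


def next_agent(current: str, requested: str) -> str:
    """Return the requested agent if the handoff is allowed, else escalate."""
    if requested in ALLOWED.get(current, set()):
        return requested
    return "human"
```

With this check in front of every handoff, a request to pass work backwards turns into an escalation instead of a loop.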

Shared Memory: The Hardest Problem in Multi Agent Systems

Many multi agent systems fail not because agents cannot communicate, but because they cannot remember. Each agent operates on a different version of reality.

Agent A discovers that the user's API uses OAuth 2.0. Agent B, running in parallel, assumes API key auth because it never received that context. Agent B writes code with the wrong authentication. The orchestrator merges both outputs. The result is broken.

Shared memory is the fix. All agents read from and write to a common state store.

Memory Approach	How it works	When to use
Python dict in memory	Shared dictionary passed between agents	Single process, fast tasks under 30 seconds
Redis	Fast key value store, agents read/write state	Short lived sessions, caching intermediate results
Postgres	Full relational database with history	Production systems needing audit trails
Vector store	Semantic search over shared knowledge	When agents need to find relevant past context

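Postgres is the boring default and the right one for most teams. You want history. You want queryable state. You want durability. Redis is fine for short lived caches but is a risky primary store: it is in-memory first, so unless persistence (RDB snapshots or AOF) is configured, a restart loses state, and even with snapshots you can lose the most recent writes.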

The architecture looks like this: the orchestrator writes the task and context to shared state. Each agent reads what it needs, does its work, and writes results back. The orchestrator reads all results and decides the next step.
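A sketch of that read and write cycle, assuming an append-only table so history is never overwritten. Python's built-in `sqlite3` stands in for Postgres here, and the `SharedState` class and its column names are hypothetical.

```python
import sqlite3
from typing import List, Optional, Tuple


class SharedState:
    """Append-only store; sqlite3 stands in for Postgres in this sketch."""

    def __init__(self, path: str = ":memory:") -> None:
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS state ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, "
            "agent TEXT NOT NULL, name TEXT NOT NULL, value TEXT NOT NULL)"
        )

    def write(self, agent: str, name: str, value: str) -> None:
        # Never overwrite: every write is a new row, so history stays queryable.
        with self.db:  # commits on success, rolls back on error
            self.db.execute(
                "INSERT INTO state (agent, name, value) VALUES (?, ?, ?)",
                (agent, name, value),
            )

    def read(self, name: str) -> Optional[str]:
        # The latest row for a name is the current value.
        row = self.db.execute(
            "SELECT value FROM state WHERE name = ? ORDER BY id DESC LIMIT 1",
            (name,),
        ).fetchone()
        return row[0] if row else None

    def history(self, name: str) -> List[Tuple[str, str]]:
        return self.db.execute(
            "SELECT agent, value FROM state WHERE name = ? ORDER BY id",
            (name,),
        ).fetchall()


state = SharedState()
state.write("orchestrator", "task", "add pagination to the API")
state.write("agent_a", "auth_scheme", "OAuth 2.0")
# Agent B reads the shared fact instead of assuming API key auth.
auth_for_agent_b = state.read("auth_scheme")
```

This is the OAuth scenario from above with the gap closed: Agent B asks the store what Agent A learned before it writes any code.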

Real World Example: The Coding Agent Team

Here is how a multi agent coding system works in practice. Four specialist agents, one orchestrator.

Agent	Role	Tools	Reads from shared state	Writes to shared state
Planner	Breaks the task into steps	None (reasoning only)	Task description	Step by step plan
Coder	Writes or modifies code	File read, file write	Plan, codebase context	Code changes
Tester	Runs tests, reports results	Test runner, terminal	Code changes	Test results, failures
Reviewer	Reviews code for quality and bugs	File read, diff tools	Code changes, test results	Review comments, approval
Orchestrator	Coordinates agents, handles failures	All of the above (fallback)	Everything	Final decision

The flow: Orchestrator receives "add pagination to the API." Planner breaks it into steps. Coder writes the code. Tester runs the tests. If tests fail, Orchestrator routes back to Coder with the failure details. Coder fixes. Tester reruns. When tests pass, Reviewer checks quality. If Reviewer finds issues, back to Coder. When Reviewer approves, Orchestrator marks the task complete.

Every agent reads from and writes to shared state. No agent talks to another directly. The Orchestrator is always in the middle.

What This Means For Engineers

1. Start with one agent. Only split into multiple when you hit a clear wall: the prompt is too long, the objectives conflict, or specialization would measurably improve quality. Multi agent adds coordination overhead you do not want to pay unless you need to.

2. Use the supervisor pattern by default. It is the most debuggable, the most observable, and the easiest to extend. You can always graduate to a swarm when throughput demands it.

3. Design shared memory first, agents second. The most common failure in multi agent systems is agents operating on stale or conflicting context. If you nail the shared state architecture, the rest follows.

❝

Job Openings

Software Engineer, New Grad @Stripe: Apply Here
Software Engineer, Payments and Risk @Stripe: Apply Here
Software Engineer, Data & AI @Stripe: Apply Here
Software Engineer (1+ YOE) @Stripe: Apply Here
Software Engineer 2, iOS @Uber: Apply Here

See you in the next one!
Scortier, Signing Off!
