Agentic Patterns

The best agentic system is the simplest one that meets the requirement

An agentic pattern is a reusable control-flow shape for orchestrating LLM calls plus tools. Agentic patterns span a spectrum. At one end, a workflow routes the model through predefined code paths: the developer decides the steps. At the other, an agent lets the model decide its own steps and tool calls at runtime. Workflows are cheaper, faster, and testable; agents handle open-ended problems but cost more and compound errors.

At bottom, every agent is a loop over an augmented LLM (a model with tools, memory, and retrieval). The patterns here nest and compose: an orchestrator's worker may run a ReAct loop whose answer is then refined by an evaluator-optimizer pass. The interviewer's signal, and the engineering one, is whether you match a pattern to a problem rather than reaching for the most autonomous design by reflex. Every step of added autonomy buys flexibility at the cost of latency, token spend, and debuggability.

The through-line

Pick the least autonomy that solves the problem. If a single well-prompted call (optionally with retrieval) suffices, use that. If the steps are predictable, a workflow is cheaper and testable. Reserve true agentic loops for genuinely open-ended paths — and be able to say out loud why each loop, hand-off, or extra model call earns its keep.

“Consistently, the most successful implementations weren't using complex frameworks or specialized libraries. Instead, they were building with simple, composable patterns.” (Anthropic [1])

Legend: same LLM, reused · a different model / agent · tool · runtime · memory · code · human

In the diagrams below, color shows model identity, not role — so you can see at a glance whether a pattern reuses one model or coordinates several.

Part I: The agent loop

Reasoning, acting, and refining

The foundational patterns: how a model reasons, grounds that reasoning in real tool output, and improves its own work. Everything downstream nests these.

1.1 ReAct

Interleave Thought, Action, and Observation in a loop until the goal is met.

ReAct [2] is the canonical agent loop: the model reasons a step (Thought), takes an Action (a tool call), reads the Observation (the result), and repeats. Its value is that reasoning is grounded in real tool feedback, not the model's priors. The failure mode to volunteer: with no stopping discipline it loops or oscillates, and the growing trace bloats the context window. ReAct's "world model" is just whatever fits in context, with no validation of observations and no long-term learning [4] — which is exactly why a raw loop is brittle and needs hardening.

ReAct — one model loops, grounded in tool feedback; the danger is an unbounded loop.

Thought: I need ACME's latest revenue figure.
Action: search("ACME 2025 annual revenue")
Observation: $4.2B (FY2025 report)
Thought: that answers the question.
→ Final Answer: $4.2B

Use when: The path is open-ended and each step depends on the result of the last.

Cost: Latency and tokens per step; without a budget it loops, oscillates, or acts on bad observations.
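The loop above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a framework API. `scripted_model` is a hypothetical stand-in for the LLM that replays the ACME trace deterministically, `search` is a stub tool, and `max_steps` is the stopping discipline the paragraph calls for.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    thought: str
    action: Optional[str] = None   # tool name; None means "final answer"
    arg: str = ""
    answer: Optional[str] = None


def react(model: Callable[[List[str]], Step],
          tools: Dict[str, Callable[[str], str]],
          question: str,
          max_steps: int = 5) -> str:
    """Thought -> Action -> Observation until an answer or the step budget runs out."""
    trace = [f"Question: {question}"]
    for _ in range(max_steps):                       # stopping discipline: hard budget
        step = model(trace)
        trace.append(f"Thought: {step.thought}")
        if step.answer is not None:
            return step.answer                       # Final Answer ends the loop
        tool = tools.get(step.action or "")
        observation = tool(step.arg) if tool else f"error: unknown tool {step.action!r}"
        trace.append(f"Action: {step.action}({step.arg!r})")
        trace.append(f"Observation: {observation}")  # grounds the next Thought
    raise RuntimeError("step budget exhausted without a final answer")


def scripted_model(trace: List[str]) -> Step:
    """Hypothetical stand-in for an LLM: search once, then answer from the observation."""
    observations = [line for line in trace if line.startswith("Observation: ")]
    if not observations:
        return Step("I need ACME's latest revenue figure.",
                    action="search", arg="ACME 2025 annual revenue")
    figure = observations[-1].split(": ", 1)[1].split(" ")[0]
    return Step("That answers the question.", answer=figure)


tools = {"search": lambda query: "$4.2B (FY2025 report)"}   # stub tool
print(react(scripted_model, tools, "What was ACME's latest revenue?"))  # prints $4.2B
```

An unknown tool name comes back as an error Observation rather than a crash, so the model can correct itself on the next Thought.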

1.2 Tool Use

The agent emits a structured call the runtime executes, returning the result to context.

Tool Use [3][7] is the bridge between a text generator and the real world. The model emits a structured call (JSON matching a declared schema), the runtime executes it, and the result is fed back into context. It is foundational because every higher pattern's reliability rests on tool design: tight schemas, typed parameters, clear errors. Toolformer [7] showed that models can even learn when to call a tool. The classic trap is a vague, overloaded tool that the model misuses; no amount of prompt engineering on the loop fixes a bad tool boundary. In GoF terms [14], it is a Proxy/Adapter over an external capability.

Tool Use — a typed, schema-validated call; reliability rides on the tool boundary.

{
  "name": "get_weather",
  "arguments": {"city": "Warsaw", "units": "metric"}
}

Use when: The agent must act on or read from any system outside the model, which for real work means always.

Cost: Reliability is only as good as the schema; a vague or overloaded tool gets misused.
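On the runtime side, the tool boundary comes down to three jobs: parse the model's JSON, check it against the declared schema, and return errors the model can act on. The sketch below is minimal and rests on one assumption: a simplified schema format (each parameter name maps to a Python type and a required flag) stands in for real JSON Schema. `TOOLS`, `dispatch`, and the stub `get_weather` body are illustrative, not a vendor API.

```python
import json
from typing import Any, Callable, Dict, Tuple

# Hypothetical registry: name -> ({param: (type, required)}, implementation).
# A simplified stand-in for a declared JSON Schema.
TOOLS: Dict[str, Tuple[Dict[str, Tuple[type, bool]], Callable[..., str]]] = {
    "get_weather": (
        {"city": (str, True), "units": (str, False)},
        lambda city, units="metric": f"{city}: stub forecast ({units})",  # stub, no real API
    ),
}


def dispatch(raw: str) -> str:
    """Validate a model-emitted tool call, run it, and return text for the context."""
    try:
        call: Any = json.loads(raw)
    except json.JSONDecodeError as exc:
        return f"error: malformed JSON ({exc.msg})"
    if not isinstance(call, dict):
        return "error: a tool call must be a JSON object"
    name = call.get("name")
    if not isinstance(name, str) or name not in TOOLS:
        return f"error: unknown tool {name!r}"
    args = call.get("arguments", {})
    if not isinstance(args, dict):
        return "error: 'arguments' must be an object"
    params, fn = TOOLS[name]
    for param, (typ, required) in params.items():
        if required and param not in args:
            return f"error: missing required argument {param!r}"
        if param in args and not isinstance(args[param], typ):
            return f"error: {param!r} must be of type {typ.__name__}"
    unexpected = sorted(set(args) - set(params))
    if unexpected:
        return f"error: unexpected arguments {unexpected}"
    return fn(**args)
```

Validation failures come back as text instead of exceptions. That keeps the loop alive and hands the model a specific error to correct against.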

1.3 Reflection

Generation–Critique–Refinement: the model evaluates and revises its own output.

Reflection [3][5] is a draft → critique-against-criteria → revise cycle; Self-Refine [5] formalised the iterative self-feedback loop. It pays off when there is a clear correctness signal the critique can latch onto — code that must pass tests, a proof, a constraint-checked translation — because evaluating is a different, often easier task than one-shot authoring. The cost: each round is at least one extra call, so cap iterations (commonly 1–3) and exit early when the critique reports no issues.

Reflection — same model critiques itself; best with a clear correctness signal.

draft = generate(task)
for _ in range(3):                # hard iteration cap
    notes = critique(draft, rubric)
    if notes.ok: break            # early-exit on pass
    draft = revise(draft, notes)

Use when: There is a checkable signal (tests, a rubric, a constraint) that a critic can score against.

Cost: Every round is an extra call; without a cap it spends freely for diminishing gains.
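Here is a runnable version of the capped loop. It assumes a toy rubric in place of a real correctness signal such as a test suite. `rubric` and `fix_first` are hypothetical stand-ins for the critique and revise calls. The function returns the draft plus the number of revisions it took.

```python
from typing import Callable, List, Tuple


def reflect(task: str,
            generate: Callable[[str], str],
            critique: Callable[[str], List[str]],
            revise: Callable[[str, List[str]], str],
            max_rounds: int = 3) -> Tuple[str, int]:
    """Draft, critique against a checkable rubric, revise; capped, with early exit."""
    draft = generate(task)
    for revisions in range(max_rounds):
        issues = critique(draft)
        if not issues:                 # early exit: the critic found nothing
            return draft, revisions
        draft = revise(draft, issues)
    return draft, max_rounds           # cap reached: return the best effort so far


def rubric(title: str) -> List[str]:
    """Hypothetical checkable rubric for a report title."""
    issues = []
    if title != title.strip():
        issues.append("strip")
    if title != title.lower():
        issues.append("lowercase")
    if len(title) > 20:
        issues.append("too long")
    return issues


def fix_first(title: str, issues: List[str]) -> str:
    """Hypothetical reviser: fixes one issue per round, as a model might."""
    if issues[0] == "strip":
        return title.strip()
    if issues[0] == "lowercase":
        return title.lower()
    return title[:20]


print(reflect("title", lambda task: "  Quarterly Report  ", rubric, fix_first))
# prints ('quarterly report', 2)
```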

1.4 Plan-and-Execute

A Planner decomposes the goal into ordered steps before an Executor acts.

A Planner decomposes the goal into ordered sub-steps up front, and an Executor carries them out; each step is often a tool call or a small ReAct loop [3]. The difference from plain ReAct is commitment timing. ReAct decides one step at a time, reacting to each observation, while Plan-and-Execute commits to a structure first. The advantage is that, on long-horizon, multi-system tasks, planning up front surfaces hidden complexity early. The trade-offs: an upfront planning call adds latency, and a rigid plan goes stale, so mature systems add re-planning when a step fails.

Plan-and-Execute — planner and executor (often one model, sometimes a cheaper second); re-plan when reality diverges.

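from collections import deque
plan = deque(planner(goal))             # ordered sub-steps, up front
done = []
while plan:
    step = plan.popleft()
    result = executor(step)             # often a small ReAct loop
    if result.failed:
        plan = deque(replan(goal, done))  # a rigid plan goes stale; re-plan
    else:
        done.append(step)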

Use when: Long-horizon, multi-system tasks where surfacing structure early beats reacting step by step.

Cost: An upfront planning call; plans drift unless you re-plan on failure.
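This is a runnable sketch of the re-planning variant. It assumes hypothetical `planner`, `replan`, and `executor` callables, here plain functions over a toy environment. Besides re-planning from completed steps, it caps the number of re-plans, so a step that keeps failing ends the run instead of looping forever.

```python
from collections import deque
from typing import Callable, List


def plan_and_execute(goal: str,
                     planner: Callable[[str], List[str]],
                     replan: Callable[[str, List[str], str], List[str]],
                     executor: Callable[[str], bool],
                     max_replans: int = 2) -> List[str]:
    """Commit to a plan up front; on a failed step, re-plan from what is done."""
    done: List[str] = []
    queue = deque(planner(goal))            # ordered sub-steps, committed up front
    replans = 0
    while queue:
        step = queue.popleft()
        if executor(step):                  # True means the step succeeded
            done.append(step)
            continue
        if replans >= max_replans:          # budget: don't re-plan forever
            raise RuntimeError(f"{step!r} still failing after {replans} re-plans")
        replans += 1
        queue = deque(replan(goal, done, step))   # fresh plan given progress and the failure
    return done


granted = {"permissions": False}   # hypothetical environment state


def executor(step: str) -> bool:
    if step == "fix permissions":
        granted["permissions"] = True
        return True
    if step == "publish":
        return granted["permissions"]      # fails until permissions are fixed
    return True


steps = plan_and_execute(
    "publish the quarterly report",
    planner=lambda goal: ["fetch data", "build report", "publish"],
    replan=lambda goal, done, failed: ["fix permissions", failed],
    executor=executor,
)
print(steps)  # prints ['fetch data', 'build report', 'fix permissions', 'publish']
```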

1.5 Agentic RAG / Adaptive Retrieval

Retrieval as a runtime decision: decide whether to fetch, grade what came back, and re-query on weak evidence.

Static RAG fetches once and stuffs the context — correct when a fixed lookup answers the question, and not an agent. Agentic RAG is the loop that decides whether to retrieve, which source, evaluates what returned, and re-queries on weak or missing evidence (Self-RAG, Corrective RAG / CRAG, Adaptive RAG). That reasoning loop is a genuine agent loop — and it is the exact case a simplicity-biased autonomy gate misreads as "just preprocessing." It is the dominant 2026 production RAG pattern, so name it plainly: if answer quality depends on evidence the model must gather iteratively, this is an agent, not a single call.

docs = retrieve(query)                     # first pass
while weak(grade(docs, query)) and budget.left():
    query = refine(query, docs)            # re-query on weak evidence
    docs = retrieve(query)                 # Self-RAG / CRAG / Adaptive RAG
answer = generate(query, docs)

Use when: Answer quality depends on evidence gathered iteratively, not on a single fixed fetch.

Cost: ≈3–10× tokens and 2–5× latency vs. static RAG. Route easy queries to a one-shot fast path and reserve the loop for hard ones.

Prerequisite: Get advanced static RAG sound first (hybrid dense + BM25, a reranker, an eval harness); an agent looping over a weak retriever just pays more to be wrong.
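The grade-and-re-query loop can be sketched with a toy keyword retriever. Everything here is a hypothetical stand-in:

- `CORPUS` stands in for a real index.
- Keyword overlap stands in for a learned or LLM grader (as in Self-RAG or CRAG).
- The `REWRITES` table stands in for an LLM query rewriter.

The final generation step is omitted. A strong first pass exits immediately, which is the one-shot fast path.

```python
from typing import Dict, List, Tuple

# Hypothetical toy corpus and rewrite rules; stand-ins for a real index and an LLM rewriter.
CORPUS: Dict[str, str] = {
    "d1": "acme quarterly earnings call transcript",
    "d2": "acme fy2025 annual report revenue 4.2b",
    "d3": "weather in warsaw",
}
REWRITES: Dict[str, str] = {"sales": "revenue", "turnover": "revenue"}


def retrieve(query: str, k: int = 2) -> List[Tuple[str, int]]:
    """Keyword-overlap retriever: (doc_id, score) pairs, best first."""
    terms = set(query.lower().split())
    scored = sorted(((len(terms & set(text.split())), doc_id)
                     for doc_id, text in CORPUS.items()), reverse=True)
    return [(doc_id, score) for score, doc_id in scored[:k] if score > 0]


def grade(query: str, docs: List[Tuple[str, int]]) -> float:
    """Fraction of query terms covered by the best document (a crude relevance grade)."""
    terms = set(query.lower().split())
    best = max((score for _, score in docs), default=0)
    return best / len(terms) if terms else 0.0


def agentic_retrieve(query: str, threshold: float = 0.6,
                     budget: int = 3) -> Tuple[str, List[Tuple[str, int]]]:
    docs = retrieve(query)                         # first pass: the cheap fast path
    while grade(query, docs) < threshold and budget > 0:
        refined = " ".join(REWRITES.get(t, t) for t in query.lower().split())
        if refined == query:                       # nothing left to try: stop paying
            break
        query, docs = refined, retrieve(refined)   # re-query on weak evidence
        budget -= 1
    return query, docs


print(agentic_retrieve("acme sales 2025"))
# prints ('acme revenue 2025', [('d2', 2), ('d1', 1)])
```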

Also in this family

Chain-of-Thought [6]

Elicit step-by-step internal reasoning before answering — no external action.

In practice: CoT [6] is the reasoning substrate that ReAct acts on, pure thinking with no tools. Reach for it when a single call just needs to reason more carefully, not act.

answer = llm(question + "\nLet's think step by step.")
# reasoning in the open; ReAct adds the Action/Observation step

Part II: Workflow shapes

Developer-steered control flow

When the steps are predictable, you don't need an agent — you need a workflow: composable shapes where developer code, not the model, decides the path. Cheaper, faster, testable.

2.1 Prompt Chaining

Decompose a task into a fixed sequence of steps, each feeding the next.

Prompt chaining [1] breaks a task into an ordered series of LLM calls where each step's output is the next step's input — outline, then draft, then polish. Because the structure is fixed, you can drop a programmatic gate between steps (a check that fails fast). Use it when a task cleanly factors into stable sub-steps and you want each one simpler and more reliable than one giant prompt. The trade-off is latency: the calls are serial by construction.

Prompt chaining — one model, different prompts down a fixed pipeline; an optional gate fails fast between steps.

outline = llm(brief)                  # step 1
if not gate(outline): return reject  # programmatic check
draft = llm(outline)                  # step 2 feeds on step 1
final = llm(draft, "polish for tone")  # step 3

Use when: A task factors into stable, ordered sub-steps and you want each call simple and checkable.

Cost: Serial latency; a fixed chain can't adapt to inputs that don't fit the shape.
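Below is the same pipeline with the gate made explicit, as a minimal sketch. The three lambdas are hypothetical stand-ins for prompted calls to one model, and `GateRejected` is an assumed exception type for the fail-fast check.

```python
from typing import Callable, List, Optional, Tuple

# (step name, model call, optional gate run on that step's output)
ChainStep = Tuple[str, Callable[[str], str], Optional[Callable[[str], bool]]]


class GateRejected(Exception):
    """Raised when a programmatic gate fails between steps."""


def run_chain(brief: str, steps: List[ChainStep]) -> str:
    text = brief
    for name, call, gate in steps:
        text = call(text)                       # this output is the next step's input
        if gate is not None and not gate(text):
            raise GateRejected(f"gate after {name!r} rejected: {text!r}")  # fail fast
    return text


# Hypothetical stand-ins for three prompted calls to the same model.
pipeline: List[ChainStep] = [
    ("outline", lambda brief: f"outline: {brief}", lambda out: len(out.split()) >= 3),
    ("draft", lambda outline: outline.replace("outline", "draft", 1), None),
    ("polish", lambda draft: draft.capitalize(), None),
]
print(run_chain("launch plan for q3", pipeline))  # prints Draft: launch plan for q3
```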

2.2 Routing

A classifier labels an input and dispatches it to a specialised follow-up or model.

Routing [1] puts a lightweight classifier (often a cheap LLM) at the front that labels the input and dispatches it to a specialised handler or model. Use it for distinct input categories — support triage, or sending easy queries to a small model and escalating hard ones for cost control. The characteristic failure is that a routing error propagates: a misclassification means the rest of the pipeline confidently solves the wrong problem, invisibly. Mitigate with a default/uncertain route, confidence thresholds, and logged route decisions.

Routing — specialise by category; a misroute fails silently downstream.

label = classify(query)               # cheap model, up front
handler = ROUTES.get(label, default_route)   # always have a default
return handler(query)               # misroute = confident wrong answer

Use when: Inputs fall into distinct categories, or you want to send easy cases to a cheaper model.

Cost: Routing errors propagate invisibly; it needs a default route and confidence thresholds.
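This minimal sketch shows the mitigations named above: a default route, a confidence threshold, and logged decisions. It assumes a hypothetical `keyword_classifier` that returns a label with a confidence score, standing in for a cheap LLM classifier. The 0.7 threshold is an arbitrary illustration.

```python
import logging
from typing import Callable, Dict, Tuple

log = logging.getLogger("router")


def make_router(classify: Callable[[str], Tuple[str, float]],
                routes: Dict[str, Callable[[str], str]],
                default: Callable[[str], str],
                min_confidence: float = 0.7) -> Callable[[str], str]:
    """Front-door classifier plus dispatch, with a default route for unsure cases."""
    def route(query: str) -> str:
        label, confidence = classify(query)
        handler = routes.get(label)
        if handler is None or confidence < min_confidence:
            log.info("route %r: %s@%.2f -> default", query, label, confidence)
            return default(query)               # unknown or low confidence: don't guess
        log.info("route %r -> %s@%.2f", query, label, confidence)  # auditable decision
        return handler(query)
    return route


def keyword_classifier(query: str) -> Tuple[str, float]:
    """Hypothetical cheap classifier; in practice often a small LLM."""
    q = query.lower()
    if "invoice" in q:
        return "billing", 0.92
    if "error" in q:
        return "technical", 0.55
    return "other", 0.10


route = make_router(
    keyword_classifier,
    routes={"billing": lambda q: "billing queue", "technical": lambda q: "tech queue"},
    default=lambda q: "human triage",
)
```

With these stand-ins, `route("Error 500 on login")` lands in human triage: the classifier's label exists, but its confidence is below the bar.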

2.3 Parallelization

Run independent LLM calls concurrently and aggregate — by sectioning or by voting.

Parallelization [1] runs independent calls at once and aggregates. Two shapes: sectioning splits a task into independent subtasks run in parallel; voting runs the same task several times and aggregates — majority vote, or "flag if any run flags it," which is useful for guardrails. It is the static counterpart to orchestrator-workers: parallelization fans out to a fixed, known set, while an orchestrator decides the fan-out dynamically. Use it when subtasks are independent and latency matters, or when independent looks raise confidence. Cost is linear in calls, and voting only helps if errors are uncorrelated across runs.

Parallelization — the same model fanned out; sectioning splits work, voting raises confidence.

# sectioning: independent subtasks, concurrently
parts = await gather(*(llm(s) for s in sections))
answer = aggregate(parts)
# voting: run the same task N times, then majority / "flag if any flags"

Use when: Subtasks are independent and latency matters, or repeated looks raise confidence.

Cost: Linear in calls; voting only helps when errors are uncorrelated across runs.
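Both shapes map onto `asyncio.gather`, which runs awaitables concurrently and returns their results in input order. This is a minimal sketch that assumes the model is an async callable. `upper` and `scripted` are hypothetical stubs so the example runs offline.

```python
import asyncio
from collections import Counter
from typing import Awaitable, Callable, List

Model = Callable[[str], Awaitable[str]]


async def sectioning(llm: Model, sections: List[str]) -> List[str]:
    """Independent subtasks run concurrently; gather returns results in input order."""
    return list(await asyncio.gather(*(llm(s) for s in sections)))


async def vote(llm: Model, task: str, n: int = 5) -> str:
    """Same task n times; return the majority answer."""
    answers = await asyncio.gather(*(llm(task) for _ in range(n)))
    return Counter(answers).most_common(1)[0][0]


async def flag_if_any(check: Callable[[str], Awaitable[bool]], text: str, n: int = 3) -> bool:
    """Guardrail-style aggregation: any single flag wins."""
    return any(await asyncio.gather(*(check(text) for _ in range(n))))


def scripted(outputs):
    """Hypothetical async stub that returns scripted outputs in turn."""
    it = iter(outputs)

    async def call(prompt):
        await asyncio.sleep(0)      # yield to the event loop, like a network call
        return next(it)
    return call


async def upper(prompt: str) -> str:
    """Hypothetical stub model for sectioning."""
    await asyncio.sleep(0)
    return prompt.upper()
```

For example, `asyncio.run(vote(scripted(["A", "B", "A", "A", "B"]), "q"))` returns `"A"`. Likewise, `flag_if_any` returns `True` if even one of its runs flags the text.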

2.4 Evaluator-Optimizer

One model generates while a separate evaluator scores against criteria, in a loop.

Evaluator-Optimizer [1] separates roles: one call generates, a distinct evaluator scores the output against explicit, often external criteria, looping until the bar is met. It is close to Reflection but the distinction is who judges and against what — Reflection is typically the same model critiquing itself; here the critic is separated from the generator's framing. The separation helps when you have clear criteria and want an unbiased judge — literary translation against a rubric, high-stakes reasoning. Both buy quality through iteration and get expensive; guard with a hard iteration cap and early-exit on "pass."

Evaluator-Optimizer — a separate critic against explicit criteria; cap the rounds.

MAX_ROUNDS = 3
draft = generator(task)
for _ in range(MAX_ROUNDS):                # hard iteration cap
    score = evaluator(draft, criteria)     # distinct critic, external rubric
    if score.passes: break                 # early-exit on pass
    draft = generator(task, score.feedback)

Use when: You have explicit criteria and want the critic free of the generator's framing.

Cost: Expensive over many rounds; it needs a hard cap and an early exit on a pass.

Part III: Multi-agent topologies

When one agent isn't enough — who holds control?

More agents multiply cost, latency, and communication failure surface, so a single good agent often wins. When you genuinely need several, the design question is who holds control.

3.1 Orchestrator-Workers

A lead LLM dynamically decomposes a task, delegates to workers, and synthesises.

Orchestrator-Workers [1]: a lead model decomposes a task dynamically at runtime and spins up workers for sub-tasks it discovers — a coding agent finding which files to edit, for instance — then synthesises their results. The contrast with a Supervisor is whether sub-tasks are known in advance: here decomposition is unpredictable. Use it when you can't enumerate the sub-tasks up front. The shared risk: the lead is a coordination bottleneck and single point of failure.

Orchestrator-Workers — fan-out decided at runtime; the lead is the bottleneck.

subtasks = orchestrator.decompose(goal)   # decided at runtime
results  = [worker(t) for t in subtasks]  # spun up per discovered task
return orchestrator.synthesize(results)

Use when: Decomposition is unpredictable and you can't enumerate the sub-tasks before running.

Cost: Coordination overhead; the orchestrator is a single point of failure.
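This minimal sketch shows runtime fan-out. The lead's decomposition and synthesis are supplied as callables (hypothetical lambdas over a toy `REPO` here), and a thread pool runs the workers. The point is that the subtask list only exists after `decompose` runs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def orchestrate(goal: str,
                decompose: Callable[[str], List[str]],
                worker: Callable[[str], str],
                synthesize: Callable[[List[str]], str],
                max_workers: int = 4) -> str:
    subtasks = decompose(goal)                        # fan-out discovered at runtime
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, subtasks))    # results keep subtask order
    return synthesize(results)                        # the lead merges everything


# Hypothetical toy repo: the "lead" only learns which files need work by looking.
REPO = {"a.py": "x = 1  # TODO rename", "b.py": "y = 2", "c.py": "# TODO add tests"}

summary = orchestrate(
    "resolve every TODO",
    decompose=lambda goal: sorted(path for path, src in REPO.items() if "TODO" in src),
    worker=lambda path: f"edited {path}",
    synthesize=lambda results: "; ".join(results),
)
print(summary)  # prints edited a.py; edited c.py
```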

3.2 Supervisor (Hierarchical)

A supervisor routes work to named specialists and collects their answers.

A Supervisor is a fixed topology: a router over named specialists — coder, tester, reviewer — that holds control, delegates outward, and collects results. Use it when roles are stable and you want easy observability and a central view. Versus Orchestrator-Workers, the difference is that the roster is known in advance rather than discovered. Versus a Swarm, a central agent always sees the whole task, which makes global guardrails and audit far easier. The cost: the supervisor is the same bottleneck and single point of failure.

Supervisor — a fixed roster, centrally observable; the lead remains a bottleneck.

SPECIALISTS = {"code": coder, "test": tester, "review": reviewer}
while not state.done:
    who = supervisor.route(state)      # fixed roster, central view
    state = SPECIALISTS[who](state)    # control returns to supervisor

Use when: Roles are stable and you want observability, a central view, and easy guardrails.

Cost: The supervisor is a bottleneck and single point of failure.
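Here is a runnable sketch of the central loop. `next_role` is a hypothetical stand-in for the supervisor model, and there are three toy specialists. Control returns to `supervise` after every turn, which makes the audit log and the turn cap easy to enforce.

```python
from typing import Callable, Dict


def supervise(state: Dict, route: Callable[[Dict], str],
              specialists: Dict[str, Callable[[Dict], Dict]],
              max_turns: int = 10) -> Dict:
    """Central loop: the supervisor picks a named specialist each turn and keeps control."""
    for _ in range(max_turns):
        who = route(state)                          # decision made with the full state
        if who == "done":
            return state
        if who not in specialists:
            raise ValueError(f"unknown specialist {who!r}")   # fixed roster only
        state = specialists[who](state)             # work happens, control returns here
        state["log"] = state.get("log", []) + [who]  # central audit trail
    raise RuntimeError("turn budget exhausted")


def next_role(state: Dict) -> str:
    """Hypothetical routing policy standing in for the supervisor model."""
    if "code" not in state:
        return "code"
    if not state.get("tests_pass"):
        return "test"
    if not state.get("reviewed"):
        return "review"
    return "done"


ROSTER = {
    "code": lambda s: {**s, "code": "def add(a, b): return a + b"},
    "test": lambda s: {**s, "tests_pass": True},
    "review": lambda s: {**s, "reviewed": True},
}
final = supervise({"task": "write add()"}, next_role, ROSTER)
print(final["log"])  # prints ['code', 'test', 'review']
```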

3.3 Handoff (Swarm)

Peer agents transfer control by calling a hand-off tool — no central orchestrator.

In a Handoff/Swarm [16], peers transfer control via a hand-off tool (Triage → Billing → Refunds); no one orchestrates. The hand-off is just a tool call that swaps the active agent and its instructions. Decentralised control wins when the flow is a chain of specialists that each fully own their segment and a global view matters less than low coordination overhead. The trade-off is sharp: no single agent ever sees the whole task, which makes end-to-end reasoning and global guardrails harder.

Handoff/Swarm — low coordination overhead, but no global view.

agent = triage
while agent:
    reply, handoff = agent.run(conversation)
    agent = handoff   # a tool call swaps the active agent + its instructions

Use when: The flow is a chain of specialists that each own a segment, and the global view matters less.

Cost: No agent sees the whole task, so end-to-end reasoning and guardrails are harder.
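This sketches Swarm-style hand-offs using a hypothetical `Agent` type whose `policy` returns a reply plus, optionally, the next agent. It mirrors the idea in the text (a hand-off swaps the active agent) but is not the Swarm library's API. The `max_hops` cap is an added guard against two agents passing the conversation back and forth.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Agent:
    name: str
    instructions: str   # in a real system, the system prompt swapped in on hand-off
    # hypothetical stand-in for the model: returns (reply, next agent or None)
    policy: Callable[[List[str]], Tuple[str, Optional["Agent"]]]


def run_handoffs(start: Agent, conversation: List[str], max_hops: int = 5) -> List[str]:
    """Peers pass control directly; returns the names of agents that held control."""
    agent: Optional[Agent] = start
    visited: List[str] = []
    while agent is not None:
        if len(visited) >= max_hops:
            raise RuntimeError("hand-off budget exhausted")   # guard against ping-pong
        visited.append(agent.name)
        reply, agent = agent.policy(conversation)   # returning an Agent is the hand-off
        conversation.append(reply)
    return visited


refunds = Agent("Refunds", "Issue refunds within policy.",
                lambda conv: ("Refund issued.", None))
billing = Agent("Billing", "Handle invoices; hand refund requests to Refunds.",
                lambda conv: ("Passing you to Refunds.", refunds) if "refund" in conv[0]
                else ("Invoice resent.", None))
triage = Agent("Triage", "Identify the topic and hand off.",
               lambda conv: ("Transferring to Billing.", billing))

print(run_handoffs(triage, ["I want a refund for order 42"]))
# prints ['Triage', 'Billing', 'Refunds']
```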

Also a topology

Multi-Agent Mesh [13]

Specialised agents communicate over an event backbone using agent-to-agent protocols.

In practice: Agents publish and subscribe over an event bus such as Kafka, using agent-to-agent (A2A) protocols [13]; the mesh suits AI-native rebuilds (pricing, fraud). It is fully decentralised and scalable, but the communication surface is the failure surface.

bus.publish("price.updated", event)   # agents react over Kafka / A2A

Part IV: System-theoretic patterns

The subsystem lens — Dao et al.

To move past convenience-based lists, Dao et al. [4] deconstruct an agent into five subsystems — Reasoning & World Model, Perception & Grounding, Action Execution, Learning & Adaptation, Inter-Agent Communication — and derive patterns that each fix a specific subsystem failure. Plain ReAct implements these implicitly and monolithically, which is exactly why it's brittle. The engineering value is a diagnostic vocabulary: name which subsystem a failure lives in, and which pattern fixes it.

Foundational — perception & memory

Integrator

Validate incoming information in Perception & Grounding before it enters reasoning.

Fixes: Cognitive data quality, i.e. a stale or wrong observation acted on as fact. The first thing to add when hardening a raw ReAct loop.

value = fetch(); assert fresh(value)   # validate before reasoning consumes it

Retriever

A simplified, context-aware interface to memory — read the relevant slice, not everything.

Fixes: Inefficient context retrieval and "lost in the middle" degradation. The read side of memory.

ctx = store.search(query, k=5)   # relevant slice, not the whole history

Recorder

Capture and externalise reasoning/world-model state so it can be restored.

Fixes: State saving and restoring. The state survives the context window, so a long run can be resumed. The write side of memory.

store.save(agent.state)   # survives the context window; restore later

Context Compaction

Summarize or prune the running trace so a long loop doesn't degrade ("lost in the middle") or overflow.

Fixes: Managing what stays in the window, as distinct from Retriever/Recorder, which persist state outside it. Related: give a sub-agent only the slice it needs, not the whole parent trace.

if tokens(trace) > threshold:
    trace = summarize(trace)   # compact the loop; don't append forever
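This is a minimal sketch of threshold-triggered compaction. It assumes a crude word count in place of a real tokenizer and a hypothetical `summarize` callable (a model call in practice). Recent steps stay verbatim; only older ones are folded into a single summary message.

```python
from typing import Callable, List


def approx_tokens(messages: List[str]) -> int:
    """Crude assumption: about one token per whitespace-separated word."""
    return sum(len(m.split()) for m in messages)


def compact(trace: List[str], threshold: int, keep_last: int,
            summarize: Callable[[List[str]], str]) -> List[str]:
    """Once the trace exceeds the budget, fold older steps into one summary message."""
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    if approx_tokens(trace) <= threshold or len(trace) <= keep_last:
        return trace                                  # under budget: leave it alone
    older, recent = trace[:-keep_last], trace[-keep_last:]
    return [summarize(older)] + recent                # recent steps stay verbatim


trace = [f"step {i}: observed value {i}" for i in range(10)]   # 10 steps, 50 words
compacted = compact(trace, threshold=20, keep_last=3,
                    summarize=lambda msgs: f"[summary of {len(msgs)} earlier steps]")
print(compacted[0], len(compacted))  # prints [summary of 7 earlier steps] 4
```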

Cognitive & decisional — the planning stack

Selector

Prioritise and adapt goals — a Mediator over competing objectives.

Fixes: Tactical goal selection, i.e. deciding which objective to pursue now. GoF: Mediator.

goal = select(active_goals)   # prioritise which objective to pursue now

Deliberator

Select the optimal concrete action at each step.

Fixes: Dynamic action adaptation. This is the action-level layer below Selector (which goal) and Planner (which route).

action = choose(candidates, state)   # best concrete next move

Execution & interaction

Executor

Reliably execute dispatched actions and collect feedback.

Fixes: Execution reliability and error recovery. It is the disciplined counterpart to a raw tool call. (Tool Use is the shared mechanism; see 1.2.)

result = run(action); recover(result.errors)   # reliable dispatch + feedback

Coordinator

Manage structured inter-agent communication.

Fixes: Communication breakdowns, through message contracts, who-talks-to-whom rules, and shared-state rules. The antidote to a multi-agent system that "forgets" context or deadlocks.

msg = Contract(to="billing", payload=...)   # structured who-talks-to-whom

Adaptive & learning

Reflector

Analyse outcomes to infer causality and adjust strategy.

Fixes: Causal learning and adaptation. Unlike Reflection, which revises one output, the Reflector learns across whole trajectories so the agent stops repeating mistakes.

lesson = analyse(trajectory)   # infer causality, adjust future strategy

Controller

Continuously monitor behaviour for alignment — an Observer.

Fixes: Value alignment and transparency. An always-on runtime guardrail, not a one-time eval. GoF: Observer. Central to governance (see Part VI).

if violates(policy, action): halt()   # always-on Observer over behaviour

The rest of the catalogue

Planner — Decompose a chosen goal into ordered strategic steps. The strategy layer between Selector (which goal) and Deliberator (which action); in practice often the same Planner as Plan-and-Execute.
Skill Build — Discover and refine reusable, executable skills from experience and bank them in a growing library (Voyager [8]). The path from retrieval to genuine experience — reserve for long-lived agents.

Part VI: Governance & human oversight

The line between a demo and something you'd let touch money

Production agents need layered defence: a human checkpoint on the narrow set of irreversible actions, an always-on policy monitor, and observability that turns a 15-step failure from an unreadable stack into a diagnosable trace.

6.1 Human-in-the-Loop

A control point for human approval before irreversible or high-risk actions.

HITL [10] inserts a checkpoint before irreversible/high-risk actions — payments, account changes, deploys. Magentic-UI [10] is instructive: it defines six mechanisms — co-planning, co-tasking, multi-tasking, action guards, answer verification, and long-term memory — so human involvement is low-cost and targeted rather than a blanket gate. Scope HITL tightly to genuinely dangerous steps (or reviewers rubber-stamp), summarise clearly at the checkpoint, and give a "reject + feedback" path the agent can act on, not just approve/deny. The cost is latency and human time. Pair it with a Controller (always-on policy monitor) and full tracing for governance.

HITL — action guards gate only the dangerous tail; the routine path stays fast.

action = agent.propose()
if action.risk == "high":                  # action guard
    if not human.approve(summary(action)):  # clear summary at the checkpoint
        return agent.revise(human.feedback)  # reject + feedback, not just deny
execute(action)

Use when: A few actions are irreversible or high-stakes, such as refunds, deploys, and account changes.

Cost: Latency and human time; scope it tightly or reviewers rubber-stamp.
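This is a minimal action-guard sketch built on three hypothetical pieces:

- a `HIGH_RISK` set of action kinds,
- a `reviewer` function standing in for the human,
- a `cap_at_50` reviser standing in for the agent acting on feedback.

Routine actions execute immediately. Only the dangerous tail waits for approval, and a rejection feeds back into a revised proposal rather than a dead end.

```python
from dataclasses import dataclass, replace
from typing import Callable, Tuple


@dataclass(frozen=True)
class Action:
    kind: str
    amount: float = 0.0


HIGH_RISK = {"refund", "deploy", "change_account"}   # hypothetical: the narrow dangerous tail


def guarded_execute(action: Action,
                    approve: Callable[[str], Tuple[bool, str]],
                    revise: Callable[[Action, str], Action],
                    execute: Callable[[Action], str],
                    max_rounds: int = 3) -> str:
    """Gate only high-risk actions on human approval; rejections carry feedback."""
    for _ in range(max_rounds):
        if action.kind not in HIGH_RISK:
            return execute(action)                        # routine path stays fast
        summary = f"{action.kind} of {action.amount:.2f}"  # clear summary for the reviewer
        approved, feedback = approve(summary)
        if approved:
            return execute(action)
        action = revise(action, feedback)                 # agent acts on the feedback
    return f"abandoned after {max_rounds} rejected proposals"


def reviewer(summary: str) -> Tuple[bool, str]:
    """Hypothetical human stand-in: approves refunds up to 50."""
    amount = float(summary.rsplit(" ", 1)[1])
    return (True, "") if amount <= 50 else (False, "cap refunds at 50")


def cap_at_50(action: Action, feedback: str) -> Action:
    """Hypothetical agent revision in response to the reviewer's feedback."""
    return replace(action, amount=min(action.amount, 50.0))


def execute_action(action: Action) -> str:
    return f"executed {action.kind} {action.amount:.2f}"


print(guarded_execute(Action("refund", 120.0), reviewer, cap_at_50, execute_action))
# prints executed refund 50.00
```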

Pairs with

Observability

Trace every Thought / Action / Observation, tokens, cost, and tool latency.

In practice: Makes the Controller and HITL auditable and turns a long failure into a diagnosable trace. Pair with structured exception handling [9] and hard loop/budget caps so a misbehaving agent fails safe.

trace(step, thought, action, tokens, latency)   # every step, auditable
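This minimal tracing sketch uses a decorator and assumes an in-memory `TRACE` list in place of a real tracing backend. It records each call's name, arguments, outcome, and latency, including failures, which is what makes a long run diagnosable afterwards. Token counts are left out. `search` and `flaky_api` are hypothetical tools.

```python
import functools
import time
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []   # hypothetical in-memory sink; real systems export spans


def traced(fn: Callable) -> Callable:
    """Record name, arguments, outcome, and latency of every call, success or failure."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        status = "ok"
        try:
            return fn(*args, **kwargs)
        except Exception as exc:
            status = f"error: {type(exc).__name__}"
            raise                                    # fail loudly; the trace keeps the evidence
        finally:
            TRACE.append({"step": fn.__name__, "args": args, "status": status,
                          "latency_s": time.perf_counter() - start})
    return wrapper


@traced
def search(query: str) -> str:          # hypothetical tool
    return f"results for {query}"


@traced
def flaky_api(order_id: int) -> str:    # hypothetical tool that always times out
    raise TimeoutError("upstream timed out")
```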

Input/Output Guardrails

A lightweight guard screens inputs (jailbreak, PII, off-scope) and outputs (safety, schema, groundedness) around the agent.

In practice: Cheap, always-on, and orthogonal to the others. The Controller watches behaviour, HITL gates irreversible actions, and Guardrails filter what crosses the boundary. Reach for it whenever input is user-facing or untrusted.

x = guard_in(user_input)      # block jailbreak / PII / off-scope
y = guard_out(agent(x))       # check safety / schema / groundedness
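This is a deliberately crude sketch of both guards. It assumes a regex for email-style PII, a short hypothetical list of jailbreak phrases, and a two-key JSON output schema. Production guards usually rely on trained classifiers; the point here is only where the checks sit relative to the agent call.

```python
import json
import re
from typing import Any, Dict

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
# Hypothetical, far-from-complete phrase list; real input guards use classifiers.
JAILBREAK_MARKERS = ("ignore previous instructions", "disregard your rules")
REQUIRED_KEYS = ("answer", "sources")   # assumed output schema


class Blocked(Exception):
    """Raised when input or output should not cross the boundary."""


def guard_in(text: str) -> str:
    lowered = text.lower()
    if any(marker in lowered for marker in JAILBREAK_MARKERS):
        raise Blocked("input looks like a jailbreak attempt")
    return EMAIL.sub("[email]", text)          # redact PII before the model sees it


def guard_out(raw: str) -> Dict[str, Any]:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        raise Blocked("output is not valid JSON") from None
    if not isinstance(data, dict) or any(key not in data for key in REQUIRED_KEYS):
        raise Blocked("output does not match the expected schema")
    if not data["sources"]:                    # crude groundedness check: cite something
        raise Blocked("ungrounded answer: no sources cited")
    return data
```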

References

Sources behind the patterns; [n] markers throughout point here. Compiled from “Q&A: Agentic Patterns” (A. Krysztopa), verified June 2026.

[1] Anthropic (2024). Building Effective AI Agents. Workflows vs. agents; routing, parallelization, orchestrator-workers, evaluator-optimizer. anthropic.com/research/building-effective-agents
[2] Yao et al. (2022). ReAct: Synergizing Reasoning and Acting in Language Models. arXiv:2210.03629.
[3] Ng, A. (2024). Agentic Design Patterns: Reflection, Tool Use, Planning, Multi-Agent Collaboration. DeepLearning.AI, “The Batch”.
[4] Dao, M.-D. et al. (2026). Agentic Design Patterns: A System-Theoretic Framework. 5 subsystems, 12 ADPs. NeurIPS 2025 LAW workshop. arXiv:2601.19752.
[5] Madaan et al. (2023). Self-Refine: Iterative Refinement with Self-Feedback. arXiv:2303.17651.
[6] Wei et al. (2022). Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv:2201.11903.
[7] Schick et al. (2023). Toolformer: Language Models Can Teach Themselves to Use Tools. arXiv:2302.04761.
[8] Wang et al. (2023). Voyager: An Open-Ended Embodied Agent with Large Language Models. Automatic curriculum + skill library. arXiv:2305.16291.
[9] Zhou et al. (2025). SHIELDA: Structured Handling of Exceptions in LLM-Driven Agentic Workflows. 36 exception types, 12 artifacts. arXiv:2508.07935.
[10] Mozannar et al. (2025). Magentic-UI: Towards Human-in-the-loop Agentic Systems. Six mechanisms incl. action guards. arXiv:2507.22358.
[11] Luo et al. (2026). From Storage to Experience: Survey on the Evolution of LLM Agent Memory. arXiv:2605.06716.
[12] Kandasamy, S. (2025). Control Plane as a Tool: A Scalable Design Pattern for Agentic AI Systems. arXiv:2505.06817.
[13] Sharma, D. (Happiest Minds). Software with Agentic AI: Sidecar, Cognitive Middleware, Multi-Agent Mesh.
[14] Gamma et al. (1995). Design Patterns: Elements of Reusable Object-Oriented Software (GoF). Mediator, Proxy/Adapter, Observer mappings.
[15] Russell & Norvig (2010). Artificial Intelligence: A Modern Approach (AIMA). The perceive–reason–act–learn cognitive cycle.
[16] OpenAI (2024). Swarm: reference implementation of agent hand-offs and routines.