From Assist to Execute: The Reference Architecture Implications Microsoft's Playbook Doesn't Draw (2026)
The Assist-to-Execute shift in Microsoft's Agentic Patterns Playbook is the right conceptual move. These are the reference architecture implications the playbook stops short of drawing.
Microsoft’s 2026 Agentic Transformation Patterns Playbook opens with a conceptual shift: AI agents are moving from assisting humans to executing work. The playbook names the operating-model consequences (ownership, risk, lifecycle, governance) clearly. What it stops short of drawing is the reference architecture beneath the shift. The architecture beneath Execute is substantially different from the architecture beneath Assist, and the gap is where most enterprise designs will quietly break.
This is the architectural companion to the practitioner decode of the playbook itself. The decode walks the operating-model framework; this piece walks the reference architecture implications.
The Conceptual Shift in One Paragraph
In Assist mode, the agent supports a human decision. The human remains accountable. The agent’s failures are corrected by the human in the loop. The architecture is straightforward: a single application surface, the user, the agent, and one or two backing systems. Most failures are caught by the human reviewing the output before acting on it.
In Execute mode, the agent performs work across systems. It writes to systems of record, triggers downstream workflows, and operates on behalf of the user without the user reviewing each action. The human shifts to overseeing outcomes rather than producing them. The architecture has to assume that the human will not catch most failures in real time. Everything that was implicit in Assist (the human will notice, the human will correct, the human is accountable) has to become explicit in Execute (the system has to notice, the system has to correct, accountability has to be designed in).
Microsoft’s Agentic Transformation Patterns Playbook, where the Assist-to-Execute framing is set out, names the operating-model consequences (ownership, risk, lifecycle, governance). The architectural consequences are larger. What follows is my extension of the framework, not content from the playbook itself.
What Changes Architecturally at the Boundary
Crossing from Assist to Execute changes seven things in the reference architecture. None of them are individually difficult. Together they constitute a different system shape, and a design that worked in Assist will not survive contact with Execute without explicit rework.
| Architectural concern | Assist mode design | Execute mode design |
|---|---|---|
| Authority surface | User authority delegated to the agent for the duration of one interaction | Persistent agent identity (Entra Agent ID per agent version) with scoped OBO tokens and least-privilege permissions |
| Deterministic boundary | Implicit; human inspects each output before acting | Explicit; high-stakes calculations go to deterministic tools (Logic Apps, code-interpreted scripts, schema-validated APIs), not to the model |
| Schema contract | Loose; the agent and one or two backing systems can negotiate shape per call | Canonical; a named owner controls the work-object schema across all systems the agent touches |
| Failure detection | Human-in-the-loop; the user notices wrong output and rejects it | Telemetry + continuous evaluation; the system samples production behaviour against quality, safety, and cost baselines |
| State persistence | Session-scoped; conversation context lives in the chat surface | Case-scoped; durable state lives in a system of record (Cosmos DB, Dataverse, or equivalent) with a change feed |
| Rollback semantics | Re-run the prompt; user iterates manually | Switch the active agent-version pointer; downstream systems must tolerate the rollback without state corruption |
| Auditability | Chat transcript + manual export | Per-action audit log keyed to agent-version, user identity, and source system; retention policies aligned to compliance regime |
Each row is a substantive design change. Together they mean that the architecture you used for Pattern 1 (Employee Enablement) does not extend to Pattern 3 or Pattern 4 without explicit rework. The transition is not a scale-up of the same design; it is a different design.
The Authority Surface Shifts From User to Agent
In Assist mode, the user is logged in, the agent runs in the user’s session, and any system call inherits the user’s permissions. This is the standard delegated-permissions pattern. Most M365 Copilot integrations work this way today.
In Execute mode, the agent often acts when the user is not logged in (overnight batch, asynchronous workflows, multi-step processes that span hours or days). The agent needs its own identity. Microsoft’s Entra Agent ID is the surface I would reach for here. My read is that you provision a distinct agent identity per deployed agent version and grant it scoped permissions through standard Azure RBAC. The playbook does not specify the identity mechanics; this is the architecture I would draw under it, and it produces a stable identity per agent-version that rollback and audit can key off.
The architectural implication is that every downstream system the agent writes to has to support a service-principal-style caller, not just a user-on-behalf-of caller. Many legacy SaaS systems still require interactive user sessions or have token lifetimes that do not accommodate long-running asynchronous workflows. These systems become integration friction at the Execute boundary. The AI orchestration over legacy systems reference architecture describes the integration pattern in detail (the Operational Front Door + Logic Apps as MCP servers + APIM enforcement).
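A quick way to find that friction before it finds you is to audit each downstream system against the caller model Execute mode needs. The sketch below is a minimal Python illustration, assuming each system can be described by three properties; the `DownstreamSystem` fields and the example systems are hypothetical and are not drawn from the playbook or any Microsoft API.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class DownstreamSystem:
    """Hypothetical description of a system the agent will write to."""
    name: str
    supports_app_only_caller: bool      # accepts a service-principal-style caller with no signed-in user
    requires_interactive_session: bool  # e.g. needs a browser session or an interactive sign-in
    max_unattended_access: timedelta    # longest the agent can keep acting without a user re-authenticating


def execute_mode_friction(system: DownstreamSystem, workflow_duration: timedelta) -> list[str]:
    """Return the reasons this system would block an asynchronous Execute-mode workflow."""
    issues = []
    if not system.supports_app_only_caller:
        issues.append("no service-principal-style caller")
    if system.requires_interactive_session:
        issues.append("requires an interactive user session")
    if system.max_unattended_access < workflow_duration:
        issues.append("access expires before the workflow completes")
    return issues


# Example: an overnight batch against two hypothetical systems.
legacy_erp = DownstreamSystem("legacy-erp", False, True, timedelta(hours=1))
crm = DownstreamSystem("crm", True, False, timedelta(days=90))
overnight = timedelta(hours=12)
for system in (legacy_erp, crm):
    print(system.name, execute_mode_friction(system, overnight) or "ready")
```

Any system that comes back with a non-empty list is where I would expect the integration work to concentrate.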
A second implication is least-privilege design at the agent-version level. Each deployed agent version gets a dedicated identity. That identity should be granted only the permissions the specific agent version needs, not a permissive role that covers all possible future agents. When the agent’s tool set changes, the new agent version gets a new identity with new permissions; the old version retains its old identity until rolled out of production. This is operationally heavier than a single shared service principal and is the right cost to pay for auditability and rollback safety.
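The per-version identity discipline can be sketched as a small registry. This is a hypothetical in-memory model that shows the shape of the rule: one identity per agent version, permissions scoped to that version's tool set, and old identities retired explicitly. It does not call Entra Agent ID or Azure RBAC, and the identity-naming scheme is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentVersionIdentity:
    agent: str
    version: str
    identity_id: str          # hypothetical identifier; a real one comes from the identity provider
    permissions: frozenset    # only what this version's tool set needs


class IdentityRegistry:
    """Hypothetical registry enforcing one least-privilege identity per deployed agent version."""

    def __init__(self) -> None:
        self._by_version: dict[tuple[str, str], AgentVersionIdentity] = {}

    def provision(self, agent: str, version: str, permissions: set[str]) -> AgentVersionIdentity:
        key = (agent, version)
        if key in self._by_version:
            raise ValueError(f"{agent}@{version} already has an identity; versions never share one")
        identity = AgentVersionIdentity(agent, version, f"{agent}-{version}", frozenset(permissions))
        self._by_version[key] = identity
        return identity

    def authorize(self, agent: str, version: str, permission: str) -> bool:
        identity = self._by_version.get((agent, version))
        return identity is not None and permission in identity.permissions

    def retire(self, agent: str, version: str) -> None:
        """Called once the version has been rolled out of production."""
        self._by_version.pop((agent, version), None)


registry = IdentityRegistry()
registry.provision("claims-agent", "v1", {"claims.read"})
registry.provision("claims-agent", "v2", {"claims.read", "claims.write"})  # new tool, new identity
print(registry.authorize("claims-agent", "v1", "claims.write"))  # v1 was never granted write
```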
The Deterministic Boundary Must Be Drawn Explicitly
In Assist mode, when the agent does a calculation, the user sees the answer and can verify it. The calculation can live anywhere: in the model, in a tool call, in a code interpreter, in a chained reasoning step. If it is wrong the user catches it.
In Execute mode, the agent performs the calculation and acts on the result. Wrong calculations produce wrong actions. The architectural rule that emerges is: any calculation with material consequence must run on deterministic substrate, not on the model. This means:
- Financial calculations (totals, allocations, tax, fees, interest) go to code, not to the model. The model can orchestrate; the math runs in Python or in a schema-validated API.
- Eligibility and routing decisions with regulatory or contractual implication go to declarative rules engines (Logic Apps with explicit conditions, Pega rules, or equivalent), not to the model’s inference.
- Schema-validated transformations (data mapping, format conversion, structural validation) go to explicit transformation steps, not to the model’s “format this as JSON” reasoning.
The model orchestrates the workflow. The model does not perform the high-stakes operation. Drawing this line explicitly during design is the practitioner discipline the playbook implies but does not name. The companion piece on AI orchestration over legacy systems covers the deterministic-tooling integration patterns in depth.
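Here is what that split looks like for a single financial calculation, as a minimal Python sketch. The assumption is that the model emits a tool call with JSON arguments; the tool validates the shape and does the arithmetic in `decimal.Decimal`, so the number the agent acts on never comes from inference. The argument shape and function names are hypothetical.

```python
import json
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def validate_args(raw: str) -> dict:
    """Reject the model's tool-call arguments unless they match the expected shape."""
    args = json.loads(raw)
    if not (isinstance(args, dict)
            and isinstance(args.get("line_items"), list)
            and isinstance(args.get("tax_rate"), str)):
        raise ValueError("expected 'line_items' (list) and 'tax_rate' (decimal string)")
    for item in args["line_items"]:
        # Prices must be strings so no float rounding sneaks in; quantities must be plain ints.
        if not (isinstance(item, dict)
                and isinstance(item.get("unit_price"), str)
                and type(item.get("quantity")) is int):
            raise ValueError(f"malformed line item: {item!r}")
    return args


def invoice_total(args: dict) -> Decimal:
    """Deterministic total: sum of quantity * unit price, plus tax, rounded half-up to cents."""
    subtotal = sum(
        (Decimal(item["unit_price"]) * item["quantity"] for item in args["line_items"]),
        Decimal("0"),
    )
    tax = (subtotal * Decimal(args["tax_rate"])).quantize(CENT, rounding=ROUND_HALF_UP)
    return (subtotal + tax).quantize(CENT, rounding=ROUND_HALF_UP)


# The model orchestrates: it chooses the tool and fills in the arguments.
model_tool_call = (
    '{"line_items": [{"unit_price": "19.99", "quantity": 3},'
    ' {"unit_price": "5.00", "quantity": 2}], "tax_rate": "0.08"}'
)
# The code calculates: the number the agent acts on comes from here.
print(invoice_total(validate_args(model_tool_call)))  # 75.57
```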
The Schema Contract Becomes Load-Bearing
In Assist mode, the agent often touches one or two backing systems per interaction. The shape of the data can be negotiated per call. If the system returns slightly different shapes for different cases, the model’s flexibility absorbs the variance.
In Execute mode, the agent orchestrates across many backing systems per workflow (in the workflows I have seen, roughly four to ten). Each system has its own canonical shape for the same business concept (a “customer” in CRM, a “client” in billing, an “account” in support, an “entity” in compliance). Without a canonical shape that the agent operates on, the agent ends up doing reconciliation logic in the model, which is exactly where it should not be.
The architectural pattern that works is a canonical work-object schema (a “case,” an “incident,” a “claim,” an “order”) owned by a named person with cross-team authority. The systems map to and from the canonical shape at the integration boundary. The agent operates on the canonical shape only. When systems change their internal shapes (which happens), the mapping layer changes; the agent does not.
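A minimal sketch of the mapping layer, in Python. The canonical shape and the per-system field names (`customerId`, `client_no`, and so on) are hypothetical stand-ins. The point is structural: each system gets a mapper at the integration boundary, and the agent only ever sees `CanonicalCustomer`.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CanonicalCustomer:
    """The one shape the agent operates on, controlled by the schema owner."""
    customer_id: str
    legal_name: str
    country: str  # ISO 3166-1 alpha-2, upper case


def from_crm(rec: dict) -> CanonicalCustomer:
    # CRM calls the concept a "customer".
    return CanonicalCustomer(rec["customerId"], rec["fullName"], rec["countryCode"].upper())


def from_billing(rec: dict) -> CanonicalCustomer:
    # Billing calls it a "client" and nests the address.
    return CanonicalCustomer(rec["client_no"], rec["client_name"], rec["address"]["country"].upper())


# When a system changes its internal shape, only its entry here changes; the agent does not.
MAPPERS: dict[str, Callable[[dict], CanonicalCustomer]] = {"crm": from_crm, "billing": from_billing}


def to_canonical(source: str, rec: dict) -> CanonicalCustomer:
    if source not in MAPPERS:
        raise ValueError(f"no mapping registered for {source!r}; the schema owner must approve one")
    return MAPPERS[source](rec)


crm_rec = {"customerId": "C-1001", "fullName": "Contoso Ltd", "countryCode": "gb"}
billing_rec = {"client_no": "C-1001", "client_name": "Contoso Ltd", "address": {"country": "GB"}}
print(to_canonical("crm", crm_rec) == to_canonical("billing", billing_rec))
```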
The political precondition for this pattern is that the canonical schema has an owner with authority to require upstream systems to conform to the mapping. In our experience this is the single most underweighted prerequisite for Execute-mode work. Without schema ownership with authority, the agent ends up doing the schema reconciliation in inference, which is brittle, costly, and unauditable.
Failure Detection Moves From Human to System
In Assist mode, the user is the failure detector. Wrong outputs get rejected, and the user iterates until the output is right. The detection signal is implicit in user behaviour: low task-completion rate, high re-prompt rate, low user satisfaction.
In Execute mode, the user is not in the loop for most actions. Failure detection has to move to the system. Two layers are required:
Continuous evaluation on production traffic. A sampled subset of agent actions is run through the evaluation suite in production, with results trended against the CI/CD baseline. When quality, safety, or cost metrics drift from baseline, alerts fire. The AgentOps reference architecture decode covers the continuous-evaluation surface in detail.
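The sampling-and-drift loop fits in a few lines. This is an illustration, not the Foundry evaluation API: the metric names, baseline values, and tolerances below are made-up placeholders, and the sampling is a seeded random draw so the example is reproducible.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Baseline:
    metric: str
    value: float             # value recorded by the CI/CD evaluation run
    tolerance: float         # absolute regression allowed before an alert fires
    higher_is_better: bool


def sample_actions(action_ids, rate: float, seed: int = 0) -> list:
    """Pick roughly `rate` of production actions for the evaluation suite."""
    rng = random.Random(seed)
    return [a for a in action_ids if rng.random() < rate]


def drift_alerts(observed: dict[str, float], baselines: list[Baseline]) -> list[str]:
    """Compare production metrics on the sampled actions against the CI/CD baseline."""
    alerts = []
    for b in baselines:
        if b.metric not in observed:
            alerts.append(f"{b.metric}: no production measurement")
            continue
        change = observed[b.metric] - b.value
        regression = -change if b.higher_is_better else change
        if regression > b.tolerance:
            alerts.append(f"{b.metric}: {observed[b.metric]:.3f} vs baseline {b.value:.3f}")
    return alerts


baselines = [
    Baseline("groundedness", 0.92, 0.03, higher_is_better=True),
    Baseline("safety_violation_rate", 0.001, 0.002, higher_is_better=False),
    Baseline("cost_per_action_usd", 0.040, 0.010, higher_is_better=False),
]
sampled = sample_actions(range(10_000), rate=0.05)
# In practice the eval suite computes these over `sampled`; they are hard-coded here.
observed = {"groundedness": 0.87, "safety_violation_rate": 0.002, "cost_per_action_usd": 0.045}
print(len(sampled), drift_alerts(observed, baselines))
```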
Outcome monitoring keyed to business KPIs. The agent’s actions produce outcomes in downstream systems. Those outcomes are observable: claims processed correctly, support tickets resolved within SLA, financial transactions reconciled. Continuous monitoring of these outcomes catches failures that evaluation alone misses, especially edge cases the eval dataset does not cover.
The architectural change is that every Execute-mode agent must wire telemetry into two surfaces: the AI evaluation surface (token usage, hallucination rate, grounded-response rate) AND the business outcome surface (the downstream system’s metrics). OpenTelemetry traces forwarded to Azure Monitor handle the AI surface; the existing business observability handles the outcome surface; the correlation across the two is a deliberate engineering effort.
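Most of the correlation work comes down to carrying one identifier across both surfaces. A minimal sketch, assuming the agent's trace ID is propagated into the downstream write so outcome records can be joined back to it; the record types are hypothetical, and this does not use the OpenTelemetry SDK. The useful output is the list of actions that looked fine on the AI surface but missed the business KPI, the class of failure that evaluation alone misses.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AiSpan:
    trace_id: str
    agent_version: str
    grounded: bool        # AI-surface signal, e.g. from a groundedness evaluator


@dataclass(frozen=True)
class Outcome:
    trace_id: str         # propagated into the downstream write by the agent's tool call
    kpi_met: bool         # e.g. claim processed correctly, ticket resolved within SLA


@dataclass
class CorrelationReport:
    joined: int = 0
    unmatched_spans: int = 0                  # actions with no outcome yet, or lost correlation
    silent_failures: list[str] = field(default_factory=list)


def correlate(spans: list[AiSpan], outcomes: list[Outcome]) -> CorrelationReport:
    """Join the two surfaces on trace ID and flag actions that looked fine but missed the KPI."""
    by_trace = {o.trace_id: o for o in outcomes}
    report = CorrelationReport()
    for span in spans:
        outcome = by_trace.get(span.trace_id)
        if outcome is None:
            report.unmatched_spans += 1
            continue
        report.joined += 1
        if span.grounded and not outcome.kpi_met:
            report.silent_failures.append(span.trace_id)
    return report


spans = [AiSpan("t1", "v3", True), AiSpan("t2", "v3", True), AiSpan("t3", "v3", False)]
outcomes = [Outcome("t1", True), Outcome("t2", False)]
print(correlate(spans, outcomes))
```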
State Persistence Becomes Case-Scoped
In Assist mode, conversation context lives in the chat surface. When the session ends, the context is gone (or summarised into a transcript). The agent does not maintain durable state across sessions.
In Execute mode, the agent often operates on a case that persists across days, weeks, or longer (a claim, an incident, a procurement workflow, a customer onboarding). The case has durable state that the agent reads and updates. The state lives in a system of record (Cosmos DB, Dataverse, or equivalent) and is queryable by other agents, by reporting systems, and by the human overseer.
A change feed on the durable state (Cosmos DB change feed, Dataverse change tracking) is the architectural pattern that lets multiple agents collaborate on the same case without coordination bottlenecks. Agent A updates the case; the change feed notifies Agent B that something it cares about changed; Agent B reacts. Because neither agent calls the other directly, adding a third agent means adding another feed consumer rather than rewiring the first two, and that decoupling is what lets multi-agent workflows scale.
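The collaboration pattern can be sketched with an in-memory stand-in. This is not the Cosmos DB SDK: it models only the two ideas that matter here, an ordered append-only feed of changes and a per-consumer checkpoint. The class names and the simplified `lsn` sequence number are hypothetical; the real change feed manages checkpoints through its own continuation-token and lease mechanisms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    lsn: int        # position in the feed; strictly increasing
    case_id: str
    attr: str
    value: object
    writer: str


class CaseStore:
    """In-memory stand-in for a durable case store with an append-only change feed."""

    def __init__(self) -> None:
        self.cases: dict[str, dict] = {}
        self.feed: list[Change] = []

    def update(self, case_id: str, attr: str, value: object, writer: str) -> None:
        self.cases.setdefault(case_id, {})[attr] = value
        self.feed.append(Change(len(self.feed) + 1, case_id, attr, value, writer))

    def read_feed(self, after_lsn: int) -> list[Change]:
        return [c for c in self.feed if c.lsn > after_lsn]


class FeedConsumer:
    """An agent that reacts to changes it cares about, resuming from its own checkpoint."""

    def __init__(self, name: str, watched_attrs: set[str]) -> None:
        self.name = name
        self.watched = watched_attrs
        self.checkpoint = 0      # a real consumer persists this durably
        self.reactions: list[tuple[str, str, object]] = []

    def poll(self, store: CaseStore) -> None:
        for change in store.read_feed(self.checkpoint):
            if change.attr in self.watched and change.writer != self.name:
                self.reactions.append((change.case_id, change.attr, change.value))
            self.checkpoint = change.lsn


store = CaseStore()
agent_b = FeedConsumer("agent-b", {"status"})
store.update("CLM-42", "status", "documents_verified", writer="agent-a")
store.update("CLM-42", "notes", "called claimant", writer="agent-a")
agent_b.poll(store)   # reacts to the status change, skips the notes change
agent_b.poll(store)   # nothing new since the checkpoint
print(agent_b.reactions, agent_b.checkpoint)
```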
The implication is that Execute-mode architectures need an explicit case-state-store choice early in the design. Retrofitting case state after the agent is in production is expensive. Make the choice during the Assist-to-Execute transition.
Rollback Semantics Must Be Designed In
In Assist mode, rollback is trivial. The user re-prompts. The previous output is discarded.
In Execute mode, rollback is a system property. The agent has taken actions in downstream systems. Reverting to a previous agent version does not automatically undo those actions. The architectural pattern that works is to treat agent actions as event-sourced where possible: the agent emits intent (write this record, trigger this workflow, send this notification); a separate workflow layer executes the intent and records the result. When a rollback is needed, the events are replayable or compensable independently of the agent.
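A minimal sketch of the intent-then-execute split, assuming downstream state can be rebuilt from the event log. The agent only constructs `Intent` objects; a separate executor applies them and records the result. Rolling back an agent version then means replaying the log while skipping that version's events. All names here are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intent:
    """What the agent wants done; the agent never writes downstream itself."""
    intent_id: str
    agent_version: str
    kind: str
    target: str
    payload: dict


@dataclass(frozen=True)
class Event:
    intent: Intent
    result: str  # "applied" or "rejected"


class IntentExecutor:
    """Separate workflow layer: validates and applies intents, recording every result."""

    SUPPORTED = {"set_status"}  # hypothetical; a real executor maps kinds to integrations

    def __init__(self) -> None:
        self.log: list[Event] = []

    def submit(self, intent: Intent) -> Event:
        ok = intent.kind in self.SUPPORTED and "status" in intent.payload
        event = Event(intent, "applied" if ok else "rejected")
        self.log.append(event)
        return event


def project(log: list[Event], skip_versions: frozenset = frozenset()) -> dict[str, str]:
    """Rebuild case state from applied events, skipping any rolled-back agent versions."""
    state: dict[str, str] = {}
    for event in log:
        if event.result == "applied" and event.intent.agent_version not in skip_versions:
            state[event.intent.target] = event.intent.payload["status"]
    return state


executor = IntentExecutor()
executor.submit(Intent("i-1", "v1", "set_status", "CLM-42", {"status": "approved"}))
executor.submit(Intent("i-2", "v2", "set_status", "CLM-42", {"status": "denied"}))
executor.submit(Intent("i-3", "v2", "delete_case", "CLM-42", {}))  # rejected: unsupported kind
print(project(executor.log))                     # v2's write is current
print(project(executor.log, frozenset({"v2"})))  # state as if v2 never ran
```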
For systems where event sourcing is not feasible (legacy systems with imperative write APIs), the rollback pattern is compensating transactions: each agent action is paired with an explicit compensating action, and the agent’s downstream workflow records what compensations would need to fire on rollback. This is operationally heavy and is one reason why integrations with legacy systems are the slowest part of Execute-mode adoption.
The architectural rule: every Execute-mode action that writes to a downstream system needs a defined rollback path. Either the action is event-sourced (replay or skip), or the action is paired with a compensating action (recorded for rollback), or the action is explicitly tagged as non-reversible (and gated with additional approval). Without this discipline, the agent-version rollback that the AgentOps reference architecture describes works for the agent runtime but does not roll back the downstream side effects.
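The three-way rule can be enforced at the point where actions are performed. This sketch is hypothetical (the `ActionLedger` and `Reversibility` names are mine, and the legacy system is a dictionary). Compensable actions must register their compensation up front, non-reversible actions need explicit approval, and rolling back a version fires that version's compensations newest-first. Event-sourced actions carry no compensation because replay handles them.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Reversibility(Enum):
    EVENT_SOURCED = "event_sourced"    # rolled back by replaying or skipping events
    COMPENSABLE = "compensable"        # rolled back by a recorded compensating action
    NON_REVERSIBLE = "non_reversible"  # cannot be rolled back, so it is gated on approval


@dataclass
class ActionRecord:
    agent_version: str
    description: str
    compensation: Optional[Callable[[], None]]


class ActionLedger:
    """Hypothetical guard that refuses any write without a defined rollback path."""

    def __init__(self) -> None:
        self.records: list[ActionRecord] = []

    def perform(self, agent_version: str, description: str, do: Callable[[], None],
                reversibility: Reversibility,
                compensation: Optional[Callable[[], None]] = None,
                approved: bool = False) -> None:
        if reversibility is Reversibility.COMPENSABLE and compensation is None:
            raise ValueError(f"{description}: compensable action registered without a compensation")
        if reversibility is Reversibility.NON_REVERSIBLE and not approved:
            raise PermissionError(f"{description}: non-reversible action needs explicit approval")
        do()
        self.records.append(ActionRecord(agent_version, description, compensation))

    def roll_back(self, agent_version: str) -> list[str]:
        """Fire this version's compensations newest-first and forget them once fired."""
        to_undo = [r for r in self.records
                   if r.agent_version == agent_version and r.compensation is not None]
        for record in reversed(to_undo):
            record.compensation()
        undone = {id(r) for r in to_undo}
        self.records = [r for r in self.records if id(r) not in undone]
        return [r.description for r in reversed(to_undo)]


legacy_crm = {"credit_limit": 1000}  # stands in for a system with an imperative write API
ledger = ActionLedger()
ledger.perform(
    "v2", "raise credit limit",
    do=lambda: legacy_crm.update(credit_limit=5000),
    reversibility=Reversibility.COMPENSABLE,
    compensation=lambda: legacy_crm.update(credit_limit=1000),
)
undone = ledger.roll_back("v2")
print(undone, legacy_crm)
```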
Auditability Becomes Per-Action
In Assist mode, the audit surface is the chat transcript. The user’s interactions with the agent are logged at the conversation level. This satisfies most compliance regimes for low-stakes work.
In Execute mode, the audit surface is per-action. For each action the agent takes, the audit log must record: which agent version executed it, on whose authority (user OBO or agent identity), against which source-of-record systems, with what input, producing what output, with what confidence signals from the evaluation surface. This audit log is keyed to enable both retrospective compliance review (a regulator asks: did this agent action comply with policy on this date?) and incident drilldown (an outcome looks wrong: trace the agent action that produced it).
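One way to make the per-action record concrete is a typed structure serialised as one JSON line per action, plus the two query shapes the paragraph describes: a policy question about a given date, and a trace back from a suspicious outcome. The field names are my assumption of a reasonable minimum, not a Microsoft schema, and the in-memory list stands in for whatever store the records are forwarded to.

```python
import json
from dataclasses import asdict, dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class AuditRecord:
    action_id: str
    timestamp: str                  # ISO 8601 with UTC offset, e.g. "2026-03-02T14:05:00+00:00"
    agent_version: str
    authority: str                  # "obo:<user>" or "agent:<identity>"
    source_systems: tuple[str, ...]
    input_ref: str                  # pointer to the stored input
    output_ref: str                 # what the action produced downstream
    eval_signals: tuple[tuple[str, float], ...]

    def to_json_line(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


def actions_on(records: list[AuditRecord], agent_version: str, day: date) -> list[AuditRecord]:
    """Policy review: everything one agent version did on one (UTC) date."""
    return [r for r in records
            if r.agent_version == agent_version
            and datetime.fromisoformat(r.timestamp).date() == day]


def produced(records: list[AuditRecord], output_ref: str) -> list[AuditRecord]:
    """Outcome drilldown: which action(s) produced a given downstream record."""
    return [r for r in records if r.output_ref == output_ref]


records = [
    AuditRecord("a-1", "2026-03-02T14:05:00+00:00", "claims-agent@v3", "agent:claims-agent-v3",
                ("crm", "billing"), "inputs/a-1.json", "claim-9001", (("groundedness", 0.94),)),
    AuditRecord("a-2", "2026-03-03T09:30:00+00:00", "claims-agent@v3", "obo:alice@contoso.com",
                ("billing",), "inputs/a-2.json", "claim-9002", (("groundedness", 0.71),)),
]
print(records[0].to_json_line())
```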
The architectural pattern is structured action logs forwarded to a queryable store with retention policies aligned to your compliance regime (often several years for financial and healthcare records; confirm the exact period with your own counsel, because it varies by regime and by record type). Application Insights with extended retention handles most non-regulated cases; for regulated workloads the audit log often needs its own immutable store (Azure Data Lake Storage with WORM retention, or equivalent).
The Architecture Boundary Diagram
The seven changes above add up to a different system shape at the Execute boundary. The reference architecture for an Execute-mode agent maps those seven concerns onto eight named layers (the seven concerns plus the agent runtime itself), most of which an Assist-mode agent leaves implicit.
This is not theoretical. The AI orchestration over legacy systems reference architecture is one specific instantiation of the Execute-mode pattern, and the AgentOps CI/CD reference architecture is the deployment pipeline that ships Execute-mode agents. Together with this architectural read, those two documents form the practitioner reference for the Assist-to-Execute boundary.
The Honest Engineering Cost
The Assist-to-Execute transition is the single largest engineering investment in an enterprise AI program. In our experience the cost breaks down approximately as:
- Authority surface refactor (Entra Agent ID adoption, OBO patterns, downstream system permission audits): 4-8 engineer-weeks per agent integrated
- Deterministic-tooling extraction (moving high-stakes calculations out of the model and into code or rules): 4-12 engineer-weeks per major workflow
- Canonical schema definition and ownership negotiation (the political work, not the technical work): 6-12 weeks of cross-functional time, often longer
- Case-state-store stand-up and change-feed wiring: 4-8 engineer-weeks
- Continuous evaluation telemetry integration and continuous-monitoring dashboards: 4-8 engineer-weeks
- Per-action audit log design and compliance review: 4-12 engineer-weeks depending on regulatory exposure
Total: 26-60 engineer-weeks for the architectural foundation before the first Execute-mode agent ships. The playbook treats this work as a precondition for Pattern 3 and Pattern 4 deployment, and rightly so. Treat the transition as a quarter-plus of dedicated engineering investment, not as a side effect of building the next agent.
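The 26-60 total assumes one agent and one major workflow. Because two of the line items are quoted per agent and per workflow, the foundation cost grows with scope. A small calculator makes that arithmetic explicit. The ranges are the ones in the breakdown above; treating the other four items as one-time costs is my assumption, as is summing the schema item's cross-functional weeks with engineer-weeks (which the quoted total also does).

```python
# (low, high) weeks per item, taken from the cost breakdown.
PER_AGENT = {"authority surface refactor": (4, 8)}
PER_WORKFLOW = {"deterministic-tooling extraction": (4, 12)}
ONE_TIME = {  # assumption: built once and shared by every agent and workflow
    "canonical schema definition": (6, 12),
    "case-state store and change feed": (4, 8),
    "continuous evaluation telemetry": (4, 8),
    "per-action audit log": (4, 12),
}


def foundation_estimate(agents: int, workflows: int) -> tuple[int, int]:
    """Return (low, high) weeks for the architectural foundation."""
    low = high = 0
    for items, multiplier in ((PER_AGENT, agents), (PER_WORKFLOW, workflows), (ONE_TIME, 1)):
        for lo, hi in items.values():
            low += lo * multiplier
            high += hi * multiplier
    return low, high


print(foundation_estimate(agents=1, workflows=1))  # (26, 60), the quoted total
print(foundation_estimate(agents=3, workflows=2))  # (38, 88)
```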
When to Make the Transition
The transition is right when:
- You have a specific business workflow where human-in-the-loop review is the bottleneck (most claims processing, most first-line IT helpdesk work, most expense routing)
- The workflow has a measurable outcome (resolution time, accuracy rate, throughput) that justifies the foundation investment
- The systems the agent will write to have stable APIs and named owners willing to engage on schema discussions
- Your organisation has the political maturity to name a schema owner with cross-team authority
- Your security and compliance functions are integrated with software delivery enough to accept service-principal-style callers and per-action audit logs
The transition is premature when:
- The agent’s value proposition is “save users a few minutes”, and in our experience that value does not justify 6+ months of architectural foundation work
- The downstream systems require interactive user sessions and refactoring them is out of scope
- Schema ownership cannot be politically assigned (the named owner role is contested or undefined)
- The compliance regime requires controls (full per-action attestation, real-time human review, segregation of duties) that the Execute pattern cannot satisfy
The architectural test is straightforward: can you draw the seven Execute-mode concerns on a whiteboard, map them onto the eight layers, and name the owner for every component? If yes, the transition is feasible. If no, name the gaps and decide whether to address them or to stay in Assist mode for this workflow.
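The whiteboard test translates directly into a gap check. In this sketch the eight layers are the seven concerns plus the agent runtime, as described in the diagram section; the component names and owners in the example are hypothetical placeholders for what you would actually draw.

```python
from typing import Optional

CONCERNS = (
    "authority surface", "deterministic boundary", "schema contract", "failure detection",
    "state persistence", "rollback semantics", "auditability",
)
LAYERS = CONCERNS + ("agent runtime",)  # seven concerns plus the runtime: eight layers

Design = dict[str, list[tuple[str, Optional[str]]]]  # layer -> [(component, owner or None)]


def whiteboard_gaps(design: Design) -> list[str]:
    """List every layer with nothing drawn and every component without a named owner."""
    gaps = []
    for layer in LAYERS:
        components = design.get(layer, [])
        if not components:
            gaps.append(f"{layer}: nothing drawn")
        for component, owner in components:
            if not owner:
                gaps.append(f"{layer} / {component}: no named owner")
    return gaps


design: Design = {layer: [(f"{layer} component", "platform team")] for layer in LAYERS}
design["schema contract"] = [("canonical case schema", None)]  # owner contested
del design["rollback semantics"]                                # not drawn yet
for gap in whiteboard_gaps(design):
    print(gap)
```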
Read Next
- The Six Agentic Adoption Patterns: A Practitioner Decode of Microsoft’s New Playbook (2026). The operating-model framing that names the Assist-to-Execute shift. Read in parallel with this architectural piece.
- AI Orchestration for Legacy Systems: The Operational Front Door Pattern (2026). The reference architecture for Execute-mode agents over enterprise legacy stacks. This piece names the shift; that piece walks the full system design.
- AgentOps on Microsoft Foundry: A Practitioner Decode of the New CI/CD Reference Architecture (2026). The deployment pipeline that ships Execute-mode agents. The per-action audit log and the agent-version rollback patterns live in that piece.
- Don’t Build an AI Center of Excellence Until You Read This (2026). The operating-model layer (Executive Sponsor authority, CoE structure, schema-owner role) that has to wrap this architecture for it to survive contact with the enterprise.
- Source: Microsoft Agentic Transformation Patterns Playbook (PDF, 52 pages). The framework this article extends architecturally.