
AI Orchestration for Legacy Systems: The Operational Front Door Pattern (2026)

AI orchestration over 5-7 legacy systems with the systems of record unchanged. The Operational Front Door pattern, two-layer RAG, vendor-portable. CIO reference architecture.

Alex Pechenizkiy 25 min read

The pitch the CIO keeps hearing goes something like this. Replace the five legacy systems with one vendor’s modern stack. Layer Copilot on top. Buy the seats. Watch the productivity numbers.

The pitch is not wrong, exactly. It is just expensive and slow, and it assumes the CIO has 24 to 36 months and a board willing to fund a migration before any AI value lands. Many of them do not. They have legacy systems that work, contracts that do not expire, teams that have built their daily craft around those tools, and a board asking why AI is still a slide rather than a system.

There is a third option, and it is the one stuck CIOs tend to underrate. An AI orchestration layer that sits on top of the existing legacy stack, owns intake and routing, and leaves the systems of record where they are. The legacy systems do not change. The work that moves between them does.

[Illustration: the Operational Front Door pattern. A customer is helped by an AI agent at a front desk while an architect slides open a partition to reveal four back-office stations (a CRT terminal with file cabinet, a card-index cabinet, a wall of archive books), each staffed by an AI agent. Ribbons of light route work from the counter to each station. Nothing was replaced; everything is preserved.]

What Is the Operational Front Door?

I call this pattern The Operational Front Door. It frames AI as the orchestration layer over legacy systems. One AI surface unifies intake from every channel (web form, Teams, email, customer portal, partner portal). One orchestrator owns the canonical work-item schema and the state machine that moves work between teams. One connector layer reaches the legacy systems through standardized contracts. The AI provider plane sits behind the orchestrator and is interchangeable.

A note before the architecture section. The pattern (AI orchestration over legacy systems with the orchestrator owning the schema contract) is vendor-portable. The reference architecture below is its Azure-stack realization: Foundry Agent Service, Copilot Studio, Azure AI Search, Cosmos DB, Logic Apps as MCP servers, Entra ID, Agent 365. AWS and GCP renderings exist and would substitute Bedrock Agents + Step Functions + OpenSearch + EventBridge, or Vertex AI Agent Builder + Workflows + Vector Search + Pub/Sub. The architecture is the discipline; the specific Microsoft components named throughout this article are one valid implementation. If your primary AI cloud is not Azure, the layer responsibilities still apply; the component map changes.

The pattern preserves three things the CIO does not want to lose: the legacy investment, the institutional process knowledge, and the option to swap AI providers as the market shifts. It adds two things the CIO has been told they need: a unified AI front door for users, and an orchestration plane that scales with the work volume rather than the team count.

It is not the only pattern. The vendor-locked alternatives (ServiceNow Otto, Salesforce Agentforce Operations, Pega Agent Experience) ship the same idea bundled with a system of record. Where your data already lives in one of those stacks, the bundle is the right choice. Where your data lives across 5-7 systems with no single dominant vendor, the bundle becomes a rip-and-replace migration first. That is the gap this pattern fills.

A Scale Precedent, Before AI Orchestration Was Viable

Between 2018 and 2020 I worked on the case-management backbone of a global multi-region IT support operation. Multilingual customer base, multiple intake channels (enterprise customer-submitted, third-party-vendor-submitted, internal first-party escalations), regional staffing across multiple time zones with local-business-hours coverage. The operation worked.

AI orchestration was not part of it. The architecture I am about to describe is what the same problem deserves now, with current tools. The 2018-2020 engagement supplies scale credibility, not AI receipts.

All of this ran on humans, templates, and process discipline. No AI. What the operation did by hand is exactly what AI orchestration is for:

  • Classified cases by severity, product, and customer tier
  • Deduplicated against an open backlog deep enough that no one could hold it in working memory
  • Routed by team ownership, recent skill, and which time zone could pick up the next shift
  • Drafted customer communications from templates, then edited
  • Summarized case state across legacy systems for shift-to-shift handoff
  • Coordinated cross-org work through email, partner portals, internal CRM, and a code/work tracker, with the case manager translating between them

Every one of those steps is now an obvious AI orchestration candidate. The reference architecture in the rest of this article is the answer to “what would the same operation look like if you built it new in 2026 with Foundry Agent Service, Azure AI Search agentic retrieval, Cosmos DB change feed, Logic Apps as MCP servers, and a swappable model plane?”

The 2026 Reference Architecture for AI Over Legacy Systems

The architecture has six horizontal layers and two cross-cutting concerns. Each layer has a single responsibility. The orchestrator owns the canonical work-item schema. Everything else either feeds into the orchestrator, is queried by it, or executes on its behalf.

Architecture diagram of The Operational Front Door. Top-to-bottom layered stack: Channels (web form, Teams app, email intake, customer portal, partner/ISV portal), AI Front Door (Copilot Studio), Orchestrator (Foundry Agent Service, owning the canonical work-item schema), Two-Layer RAG side-by-side (Layer 1 Azure AI Search agentic retrieval, slow/curated/weekly index refresh + Layer 2 Cosmos DB change-feed materialized view, fast/derived/sub-minute freshness), Connector Layer (Logic Apps as MCP, custom MCP servers, API Center), Legacy Systems of Record (Internal CRM, ITSM, Work tracker, Customer portal, Partner portal, Knowledge base, Telemetry, all marked unchanged). Identity + Governance band on the left (Entra ID OAuth OBO at every hop, APIM rate-limit/auth/observability, Agent 365 registry and control plane). AI Provider Plane on the right with dashed orange border indicating swappable (Foundry Model Router, Claude, OpenAI, open-weights, with the annotation: swap equals deploy config). Vertical flow arrows between every main-column layer; dashed orange arrow from the AI Provider Plane to the Orchestrator; dashed gray annotations from the identity band to each layer.

Layer 1: Channels

Work enters through whichever surface the user prefers. A web submission form for partners. A Microsoft Teams app for internal teams. Email intake for legacy customers. Customer-facing and partner/ISV portals for organizations that have already invested in those surfaces. The Operational Front Door does not require migrating channels; it requires connecting them.

Layer 2: AI Front Door (Copilot Studio)

The front door is the slot in the architecture that hands a structured work item to the orchestrator. Multiple tools can fill it: the Teams native bot framework, a custom web UI, M365 Copilot extensibility, or Copilot Studio. Each is the right answer in some shops. In M365-heavy shops Copilot Studio is a strong default because it ships M365 SSO, Teams-app integration, citizen-customizable topics, and connector access without custom UI work, and the citizen-developer customization layer matters more than front-end flexibility for most non-power-users. Where M365 is not the user surface, the same slot belongs to whichever tool gives you citizen-customization plus the identity story.

The front door does four things: intake (capture the user’s input across modalities), classification (severity, product, tier, language), deduplication (is this the same as an existing open case), and first-touch dialog (ask the clarifying questions that turn an ambiguous request into a routable work item).

It does not make routing decisions. That is the orchestrator’s job. The front door produces a structured work item conformant to the canonical schema and hands it off.
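As a minimal sketch of what "a structured work item conformant to the canonical schema" could look like, the block below defines a hypothetical `WorkItem` and a `validate` check. The field names, the `Severity` values, and the channel identifiers are assumptions made for illustration; the article does not specify the actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Severity(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"
    CRITICAL = "critical"


# Hypothetical channel identifiers mirroring the Layer 1 channel list.
ALLOWED_CHANNELS = {"web_form", "teams", "email", "customer_portal", "partner_portal"}


@dataclass(frozen=True)
class WorkItem:
    """Illustrative canonical work item handed from the front door to the orchestrator."""
    item_id: str
    channel: str
    severity: Severity
    product: str
    customer_tier: str
    language: str
    summary: str
    duplicate_of: Optional[str] = None  # set when deduplication matched an open case


def validate(item: WorkItem) -> list[str]:
    """Return schema violations; an empty list means the item can be handed off."""
    problems = []
    if item.channel not in ALLOWED_CHANNELS:
        problems.append(f"unknown channel: {item.channel}")
    if not item.product:
        problems.append("product is missing")
    if not item.summary.strip():
        problems.append("summary is empty")
    return problems
```

Note that the sketch carries classification and deduplication results but no routing fields: under the split described above, routing is decided later by the orchestrator.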

Layer 3: Orchestrator (Foundry Agent Service)

Microsoft Foundry Agent Service is the orchestration backbone. (For the Foundry vs Azure OpenAI decision specifically, see Azure AI Foundry vs Azure OpenAI: The 2026 Decision.) It hosts the agents that own the canonical work-item schema, the state machine that tracks work-item progression, and the routing logic that decides which team, geography, and engineer the work goes to next. Each agent has a dedicated Microsoft Entra identity for scoped resource access, and OAuth On-Behalf-Of passthrough is supported when downstream tools need the calling user’s identity rather than the agent’s.

The orchestrator does not own the data. The legacy systems own the data. The orchestrator owns the schema contract against which the legacy systems are addressed. This separation is what makes the rest of the architecture vendor-portable. The canonical-schema discipline is the same idea explored from the data side in Dataverse MCP, Business Skills, and Coding Agents for stacks where Dataverse is the system of record.

Multi-agent composition in Foundry Agent Service lets a main agent delegate to specialized sub-agents (triage, routing, summarization, customer-comms drafting). Foundry Agent Service itself is GA; the multi-agent workflows API that replaces the deprecated classic Connected Agents primitive is preview as of 2026, so net-new builds should compose at the GA agent + A2A-tool layer where the design permits and treat the workflows orchestration surface as a preview dependency.
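The state machine that tracks work-item progression can be sketched independently of any agent framework. The states and allowed transitions below are assumptions chosen for illustration, not the article's actual state model and not a Foundry Agent Service API.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    TRIAGED = "triaged"
    ASSIGNED = "assigned"
    IN_PROGRESS = "in_progress"
    RESOLVED = "resolved"
    CLOSED = "closed"


# Hypothetical transition table: each state maps to the states it may move to.
ALLOWED = {
    State.NEW: {State.TRIAGED},
    State.TRIAGED: {State.ASSIGNED},
    State.ASSIGNED: {State.IN_PROGRESS, State.TRIAGED},   # hand back for re-triage
    State.IN_PROGRESS: {State.RESOLVED, State.ASSIGNED},  # reassign mid-flight
    State.RESOLVED: {State.CLOSED, State.IN_PROGRESS},    # reopen
    State.CLOSED: set(),
}


class IllegalTransition(ValueError):
    """Raised when an agent or connector requests a move the table forbids."""


def transition(current: State, target: State) -> State:
    """Return the new state, or raise if the move is not in the table."""
    if target not in ALLOWED[current]:
        raise IllegalTransition(f"{current.value} -> {target.value}")
    return target
```

Keeping the table as plain data in code you own is one way to keep the state machine outside any single vendor's workflow engine.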

Layer 4: Two-Layer RAG

The orchestrator queries two RAG layers with different freshness budgets and different update mechanics. The next section breaks them down.

Layer 5: Connector Layer (MCP Gateway)

Legacy systems are reached through Model Context Protocol servers. Standard Logic Apps now expose workflows as remote MCP servers with access to over 1,400 connectors (Dataverse, SQL, SharePoint, SAP, ServiceNow, and most of the enterprise integration surface). Easy Auth must be configured to enable the default OAuth 2.0 endpoint; the MCP endpoint speaks Streamable HTTP and Server-Sent Events.

Maturity note: Logic Apps as MCP servers is in public preview as of 2026. Microsoft’s documented limits today (one connector per MCP server, one action per tool, built-in service provider-based and custom connectors not yet supported) constrain the initial connector matrix. Expect API shifts; pin Logic App versions and budget a migration sprint per quarter.

Custom MCP servers cover what the Logic Apps connector library does not. Azure Functions hosting and OpenAPI 3.0 tool definitions are the canonical paths.

API Center registers and governs the MCP server inventory. This is where compliance reviewers see what tools exist, who can call them, and what they do. API Center’s MCP integration is also in preview as of 2026.

The whole point of this layer is that the orchestrator does not know it is talking to ITSM, CRM, or a partner portal. It is talking to a tool that conforms to the schema contract. Swap the underlying legacy system and rewire the connector, and the orchestrator code does not change.
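A minimal sketch of that contract idea, with in-memory stand-ins in place of real MCP servers: the `CaseConnector` protocol, both connector classes, and the status vocabulary are hypothetical. The point illustrated is only that `escalate` is written once against the contract and works unchanged against either backend.

```python
from typing import Protocol


class CaseConnector(Protocol):
    """Hypothetical contract every legacy connector must satisfy."""

    def get_status(self, case_id: str) -> str: ...
    def set_status(self, case_id: str, status: str) -> None: ...


class ItsmConnector:
    """Stand-in for an ITSM system that stores states in Title Case."""

    def __init__(self) -> None:
        self._tickets = {"INC-1": "Open"}

    def get_status(self, case_id: str) -> str:
        return self._tickets[case_id].lower()

    def set_status(self, case_id: str, status: str) -> None:
        self._tickets[case_id] = status.title()


# Stand-in CRM encoding: numeric status codes instead of strings.
_CODES = {"open": 1, "escalated": 2, "closed": 3}
_NAMES = {code: name for name, code in _CODES.items()}


class CrmConnector:
    """Stand-in for a CRM that stores numeric status codes."""

    def __init__(self) -> None:
        self._cases = {"INC-1": 1}

    def get_status(self, case_id: str) -> str:
        return _NAMES[self._cases[case_id]]

    def set_status(self, case_id: str, status: str) -> None:
        self._cases[case_id] = _CODES[status]


def escalate(connector: CaseConnector, case_id: str) -> str:
    """Orchestrator-side logic: written once against the contract."""
    connector.set_status(case_id, "escalated")
    return connector.get_status(case_id)
```

Each connector translates between the canonical vocabulary and its system's native encoding, so the translation cost stays inside the connector.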

Layer 6: Legacy Systems of Record (Unchanged)

The 5-7 legacy systems stay where they are. Typical inventory:

  • Internal CRM / case management (the work-item system of record)
  • ITSM and ticketing for incident escalation
  • Code or work tracker (ADO/Jira-class) for engineering coordination
  • Customer-facing submission portal (read-back surface)
  • Partner / ISV portal (read-back surface)
  • Knowledge base of historical fixes and workarounds
  • Telemetry or test-result feed

The architecture does not require any of them to change. It requires them to expose stable enough APIs that a connector can survive the next two years.

If a legacy system is on its way out, plan the connector lifetime accordingly. If a legacy system is stable, the connector is a long-lived asset.

Side: AI Provider Plane (Interchangeable)

To the right of the orchestrator sits the AI provider plane. Foundry Model Router routes prompts across 18+ models including GPT, Claude, DeepSeek, Llama, and Grok. The agent code calls the standard Chat Completions API; the router decides which model handles each request based on the configured mode (Balanced, Quality, Cost) and the eligible-model subset.

Claude is the right pick for tough policy interpretation and ambiguous-severity cases where the reasoning needs to be defensible (just understand the marketplace billing trap before you commit Foundry credits to it). OpenAI’s function-calling discipline shines on structured-output paths where the orchestrator needs clean JSON for a system-of-record update. Open-weights models (Llama, DeepSeek) absorb the high-volume classification work where token cost dominates. The router decides; the orchestrator is agnostic to model choice.

A swappable model plane significantly reduces (it does not eliminate) the work involved in changing providers, provided the prompt-portability and tool-call discipline from the Six Rules for LLM-Agnostic AI Agents article has been followed at build time. Even with that discipline, family-level swaps (GPT to Claude to Llama) tend to surface coupling at the prompt and tool-definition layer that no architecture eliminates fully. The honest claim is “the swap drill is a sprint, not a rewrite,” not “config change only.” This is the test of vendor independence and the reason the orchestrator owns the schema.
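One way to keep model choice out of agent code is a task-to-deployment policy table held in configuration. The sketch below is an illustration of that idea only; it is not the Foundry Model Router API, and the task types and deployment names are placeholders invented for the example.

```python
# Hypothetical policy: task type -> deployment name. Names are placeholders,
# not real Foundry deployments; in practice this would live in deploy config.
ROUTING_POLICY = {
    "policy_interpretation": "frontier-reasoning",
    "structured_update": "structured-output",
    "classification": "open-weights-small",
}
DEFAULT_DEPLOYMENT = "balanced-default"


def pick_deployment(task_type: str, policy: dict[str, str] = ROUTING_POLICY) -> str:
    """Resolve a task type to a deployment; unknown tasks fall back to the default."""
    return policy.get(task_type, DEFAULT_DEPLOYMENT)


def swap(policy: dict[str, str], task_type: str, new_deployment: str) -> dict[str, str]:
    """Return a new policy with one task re-pointed; the caller's code is untouched."""
    updated = dict(policy)
    updated[task_type] = new_deployment
    return updated
```

A table like this makes the routing half of a swap a configuration change. The prompt and tool-definition coupling described above is a separate cost that still has to be tested.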

Cross-Cutting: Identity + Governance

Down the left side of the diagram runs the identity and governance band. Microsoft Entra ID propagates the calling user’s identity through every tool invocation via OAuth OBO. APIM sits between the orchestrator and external endpoints for rate limiting, authentication enforcement, and observability. Agent 365 is Microsoft’s new enterprise control plane for agents: registry, access control, visualization, interoperability, and security. Foundry-built and Copilot-Studio-built agents auto-register; registry sync for non-Microsoft frameworks (Bedrock, Vertex AI) is in preview as of 2026. The complement to the registered-agent control plane is the discovery side of governance: see Shadow AI Governance for Microsoft Enterprises for the unregistered-agent problem the registry does not solve on its own.

This band is not glamorous and tends to get cut in slides. It is the load-bearing piece. If identity does not propagate cleanly through every hop, the orchestrator either runs as a single super-user (compliance disaster) or fragments into per-system service accounts that nobody can audit. Neither survives an enterprise security review. For the broader networking + identity foundation this sits on, see The 2026 Azure AI Landing Zone Reference Architecture.

The Two-Layer RAG, in Detail

Two-layer RAG zoom. Top: orchestrator query 'who should triage this and what's their current load?' Left column: Layer 1 RAG on Azure AI Search agentic retrieval, slow and curated, weekly index refresh, returns content/references/activity. Right column: Layer 2 RAG on Cosmos DB change-feed materialized view, fast and derived, sub-minute freshness. Bottom: join at query time produces the triage recommendation.

The single most overlooked design decision in enterprise RAG is freshness budget. Different knowledge has different update cadences. Bundling everything into one index optimizes for nothing and breaks both extremes: the slow knowledge becomes stale, or the fast knowledge does not make it in.

This pattern splits RAG into two layers. Layer 1 covers what changes weekly or slower. Layer 2 covers what changes per minute or faster. They use different stores, different update mechanics, and different query mechanics. They are joined at orchestrator query time, not at index time.

Layer 1: Org / Skills / Process (Slow, Curated)

Layer 1 knows the things that change weekly or slower. Which team owns which product. Which engineers have which skills. What the escalation policy says. What SLA tier each customer segment has. The content of resolved cases that have been promoted into the knowledge base.

The right substrate is Azure AI Search agentic retrieval. Extractive retrieval and the core knowledge-source types are GA in the 2026-04-01 REST API. The query-planning, answer-synthesis, and configurable-reasoning-effort surfaces this architecture leans on for the audit story are preview-only in 2025-11-01-preview and have already gone through two rounds of breaking changes in eight months. Plan for an API migration during a 6-9 month build.

The pipeline takes the orchestrator’s question, uses an LLM as a query planner to decompose it into focused subqueries, runs the subqueries in parallel across the indexed corpus, applies semantic reranking, and returns a three-part response: content (the grounding passages), references (the source documents and chunks with citable URIs), and activity (the query plan, subqueries, ranking scores, and token-cost trace).

The three-part response is what makes Layer 1 audit-defensible in principle. Compliance can trace every claim back to its source and see which queries the system ran. This is the Microsoft surface heading toward the “regulated workload” audit story, with the caveat that the audit-relevant surfaces are still maturing. For the governance scaffolding that wraps around the audit story, see AI Governance Framework for Microsoft Enterprises.

Update mechanic: scheduled reindex of canonical corpora (knowledge base, org chart export, skills database, SLA policy documents). Cadence: weekly or on-publish. Freshness budget: days.

Layer 2: Live Case State (Fast, Derived)

Layer 2 knows the things that change per minute. Open case states across every legacy system. Recent state transitions in the last 24-72 hours. Similar-case resolutions in the last 14 days. Current queue depth per team per geography. Active assignments per engineer.

The right substrate is a Cosmos DB change-feed materialized view. Each legacy system emits state changes into a Cosmos container (via Logic App webhooks, Service Bus, or direct CDC). The change feed processor consumes those events at-least-once and rebuilds a derived view optimized for the orchestrator’s read patterns: by team, by geography, by similar-case fingerprint.

The query mechanic is deterministic, not LLM-mediated. A point lookup and a range filter against the materialized view return the live state in milliseconds. No retrieval reasoning effort, no token cost, no probabilistic anything in the read path.

Update mechanic: change-feed processor with at-least-once delivery. Freshness budget: sub-minute.
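At-least-once delivery means the view builder will sometimes see the same event twice, so applying an event has to be idempotent. The sketch below shows that property with an in-memory dictionary standing in for the Cosmos container; the `CaseEvent` fields and the per-case `version` counter are assumptions, and no Cosmos DB SDK calls are shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseEvent:
    case_id: str
    version: int  # assumed to increase monotonically per case at the source
    team: str
    state: str


class CaseStateView:
    """In-memory stand-in for the derived view the orchestrator reads."""

    def __init__(self) -> None:
        self._latest: dict[str, CaseEvent] = {}

    def apply(self, event: CaseEvent) -> bool:
        """Apply an event; return False for duplicates and out-of-date replays."""
        current = self._latest.get(event.case_id)
        if current is not None and event.version <= current.version:
            return False
        self._latest[event.case_id] = event
        return True

    def open_count_by_team(self) -> dict[str, int]:
        """Deterministic read path: queue depth per team, no LLM involved."""
        counts: dict[str, int] = {}
        for ev in self._latest.values():
            if ev.state != "closed":
                counts[ev.team] = counts.get(ev.team, 0) + 1
        return counts
```

Replaying the feed from the start rebuilds the same view, which is what makes the view "derived" rather than a second system of record.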

The Join at Query Time

When the orchestrator asks “who should triage this case,” it asks Layer 1 the planning question (“which team owns this product, what does the escalation policy say, what skills does the work require”) and Layer 2 the state question (“which engineers on that team have current availability, who recently resolved a similar case”). The orchestrator combines the two before deciding.

Layer 1 alone tells you who should do the work in principle. Layer 2 alone tells you who can do the work right now. The triage answer needs both.
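The join can be sketched as a plain function that takes the Layer 1 answer (who is eligible) and the Layer 2 answer (current load) as inputs. The data shapes, the `max_open` threshold, and the least-loaded tie-break are assumptions for illustration; the article does not prescribe a selection rule.

```python
from typing import Optional


def recommend_triage(
    team_skills: dict[str, set[str]],  # Layer 1: engineer -> skills on the owning team
    required_skill: str,               # Layer 1: what the work item needs
    live_load: dict[str, int],         # Layer 2: engineer -> open assignments right now
    max_open: int = 5,                 # hypothetical capacity threshold
) -> Optional[str]:
    """Combine slow knowledge (eligibility) with fast state (availability)."""
    eligible = [e for e, skills in team_skills.items() if required_skill in skills]
    available = [e for e in eligible if live_load.get(e, 0) < max_open]
    if not available:
        return None  # escalate to a human or widen the search
    # Least-loaded engineer wins; name breaks ties so the result is deterministic.
    return min(available, key=lambda e: (live_load.get(e, 0), e))
```

Because the two inputs arrive separately, either layer can be refreshed, audited, or replaced without touching the other.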

Joining at query time, rather than at index time, is the load-bearing choice. It prevents fast-changing operational state from forcing re-index cycles across slower-moving organizational knowledge. The freshness budgets stay isolated. The governance domains stay separable. Per-layer SLAs stay realistic.

A note on terminology. The phrase “two-layer RAG” appears in other architecture writing with a different sense, typically partitioning along access-control or enterprise-wrapper concerns rather than the freshness axis used here. Both partitionings can coexist in the same system; the enterprise-wrapper concerns (auth, guardrails, evaluation harness, monitoring) apply equally to both layers in this pattern.

This split is one applied realization of the broader Enterprise Context Architecture framework, where informational context lives in Layer 1, operational context lives in Layer 2, authorization context propagates through Entra OBO, and environmental context lives in the orchestrator configuration.

Vendor Independence, in Practice

Three serious vendors ship the locked-stack version of this pattern as of 2026. All three are credible. All three work very well when your case data already lives on their platform. All three combine orchestration, system of record, and AI provider plane in one stack. The lock-in is across all three layers simultaneously.

| Vendor | Product | What gets bundled | Right pick when |
| --- | --- | --- | --- |
| ServiceNow | Otto (Knowledge 2026) | Now Assist + Moveworks under one experience. AI agent studio, orchestrator, control tower. | ServiceNow is your system of record and you have no migration appetite. Cross-system orchestration centered on ServiceNow data. |
| Salesforce | Agentforce Operations | Multi-Agent Orchestration with shared context across channels. Specialized agents in Slack, Teams, IT service desk. | Your case data already lives in Salesforce. Back-office process automation as the primary use case. |
| Pega | Agent Experience | Any Pega workflow becomes an agentic engine. Main agent orchestrates specialized Pega flows. | Your business processes are already modeled in Pega. Workflow-centric orchestration over Pega-owned state. |

The strongest counter from each vendor’s senior architect is worth steel-manning. A ServiceNow architect would argue the Operational Front Door reinvents workflow governance externally: orchestration without process ownership becomes integration spaghetti, and enterprise AI requires governed workflows, approvals, SLAs, and audit state that Otto provides natively. A Salesforce architect would argue Data Cloud plus Agentforce reduces the need for an external orchestration layer because the unified operational data already lives in the platform. A Pega architect would argue the orchestration problem was solved a decade ago via case management; AI is merely another decisioning input, so why build a parallel orchestration plane that has to be defended at every architecture review?

The pattern’s answer in each case is the same. When no single vendor owns more than half the case data, none of those platforms is the system of record for the work. Their bundled orchestration is excellent inside their stack and awkward outside it. The orchestrator (which you own) sits at the center precisely because no single vendor’s gravity dominates the 5-7 systems.

That vendor independence is real, but it only earns its keep for organizations mature enough to operationalize evaluation, routing, and governance across providers. Without that operating model, the swappable plane sits unused and the bundle’s speed-to-value wins by default.

The honest trade between bundled and unbundled: speed and fewer FTEs on one side; orchestrator ownership compounding over years on the other. The bundle ships in 90 days because the vendor made the schema decisions for you. The unbundled pattern buys 18 months of build pain to escape lock-in that may not bind for 5+ years. Pick the trade you can defend at the board.

The Operational Front Door pattern unbundles those three layers: orchestration, system of record, and AI provider plane. The orchestrator (which you own) is at the center. Legacy systems of record stay where they are. AI providers swap behind the orchestrator’s schema contract. The architectural difference is small in any single diagram and large over the project’s lifetime.

The test of vendor independence is the swap drill. The day you replace your primary model in production is the day you find out whether the architecture is actually portable, or whether the model has crept into the schema, the prompts, the tool definitions, and the evaluation harness. The Six Rules for LLM-Agnostic AI Agents lay out the discipline that keeps the swap clean: abstract behind a gateway, route per task complexity rather than per agent, keep prompts portable, tier your traffic, pin model versions in production, run a quarterly swap drill. This article is the case-application of those rules across 5-7 systems.

Where Microsoft, Copilot Studio, and Foundry Actually Fit

The CIO question this answers: do I buy Copilot Studio, or do I build on Foundry? The honest answer is both, plus custom code for the schema contracts. The decision matrix:

| Component | Microsoft surface | Why it fits there |
| --- | --- | --- |
| End-user front door | Copilot Studio | M365 SSO, Teams app, citizen-customizable topics. The non-power-user majority talks to AI through Copilot Studio, not through a custom UI. |
| Orchestrator backbone | Foundry Agent Service | Per-agent Entra identity, OAuth OBO, agentic retrieval, evaluators, MCP gateway. The audit and telemetry surface lives here. |
| Cross-provider routing | Foundry Model Router | Cheap models for triage, frontier for policy interpretation. Model swap as deployment config rather than code change. |
| Layer 1 RAG (slow) | Azure AI Search agentic retrieval | Three-part response (content + references + activity) gives compliance the audit story when paired with the preview query-planning surface. |
| Layer 2 RAG (fast) | Cosmos DB change feed materialized view | Sub-minute freshness for live case state. At-least-once delivery via change feed processor. |
| Connectors to legacy | Logic Apps as MCP servers + API Center | 1,400+ connectors. Easy Auth required. Streamable HTTP + SSE. API Center governs the inventory. |
| Identity propagation | Entra ID + APIM | Load-bearing across every tool call. The thing that breaks first when shortcuts get taken. |
| Enterprise governance | Agent 365 | Registry, access control, visualization, security across every agent, regardless of where it was built. |
| Tough policy reasoning | Claude (via Foundry or direct) | Ambiguous severity, cross-system reconciliation, defensible reasoning chains. |
| Structured output paths | OpenAI (via Foundry or direct) | Function-calling discipline for system-of-record updates where schema adherence matters. |
| Schema contracts + state machine | Custom code | The canonical work-item schema, the orchestrator state machine, the identity-propagation glue. The things you do not license to a vendor. |

The Mobilezone case study published by Microsoft is a public demonstration of the front-door portion of this pattern at production. Two agents, “Supporto” for internal IT support and “Mia” for customer service, both built on Copilot Studio with an MCP server providing curated data access. The team’s banked lesson is the one this architecture takes seriously: keep workflows simple before introducing AI. The full Operational Front Door pattern is the same shape at larger scale with the orchestration plane made explicit.

What This Does Not Do

Ten honest limits worth naming before the budget conversation:

  1. It does not replace your ITSM tool. ITSM stays the system of record for incidents. The orchestrator routes through it, not over it. If the goal is ITSM consolidation, that is a different project with different math.

  2. It does not fix bad case-management process. AI on a broken process makes broken faster. The Mobilezone team published this lesson in their case study: keep workflows simple before introducing AI. If the severity scale in one office does not match the severity scale in another, AI orchestration amplifies the inconsistency in plausible-sounding outputs.

  3. It does not survive identity sprawl. If your offices have separate identity providers and no Entra (or single-IdP) discipline, the orchestrator’s tool-call identity propagation breaks at the first geography boundary. Fix identity before fixing orchestration.

  4. It does not ship in 90 days. Six to nine months is the realistic minimum for the connector layer, the two-layer RAG, the schema standardization, and the governance build, plus a planned migration sprint per quarter to absorb preview-API churn (Logic Apps MCP, API Center MCP, agentic retrieval query-planning surface) through 2027. The CIO who is told otherwise is buying a demo, not a system.

  5. It does not survive without a schema owner. Someone has to own the canonical work-item schema and have the political capital to enforce it across teams. Without that role, every connector becomes its own decision, the schema fragments into per-team variants, and the orchestration plane decays.

  6. It does not eliminate model hallucination. Two-layer RAG reduces grounding errors and the Layer 2 deterministic-lookup path keeps live state out of the LLM read path. Neither eliminates the model failure modes that come with delegated reasoning. High-severity case classification still routes through a human reviewer; the architecture is what makes that review tractable, not what replaces it.

  7. It does not replace vendor-stack solutions where they fit. If your case data already lives in ServiceNow, Salesforce, or Pega and you have no migration appetite, their bundled orchestration is the right answer. The Operational Front Door pattern is for stacks with no single dominant vendor, not for stacks where one already owns the system of record.

  8. It does not auto-resolve data quality and semantic mismatch. The canonical work-item schema assumes you can map the 5-7 legacy systems to it cleanly. If the same customer is keyed three different ways across CRM, ITSM, and the partner portal, the orchestrator will inherit the ambiguity and route the same case to three different teams. Master-data hygiene precedes orchestration; if it is not in place, budget for an MDM workstream as a prerequisite, not as a follow-on.

  9. It does not give you free observability. Distributed agent traces across the orchestrator, MCP connectors, two-layer RAG, and the AI provider plane do not assemble themselves into a debuggable picture. The default Foundry observability surface handles the agent side, but correlating across the legacy-system call paths and the preview-API surfaces (which may not emit the trace headers you expect) is its own discipline. Treat observability as a load-bearing concern on par with the schema discipline, not as a Day-90 add-on.

  10. It does not solve the organizational and people problems that orbit the architecture. Three real ones: (a) governance fragmentation when the orchestrator’s authority crosses team boundaries that have never shared a canonical schema; (b) talent scarcity, because architects who understand both legacy integration and modern agent frameworks are expensive and rare, and the build needs both; (c) ongoing connector maintenance when legacy vendors ship competing AI features and their APIs drift to accommodate. Budget for the people side as deliberately as you budget for the code side.

Cost Shape (Illustrative)

Order-of-magnitude anchors for the steering-committee conversation. Numbers are illustrative; your scale, channel mix, and case volume will move them.

  • Orchestration plane build (one-time): $1.5–3M across 6–9 months. Roughly 60% on connectors, 25% on the canonical schema + state machine + identity-propagation glue, 15% on evaluator harness and governance plumbing.
  • Per-non-trivial-legacy-system connector: $150–400K depending on API surface stability, auth model, and SLA requirements. Trivial connectors (well-documented REST, OAuth-ready) can land at half that; bespoke legacy systems with no API contract can run double.
  • Two-layer RAG infrastructure: $200–500K across the data preparation, index build, change-feed processor, and evaluation harness, depending on corpus size and update cadence.
  • Run-rate model + retrieval costs: a typical case-routing decision burns 2–4K tokens on triage + 4–8K tokens on retrieval + 2K tokens on summarization. At 10K cases/month, expect $3–8K/month on model spend if routing is well-tiered. Cosmos DB RU/s for the change-feed materialized view runs $1–3K/month at this scale.
  • FTE shape (steady state): 1 schema owner / tech lead + 2–3 connector engineers + 1 evaluation lead + 1 SRE (often shared with the platform team). Five to six FTEs total for the orchestration plane plus the existing teams operating each legacy system.
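The token figures in the run-rate bullet translate into a monthly token volume with simple arithmetic. A minimal sketch follows: the token ranges and case volume come from the list above, while `price_per_million_usd` is a placeholder you would replace with your own contracted blended rate, not a quoted price.

```python
# Back-of-envelope token volume for the run-rate bullet.
# Token ranges and case volume are taken from the text; price is a placeholder.

TRIAGE = (2_000, 4_000)       # tokens per case, low/high
RETRIEVAL = (4_000, 8_000)
SUMMARY = (2_000, 2_000)
CASES_PER_MONTH = 10_000


def tokens_per_case():
    """Low and high token counts for one routing decision."""
    low = TRIAGE[0] + RETRIEVAL[0] + SUMMARY[0]
    high = TRIAGE[1] + RETRIEVAL[1] + SUMMARY[1]
    return low, high


def monthly_tokens(cases=CASES_PER_MONTH):
    """Low and high monthly token volume at the given case count."""
    low, high = tokens_per_case()
    return low * cases, high * cases


def monthly_spend(price_per_million_usd, cases=CASES_PER_MONTH):
    """Monthly spend range for a hypothetical blended price per million tokens."""
    low, high = monthly_tokens(cases)
    return (low / 1_000_000 * price_per_million_usd,
            high / 1_000_000 * price_per_million_usd)


if __name__ == "__main__":
    print(tokens_per_case())   # (8000, 14000)
    print(monthly_tokens())    # (80000000, 140000000)
```

That puts the single-pass volume at 80 to 140 million tokens per month. Retries, multi-step tool calls, and evaluator runs sit on top of that figure, so treat it as a floor when checking it against your own spend estimate.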

Compare against vendor-locked alternatives at per-seat pricing across the same population. As a heuristic: the build math wins when you have 5–7 unbundled systems and more than 18 months of legacy preservation in front of you. It loses when one vendor already owns the majority of your case data.

When This Breaks at Month 6

The architecture above ships. The team is six months in. A senior architect who has run this pattern at scale will tell you the failures are predictable. Four are common enough to plan for explicitly.

Cascading retries on a sleepy connector. A legacy ITSM endpoint slows from 200ms to 4s under load. The orchestrator’s tool-call timeout fires before the response lands. The retry policy fires a duplicate call. The agent sees two pending tool invocations, classifies the second as an unrelated case, and creates a phantom ticket. By the time someone notices, the queue carries several hundred phantom tickets and the on-call engineer is hunting through trace logs that don’t quite line up.
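One common mitigation for this failure is an idempotency key: every retry of the same logical tool call carries the same key, so the connector or gateway can recognize a duplicate instead of creating a second ticket. The sketch below is illustrative only. `IdempotentToolCaller`, the `send` callable, and the key derivation are hypothetical names, and it assumes the downstream side (or a shim in front of it) honors the key.

```python
import hashlib
import json


class IdempotentToolCaller:
    """Retries tool calls while reusing one idempotency key per logical call."""

    def __init__(self, send):
        # send(tool, args, idempotency_key) -> result; hypothetical connector call
        self._send = send
        self._results = {}  # idempotency_key -> result of a completed call

    @staticmethod
    def key_for(case_id, tool, args):
        """Derive a stable key from the case, the tool name, and the arguments."""
        payload = json.dumps(
            {"case": case_id, "tool": tool, "args": args}, sort_keys=True
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def call(self, case_id, tool, args, max_attempts=3):
        """Call the tool, retrying on timeout with the same key (max_attempts >= 1)."""
        key = self.key_for(case_id, tool, args)
        if key in self._results:
            # The same logical call already completed; do not send it again.
            return self._results[key]
        last_error = None
        for _ in range(max_attempts):
            try:
                result = self._send(tool, args, key)
            except TimeoutError as exc:
                last_error = exc
                continue
            self._results[key] = result
            return result
        raise last_error
```

The sketch omits backoff between attempts for brevity. In practice the timeout would also need to sit above the endpoint's degraded latency, otherwise every attempt times out and the key only limits the damage.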

Stale identity propagation across a token refresh. Entra OBO refreshes a user’s token mid-conversation. The orchestrator caches the prior token in its agent context. The next downstream tool call goes out with the stale token. APIM rejects with 401. The agent retries with the same stale token. The user sees a “case routed to nobody” status and opens a complaint that takes longer to triage than the original case.
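The loop described here (retrying a 401 with the same cached token) can be broken by treating a 401 as a signal to fetch a fresh token before the retry. A minimal sketch, assuming a hypothetical token provider `get_token(force_refresh)` and a hypothetical `send(request, token)` that raises `Unauthorized` on HTTP 401. None of these names come from Entra or APIM.

```python
class Unauthorized(Exception):
    """Raised by the (hypothetical) downstream client on HTTP 401."""


def call_with_fresh_token(get_token, send, request, max_refreshes=1):
    """Call send(request, token); on a 401, force a token refresh and retry.

    get_token(force_refresh=bool) -> str is a hypothetical token provider.
    After max_refreshes failed refreshes the 401 is re-raised, so the caller
    can surface an explicit auth failure instead of a silent misroute.
    """
    token = get_token(force_refresh=False)
    for attempt in range(max_refreshes + 1):
        try:
            return send(request, token)
        except Unauthorized:
            if attempt == max_refreshes:
                raise
            # Never retry with the token that was just rejected.
            token = get_token(force_refresh=True)
```

The design choice worth noting is that the token is resolved at call time rather than stored in agent context, which removes the stale copy that caused the incident.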

Semantic mismatch caught by the materialized view. Layer 2 sees a case update from the CRM with a customer ID that does not match the customer ID the partner portal opened the case under. The change-feed processor handles both as separate records. The orchestrator routes the same case twice. Master-data hygiene was not a Day 1 priority. It is now, retroactively, with backfill.
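A sketch of the guard that would have caught this: resolve each source system's customer ID to a canonical ID through a crosswalk table before the change-feed processor groups updates, and quarantine anything that does not resolve. The field names (`system`, `customer_id`, `case_ref`) and the crosswalk structure are assumptions made for illustration, including the assumption that the systems share a case reference.

```python
def canonical_customer_id(source_system, source_id, crosswalk):
    """Return the canonical customer id, or None if the pair is unmapped."""
    return crosswalk.get((source_system, source_id))


def merge_case_updates(updates, crosswalk):
    """Group change-feed updates by (canonical customer, case reference).

    Returns (merged, quarantined). Unmapped updates are quarantined rather
    than treated as new cases, so they cannot trigger a second routing.
    """
    merged = {}
    quarantined = []
    for update in updates:
        cid = canonical_customer_id(
            update["system"], update["customer_id"], crosswalk
        )
        if cid is None:
            quarantined.append(update)
            continue
        merged.setdefault((cid, update["case_ref"]), []).append(update)
    return merged, quarantined
```

With this in place the CRM update and the partner-portal record land under one key, and the quarantine list becomes the work queue for the master-data backfill.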

Provider outage during a routed task. Foundry Model Router falls back from frontier to fast tier when the primary model rate-limits. The fast model classifies the case differently. The orchestrator routes to the wrong team. The user gets escalation feedback at a different SLA tier. The audit trail shows the routing change but no one was watching for it because the routing decision is normally invisible.
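One way to make this failure visible is to record which model tier produced each routing decision and flag anything decided on a fallback tier for review. The sketch below assumes a hypothetical `classify(case_id)` callable that returns both the team and the tier that actually served the request. Whether your router exposes the serving tier is something to confirm. The tier names follow the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoutingDecision:
    case_id: str
    team: str
    model_tier: str
    needs_review: bool


def route_with_tier_audit(case_id, classify, primary_tier="frontier"):
    """Route a case and flag the decision if a fallback tier produced it.

    classify(case_id) -> (team, tier_used) is a hypothetical classifier call.
    """
    team, tier_used = classify(case_id)
    return RoutingDecision(
        case_id=case_id,
        team=team,
        model_tier=tier_used,
        needs_review=(tier_used != primary_tier),
    )


def fallback_rate(decisions):
    """Share of decisions made on a fallback tier; a candidate alerting metric."""
    if not decisions:
        return 0.0
    return sum(d.needs_review for d in decisions) / len(decisions)
```

An alert on `fallback_rate` over a short window turns the normally invisible routing change into something the on-call engineer sees while it is happening.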

None of these are theoretical. All are the kind of operational scar tissue that does not show up in the architecture diagram. Plan an incident-response discipline for each before the system carries production load, and rehearse the response in a non-emergency window so the on-call engineer is not learning the trace tooling at 2am.

CIO Checklist

Seven gating questions to ask before committing budget. If you cannot answer “yes” to at least five of seven, the orchestration project will most likely stall in year two.

  1. Are your 5-7 systems stable enough that the connector layer will outlast 18 months of work?
  2. Is your identity story unified (Entra ID, or a single IdP) across the user populations who will interact with the orchestrator?
  3. Do you have a named person who owns the canonical work-item schema and is empowered to enforce it?
  4. Do you have an evaluation discipline for model outputs that goes beyond “store the logs,” including groundedness scoring, response-completeness checks, and a regression harness?
  5. Do you have legal sign-off for the cross-system data joins the orchestrator will perform?
  6. Is the legacy-system replacement timeline deferred long enough (typically 24-36 months) to amortize the connector investment?
  7. Do you have the political capital to standardize handoff vocabulary and severity definitions across geographies?

The hardest scale lesson, generalizable from any multi-region operation: process standardization is the load-bearing piece, not the AI. AI orchestration sits on top of that discipline. Where the process is consistent, AI amplifies efficiency. Where the severity scale in one region does not match the severity scale in another, AI amplifies the inconsistency. The architecture is the easier half. The standardization is the harder half.

The 24-36 Month Window for AI Over Legacy Systems

Based on the current vendor announcement cadence (Otto, Agentforce Operations, and Agent Experience all landed within twelve months of each other), the next 24 to 36 months look like the window in which building this layer yourself still pays off. After that, the locked-stack vendors will most likely sell you a polished version of this pattern with all the schema decisions made for you and all the lock-in costs that implies.

There is a deeper consequence worth naming. The orchestration layer becomes the operational control plane for enterprise work. Once it owns routing, identity propagation, retrieval, and workflow initiation across the legacy stack, it changes ownership politics, funding models, and vendor leverage in ways that outlast any single model rotation. Most enterprise AI programs fail because they treat the model as the architecture. In production, the architecture is identity, orchestration, contracts, and operational ownership. The model is the swappable surface, not the foundation.

Build the schema you own. Build the connectors you can keep. Build the AI plane you can swap. Run the model-swap drill once a quarter so the portability is a measured fact, not an aspiration. Those four are the things that make the architecture defensible at year five, when the third major model rotation lands and the second wave of consolidation pitches arrives.
