An AI Governance Framework on Microsoft Azure That Actually Works

An auditor walks into your AI governance review. You hand over a 40-page policy document, a risk register, and a spreadsheet of approved models. They smile politely and ask one question: “Show me where this is enforced at runtime.”

That is the moment checkbox governance dies.

Documents describe intent. They do not stop a developer from calling a raw model endpoint, do not contain an agent that quietly spins up three sub-agents, and do not produce a single line of evidence that any control actually fired. Compliance officers know this. Auditors know this. And the gap between the binder and the running system is exactly where the next breach lives.

Here is the thesis the rest of this article defends.

The single most consequential decision in any AI governance framework on Microsoft Azure is making a mandatory AI gateway the universal control plane. Not a document. Not a committee. A piece of architecture that every model call and every agent action must pass through. Wrap it in policy-as-code paved roads, and compliance stops being a velocity tax and becomes the path of least resistance.

Designing an AI governance framework on Microsoft Azure: the three planes

Microsoft markets a clean three-plane story: identity, data, and model, all interlocking into one mesh. The story is mostly true. It is also incomplete in ways the marketing decks gloss over, and being honest about that is what makes a governance design survive contact with reality.

The three planes map to three products. The gateway sits on top of all three and turns their separate signals into one enforcement and evidence point.

Plane	Product	Native runtime enforcement	Gateway tie-in
Identity	Microsoft Entra	RBAC, Conditional Access, managed and workload identities. The tightest interlock in the stack.	Gateway authenticates every caller as an Entra principal before a model is reachable.
Data	Microsoft Purview	DSPM for AI and audit capture. Detective, not blocking, for custom apps.	Gateway applies PII redaction and emits audit events Purview alone misses on custom code paths.
Model	Azure AI Foundry	Content Safety inline filters and Azure Policy on deployments. Eval gates are DIY.	Gateway enforces model allowlists and token limits no Foundry policy reaches once traffic leaves Azure.

Plane

Identity

Product

Microsoft Entra

Native runtime enforcement

RBAC, Conditional Access, managed and workload identities. The tightest interlock in the stack.

Gateway tie-in

Gateway authenticates every caller as an Entra principal before a model is reachable.

Plane

Data

Product

Microsoft Purview

Native runtime enforcement

DSPM for AI and audit capture. Detective, not blocking, for custom apps.

Gateway tie-in

Gateway applies PII redaction and emits audit events Purview alone misses on custom code paths.

Plane

Model

Product

Azure AI Foundry

Native runtime enforcement

Content Safety inline filters and Azure Policy on deployments. Eval gates are DIY.

Gateway tie-in

Gateway enforces model allowlists and token limits no Foundry policy reaches once traffic leaves Azure.

Read that table closely. The interlock is tight at the Microsoft Entra identity boundary and the Azure Policy deployment layer. It is loose at data-flow interception for custom applications, which is where Microsoft Purview enforcement weakens. That looseness is not a flaw you complain about. It is the design constraint you build around.

The way you build around it is the gateway.

The gateway kills three problems at once

A mandatory AI gateway is one architectural decision that does three jobs no document can.

It kills shadow models. If egress to model endpoints is only permitted through the gateway, an unapproved model is not a policy violation you discover later. It is a network call that does not connect.

It contains agent sprawl. Every agent action routes through a control point that can authenticate, authorize, and log it. The agent cannot self-authorize its way around the chokepoint.

It produces continuous audit evidence. Every call through the gateway is an audit event, so the system emits compliance evidence continuously instead of you manufacturing it at review time.

That is the payoff. One control plane, three of your hardest problems, and a stream of evidence as a byproduct.

Here is the chokepoint as Azure API Management policy, not as an assertion. This fragment forces an Entra-authenticated caller, pins the request to an allowlisted backend, and caps tokens.

<!-- mrd_ai_gateway_inbound_policy -->
<policies>
  <inbound>
    <validate-jwt header-name="Authorization" require-expiration-time="true">
      <openid-config url="https://login.microsoftonline.com/{tenant}/v2.0/.well-known/openid-configuration" />
      <required-claims>
        <claim name="roles" match="any">
          <value>mrd_model_invoke</value>
        </claim>
      </required-claims>
    </validate-jwt>
    <check-header name="x-mrd-model-id" failed-check-httpcode="403"
                  failed-check-error-message="Model not on allowlist" />
    <azure-openai-token-limit tokens-per-minute="20000"
                              counter-key="@(context.Subscription.Id)" />
  </inbound>
</policies>

And the deployment-plane backstop, as an Azure Policy effect that denies any Foundry or Azure OpenAI account exposed on a public endpoint.

{
  "policyRule": {
    "if": {
      "allOf": [
        { "field": "type", "equals": "Microsoft.CognitiveServices/accounts" },
        { "field": "Microsoft.CognitiveServices/accounts/publicNetworkAccess", "equals": "Enabled" }
      ]
    },
    "then": { "effect": "deny" }
  }
}

For the broader pattern of failing violations in the pipeline rather than at a review board, see our walkthrough on Azure Policy as code for AI guardrails.

Map regulation to runtime control, because nobody hands you the crosswalk

Here is a fact that builds more trust than any vendor slide: there is no official Microsoft crosswalk from ISO 42001 clauses or EU AI Act articles to specific Azure controls. Defender for Cloud ships no native regulatory pack for AI. You author the custom initiatives yourself. Treat any claim of a turnkey regulatory pack with suspicion.

So you build the mapping. Below is a defensible starting point. It is not a substitute for your own legal and compliance review, but it is the shape the table should take.

Risk tier	Mandatory runtime controls
High-risk (EU AI Act Annex III)	Azure AI Foundry eval gates (block on fail) + immutable WORM logging + human-in-the-loop + Microsoft Purview lineage retention + dual-LLM pattern for consequential actions
Limited-risk (transparency)	Eval gates (warn) + standard audit logging + Content Safety inline filters + per-function Entra identity
Minimal-risk	Gateway routing + baseline logging + Microsoft Entra RBAC

Risk tier

High-risk (EU AI Act Annex III)

Mandatory runtime controls

Azure AI Foundry eval gates (block on fail) + immutable WORM logging + human-in-the-loop + Microsoft Purview lineage retention + dual-LLM pattern for consequential actions

Risk tier

Limited-risk (transparency)

Mandatory runtime controls

Eval gates (warn) + standard audit logging + Content Safety inline filters + per-function Entra identity

Risk tier

Minimal-risk

Mandatory runtime controls

Gateway routing + baseline logging + Microsoft Entra RBAC

The nuance most articles skip entirely: the provider versus deployer responsibility split changes which controls you actually own. If you fine-tune and serve a model, you carry provider obligations. If you consume someone else’s model in your app, you carry deployer obligations, and the impact assessment burden lands differently.

Resolve this split before you draw a single line of your RACI. Designing controls before you know who is liable for the model’s behavior is how you build a beautiful framework that protects the wrong party.

The four gaps Microsoft’s native tooling does not close

Four gaps the marketing narrative skips. Each one real, each one mitigable, none of them turnkey.

1. Agent identity sprawl (Microsoft Entra)

Entra Agent ID is in preview, and it does not cover third-party frameworks. If your team builds on LangChain or CrewAI, those agents do not get first-class Entra identities, and the agent-to-agent trust chains are not graphed anywhere you can audit.

Mitigate it. Give every agent function a per-function, least-privilege managed identity. Maintain an agent registry as a hard governance artifact, not a wiki page. An agent without a registered identity does not run.

2. Shadow models (Azure AI Foundry)

This one is solvable today, and I want to be blunt about it. Shadow models are not a tooling gap. They are an architecture-discipline failure.

Enforce egress through the gateway and the problem disappears at the network layer. If a model endpoint is only reachable through your control plane, there is no shadow to chase. Stop treating this as a detection problem. It is a routing decision.

3. Prompt injection as a governance failure

Microsoft treats prompt injection as a security concern, and Prompt Shields are a real mitigation. But security framing misses the governance question: who is accountable when an injected instruction causes a breach? That accountability gap is unaddressed in the native stack.

The architectural answer is to keep high-consequence actions outside the LLM’s self-authorization loop. Use dual-LLM or quarantine patterns: the model that reads untrusted content never holds the privilege to act on it. A separate, constrained path authorizes consequential operations. We go deeper on this in containing prompt injection in agentic workflows.

If an injected prompt cannot reach a privileged action without crossing a human or a deterministic gate, you have moved the liability question from “hope the filter caught it” to “the architecture made it impossible.”

4. Cross-tenant and RAG lineage breaks (Microsoft Purview)

Purview lineage degrades the moment data flows through a vector store or crosses a tenant boundary. Your end-to-end lineage graph goes dark exactly where RAG injects retrieved context into a prompt.

Preserve sensitivity labels into the embedding layer. Enforce security trimming at retrieval time so a user never gets context they are not cleared to see. Lineage that stops at the vector store is lineage that fails the audit.

The rollout: three phases, no big bang

You do not flip governance on. You ramp it, or you trigger an organization-wide revolt on day one. Run it in three phases, each tied to specific Azure controls.

1
Discovery (audit-mode policy)

Deploy Azure Policy initiatives and gateway routing in audit mode. Run Microsoft Purview DSPM for AI to surface sensitive data flows. Deny nothing yet. You are building the inventory and finding the shadow models, ungoverned agents, and lineage breaks that already exist. This phase earns you the credibility to enforce.
2
Enforced gates (deny + CI/CD eval gates + PIM)

Flip Azure Policy to deny and tighten Entra Conditional Access on Foundry endpoints. Wire Azure AI Foundry eval gates into CI/CD so non-compliant models fail at PR time, not at deploy time. Put privileged access behind Entra PIM. This is where compliance becomes runtime-enforced instead of aspirational.
3
Continuous evidence (immutable logs + automated workbooks)

Turn on WORM immutable logging and automated compliance workbooks built from Azure Policy state and gateway telemetry. Replace point-in-time approvals with continuous attestation. Now the auditor's runtime question has a live answer.

Pair the phases with a RACI that does not blur ownership. Each row maps to the ISO 42001 clause or EU AI Act article that drives it.

Activity	Architect	MLOps	Compliance	Drives
Risk-tier classification	C	I	A/R	EU AI Act Art. 6, Annex III
Control and policy design	A/R	C	C	ISO 42001 Annex A
Deployment and eval gates	C	A/R	C	EU AI Act Arts. 9-15
Runtime monitoring and drift	I	A/R	C	ISO 42001 Cl. 9.1
Impact assessments	C	I	A/R	ISO 42001 Cl. 6.1 / Act Art. 9
Evidence aggregation	I	C	A/R	ISO 42001 Cl. 9.3

Activity

Risk-tier classification

Architect

MLOps

Compliance

A/R

Drives

EU AI Act Art. 6, Annex III

Activity

Control and policy design

Architect

A/R

MLOps

Compliance

Drives

ISO 42001 Annex A

Activity

Deployment and eval gates

Architect

MLOps

A/R

Compliance

Drives

EU AI Act Arts. 9-15

Activity

Runtime monitoring and drift

Architect

MLOps

A/R

Compliance

Drives

ISO 42001 Cl. 9.1

Activity

Impact assessments

Architect

MLOps

Compliance

A/R

Drives

ISO 42001 Cl. 6.1 / Act Art. 9

Activity

Evidence aggregation

Architect

MLOps

Compliance

A/R

Drives

ISO 42001 Cl. 9.3

When everyone is responsible for governance, no one is. One accountable owner per row, and resolve the provider-versus-deployer split before you commit to it.

Scaling without strangling velocity

The fear in every engineering org is that governance means a review board between a developer and shipping. It does not have to. Four practices keep the guardrails up and the velocity high.

Shift policy-as-code left. A governance violation should fail at PR time, in the developer’s own feedback loop, not in a meeting three weeks later. The earlier the gate, the cheaper the fix and the less it feels like gatekeeping.

Tier the gating. Human review is expensive. Spend it only on high-risk systems. Minimal-risk deployments pass through automated checks and ship. If you put a human in front of every model, your humans become the bottleneck and people route around them.

Federate ownership. A central platform team owns the guardrails. The domain teams own their AI. This is the only model that scales past a handful of use cases without the central team becoming a help desk.

Replace point-in-time approvals with continuous attestation. An approval that was true at deploy time tells you nothing about the system running today. Continuous evidence from the gateway is worth more than a signed form in a folder.

The takeaway

Governance that lives in documents creates false assurance, and the people whose job is to audit it can see straight through the binder. The fix is not more documentation. It is a single architectural decision: make a mandatory AI gateway the universal control plane across Microsoft Entra, Microsoft Purview, and Azure AI Foundry, wrap it in policy-as-code paved roads, and let compliance become the path of least resistance.

Do that, and shadow models stop connecting, agent sprawl hits a chokepoint, and the system emits audit evidence continuously. The auditor’s question stops being a threat and starts being a demo.

An AI Governance Framework on Microsoft Azure That Actually Works

Designing an AI governance framework on Microsoft Azure: the three planes

The gateway kills three problems at once

Map regulation to runtime control, because nobody hands you the crosswalk

The four gaps Microsoft’s native tooling does not close

1. Agent identity sprawl (Microsoft Entra)

2. Shadow models (Azure AI Foundry)

3. Prompt injection as a governance failure

4. Cross-tenant and RAG lineage breaks (Microsoft Purview)

The rollout: three phases, no big bang

Discovery (audit-mode policy)

Enforced gates (deny + CI/CD eval gates + PIM)

Continuous evidence (immutable logs + automated workbooks)

Scaling without strangling velocity

The takeaway

Stay in the loop

Related articles

Beyond Spend Caps: Engineering AI Cost Governance for Azure's LLM Workloads

AI Agent vs Flow: When Not to Build One (2026 Decision Guide)

Your AI Agent Project Is Really a Data Project: The Data-Prep Tax