Can You Trust AI With Dataverse Security? Four Designs, Three Wrong
Two violated documented Dataverse mechanics. One was a category error. The real architectural axis isn't AI vs human, it's what claims are cheap to mechanically verify.
The AI had everything you would want for security work plugged in. Microsoft Learn MCP for authoritative documentation at request time. Agents and skills wired into the development workflow. Full codebase context including the existing schema, the existing flows, the existing components. Recent training data covering Dataverse in 2026.
It produced four candidate designs for one Dataverse security requirement.
Two violated documented Dataverse mechanics. The MCP would have surfaced the correct mechanics in a single query. The mechanics never came up because the orchestration did not force retrieval before composition. A third design did not violate any documented mechanic. It violated a security-engineering principle. That is a different failure class, one the MCP could not have caught even if queried.
That distinction is the article. The deeper question is not “can you trust AI with security code.” It is: which claims are cheap to mechanically verify, and which are not? The architecture you build around AI follows from that line, not from a generic trust statement.
The AI Stack You Would Want for Security Work
Imagine the most sophisticated AI development setup you could plug in for a Dataverse security task.
- Microsoft Learn MCP for authoritative documentation at request time. The AI can query the canonical page on Field Security Profiles, on cascade behavior, on access teams, and pull verbatim mechanics into its reasoning.
- Codebase context loaded. Existing tables. Existing relationships. Existing security roles. Existing flows. Existing PCF components. The AI can see what is already in place before proposing a change.
- Specialized agents and skills. A reviewer agent that grades output against quality gates. A publish-check skill that gates commits. Content rules that flag forbidden patterns.
- Recent training data. Dataverse 2026. The current security model, the current relationship types, the current Field Security Profile mechanics.
That stack should be enough. In theory the AI can do mechanic-anchored design without missing the mechanics.
In practice, it did not. What follows is a real conversation in which the AI generated four candidate designs for one Dataverse security requirement. Two failed on documented mechanics. One failed on a security-engineering category. Only the fourth was correct, and only after a senior engineer pushed back four times to extract it.
The interesting part is not the failure. The interesting part is that all the tools that should have prevented the mechanic failures were available, and the orchestration did not force their use at the moment that would have mattered. That is a different story than “the LLM was wrong.” It is a story about workflow design.
What This Evidence Does and Does Not Support
Before walking through the designs, an honest scoping. The argument generalizes from one real conversation: one Dataverse security requirement, one AI model (Claude), four candidate designs, four rounds of human pushback. That is N=1.
- What the transcript directly supports. In this specific conversation, the AI generated two designs that violated documented Dataverse mechanics the wired-in Microsoft Learn MCP would have surfaced if queried, and one design that used a structurally illusory security boundary that no canonical docs page covers. The mechanic violations are verifiable against the Microsoft Learn pages cited below.
- What is prior, not finding. The cross-vendor claim (the same shape of failure on GPT, Gemini, and open-weight models) is something I have watched repeatedly in production work but have not run a controlled experiment on. Treat that as an architect’s prior.
- Where the trust-tier framework comes from. The conversation directly supports the design-with-mechanic-verifier recommendation for this category of work. The boilerplate-OK and never-AI-as-sole-author tiers are shaped by repeated similar experience and are an opinion.
- The domain this article scopes to. Configuration-shaped platform security: Dataverse RBAC, Salesforce sharing rules, ServiceNow ACLs, similar high-mechanic schema-side security. Other shapes of security work (translating a spec into IAM policy, writing CSP headers, generating Kubernetes NetworkPolicies) have different failure modes and are not addressed here.
- What documentation alone does not solve. Even for mechanic-level errors, MCP queries help only when the mechanic is documented and the documentation covers your case. They do not catch undocumented edge cases, environment-specific behavior, plugin side effects, ALM drift, or tenant configuration variance. The article assumes docs are necessary; it does not assume they are sufficient.
- What would strengthen the evidence. The same requirement run against three or more frontier models, with mechanic-checking by a domain-competent reviewer, recording where each model failed and on which mechanic. I have not done that experiment. If a reader has, I would welcome the data.
With those bounds in place, the rest of the article reads as: what the transcript caught, what the failure classes are, what verification mechanisms exist, and what I would build.
The Test Case
A perfectly ordinary Dataverse security requirement.
There is a parent record. There is a collection of child rows attached to it via a one-to-many relationship. A user, call them the participant, has access to the parent. The participant should also see the chain of child rows, including who created each row, when, and what status the row is in.
There is one piece of sensitive content on the child rows: free-text comments. The participant must NOT see those comments. Other roles (the original creators of those child rows, plus administrators) should see them.
The current production state was over-restricted. A previous fix had cut off the participant’s access to the entire child collection because there was no way to express “see metadata, not text” cleanly. The product owner asked: undo the over-restriction, but keep the comments hidden.
That requirement has exactly two correct shapes in Dataverse, and three plausible-looking shapes that fail. The next sections walk through each wrong shape, but they are deliberately split into two failure classes because the verification work to catch them is different.
Two Failure Classes, Not Three Wrong Designs
The three failed designs are not the same kind of failure. Conflating them is comfortable rhetorically and dangerous architecturally.
- Mechanic errors are claims that contradict documented platform behavior. They are catchable by querying canonical sources. In Dataverse this means the column-security model, the cascade-behavior matrix, the access-team and owner-team semantics, the relationship-type contract. Microsoft Learn covers all of these. If the orchestration forces a retrieval-before-composition step, mechanic errors get caught before they ship. Two of the three failed designs are mechanic errors.
- Category errors are claims that follow documented mechanics correctly but violate a higher-level engineering principle. The principle is not on a docs page. Catching them requires either a human with the relevant judgment or an explicit principle-checker that knows the patterns (security-by-obscurity, implicit trust boundaries, time-of-check-vs-time-of-use, etc.). The MCP does not help. One of the three failed designs is a category error.
The walkthrough below labels each design accordingly. Read the mechanic errors as “the workflow was missing a forced verification step.” Read the category error as “the workflow was missing a human or a principle-checker, and no amount of better doc retrieval would fix it.”
Mechanic Error #1: Field Security Profile Block-By-Role
Three sentences in, the design is wrong. Field Security Profiles are grant-only. They do not take security roles as members. They take users or teams (owner team or access team). The mechanism for hiding a column is the column’s column-security flag (IsSecured = true in the developer API; “Enable column security” in the maker UI), which hides the column from EVERYONE by default. The Field Security Profile is the exception list, granting Read back to the users or teams that should see the column.
So the AI’s design has the polarity reversed. To use FSP for this requirement, you mark the comment column as column-secured, which hides it from everyone. Then you create or maintain a team (owner team if membership is stable; access team if signers are assigned per-record by the signing flow) containing every user who SHOULD see the column. Then you create an FSP with that team as a member, granting Read on the secured column. The grant goes to the people who should see; the deny is implicit in the column-security flag.
Microsoft Learn states this directly on the Column-level security in Dataverse page and on the access teams and owner teams page. A single MCP query would have surfaced the polarity. The MCP was wired in. It was not queried.
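The grant-only polarity described above can be modeled in a few lines. This is a toy sketch, not the Dataverse API: `SecuredColumn`, `read_granted_teams`, `can_read`, and the team names are all hypothetical, and it ignores administrators and Create/Update permissions. It only shows that the deny comes from the column-security flag and the profile can only add readers back.

```python
from dataclasses import dataclass, field


@dataclass
class SecuredColumn:
    """Toy model of a column with column security (hypothetical, not the Dataverse API)."""
    name: str
    is_secured: bool = True
    # Teams granted Read through a Field Security Profile: the exception list.
    read_granted_teams: set[str] = field(default_factory=set)


def can_read(column: SecuredColumn, user_teams: set[str]) -> bool:
    """Return True if a user who belongs to `user_teams` can read the column."""
    if not column.is_secured:
        # Unsecured columns simply follow ordinary table-level Read.
        return True
    # Secured columns are hidden by default; only an explicit grant reveals them.
    return bool(column.read_granted_teams & user_teams)


# Assumed team names for illustration only.
comments = SecuredColumn("comments", read_granted_teams={"signers"})
participant_can_read = can_read(comments, {"participants"})
signer_can_read = can_read(comments, {"signers"})
print(participant_can_read, signer_can_read)
```

There is no way in this model to express "block Read for a role", which is the point: the only lever on a secured column is who gets added to the grant list.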
The failure mode is not that the AI lacked the information. The failure mode is that the orchestration did not require the AI to verify the mechanic before composing a design that depends on it. Composing a design and verifying a mechanic are separate operations, and the workflow allowed them to be done in either order. In this case, only one of them happened. The compositional step pattern-matched on the shape of similar sentences the AI has seen (“create an FSP that blocks Read for a role”) without conditioning that step on retrieved canonical behavior.
Mechanic Error #2: Parental Relationship to a Comment Child Table
This is the most consequential of the failed designs because it silently exposes data in production.
Parental relationships in Dataverse propagate inherited access rights from parent to child. If a user has Read on the parent of a parental relationship, that Read inherits down to the child rows. The whole point of the design here is that the participant should read one level (the existing child table) but NOT another level (the new comment table). Setting the relationship to parental means the participant’s access to the existing child rows inherits down to the comment rows, defeating the design.
The correct relationship type for this case is referential. Referential relationships do not propagate inherited access. A participant with Read on the existing child rows would not automatically gain Read on a referential-related comment row. Signers (the people who should see comments) need a separate explicit mechanism: their flow that activates a signing step adds them to an access team on the corresponding comment row.
Microsoft Learn documents this directly on the Microsoft Dataverse table relationships page. The cascade behavior matrix is unambiguous: parental relationships cascade Share, Reparent, Assign, Unshare (configurable to Cascade All, Active, or User-Owned). Referential relationships are Cascade None on those same actions. A single MCP query would have surfaced the matrix.
Same three tables. Same participant Read on the parent. Relationship type decides whether comments are exposed.
The MCP was wired in. It was not queried. The AI composed “use a parental relationship and let access inherit” because that string fit the slot in the design. Whether the string corresponded to the actual Dataverse cascade contract was a separate verification step that the workflow did not enforce.
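To make the difference concrete, here is a minimal sketch of the inheritance behavior as this section describes it. It is a simplified model, not a call into Dataverse: `RelationshipType`, `comment_row_readers`, and the actor names are hypothetical, and real cascade settings are per-action rather than a single switch.

```python
from enum import Enum


class RelationshipType(Enum):
    PARENTAL = "parental"
    REFERENTIAL = "referential"


def comment_row_readers(step_row_readers, relationship, explicit_comment_readers):
    """Toy model: who ends up able to read a comment row related to a child row.

    `explicit_comment_readers` stands in for users added directly
    (for example through an access team on the comment row).
    """
    readers = set(explicit_comment_readers)
    if relationship is RelationshipType.PARENTAL:
        # Simplification: access on the parent side flows down to related rows.
        readers |= set(step_row_readers)
    # Referential: nothing is inherited, only explicit grants count.
    return readers


step_readers = {"participant", "signer"}
explicit = {"signer"}

parental = comment_row_readers(step_readers, RelationshipType.PARENTAL, explicit)
referential = comment_row_readers(step_readers, RelationshipType.REFERENTIAL, explicit)
print(sorted(parental), sorted(referential))
```

With identical inputs, the participant appears in the parental result and is absent from the referential one. The relationship type is the only thing that changed.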
Category Error: JSON Snapshot as Security Boundary
This is a different failure class entirely. Nothing in the design violates a documented Dataverse mechanic. The JSON column behaves exactly as the platform contract says. The participant has Read on the parent, the JSON column is on the parent, the participant reads the column. All of that is correct.
The bug is that structure-based security here is illusory. The “no comments in the JSON” promise has three independent failure paths:
- Serializer drift. The promise relies on the flow building the JSON to never include comment text. That is policy enforced by code review, not architecture. The next developer who adds “include all custom fields” breaks the boundary without realizing it.
- God-mode access. Anyone with read-everywhere privileges reads every column on every row, including the original comment column on the child rows. The JSON column is irrelevant to them.
- Web API bypass. Anyone hitting the Dataverse Web API directly (with table-Read on the child table, which the participant already has by design in this requirement) reads the comment column on the child rows. The JSON column never enters the request.
The AI was confident the JSON design was structurally cleaner than the FSP design. It was not. It just looked cleaner because the policy boundary (the serializer) was implicit. In my read, implicit policy boundaries are the canonical security-by-obscurity pattern.
This is the failure class that retrieval-augmented generation cannot fix. No Microsoft Learn page says “do not use serialization as a security boundary.” There is no canonical mechanic to query because the issue is not about Dataverse behavior. It is about the architectural difference between a policy enforced by code and a policy enforced by the platform itself. Catching this requires either a human with security-engineering judgment, or an explicit principle-checker that recognizes the pattern. The MCP is the wrong tool for the job.
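The serializer-drift path is easy to see in code. The sketch below is illustrative only: the row contents, field names, and both function names are made up, and the real snapshot would be built by a flow rather than a Python function. It shows how a one-line, reasonable-looking change removes the boundary without touching the schema or any security configuration.

```python
import json

# Hypothetical child row; field names and values are invented for illustration.
child_row = {
    "created_by": "alice",
    "created_on": "2026-01-15",
    "status": "Signed",
    "comments": "Internal note the participant must not see",
}


def snapshot_allowlist(row):
    """Today's serializer: copies only the fields it names."""
    allowed = ("created_by", "created_on", "status")
    return json.dumps({key: row[key] for key in allowed})


def snapshot_all_fields(row):
    """A later 'include all custom fields' change. Same call sites, no schema change."""
    return json.dumps(row)


safe = json.loads(snapshot_allowlist(child_row))
drifted = json.loads(snapshot_all_fields(child_row))
print("comments" in safe, "comments" in drifted)
```

Nothing in the platform notices the second version. The only thing standing between the participant and the comment text is which of the two functions a developer happened to write.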
What Are the Correct Dataverse Security Designs for This Case?
Eventually, after the three wrong designs and four rounds of pushback, the AI produced two correct shapes. Both work. They have different operational trade-offs.
| Aspect | Shape A: Field Security on the column | Shape B: Separate child table, referential relationship |
|---|---|---|
| Where the comment lives | Same column as today, but with column security enabled | New child table with referential relationship to the existing child |
| Who sees it | Members of an Owner Team (membership stable across the engagement) granted Read via FSP | Members of an Access Team added per comment row by the flow that activates each signing step |
| Operational cost | Maintain Owner Team membership when staff change | Per-row Access Team membership management in the signing flow |
| Audit trail | Purview activity logs on column-level reads | Standard row-level audit on the comment table |
| When this shape wins | All-or-nothing visibility (everyone in a fixed set sees all comments) | Per-row visibility (different signers see different comments) |
| Solution & ALM | FSP solution-aware; Owner Team membership is environment-scoped and migrates only via post-deploy scripts | Comment table solution-aware; Access Team templates migrate but per-row memberships are runtime data, not ALM artifacts |
| Mechanism reference | Grant-only model with column-security deny default. See Microsoft Learn. | Referential cascade behavior. See Microsoft Learn cascade matrix. |
The technical content of these two shapes is what the AI eventually generated. The work to find them was the human’s, not the AI’s, even though the human was working with the AI’s full output.
The Real Architectural Axis: What Is Cheap to Mechanically Verify?
The trust-boundary framing (“AI for boilerplate, AI plus critic for design, never AI for security”) is a heuristic. It is useful as a default, but the load-bearing axis underneath is different and sharper.
The architectural decision is not “where to put AI.” It is “which claims your verification step can deterministically check.”
Some classes of claim are cheap to mechanically verify:
- Syntactic claims. Does the file parse? Does the schema validate? Does the JSON conform to a schema? Compilers and parsers answer these in milliseconds and never lie.
- Type-level claims. Does the function signature match the call site? Does the SQL query reference columns that exist? Type checkers and database-aware linters answer these deterministically.
- Build-validity claims. Does the project compile? Does the test suite run? CI answers these.
- Policy-evaluation claims. Given this RBAC configuration, can actor A perform action X on resource R? A policy evaluator (or a sandbox tenant with a few representative actor-action pairs) answers this deterministically.
- Access-matrix claims. Given this Dataverse security model, who sees what? Provisionable in a sandbox tenant, queryable per-actor.
- Unit-test claims. Given this input, does the function return the expected output? Tests answer this.
Some classes of claim are not cheap to mechanically verify:
- Architecture quality. Is this the right abstraction for the next ten use cases? No deterministic check.
- Threat-modeling completeness. Have we considered every attack path? No mechanical answer.
- Semantic appropriateness. Is this the right way to model “approval” in this business?
- Organizational assumptions. Is this control framework the one our auditors actually accept?
- Pattern-level critique. Is this design relying on an implicit boundary that will silently fail later?
The first list is where AI scales well with verification wrapped around it. The second list is where AI helps less and human judgment carries the load.
The mechanic errors in this conversation belong to the first list. Dataverse’s cascade behavior is a policy-evaluation claim. FSP membership is an access-matrix claim. Both are deterministically checkable in a sandbox tenant. The orchestration failure was not building the check.
The category error belongs to the second list. “Is the serializer an acceptable policy boundary?” has no deterministic verifier. Either a human reviewer with security-engineering chops catches it, or an explicit principle-checker that has been taught the pattern catches it, or it ships.
That distinction is the architecture. The trust-boundary tiers below are a heuristic version of it. The real move when you adopt AI on a new task is to ask: which class of claims will this task depend on, and what is my deterministic verifier for the first-list claims? If the answer is “I don’t have one yet,” that is the gap to close first. If the answer is “this task depends mostly on second-list claims,” that is the signal that AI is going to need stronger human oversight than usual.
Why the Tools Didn’t Catch It: Orchestration, Not LLM Failure
The available tooling did not prevent the two mechanic errors. The proximate cause is that the LLM composed wrong sentences. The load-bearing cause is that the workflow let it.
- AI generates by composition, not by verification. Given a problem and a set of building blocks, the AI assembles candidates that fit the shape of similar examples it has seen. “Create an FSP that blocks Read for a role” is a sentence shape the model has seen in similar slots; the slot got filled. Whether the contents are mechanically correct is a separate operation.
- Tool invocation is a learned heuristic, not a structural step. MCP, agents, and skills are available, but the AI must trigger them. The decision to query MCP for any given claim depends on context-window weighting and prompt-conditioning, not on any mechanism that recognizes when a load-bearing claim is being made. A confident-sounding compositional step ships without triggering the tool call that would have caught the error.
- Reviewer agents that sit AFTER synthesis-and-commit do not help. Publish-check skills do not gate Dataverse semantics. Content-reviewer agents do not understand FSP membership. Visual-QA agents do not parse cascade behavior. They cover content discipline and rendering quality, not domain-mechanics correctness. Most teams shipping AI-assisted code today have similar gates (lint, tests, render quality, content review) and the same absence of a domain-mechanics gate.
- Documentation lookup is not the same as deterministic verification. Even when the AI does query MCP, it gets back a documentation page. The AI still has to read it correctly, identify the relevant clause, apply it to the design, and conclude correctly. Each of those steps is itself probabilistic. Documentation queries help, but the strongest gate is not “did the AI read the docs,” it is “does the design actually behave as claimed when provisioned in a sandbox.” That gate is deterministic. Documentation is only the cheap approximation.
The conclusion: the verifier has to sit outside the synthesis loop, and the stronger the verifier the better the gate. A deterministic sandbox check is the strongest. A human reviewer who knows the mechanics is the next strongest. A second LLM forced to verify against MCP is the cheap intermediate, weaker than either of the above because the verifier is also probabilistic, but meaningfully better than no separation at all.
What did NOT work in this conversation: the AI itself, with all its tools, generating both the design and validating the design. That is the trust boundary the orchestration has to enforce.
Where Is the Trust Boundary for AI in Microsoft Security Code?
Three concrete tiers, in increasing risk. The framework is mine, shaped by the mechanically-verifiable axis above and by repeated experience including the conversation here.
| Tier | What to trust AI for | Verification required | Failure mode if skipped |
|---|---|---|---|
| 1. Boilerplate | Standard CRUD operations, copying existing flow patterns, drafting test scaffolding, refactoring within a defined contract. | Read the diff. Run existing tests. | Small, locally visible, easy to roll back. Acceptable risk. |
| 2. Mechanic-checkable design | Architecture proposals, schema decisions, integration patterns, security designs where the claims map to documented platform mechanics. | Deterministic check preferred (sandbox provisioning + access-matrix tests). Human verifier as second-best. LLM mechanics-agent against MCP as cheap intermediate. Workflow halts on disagreement. | Mechanic violations ship silently. Plausible-looking design generates wrong access behavior in production. |
| 3. Never as sole author for category-level security judgment | Pattern-level critique (is this an acceptable boundary?), threat modeling completeness, implicit-trust-boundary detection, organizational policy fit. | Human security reviewer required. Principle-checkers help but do not replace. Documentation queries do not help because no canonical page covers these. | Category errors ship: structure-as-security, implicit policy boundaries, hidden state assumptions. The build succeeds, the tests pass, the design fails on a class of attacks the test suite does not exercise. |
The middle tier is where most teams will spend their time and where the deterministic-vs-LLM-vs-human verifier choice has the most leverage. The split-agent workflow described below is one concrete shape for the LLM-verifier intermediate. A sandbox-based check is stronger and is where this should head as teams mature.
For compliance leads who need to drop this into an AI-Use Policy doc, the three-tier framing maps cleanly to control language. The middle and top tiers operationalize NIST AI RMF MAP-3 (context of AI use, including assigning a verifier role) and MEASURE-2 (evaluation of AI-generated artifacts before deployment). SOC 2 readers can treat the mechanics-agent + human-checkpoint as a CC8.1 change-management control: AI-authored security designs require a documented review-and-approve step before commit, with the verifier output as the audit artifact. If the audit asks how the org discovers a wrong design that did ship, the answer is the same as for any other config-shaped security change: Purview activity logs for column reads, Dataverse access audit for row-level access, and the team’s periodic Managed Environments security review.
This is not about model size or vendor choice. The conversation I describe ran with Claude. I have watched the same shape of failure on GPT, Gemini, and open-weight models across enough delivery work to call the pattern primitive-level rather than vendor-specific, but the receipts in this article are from one stack. Treat the cross-vendor claim as an architect’s prior, not a finding from this transcript. The remedy I argue for is structural: change the workflow, not the model.
For more on the architecture-rules side of this, see Six Rules for LLM-Agnostic AI Agents on Microsoft Foundry and especially Rule 6 (run a quarterly model-swap drill) which applies the same discipline of “verify mechanics, do not trust spec sheets” to provider-agnostic agent fleets. For the governance-baseline side, see AI Governance Framework for Microsoft Enterprises. Microsoft’s Well-Architected Framework codifies the human-accountability principle in Responsible AI in Azure workloads, which makes the same point at the framework level: oversight is not optional in agentic AI systems.
What I Would Build Now
A verification stack, ordered strongest to cheapest. Deploy the strongest gate your team can afford; use cheaper gates as bridges while the stronger ones are being built.
The strongest gate is a deterministic check against actual platform behavior. Provision the proposed schema in a sandbox tenant. Define a set of representative actor-and-action pairs that exercise the security boundary (participant tries to read comment column, signer tries to read comment column, admin tries to read comment column, participant tries the Web API bypass, etc.). Run them. Compare actual access outcome against intended outcome. Disagreement halts the workflow. This is deterministic because the platform itself is the verifier; there is no probabilistic step. It is the long-term direction for any team taking AI-assisted security design seriously.
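A minimal sketch of that comparison step, assuming the sandbox can be queried through some probe function. Everything named here is hypothetical: `check_access_matrix`, the action labels, and especially `fake_probe`, which stands in for a real call into a sandbox tenant and deliberately simulates one leak so the halt path is visible.

```python
# (actor, action) -> whether the action should succeed.
INTENDED = {
    ("participant", "read_comment_column"): False,
    ("participant", "read_comment_via_web_api"): False,
    ("signer", "read_comment_column"): True,
    ("admin", "read_comment_column"): True,
}


def check_access_matrix(intended, probe):
    """Compare intended outcomes with what the probe reports; return mismatches."""
    mismatches = []
    for (actor, action), expected in intended.items():
        actual = probe(actor, action)
        if actual != expected:
            mismatches.append((actor, action, expected, actual))
    return mismatches


def fake_probe(actor, action):
    """Stand-in for a real sandbox call; simulates a Web API leak for the participant."""
    if actor in ("signer", "admin"):
        return True
    return action == "read_comment_via_web_api"


mismatches = check_access_matrix(INTENDED, fake_probe)
halt_workflow = bool(mismatches)
for actor, action, expected, actual in mismatches:
    print(f"MISMATCH {actor} {action}: expected {expected}, got {actual}")
```

In a real harness the probe would authenticate as each actor against the provisioned sandbox, and `halt_workflow` would fail the pipeline. The comparison logic itself stays this small.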
The next-strongest gate is a human reviewer who knows the mechanics. When the sandbox check is not yet built, a human who understands FSP polarity, cascade behavior, and access-team semantics is the most reliable verifier. They are slow and they do not scale, but they catch both mechanic errors and category errors in one pass. Most engagements should have at least one of them on the AI-assisted security path.
The cheap intermediate is a split-agent verification workflow. Design and mechanics-checking run as separate AI calls with different prompts, different objectives, and a forced halt on disagreement. The design agent gets the problem, the context, and a prompt asking for candidate designs with explicit load-bearing mechanic claims. It produces a design plus a list of mechanics it depends on. The mechanics agent gets ONLY the list of claimed mechanics, queries Microsoft Learn MCP for each, and emits an agree/disagree verdict per claim. Disagreement halts the workflow and returns to the design agent. Agreement passes to a human.
Split-agent verification: design and mechanics-checking run as separate AI calls with different prompts and a forced halt on disagreement. The cheap intermediate while sandbox-as-CI is being built.
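The control flow of that workflow can be sketched as follows. The two agents are injected as plain callables so the loop is testable without any model; the stubs, the claim strings, the `max_rounds` cap, and the choice to treat any verdict other than AGREE as blocking are assumptions of this sketch, not part of the original workflow.

```python
def run_split_agent(requirement, design_agent, mechanics_agent, max_rounds=3):
    """Loop design -> mechanics check until every claim is agreed or rounds run out."""
    feedback = {}
    for round_no in range(1, max_rounds + 1):
        design, claims = design_agent(requirement, feedback)
        # The mechanics agent sees ONLY the claims, never the design rationale.
        verdicts = mechanics_agent(claims)
        rejected = {claim: v for claim, v in verdicts.items() if v != "AGREE"}
        if not rejected:
            # Agreement is not approval: the design still goes to a human.
            return {"status": "needs_human_review", "design": design, "rounds": round_no}
        feedback = rejected  # disagreement halts and returns to the design agent
    return {"status": "halted", "design": None, "rounds": max_rounds}


def stub_design_agent(requirement, feedback):
    """Hypothetical stand-in for the design LLM call."""
    if feedback:
        return "secure the column, grant Read via FSP to a team", ["FSP grants Read to teams"]
    return "FSP that blocks Read for a role", ["FSP can deny Read to a security role"]


def stub_mechanics_agent(claims):
    """Hypothetical stand-in for the verifier LLM call that would query the docs."""
    known_true = {"FSP grants Read to teams"}
    return {claim: "AGREE" if claim in known_true else "DISAGREE" for claim in claims}


result = run_split_agent("hide comments from participant", stub_design_agent, stub_mechanics_agent)
print(result["status"], result["rounds"])
```

The terminal success state is named `needs_human_review` on purpose: passing the mechanics agent only earns the design a place in the human's queue.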
The always-required gate is a human adjudicator for category-level questions. Even with a deterministic sandbox check passing and a mechanics agent agreeing on every claim, the design can still be a category error (the JSON-snapshot design above would pass both lower gates). A human in the loop on category-level questions is not optional regardless of how strong the mechanic gates are.
For Microsoft delivery teams adopting agentic development, this stack converts AI from “a junior engineer with confident wrong answers on security” to “a fast paired-programmer whose mechanic claims get checked and whose category-level judgment gets reviewed.” The substrate is the AI dev stack. The verification layer is what turns the substrate into a workflow you can defend.
What I Would Actually Do Today
If you are using AI for Dataverse security design right now, four concrete actions, ordered by deploy difficulty.
Force a mechanics-first prompt. Before asking for a design, ask the AI to enumerate the mechanics it will depend on. Verify the citations yourself. Then ask for the design. The mechanics-first ordering prevents synthesis from running ahead of verification, and it is the cheapest possible change: no orchestration code, just prompt discipline.
You are about to design a Dataverse security model for the requirement below.
Before you propose any design, do this:
1. List every load-bearing Dataverse mechanic this problem depends on
(relationship cascade behavior, FSP membership and grant model,
access-team semantics, owner-team scoping, role privilege depth, etc.).
2. For each mechanic, cite the Microsoft Learn page that documents it
(use the wired-in Microsoft Learn MCP; quote the canonical sentence
from each page so I can audit it).
3. Wait for me to verify the citations before you propose a design.
Requirement: <paste requirement here>
Add a mechanics-agent call between design and commit. A second AI call with a different prompt acts as the mechanics agent: given only the design and the relevant Microsoft Learn pages, it compares each claim in the design to the canonical docs.
You are a mechanics verifier. You will receive a Dataverse security design
and a list of Microsoft Learn page URLs.
For each load-bearing claim about Dataverse mechanics in the design:
1. Identify the claim verbatim.
2. Query the relevant Microsoft Learn page via MCP.
3. Quote the canonical sentence that supports or refutes the claim.
4. Mark the claim AGREE, DISAGREE, or UNVERIFIABLE.
Return only the per-claim verdict table. Do not propose changes.
Do not redesign. Disagreement on any claim halts the workflow and the
design returns to the design agent for revision.
Design: <paste design>
Microsoft Learn pages: <paste URL list>
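The prompt above does not fix a format for the verdict table, so any code consuming it has to pick one. The sketch below assumes a three-column pipe-delimited table (claim, quote, verdict). That format, the function names, and the sample rows are all assumptions for illustration. It fails closed: an empty or unparseable table halts the workflow.

```python
VALID_VERDICTS = {"AGREE", "DISAGREE", "UNVERIFIABLE"}


def parse_verdicts(table_text):
    """Extract (claim, quote, verdict) rows; skip headers, separators, and junk."""
    rows = []
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        if len(cells) != 3:
            continue
        claim, quote, verdict = cells
        if verdict.upper() in VALID_VERDICTS:
            rows.append((claim, quote, verdict.upper()))
    return rows


def should_halt(rows):
    """Halt on no usable rows, or on any verdict other than AGREE."""
    return not rows or any(verdict != "AGREE" for _, _, verdict in rows)


# Hypothetical verifier output; the quotes are placeholders, not real doc text.
sample = """
| Claim | Canonical quote | Verdict |
|---|---|---|
| FSP can deny Read to a role | (quote from docs) | DISAGREE |
| Referential relationships do not cascade Share | (quote from docs) | AGREE |
"""

rows = parse_verdicts(sample)
print(len(rows), should_halt(rows))
```

A quoted sentence containing a pipe character would break this simple split, so a production version would want a structured output format rather than a text table.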
Build a sandbox access-matrix test. Provision the proposed schema in a non-production tenant. Pick three to five representative actor-and-action pairs that exercise the security boundary. Run them as a test harness. This is the deterministic gate; once it exists, the mechanics-agent step is optional. Cost: one engineering sprint to build the harness, ongoing test maintenance per security change.
Keep a human in the loop on category-level questions. No combination of the above gates catches the JSON-as-security-boundary category error. Pattern-level critique is human work. Schedule a security-engineering review for any AI-generated security design before commit, regardless of how many automated gates passed.
The fancy AI dev stack is not the answer to the trust question. It is the substrate on which a workable verification architecture can be built. The trust comes from the architecture you build on top, not from the substrate itself. Microsoft Learn MCP, agents, skills, and codebase context are necessary. They are not sufficient.
The three-wrong-designs conversation is what happens when a team thinks the substrate is the discipline. The verification stack above is what the discipline actually looks like.
Frequently Asked Questions
Is AI safe for writing Dataverse security designs? AI is safe for proposing Dataverse security designs and unsafe as the sole author. In this conversation two of three failed designs violated documented Dataverse mechanics (catchable with a forced retrieval step or a sandbox check), and one was a category error against security-engineering principles (catchable only by a human reviewer or an explicit principle-checker). The remedy is not “stop using AI for design,” it is “build the verification layer that matches the failure class.”
Why didn’t Microsoft Learn MCP catch the wrong designs if it was wired in? For the two mechanic errors, because tool invocation is a learned heuristic and the orchestration did not force retrieval before composition. The MCP would have surfaced the correct mechanics for both in a single query. The AI did not query. For the category error (JSON-as-security-boundary), the MCP would not have helped even if queried, because no canonical page documents that pattern as wrong. Different failure class, different verification need.
What is the real architectural axis behind your trust tiers? What claims are cheap to mechanically verify. Policy-evaluation, access-matrix, syntax, type-level, and unit-test claims are cheap (deterministic checkers exist). Pattern-level critique, threat-modeling completeness, semantic appropriateness, and organizational assumptions are not (human judgment required). AI scales well around the first class when verification is wrapped around it. AI helps less for the second class and humans carry the load.
What does the split-agent verification workflow cost to build, and what’s its limit? One extra short-context LLM call per design candidate (the mechanics agent does not need a frontier model; a mini-tier model is fine because the work is citation lookup, not deep reasoning) plus a human checkpoint. Engineering: a few days plus ongoing prompt-drift maintenance. The limit is that the verifier is itself probabilistic and inherits a softer version of the same failure mode. The deterministic sandbox check is stronger; treat the split-agent as the cheap bridge until sandbox-as-CI is built.
Why insist on a sandbox check instead of trusting documentation queries? Because documentation lookup is not the same as deterministic verification. Even when the AI reads the docs correctly, applying them to the specific design is itself a probabilistic step. Provisioning the proposed schema in a sandbox tenant and running representative actor-and-action pairs answers the access-outcome question deterministically. Also, docs miss undocumented edge cases, environment-specific behavior, plugin side effects, and tenant configuration variance. The platform is the verifier; docs are the cheap approximation.
Cluster: AI Governance + AI Architecture. Related reading: Six Rules for LLM-Agnostic AI Agents on Microsoft Foundry for the architecture rules; AI Governance Framework for Microsoft Enterprises for the governance baseline; AI Readiness Assessment for Microsoft Enterprises which scores delivery practices including this trust-boundary question; Agentic Development with Claude Code which describes the AI dev setup and where this article shows the setup needs more discipline. Mechanics references: Field-level security in Dataverse, Dataverse table relationships and cascade behavior, Access teams and owner teams, Responsible AI in Azure workloads.