Agentic Development with Claude Code: The Setup That Actually Works
Build a multi-agent development environment with persistent memory, quality gates, and automated pipelines. The setup that turned one CLI tool into a content factory.
Most developers use Claude Code like a chatbot. They type a question, get an answer, type another question. Each session starts from zero. No memory of what you built yesterday, no reusable patterns, no quality standards.
That is not agentic development. That is autocomplete with extra steps.
TL;DR: Claude Code becomes a production system when you give it three things: persistent memory (CLAUDE.md + MEMORY.md), specialized agents (20+ prompt files with scoring rubrics), and automated pipelines (skills that chain agents with quality gates). This article shows the exact setup that produces 24 published articles with Azure-style diagrams across two live sites.
What Makes Development “Agentic”?
A chatbot answers questions. An agent completes tasks. The difference is not the model - it is the environment you build around it.
An agentic setup has four layers:
- Project instructions that persist across sessions (CLAUDE.md, AGENTS.md)
- Memory that accumulates knowledge over time (feedback, preferences, project state)
- Specialized agents with defined roles, inputs, outputs, and quality gates
- Skills that chain agents into repeatable pipelines
Without these layers, every session is a blank slate. With them, session 50 is dramatically more productive than session 1 because the system has learned your codebase, your preferences, and your quality standards.
Layer 1: Project Instructions (CLAUDE.md)
CLAUDE.md is the file Claude Code reads at the start of every session. It is your leverage point. Every instruction there saves you from repeating yourself in every conversation.
A minimal CLAUDE.md points to your real instructions:
```markdown
# Claude Instructions

All project rules are in: **[AGENTS.md](./AGENTS.md)**

Read that file in full before writing any code.
```
AGENTS.md contains the operational details: project structure, coding rules, common mistakes to avoid, MCP server references, and the full agent inventory. Keeping instructions in a separate file means you can update them without touching the root config.
What goes in project instructions
- Technology stack with exact versions (not “use React” but “Astro 5.x + Tailwind CSS 4.x via @tailwindcss/vite”)
- Common mistakes with fixes (a table of past failures prevents repeating them)
- MCP servers with tool descriptions and workflows
- CLI commands so the agent can operate your tools without guessing syntax
- Writing rules if your project involves content (no em dashes, no AI slop phrases)
The key insight: project instructions are not documentation for humans. They are configuration for an AI agent. Write them like you are programming behavior, not explaining concepts.
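Put concretely, a trimmed AGENTS.md might look like the sketch below. The section names and entries are illustrative, assembled from the rules mentioned in this article, not a copy of our production file:

```markdown
# AGENTS.md

## Stack
- Astro 5.x + Tailwind CSS 4.x via @tailwindcss/vite

## Common Mistakes
| Mistake | Fix |
|---|---|
| --svg-theme auto breaks with OS dark mode | Always export with --svg-theme light |

## Writing Rules
- No em dashes. No AI slop phrases. No employer names.
```

Note the register: imperative rules and tables, not explanatory prose. The agent needs lookup-ready facts, not background.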
We document every architectural decision this way. The same principle applies to living documentation in git - if it is not in the repo, it does not exist.
Layer 2: Persistent Memory
Memory is what makes session 50 better than session 1. Claude Code supports a file-based memory system at ~/.claude/projects/<project>/memory/.
Memory types
| Type | What It Stores | Example |
|---|---|---|
| user | Your role, preferences, expertise level | "Senior architect, new to React frontend" |
| feedback | Corrections and confirmed approaches | "Never use --svg-theme auto, it breaks with OS dark mode" |
| project | Ongoing work, goals, decisions | "Merge freeze after March 5 for mobile release" |
| reference | Pointers to external systems | "Pipeline bugs tracked in Linear project INGEST" |
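On disk, each memory type is just a file the agent reads at session start. A sketch of the layout (the file names are an assumption for illustration, not a documented format):

```text
~/.claude/projects/<project>/memory/
├── user.md        # role, preferences, expertise level
├── feedback.md    # corrections and confirmed approaches
├── project.md     # ongoing work, goals, decisions
└── reference.md   # pointers to external systems
```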
How memory compounds
In our setup, memory captured 15 feedback rules over 10 sessions. Each rule prevents a specific mistake from recurring:
- “All repos under crmvet/ must be --private”
- “Always WebFetch verify after push - never say should deploy”
- “SVG export needs --svg-theme light --embed-svg-images”
- “Draw.io XML needs HTML entity-encoded in value attributes”
Without memory, I would need to re-explain each of these every session. With memory, the agent reads them automatically and applies them without prompting.
Layer 3: Specialized Agents
An agent is a prompt file that defines a role, inputs, outputs, and quality gates. It is not a chatbot conversation - it is a job description for an AI worker.
Anatomy of an effective agent
```markdown
# Agent: Diagram Creator

## Role
Create Draw.io diagrams from visual plans using the MCP server.

## Input
- Build spec from /diagram-architect
- Article .mdx file for context
- Azure2 icon catalog

## Process
1. Build on canvas via Draw.io MCP
2. Run programmatic review (score >= 70)
3. Fix issues, re-review
4. Export .drawio + SVG

## Output
- .drawio source file
- .svg for the live site

## Quality Checks
- [ ] All positions on 10px grid
- [ ] Azure2 icons for all services
- [ ] All colors from approved palette
- [ ] Combined review score >= 70
```
The quality checks are what separate an agent from a prompt. Without measurable gates, you get inconsistent output. With them, you get a standard.
How many agents do you need?
We run 20 agents across two sites:
| Category | Count | Examples |
|---|---|---|
| Content creation | 4 writers + editor + researcher | Site-specific voice, 10 quality gates |
| Planning | 3 (content, SEO, series) | Scoring rubrics, cannibalization checks |
| Visual | 3 (strategist, architect, creator) | Diagram pipeline with MCP |
| Review | 5 (content + 4 diagram critics) | 0-100 scoring, weighted dimensions |
| Ops | 5 (publisher, visual QA, LinkedIn, refresh, planner) | Deploy verification, staleness monitoring |
You do not need 20 agents on day one. Start with 3: a writer, an editor, and a reviewer. Add agents when you find yourself giving the same instructions repeatedly.
Our naming conventions article was the first test of the writer + editor pipeline. The spec-driven series tested batch production across 7 articles with parallel agents.
Layer 4: Skills (Reusable Pipelines)
A skill is a prompt template that users invoke with a slash command. Skills chain multiple agents into a repeatable workflow.
/write-post → Writer Agent → Editor Agent → Review Agent → Publisher Agent
Each invocation follows the same process, applies the same quality standards, and produces consistent output. The skill is the pipeline - the agents are the stages.
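A skill is itself just a prompt file behind the slash command. A trimmed sketch of what /write-post could contain (the structure is illustrative, not our production file):

```markdown
# Skill: /write-post

1. Invoke the Writer Agent with the topic and the site voice guide.
2. Pass the draft to the Editor Agent; apply all 10 quality gates.
3. Run /review-content. If the score is below 70, return the draft
   to the writer with the review issues.
4. On pass, hand off to the Publisher Agent: commit, push, verify.
```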
Skills we use daily
| Skill | What It Does |
|---|---|
| /write-post | Full article production with anti-AI-detection checks |
| /review-content | 4-dimension scoring: SEO, quality, engagement, conversion |
| /review-diagram | 4 critic agents: layout, brand, storytelling, export |
| /drawio | Architecture diagrams with 648 Azure2 icons |
| /diagram-architect | Plan layout before building (coordinates, icons, edges) |
| /capture-idea | Store raw brain dump as structured backlog item |
The Pipeline That Produces Articles
Here is the full lifecycle, from idea to published article:
- Capture (/capture-idea) - raw brain dump to structured idea
- Plan (/plan-content) - score and prioritize against existing content
- Research (Researcher agent + Microsoft Learn MCP) - verify facts against official docs
- Write (Writer agent, site-specific voice) - draft with opinions and real examples
- Edit (Editor agent, 10 quality gates) - no em dashes, no employer names, no AI slop
- Review (/review-content) - score 0-100, fix until >= 70
- Visual Plan (Visual Strategist) - what diagrams, where, what type
- Diagram Architect - calculate coordinates, resolve icons, plan edges
- Diagram Build (Draw.io MCP) - create on canvas or batch generate
- Diagram Review (4 critics) - layout, brand, storytelling, export
- Export (draw.io CLI) - .drawio to SVG with embedded icons
- Publish (Publisher agent) - git commit, push, verify live deployment
- LinkedIn (LinkedIn Writer) - algorithm-optimized post
Each step has a defined input, output, and quality gate. If any gate fails, work goes back to the responsible agent with specific fix instructions.
The feedback loops are what make this a pipeline rather than a checklist. A content review score below 70 sends the article back to the writer. A diagram review failure sends it back to the builder. No step is “done” until its quality gate passes.
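That send-it-back behavior is a simple retry loop around each stage. A minimal sketch in Python - `review` and `fix` are hypothetical stand-ins for the actual agent invocations, and the round limit is an assumption:

```python
# Sketch of one gated pipeline stage. review() and fix() are
# hypothetical stand-ins for the real agent calls.
PASS_SCORE = 70
MAX_ROUNDS = 5

def run_gated_step(draft, review, fix):
    """Re-run a stage until its quality gate passes."""
    for _ in range(MAX_ROUNDS):
        score, issues = review(draft)
        if score >= PASS_SCORE:
            return draft, score
        # Gate failed: back to the responsible agent with the issues.
        draft = fix(draft, issues)
    raise RuntimeError("quality gate never passed")

# Toy run: each fix round raises the score by 20.
def toy_review(d): return d["score"], ["too vague"]
def toy_fix(d, issues): return {"score": d["score"] + 20}

final, score = run_gated_step({"score": 40}, toy_review, toy_fix)
print(score)  # 80
```

The point of encoding it this way: the gate is the loop condition, so skipping review is structurally impossible.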
This pipeline produced every article on this site, including the governance series, environment strategy guide, and the FetchXML deep-dive.
MCP Servers: Extending the Agent’s Reach
Model Context Protocol servers give your agents access to external tools and data. Three MCP servers power our setup:
| MCP Server | What It Provides |
|---|---|
| Microsoft Learn | Search and fetch official Microsoft documentation |
| Draw.io | Programmatic diagram creation with Azure2 icons |
| Google Trends | Trending topics and news for content research |
The key lesson with MCP: always verify capabilities by testing, not by assuming. Our Draw.io MCP server required a browser open on localhost:3000 for the WebSocket connection. The tools list said nothing about this requirement. We discovered it when every tool call hung indefinitely.
What Went Wrong (Twice)
Before sharing the lessons, here are two disasters that shaped the system.
The Dark Mode Export Disaster. We built 8 diagrams, exported them to SVG, deployed to production, and declared victory. Then we checked the live site in dark mode. Every diagram had a black background with dark green shapes - completely unreadable. The root cause: --svg-theme auto uses CSS light-dark() which responds to the OS dark mode preference, not the site’s theme toggle. The fix was --svg-theme light with a CSS card wrapper in the blog layout. We pushed 4 broken deploys before getting it right.
The HTML Encoding Bug. Draw.io stores HTML in XML attributes. The label `<b>Dev</b>` must be entity-encoded as `&lt;b&gt;Dev&lt;/b&gt;` inside the value attribute of the .drawio file. Draw.io then decodes it and renders the HTML. Our batch generator had the encoding wrong - first too little (raw tags in XML = malformed), then too much (double-encoded = a literal `<b>` showing on screen). We deployed raw HTML tags to production twice before comparing a working file against a broken one and finding the difference.
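The rule that fixed it is "encode exactly once." A minimal sketch using Python's standard library (the helper name is ours; the article's batch generator is not shown here):

```python
from xml.sax.saxutils import escape

def drawio_value(label_html: str) -> str:
    """Entity-encode an HTML label exactly once for a value="..." attribute."""
    return escape(label_html)

once = drawio_value("<b>Dev</b>")   # correct: ready for the XML attribute
twice = drawio_value(once)          # the double-encoding bug on screen
print(once)   # &lt;b&gt;Dev&lt;/b&gt;
print(twice)  # &amp;lt;b&amp;gt;Dev&amp;lt;/b&amp;gt;
```

Encode raw input once and never re-encode already-encoded strings; both of our production failures were violations of that single invariant.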
Both bugs had the same root cause: no visual review step in the automated pipeline.
The fix was a mandatory quality loop: build the diagram, run 4 critics, check the combined score. Below 70? Back to the builder with specific fix instructions. Above 70? Export and deploy. No exceptions. But programmatic critics have limits. The Layout Critic and Brand Critic scored 100/100, and neither could see that the exported SVG looked terrible. So the loop ends with one more rule: always verify the deployed output with WebFetch or a screenshot before declaring success.
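The combined score is a weighted blend of the four critic dimensions. A sketch of the gate arithmetic - the weights here are illustrative assumptions, not our production values:

```python
# Illustrative weights over the four critic dimensions (percent).
WEIGHTS = {"layout": 30, "brand": 20, "storytelling": 30, "export": 20}
PASS = 70

def combined_score(scores: dict) -> float:
    """Weighted 0-100 score across the critic dimensions."""
    return sum(scores[c] * w for c, w in WEIGHTS.items()) / 100

def gate(scores: dict) -> bool:
    return combined_score(scores) >= PASS

# Two perfect programmatic critics can carry a weak export score
# past the gate - which is exactly why the WebFetch check exists.
scores = {"layout": 100, "brand": 100, "storytelling": 60, "export": 20}
print(combined_score(scores), gate(scores))  # 72.0 True
```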
What I Learned Building This System
1. Never accept “can’t do it” from the agent
When Claude said it could not export SVGs, the correct response was “research how.” The draw.io desktop CLI had --export --format svg all along. Push back on capability claims.
2. Test on the live site, not in the editor
We built 8 diagrams before checking the deployed result. The first deploy showed dark backgrounds and broken icons. If we had checked after diagram 1, we would have saved 7 iterations.
3. Build the review system before the production system
We built the content pipeline first, then realized we had no quality gates for diagrams. Building the 4 critic agents BEFORE batch-generating 21 diagrams would have caught the arrow alignment, broken icons, and dark mode issues before they hit production.
4. Memory prevents repeat mistakes
Every debugging session produced a feedback memory entry. “SVG export needs --svg-theme light” is the kind of knowledge that saves 30 minutes per session when it is in memory versus being rediscovered.
5. Think systems, not tasks
“Fix this diagram” is a task. “Build a pipeline that produces correct diagrams every time” is a system. The system takes longer to build but pays back on every subsequent diagram.
Getting Started: Your First 30 Minutes
- Create CLAUDE.md in your project root. Point it to your coding standards.
- Create one feedback memory for your strongest preference (naming conventions, test patterns, deployment rules).
- Create one agent for the task you do most often. Define role, input, output, quality checks.
- Create one skill that invokes that agent with /your-command.
- Run it and iterate. The first version will be rough. By version 3, it will be faster than doing it manually.
The gap between “using Claude Code” and “building with Claude Code” is the same gap between typing commands and writing scripts. One is interactive. The other is infrastructure.
Build the infrastructure.
Related Articles
- Architecture Diagrams with Draw.io MCP Server and Claude Code - how we built the diagram pipeline
- Spec-First Development: Why Your Flow Specs Should Exist Before the Designer Opens - the article that tested this pipeline end-to-end