Agentic Development with Claude Code: The Setup That Actually Works
Build a multi-agent development environment with persistent memory, quality gates, and automated pipelines. The setup that turned one CLI tool into a content factory.
Most developers use Claude Code like a chatbot. They type a question, get an answer, type another question. Each session starts from zero. No memory of what you built yesterday, no reusable patterns, no quality standards.
That is not agentic development. That is autocomplete with extra steps.
TL;DR: Claude Code becomes a production system when you give it three things: persistent memory (CLAUDE.md + MEMORY.md), specialized agents (20+ prompt files with scoring rubrics), and automated pipelines (skills that chain agents with quality gates). This article shows the exact setup that produces 24 published articles with Azure-style diagrams across two live sites.
What Makes Development “Agentic”?
A chatbot answers questions. An agent completes tasks. The difference is not the model - it is the environment you build around it.
An agentic setup has four layers:
- Project instructions that persist across sessions (CLAUDE.md, AGENTS.md)
- Memory that accumulates knowledge over time (feedback, preferences, project state)
- Specialized agents with defined roles, inputs, outputs, and quality gates
- Skills that chain agents into repeatable pipelines
Without these layers, every session is a blank slate. With them, session 50 is dramatically more productive than session 1 because the system has learned your codebase, your preferences, and your quality standards.
Layer 1: Project Instructions (CLAUDE.md)
CLAUDE.md is the file Claude Code reads at the start of every session. It is your leverage point. Every instruction there saves you from repeating yourself in every conversation.
A minimal CLAUDE.md points to your real instructions:
```markdown
# Claude Instructions

All project rules are in: **[AGENTS.md](./AGENTS.md)**

Read that file in full before writing any code.
```
AGENTS.md contains the operational details: project structure, coding rules, common mistakes to avoid, MCP server references, and the full agent inventory. Keeping instructions in a separate file means you can update them without touching the root config.
What goes in project instructions
- Technology stack with exact versions (not “use React” but “Astro 5.x + Tailwind CSS 4.x via @tailwindcss/vite”)
- Common mistakes with fixes (a table of past failures prevents repeating them)
- MCP servers with tool descriptions and workflows
- CLI commands so the agent can operate your tools without guessing syntax
- Writing rules if your project involves content (no em dashes, no AI slop phrases)
The key insight: project instructions are not documentation for humans. They are configuration for an AI agent. Write them like you are programming behavior, not explaining concepts.
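Put concretely, a trimmed AGENTS.md might look like the sketch below. The section names and entries are illustrative, assembled from the rules mentioned in this article, not a copy of our production file:

```markdown
# AGENTS.md

## Stack
- Astro 5.x + Tailwind CSS 4.x via @tailwindcss/vite

## Common Mistakes
| Mistake | Fix |
|---|---|
| --svg-theme auto breaks with OS dark mode | Always export with --svg-theme light |

## Writing Rules
- No em dashes. No AI slop phrases. No employer names.
```

Note the register: imperative rules and tables, not explanatory prose. The agent needs lookup-ready facts, not background.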
We document every architectural decision this way. The same principle applies to living documentation in git - if it is not in the repo, it does not exist.
Layer 2: Persistent Memory
Memory is what makes session 50 better than session 1. Claude Code supports a file-based memory system at ~/.claude/projects/<project>/memory/.
Memory types
| Type | What It Stores | Example |
|---|---|---|
| user | Your role, preferences, expertise level | "Senior architect, new to React frontend" |
| feedback | Corrections and confirmed approaches | "Never use --svg-theme auto, it breaks with OS dark mode" |
| project | Ongoing work, goals, decisions | "Merge freeze after March 5 for mobile release" |
| reference | Pointers to external systems | "Pipeline bugs tracked in Linear project INGEST" |
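On disk, each memory type is just a file the agent reads at session start. A sketch of the layout (the file names are an assumption for illustration, not a documented format):

```text
~/.claude/projects/<project>/memory/
├── user.md        # role, preferences, expertise level
├── feedback.md    # corrections and confirmed approaches
├── project.md     # ongoing work, goals, decisions
└── reference.md   # pointers to external systems
```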
How memory compounds
In our setup, memory captured 15 feedback rules over 10 sessions. Each rule prevents a specific mistake from recurring:
- “All repos under crmvet/ must be --private”
- “Always WebFetch verify after push - never say should deploy”
- “SVG export needs --svg-theme light --embed-svg-images”
- “Draw.io XML needs HTML entity-encoded in value attributes”
Without memory, I would need to re-explain each of these every session. With memory, the agent reads them automatically and applies them without prompting.
Layer 3: Specialized Agents
An agent is a prompt file that defines a role, inputs, outputs, and quality gates. It is not a chatbot conversation - it is a job description for an AI worker.
Anatomy of an effective agent
```markdown
# Agent: Diagram Creator

## Role
Create Draw.io diagrams from visual plans using the MCP server.

## Input
- Build spec from /diagram-architect
- Article .mdx file for context
- Azure2 icon catalog

## Process
1. Build on canvas via Draw.io MCP
2. Run programmatic review (score >= 70)
3. Fix issues, re-review
4. Export .drawio + SVG

## Output
- .drawio source file
- .svg for the live site

## Quality Checks
- [ ] All positions on 10px grid
- [ ] Azure2 icons for all services
- [ ] All colors from approved palette
- [ ] Combined review score >= 70
```
The quality checks are what separate an agent from a prompt. Without measurable gates, you get inconsistent output. With them, you get a standard.
How many agents do you need?
We run 20 agents across two sites:
| Category | Count | Examples |
|---|---|---|
| Content creation | 4 writers + editor + researcher | Site-specific voice, 10 quality gates |
| Planning | 3 (content, SEO, series) | Scoring rubrics, cannibalization checks |
| Visual | 3 (strategist, architect, creator) | Diagram pipeline with MCP |
| Review | 5 (content + 4 diagram critics) | 0-100 scoring, weighted dimensions |
| Ops | 5 (publisher, visual QA, LinkedIn, refresh, planner) | Deploy verification, staleness monitoring |
You do not need 20 agents on day one. Start with 3: a writer, an editor, and a reviewer. Add agents when you find yourself giving the same instructions repeatedly.
Our naming conventions article was the first test of the writer + editor pipeline. The spec-driven series tested batch production across 7 articles with parallel agents.
Layer 4: Skills (Reusable Pipelines)
A skill is a prompt template that users invoke with a slash command. Skills chain multiple agents into a repeatable workflow.
/write-post → Writer Agent → Editor Agent → Review Agent → Publisher Agent
Each invocation follows the same process, applies the same quality standards, and produces consistent output. The skill is the pipeline - the agents are the stages.
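A skill is itself just a prompt file behind the slash command. A trimmed sketch of what /write-post could contain (the structure is illustrative, not our production file):

```markdown
# Skill: /write-post

1. Invoke the Writer Agent with the topic and the site voice guide.
2. Pass the draft to the Editor Agent; apply all 10 quality gates.
3. Run /review-content. If the score is below 70, return the draft
   to the writer with the review issues.
4. On pass, hand off to the Publisher Agent: commit, push, verify.
```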
Skills we use daily
| Skill | What It Does |
|---|---|
| /write-post | Full article production with anti-AI-detection checks |
| /review-content | 4-dimension scoring: SEO, quality, engagement, conversion |
| /review-diagram | 4 critic agents: layout, brand, storytelling, export |
| /drawio | Architecture diagrams with 648 Azure2 icons |
| /diagram-architect | Plan layout before building (coordinates, icons, edges) |
| /capture-idea | Store raw brain dump as structured backlog item |
The Pipeline That Produces Articles
Here is the full lifecycle, from idea to published article:
- Capture (/capture-idea) - raw brain dump to structured idea
- Plan (/plan-content) - score and prioritize against existing content
- Research (Researcher agent + Microsoft Learn MCP) - verify facts against official docs
- Write (Writer agent, site-specific voice) - draft with opinions and real examples
- Edit (Editor agent, 10 quality gates) - no em dashes, no employer names, no AI slop
- Review (/review-content) - score 0-100, fix until >= 70
- Visual Plan (Visual Strategist) - what diagrams, where, what type
- Diagram Architect - calculate coordinates, resolve icons, plan edges
- Diagram Build (Draw.io MCP) - create on canvas or batch generate
- Diagram Review (4 critics) - layout, brand, storytelling, export
- Export (draw.io CLI) - .drawio to SVG with embedded icons
- Publish (Publisher agent) - git commit, push, verify live deployment
- LinkedIn (LinkedIn Writer) - algorithm-optimized post
Each step has a defined input, output, and quality gate. If any gate fails, work goes back to the responsible agent with specific fix instructions.
The feedback loops are what make this a pipeline rather than a checklist. A content review score below 70 sends the article back to the writer. A diagram review failure sends it back to the builder. No step is “done” until its quality gate passes.
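That send-it-back behavior is a simple retry loop around each stage. A minimal sketch in Python - `review` and `fix` are hypothetical stand-ins for the actual agent invocations, and the round limit is an assumption:

```python
# Sketch of one gated pipeline stage. review() and fix() are
# hypothetical stand-ins for the real agent calls.
PASS_SCORE = 70
MAX_ROUNDS = 5

def run_gated_step(draft, review, fix):
    """Re-run a stage until its quality gate passes."""
    for _ in range(MAX_ROUNDS):
        score, issues = review(draft)
        if score >= PASS_SCORE:
            return draft, score
        # Gate failed: back to the responsible agent with the issues.
        draft = fix(draft, issues)
    raise RuntimeError("quality gate never passed")

# Toy run: each fix round raises the score by 20.
def toy_review(d): return d["score"], ["too vague"]
def toy_fix(d, issues): return {"score": d["score"] + 20}

final, score = run_gated_step({"score": 40}, toy_review, toy_fix)
print(score)  # 80
```

The point of encoding it this way: the gate is the loop condition, so skipping review is structurally impossible.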
This pipeline produced every article on this site, including the governance series, environment strategy guide, and the FetchXML deep-dive.
MCP Servers: Extending the Agent’s Reach
Model Context Protocol servers give your agents access to external tools and data. Three MCP servers power our setup:
| MCP Server | What It Provides |
|---|---|
| Microsoft Learn | Search and fetch official Microsoft documentation |
| Draw.io | Programmatic diagram creation with Azure2 icons |
| Google Trends | Trending topics and news for content research |
The key lesson with MCP: always verify capabilities by testing, not by assuming. Our Draw.io MCP server required a browser open on localhost:3000 for the WebSocket connection. The tools list said nothing about this requirement. We discovered it when every tool call hung indefinitely.
What Went Wrong (Twice)
Before sharing the lessons, here are two disasters that shaped the system.
The Dark Mode Export Disaster. We built 8 diagrams, exported them to SVG, deployed to production, and declared victory. Then we checked the live site in dark mode. Every diagram had a black background with dark green shapes - completely unreadable. The root cause: --svg-theme auto uses CSS light-dark() which responds to the OS dark mode preference, not the site’s theme toggle. The fix was --svg-theme light with a CSS card wrapper in the blog layout. We pushed 4 broken deploys before getting it right.
The HTML Encoding Bug. Draw.io stores HTML in XML attributes. The label `<b>Dev</b>` must be entity-encoded as `&lt;b&gt;Dev&lt;/b&gt;` inside the value attribute of the .drawio file. Draw.io then decodes it and renders the HTML. Our batch generator had the encoding wrong - first too little (raw tags in XML = malformed), then too much (double-encoded = a literal `<b>` showing on screen). We deployed raw HTML tags to production twice before comparing a working file against a broken one and finding the difference.
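The rule that fixed it is "encode exactly once." A minimal sketch using Python's standard library (the helper name is ours; the article's batch generator is not shown here):

```python
from xml.sax.saxutils import escape

def drawio_value(label_html: str) -> str:
    """Entity-encode an HTML label exactly once for a value="..." attribute."""
    return escape(label_html)

once = drawio_value("<b>Dev</b>")   # correct: ready for the XML attribute
twice = drawio_value(once)          # the double-encoding bug on screen
print(once)   # &lt;b&gt;Dev&lt;/b&gt;
print(twice)  # &amp;lt;b&amp;gt;Dev&amp;lt;/b&amp;gt;
```

Encode raw input once and never re-encode already-encoded strings; both of our production failures were violations of that single invariant.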
Both bugs had the same root cause: no visual review step in the automated pipeline.
The fix was a mandatory quality loop: build the diagram, run 4 critics, check the combined score. Below 70? Back to the builder with specific fix instructions. Above 70? Export and deploy. No exceptions. But programmatic critics have limits. The Layout Critic and Brand Critic scored 100/100, and neither could see that the exported SVG looked terrible. So the loop ends with one more rule: always verify the deployed output with WebFetch or a screenshot before declaring success.
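The combined score is a weighted blend of the four critic dimensions. A sketch of the gate arithmetic - the weights here are illustrative assumptions, not our production values:

```python
# Illustrative weights over the four critic dimensions (percent).
WEIGHTS = {"layout": 30, "brand": 20, "storytelling": 30, "export": 20}
PASS = 70

def combined_score(scores: dict) -> float:
    """Weighted 0-100 score across the critic dimensions."""
    return sum(scores[c] * w for c, w in WEIGHTS.items()) / 100

def gate(scores: dict) -> bool:
    return combined_score(scores) >= PASS

# Two perfect programmatic critics can carry a weak export score
# past the gate - which is exactly why the WebFetch check exists.
scores = {"layout": 100, "brand": 100, "storytelling": 60, "export": 20}
print(combined_score(scores), gate(scores))  # 72.0 True
```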
What I Learned Building This System
1. Never accept “can’t do it” from the agent
When Claude said it could not export SVGs, the correct response was “research how.” The draw.io desktop CLI had --export --format svg all along. Push back on capability claims.
2. Test on the live site, not in the editor
We built 8 diagrams before checking the deployed result. The first deploy showed dark backgrounds and broken icons. If we had checked after diagram 1, we would have saved 7 iterations.
3. Build the review system before the production system
We built the content pipeline first, then realized we had no quality gates for diagrams. Building the 4 critic agents BEFORE batch-generating 21 diagrams would have caught the arrow alignment, broken icons, and dark mode issues before they hit production.
4. Memory prevents repeat mistakes
Every debugging session produced a feedback memory entry. “SVG export needs --svg-theme light” is the kind of knowledge that saves 30 minutes per session when it is in memory versus being rediscovered.
5. Think systems, not tasks
“Fix this diagram” is a task. “Build a pipeline that produces correct diagrams every time” is a system. The system takes longer to build but pays back on every subsequent diagram.
Getting Started: Your First 30 Minutes
- Create CLAUDE.md in your project root. Point it to your coding standards.
- Create one feedback memory for your strongest preference (naming conventions, test patterns, deployment rules).
- Create one agent for the task you do most often. Define role, input, output, quality checks.
- Create one skill that invokes that agent with /your-command.
- Run it and iterate. The first version will be rough. By version 3, it will be faster than doing it manually.
The gap between “using Claude Code” and “building with Claude Code” is the same gap between typing commands and writing scripts. One is interactive. The other is infrastructure.
Build the infrastructure.
Related Articles
- Architecture Diagrams with Draw.io MCP Server and Claude Code - how we built the diagram pipeline
- Spec-First Development: Why Your Flow Specs Should Exist Before the Designer Opens - the article that tested this pipeline end-to-end