
Agentic Development with Claude Code: The Setup That Actually Works

Build a multi-agent development environment with persistent memory, quality gates, and automated pipelines. The setup that turned one CLI tool into a content factory.

Alex Pechenizkiy 8 min read

Most developers use Claude Code like a chatbot. They type a question, get an answer, type another question. Each session starts from zero. No memory of what you built yesterday, no reusable patterns, no quality standards.

That is not agentic development. That is autocomplete with extra steps.

TL;DR Claude Code becomes a production system when you give it three things: persistent memory (CLAUDE.md + MEMORY.md), specialized agents (20+ prompt files with scoring rubrics), and automated pipelines (skills that chain agents with quality gates). This article shows the exact setup that produces 24 published articles with Azure-style diagrams across two live sites.

What Makes Development “Agentic”?

A chatbot answers questions. An agent completes tasks. The difference is not the model - it is the environment you build around it.

An agentic setup has four layers:

  1. Project instructions that persist across sessions (CLAUDE.md, AGENTS.md)
  2. Memory that accumulates knowledge over time (feedback, preferences, project state)
  3. Specialized agents with defined roles, inputs, outputs, and quality gates
  4. Skills that chain agents into repeatable pipelines

Without these layers, every session is a blank slate. With them, session 50 is dramatically more productive than session 1 because the system has learned your codebase, your preferences, and your quality standards.

Agentic development pipeline showing four layers from project instructions through memory, agents, and skills

Layer 1: Project Instructions (CLAUDE.md)

CLAUDE.md is the file Claude Code reads at the start of every session. It is your leverage point. Every instruction there saves you from repeating yourself in every conversation.

A minimal CLAUDE.md points to your real instructions:

```markdown
# Claude Instructions
All project rules are in: **[AGENTS.md](./AGENTS.md)**
Read that file in full before writing any code.
```

AGENTS.md contains the operational details: project structure, coding rules, common mistakes to avoid, MCP server references, and the full agent inventory. Keeping instructions in a separate file means you can update them without touching the root config.

What goes in project instructions

  • Technology stack with exact versions (not “use React” but “Astro 5.x + Tailwind CSS 4.x via @tailwindcss/vite”)
  • Common mistakes with fixes (a table of past failures prevents repeating them)
  • MCP servers with tool descriptions and workflows
  • CLI commands so the agent can operate your tools without guessing syntax
  • Writing rules if your project involves content (no em dashes, no AI slop phrases)
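As an illustration, the "common mistakes" section of AGENTS.md can be a simple table (the entries below are drawn from failures described later in this article; your table will hold your own):

```markdown
## Common Mistakes
| Mistake | Fix |
|---------|-----|
| Exporting SVGs with `--svg-theme auto` | Always use `--svg-theme light` |
| Saying "should deploy" after a push | WebFetch the live URL and confirm |
| Raw HTML tags in .drawio value attributes | Entity-encode exactly once |
```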

The key insight: project instructions are not documentation for humans. They are configuration for an AI agent. Write them like you are programming behavior, not explaining concepts.

We document every architectural decision this way. The same principle applies to living documentation in git - if it is not in the repo, it does not exist.

Layer 2: Persistent Memory

Memory is what makes session 50 better than session 1. Claude Code supports a file-based memory system at ~/.claude/projects/<project>/memory/.

Memory types

| Type | What It Stores | Example |
|------|----------------|---------|
| user | Your role, preferences, expertise level | "Senior architect, new to React frontend" |
| feedback | Corrections and confirmed approaches | "Never use `--svg-theme auto`, it breaks with OS dark mode" |
| project | Ongoing work, goals, decisions | "Merge freeze after March 5 for mobile release" |
| reference | Pointers to external systems | "Pipeline bugs tracked in Linear project INGEST" |
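A feedback memory is just a markdown file the agent reads at session start. A minimal sketch (the file name and exact wording are illustrative):

```markdown
<!-- ~/.claude/projects/my-site/memory/feedback.md -->
# Feedback
- Never use `--svg-theme auto`; it breaks with OS dark mode. Use `--svg-theme light`.
- All repos under crmvet/ must be created with `--private`.
- After every push, WebFetch the live URL; never say "should deploy".
```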

How memory compounds

In our setup, memory captured 15 feedback rules over 10 sessions. Each rule prevents a specific mistake from recurring:

  • "All repos under crmvet/ must be `--private`"
  • "Always WebFetch verify after push - never say 'should deploy'"
  • "SVG export needs `--svg-theme light --embed-svg-images`"
  • "Draw.io XML needs HTML entity-encoding in value attributes"

Without memory, I would need to re-explain each of these every session. With memory, the agent reads them automatically and applies them without prompting.

Layer 3: Specialized Agents

An agent is a prompt file that defines a role, inputs, outputs, and quality gates. It is not a chatbot conversation - it is a job description for an AI worker.

Anatomy of an effective agent

```markdown
# Agent: Diagram Creator

## Role
Create Draw.io diagrams from visual plans using the MCP server.

## Input
- Build spec from /diagram-architect
- Article .mdx file for context
- Azure2 icon catalog

## Process
1. Build on canvas via Draw.io MCP
2. Run programmatic review (score >= 70)
3. Fix issues, re-review
4. Export .drawio + SVG

## Output
- .drawio source file
- .svg for the live site

## Quality Checks
- [ ] All positions on 10px grid
- [ ] Azure2 icons for all services
- [ ] All colors from approved palette
- [ ] Combined review score >= 70
```

The quality checks are what separate an agent from a prompt. Without measurable gates, you get inconsistent output. With them, you get a standard.

How many agents do you need?

We run 20 agents across two sites:

| Category | Count | Examples |
|----------|-------|----------|
| Content creation | 4 writers + editor + researcher | Site-specific voice, 10 quality gates |
| Planning | 3 (content, SEO, series) | Scoring rubrics, cannibalization checks |
| Visual | 3 (strategist, architect, creator) | Diagram pipeline with MCP |
| Review | 5 (content + 4 diagram critics) | 0-100 scoring, weighted dimensions |
| Ops | 5 (publisher, visual QA, LinkedIn, refresh, planner) | Deploy verification, staleness monitoring |

You do not need 20 agents on day one. Start with 3: a writer, an editor, and a reviewer. Add agents when you find yourself giving the same instructions repeatedly.

Agent hierarchy showing 20 agents organized in 5 categories radiating from a central content machine hub

Our naming conventions article was the first test of the writer + editor pipeline. The spec-driven series tested batch production across 7 articles with parallel agents.

Layer 4: Skills (Reusable Pipelines)

A skill is a prompt template that users invoke with a slash command. Skills chain multiple agents into a repeatable workflow.

/write-post → Writer Agent → Editor Agent → Review Agent → Publisher Agent

Each invocation follows the same process, applies the same quality standards, and produces consistent output. The skill is the pipeline - the agents are the stages.
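The stage-and-gate structure above can be sketched as a loop: run a stage, check its quality gate, retry the stage until it passes. This is a toy illustration with stubbed agents, not the actual skill implementation:

```python
from typing import Callable

# Each stage takes a draft and returns (new_draft, quality score 0-100).
Stage = Callable[[str], tuple[str, int]]

def run_pipeline(draft: str, stages: list[Stage], threshold: int = 70,
                 max_retries: int = 3) -> str:
    """Run each stage in order; a stage repeats until its gate passes."""
    for stage in stages:
        for _ in range(max_retries):
            draft, score = stage(draft)
            if score >= threshold:
                break  # gate passed, move to the next stage
        else:
            raise RuntimeError(f"{stage.__name__} never reached {threshold}")
    return draft

# Stub agents standing in for Writer -> Editor -> Reviewer.
def writer(d): return d + " [written]", 80
def editor(d): return d + " [edited]", 90
def reviewer(d): return d + " [reviewed]", 75

article = run_pipeline("idea", [writer, editor, reviewer])
```

The point of the sketch: the skill owns the ordering and the thresholds, so every invocation enforces the same standard regardless of which agent is doing the work.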

Skills we use daily

| Skill | What It Does |
|-------|--------------|
| /write-post | Full article production with anti-AI-detection checks |
| /review-content | 4-dimension scoring: SEO, quality, engagement, conversion |
| /review-diagram | 4 critic agents: layout, brand, storytelling, export |
| /drawio | Architecture diagrams with 648 Azure2 icons |
| /diagram-architect | Plan layout before building (coordinates, icons, edges) |
| /capture-idea | Store raw brain dump as structured backlog item |

The Pipeline That Produces Articles

Here is the full lifecycle, from idea to published article:

  1. Capture (/capture-idea) - raw brain dump to structured idea
  2. Plan (/plan-content) - score and prioritize against existing content
  3. Research (Researcher agent + Microsoft Learn MCP) - verify facts against official docs
  4. Write (Writer agent, site-specific voice) - draft with opinions and real examples
  5. Edit (Editor agent, 10 quality gates) - no em dashes, no employer names, no AI slop
  6. Review (/review-content) - score 0-100, fix until >= 70
  7. Visual Plan (Visual Strategist) - what diagrams, where, what type
  8. Diagram Architect - calculate coordinates, resolve icons, plan edges
  9. Diagram Build (Draw.io MCP) - create on canvas or batch generate
  10. Diagram Review (4 critics) - layout, brand, storytelling, export
  11. Export (draw.io CLI) - .drawio to SVG with embedded icons
  12. Publish (Publisher agent) - git commit, push, verify live deployment
  13. LinkedIn (LinkedIn Writer) - algorithm-optimized post

Each step has a defined input, output, and quality gate. If any gate fails, work goes back to the responsible agent with specific fix instructions.

13-step pipeline from capture through publish with red feedback loops for failed reviews

The red dashed arrows are the feedback loops. A content review score below 70 sends the article back to the writer. A diagram review failure sends it back to the builder. No step is “done” until its quality gate passes.

This pipeline produced every article on this site, including the governance series, environment strategy guide, and the FetchXML deep-dive.

MCP Servers: Extending the Agent’s Reach

Model Context Protocol servers give your agents access to external tools and data. Three MCP servers power our setup:

| MCP Server | What It Provides |
|------------|------------------|
| Microsoft Learn | Search and fetch official Microsoft documentation |
| Draw.io | Programmatic diagram creation with Azure2 icons |
| Google Trends | Trending topics and news for content research |

The key lesson with MCP: always verify capabilities by testing, not by assuming. Our Draw.io MCP server required a browser open on localhost:3000 for the WebSocket connection. The tools list said nothing about this requirement. We discovered it when every tool call hung indefinitely.
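One cheap preflight test that would have caught our hanging tool calls: verify the endpoint accepts TCP connections before issuing anything. A sketch (the host and port match our local Draw.io setup; yours may differ):

```python
import socket

def server_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Fail fast instead of letting every MCP tool call hang indefinitely.
if not server_reachable("localhost", 3000):
    print("Draw.io MCP backend not reachable: open a browser on localhost:3000 first")
```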

What Went Wrong (Twice)

Before sharing the lessons, here are two disasters that shaped the system.

The Dark Mode Export Disaster. We built 8 diagrams, exported them to SVG, deployed to production, and declared victory. Then we checked the live site in dark mode. Every diagram had a black background with dark green shapes - completely unreadable. The root cause: --svg-theme auto uses CSS light-dark() which responds to the OS dark mode preference, not the site’s theme toggle. The fix was --svg-theme light with a CSS card wrapper in the blog layout. We pushed 4 broken deploys before getting it right.

The HTML Encoding Bug. Draw.io stores HTML in XML attributes. The value <b>Dev</b> must be encoded as &lt;b&gt;Dev&lt;/b&gt; in the .drawio file. Draw.io then decodes it and renders the HTML. Our batch generator had the encoding wrong - first too little (raw tags in XML = malformed), then too much (double-encoded = literal &lt;b&gt; showing on screen). We deployed raw HTML tags to production twice before comparing a working file against a broken one and finding the difference.
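The correct behavior is exactly one round of entity encoding, which Python's standard library makes easy to demonstrate (a minimal sketch of the bug, not our actual generator code):

```python
from html import escape, unescape

label = "<b>Dev</b>"

encoded = escape(label)    # correct: single encoding for a .drawio value attribute
double = escape(encoded)   # the "too much" bug: double-encoded entities

# Draw.io decodes the attribute once before rendering the HTML label:
assert unescape(encoded) == label    # decodes back to real tags: bold "Dev"
assert unescape(double) == encoded   # decodes to literal &lt;b&gt; text on screen
```

Comparing `encoded` against `double` is precisely the diff we eventually ran between a working file and a broken one.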

Both bugs had the same root cause: no visual review step in the automated pipeline.

Quality feedback loop: build, review with 4 critics, score check, fix if below 70 or ship if passing

The fix was a mandatory quality loop. Build the diagram, run 4 critics, check the combined score. Below 70? Back to the builder with specific fix instructions. Above 70? Export and deploy. No exceptions. One caveat: programmatic critics are not enough on their own. The Layout Critic and Brand Critic both scored 100/100, yet neither could see that the exported SVG looked terrible. So the loop ends with a final check: always verify the deployed output with WebFetch or a screenshot before declaring success.
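The combined-score gate can be sketched as a weighted average over critics (the weights here are illustrative, not the exact rubric):

```python
def combined_score(scores: dict[str, int], weights: dict[str, float]) -> float:
    """Weighted average of per-critic scores on a 0-100 scale."""
    total_weight = sum(weights.values())
    return sum(scores[name] * weights[name] for name in scores) / total_weight

# Two critics at 100 cannot mask two failing dimensions.
critic_scores = {"layout": 100, "brand": 100, "storytelling": 40, "export": 30}
weights = {"layout": 0.25, "brand": 0.25, "storytelling": 0.25, "export": 0.25}

score = combined_score(critic_scores, weights)  # below 70: back to the builder
ship = score >= 70
```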

What I Learned Building This System

1. Never accept “can’t do it” from the agent

When Claude said it could not export SVGs, the correct response was “research how.” The draw.io desktop CLI had --export --format svg all along. Push back on capability claims.
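Once the flags were found, the export step reduces to a single CLI call. A sketch of how a script might assemble it, using only the flags named in this article (the `drawio` binary path is machine-specific):

```python
import subprocess

def export_svg_cmd(src: str, out: str) -> list[str]:
    """Build the draw.io desktop CLI invocation for a light-theme SVG export."""
    return [
        "drawio",                # path to the draw.io desktop binary (machine-specific)
        "--export", src,
        "--format", "svg",
        "--svg-theme", "light",  # never "auto": it follows the OS, not the site theme
        "--embed-svg-images",
        "--output", out,
    ]

cmd = export_svg_cmd("pipeline.drawio", "pipeline.svg")
# subprocess.run(cmd, check=True)  # uncomment on a machine with draw.io installed
```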

2. Test on the live site, not in the editor

We built 8 diagrams before checking the deployed result. The first deploy showed dark backgrounds and broken icons. If we had checked after diagram 1, we would have saved 7 iterations.

3. Build the review system before the production system

We built the content pipeline first, then realized we had no quality gates for diagrams. Building the 4 critic agents BEFORE batch-generating 21 diagrams would have caught the arrow alignment, broken icons, and dark mode issues before they hit production.

4. Memory prevents repeat mistakes

Every debugging session produced a feedback memory entry. "SVG export needs `--svg-theme light`" is the kind of knowledge that saves 30 minutes per session when it is in memory versus being rediscovered.

5. Think systems, not tasks

“Fix this diagram” is a task. “Build a pipeline that produces correct diagrams every time” is a system. The system takes longer to build but pays back on every subsequent diagram.

Getting Started: Your First 30 Minutes

  1. Create CLAUDE.md in your project root. Point it to your coding standards.
  2. Create one feedback memory for your strongest preference (naming conventions, test patterns, deployment rules).
  3. Create one agent for the task you do most often. Define role, input, output, quality checks.
  4. Create one skill that invokes that agent with /your-command.
  5. Run it and iterate. The first version will be rough. By version 3, it will be faster than doing it manually.
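For step 4, a Claude Code skill is a markdown file in `.claude/commands/` whose name becomes the slash command. A minimal hypothetical command (file name, agent path, and wording are examples, not a prescribed layout):

```markdown
<!-- .claude/commands/review-draft.md -->
Run the reviewer agent on the file given as $ARGUMENTS:
1. Read agents/reviewer.md and follow its process.
2. Score the draft 0-100 against its quality checks.
3. If the score is below 70, list specific fixes and stop. Otherwise approve.
```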

The gap between “using Claude Code” and “building with Claude Code” is the same gap between typing commands and writing scripts. One is interactive. The other is infrastructure.

Build the infrastructure.
