Spec-Driven Development with Claude: A Professional Guide to Consistent AI-Assisted Coding
Learn how to use spec-driven development with Claude to get consistent, high-quality results. Master context engineering, structured prompting, and professional workflows that turn AI into a reliable development partner.
The difference between amateurs and professionals using AI isn’t the model — it’s the spec.
Most developers use Claude the way they’d Google a question: type something vague, hope for the best, retry when it misses. This works for quick scripts. It falls apart for production software.
Spec-driven development is the practice of writing a structured specification before you prompt — then using that spec as the contract between you and the model. It’s the single most effective technique for getting consistent, high-quality code from Claude.
This post covers: why specs matter, how to write them, context engineering, and reusable agent workflows for repetitive tasks.
The Same Task, Two Ways
Without a Spec
Add rate limiting to our API
Claude installs express-rate-limit, uses in-memory storage, applies a blanket 100 req/min limit, and invents its own error format. It works in isolation — but doesn’t know you have Redis, doesn’t match your error format, and won’t survive a restart. You spend 30 minutes rewriting it.
With a Spec
## Spec: Add Rate Limiting to API Routes
### Goal
Add rate limiting to all /api/* routes to prevent abuse.
We've had three incidents this quarter from bot traffic.
### Context
- Framework: Express on Node.js 22
- Existing middleware chain: auth → validate → handler
- Redis is already available via `src/lib/redis.ts`
- No rate limiting exists today
### Requirements
- 100 req/min per API key (authenticated), 20 req/min per IP (unauthenticated)
- Return 429 with `Retry-After` header when exceeded
- State must survive restarts (use Redis)
- Must not add >5ms p99 latency
### Constraints
- Do not use express-rate-limit (memory leak issues)
- Do not modify the auth middleware
- Must work with our existing Redis connection pool
- Tests must use real Redis, not mocks
### Examples
HTTP/1.1 429 Too Many Requests
Retry-After: 47
Content-Type: application/json
{"error": "rate_limit_exceeded", "retryAfter": 47}Claude produces a custom sliding-window implementation using your Redis connection, slots it into your middleware chain, matches your error format, and writes tests against real Redis. First attempt is production-ready.
30 lines. 5 minutes to write. Eliminates entire categories of rework.
The core insight: the quality of your output is bounded by the quality of your specification. When Claude generates code that doesn’t fit your project, it’s not hallucinating — it’s doing its best with incomplete information.
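To make the spec concrete, here is a minimal sketch of what a conforming implementation could look like. It swaps the spec's Redis store for an in-memory Map so the example is self-contained and runnable; `SlidingWindowLimiter` and its parameters are illustrative names invented for this sketch, not part of any real middleware.

```typescript
// Sketch only: in-memory sliding window in place of the Redis store
// the spec requires, so the example runs standalone.
type WindowState = { timestamps: number[] };

class SlidingWindowLimiter {
  private windows = new Map<string, WindowState>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns null when the request is allowed, or the number of
  // seconds to send in the Retry-After header when it is not.
  check(key: string, now: number = Date.now()): number | null {
    const state = this.windows.get(key) ?? { timestamps: [] };
    // Drop entries that have aged out of the window.
    state.timestamps = state.timestamps.filter(t => now - t < this.windowMs);
    if (state.timestamps.length >= this.limit) {
      const oldest = state.timestamps[0];
      const retryAfterMs = this.windowMs - (now - oldest);
      return Math.ceil(retryAfterMs / 1000);
    }
    state.timestamps.push(now);
    this.windows.set(key, state);
    return null;
  }
}

// 100 requests per minute per API key, as in the spec.
const limiter = new SlidingWindowLimiter(100, 60_000);

const now = Date.now();
for (let i = 0; i < 100; i++) {
  limiter.check("api-key-abc", now); // first 100 are allowed
}
const retryAfter = limiter.check("api-key-abc", now); // 101st is rejected
```

In a real middleware the `check` result would drive the 429 response and `Retry-After` header from the spec's Examples section, with the Map replaced by the shared Redis connection.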
The Three-Step Workflow
1. Write the Spec Before You Prompt
A spec covers five things, each eliminating a different type of guesswork:
| Section | What it prevents |
|---|---|
| Goal — What and why | Claude solving the wrong problem |
| Context — What exists | Claude ignoring your architecture |
| Requirements — What must be true | “It works but it’s incomplete” output |
| Constraints — What to avoid | Choices you’ll have to undo |
| Examples — What success looks like | Format and structure mismatches |
2. Feed the Spec as Your Primary Input
Don’t drip-feed requirements across messages. Give Claude the full spec upfront:
- CLI: “Read `docs/specs/rate-limiting.md` and implement it”
- IDE: Select the spec file and include it in chat
- CLAUDE.md: Put project-wide conventions there so they load automatically
3. Iterate on the Spec, Not the Code
When output isn’t right, don’t patch with follow-up prompts. Ask: “What was missing from my spec?” Then update it:
- Claude used a wrong library → add to Constraints
- Claude invented a format → add to Examples
- Claude missed an existing utility → add to Context
- Claude misinterpreted a requirement → make it specific in Requirements
The spec is the source of truth. The code is the artifact.
Context Engineering: The Real Skill
Context engineering is deliberately shaping everything the model sees — not just your prompt, but files, conventions, and constraints in the model’s working memory. Not all context is equal: relevant context produces relevant output, irrelevant context produces noise.
The Four Layers
┌─────────────┐
│ Your Prompt │ ← Strongest signal (most recent)
├─────────────┤
│ Active Files │ ← Shapes style and integration
├─────────────┤
│ Project Rules│ ← CLAUDE.md (auto-loaded)
├─────────────┤
│ Codebase │ ← Discoverable, but only when prompted
└─────────────┘
Project Rules (CLAUDE.md)
The most underused power feature. Loaded into every conversation automatically — encode decisions that should never be re-discussed:
# CLAUDE.md
## Architecture
- Monorepo with Turborepo
- Backend: Express + TypeScript in /apps/api
- Frontend: Next.js 15 in /apps/web
## Conventions
- Named exports only, never default exports
- Error responses follow RFC 7807
- All DB queries go through repository classes
- Zod for runtime validation at API boundaries
## Testing
- Unit: Vitest with in-memory doubles
- Integration: Vitest with real Postgres (testcontainers)
- Never mock the database in integration tests
## Do Not
- Do not use `any` — use `unknown` and narrow
- Do not add barrel files (index.ts re-exports)
- Do not use classes for React components

This transfers institutional knowledge into Claude’s context. Without it, you’re re-explaining your architecture every session. You can also create nested CLAUDE.md files in subdirectories for area-specific conventions.
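As an illustration of two of the sample conventions above (no `any`, and RFC 7807 error responses), here is a hedged sketch. The `ProblemDetails` interface and `isProblemDetails` guard are invented for this example; RFC 7807 defines the member names, but the guard itself is not from any library.

```typescript
// Illustrative: narrow an `unknown` value instead of casting through `any`.
// Member names (type, title, status, detail) follow RFC 7807.
interface ProblemDetails {
  type: string;    // URI identifying the problem type
  title: string;   // short human-readable summary
  status: number;  // HTTP status code
  detail?: string; // occurrence-specific explanation
}

function isProblemDetails(value: unknown): value is ProblemDetails {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.type === "string" &&
    typeof v.title === "string" &&
    typeof v.status === "number"
  );
}

// A parsed response body arrives as `unknown`...
const body: unknown = JSON.parse(
  '{"type":"https://example.com/rate-limit","title":"Too Many Requests","status":429}'
);

// ...and is only treated as ProblemDetails after the guard narrows it.
const problem = isProblemDetails(body) ? body : null;
```

With the convention in CLAUDE.md, Claude produces this narrowing pattern instead of `body as any` without being asked each session.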
Active Files
Always make Claude read relevant files before generating code. The most common mistake: asking Claude to write code that integrates with existing code without showing it the existing code.
- “Read `src/middleware/auth.ts` before implementing”
- “Follow the same pattern as `src/routes/users.ts`”
Pro tip: Show Claude your best existing code as a reference. It will mirror the style and patterns — more effective than written rules.
Conversation Management
Long conversations degrade output quality — older context gets compressed and loses influence.
- Start fresh for each task — don’t reuse conversations across tasks
- Re-anchor in long sessions: “Remember, errors follow RFC 7807”
- Use checkpoints — summarize what was built after each milestone
- Use Claude’s memory — “Remember that our API uses RFC 7807” persists across sessions automatically
Spec Templates
The rate limiting example above is a Feature Spec. Here’s the shape for other common task types:
Bug Fix Spec
## Bug: [Title — what's broken, not just the symptom]
### Observed vs Expected
[What happens] → [What should happen]
### Reproduction
[Concrete steps or a failing test case]
### Context
- Likely location: [file paths, line numbers]
- Recent changes: [relevant PRs or deploys]
- Scope: [who/what is affected]
### Constraints
[What must NOT change — API surface, idempotency, etc.]

Refactor Spec
## Refactor: [What's changing and why]
### Current State → Target State
[How it works today] → [How it should work after]
Reference files: [specific paths]
### Migration Strategy
- [ ] Step 1 (lowest risk)
- [ ] Step 2
- [ ] Step 3
### Invariants (must NOT change)
[Public API, behavior, performance characteristics]

Advanced Techniques
Spec Layering
For large features, don’t write one massive spec. Layer them with fresh conversations:
spec-v1.md → Data model and repository
spec-v2.md → API endpoints (references v1 files)
spec-v3.md → Frontend (references v2 files)
spec-v4.md → Tests and edge cases
Golden Files
Include concrete output examples in your spec. Golden files give Claude a target, not just abstract requirements:
Input: { "userId": "abc-123", "action": "purchase", "amount": 49.99 }
Output: { "eventType": "user.purchase", "payload": { "userId": "abc-123", "amount": 49.99 }, "metadata": { "timestamp": "ISO-8601", "traceId": "uuid" } }Especially effective for data transformations, API responses, and config generation.
Anti-Spec
List what Claude should not do. This eliminates “Claude-ish” code — over-engineered, over-commented patterns:
### DO NOT
- Wrapper functions for single-use operations
- Try-catch around operations that can't fail
- Comments that restate the code
- TypeScript `any` for convenience
- Separate files for single-type exports

Put your most common frustrations here and they stop appearing.
Custom Slash Commands: Real Workflow Automation
Claude Code has a built-in feature for building reusable workflows: custom slash commands. Instead of asking Claude to read a workflow file each time, you create a .claude/ directory structure and invoke workflows with /feature load, /feature start, /feature review — first-class commands, not conventions.
Directory Structure
.claude/
└── feature/
├── SKILL.md ← Entry point (name, description, argument routing)
└── actions/
├── load.md ← /feature load
├── start.md ← /feature start
├── review.md ← /feature review
├── test.md ← /feature test
├── explain.md ← /feature explain
└── complete.md ← /feature complete
The Entry Point: SKILL.md
SKILL.md defines the command and routes arguments to action files:
---
name: feature
description: Manage current feature workflow - start, review, explain or complete
argument-hint: load|start|review|test|explain|complete
---
# Feature Workflow
Manages the full lifecycle of a feature from spec to merge.
## Working File
@context/current-feature.md
## Task
Execute the requested action: $ARGUMENTS
| Action | Description |
| ---------- | ------------------------------------------ |
| `load` | Load a feature spec or inline description |
| `start` | Begin implementation, create branch |
| `review` | Check goals met, code quality |
| `test` | Write tests for server actions & utilities |
| `explain` | Document what changed and why |
| `complete` | Commit, push, merge, reset |
See [actions/](actions/) for detailed instructions.

The frontmatter gives Claude the command name and argument hints. $ARGUMENTS passes whatever you type after /feature. The working file (context/current-feature.md) holds workflow state that persists across steps — goals, status, and history.
Action Files
Each action is a focused markdown file with clear, numbered instructions. Here’s what actions/start.md looks like:
# Start Action
1. Read current-feature.md - verify Goals are populated
2. If empty, error: "Run /feature load first"
3. Set Status to "In Progress"
4. Create and checkout the feature branch (derive name from H1 heading)
5. List the goals, then implement them one by one

And actions/review.md:
# Review Action
1. Read current-feature.md to understand the goals
2. Review all code changes made for this feature
3. Check for:
- ✅ Goals met
- ❌ Goals missing or incomplete
- ⚠️ Code quality issues or bugs
- 🚫 Scope creep (code beyond goals)
4. Final verdict: Ready to complete or needs changes

Short, imperative, no ambiguity. Claude follows the steps exactly.
The Full Lifecycle
/feature load rate-limiting ← Load spec, extract goals, set status
/feature start ← Create branch, implement goal by goal
/feature review ← Check goals met, flag issues
/feature test ← Write tests for new logic
/feature explain ← Document what changed and why
/feature complete ← Commit, merge, push, clean up
Each step reads the shared state file, does its job, and updates the state. The complete action handles the full git workflow — commit, merge to main, delete the feature branch, reset the state file, and push. No manual git commands.
State Management
The key pattern that makes this work across steps is a working file that tracks the current feature:
# Current Feature: Add Rate Limiting
## Status
In Progress
## Goals
- Add sliding-window rate limiting to /api/* routes
- 100 req/min per API key, 20 req/min per IP
- Use existing Redis connection
- Return 429 with Retry-After header
## Notes
- Do not use express-rate-limit (memory leak issues)
- Must not add >5ms p99 latency
## History
- [2024-03-01] Add dark mode toggle
- [2024-02-15] Refactor auth middleware

This is how the review action knows what to check against, how test knows what logic was added, and how complete knows what to write in the commit message — without you re-explaining anything.
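A sketch of how an action could read that state file. The parsing helper below is hypothetical, keyed to the section names in the working file shown above; a real action would have Claude read the markdown directly, but the same extraction logic applies.

```typescript
// Hypothetical reader for context/current-feature.md.
interface FeatureState {
  title: string;
  status: string;
  goals: string[];
}

function parseFeatureState(markdown: string): FeatureState {
  const title = markdown.match(/^# Current Feature: (.+)$/m)?.[1] ?? "";
  // Grab the body of a "## <name>" section up to the next heading or EOF.
  const section = (name: string): string =>
    markdown.match(new RegExp(`## ${name}\\n([\\s\\S]*?)(?=\\n## |$)`))?.[1].trim() ?? "";
  const status = section("Status");
  const goals = section("Goals")
    .split("\n")
    .filter(line => line.startsWith("- "))
    .map(line => line.slice(2));
  return { title, status, goals };
}

const state = parseFeatureState(`# Current Feature: Add Rate Limiting
## Status
In Progress
## Goals
- Use existing Redis connection
- Return 429 with Retry-After header
`);
// state.status is "In Progress"; state.goals has two entries
```

Keeping the state file in a rigid, headed format is what makes it reliably readable by every action in the lifecycle, whether the reader is code or Claude itself.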
Build Your Own
The same pattern works for any repeatable task:
.claude/
├── feature/ ← /feature load|start|review|test|explain|complete
├── bugfix/ ← /bugfix reproduce|fix|verify|complete
├── refactor/ ← /refactor audit|plan|execute|validate
└── api/ ← /api design|implement|test|document
Each gets its own SKILL.md, its own actions/ directory, and optionally its own state file. Version-control the whole .claude/ directory — new team members get your workflows on git clone.
Claude Code Features That Complement This
- CLAUDE.md — Project conventions loaded automatically into every session (version-controlled, team-shared)
- Memory — Personal preferences persisted across sessions (“Remember our API uses RFC 7807”)
- Hooks — Shell commands that run before/after Claude actions (auto-lint, auto-format, pre-commit checks)
- Custom slash commands — The `.claude/` directory structure described above. Build full lifecycle workflows (`/feature`, `/bugfix`, `/refactor`) with state management, argument routing, and step-by-step action files
Common Mistakes
| Mistake | Fix |
|---|---|
| Prompting without context | Write a spec with Context and Constraints |
| One giant conversation | Start fresh per task, re-anchor key constraints |
| Patching output with follow-up prompts | Update the spec and re-generate |
| Not showing existing code | Read reference files first |
| Over-specifying the how | Specify what and why, let Claude choose how |
| Never updating CLAUDE.md | Update it whenever you correct Claude |
How to Know It’s Working
Working: Claude’s first output needs minor tweaks, not rewrites. Code across sessions follows the same patterns. Follow-up messages are “ship it” more than “actually, change this.”
Not working: You’re re-explaining architecture every session. Claude keeps using patterns you’ve told it not to. Every conversation is a 30-message back-and-forth.
The fix is almost always the same: your spec or CLAUDE.md is missing information Claude needs.
Getting Started
This week: Create a CLAUDE.md with your top 5 conventions and a “Do Not” list. Write one spec for your next feature and compare the output quality.
This month: Add anti-patterns to CLAUDE.md. Build one reusable workflow for your most common task type.
This quarter: Share spec templates across your team. Build workflows for all major task types. Version-control everything.
The core loop: write spec → generate code → find what the spec was missing → improve the spec. Every iteration makes the next one faster.
Further Reading
- Anthropic’s Guide to Prompt Engineering — foundational principles for structuring inputs to Claude
- Claude Code Documentation — CLAUDE.md, memory, hooks, and project setup
- Building Effective Agents — Anthropic — research on agentic workflows and tool use