Spec-Driven Development with Claude: A Professional Guide to Consistent AI-Assisted Coding
Learn how to use spec-driven development with Claude to get consistent, high-quality results. Master context engineering, structured prompting, and professional workflows that turn AI into a reliable development partner.
The difference between amateurs and professionals using AI isn’t the model — it’s the spec.
Most developers use Claude the way they’d Google a question: type something vague, hope for the best, retry when it misses. This works for quick scripts. It falls apart for production software.
Spec-driven development is the practice of writing a structured specification before you prompt — then using that spec as the contract between you and the model. It’s the single most effective technique for getting consistent, high-quality code from Claude.
This post covers: why specs matter, how to write them, context engineering, and reusable agent workflows for repetitive tasks.
The Same Task, Two Ways
Without a Spec
Add rate limiting to our API
Claude installs express-rate-limit, uses in-memory storage, applies a blanket 100 req/min limit, and invents its own error format. It works in isolation — but doesn’t know you have Redis, doesn’t match your error format, and won’t survive a restart. You spend 30 minutes rewriting it.
With a Spec
## Spec: Add Rate Limiting to API Routes
### Goal
Add rate limiting to all /api/* routes to prevent abuse.
We've had three incidents this quarter from bot traffic.
### Context
- Framework: Express on Node.js 22
- Existing middleware chain: auth → validate → handler
- Redis is already available via `src/lib/redis.ts`
- No rate limiting exists today
### Requirements
- 100 req/min per API key (authenticated), 20 req/min per IP (unauthenticated)
- Return 429 with `Retry-After` header when exceeded
- State must survive restarts (use Redis)
- Must not add >5ms p99 latency
### Constraints
- Do not use express-rate-limit (memory leak issues)
- Do not modify the auth middleware
- Must work with our existing Redis connection pool
- Tests must use real Redis, not mocks
### Examples
HTTP/1.1 429 Too Many Requests
Retry-After: 47
Content-Type: application/json
{"error": "rate_limit_exceeded", "retryAfter": 47}Claude produces a custom sliding-window implementation using your Redis connection, slots it into your middleware chain, matches your error format, and writes tests against real Redis. First attempt is production-ready.
30 lines. 5 minutes to write. Eliminates entire categories of rework.
The core insight: the quality of your output is bounded by the quality of your specification. When Claude generates code that doesn’t fit your project, it’s not hallucinating — it’s doing its best with incomplete information.
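To make the spec concrete, here is a minimal sketch of what a conforming implementation could look like. It swaps the spec's Redis store for an in-memory Map so the example is self-contained and runnable; `SlidingWindowLimiter` and its parameters are illustrative names invented for this sketch, not part of any real middleware.

```typescript
// Sketch only: in-memory sliding window in place of the Redis store
// the spec requires, so the example runs standalone.
type WindowState = { timestamps: number[] };

class SlidingWindowLimiter {
  private windows = new Map<string, WindowState>();

  constructor(private limit: number, private windowMs: number) {}

  // Returns null when the request is allowed, or the number of
  // seconds to send in the Retry-After header when it is not.
  check(key: string, now: number = Date.now()): number | null {
    const state = this.windows.get(key) ?? { timestamps: [] };
    // Drop entries that have aged out of the window.
    state.timestamps = state.timestamps.filter(t => now - t < this.windowMs);
    if (state.timestamps.length >= this.limit) {
      const oldest = state.timestamps[0];
      const retryAfterMs = this.windowMs - (now - oldest);
      return Math.ceil(retryAfterMs / 1000);
    }
    state.timestamps.push(now);
    this.windows.set(key, state);
    return null;
  }
}

// 100 requests per minute per API key, as in the spec.
const limiter = new SlidingWindowLimiter(100, 60_000);

const now = Date.now();
for (let i = 0; i < 100; i++) {
  limiter.check("api-key-abc", now); // first 100 are allowed
}
const retryAfter = limiter.check("api-key-abc", now); // 101st is rejected
```

In a real middleware the `check` result would drive the 429 response and `Retry-After` header from the spec's Examples section, with the Map replaced by the shared Redis connection.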
The Three-Step Workflow
1. Write the Spec Before You Prompt
A spec covers five things, each eliminating a different type of guesswork:
| Section | What it prevents |
|---|---|
| Goal — What and why | Claude solving the wrong problem |
| Context — What exists | Claude ignoring your architecture |
| Requirements — What must be true | “It works but it’s incomplete” output |
| Constraints — What to avoid | Choices you’ll have to undo |
| Examples — What success looks like | Format and structure mismatches |
2. Feed the Spec as Your Primary Input
Don’t drip-feed requirements across messages. Give Claude the full spec upfront:
- CLI: “Read `docs/specs/rate-limiting.md` and implement it”
- IDE: Select the spec file and include it in chat
- CLAUDE.md: Put project-wide conventions there so they load automatically
3. Iterate on the Spec, Not the Code
When output isn’t right, don’t patch with follow-up prompts. Ask: “What was missing from my spec?” Then update it:
- Claude used a wrong library → add to Constraints
- Claude invented a format → add to Examples
- Claude missed an existing utility → add to Context
- Claude misinterpreted a requirement → make it specific in Requirements
The spec is the source of truth. The code is the artifact.
Context Engineering: The Real Skill
Context engineering is deliberately shaping everything the model sees — not just your prompt, but files, conventions, and constraints in the model’s working memory. Not all context is equal: relevant context produces relevant output, irrelevant context produces noise.
The Four Layers
┌─────────────┐
│ Your Prompt │ ← Strongest signal (most recent)
├─────────────┤
│ Active Files │ ← Shapes style and integration
├─────────────┤
│ Project Rules│ ← CLAUDE.md (auto-loaded)
├─────────────┤
│ Codebase │ ← Discoverable, but only when prompted
└─────────────┘
Project Rules (CLAUDE.md)
The most underused power feature. Loaded into every conversation automatically — encode decisions that should never be re-discussed:
# CLAUDE.md
## Architecture
- Monorepo with Turborepo
- Backend: Express + TypeScript in /apps/api
- Frontend: Next.js 15 in /apps/web
## Conventions
- Named exports only, never default exports
- Error responses follow RFC 7807
- All DB queries go through repository classes
- Zod for runtime validation at API boundaries
## Testing
- Unit: Vitest with in-memory doubles
- Integration: Vitest with real Postgres (testcontainers)
- Never mock the database in integration tests
## Do Not
- Do not use `any` — use `unknown` and narrow
- Do not add barrel files (index.ts re-exports)
- Do not use classes for React components

This transfers institutional knowledge into Claude’s context. Without it, you’re re-explaining your architecture every session. You can also create nested CLAUDE.md files in subdirectories for area-specific conventions.
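As an illustration of two of the sample conventions above (no `any`, and RFC 7807 error responses), here is a hedged sketch. The `ProblemDetails` interface and `isProblemDetails` guard are invented for this example; RFC 7807 defines the member names, but the guard itself is not from any library.

```typescript
// Illustrative: narrow an `unknown` value instead of casting through `any`.
// Member names (type, title, status, detail) follow RFC 7807.
interface ProblemDetails {
  type: string;    // URI identifying the problem type
  title: string;   // short human-readable summary
  status: number;  // HTTP status code
  detail?: string; // occurrence-specific explanation
}

function isProblemDetails(value: unknown): value is ProblemDetails {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v.type === "string" &&
    typeof v.title === "string" &&
    typeof v.status === "number"
  );
}

// A parsed response body arrives as `unknown`...
const body: unknown = JSON.parse(
  '{"type":"https://example.com/rate-limit","title":"Too Many Requests","status":429}'
);

// ...and is only treated as ProblemDetails after the guard narrows it.
const problem = isProblemDetails(body) ? body : null;
```

With the convention in CLAUDE.md, Claude produces this narrowing pattern instead of `body as any` without being asked each session.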
Active Files
Always make Claude read relevant files before generating code. The most common mistake: asking Claude to write code that integrates with existing code without showing it the existing code.
- “Read `src/middleware/auth.ts` before implementing”
- “Follow the same pattern as `src/routes/users.ts`”
Pro tip: Show Claude your best existing code as a reference. It will mirror the style and patterns — more effective than written rules.
Conversation Management
Long conversations degrade output quality — older context gets compressed and loses influence.
- Start fresh for each task — don’t reuse conversations across tasks
- Re-anchor in long sessions: “Remember, errors follow RFC 7807”
- Use checkpoints — summarize what was built after each milestone
- Use Claude’s memory — “Remember that our API uses RFC 7807” persists across sessions automatically
Spec Templates
The rate limiting example above is a Feature Spec. Here’s the shape for other common task types:
Bug Fix Spec
## Bug: [Title — what's broken, not just the symptom]
### Observed vs Expected
[What happens] → [What should happen]
### Reproduction
[Concrete steps or a failing test case]
### Context
- Likely location: [file paths, line numbers]
- Recent changes: [relevant PRs or deploys]
- Scope: [who/what is affected]
### Constraints
[What must NOT change — API surface, idempotency, etc.]

Refactor Spec
## Refactor: [What's changing and why]
### Current State → Target State
[How it works today] → [How it should work after]
Reference files: [specific paths]
### Migration Strategy
- [ ] Step 1 (lowest risk)
- [ ] Step 2
- [ ] Step 3
### Invariants (must NOT change)
[Public API, behavior, performance characteristics]

Advanced Techniques
Spec Layering
For large features, don’t write one massive spec. Layer them with fresh conversations:
spec-v1.md → Data model and repository
spec-v2.md → API endpoints (references v1 files)
spec-v3.md → Frontend (references v2 files)
spec-v4.md → Tests and edge cases
Golden Files
Include concrete output examples in your spec. Golden files give Claude a target, not just abstract requirements:
Input: { "userId": "abc-123", "action": "purchase", "amount": 49.99 }
Output: { "eventType": "user.purchase", "payload": { "userId": "abc-123", "amount": 49.99 }, "metadata": { "timestamp": "ISO-8601", "traceId": "uuid" } }Especially effective for data transformations, API responses, and config generation.
Anti-Spec
List what Claude should not do. This eliminates “Claude-ish” code — over-engineered, over-commented patterns:
### DO NOT
- Wrapper functions for single-use operations
- Try-catch around operations that can't fail
- Comments that restate the code
- TypeScript `any` for convenience
- Separate files for single-type exports

Put your most common frustrations here and they stop appearing.
Custom Slash Commands: Real Workflow Automation
Claude Code has a built-in feature for building reusable workflows: custom slash commands. Instead of asking Claude to read a workflow file each time, you create a .claude/ directory structure and invoke workflows with /feature load, /feature start, /feature review — first-class commands, not conventions.
Directory Structure
.claude/
└── feature/
├── SKILL.md ← Entry point (name, description, argument routing)
└── actions/
├── load.md ← /feature load
├── start.md ← /feature start
├── review.md ← /feature review
├── test.md ← /feature test
├── explain.md ← /feature explain
└── complete.md ← /feature complete
The Entry Point: SKILL.md
SKILL.md defines the command and routes arguments to action files:
---
name: feature
description: Manage current feature workflow - start, review, explain or complete
argument-hint: load|start|review|test|explain|complete
---
# Feature Workflow
Manages the full lifecycle of a feature from spec to merge.
## Working File
@context/current-feature.md
## Task
Execute the requested action: $ARGUMENTS
| Action | Description |
| ---------- | ------------------------------------------ |
| `load` | Load a feature spec or inline description |
| `start` | Begin implementation, create branch |
| `review` | Check goals met, code quality |
| `test` | Write tests for server actions & utilities |
| `explain` | Document what changed and why |
| `complete` | Commit, push, merge, reset |
See [actions/](actions/) for detailed instructions.

The frontmatter gives Claude the command name and argument hints. $ARGUMENTS passes whatever you type after /feature. The working file (context/current-feature.md) holds workflow state that persists across steps — goals, status, and history.
Action Files
Each action is a focused markdown file with clear, numbered instructions. Here’s what actions/start.md looks like:
# Start Action
1. Read current-feature.md - verify Goals are populated
2. If empty, error: "Run /feature load first"
3. Set Status to "In Progress"
4. Create and checkout the feature branch (derive name from H1 heading)
5. List the goals, then implement them one by one

And actions/review.md:
# Review Action
1. Read current-feature.md to understand the goals
2. Review all code changes made for this feature
3. Check for:
- ✅ Goals met
- ❌ Goals missing or incomplete
- ⚠️ Code quality issues or bugs
- 🚫 Scope creep (code beyond goals)
4. Final verdict: Ready to complete or needs changes

Short, imperative, no ambiguity. Claude follows the steps exactly.
The Full Lifecycle
/feature load rate-limiting ← Load spec, extract goals, set status
/feature start ← Create branch, implement goal by goal
/feature review ← Check goals met, flag issues
/feature test ← Write tests for new logic
/feature explain ← Document what changed and why
/feature complete ← Commit, merge, push, clean up
Each step reads the shared state file, does its job, and updates the state. The complete action handles the full git workflow — commit, merge to main, delete the feature branch, reset the state file, and push. No manual git commands.
State Management
The key pattern that makes this work across steps is a working file that tracks the current feature:
# Current Feature: Add Rate Limiting
## Status
In Progress
## Goals
- Add sliding-window rate limiting to /api/* routes
- 100 req/min per API key, 20 req/min per IP
- Use existing Redis connection
- Return 429 with Retry-After header
## Notes
- Do not use express-rate-limit (memory leak issues)
- Must not add >5ms p99 latency
## History
- [2024-03-01] Add dark mode toggle
- [2024-02-15] Refactor auth middleware

This is how the review action knows what to check against, how test knows what logic was added, and how complete knows what to write in the commit message — without you re-explaining anything.
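A sketch of how an action could read that state file. The parsing helper below is hypothetical, keyed to the section names in the working file shown above; a real action would have Claude read the markdown directly, but the same extraction logic applies.

```typescript
// Hypothetical reader for context/current-feature.md.
interface FeatureState {
  title: string;
  status: string;
  goals: string[];
}

function parseFeatureState(markdown: string): FeatureState {
  const title = markdown.match(/^# Current Feature: (.+)$/m)?.[1] ?? "";
  // Grab the body of a "## <name>" section up to the next heading or EOF.
  const section = (name: string): string =>
    markdown.match(new RegExp(`## ${name}\\n([\\s\\S]*?)(?=\\n## |$)`))?.[1].trim() ?? "";
  const status = section("Status");
  const goals = section("Goals")
    .split("\n")
    .filter(line => line.startsWith("- "))
    .map(line => line.slice(2));
  return { title, status, goals };
}

const state = parseFeatureState(`# Current Feature: Add Rate Limiting
## Status
In Progress
## Goals
- Use existing Redis connection
- Return 429 with Retry-After header
`);
// state.status is "In Progress"; state.goals has two entries
```

Keeping the state file in a rigid, headed format is what makes it reliably readable by every action in the lifecycle, whether the reader is code or Claude itself.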
Build Your Own
The same pattern works for any repeatable task:
.claude/
├── feature/ ← /feature load|start|review|test|explain|complete
├── bugfix/ ← /bugfix reproduce|fix|verify|complete
├── refactor/ ← /refactor audit|plan|execute|validate
└── api/ ← /api design|implement|test|document
Each gets its own SKILL.md, its own actions/ directory, and optionally its own state file. Version-control the whole .claude/ directory — new team members get your workflows on git clone.
Claude Code Features That Complement This
- CLAUDE.md — Project conventions loaded automatically into every session (version-controlled, team-shared)
- Memory — Personal preferences persisted across sessions (“Remember our API uses RFC 7807”)
- Hooks — Shell commands that run before/after Claude actions (auto-lint, auto-format, pre-commit checks)
- Custom slash commands — The `.claude/` directory structure described above. Build full lifecycle workflows (`/feature`, `/bugfix`, `/refactor`) with state management, argument routing, and step-by-step action files
Common Mistakes
| Mistake | Fix |
|---|---|
| Prompting without context | Write a spec with Context and Constraints |
| One giant conversation | Start fresh per task, re-anchor key constraints |
| Patching output with follow-up prompts | Update the spec and re-generate |
| Not showing existing code | Read reference files first |
| Over-specifying the how | Specify what and why, let Claude choose how |
| Never updating CLAUDE.md | Update it whenever you correct Claude |
How to Know It’s Working
Working: Claude’s first output needs minor tweaks, not rewrites. Code across sessions follows the same patterns. Follow-up messages are “ship it” more than “actually, change this.”
Not working: You’re re-explaining architecture every session. Claude keeps using patterns you’ve told it not to. Every conversation is a 30-message back-and-forth.
The fix is almost always the same: your spec or CLAUDE.md is missing information Claude needs.
Getting Started
This week: Create a CLAUDE.md with your top 5 conventions and a “Do Not” list. Write one spec for your next feature and compare the output quality.
This month: Add anti-patterns to CLAUDE.md. Build one reusable workflow for your most common task type.
This quarter: Share spec templates across your team. Build workflows for all major task types. Version-control everything.
The core loop: write spec → generate code → find what the spec was missing → improve the spec. Every iteration makes the next one faster.
Further Reading
- Anthropic’s Guide to Prompt Engineering — foundational principles for structuring inputs to Claude
- Claude Code Documentation — CLAUDE.md, memory, hooks, and project setup
- Building Effective Agents — Anthropic — research on agentic workflows and tool use