Claude Code and TDD: Testing Patterns That Actually Work
Claude Code produces its best code when given tests first. Writing the tests yourself and asking Claude Code to make them pass is faster than asking it to write code and tests simultaneously, and it produces more reliable implementations. This guide covers the TDD-first workflow, test generation for existing code, and patterns that avoid the common failure modes of AI-assisted testing. For a comprehensive Claude Code overview, see the Claude Code Complete Guide.
Why TDD + Claude Code works better than asking for both
When you ask Claude Code "implement this feature and write tests", you often get:
- Tests that only cover the happy path
- Tests written for the code rather than against the spec
- Missing edge cases that the implementation also misses
When you write the tests first and ask Claude Code to make them pass:
- The tests encode your specification exactly
- Claude Code cannot misinterpret ambiguous requirements
- Edge cases you thought of while writing tests get handled
- You catch regressions immediately because the tests already exist
The trade-off: you spend more time writing tests upfront. But for non-trivial features, this investment pays back within the same session.
The test-first workflow
Step 1: Write tests that define the complete specification
// __tests__/rate-limiter.test.ts
// Write this BEFORE implementing lib/rate-limiter.ts
import { RateLimiter } from "../lib/rate-limiter";

describe("RateLimiter", () => {
  test("allows requests within limit", async () => {
    const limiter = new RateLimiter({ max: 5, windowMs: 60_000 });
    for (let i = 0; i < 5; i++) {
      expect(await limiter.check("user-1")).toBe(true);
    }
  });

  test("blocks requests over limit", async () => {
    const limiter = new RateLimiter({ max: 5, windowMs: 60_000 });
    for (let i = 0; i < 5; i++) {
      await limiter.check("user-1");
    }
    expect(await limiter.check("user-1")).toBe(false);
  });

  test("isolates limits per key", async () => {
    const limiter = new RateLimiter({ max: 5, windowMs: 60_000 });
    for (let i = 0; i < 5; i++) {
      await limiter.check("user-1");
    }
    // user-2 should still be under limit
    expect(await limiter.check("user-2")).toBe(true);
  });

  test("resets after window expires", async () => {
    const limiter = new RateLimiter({ max: 2, windowMs: 100 });
    await limiter.check("user-1");
    await limiter.check("user-1");
    expect(await limiter.check("user-1")).toBe(false);
    await new Promise(r => setTimeout(r, 150)); // wait for window
    expect(await limiter.check("user-1")).toBe(true);
  });

  test("returns remaining count", async () => {
    const limiter = new RateLimiter({ max: 5, windowMs: 60_000 });
    const result = await limiter.checkWithInfo("user-1");
    expect(result.allowed).toBe(true);
    expect(result.remaining).toBe(4);
  });
});
Step 2: Give the tests to Claude Code
You: The tests in __tests__/rate-limiter.test.ts are all failing
because lib/rate-limiter.ts doesn't exist yet.
Implement the RateLimiter class to make all tests pass.
Requirements:
- In-memory storage only (no Redis required for the tests)
- TypeScript
- Export the class as a named export
Claude Code reads the tests, understands the full API contract from them, and implements the class. Every edge case you encoded in the tests gets handled.
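For reference, here is one minimal implementation that satisfies every test above, using a fixed-window counter. Treat it as a sketch: the class name, constructor options, and method signatures come straight from the tests, but the internals are illustrative and Claude Code's actual output will differ in detail.

// lib/rate-limiter.ts — illustrative fixed-window sketch, not the only valid answer
interface RateLimiterOptions {
  max: number;
  windowMs: number;
}

interface CheckInfo {
  allowed: boolean;
  remaining: number;
}

export class RateLimiter {
  // Per-key counters plus the timestamp at which each window resets
  private windows = new Map<string, { count: number; resetAt: number }>();

  constructor(private opts: RateLimiterOptions) {}

  async check(key: string): Promise<boolean> {
    return (await this.checkWithInfo(key)).allowed;
  }

  async checkWithInfo(key: string): Promise<CheckInfo> {
    const now = Date.now();
    const entry = this.windows.get(key);
    // New key, or the previous window expired: start a fresh window
    if (!entry || now >= entry.resetAt) {
      this.windows.set(key, { count: 1, resetAt: now + this.opts.windowMs });
      return { allowed: true, remaining: this.opts.max - 1 };
    }
    if (entry.count >= this.opts.max) {
      return { allowed: false, remaining: 0 };
    }
    entry.count += 1;
    return { allowed: true, remaining: this.opts.max - entry.count };
  }
}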
Test generation for existing code
When you have existing code without tests, Claude Code generates tests efficiently:
For a specific function:
You: Write comprehensive tests for the parseWebhookPayload() function
in lib/webhooks.ts. Cover:
- Valid Stripe webhook payloads
- Missing required fields
- Invalid signature
- Malformed JSON
- All supported event types
For a full module:
You: Generate a complete test suite for lib/payments.ts.
Look at the existing code and identify all public functions,
edge cases, and error paths. Aim for >80% coverage.
For a REST API:
You: Write integration tests for app/api/users/route.ts.
Test: GET /users (list), POST /users (create),
GET /users/:id (get), PUT /users/:id (update),
DELETE /users/:id (delete).
Include: auth required checks, validation errors, not-found cases.
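To give a feel for what prompts like these produce, here is a trimmed sketch of the API case. It assumes Next.js App Router route handlers (which take a Request and return a Response); the specific status codes and the unauthenticated/invalid-body behavior are assumptions about your handlers, not guarantees.

// __tests__/users-api.test.ts — abbreviated sketch of generated integration tests
import { GET, POST } from "../app/api/users/route";

test("GET /users returns 401 without authentication", async () => {
  const res = await GET(new Request("http://localhost/api/users"));
  expect(res.status).toBe(401);
});

test("POST /users returns 400 for an invalid body", async () => {
  const res = await POST(
    new Request("http://localhost/api/users", {
      method: "POST",
      headers: { "content-type": "application/json" },
      body: JSON.stringify({}), // missing required fields
    })
  );
  expect(res.status).toBe(400);
});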
Patterns that work reliably
Pattern 1: Tests as executable specification
Write tests before discussing implementation. This forces clarity:
// Vague spec: "implement user authentication"
// Precise spec as tests:
test("login with valid credentials returns JWT", async () => {
const result = await auth.login("user@example.com", "correct-password");
expect(result.token).toBeDefined();
expect(typeof result.token).toBe("string");
expect(result.expiresIn).toBe(3600);
});
test("login with invalid password returns 401 error", async () => {
await expect(
auth.login("user@example.com", "wrong-password")
).rejects.toThrow("Invalid credentials");
});
test("login with non-existent user returns 401 error", async () => {
// Same error as wrong password — don't leak user existence
await expect(
auth.login("ghost@example.com", "any-password")
).rejects.toThrow("Invalid credentials");
});
When you hand these tests to Claude Code, the ambiguity is gone.
Pattern 2: One assertion per test
// Bad: multiple assertions, harder to debug failures
test("user signup", async () => {
  const user = await createUser({ email: "test@example.com" });
  expect(user.id).toBeDefined();
  expect(user.email).toBe("test@example.com");
  expect(user.createdAt).toBeDefined();
  expect(user.passwordHash).not.toBe("plaintext");
  // 4 things can fail — hard to identify which
});

// Better: focused tests
test("createUser returns an id", async () => {
  const user = await createUser({ email: "test@example.com" });
  expect(user.id).toBeDefined();
});

test("createUser hashes the password", async () => {
  const user = await createUser({ email: "test@example.com", password: "secret" });
  expect(user.passwordHash).not.toBe("secret");
});
Pattern 3: Explicit test names that read as documentation
// Vague
test("user tests")
test("handles error")
test("works correctly")
// Specific (Claude Code also writes better implementations when tests are explicit)
test("createUser throws ValidationError when email is missing")
test("createUser throws ConflictError when email already exists")
test("createUser sets emailVerified to false on initial creation")
Making Claude Code tests trustworthy
AI-generated tests have two common failure modes:
1. Tests that only test the happy path
When Claude Code writes tests for existing code, it tends to test what the code does, not what it should do. Counter this with explicit instructions:
You: Generate tests for lib/payments.ts.
Include ALL of these scenarios:
- Successful payment processing
- Declined card (error code card_declined)
- Insufficient funds (error code insufficient_funds)
- Network timeout from Stripe
- Invalid amount (negative, zero, over limit)
- Missing required fields (card number, expiry, CVV)
- Duplicate transaction prevention
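For the Stripe-dependent scenarios, the generated tests should mock the Stripe client rather than hit the network, so the error paths stay deterministic. A sketch of the declined-card case, assuming a local lib/stripe wrapper and a processPayment function (both names are hypothetical stand-ins for whatever lib/payments.ts actually exports):

// __tests__/payments.test.ts — sketch of a deterministic error-path test
import { processPayment } from "../lib/payments";
import { stripe } from "../lib/stripe"; // hypothetical wrapper around the Stripe SDK

jest.mock("../lib/stripe", () => ({
  stripe: { paymentIntents: { create: jest.fn() } },
}));

test("processPayment surfaces card_declined errors", async () => {
  // Simulate Stripe rejecting the charge with a declined-card error code
  (stripe.paymentIntents.create as jest.Mock).mockRejectedValueOnce(
    Object.assign(new Error("Your card was declined."), { code: "card_declined" })
  );
  await expect(
    processPayment({ amount: 1000, currency: "usd", source: "tok_visa" })
  ).rejects.toMatchObject({ code: "card_declined" });
});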
2. Tests that pass but don't verify the right thing
// Weak test: passes but doesn't verify the actual behavior
test("sendEmail is called", async () => {
  const spy = jest.spyOn(emailService, "sendEmail");
  await processOrder(order);
  expect(spy).toHaveBeenCalled(); // just checks it was called, not with what
});

// Ask Claude Code to make tests more specific:
// "The test for sendEmail should verify the recipient, subject, and
// that the order ID appears in the body."
test("processOrder sends confirmation to buyer with order details", async () => {
  const spy = jest.spyOn(emailService, "sendEmail");
  await processOrder({ id: "order-123", buyerEmail: "buyer@example.com" });
  expect(spy).toHaveBeenCalledWith(
    expect.objectContaining({
      to: "buyer@example.com",
      subject: expect.stringContaining("Order Confirmed"),
      body: expect.stringContaining("order-123"),
    })
  );
});
Python testing with pytest
The same patterns apply in Python:
# tests/test_rate_limiter.py — write first
import pytest
from app.rate_limiter import RateLimiter

def test_allows_requests_within_limit():
    limiter = RateLimiter(max_requests=5, window_seconds=60)
    for _ in range(5):
        assert limiter.check("user-1") is True

def test_blocks_requests_over_limit():
    limiter = RateLimiter(max_requests=5, window_seconds=60)
    for _ in range(5):
        limiter.check("user-1")
    assert limiter.check("user-1") is False

@pytest.mark.parametrize("invalid_input", [None, "", 0, -1])
def test_rejects_invalid_keys(invalid_input):
    limiter = RateLimiter(max_requests=5, window_seconds=60)
    with pytest.raises(ValueError):
        limiter.check(invalid_input)
Then:
You: The tests in tests/test_rate_limiter.py are failing because
app/rate_limiter.py doesn't exist. Implement the RateLimiter class
in Python to make all tests pass.
Frequently Asked Questions
Does Claude Code write tests as part of implementing features? By default, yes — Claude Code typically includes tests when implementing a feature. But these tests are written after the implementation, which means they test what the code does rather than what it should do. The TDD pattern in this guide gives you better-specified tests.
How do I get Claude Code to improve test coverage on existing tests? Ask specifically: "Run the tests in tests/payments.test.ts and identify what scenarios are not covered. Then add tests for the missing cases, focusing on error paths and edge cases." This is more effective than "add more tests."
Can Claude Code run tests and fix failures automatically? Yes. After Claude Code implements code, you can ask "run the tests and fix any failures." Claude Code will run the test suite, read the failure output, diagnose the issues, and fix them — iterating until all tests pass. For complex failures, it will explain what was wrong.
Should I let Claude Code modify my tests to make them pass? Generally no. If Claude Code implements something that doesn't match your tests, the right response is to fix the implementation, not loosen the tests. Exceptions: if your tests have bugs (wrong expected values, wrong setup), Claude Code can fix those too — but ask explicitly rather than letting it modify tests implicitly. For a workflow guide on structuring Claude Code sessions for multi-file work, see Claude Code Plan Mode.
How do I use TDD with multi-file features that span several modules? Use Claude Code's plan mode first: write the tests, then ask Claude Code to plan the implementation before writing any code. The plan shows which files it intends to change; review it against your tests to verify alignment before approving execution. This two-step approach (tests define the spec, plan mode shows the scope) prevents the most common TDD failure mode: tests that pass but don't cover the right integration boundary.
Take It Further
Claude Code Power Prompts 300 — The testing section includes 30 prompts for test generation, TDD workflows, coverage improvement, and debugging test failures — all ready to copy-paste into your Claude Code session.
→ Get Claude Code Power Prompts — $29
30-day money-back guarantee. Instant download.