Claude Code Test Generation: Auto-Write Unit & Integration Tests
Yes — Claude Code can generate tests automatically. Point it at any existing function, class, or module and it will produce complete pytest or Jest test files covering happy paths, edge cases, and error conditions. You do not need to write a prompt from scratch: a single slash command is enough. Teams that use structured test generation prompts produce 2.7x more test coverage per sprint versus writing tests manually, while maintaining equivalent defect detection. For projects with 0% test coverage, Claude Code can bootstrap a full suite in 1-2 hours.
This guide covers every pattern: generating tests from existing code, TDD (write tests first), integration test scaffolding, and the write → run → fix loop that makes test generation genuinely useful rather than a one-shot party trick.
Generating Unit Tests from Existing Code
The most common use case is retrofitting tests onto code that has none. Claude Code handles this with a direct instruction or a custom slash command.
Direct instruction pattern
Open Claude Code in your project and run:
Write pytest unit tests for src/billing/invoice.py. Cover the calculate_total, apply_discount, and generate_pdf functions. Include edge cases for zero amounts, negative discounts, and missing required fields.
Claude Code reads the source file, identifies the public API, and produces a tests/test_invoice.py file. It will also check whether conftest.py exists and add fixtures if needed.
Slash command pattern (repeatable)
If you generate tests regularly, encode the instruction as a slash command in your project's .claude/commands/ directory:
<!-- .claude/commands/gen-tests.md -->
# /gen-tests
Read the file at $ARGUMENTS. Generate a complete pytest test file for every public function and class. Place the output in tests/ mirroring the source tree. Run the tests immediately with `pytest` and fix any failures before finishing.
Now you can run /gen-tests src/billing/invoice.py from any Claude Code session. For more on building custom slash commands and skills, see How to Build a Custom Claude Code Skill.
Python example: input function and generated tests
Source function (src/billing/invoice.py):
from decimal import Decimal
from typing import Optional
def calculate_total(
subtotal: Decimal,
tax_rate: Decimal,
discount: Optional[Decimal] = None,
) -> Decimal:
"""Return the final invoice total after tax and optional discount."""
if subtotal < 0:
raise ValueError("subtotal cannot be negative")
if tax_rate < 0 or tax_rate > 1:
raise ValueError("tax_rate must be between 0 and 1")
taxed = subtotal * (1 + tax_rate)
if discount is not None:
if discount < 0:
raise ValueError("discount cannot be negative")
taxed = taxed - discount
return taxed.quantize(Decimal("0.01"))
Generated tests (tests/test_invoice.py):
import pytest
from decimal import Decimal
from src.billing.invoice import calculate_total
class TestCalculateTotal:
def test_basic_calculation(self):
result = calculate_total(Decimal("100.00"), Decimal("0.10"))
assert result == Decimal("110.00")
def test_with_discount(self):
result = calculate_total(
Decimal("100.00"), Decimal("0.10"), discount=Decimal("10.00")
)
assert result == Decimal("100.00")
def test_zero_subtotal(self):
result = calculate_total(Decimal("0"), Decimal("0.10"))
assert result == Decimal("0.00")
def test_zero_tax_rate(self):
result = calculate_total(Decimal("100.00"), Decimal("0"))
assert result == Decimal("100.00")
def test_negative_subtotal_raises(self):
with pytest.raises(ValueError, match="subtotal cannot be negative"):
calculate_total(Decimal("-1.00"), Decimal("0.10"))
def test_tax_rate_above_one_raises(self):
with pytest.raises(ValueError, match="tax_rate must be between 0 and 1"):
calculate_total(Decimal("100.00"), Decimal("1.5"))
def test_negative_discount_raises(self):
with pytest.raises(ValueError, match="discount cannot be negative"):
calculate_total(
Decimal("100.00"), Decimal("0.10"), discount=Decimal("-5.00")
)
def test_result_rounded_to_two_decimals(self):
result = calculate_total(Decimal("100.00"), Decimal("0.075"))
assert result == Decimal("107.50")
Claude Code generates this output in full, immediately runs pytest tests/test_invoice.py, and fixes any assertion mismatches before reporting back.
TDD with Claude Code: Write Tests First
Test-driven development with Claude Code flips the workflow: you describe the desired behavior and Claude Code writes both the tests and the implementation that makes them pass.
TypeScript Jest example
Tell Claude Code:
I need a function validateEmail(email: string): boolean that returns true only for valid RFC 5321 addresses. Write Jest tests first using TDD, then implement the function to make them pass.
Generated tests (src/__tests__/validateEmail.test.ts):
import { validateEmail } from "../validateEmail";
describe("validateEmail", () => {
describe("valid addresses", () => {
it.each([
"user@example.com",
"user+tag@sub.domain.org",
"firstname.lastname@company.io",
"user123@example.co.uk",
])("accepts %s", (email) => {
expect(validateEmail(email)).toBe(true);
});
});
describe("invalid addresses", () => {
it.each([
["missing @ symbol", "userexample.com"],
["double @", "user@@example.com"],
["empty local part", "@example.com"],
["empty domain", "user@"],
["spaces in address", "user @example.com"],
["missing TLD", "user@example"],
["consecutive dots in domain", "user@example..com"],
])("%s: %s", (_desc, email) => {
expect(validateEmail(email)).toBe(false);
});
});
it("returns false for empty string", () => {
expect(validateEmail("")).toBe(false);
});
it("returns false for non-string input types coerced to string", () => {
expect(validateEmail(null as unknown as string)).toBe(false);
});
});
Generated implementation (src/validateEmail.ts):
export function validateEmail(email: string): boolean {
if (!email || typeof email !== "string") return false;
// RFC 5321 simplified: local@domain.tld, no consecutive dots in domain
const pattern =
/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)+$/;
return pattern.test(email);
}
Claude Code runs npx jest src/__tests__/validateEmail.test.ts, checks that all cases pass, and reports the result. If a test fails due to a subtlety in the regex, it revises the implementation in-place without being asked.
Upgrade Your Test Workflow
If you are writing repetitive test boilerplate, the prompts in the P1 Power Prompts 300 collection include a full TDD workflow prompt, a test-coverage audit prompt, and a fixture generation prompt. These are the exact prompts used in the examples throughout this guide.
Integration Test Patterns
Integration tests are harder because they cross module boundaries and often need databases, HTTP clients, or message queues. Claude Code handles this well when you give it the full call chain.
Database integration test (pytest + SQLAlchemy)
Write pytest integration tests for the UserRepository class in src/repository/user.py. Use an in-memory SQLite database via SQLAlchemy. Cover create, read, update, delete, and the find_by_email query.
Claude Code will:
- Detect that
UserRepositorydepends on aSessionobject - Create a
conftest.pywith anin_memory_dbfixture usingcreate_engine("sqlite:///:memory:") - Write tests that use the fixture and roll back after each test
- Run
pytest tests/integration/and fix any SQLAlchemy version incompatibilities
Resulting conftest.py:
import pytest
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from src.models import Base
@pytest.fixture(scope="function")
def db_session():
engine = create_engine("sqlite:///:memory:", echo=False)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()
yield session
session.rollback()
session.close()
Base.metadata.drop_all(engine)
HTTP API integration test (Jest + supertest)
Write supertest integration tests for the Express router in src/routes/users.ts. Cover POST /users (create), GET /users/:id, PUT /users/:id, and DELETE /users/:id. Mock the database layer with jest.mock.
Claude Code will scaffold tests using supertest and jest.mock("../db"), set up beforeEach/afterEach cleanup, and assert on status codes, response bodies, and error messages.
Running Tests in a Loop: Write → Run → Fix
The most powerful pattern is not one-shot test generation — it is the iterative loop where Claude Code writes tests, runs them, reads the failure output, and fixes either the tests or the implementation until everything is green.
Trigger this explicitly:
Generate tests for src/payments/stripe_client.py, run them, and keep iterating until all tests pass. Do not stop until pytest exits 0.
Claude Code will:
- Write an initial test file
- Run
pytestand capture stdout/stderr - Parse failure messages (import errors, assertion mismatches, missing fixtures)
- Apply targeted fixes
- Re-run, repeat until clean
For long-running cycles, you can encode this loop as a Claude Code hook that fires pytest automatically after every Write tool call to a tests/ file:
{
"hooks": {
"PostToolUse": [
{
"matcher": "Write",
"hooks": [
{
"type": "command",
"command": "if echo \"$CLAUDE_FILE_PATH\" | grep -q '^tests/'; then pytest \"$CLAUDE_FILE_PATH\" --tb=short 2>&1 | tail -20; fi"
}
]
}
]
}
}
This hook runs pytest automatically on every test file Claude Code writes, piping the tail of the output back into Claude Code's context so it can act on failures without being asked.
CLAUDE.md Configuration for Test Preferences
Add a ## Testing section to your project's CLAUDE.md to encode your team's test preferences once rather than repeating them in every prompt:
## Testing
- Test framework: pytest (Python), Jest + supertest (TypeScript)
- Test file location: tests/ for Python, src/__tests__/ for TypeScript
- Naming convention: test_{module}.py, {module}.test.ts
- Fixtures: use pytest fixtures in conftest.py; avoid global state
- Mocking: prefer pytest-mock (mocker fixture); avoid monkeypatching stdlib
- Coverage minimum: 80% line coverage; run with pytest --cov=src --cov-fail-under=80
- Always run tests after writing them; fix failures before reporting done
- Integration tests use in-memory SQLite, not production database
- Never commit tests that skip with pytest.mark.skip without a JIRA ticket reference
With this in place, every test generation request automatically follows your conventions without additional instruction. For a full reference on CLAUDE.md patterns that affect tool behavior, see the Claude Code Complete Guide.
Common Pitfalls and How to Avoid Them
1. Tests that pass because assertions are wrong
Claude Code occasionally generates tests like assert result is not None that pass trivially. Fix this by adding to your CLAUDE.md: "All assertions must be specific — no is not None, no assert True. Assert exact values or specific exception types."
2. Over-mocking hides real bugs
When Claude Code mocks too aggressively, integration tests become useless. Instruct it explicitly: "Mock only external I/O (HTTP, database calls to external services). Do not mock internal modules."
3. Tests tied to implementation details
Tests that assert on internal state (private attributes, intermediate variables) break when you refactor. Tell Claude Code: "Test only the public API. Do not access private attributes or internal methods."
4. Missing edge cases for async code
For async/await functions, Claude Code may forget to test timeout and cancellation paths. Add to your CLAUDE.md: "For async functions, include tests for timeout (use pytest-timeout or jest.useFakeTimers) and cancellation handling."
5. pytest fixture scope mismatches
Using a session-scoped database fixture with tests that mutate state causes cross-test contamination. Claude Code defaults to scope="function" which is usually correct, but verify this when tests pass in isolation but fail in suite.
Get More Claude Code Productivity Patterns
The P1 Power Prompts 300 collection includes 300 battle-tested prompts for Claude Code, covering test generation, code review, refactoring, documentation, and debugging. If you work with Claude Code daily, these prompts will cut the time you spend writing instructions by a significant margin.
Frequently Asked Questions
Can Claude Code generate tests for code it did not write?
Yes. Claude Code reads any source file you point it at, regardless of who wrote it. It infers expected behavior from the implementation — function signatures, docstrings, type hints, and internal logic. For undocumented code with no type hints, it still generates useful tests, though it may add comments noting assumptions it made about intended behavior.
Does Claude Code update tests when the source code changes?
Not automatically — but you can make it do so. Add a PostToolUse hook (see the Claude Code Hooks guide) that detects writes to source files and triggers a prompt to check whether the corresponding test file needs updating. Alternatively, instruct Claude Code at the start of a session: "Before finishing any task that edits source files, check whether the corresponding tests in tests/ need to be updated and update them."
What test frameworks does Claude Code support?
Claude Code is framework-agnostic. It can write tests for pytest, unittest, nose2 (Python), Jest, Vitest, Mocha, Jasmine (JavaScript/TypeScript), RSpec (Ruby), JUnit (Java), Go's built-in testing package, Rust's #[test] attribute, and others. Specify the framework in your instruction or in CLAUDE.md and it will follow that convention. If you do not specify one, Claude Code picks the most common framework for the language it detects.
How do I prevent Claude Code from overwriting existing test files?
Add to your CLAUDE.md: "Before writing any test file, check whether it already exists. If it does, add new test cases to the existing file rather than replacing it. Never delete existing test cases." You can also enforce this with a PreToolUse hook that intercepts Write calls targeting tests/ and checks for existing content before allowing the write.
Can Claude Code achieve 100% branch coverage?
In practice, no — and 100% branch coverage is rarely the right goal. Claude Code will cover the branches it can infer from the source, which typically yields 80–90% line coverage on well-structured code. Branches that require specific infrastructure state (specific time of day, external service responses, OS signals) may require manual tests. Use pytest --cov=src --cov-report=term-missing to identify uncovered lines, then ask Claude Code to fill the gaps for specific uncovered branches.