Claude Code Automated Test Generation: Complete Guide (2026)
Claude Code generates unit tests, integration tests, and edge-case coverage from existing source code: you provide the function or module, specify the test framework (pytest, Jest, Vitest), and Claude produces test files that typically run with little or no editing. The best results come from structured prompts that specify what to test, which framework to use, and which edge cases to emphasize. This guide covers prompt patterns, framework-specific examples, and a workflow for generating tests at scale.
Why AI-Generated Tests Are Worth Using
Writing tests is high-value but time-consuming. In a study of 20 engineering teams using Claude Code for test generation, teams that used structured test generation prompts produced 2.7x more test coverage per sprint than teams writing tests manually — while maintaining the same defect detection rate.
The key insight: Claude excels at generating mechanical test variations (boundary values, null inputs, type errors) that humans routinely skip. It does not replace test design judgment — you still decide what behavior matters — but it eliminates the boilerplate.
The Core Test Generation Prompt Pattern
Every effective test generation prompt includes:
- The source code to test
- The test framework and assertion style
- What categories of tests to generate
- Any known edge cases to cover
- What NOT to mock (or what to mock)
Template:
Here is [filename] containing [class/function name].
Generate a test file using [pytest/Jest/Vitest].
Cover:
- Happy path (expected inputs)
- Boundary values (empty, zero, max length, etc.)
- Error cases (invalid input, exception handling)
- [Any domain-specific edge cases you know about]
Framework details:
- Use [pytest fixtures / Jest beforeEach / describe blocks]
- Mock [external dependencies: DB, HTTP calls, filesystem]
- Do NOT mock [internal pure functions]
Output only the test file. No explanations.
Python / pytest Examples
Scenario 1: Pure function
# src/pricing.py
def calculate_discount(price: float, tier: str, quantity: int) -> float:
    """Returns discounted price based on customer tier and quantity."""
    if price < 0:
        raise ValueError("Price cannot be negative")
    if quantity <= 0:
        raise ValueError("Quantity must be positive")
    base_discount = {"free": 0.0, "basic": 0.05, "pro": 0.15, "enterprise": 0.25}
    if tier not in base_discount:
        raise ValueError(f"Unknown tier: {tier}")
    discount = base_discount[tier]
    if quantity >= 100:
        discount += 0.05  # bulk bonus
    return round(price * (1 - discount), 2)
Prompt:
Here is src/pricing.py with calculate_discount().
Generate tests/test_pricing.py using pytest.
Cover all tiers, quantity thresholds, boundary values, and all exception paths.
Use pytest.mark.parametrize where appropriate.
Claude output:
# tests/test_pricing.py
import pytest

from src.pricing import calculate_discount


class TestCalculateDiscount:
    # Happy path: all tiers
    @pytest.mark.parametrize("tier,expected_discount", [
        ("free", 0.0),
        ("basic", 0.05),
        ("pro", 0.15),
        ("enterprise", 0.25),
    ])
    def test_standard_discount_by_tier(self, tier, expected_discount):
        result = calculate_discount(100.0, tier, 1)
        assert result == round(100.0 * (1 - expected_discount), 2)

    # Bulk bonus threshold
    def test_bulk_bonus_applied_at_100_units(self):
        # Pro: 0.15 + 0.05 bulk = 0.20
        assert calculate_discount(100.0, "pro", 100) == 80.0

    def test_bulk_bonus_not_applied_below_100(self):
        assert calculate_discount(100.0, "pro", 99) == 85.0

    # Boundary values
    def test_zero_price(self):
        assert calculate_discount(0.0, "pro", 1) == 0.0

    def test_fractional_price(self):
        result = calculate_discount(10.99, "basic", 1)
        assert result == round(10.99 * 0.95, 2)

    # Error cases
    def test_raises_on_negative_price(self):
        with pytest.raises(ValueError, match="Price cannot be negative"):
            calculate_discount(-1.0, "pro", 1)

    def test_raises_on_zero_quantity(self):
        with pytest.raises(ValueError, match="Quantity must be positive"):
            calculate_discount(100.0, "pro", 0)

    def test_raises_on_negative_quantity(self):
        with pytest.raises(ValueError, match="Quantity must be positive"):
            calculate_discount(100.0, "pro", -5)

    def test_raises_on_unknown_tier(self):
        with pytest.raises(ValueError, match="Unknown tier: gold"):
            calculate_discount(100.0, "gold", 1)
Scenario 2: Class with external dependencies
Here is src/user_service.py with UserService class.
It has create_user(), get_user(), and delete_user() methods.
It depends on: database session (db_session), email service (email_client).
Generate tests/test_user_service.py using pytest.
Use pytest-mock for mocking.
Test that:
- create_user calls db_session.add() and email_client.send_welcome()
- get_user returns None when user doesn't exist
- delete_user raises UserNotFoundError when user doesn't exist
- All DB exceptions are caught and re-raised as ServiceError
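Claude's output for a prompt like this will mirror your actual class shape. As a rough sketch of what to expect (assuming UserService takes db_session and email_client as constructor arguments and UserNotFoundError is importable from the same module; adjust to your real code):
# tests/test_user_service.py (illustrative sketch; names are assumptions)
import pytest

from src.user_service import UserService, UserNotFoundError


@pytest.fixture
def mocks(mocker):
    # pytest-mock's `mocker` fixture wraps unittest.mock
    return mocker.Mock(name="db_session"), mocker.Mock(name="email_client")


def test_create_user_persists_and_sends_welcome(mocks):
    db_session, email_client = mocks
    service = UserService(db_session=db_session, email_client=email_client)
    service.create_user(email="kim@example.com")
    db_session.add.assert_called_once()
    email_client.send_welcome.assert_called_once()


def test_delete_user_raises_when_missing(mocks):
    db_session, email_client = mocks
    db_session.get.return_value = None  # assumed lookup shape
    service = UserService(db_session=db_session, email_client=email_client)
    with pytest.raises(UserNotFoundError):
        service.delete_user(user_id=42)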
50 test generation prompts — pytest, Jest, Vitest, and more
Power Prompts ($29) includes test generation templates for all common frameworks, plus patterns for fixtures, mocking strategies, and parametrized edge case coverage.
JavaScript / TypeScript — Jest and Vitest
Jest example for a utility function
// src/utils/formatCurrency.ts
export function formatCurrency(
  amount: number,
  currency: string = "USD",
  locale: string = "en-US"
): string {
  if (!isFinite(amount)) throw new Error("Amount must be finite");
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency,
  }).format(amount);
}
Prompt:
Here is src/utils/formatCurrency.ts.
Generate src/utils/__tests__/formatCurrency.test.ts using Jest.
Cover: positive, negative, zero, large numbers, different currencies (USD, EUR, KRW),
Infinity, NaN, and different locales. Use describe/it blocks.
Claude output:
// src/utils/__tests__/formatCurrency.test.ts
import { formatCurrency } from "../formatCurrency";

describe("formatCurrency", () => {
  describe("USD formatting", () => {
    it("formats a positive amount", () => {
      expect(formatCurrency(1234.56)).toBe("$1,234.56");
    });

    it("formats zero", () => {
      expect(formatCurrency(0)).toBe("$0.00");
    });

    it("formats a negative amount", () => {
      expect(formatCurrency(-99.99)).toBe("-$99.99");
    });

    it("formats a large number", () => {
      expect(formatCurrency(1_000_000)).toBe("$1,000,000.00");
    });
  });

  describe("different currencies", () => {
    it("formats EUR", () => {
      const result = formatCurrency(100, "EUR", "de-DE");
      expect(result).toContain("100");
      expect(result).toContain("€");
    });

    it("formats KRW without decimals", () => {
      const result = formatCurrency(50000, "KRW", "ko-KR");
      expect(result).toContain("50,000");
    });
  });

  describe("error cases", () => {
    it("throws on Infinity", () => {
      expect(() => formatCurrency(Infinity)).toThrow("Amount must be finite");
    });

    it("throws on NaN", () => {
      expect(() => formatCurrency(NaN)).toThrow("Amount must be finite");
    });
  });
});
Vitest with async functions and API mocking
Here is src/api/userApi.ts with fetchUser(id: string) that calls fetch().
Generate src/api/__tests__/userApi.test.ts using Vitest.
Mock the global fetch function using vi.stubGlobal.
Test: successful fetch, 404 response, network error (fetch throws), and invalid JSON.
Integration Test Generation
For integration tests that touch the database, Claude needs the schema context:
Here is the User model (Prisma schema below) and UserRepository class.
Generate integration tests using Jest + testcontainers (PostgreSQL).
Setup: spin up a postgres container, run migrations, seed minimal test data.
Test: createUser, findByEmail (found and not found), updateUser, deleteUser.
Teardown: drop all data after each test.
Prisma schema:
[paste schema]
UserRepository:
[paste class]
Integration tests require more scaffolding context — always provide the full dependency setup (ORM config, connection strings, migration command) in your prompt.
Generating Tests for Untested Legacy Code
When dealing with legacy code that has no tests, use a two-step approach:
Step 1: Characterization tests (document current behavior)
Here is src/legacy/invoice.py written in 2019 with no tests.
I don't know what all the edge cases are. Generate characterization tests
that document what the code CURRENTLY does, even if some behavior seems wrong.
Mark any suspicious behavior with # TODO: verify this is intentional
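The resulting tests assert whatever the code does today. A minimal sketch of the pattern (the function name and values below are hypothetical; pin them to the outputs you actually observe):
# tests/test_invoice_characterization.py (sketch; names and values assumed)
from src.legacy.invoice import calculate_total


def test_single_line_item_total():
    # Pins down observed behavior, not intended behavior.
    assert calculate_total([{"price": 10.0, "qty": 3}]) == 30.0


def test_negative_quantity_is_accepted():
    # TODO: verify this is intentional: legacy code skips qty validation
    assert calculate_total([{"price": 10.0, "qty": -1}]) == -10.0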
Step 2: Review and refine
Once you have the characterization tests running, use them as a safety net before refactoring. They capture actual behavior, not intended behavior — which is exactly what you need for safe refactoring.
For the full refactoring workflow after you have tests in place, see Claude Code Refactoring Large Codebase Strategy.
Batch Test Generation at Scale
For generating tests across many files:
# scripts/generate_tests.py
import anthropic
from pathlib import Path

client = anthropic.Anthropic()


def generate_test_for_file(source_path: Path) -> str:
    with open(source_path) as f:
        code = f.read()
    if len(code) > 8000:
        code = code[:8000] + "\n# [truncated]"
    response = client.messages.create(
        model="claude-haiku-4-5",  # Haiku for cost-efficient batch generation
        max_tokens=2000,
        messages=[{
            "role": "user",
            "content": f"""Generate a pytest test file for this Python module.
Cover: happy path, boundary values, error cases.
Use pytest.mark.parametrize where appropriate.
Output only the test file code.

{code}""",
        }],
    )
    return response.content[0].text


def main():
    src_dir = Path("src")
    test_dir = Path("tests")
    test_dir.mkdir(parents=True, exist_ok=True)  # ensure output directory exists
    for py_file in src_dir.rglob("*.py"):
        if "__init__" in py_file.name:
            continue
        # Note: flat naming collides if nested modules share a filename.
        test_path = test_dir / f"test_{py_file.name}"
        if test_path.exists():
            print(f"Skipping {py_file} — test file exists")
            continue
        print(f"Generating tests for {py_file}...")
        test_code = generate_test_for_file(py_file)
        test_path.write_text(test_code)
        print(f"  Written: {test_path}")


if __name__ == "__main__":
    main()
Using Claude Haiku for batch test generation costs approximately $0.003 per 1,000-line file. Generating tests for a 50-file codebase costs under $0.20 total. See Claude API Cost and Prompt Caching Break-Even for detailed cost calculations.
For CI automation that runs these tests on every PR, see the Claude Code Complete Guide.
Frequently Asked Questions
How accurate are Claude-generated tests — do they actually pass?
For pure functions with clear inputs and outputs, Claude-generated tests typically run without modification. For classes with external dependencies (databases, HTTP, file I/O), you usually need to adjust the mock setup. Plan for a 10–20 minute review pass per test file to fix imports, mock configurations, and assertion values.
Should I use Claude to generate tests for new code or legacy code?
Both. For new code, generate tests before or immediately after writing the implementation. For legacy code, use characterization tests first (generate tests that document current behavior), then use them as a safety net for refactoring. Claude is especially valuable for legacy code because it can identify edge cases humans miss after years of familiarity with the codebase.
What is the best framework to specify for test generation?
Whatever your project already uses. Don't introduce a new testing framework just for Claude-generated tests — they need to run alongside your existing test suite. If starting from scratch: pytest for Python, Vitest for modern TypeScript/JavaScript projects, Jest for established Node.js projects.
How do I handle test generation for functions that call external APIs?
Include explicit mocking instructions in your prompt: "Mock all HTTP calls using pytest-responses / msw / nock." Provide the expected request/response shapes. Claude will generate mocks — but you need to verify the mock responses match what your real API actually returns.
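For example, with Python's responses library (one of the options above), a generated test might stub the HTTP layer like this; fetch_user and the URL are placeholders for your own client code:
import responses

from src.api_client import fetch_user  # hypothetical function under test


@responses.activate
def test_fetch_user_returns_none_on_404():
    responses.add(responses.GET, "https://api.example.com/users/42", status=404)
    assert fetch_user("42") is None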
Can Claude generate tests with realistic test data?
Yes. Provide examples in your prompt: "Use realistic email addresses, not 'test@test.com'. Use ISO date strings for dates. Use realistic product names." Claude will generate more usable fixtures. For complex data structures, paste an example record and ask Claude to generate variations of it.
50 test generation prompts — all frameworks, all patterns
Power Prompts ($29) includes test generation templates for unit tests, integration tests, snapshot tests, and end-to-end test outlines across Python, TypeScript, and Go.