How to Debug a 10K+ Line Codebase with Claude Code
Debugging a large codebase with Claude Code requires five practices: a CLAUDE.md architecture map, progressive disclosure (start with the bug report, drill down file by file), subagent delegation per module, git bisect automation for regressions, and structured log parsing. Claude Code's context window (200K tokens by default, up to 1M via an API beta) tops out at roughly 40,000 lines of code: enough to hold an entire medium-sized service, but not an entire monorepo. Strategy determines what goes in that window. For a complete Claude Code overview, see the Claude Code Complete Guide.
Why Large Codebases Are Hard for AI Agents
A 10K+ line codebase breaks the naive "dump everything and ask" approach in three ways:
Context saturation. Claude Code's context window is large (200K tokens by default, up to 1M via an API beta), but filling it with entire repositories degrades reasoning quality. Studies of long-context LLM performance show accuracy dropping when the needle (the relevant code) is buried deep in irrelevant hay. A 50K-line Rails app might have 3K lines directly relevant to a given bug.
File discovery. Claude cannot browse a filesystem it hasn't seen. Without explicit instruction, it will ask you to share files — or worse, hallucinate file contents based on naming conventions. The agent needs a map.
Dependency graphs. A bug in a 15-file chain (user action → service → worker → database adapter → cache layer) requires Claude to trace the call graph across the entire chain. Without knowing the graph structure upfront, it wastes turns asking "where is X defined?"
The strategies below solve each problem systematically.
Strategy 1: CLAUDE.md as a Codebase Map
CLAUDE.md is read automatically at the start of every Claude Code session. For a large codebase, it is the single highest-leverage file you can create. Treat it as a codebase orientation document for a new senior engineer joining the team.
A debugging-optimized CLAUDE.md includes:
# Architecture
## Service layout
- `src/api/` — Express route handlers, thin layer only
- `src/services/` — Business logic, all DB calls go here
- `src/workers/` — BullMQ background jobs
- `src/models/` — Sequelize ORM models
- `src/lib/` — Shared utilities (logger, cache, http client)
## Data flow
User HTTP request → `src/api/routes/*.ts`
→ `src/services/*.service.ts`
→ `src/models/*.model.ts` (Postgres via Sequelize)
→ `src/lib/cache.ts` (Redis, 5-minute TTL)
Background work: `src/workers/*.worker.ts` (BullMQ, Redis queue)
## Critical dependencies
- All external API calls go through `src/lib/http-client.ts` (has retry logic)
- All errors must be thrown as `AppError` from `src/lib/errors.ts`
- Environment config is centralized in `src/config/index.ts`
## Known complexity areas
- `src/services/billing.service.ts` — Stripe webhook idempotency logic is subtle
- `src/workers/report.worker.ts` — Memory-intensive, spawns child processes
- `src/lib/cache.ts` — Cache invalidation has three layers (local, Redis, CDN)
## How to run tests
npm test — full suite
npm run test:unit — fast, no DB
npm run test:integration — requires Docker Compose up
Why this architecture section is worth every minute it takes to write: When you open a debugging session with "there's a memory leak in production," Claude Code reads this file first. It immediately knows that memory-intensive work lives in report.worker.ts, that the cache has three invalidation layers, and that errors should come through AppError. Without the map, it spends the first 10 messages asking you to share files.
For existing codebases without a CLAUDE.md, generate a draft with:
/init
Claude Code will inspect your repository and produce a starting CLAUDE.md. Edit it to add the "Known complexity areas" section manually — that section requires your human knowledge of where the bodies are buried.
Strategy 2: Progressive Disclosure
The worst debugging prompt: "Here is my entire codebase. Find the bug."
The best debugging prompt structure:
1. Start with the symptom only
2. Share error logs / stack traces
3. Share the one file most likely to be the entry point
4. Let Claude ask what it needs next
5. Share files on demand, not upfront
Session opening template:
Production bug: users report that PDF exports are empty (blank pages)
since yesterday's deploy. This started after commit a3f9b12.
Stack trace from Datadog:
[PASTE STACK TRACE HERE]
Entry point is likely src/workers/report.worker.ts — sharing that now.
[PASTE FILE CONTENTS]
What else do you need to trace this?
This approach is more effective than dumping 20 files because:
- Claude can identify which files it actually needs from the stack trace
- The session starts focused on the symptom, not the entire codebase
- You avoid wasting 30K tokens of context on irrelevant files
Progressive file-sharing rule: Only share files that Claude explicitly asks for, or files you know are in the call path. Trust Claude's file requests — if it asks for src/lib/pdf-renderer.ts, share it. If it asks for something you know is irrelevant, redirect it.
Strategy 3: Subagent Delegation
For bugs that span multiple modules or repositories, spawn subagents to investigate in parallel. Claude Code supports this natively via the Task tool when running in agent mode, or you can orchestrate it yourself using --print mode in parallel shell sessions. For a detailed guide to parallel subagent patterns, see How to Use Claude Code Subagents for Parallel Research.
Pattern: module-parallel investigation
# Terminal 1 — investigate the API layer
claude --print "Read src/api/routes/export.ts and src/api/middleware/*.ts.
Find all code paths that trigger a PDF export job.
Report: what parameters are passed to the worker?" > api-investigation.txt &
# Terminal 2 — investigate the worker layer
claude --print "Read src/workers/report.worker.ts and src/services/pdf.service.ts.
Find how the worker receives job data and what it passes to the PDF renderer.
Report: what data transformations happen before render?" > worker-investigation.txt &
# Terminal 3 — investigate the renderer
claude --print "Read src/lib/pdf-renderer.ts and any files it imports.
Find all conditions that could produce empty/blank output.
Report: what input conditions lead to blank pages?" > renderer-investigation.txt &
wait
cat api-investigation.txt worker-investigation.txt renderer-investigation.txt | \
claude --print "Synthesize these three investigation reports.
Identify the most likely root cause of blank PDF exports."
This approach turns a sequential investigation into parallel execution. For a 15-file bug chain, three subagents running simultaneously can produce a synthesis in the time one agent takes to read five files.
When to use subagents:
- Bug spans 3+ distinct modules or packages
- You need to compare behavior across environments (production vs staging config)
- You have a regression and need to analyze both old and new code simultaneously
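If you prefer scripting the fan-out instead of juggling terminals, the same pattern can be driven from Node. This is a minimal sketch under stated assumptions: the prompts are condensed from the terminal example above, and the claude CLI is assumed to be on your PATH.
// parallel-investigate.js — fan out claude --print subagents, then synthesize
const { execFile } = require('child_process');
const { promisify } = require('util');
const run = promisify(execFile);

const prompts = [
  'Read src/api/routes/export.ts. Report: what parameters are passed to the worker?',
  'Read src/workers/report.worker.ts. Report: what transformations happen before render?',
  'Read src/lib/pdf-renderer.ts. Report: what input conditions lead to blank pages?',
];

async function main() {
  // Launch all three investigations concurrently
  const reports = await Promise.all(
    prompts.map((p) => run('claude', ['--print', p], { maxBuffer: 10 * 1024 * 1024 }))
  );
  // Hand the combined reports to a final synthesis pass
  const combined = reports.map((r) => r.stdout).join('\n---\n');
  const synthesis = await run(
    'claude',
    ['--print', `Synthesize these investigation reports and identify the most likely root cause of blank PDF exports:\n${combined}`],
    { maxBuffer: 10 * 1024 * 1024 }
  );
  console.log(synthesis.stdout);
}

main().catch((err) => { console.error(err); process.exit(1); });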
Strategy 4: Git Bisect + Claude
Git bisect is the fastest way to find regressions — it binary-searches your commit history to find the exact commit that introduced a bug. Claude Code can automate the entire bisect session.
Automated bisect prompt:
I need to find which commit introduced a memory leak in the report worker.
The leak: process memory grows ~50MB per report job and is never freed.
Good commit: d4a1f83 (two weeks ago, confirmed no leak)
Bad commit: HEAD (current, confirmed leak exists)
Test command to check for the bug:
node --expose-gc scripts/test-memory-leak.js
Exit code 0 = no leak, exit code 1 = leak present.
Please run git bisect to find the culprit commit.
After bisect identifies it, read the diff and explain what changed.
Claude Code will execute:
git bisect start
git bisect bad HEAD
git bisect good d4a1f83
git bisect run node --expose-gc scripts/test-memory-leak.js
Git bisect typically finds the culprit in 7–10 steps for a 500-commit range (log₂(500) ≈ 9). Claude Code automates every step and then reads the identified commit's diff to explain what changed.
Write the test script before bisecting. The test script is the hardest part. For memory leaks:
// scripts/test-memory-leak.js
// Note: require() resolves relative to this file, so the path starts with ../
const { Worker } = require('../src/workers/report.worker');

async function testMemory() {
  const before = process.memoryUsage().heapUsed;
  // Run 5 report jobs
  for (let i = 0; i < 5; i++) {
    await Worker.processJob({ reportId: `test-${i}`, userId: 'test' });
  }
  // Force GC if available (requires node --expose-gc)
  if (global.gc) global.gc();
  const after = process.memoryUsage().heapUsed;
  const growthMB = (after - before) / 1024 / 1024;
  console.log(`Memory growth: ${growthMB.toFixed(1)} MB`);
  process.exit(growthMB > 20 ? 1 : 0); // exit 1 if leak detected
}

testMemory().catch(err => { console.error(err); process.exit(1); });
Run with node --expose-gc scripts/test-memory-leak.js so global.gc is available and heap measurements are taken after a forced collection. If some commits in the range fail to build, have the script exit with code 125: git bisect run treats 125 as "cannot test this commit" and skips it.
Strategy 5: Log Parsing at Scale
Production logs are often the fastest path to a bug. A 10,000-line log file is too large to read manually but well within Claude Code's context window. The key is structured extraction before analysis.
Three-step log analysis pattern:
Step 1 — Extract the signal. Feed Claude a log sample with explicit extraction instructions:
Here are 500 lines from our application log (2026-04-25, production).
The issue: report jobs are failing silently — no error logged, but output is empty.
Extract and list:
1. All lines containing "report" or "pdf"
2. All WARNING or ERROR level lines
3. Any lines with unusual latency (>5000ms)
4. Any lines that appear in bursts or clusters
[PASTE LOG EXCERPT]
Step 2 — Pattern identification. Ask Claude to find anomalies:
Based on those extractions, identify:
- Timing patterns (what happens right before the silent failure?)
- Sequence breaks (expected log lines that are absent)
- Correlation with other events (deploys, traffic spikes, config changes)
Step 3 — Hypothesis generation. Ask Claude to generate ranked hypotheses:
Generate the top 3 hypotheses for the root cause, ranked by likelihood.
For each: (a) what evidence supports it, (b) what single log line or
metric would confirm or rule it out, (c) what code file to check first.
For logs larger than your context window, use shell pre-processing before feeding to Claude:
# Extract relevant lines before sending to Claude:
# narrow to the incident window, cap at 2000 lines, then hand off for analysis
grep -E "(report|pdf|worker|ERROR|WARN)" production.log |
  grep "2026-04-25T14:" |
  tail -n 2000 |
  claude --print "Analyze these log lines for the root cause of empty PDF exports..."
Worked Example: Debugging a Production Memory Leak Across 15 Files
This is a condensed version of a real investigation pattern. The scenario: a Node.js service's memory usage grows ~200MB per hour in production and requires daily restarts.
Session 1 — Orientation (10 minutes)
Memory leak investigation. Service: Node.js 20, Express + BullMQ.
Memory grows ~200MB/hour. Started after v2.3.1 deploy last Thursday.
CLAUDE.md is at the repo root — please read it first.
Then read src/workers/report.worker.ts and src/lib/cache.ts.
Initial question: what are the most common Node.js memory leak sources
in a BullMQ worker with Redis caching?
Claude reads CLAUDE.md (knows the architecture), reads the two highest-risk files (identified from CLAUDE.md's "Known complexity areas"), and returns a ranked list of leak candidates: event listener accumulation, unclosed Redis connections, closure references in job callbacks.
Session 2 — Narrowing (15 minutes)
The cache.ts investigation was interesting. You flagged that Redis
connections are created per-job rather than reused.
Here is the git diff for v2.3.1 (the deploy that introduced the leak):
[PASTE DIFF]
Does this diff change how Redis connections are created or released?
Claude identifies that v2.3.1 added per-job cache instantiation in report.worker.ts line 47, creating a new Redis client for every job without calling .quit(). 15 files share this worker class through dependency injection.
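For illustration, the leaky pattern (and its eventual fix) looks roughly like this. This is a hypothetical reconstruction, not the article's actual file; only new Redis(), .get(), and .quit() are real ioredis API.
// Hypothetical sketch of the v2.3.1 change in report.worker.ts
const Redis = require('ioredis');

// v2.3.1 (leaky): a fresh client per job, never closed
async function processJobLeaky(job) {
  const cache = new Redis(process.env.REDIS_URL); // opens a new TCP connection
  const payload = await cache.get(`report:${job.reportId}`);
  // ... render the PDF from payload ...
  // BUG: cache.quit() is never called, so the connection and its buffers persist
}

// Fix: one shared client, created once at module load and reused by every job
const sharedCache = new Redis(process.env.REDIS_URL);

async function processJobFixed(job) {
  const payload = await sharedCache.get(`report:${job.reportId}`);
  // ... render the PDF from payload ...
}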
Session 3 — Fix and verification (10 minutes)
Confirmed: the Redis connection leak is the root cause.
The fix is to use a shared Redis singleton from src/lib/cache.ts
rather than per-job instantiation.
Please:
1. Write the fix for src/workers/report.worker.ts
2. Check if any other files in src/workers/ have the same pattern
3. Write a test for src/workers/__tests__/report.worker.test.ts
that would have caught this regression
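For step 3, a regression test could look roughly like the sketch below. It is hypothetical: the module path, the Worker.processJob API, and the assertion strategy (counting mocked ioredis constructor calls via jest auto-mocking) are all assumptions.
// src/workers/__tests__/report.worker.test.ts (sketch, shown as plain JS)
jest.mock('ioredis'); // every `new Redis()` now hits a tracked mock constructor
const Redis = require('ioredis');
const { Worker } = require('../report.worker');

test('report jobs do not create per-job Redis clients', async () => {
  const clientsBefore = Redis.mock.calls.length; // clients created at import time
  for (let i = 0; i < 5; i++) {
    await Worker.processJob({ reportId: `test-${i}`, userId: 'test' });
  }
  // With the v2.3.1 bug this count grows by 5; with the singleton it stays flat
  expect(Redis.mock.calls.length).toBe(clientsBefore);
});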
Total investigation time: 35 minutes across 3 sessions. The critical enabler was CLAUDE.md pointing directly to report.worker.ts as "memory-intensive" — without that, session 1 would have been spent on file discovery.
Using Claude Code's Context Window Effectively
Claude Code's context window is 200K tokens by default (approximately 150,000 words, or roughly 600KB of code). Here is what to put in and what to keep out:
What belongs in context:
- CLAUDE.md (always, auto-loaded)
- Stack traces and error messages (high signal density)
- The 3–5 files most likely to contain the bug
- Relevant test files (show expected vs actual behavior)
- Git diff for the suspect commit
What to keep out:
- Lockfiles (package-lock.json, yarn.lock, Gemfile.lock) — zero signal
- Generated files (compiled output, auto-generated migrations)
- Vendor/node_modules — never
- Unrelated modules — 90% of a monorepo is irrelevant to any given bug
- Full database schemas unless the bug is a query issue
Context efficiency benchmark:
| Content type | Typical size | Signal density |
|---|---|---|
| Stack trace + error | 2–5 KB | Very high |
| CLAUDE.md (good one) | 3–8 KB | Very high |
| Single source file (relevant) | 5–20 KB | High |
| Single source file (irrelevant) | 5–20 KB | Near zero |
| Full test suite | 50–200 KB | Medium |
| node_modules | 50,000+ KB | Zero |
A focused 50KB context (CLAUDE.md + stack trace + 3 relevant files) consistently outperforms a saturated 600KB context containing everything.
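To sanity-check what you are about to share, a rough token estimate is enough. The sketch below assumes the common 4-characters-per-token rule of thumb, which is an approximation, not the real tokenizer.
// estimate-tokens.js — rough token estimate for files you plan to share
// Usage: node estimate-tokens.js CLAUDE.md src/workers/report.worker.ts
const fs = require('fs');

let total = 0;
for (const file of process.argv.slice(2)) {
  const chars = fs.readFileSync(file, 'utf8').length;
  const tokens = Math.round(chars / 4); // ~4 chars per token, rule of thumb
  total += tokens;
  console.log(`${file}: ~${tokens} tokens`);
}
console.log(`Total: ~${total} tokens (the default window is 200K)`);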
FAQ
What is the maximum codebase size Claude Code can handle in one session? Claude Code's context window is 200K tokens by default; via the Anthropic API, a 1M-token beta window is available on recent Sonnet models. 200K tokens holds approximately 150,000 words, or roughly 8,000–15,000 lines of reasonably verbose source code. Most bugs require far less than this if you apply progressive disclosure.
Does Claude Code actually read files, or does it rely on training data? Claude Code reads your actual files via filesystem tools. It does not rely on training-data memorization of your codebase. This is why file discovery strategy matters — Claude Code needs to be told which files to read, or it will ask you.
How do I debug a bug that only occurs in production, not locally? Focus on environmental differences: production config values, environment variables, traffic load, and data volume. Share your production config (with secrets redacted) and ask Claude to compare it against your local config. Production-only bugs are frequently caused by configuration divergence, not code logic.
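A quick way to produce that redacted comparison (a sketch; the .env file names and the secret-matching pattern are assumptions, adjust to your config layout):
// diff-config.js — print redacted key/value pairs from two env files
const fs = require('fs');

function redactedEntries(path) {
  return fs.readFileSync(path, 'utf8')
    .split('\n')
    .filter((line) => line.includes('=') && !line.startsWith('#'))
    .map((line) => {
      const idx = line.indexOf('=');
      const key = line.slice(0, idx);
      // Redact anything that looks like a secret; keep structural values visible
      const isSecret = /(SECRET|KEY|TOKEN|PASSWORD)/i.test(key);
      return `${key}=${isSecret ? '<redacted>' : line.slice(idx + 1)}`;
    })
    .join('\n');
}

console.log('--- local ---\n' + redactedEntries('.env.local'));
console.log('--- production ---\n' + redactedEntries('.env.production'));
Paste both outputs into the session and ask Claude to flag keys that diverge.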
Should I use /compact during long debugging sessions? Yes. When a debugging session accumulates many turns of tool calls and file reads, use /compact to condense the conversation history into a dense summary. This frees context space for new files and prevents "lost in the middle" degradation on long contexts. Compact after each major investigation phase, not continuously.
How accurate is Claude Code's bug identification on first attempt? Based on community reports and our own testing, Claude Code correctly identifies the root cause on the first investigation session roughly 60–70% of the time for bugs in code it can fully read. Success rate drops for bugs in external libraries, infrastructure issues (DNS, network), or race conditions that require runtime observation. For those, use Claude to narrow hypotheses, then use runtime tools (profilers, distributed tracing) to confirm.
Sources
- Anthropic Claude Code documentation: https://docs.anthropic.com/en/docs/claude-code
- Claude Code CLAUDE.md reference: https://docs.anthropic.com/en/docs/claude-code/memory
- Anthropic long-context best practices: https://docs.anthropic.com/en/docs/build-with-claude/long-context-tips
- Claude Code subagents documentation: https://docs.anthropic.com/en/docs/claude-code/sub-agents
- Git bisect documentation: https://git-scm.com/docs/git-bisect
- Node.js memory management and GC documentation: https://nodejs.org/en/docs/guides/diagnostics/memory
Take It Further
Claude Code Power Prompts 300 — 300 battle-tested prompts for Claude Code, organized by use case. Copy, paste, ship.
40 slash command templates. Token-optimized variants. JSONL file for direct import. Tested in production sessions.
→ Get Claude Code Power Prompts 300 — $29
30-day money-back guarantee. Instant download.