Claude Code for Data Science: Jupyter and Notebook Workflows
Claude Code works with Jupyter notebooks in two ways: from the CLI, where it reads notebooks and generates code for cells, and through direct .ipynb editing, which treats the notebook JSON as structured data. For most data science workflows, the most effective pattern is to run Claude Code in a terminal alongside an open notebook: Claude generates code, you paste and run it, then iterate. This guide covers patterns that work for EDA, visualization, and model development.
Setup: CLAUDE.md for Data Science Projects
# data-project CLAUDE.md
## Environment
- Python 3.12, Jupyter Lab
- Package manager: uv or pip
- Key packages: pandas, numpy, matplotlib, seaborn, scikit-learn, polars
## Data Conventions
- Raw data: data/raw/ (read-only, never modified)
- Processed: data/processed/
- Outputs: data/outputs/ (figures, reports)
- Column naming: snake_case
- Date columns: always parse as datetime, store UTC
## Code Style
- Type hints on functions
- Docstrings on public functions
- No magic numbers — name your constants
- DataFrames: prefer method chaining over intermediate variables
## Notebook Conventions
- First cell: imports only
- Second cell: configuration/constants
- Each analysis section: one markdown cell explaining what/why, then code cell(s)
- Save figures: always save to data/outputs/ in addition to displaying
## Testing
- Tests for data transformation functions: tests/
- Use pytest
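To make the conventions above concrete, here is a minimal sketch of a transformation function and a pytest-style test written to follow them. The function name `clean_sales`, the constant `MIN_QUANTITY`, and the sample columns are hypothetical and only illustrate the style (type hints, docstring, named constant, method chaining, UTC dates).

```python
import pandas as pd

# Named constant instead of a magic number (hypothetical threshold).
MIN_QUANTITY = 1


def clean_sales(df: pd.DataFrame) -> pd.DataFrame:
    """Return rows with lowercase column names, UTC dates, and a revenue column."""
    return (
        df.rename(columns=str.lower)
        .assign(
            date=lambda d: pd.to_datetime(d["date"], utc=True),
            revenue=lambda d: d["quantity"] * d["price"],
        )
        .loc[lambda d: d["quantity"] >= MIN_QUANTITY]
        .reset_index(drop=True)
    )


def test_clean_sales_drops_invalid_rows() -> None:
    """pytest-style test; in a project this would live under tests/."""
    raw = pd.DataFrame(
        {
            "Date": ["2026-01-05", "2026-01-06"],
            "Quantity": [2, 0],
            "Price": [10.0, 5.0],
        }
    )
    result = clean_sales(raw)
    assert len(result) == 1
    assert result.loc[0, "revenue"] == 20.0
    assert str(result["date"].dt.tz) == "UTC"
```

Keeping transformations in small functions like this is what makes the Testing convention practical: the notebook calls the function, and pytest exercises it without the notebook.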
Pattern 1: EDA (Exploratory Data Analysis)
# Generate a complete EDA notebook for a dataset
claude "Write Jupyter notebook cells for EDA on this dataset:
File: data/raw/sales_2026.csv
Known columns: date, product_id, quantity, price, region, customer_id
Generate cells for:
1. Load and basic info (shape, dtypes, head)
2. Missing value analysis (heatmap + counts)
3. Distribution of numeric columns (histograms)
4. Time series: monthly revenue trend
5. Top 10 products by revenue
6. Regional breakdown (bar chart)
Use seaborn for plots, save each figure to data/outputs/.
Each section: markdown explanation cell + code cell."
Generated output (example of one section):
# Cell: Missing Value Analysis
import pandas as pd
import missingno as msno  # third-party package, not in the Key packages list above
import matplotlib.pyplot as plt
# Count missing values per column and sort by share missing
missing = df.isnull().sum()
missing_pct = missing / len(df) * 100
missing_df = pd.DataFrame({'count': missing, 'pct': missing_pct}).sort_values('pct', ascending=False)
print(missing_df[missing_df['count'] > 0])
# Visualize
fig, ax = plt.subplots(figsize=(10, 6))
msno.bar(df, ax=ax, color='steelblue', fontsize=12)
# Set the title on ax directly; msno.bar may add twin axes, so plt.title can land on the wrong one
ax.set_title('Missing Value Distribution')
fig.tight_layout()
fig.savefig('data/outputs/missing_values.png', dpi=150, bbox_inches='tight')
plt.show()
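Items 4 and 5 of the prompt (monthly revenue trend and top products by revenue) come down to two groupby aggregations. The sketch below shows one way such cells might look, using a tiny made-up frame in place of data/raw/sales_2026.csv so it runs on its own; the constant `TOP_N` and the sample values are assumptions, and plotting is left out.

```python
import pandas as pd

TOP_N = 10  # how many products to keep in the ranking

# Made-up stand-in for data/raw/sales_2026.csv (same column names as the prompt).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2026-01-03", "2026-01-20", "2026-02-11", "2026-02-25"], utc=True
        ),
        "product_id": ["A", "B", "A", "C"],
        "quantity": [2, 1, 3, 4],
        "price": [10.0, 25.0, 10.0, 5.0],
    }
)

sales = df.assign(revenue=lambda d: d["quantity"] * d["price"])

# Monthly revenue: group on the year-month label of each sale.
monthly_revenue = (
    sales.groupby(sales["date"].dt.strftime("%Y-%m"))["revenue"]
    .sum()
    .rename_axis("month")
)

# Top products: total revenue per product, largest first.
top_products = sales.groupby("product_id")["revenue"].sum().nlargest(TOP_N)

print(monthly_revenue)
print(top_products)
```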
Pattern 2: Debugging DataFrames
When a DataFrame transformation isn't working:
claude "Debug this pandas code:
df_result = (
df.groupby(['region', 'month'])
.agg({'revenue': 'sum', 'quantity': 'sum'})
.reset_index()
.pivot(index='month', columns='region', values='revenue')
.fillna(0)
)
Error: 'DataFrame' object has no attribute 'values' on the pivot step.
DataFrame dtypes: region (object), month (object), revenue (float64), quantity (int64)
Sample: [paste df.head() output]"
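Before sending a prompt like this, it can help to rerun the chain step by step on a tiny hand-built frame with the same dtypes. The sketch below does that with made-up values. If the chain succeeds on the sample, the problem more likely sits in the real data or environment than in the chain itself, and that is worth saying in the prompt.

```python
import pandas as pd

# Hand-built sample with the dtypes listed in the prompt (values are made up).
df = pd.DataFrame(
    {
        "region": ["north", "north", "south"],
        "month": ["2026-01", "2026-02", "2026-01"],
        "revenue": [100.0, 150.0, 80.0],
        "quantity": [10, 12, 7],
    }
)

# Step 1: aggregate, then inspect what the pivot will receive.
grouped = (
    df.groupby(["region", "month"])
    .agg({"revenue": "sum", "quantity": "sum"})
    .reset_index()
)
print(type(grouped).__name__, list(grouped.columns))

# Step 2: pivot to one row per month and one column per region.
df_result = grouped.pivot(index="month", columns="region", values="revenue").fillna(0)
print(df_result)
```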
Pattern 3: Visualization Generation
claude "Create a visualization function:
- Input: dataframe with columns [date, category, value]
- Output: subplot grid showing:
1. Line plot per category over time
2. Stacked bar chart by month
3. Correlation heatmap between numeric columns
- Use the seaborn 'whitegrid' style (sns.set_theme(style='whitegrid'))
- Save to data/outputs/analysis_[timestamp].png
- Return the figure object"
Pattern 4: Reading and Modifying Existing Notebooks
Claude Code can read .ipynb files directly:
# Read a notebook and suggest improvements
claude "@notebooks/sales_analysis.ipynb
Review this notebook and:
1. Identify any cells with deprecated pandas syntax (use pd.concat instead of append, etc.)
2. Find plots missing axis labels or titles
3. Suggest where to add docstrings
4. Note any magic numbers that should be constants
Don't modify anything yet — just report."
Then apply specific fixes:
claude "@notebooks/sales_analysis.ipynb
Fix all the issues you found:
- Update deprecated pandas syntax
- Add missing axis labels (use descriptive labels from column names)
- Add module-level constants for magic numbers
Show me the changed cells only."
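Because an .ipynb file is plain JSON, the same kind of review can be spot-checked with a few lines of standard-library Python. The sketch below uses a made-up two-cell notebook in nbformat 4 layout and a hypothetical helper, `find_code_cells_containing`, that does a crude text match. It would also flag list.append calls, so treat hits as candidates to review, not confirmed issues.

```python
import json

# Made-up notebook, serialized the way an .ipynb file is stored on disk.
# A real file would be read with json.load() instead.
raw_notebook = json.dumps(
    {
        "nbformat": 4,
        "nbformat_minor": 5,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Sales analysis"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": None,
                "outputs": [],
                "source": ["df = df.append(new_rows)\n", "df.plot()"],
            },
        ],
    }
)

nb = json.loads(raw_notebook)


def find_code_cells_containing(notebook: dict, needle: str) -> list[int]:
    """Return indices of code cells whose source text contains `needle`."""
    hits = []
    for index, cell in enumerate(notebook["cells"]):
        source = "".join(cell["source"])  # source may be a list of lines
        if cell["cell_type"] == "code" and needle in source:
            hits.append(index)
    return hits


deprecated_cells = find_code_cells_containing(nb, ".append(")
print(deprecated_cells)
```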
Pattern 5: Model Evaluation Code
claude "Write model evaluation cells for a binary classification problem.
Model: sklearn LogisticRegression (already trained as 'model')
Test data: X_test, y_test (already defined)
Generate cells for:
1. Predictions and probability scores
2. Classification report (precision, recall, F1)
3. Confusion matrix heatmap
4. ROC curve with AUC
5. Precision-Recall curve
6. Feature importance (coefficients) bar chart
Each metric: explain what it means in one markdown sentence.
All plots: save to data/outputs/model_eval/"
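As a rough idea of what the first few requested cells might contain, here is a minimal sketch covering predictions, the classification report, the confusion matrix, ROC AUC, and coefficient ranking. The synthetic data and the freshly trained model are stand-ins for the real `model`, `X_test`, and `y_test`, and plotting is omitted to keep the example short.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

RANDOM_STATE = 42

# Synthetic stand-in for the real dataset and the already-trained model.
X, y = make_classification(n_samples=400, n_features=6, random_state=RANDOM_STATE)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=RANDOM_STATE
)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# 1. Predictions and probability scores for the positive class
y_pred = model.predict(X_test)
y_score = model.predict_proba(X_test)[:, 1]

# 2-4. Report, confusion matrix, and area under the ROC curve
print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
print(cm)
print(f"ROC AUC: {auc:.3f}")

# 6. Coefficients ranked by absolute size, as (feature index, coefficient) pairs
ranked = sorted(enumerate(model.coef_[0]), key=lambda pair: abs(pair[1]), reverse=True)
for feature_index, coefficient in ranked:
    print(f"feature {feature_index}: {coefficient:+.3f}")
```

Note that coefficient size only compares fairly across features when the inputs are on similar scales.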
Using Claude Code CLI with Jupyter
# Run notebook non-interactively
jupyter nbconvert --to notebook --execute notebooks/analysis.ipynb
# Claude can help debug failed executions
claude "This notebook failed during execution:
Error: KeyError: 'segment' in cell 15
The column was renamed from 'segment' to 'customer_segment' in preprocessing.
Fix all references in the notebook."
# Generate a report from notebook output
claude "Convert the outputs from notebooks/analysis.ipynb into a
markdown report summary. Extract: key metrics, main findings,
and embed the saved figure paths as image references."
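To give Claude the exact failure, it helps to pull the error out of the executed notebook. The sketch below assumes the notebook was run with nbconvert's --allow-errors option so that error outputs were saved into the file; the helper `find_cell_errors` and the sample notebook are hypothetical. It relies on the nbformat convention that a failed cell stores an output with `output_type` set to `error` plus `ename` and `evalue` fields.

```python
# Made-up executed notebook; a real one would come from json.load() on the .ipynb file.
executed_notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Segments"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": 1,
            "source": ["df['segment'].value_counts()"],
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "KeyError",
                    "evalue": "'segment'",
                    "traceback": ["KeyError: 'segment'"],
                }
            ],
        },
    ],
}


def find_cell_errors(notebook: dict) -> list[tuple[int, str, str]]:
    """Return (cell index, error name, error value) for each saved error output."""
    errors = []
    for index, cell in enumerate(notebook["cells"]):
        for output in cell.get("outputs", []):  # markdown cells have no outputs
            if output.get("output_type") == "error":
                errors.append((index, output["ename"], output["evalue"]))
    return errors


cell_errors = find_cell_errors(executed_notebook)
for index, name, value in cell_errors:
    print(f"cell {index}: {name}: {value}")
```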
Frequently Asked Questions
Can Claude Code run notebook cells directly?
Claude Code can read .ipynb files and generate/modify cell content. To actually run cells, you still use Jupyter Lab/Notebook or jupyter nbconvert --execute. Claude generates the code, you run it.
How do I share DataFrame context with Claude for debugging?
Include df.dtypes, df.head(), and the exact error message. For large DataFrames, also include df.describe() for numeric columns. Claude needs the structure and a sample to debug effectively.
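One way to gather that context in a single paste-ready block is a small helper like the sketch below. The function name `describe_for_prompt`, the constant `HEAD_ROWS`, and the sample frame are illustrative assumptions.

```python
import pandas as pd

HEAD_ROWS = 5  # number of sample rows to include


def describe_for_prompt(df: pd.DataFrame, error: str) -> str:
    """Bundle dtypes, a sample, a numeric summary, and the error into one text block."""
    sections = [
        "dtypes:\n" + df.dtypes.to_string(),
        f"head({HEAD_ROWS}):\n" + df.head(HEAD_ROWS).to_string(),
        "describe():\n" + df.describe().to_string(),
        "error:\n" + error,
    ]
    return "\n\n".join(sections)


# Made-up frame and error message for illustration.
sample = pd.DataFrame(
    {"region": ["north", "south"], "revenue": [100.0, 80.0], "quantity": [10, 7]}
)
context = describe_for_prompt(sample, "KeyError: 'segment'")
print(context)
```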
Is Claude good at pandas vs polars?
Claude tends to generate reliable pandas code, likely because pandas has far more public examples to learn from. Polars is newer, but Claude Code handles it reasonably well when CLAUDE.md says so explicitly: "Use polars for all DataFrame operations, not pandas."
What's the best way to iterate on visualizations with Claude?
Describe what you want, generate the code, run it, then describe the result back to Claude: "The bars are too narrow and the x-axis labels overlap. Fix spacing and rotate labels 45°." Claude iterates well on visualization details.
Related Guides
- Claude Code Complete Guide — Full reference
- Claude Code for Backend: Python, Go, Node.js — Python backend patterns
- Context Engineering for Claude — CLAUDE.md optimization