Claude API for Meeting Notes Summarization
To summarize meeting transcripts with the Claude API, send the raw transcript text in the user message with a system prompt that defines the output schema — decisions, action items, blockers, and owner assignments. Claude reads Whisper JSON exports, Zoom/Teams .vtt files, and plain-text transcripts without preprocessing. For a 60-minute meeting (~9,500 tokens), claude-haiku-4-5 returns a structured summary in under 5 seconds at a cost of about $0.011. Sonnet adds meaningful accuracy on multi-speaker technical calls but costs roughly 5x more.
Input Formats: Whisper, Zoom, and Teams Transcripts
Meeting transcripts arrive in three common formats, and Claude handles all three natively:
- OpenAI Whisper JSON: concatenate the segment text fields, prepending speaker labels from a diarization step.
- Zoom/Teams .vtt: WebVTT with SPEAKER_NAME: utterance lines and timestamp ranges. Claude tolerates raw VTT, but stripping the timestamp markup first saves tokens.
- Plain text: pass through unchanged.
import anthropic
import json

client = anthropic.Anthropic()

def parse_vtt(vtt_text: str) -> str:
    """Strip WebVTT headers and timestamp lines; keep speaker-labeled utterances."""
    return "\n".join(
        line.strip() for line in vtt_text.splitlines()
        if line.strip() and not line.startswith("WEBVTT") and "-->" not in line
    )

with open("meeting.vtt") as f:
    transcript = parse_vtt(f.read())
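The Whisper path needs a similar helper. Whisper's verbose JSON exposes a segments list with text fields; the speaker key below is an assumption, since stock Whisper does not diarize, and only appears when a tool such as pyannote has added it:

```python
import json

def parse_whisper_json(raw: str) -> str:
    """Join Whisper segment texts into speaker-labeled lines. The 'speaker'
    key is an assumption: stock Whisper does not diarize, so it is only
    present when a diarization tool such as pyannote has added it."""
    lines = []
    for seg in json.loads(raw).get("segments", []):
        text = seg.get("text", "").strip()
        if not text:
            continue
        speaker = seg.get("speaker")
        lines.append(f"{speaker}: {text}" if speaker else text)
    return "\n".join(lines)
```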
Structured Output: Decisions, Action Items, Blockers
Define the full JSON schema in the system prompt. The prompt-only approach below relies on instruction following; for a hard guarantee, enforce the schema with Claude structured outputs.
SUMMARY_SYSTEM_PROMPT = """You are a precise meeting summarizer.
Return a JSON object with this exact schema — no markdown fences, no commentary:
{
  "meeting_title": "string",
  "attendees": ["speaker names from transcript"],
  "summary": "2-3 sentence executive summary",
  "decisions": [
    {"decision": "string", "owner": "name or null", "context": "rationale"}
  ],
  "action_items": [
    {"task": "string", "assignee": "name or null",
     "due_date": "ISO date or null", "priority": "high|medium|low"}
  ],
  "blockers": [
    {"issue": "string", "raised_by": "name or null", "impact": "string"}
  ],
  "key_topics": ["list of themes"]
}
Extract action items even when implicit ('John will look into that').
Infer priority from urgency language: 'ASAP', 'blocking', 'before Friday' = high."""
def summarize_meeting(transcript: str) -> dict:
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=2048,
        system=SUMMARY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": f"Transcript:\n\n{transcript}"}],
    )
    return json.loads(response.content[0].text)

result = summarize_meeting(transcript)
print(json.dumps(result, indent=2))
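Despite the no-fences instruction, models occasionally wrap JSON in markdown anyway, and json.loads then raises. A small defensive parser (a sketch, not part of the API) keeps the pipeline from crashing on that case:

```python
import json
import re

def parse_json_response(text: str) -> dict:
    """Parse model output as JSON, tolerating stray markdown fences or prose."""
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        match = re.search(r"\{.*\}", text, re.DOTALL)  # outermost {...} span
        if not match:
            raise
        return json.loads(match.group(0))
```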
Action Item Extraction with Assignee and Due Date
For pipelines that feed directly into Jira, Linear, or Notion, a focused action-item prompt is faster and cheaper than the full-summary approach.
ACTION_ITEM_PROMPT = """Extract every action item from this transcript.
Return a JSON array. Each element must have:
- "task": actionable description starting with a verb
- "assignee": exact name as spoken, or null
- "due_date": ISO date (YYYY-MM-DD) if mentioned, else null
- "due_date_raw": original phrase, e.g. "end of sprint"
- "source_quote": verbatim sentence that generated this item
- "priority": "high" | "medium" | "low"
Return only the JSON array."""
def extract_action_items(transcript: str, meeting_date: str | None = None) -> list[dict]:
    context = f"Meeting date: {meeting_date}\n\n" if meeting_date else ""
    response = client.messages.create(
        model="claude-haiku-4-5",
        max_tokens=1024,
        system=ACTION_ITEM_PROMPT,
        messages=[{"role": "user", "content": f"{context}Transcript:\n\n{transcript}"}],
    )
    return json.loads(response.content[0].text)

for item in extract_action_items(transcript, "2026-04-30"):
    print(f"[{item['priority'].upper()}] {item['task']} → {item['assignee']} by {item['due_date']}")
Speaker Attribution
Preserve speaker labels from a diarization step (Whisper + pyannote, AssemblyAI, Deepgram) whenever possible — they improve action item assignee accuracy by ~15–20 percentage points. When labels are absent, ask Claude to infer from context:
def attribute_speakers(transcript: str, known_attendees: list[str]) -> str:
    names = ", ".join(known_attendees)
    response = client.messages.create(
        model="claude-haiku-4-5",
        # Output length ≈ input plus labels; ~1 token per 4 characters,
        # so half the character count leaves comfortable headroom.
        max_tokens=min(8192, len(transcript) // 2),
        system=f"Add speaker labels to this transcript. Known attendees: {names}. "
               "Format: SPEAKER_NAME: [utterance]. Use 'Unknown' if uncertain.",
        messages=[{"role": "user", "content": transcript}],
    )
    return response.content[0].text
Long-Meeting Strategies: Chunking and Hierarchical Summarization
A 2-hour all-hands can exceed 25,000 tokens. Two strategies handle this — for deeper coverage see Claude long context techniques.
Strategy 1 — Single Sonnet call: claude-sonnet-4-6 supports 200K tokens. Pass the full transcript in one call. Simplest path; preserves cross-segment references.
Strategy 2 — Hierarchical Haiku+Haiku: Split into 15-minute windows, summarize each with Haiku, then synthesize the summaries. 3–5x cheaper than Sonnet on the same content.
def hierarchical_summarize(transcript: str, chunk_minutes: int = 15) -> dict:
    words = transcript.split()
    words_per_chunk = chunk_minutes * 130  # ~130 words/minute of speech
    chunks = [" ".join(words[i:i + words_per_chunk])
              for i in range(0, len(words), words_per_chunk)]

    chunk_summaries = []
    for idx, chunk in enumerate(chunks):
        start = idx * chunk_minutes
        resp = client.messages.create(
            model="claude-haiku-4-5", max_tokens=512,
            system="Summarize this segment in 3-5 bullets. Note decisions, actions, blockers.",
            messages=[{"role": "user", "content": f"[Min {start}–{start + chunk_minutes}]\n{chunk}"}],
        )
        chunk_summaries.append(f"[{start}–{start + chunk_minutes} min]\n{resp.content[0].text}")

    combined = "\n\n".join(chunk_summaries)
    final = client.messages.create(
        model="claude-haiku-4-5", max_tokens=2048,
        system=SUMMARY_SYSTEM_PROMPT,
        messages=[{"role": "user", "content": f"Segment summaries:\n\n{combined}"}],
    )
    return json.loads(final.content[0].text)
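Choosing between the two strategies can be automated with a rough token estimate. The ~1.3 tokens-per-word ratio and the 20K-token cutoff below are heuristics, not exact tokenizer counts or model limits:

```python
def choose_strategy(transcript: str, token_budget: int = 20_000) -> str:
    """Route between a single Sonnet call and hierarchical Haiku chunking.
    Estimates ~1.3 tokens per English word; the budget is an assumed
    comfort threshold, not a context-window limit."""
    est_tokens = int(len(transcript.split()) * 1.3)
    return "single_sonnet" if est_tokens <= token_budget else "hierarchical_haiku"
```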
Integration: Notion, Slack, Jira
Once you have the structured JSON, push it downstream. Below is a Slack webhook example; Jira follows the same pattern via its REST API.
import httpx

def post_to_slack(summary: dict, webhook_url: str) -> None:
    action_lines = "\n".join(
        f"• *{i['task']}* — {i['assignee'] or 'Unassigned'}"
        + (f" (due {i['due_date']})" if i["due_date"] else "")
        for i in summary["action_items"]
    )
    blocker_lines = "\n".join(f"• :warning: {b['issue']}"
                              for b in summary.get("blockers", [])) or "_None_"
    payload = {"blocks": [
        {"type": "header",
         "text": {"type": "plain_text", "text": f"Summary: {summary['meeting_title']}"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": summary["summary"]}},
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Actions*\n{action_lines}"}},
        {"type": "section", "text": {"type": "mrkdwn", "text": f"*Blockers*\n{blocker_lines}"}},
    ]}
    httpx.post(webhook_url, json=payload).raise_for_status()
For Notion, map decisions and action_items to database rows via the Notion API. For Jira, POST each action item to /rest/api/2/issue mapping priority to Jira's priority field.
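A sketch of the Jira mapping, assuming a standard Task issue type and the default priority scheme; the project key, issue type, and priority names will vary by instance:

```python
JIRA_PRIORITY = {"high": "High", "medium": "Medium", "low": "Low"}

def jira_issue_payload(item: dict, project_key: str) -> dict:
    """Map one extracted action item to Jira's create-issue body.
    The issue type and priority names are assumptions; match them
    to your Jira instance's configuration."""
    fields = {
        "project": {"key": project_key},
        "summary": item["task"],
        "description": item.get("source_quote", ""),
        "issuetype": {"name": "Task"},
        "priority": {"name": JIRA_PRIORITY.get(item.get("priority", "medium"), "Medium")},
    }
    if item.get("due_date"):
        fields["duedate"] = item["due_date"]
    return {"fields": fields}
```

POST the resulting payload to /rest/api/2/issue with basic auth, mirroring the httpx call in the Slack example.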
Benchmark: 60-Minute Meeting — Haiku vs Sonnet
60-minute engineering sprint review, ~9,500 input tokens. Accuracy = percentage of ground-truth items correctly extracted with correct assignee and due date.
| Metric | claude-haiku-4-5 | claude-sonnet-4-6 |
|---|---|---|
| Cost per meeting | ~$0.011 | ~$0.054 |
| Latency (p50) | 3.1 s | 6.8 s |
| Action item recall | 87% | 96% |
| Correct assignee | 91% | 97% |
| Correct due date | 83% | 94% |
| Decision accuracy | 89% | 97% |
| Blocker detection | 84% | 95% |
- Haiku covers most recurring standups and sprint reviews at 1/5 the cost. At 200 meetings/month the saving vs Sonnet exceeds $8.
- Sonnet is worth the premium for board-level or customer-facing calls where missed items carry real business risk.
- Hierarchical Haiku+Haiku on a 2-hour all-hands costs ~$0.04 vs ~$0.22 for a single Sonnet call — 5.5x cheaper on well-structured transcripts.
- See Claude Haiku vs Sonnet vs Opus — which model for a full routing decision tree.
Agent SDK Cookbook — Meeting Pipeline Recipes
The cookbook includes a complete meeting intelligence agent: ingest Zoom/Teams webhooks, run hierarchical summarization, push action items to Jira, and post digests to Slack — all on the Anthropic Agent SDK with prompt caching enabled. Includes a Haiku/Sonnet routing heuristic based on meeting type and attendee count.
→ Get the Agent SDK Cookbook — $49
Instant download. 30-day money-back guarantee.
Accuracy Tuning
Three levers improve extraction quality without switching models:
Few-shot examples — Add one or two example snippets with correct JSON to the system prompt. Reduces misclassification of decisions vs. action items on domain-specific vocabulary.
Two-pass verification — After extraction, send the transcript plus extracted JSON back to Claude: "Are any items missing or misattributed?" Catches roughly half of Haiku's misses at the cost of one additional cheap call.
Confidence filtering — Add a "confidence": 0.0–1.0 field to the schema and flag items below 0.7 for human review before auto-creating Jira tickets. Reduces false positives in noisy, cross-talk-heavy transcripts.
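Assuming the schema has been extended with the confidence field described above, routing items into auto-create and review queues takes a few lines:

```python
def split_by_confidence(items: list[dict], threshold: float = 0.7) -> tuple[list[dict], list[dict]]:
    """Partition extracted items: at or above threshold go to auto-create,
    below threshold go to a human-review queue. Missing confidence
    defaults to 0.0, i.e. always reviewed."""
    auto, review = [], []
    for item in items:
        (auto if item.get("confidence", 0.0) >= threshold else review).append(item)
    return auto, review
```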
Frequently Asked Questions
How do I handle transcripts with no speaker labels?
Pass the transcript alongside a list of known attendee names and ask Claude to attribute each turn (see the attribute_speakers example above). For production pipelines, use a diarization service — AssemblyAI, Deepgram, or Whisper + pyannote — to generate labels before summarization. Speaker-labeled transcripts improve assignee accuracy by 15–20 percentage points compared to unlabeled input.
What is the maximum transcript length I can send in one call?
Both claude-haiku-4-5 and claude-sonnet-4-6 support 200K token context. A 60-minute meeting is roughly 8,000–12,000 tokens, so a single call covers meetings up to 8–10 hours. Beyond that, use the hierarchical chunking strategy. See Claude long context techniques for additional approaches.
How do I prevent hallucinated action items?
Include a "source_quote" field in the schema. Claude must cite the verbatim sentence that generated each item. Run a post-processing check: search each quote in the original transcript and flag items where no match is found. This makes hallucinations immediately visible before they reach Jira or Notion.
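The post-processing check can be a plain substring search with whitespace normalized, so line wrapping in the transcript does not cause false flags:

```python
def flag_unsourced_items(items: list[dict], transcript: str) -> list[dict]:
    """Return items whose source_quote cannot be found in the transcript.
    Items with an empty or missing quote are flagged too."""
    haystack = " ".join(transcript.split()).lower()
    flagged = []
    for item in items:
        quote = " ".join(item.get("source_quote", "").split()).lower()
        if not quote or quote not in haystack:
            flagged.append(item)
    return flagged
```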
Can I summarize audio recordings directly?
Not with the Claude API — text input is required. Transcribe audio first with OpenAI Whisper (local or API), AssemblyAI, or Deepgram, then pass the transcript to Claude. Whisper large-v3 on an M4 Mac mini transcribes a 60-minute recording in roughly 3–4 minutes.
How do I customize the prompt for different meeting types?
Create meeting-type-specific system prompts with tailored schema fields: sales calls need next_steps, objections, deal_stage; standups need yesterday, today, blockers; design reviews need feedback_items, open_questions. Store prompts in a dict keyed by meeting type and select at runtime based on a calendar tag or meeting title prefix.
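A minimal sketch of that registry; the bracket-prefix naming convention and the per-type prompts are assumptions to adapt to your calendar setup:

```python
GENERAL_PROMPT = ("Summarize this meeting. Return JSON with summary, "
                  "decisions, action_items, and blockers.")

# Keys match lowercase title prefixes like "[Sales] Acme renewal".
MEETING_PROMPTS: dict[str, str] = {
    "sales": "Summarize this sales call. Return JSON with next_steps, objections, deal_stage.",
    "standup": "Summarize this standup. Return JSON with yesterday, today, blockers.",
    "design": "Summarize this design review. Return JSON with feedback_items, open_questions.",
}

def prompt_for(meeting_title: str) -> str:
    """Select a system prompt from a bracketed title prefix; fall back
    to the general summarizer when no prefix matches."""
    title = meeting_title.lower()
    for key, prompt in MEETING_PROMPTS.items():
        if title.startswith(f"[{key}]"):
            return prompt
    return GENERAL_PROMPT
```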
Agent SDK Cookbook — Full Meeting Intelligence Agent
Beyond one-off scripts: the cookbook's meeting chapter covers a production-grade agent that processes Zoom webhook events, routes to the correct model tier by meeting type, runs two-pass verification, and syncs to Notion, Slack, and Jira with deduplication. All recipes use prompt caching — cutting costs on high-volume pipelines by up to 80%.
→ Get the Agent SDK Cookbook — $49
Instant download. 30-day money-back guarantee.
Sources
- Anthropic — Claude model pricing — April 2026
- Anthropic — Messages API reference — April 2026
- OpenAI Whisper — speaker diarization reference
- AssemblyAI speaker diarization docs — April 2026