# Building a Content Generation Agent with Quality Checks
A content generation agent without quality checks produces high volume but inconsistent quality — the opposite of what content teams need. A production content agent runs output through a multi-stage pipeline: draft generation, self-review, fact extraction, format validation, and a human approval gate before anything reaches your CMS. This guide builds that full pipeline with the Claude Agent SDK, including the quality check patterns that prevent bad content from shipping.
## The Architecture
A naive content agent writes and publishes in one step. A production agent has stages with validation gates between them.
```
Input spec → Draft generation → Self-review → Structure validation → Fact extraction → [Human gate] → CMS publish
```
Each stage can reject and retry the previous stage's output. Human gates are optional — use them for high-stakes content, skip them for lower-stakes bulk generation.
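The gate-and-retry contract between stages can be sketched generically. This is a minimal illustration of the pattern, not an SDK API; the names `run_stage`, `produce`, and `gate` are hypothetical:

```python
from typing import Callable

def run_stage(produce: Callable[[], str], gate: Callable[[str], bool],
              max_retries: int = 2) -> str:
    """Run one pipeline stage: regenerate until its validation gate passes."""
    output = produce()
    for _ in range(max_retries):
        if gate(output):
            break
        output = produce()  # Gate rejected the output; produce a new attempt
    return output  # The last attempt is returned even if the gate never passed

# Toy example: the gate requires at least three words
attempts = iter(["too short", "this draft is long enough"])
result = run_stage(lambda: next(attempts), lambda s: len(s.split()) >= 3)
print(result)  # this draft is long enough
```

Each stage in this guide follows this shape: an LLM call produces output, and either a second LLM call or a rule-based check decides whether it moves forward.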
## Stage 1: Draft Generation
```python
import anthropic
import json
import re
from dataclasses import dataclass
from typing import Optional

client = anthropic.Anthropic()

@dataclass
class ContentSpec:
    title: str
    primary_keyword: str
    target_length: int  # words
    audience: str       # "junior developers", "CTOs", etc.
    tone: str           # "technical", "conversational", "authoritative"
    required_sections: list[str]
    cta_product: Optional[str] = None

@dataclass
class DraftResult:
    content: str
    word_count: int
    sections_found: list[str]
    metadata: dict

def generate_draft(spec: ContentSpec) -> DraftResult:
    """Stage 1: Generate initial draft from spec."""
    system_prompt = f"""You are a technical content writer specializing in developer tools.

Writing constraints:
- Target length: {spec.target_length} words (±10%)
- Audience: {spec.audience}
- Tone: {spec.tone}
- Primary keyword: "{spec.primary_keyword}" — use naturally in title, first paragraph, and 2-3 subheadings
- Required sections: {', '.join(spec.required_sections)}

Quality standards:
- First paragraph (40-60 words) must directly answer the primary question
- Every major claim needs a specific example or number
- Code examples must be complete and runnable
- No filler phrases ("In today's world", "As we know")
- End with a FAQ section (5 questions minimum)"""

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=8000,
        system=system_prompt,
        messages=[{
            "role": "user",
            "content": f"Write an article titled: '{spec.title}'\n\nPrimary keyword to target: {spec.primary_keyword}"
        }]
    )

    content = response.content[0].text
    word_count = len(content.split())

    # Extract markdown headings as the sections found
    sections_found = re.findall(r'^#{1,3} (.+)$', content, re.MULTILINE)

    return DraftResult(
        content=content,
        word_count=word_count,
        sections_found=sections_found,
        metadata={
            "model": "claude-sonnet-4-5",
            "input_tokens": response.usage.input_tokens,
            "output_tokens": response.usage.output_tokens,
        }
    )
```
## Stage 2: Self-Review
Have Claude critique its own draft. This is surprisingly effective — a second pass with a different system prompt catches issues the first pass misses.
```python
@dataclass
class ReviewResult:
    passed: bool
    score: int  # 0-100
    issues: list[str]
    suggestions: list[str]
    revised_content: Optional[str]

def self_review(draft: DraftResult, spec: ContentSpec) -> ReviewResult:
    """Stage 2: Claude reviews its own draft for quality issues."""
    review_schema = {
        "type": "object",
        "properties": {
            "score": {"type": "integer", "minimum": 0, "maximum": 100},
            "passed": {"type": "boolean"},
            "issues": {"type": "array", "items": {"type": "string"}},
            "suggestions": {"type": "array", "items": {"type": "string"}},
            "needs_revision": {"type": "boolean"}
        },
        "required": ["score", "passed", "issues", "suggestions", "needs_revision"]
    }

    # Truncate the draft for token efficiency
    truncated_draft = draft.content[:6000]

    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2000,
        system="""You are a senior editor reviewing technical content. Be critical and specific.

Review criteria:
1. First paragraph answers the question directly (not vague)
2. Every claim has a specific number, example, or reference
3. Code examples are complete and correct
4. Required sections are all present
5. No filler, no fluff, no generic statements
6. FAQ section has 5+ substantive questions
7. Word count within 10% of target

Respond ONLY with valid JSON matching the provided schema. No prose before or after.""",
        messages=[
            {
                "role": "user",
                "content": f"""Review this article draft. Target: {spec.target_length} words, required sections: {spec.required_sections}

DRAFT:
{truncated_draft}

Respond with JSON:
{json.dumps(review_schema, indent=2)}"""
            }
        ]
    )

    try:
        review_data = json.loads(response.content[0].text)
    except json.JSONDecodeError:
        # Fallback: extract the first JSON object embedded in the response
        json_match = re.search(r'\{.*\}', response.content[0].text, re.DOTALL)
        if json_match:
            review_data = json.loads(json_match.group())
        else:
            # Review failed — treat as passed to avoid blocking the pipeline
            return ReviewResult(passed=True, score=70, issues=[], suggestions=[], revised_content=None)

    revised_content = None
    if review_data.get("needs_revision") and review_data.get("issues"):
        revised_content = apply_revision(draft.content, review_data["issues"], spec)

    return ReviewResult(
        passed=review_data["passed"],
        score=review_data["score"],
        issues=review_data["issues"],
        suggestions=review_data["suggestions"],
        revised_content=revised_content
    )

def apply_revision(content: str, issues: list[str], spec: ContentSpec) -> str:
    """Apply corrections based on review issues."""
    issues_text = "\n".join(f"- {issue}" for issue in issues)
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=8000,
        system="You are a technical editor. Fix the specific issues listed. Return the full revised article.",
        messages=[{
            "role": "user",
            "content": f"""Fix these specific issues in the article:

ISSUES TO FIX:
{issues_text}

ARTICLE:
{content}

Return the complete revised article, fixing only the listed issues."""
        }]
    )
    return response.content[0].text
```
## Stage 3: Structure Validation
Enforce structural requirements programmatically — don't rely on the LLM to count sections correctly.
```python
import re
from dataclasses import dataclass

@dataclass
class ValidationResult:
    passed: bool
    failures: list[str]
    warnings: list[str]
    metrics: dict

def validate_structure(content: str, spec: ContentSpec) -> ValidationResult:
    """Stage 3: Programmatic structure validation."""
    failures = []
    warnings = []

    # Word count check
    word_count = len(content.split())
    target_min = int(spec.target_length * 0.9)
    target_max = int(spec.target_length * 1.1)
    if word_count < target_min:
        failures.append(f"Too short: {word_count} words (target: {spec.target_length}±10%)")
    elif word_count > target_max:
        warnings.append(f"Slightly long: {word_count} words (target: {spec.target_length}±10%)")

    # Required sections check
    content_lower = content.lower()
    for section in spec.required_sections:
        if section.lower() not in content_lower:
            failures.append(f"Missing required section: '{section}'")

    # FAQ check: bolded questions or question headings
    faq_questions = re.findall(r'^\*\*[^*]+\?\*\*|^#{1,3}.*\?', content, re.MULTILINE)
    if len(faq_questions) < 5:
        warnings.append(f"Only {len(faq_questions)} FAQ questions found (recommend 5+)")

    # Keyword check
    keyword_count = content_lower.count(spec.primary_keyword.lower())
    if keyword_count < 2:
        warnings.append(f"Primary keyword '{spec.primary_keyword}' appears only {keyword_count} times")
    elif keyword_count > 10:
        warnings.append(f"Keyword stuffing risk: '{spec.primary_keyword}' appears {keyword_count} times")

    # Code block check: one complete block = open + close = 2 fence markers
    code_fences = re.findall(r'```', content)
    if len(code_fences) < 2:
        warnings.append("No code examples found — consider adding concrete examples")

    # First paragraph length check
    paragraphs = content.split('\n\n')
    non_heading_paras = [p for p in paragraphs if not p.startswith('#')]
    if non_heading_paras:
        first_para_words = len(non_heading_paras[0].split())
        if first_para_words > 100:
            warnings.append(f"First paragraph too long ({first_para_words} words) — aim for 40-60")
        elif first_para_words < 20:
            warnings.append(f"First paragraph too short ({first_para_words} words)")

    return ValidationResult(
        passed=len(failures) == 0,
        failures=failures,
        warnings=warnings,
        metrics={
            "word_count": word_count,
            "sections_found": len(re.findall(r'^#{1,3} ', content, re.MULTILINE)),
            "faq_questions": len(faq_questions),
            "keyword_density": keyword_count / word_count * 100 if word_count > 0 else 0
        }
    )
```
## Orchestrating the Full Pipeline
```python
import time
from enum import Enum
from typing import Optional

class PipelineStatus(Enum):
    DRAFTING = "drafting"
    REVIEWING = "reviewing"
    VALIDATING = "validating"
    AWAITING_APPROVAL = "awaiting_approval"
    APPROVED = "approved"
    REJECTED = "rejected"
    FAILED = "failed"

def run_content_pipeline(
    spec: ContentSpec,
    require_human_approval: bool = True,
    max_revision_attempts: int = 2
) -> dict:
    """
    Full content generation pipeline with quality checks.
    Returns the final content and pipeline metadata.
    """
    pipeline_log = []
    start_time = time.time()

    def log(stage: str, status: str, details: Optional[dict] = None):
        entry = {"stage": stage, "status": status, "elapsed_s": round(time.time() - start_time, 1)}
        if details:
            entry.update(details)
        pipeline_log.append(entry)
        print(f"[{entry['elapsed_s']}s] {stage}: {status}")

    # Stage 1: Draft
    log("draft", "started")
    draft = generate_draft(spec)
    log("draft", "completed", {"word_count": draft.word_count})

    # Stage 2: Self-review (with retry)
    final_content = draft.content
    for attempt in range(max_revision_attempts + 1):
        log("review", f"attempt {attempt + 1}")
        review = self_review(DraftResult(
            content=final_content,
            word_count=len(final_content.split()),
            sections_found=[],
            metadata={}
        ), spec)
        # Apply any revision now so the next attempt reviews the updated draft
        if review.revised_content:
            final_content = review.revised_content
            log("review", "revision_applied")
        if review.passed or attempt == max_revision_attempts:
            log("review", "completed", {
                "score": review.score,
                "issues": len(review.issues),
                "passed": review.passed
            })
            break

    # Stage 3: Structure validation
    log("validation", "started")
    validation = validate_structure(final_content, spec)
    log("validation", "completed", {
        "passed": validation.passed,
        "failures": validation.failures,
        "warnings": validation.warnings
    })

    if not validation.passed:
        return {
            "status": PipelineStatus.FAILED,
            "reason": "Validation failed",
            "failures": validation.failures,
            "pipeline_log": pipeline_log
        }

    # Stage 4: Human approval gate (optional)
    if require_human_approval:
        log("human_gate", "awaiting_approval")
        print("\n" + "=" * 60)
        print(f"CONTENT READY FOR REVIEW: {spec.title}")
        print(f"Word count: {validation.metrics['word_count']}")
        print(f"Review score: {review.score}/100")
        if validation.warnings:
            print(f"Warnings: {validation.warnings}")
        print("\nFirst 500 chars:")
        print(final_content[:500])
        print("=" * 60)

        approval = input("\nApprove for publish? (y/n/edit): ").strip().lower()
        if approval != "y":
            return {
                "status": PipelineStatus.REJECTED,
                "reason": f"Human rejected: {approval}",
                "content": final_content,
                "pipeline_log": pipeline_log
            }

    return {
        "status": PipelineStatus.APPROVED,
        "content": final_content,
        "title": spec.title,
        "slug": spec.primary_keyword.replace('"', '').replace(' ', '-').lower(),
        "metrics": validation.metrics,
        "review_score": review.score,
        "pipeline_log": pipeline_log,
        "total_elapsed_s": round(time.time() - start_time, 1)
    }
```
```python
# Usage
spec = ContentSpec(
    title="How to Build a REST API with FastAPI and Claude",
    primary_keyword="claude fastapi tutorial",
    target_length=2500,
    audience="intermediate Python developers",
    tone="technical",
    required_sections=["Quick Answer", "Setup", "Implementation", "FAQ"],
    cta_product="Agent SDK Cookbook"
)

result = run_content_pipeline(spec, require_human_approval=True)

if result["status"] == PipelineStatus.APPROVED:
    print(f"\nContent approved: {result['metrics']['word_count']} words, score {result['review_score']}/100")
    # Write to file, publish to CMS, etc.
```
## Frequently Asked Questions
**What quality score should I require before human review?** Require a minimum score of 70 before sending content to human review. Scores below 70 indicate structural problems that warrant automatic revision. Set your threshold based on your content standards — for SEO-critical content, 80+ is appropriate.
**How do I prevent the agent from self-approving bad content?** Two controls: (1) The self-review uses a different system prompt focused on finding problems, not finding reasons to pass. (2) The programmatic validation stage catches structural issues regardless of LLM review score. LLMs are optimistic about their own output — the validation layer must be rule-based.
**Should I use the same model for drafting and review?** Sonnet for both is effective — the key is the different system prompt and role framing. For highest quality, you can use Opus for the review stage to catch subtle issues Sonnet misses.
**How much does this pipeline cost per article?** At Sonnet pricing: draft generation (~4k tokens out) ≈ $0.06, self-review (~1k out) ≈ $0.02, revision if needed (~4k out) ≈ $0.06. Total: $0.10-0.15 per article. For bulk generation of 100 articles/month, under $15 in API costs.
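The arithmetic above can be reproduced with a small estimator. The per-token rate below is an illustrative assumption, not current API pricing — substitute your model's actual output-token price:

```python
# Back-of-envelope cost estimate per pipeline stage
OUTPUT_PRICE_PER_MTOK = 15.00  # assumed $ per million output tokens

def stage_cost(output_tokens: int) -> float:
    """Cost of one stage, based on its output token count."""
    return output_tokens / 1_000_000 * OUTPUT_PRICE_PER_MTOK

# draft (~4k out) + self-review (~1k out) + revision (~4k out)
per_article = stage_cost(4_000) + stage_cost(1_000) + stage_cost(4_000)
print(f"Draft + review + revision: ${per_article:.2f} per article")
print(f"100 articles/month: ${per_article * 100:.2f}")
```

Input tokens add a smaller amount on top (the full article is re-sent during revision), so treat these figures as a floor, not an exact bill.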
**Can I skip the human approval gate for automated bulk content?** Yes — set `require_human_approval=False`. Ensure your validation rules are comprehensive before fully automating. A good minimum: word count within range, all required sections present, review score ≥ 80.
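When the gate is skipped, the publish decision should be encoded as an explicit rule. A sketch, assuming the result dict shape returned by `run_content_pipeline`; the threshold values are the suggestions above, not SDK defaults:

```python
def auto_publishable(result: dict, min_score: int = 80, min_words: int = 0) -> bool:
    """Rule-based publish gate for unattended bulk runs."""
    metrics = result.get("metrics", {})
    return (
        result.get("review_score", 0) >= min_score
        and metrics.get("word_count", 0) >= min_words
        and not result.get("failures")  # Any structural failure blocks publish
    )

# Hypothetical pipeline results
good = {"review_score": 85, "metrics": {"word_count": 2480}}
weak = {"review_score": 72, "metrics": {"word_count": 2480}}
print(auto_publishable(good, min_words=2000))  # True
print(auto_publishable(weak, min_words=2000))  # False
```

Anything that fails the rule goes to a manual review queue instead of being silently dropped, so you can inspect what the automated gate is rejecting.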
## Related Guides
- How to Test Claude Agents: Evaluation Harness Patterns — Testing pipelines
- Claude Agent SDK: Build Automation Agents — SDK fundamentals
- Agentic Workflows: The Next Frontier in Developer Tools — Architecture patterns
## Go Deeper
Agent SDK Cookbook — $49 — Full content pipeline implementation with CMS integrations (Contentful, Notion, Ghost), bulk generation orchestration, quality scoring dashboard, and cost tracking by content type.
→ Get the Agent SDK Cookbook — $49
30-day money-back guarantee. Instant download.