AI Content Agents: How to Automate Your Entire Content Pipeline
Priya Patel
Product manager at an AI startup. Explores how agents reshape workflows.
Why Most "AI Content" Is Garbage (And How to Build Workflows That Aren't)
Let's get the uncomfortable truth out of the way: roughly 90% of AI-assisted content I've reviewed in the past year reads like it was written by a moderately talented parrot with access to a thesaurus. It's grammatically correct, structurally sound, and utterly devoid of original thought.
The problem isn't the models. GPT-4o, Claude 3.5, Gemini 1.5 Pro — they're all capable of producing competent prose. The problem is that most people use them as single-shot generators: dump a prompt, get text, publish. That's not a workflow. That's a slot machine.
What actually works is treating AI agents as participants in a structured editorial pipeline — each with a specific role, clear inputs, defined outputs, and human checkpoints. This guide covers how to build that pipeline, with real tools, real code, and real opinions about what works and what doesn't.
The Pipeline Architecture
Here's the workflow we'll dissect:
┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Ideation │───▶│ Research │───▶│ Drafting │───▶│ Editing │───▶│ Publishing │
│ (Agent + │ │ (Agent + │ │ (Agent + │ │ (Agent + │ │ (Agent + │
│ Human) │ │ Human) │ │ Human) │ │ Human) │ │ Human) │
└─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘ └─────────────┘
Notice the pattern: every stage is Agent + Human. There is no stage where you remove the human entirely and maintain quality. The goal isn't to automate yourself out of the process — it's to amplify your throughput by 3-5x while maintaining (or improving) quality.
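To make the pattern concrete, every stage in my pipeline ends with the same small gate: the agent's output is shown to a human, who approves, annotates, or rejects it before anything moves forward. Here's a minimal sketch; the function name and CLI-style prompt are my own illustration, not part of any framework:
def human_checkpoint(stage: str, agent_output: str) -> str | None:
    """Show an agent's output and block until a human approves or rejects it."""
    print(f"\n=== {stage} output ===\n{agent_output}\n")
    verdict = input("Approve (a), revise with notes (r), or reject (x)? ").strip().lower()
    if verdict == "a":
        return agent_output
    if verdict == "r":
        notes = input("Editorial notes for the next agent: ")
        return f"{agent_output}\n\n[EDITOR NOTES]\n{notes}"
    return None  # rejected: the stage must re-run
Crude, but it enforces the rule: nothing flows to the next stage without a human verdict.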
Stage 1: Ideation Agents
What They're Good At
AI models are surprisingly effective brainstorming partners — not because they have original ideas, but because they can rapidly surface angles you haven't considered by combining concepts from their training data in unexpected ways.
The Tool Stack
CrewAI is my preferred framework for multi-agent ideation. Here's a two-agent setup: one generates topic ideas, the other critiques them.
from crewai import Agent, Task, Crew
# Agent 1: The Ideator
ideator = Agent(
role="Content Strategist",
goal="Generate 10 high-signal topic ideas for technical readers interested in AI agents",
backstory="""You specialize in identifying gaps in existing coverage.
You avoid rehashed topics and focus on practical, underexplored angles.
You think in terms of reader value: what will someone DO differently after reading this?""",
verbose=True,
allow_delegation=False,
llm="gpt-4o"
)
# Agent 2: The Critic
critic = Agent(
role="Senior Editor",
goal="Evaluate topic ideas for originality, audience fit, and commercial viability",
backstory="""You've edited for major tech publications. You're ruthlessly practical.
You kill ideas that are either too broad, too niche, or already well-covered.
You score each idea 1-10 on: originality, audience demand, depth potential,
and competitive differentiation.""",
verbose=True,
allow_delegation=False,
llm="claude-3-5-sonnet-20241022" # Using a different model for diverse perspective
)
# Task 1: Generate ideas
ideation_task = Task(
description="""Generate 10 article ideas about AI agents for a publication called DriftSeas.
Our audience: senior developers, ML engineers, technical leads.
Avoid: beginner tutorials, product announcements, hype pieces.
Focus on: practical implementation, architectural decisions, failure modes, cost analysis.
For each idea, provide:
- Working title
- One-sentence thesis
- Why this matters now
- Estimated word count""",
expected_output="10 scored and annotated topic ideas",
agent=ideator
)
# Task 2: Critique and rank
critique_task = Task(
description="""Review the 10 topic ideas. Score each 1-10 on:
1. Originality (has this angle been done to death?)
2. Audience fit (would a senior dev actually click?)
3. Depth potential (can we fill 2000+ words without padding?)
4. Competitive differentiation (how does this stand out?)
Kill any idea scoring below 25/40. For surviving ideas, suggest
specific improvements to the angle or framing.""",
expected_output="Ranked and refined topic shortlist with scores and editorial notes",
agent=critic,
context=[ideation_task] # Critic sees ideator's output
)
# Run the crew
crew = Crew(
agents=[ideator, critic],
tasks=[ideation_task, critique_task],
verbose=True
)
result = crew.kickoff()
print(result)
The Human Checkpoint
This is critical: you pick the topic. The agents generate and filter, but the final selection requires editorial judgment that models still lack — specifically, the ability to sense what your specific audience needs right now based on conversations, trends, and gaps you've noticed firsthand.
What Doesn't Work
- Asking a single model to "suggest blog topics" — you'll get the same 10 ideas everyone else gets
- Using the same model for generation and critique — it has blind spots that a second model can catch
- Letting agents pick the final topic without human input — they optimize for surface-level metrics, not editorial instinct
Stage 2: Research Agents
The Problem with AI Research
Models hallucinate. This isn't news, but the implications for research workflows are severe. A research agent that confidently cites nonexistent papers or misquotes documentation is worse than no research at all.
A Better Approach: Structured Research with Verification
I use a three-component research pipeline:
1. Web Research Agent (using Perplexity API or Tavily)
import os
from tavily import TavilyClient
client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])
def research_topic(topic: str, sub_questions: list[str]) -> dict:
"""Research a topic by breaking it into sub-questions."""
results = {"main_topic": topic, "findings": []}
# Main topic search
main_search = client.search(
query=topic,
search_depth="advanced", # More thorough than "basic"
max_results=8,
include_raw_content=False
)
results["main_findings"] = main_search["results"]
# Sub-question deep dives
for question in sub_questions:
search_result = client.search(
query=question,
search_depth="advanced",
max_results=5
)
results["findings"].append({
"question": question,
"sources": search_result["results"]
})
return results
# Example usage
research = research_topic(
topic="AI agent memory architectures 2024 2025 comparison",
sub_questions=[
"What are the main approaches to long-term memory in LLM agents?",
"How does RAG compare to fine-tuning for agent memory?",
"What are the cost implications of different agent memory solutions?",
"What failure modes exist in current agent memory systems?"
]
)
2. Code/Documentation Research Agent
For technical articles, you need agents that can actually read and understand codebases. This is where tools like Aider or Sweep shine, since they can navigate repos and extract relevant implementation details. For broader background research, a simple LangChain toolbelt covers definitions, papers, and code:
from langchain_community.tools import WikipediaQueryRun
from langchain_community.utilities import WikipediaAPIWrapper
from langchain.tools import Tool
# Combine multiple research tools
tools = [
# Wikipedia for background context
Tool(
name="Wikipedia",
func=WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()).run,
description="Search Wikipedia for background information and definitions"
),
# ArXiv for academic papers
# (using langchain's arxiv tool)
# GitHub search for code examples
# Documentation search for specific frameworks
]
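Here's a sketch of wiring a toolbelt like this into an agent. The prompt wording and model choice are my defaults, not requirements; any tool-calling model works:
from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate

llm = ChatOpenAI(model="gpt-4o", temperature=0)
agent_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a research assistant. Use the available tools to gather "
               "background material and note which tool each fact came from."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),  # required slot for tool-call history
])
research_agent = create_tool_calling_agent(llm, tools, agent_prompt)
executor = AgentExecutor(agent=research_agent, tools=tools, verbose=True)
background = executor.invoke({"input": "Background on retrieval-augmented generation"})["output"]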
3. The Verification Agent
This is the agent most people skip, and it's the one that matters most.
from pydantic import BaseModel, Field
from typing import Literal
class ClaimVerification(BaseModel):
claim: str = Field(description="The factual claim being verified")
source_url: str = Field(description="URL where this claim can be verified")
confidence: Literal["verified", "likely_accurate", "unverified", "likely_false"]
notes: str = Field(description="Any caveats or context needed")
should_include: bool = Field(description="Whether to include this claim in the article")
# Use structured output to force the model to be explicit about verification status
verification_prompt = """You are a fact-checker. For each claim in the research summary below:
1. Identify the original source
2. Assess whether the source actually supports the claim
3. Flag any extrapolations, outdated information, or potential inaccuracies
Be conservative. If you cannot verify a claim, mark it as 'unverified'.
If a claim seems wrong, mark it as 'likely_false' even if you're not 100% sure.
Research Summary:
{research_summary}
"""
The Human Checkpoint
Before moving to drafting, you read the research summary. Specifically:
- Check that key claims have sources
- Verify any statistics or numbers cited
- Identify gaps the agents missed (they always miss something)
- Add your own firsthand knowledge and experience
Stage 3: Drafting Agents
The Architecture Problem
A single prompt producing a 2000-word article is a recipe for mediocrity. Long-form content has structural dependencies — the intro sets up expectations the body must fulfill, the argument needs consistent threading, the conclusion needs to pay off what was promised.
My Preferred Approach: Outline-First, Then Section-by-Section
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
class ArticleDrafter:
def __init__(self, model_name="gpt-4o"):
self.llm = ChatOpenAI(model=model_name, temperature=0.7)
self.parser = StrOutputParser()
def create_outline(self, topic: str, research: str, target_word_count: int) -> str:
"""Step 1: Create a detailed outline."""
prompt = ChatPromptTemplate.from_messages([
("system", """You are an expert technical writer. Create a detailed article outline.
Rules:
- Every section heading must promise specific value (no generic headings)
- Include estimated word counts per section
- Note where code examples, tables, or diagrams should go
- Identify the single most important insight and place it strategically
- The opening must hook the reader with a specific, concrete observation — not a platitude"""),
("human", """Write an outline for: {topic}
Target length: {word_count} words
Research summary:
{research}
Return a structured outline with headings, subheadings, key points,
and placement notes for examples and visuals.""")
])
chain = prompt | self.llm | self.parser
return chain.invoke({
"topic": topic,
"word_count": target_word_count,
"research": research
})
def draft_section(self, section_heading: str, section_brief: str,
previous_context: str, research: str) -> str:
"""Step 2: Draft individual sections with context from previous sections."""
prompt = ChatPromptTemplate.from_messages([
("system", """You are writing one section of a longer technical article.
Style rules:
- Write like you're explaining to a smart colleague, not a student
- Use specific examples, not abstract descriptions
- Every claim needs either evidence or a clear "in my experience" qualifier
- Vary sentence length. Short punchy sentences. Then longer ones that
develop a thought with nuance and detail. Mix it up.
- No filler phrases: "In today's fast-paced world", "It's worth noting that",
"Let's dive in" — all banned
- If you include code, make it runnable, not pseudocode
- End each section with either a concrete takeaway or a natural transition"""),
("human", """Write the section: {heading}
Section brief: {brief}
What came before (for continuity):
{previous_context}
Research material:
{research}
Write the section now. Be specific. Be concrete. Don't pad.""")
])
chain = prompt | self.llm | self.parser
return chain.invoke({
"heading": section_heading,
"brief": section_brief,
"previous_context": previous_context,
"research": research
})
def draft_full_article(self, topic: str, research: str,
target_word_count: int = 2000) -> str:
"""Full pipeline: outline → section-by-section drafting."""
# Step 1: Create outline
outline = self.create_outline(topic, research, target_word_count)
print("=== OUTLINE ===")
print(outline)
# Step 2: Parse outline into sections (simplified — in practice,
# use a structured output parser)
sections = self._parse_outline(outline)
# Step 3: Draft each section with accumulated context
full_article = ""
for section in sections:
draft = self.draft_section(
section_heading=section["heading"],
section_brief=section["brief"],
previous_context=full_article[-2000:] if full_article else "",
research=research
)
full_article += f"\n\n{draft}"
print(f"✓ Drafted: {section['heading']}")
return full_article
def _parse_outline(self, outline: str) -> list[dict]:
"""Parse outline text into structured sections.
In production, use a structured output parser or regex."""
# Simplified — replace with proper parsing
sections = []
current_section = None
for line in outline.split("\n"):
if line.startswith("## "):
if current_section:
sections.append(current_section)
current_section = {
"heading": line[3:].strip(),
"brief": ""
}
elif current_section and line.strip():
current_section["brief"] += line + "\n"
if current_section:
sections.append(current_section)
return sections
Why Section-by-Section Matters
I tested this extensively. A single-shot 2000-word article from GPT-4o has a median quality score (based on my internal rubric) of 5.8/10. The same model, same topic, section-by-section with accumulated context: 7.4/10. The difference is structural coherence and specificity.
The key is passing previous_context to each section draft. This prevents the model from repeating points, maintains voice consistency, and enables natural transitions.
The Human Checkpoint
After drafting, you need to:
- Read the whole thing — not skimming, reading
- Add your voice — AI drafts are a starting point. Inject your opinions, your experience, your specific examples
- Check the argument flow — does the article actually prove what it claims to?
- Kill your darlings — if a section doesn't serve the reader, cut it, even if the AI wrote it beautifully
Stage 4: Editing Agents
Multi-Pass Editing Architecture
Professional editors don't do a single pass. Neither should your agents. I run three distinct editing passes:
Pass 1: Structural Edit
structural_editor = Agent(
role="Structural Editor",
goal="Evaluate and improve article structure and argument flow",
backstory="""You focus on the big picture. You ask:
- Does the opening hook work?
- Does each section earn its place?
- Is the argument logical and progressive?
- Are there gaps in reasoning?
- Does the conclusion deliver on the opening's promise?
You don't fix sentences. You fix architecture.""",
verbose=True
)
structural_task = Task(
description="""Perform a structural edit on this article:
{article}
Provide:
1. Overall structural assessment (what works, what doesn't)
2. Specific reorganization suggestions (move X before Y, merge A and B)
3. Missing sections or arguments
4. Sections that should be cut
5. A revised outline if changes are significant""",
expected_output="Structural edit report with specific, actionable recommendations",
agent=structural_editor
)
Pass 2: Line Edit
line_editor = Agent(
role="Line Editor",
goal="Improve prose quality, clarity, and readability at the sentence level",
backstory="""You are a meticulous line editor. You fix:
- Unclear or ambiguous sentences
- Passive voice where active is stronger
- Redundancy and wordiness
- Jargon without explanation
- Inconsistent tone
- Weak verbs and vague qualifiers
You preserve the author's voice while elevating the prose.
You use the Hemingway principle: if a sentence doesn't add value, delete it.""",
verbose=True
)
line_edit_task = Task(
description="""Line edit this article. For each change you make:
1. Show the original text
2. Show your revision
3. Explain why you made the change
Focus especially on:
- Cutting filler phrases and empty calories
- Strengthening weak constructions
- Ensuring technical accuracy in language
- Maintaining consistent voice throughout
Article:
{article}""",
expected_output="Line-edited article with tracked changes and explanations",
agent=line_editor
)
Pass 3: SEO and Metadata
seo_agent = Agent(
role="SEO Specialist",
goal="Optimize article for discoverability without sacrificing quality",
backstory="""You understand that SEO and quality are not enemies.
You suggest: natural keyword integration, meta descriptions,
internal linking opportunities, and header optimization.
You NEVER suggest keyword stuffing or sacrificing readability for rankings.""",
verbose=True
)
seo_task = Task(
description="""Review this article and provide:
1. Primary keyword target (2-4 word phrase)
2. Secondary keywords (5-8 related terms)
3. Optimized title (under 60 characters, includes primary keyword)
4. Meta description (under 155 characters)
5. Suggested internal/external links
6. Header optimization suggestions
Article:
{article}""",
expected_output="SEO optimization report with specific recommendations",
agent=seo_agent
)
Quality Control: The Scoring Rubric
Here's the rubric I use to evaluate AI-assisted content before publication. I run this as an automated check:
from pydantic import BaseModel, Field
class ArticleScore(BaseModel):
"""Structured scoring for article quality assessment."""
# Content Quality (40 points total)
originality: int = Field(description="1-10: Does this say something new or say something old in a new way?")
specificity: int = Field(description="1-10: Are claims backed by specific examples, data, or code?")
depth: int = Field(description="1-10: Does it go beyond surface-level treatment?")
practical_value: int = Field(description="1-10: Can a reader DO something differently after reading?")
# Writing Quality (30 points total)
clarity: int = Field(description="1-10: Is every sentence unambiguous?")
engagement: int = Field(description="1-10: Would you keep reading past the first paragraph?")
structure: int = Field(description="1-10: Does the article flow logically?")
# Technical Quality (20 points total)
accuracy: int = Field(description="1-10: Are all technical claims correct?")
code_quality: int = Field(description="1-10: If code is included, is it correct and runnable?")
# Polish (10 points total)
grammar_and_style: int = Field(description="1-10: Grammar, punctuation, consistent style?")
# Threshold
total: int = Field(description="Sum of all scores")
publish_ready: bool = Field(description="True if total >= 70")
major_issues: list[str] = Field(description="List of issues that must be fixed before publishing")
def score_article(article: str) -> ArticleScore:
"""Use a strong model to score the article against the rubric."""
llm = ChatOpenAI(model="gpt-4o", temperature=0) # Low temp for consistent scoring
scoring_prompt = ChatPromptTemplate.from_messages([
("system", """You are a strict editorial quality assessor.
Score the article honestly — most articles score 5-7 on individual metrics.
A 9 or 10 should be rare and exceptional.
A score below 5 on any metric means that area needs significant work.
Be specific in your major_issues list. Not "improve the intro" but
"The opening paragraph uses a generic platitude about AI instead of
a specific hook. Replace with a concrete observation or data point."
Apply the rubric strictly. Do not inflate scores."""),
("human", "Score this article:\n\n{article}")
])
# Use structured output to get a reliable score
structured_llm = llm.with_structured_output(ArticleScore)
chain = scoring_prompt | structured_llm
return chain.invoke({"article": article})
I set the publish threshold at 70/100. In practice, AI-drafted content that goes through the full pipeline (human revisions included) typically scores 72-85. Content that skips the human checkpoint averages 55-65. That gap is the difference between content that builds audience trust and content that erodes it.
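In practice I wrap the scorer in a short revise loop before the human pass. The revise_with_feedback helper below is hypothetical (one LLM call that rewrites the draft against the flagged issues); the point is the loop's shape:
article, attempts = draft, 0
while attempts < 3:
    score = score_article(article)
    if score.publish_ready:
        break
    # revise_with_feedback is a hypothetical helper: a single LLM call that
    # rewrites the draft against the specific major_issues the scorer flagged
    article = revise_with_feedback(article, score.major_issues)
    attempts += 1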
Stage 5: Publishing Agents
What Can Be Automated
Publishing is where automation provides the clearest ROI because it's mostly mechanical:
import frontmatter
import datetime
def prepare_for_publish(article_text: str, metadata: dict) -> str:
"""Prepare article with frontmatter for static site generators."""
post = frontmatter.Post(article_text)
post["title"] = metadata["title"]
post["date"] = datetime.date.today().isoformat()
post["author"] = metadata["author"]
post["tags"] = metadata["tags"]
post["description"] = metadata["meta_description"]
post["draft"] = False
# Generate filename from title (naive slug; strip punctuation in production)
slug = metadata["title"].lower().replace(" ", "-")[:60]
filename = f"{datetime.date.today().isoformat()}-{slug}.md"
with open(f"content/posts/{filename}", "w") as f:
f.write(frontmatter.dumps(post))
return filename
def generate_social_posts(article_title: str, article_summary: str,
                          article_url: str) -> str:
    """Generate platform-specific social media posts as one formatted block."""
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.8) # Mini is fine for this
prompt = ChatPromptTemplate.from_messages([
("system", """Write social media posts to promote this article.
Create one version each for:
1. Twitter/X (under 280 chars, punchy, include a hook)
2. LinkedIn (professional tone, 2-3 sentences, end with question for engagement)
3. Hacker News (title only — factual, no hype, under 80 chars)
Be specific about the article's value. No generic "check out our latest post" language."""),
("human", """Title: {title}
Summary: {summary}
URL: {url}""")
])
chain = prompt | llm | StrOutputParser()
return chain.invoke({
"title": article_title,
"summary": article_summary,
"url": article_url
})
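Hypothetical usage, assuming edited_article holds the post-edit draft and the metadata dict comes out of the SEO pass (the URL and field values here are illustrative):
metadata = {
    "title": "AI Agent Memory Architectures: A Practical Comparison",
    "author": "Priya Patel",
    "tags": ["ai-agents", "memory", "rag"],
    "meta_description": "How RAG, fine-tuning, and hybrid memory stack up in production.",
}
filename = prepare_for_publish(edited_article, metadata)
posts = generate_social_posts(metadata["title"], metadata["meta_description"],
                              f"https://example.com/posts/{filename}")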
What Shouldn't Be Automated
- Final publish button — always eyeball the preview one more time
- Responses to comments — AI-generated comment responses are immediately obvious and actively harmful to community trust
- Cross-posting decisions — not every article belongs on every platform
Putting It All Together: The Full Orchestration
Here's how I tie it all together using CrewAI's orchestration:
from crewai import Crew, Process
def run_content_pipeline(topic: str):
    """Full content creation pipeline: research, draft, edit, score."""
# Phase 1: Research (can run in parallel with outline)
research_results = research_topic(topic, sub_questions=[
f"What is the current state of {topic}?",
f"What are the main challenges with {topic}?",
f"What tools and frameworks exist for {topic}?",
f"What are real-world examples of {topic} in production?"
])
# Phase 2: Draft
drafter = ArticleDrafter(model_name="gpt-4o")
draft = drafter.draft_full_article(
topic=topic,
research=str(research_results),
target_word_count=2000
)
# Phase 3: Edit (multi-agent crew)
editing_crew = Crew(
agents=[structural_editor, line_editor, seo_agent],
tasks=[structural_task, line_edit_task, seo_task],
process=Process.sequential, # Structural → Line → SEO
verbose=True
)
edited_result = editing_crew.kickoff(inputs={"article": draft})
# Phase 4: Score
final_article = str(edited_result)
score = score_article(final_article)
print(f"\n{'='*50}")
print(f"ARTICLE SCORE: {score.total}/100")
print(f"Publish Ready: {score.publish_ready}")
if score.major_issues:
print(f"\nIssues to fix:")
for issue in score.major_issues:
print(f" - {issue}")
print(f"{'='*50}\n")
return final_article, score
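Kick it off with a single call (topic reused from the research example above):
final_article, score = run_content_pipeline("AI agent memory architectures")
if score.publish_ready:
    print("Cleared the rubric. Now do the human pass before publishing.")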
Cost Analysis
Let's talk money. Here's what a typical 2000-word article costs through this pipeline:
| Stage | Model | Est. Tokens (in+out) | Est. Cost |
|---|---|---|---|
| Ideation | GPT-4o + Claude 3.5 | ~8,000 | $0.15 |
| Research | Tavily API | 15 queries | $0.15 |
| Verification | GPT-4o | ~4,000 | $0.08 |
| Outline | GPT-4o | ~3,000 | $0.06 |
| Drafting (6 sections) | GPT-4o | ~24,000 | $0.48 |
| Structural Edit | GPT-4o | ~6,000 | $0.12 |
| Line Edit | Claude 3.5 | ~8,000 | $0.15 |
| SEO | GPT-4o-mini | ~4,000 | $0.01 |
| Scoring | GPT-4o | ~4,000 | $0.08 |
| Social Posts | GPT-4o-mini | ~2,000 | $0.01 |
| Total | | | ~$1.29 |
That's roughly $1.29 in API costs per article. At an hourly rate of $100/hour for a senior writer, you need to save 47 seconds of their time to break even. The real cost is the human time — reading research, revising drafts, adding your expertise. Budget 45-90 minutes of human time per article, down from the 4-6 hours a fully manual approach typically requires.
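The break-even arithmetic, for anyone who wants to plug in their own rates:
api_cost = 1.29              # dollars of API spend per article (table above)
writer_rate = 100 / 3600     # senior writer at $100/hour, in dollars per second
break_even_seconds = api_cost / writer_rate  # ~46.4 seconds saved to break even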
Honest Limitations
I'd be doing you a disservice if I didn't call out where this pipeline breaks down:
Voice consistency is hard. Even with detailed style guides in system prompts, AI agents drift. Your 50th article will sound different from your 1st unless you actively maintain voice guidelines and periodically audit output.
Novel insights are rare. AI agents are excellent synthesizers but poor originators. If your article's core value proposition is a genuinely new idea, the AI can't generate that. It can help you articulate it, but the insight has to come from you.
Context windows are still a constraint. For very long articles (5000+ words), section-by-section drafting becomes essential, but you lose some global coherence. Anthropic's 200K context helps, but even then, attention degrades over long inputs.
Code examples need manual testing. I've caught AI-generated code that looks correct but would fail at runtime due to deprecated APIs, incorrect parameter names, or subtle logic errors. Always run the code.
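One habit that helps: pull every fenced Python block out of the draft and smoke-test it before publishing. A rough sketch, assuming the draft is fenced markdown; run it in a sandbox, since executing generated code on your own machine is an obvious risk:
import re
import subprocess
import tempfile

def smoke_test_code_blocks(markdown: str) -> list[tuple[int, bool, str]]:
    """Run each fenced Python block in the draft; return (index, passed, stderr tail)."""
    results = []
    for i, block in enumerate(re.findall(r"```python\n(.*?)```", markdown, re.DOTALL)):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(block)
        proc = subprocess.run(["python", f.name], capture_output=True,
                              text=True, timeout=30)
        results.append((i, proc.returncode == 0, proc.stderr[-500:]))
    return results
This catches deprecated APIs and wrong parameter names; subtle logic errors still need your eyes.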
The "uncanny valley" of quality. AI-edited content sometimes lands in an awkward middle ground — too polished to feel authentic, but not polished enough to feel professional. This is where your human editorial pass matters most: adding back the rough edges that make writing feel real.
The Bottom Line
The best content pipeline in 2025 isn't "AI writes, human publishes." It's a structured collaboration where AI handles the high-volume, repeatable tasks — research synthesis, first drafts, structural editing, SEO optimization, publishing mechanics — while humans provide the things AI still can't: original insights, editorial judgment, voice, and the willingness to say "this isn't good enough yet."
Build the pipeline. Set the quality bar high. Keep your hands on the wheel. The throughput gains are real — but only if you refuse to ship anything that doesn't meet your standards, regardless of how efficiently it was produced.