Creative Agents

AI Content Agents: How to Automate Your Entire Content Pipeline

Priya Patel

Product manager at an AI startup. Explores how agents reshape workflows.

April 29, 2026 · 19 min read

The AI Agent Content Pipeline: From Blank Page to Published Piece in 2025

Why Most "AI Content" Is Garbage (And How to Build Workflows That Aren't)

Let's get the uncomfortable truth out of the way: roughly 90% of AI-assisted content I've reviewed in the past year reads like it was written by a moderately talented parrot with access to a thesaurus. It's grammatically correct, structurally sound, and utterly devoid of original thought.

The problem isn't the models. GPT-4o, Claude 3.5, Gemini 1.5 Pro — they're all capable of producing competent prose. The problem is that most people use them as single-shot generators: dump a prompt, get text, publish. That's not a workflow. That's a slot machine.

What actually works is treating AI agents as participants in a structured editorial pipeline — each with a specific role, clear inputs, defined outputs, and human checkpoints. This guide covers how to build that pipeline, with real tools, real code, and real opinions about what works and what doesn't.


The Pipeline Architecture

Here's the workflow we'll dissect:

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Ideation   │───▶│  Research   │───▶│  Drafting   │───▶│  Editing    │───▶│  Publishing │
│  (Agent +   │    │  (Agent +   │    │  (Agent +   │    │  (Agent +   │    │  (Agent +   │
│   Human)    │    │   Human)    │    │   Human)    │    │   Human)    │    │   Human)    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

Notice the pattern: every stage is Agent + Human. There is no stage where you remove the human entirely and maintain quality. The goal isn't to automate yourself out of the process — it's to amplify your throughput by 3-5x while maintaining (or improving) quality.
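That Agent + Human pattern can be made explicit in code as an approval gate between stages. Here's a minimal sketch: the `human_gate` helper and its callback signature are illustrative, not part of any framework.

```python
from typing import Callable, Optional

def human_gate(stage: str, artifact: str,
               review: Callable[[str, str], Optional[str]]) -> str:
    """Run a human review step between pipeline stages.

    `review` receives the stage name and the agent's output; it returns
    a revised artifact, or None to approve the artifact as-is.
    In practice this might be a PR review, a Slack approval, or a CMS
    draft state rather than an in-process callback.
    """
    revised = review(stage, artifact)
    return artifact if revised is None else revised

# Approve ideation output unchanged, but revise the draft before it moves on
topics = human_gate("ideation", "1. Agent memory cost analysis", lambda s, a: None)
draft = human_gate("drafting", "Intro paragraph",
                   lambda s, a: a.replace("Intro", "Revised intro"))
```

The point of the callback shape is that every stage transition has exactly one place where a human can veto or rewrite, which makes the checkpoint auditable.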


Stage 1: Ideation Agents

What They're Good At

AI models are surprisingly effective brainstorming partners — not because they have original ideas, but because they can rapidly surface angles you haven't considered by combining concepts from their training data in unexpected ways.

The Tool Stack

CrewAI is my preferred framework for multi-agent ideation. Here's a two-agent setup: one generates topic ideas, the other critiques them.

from crewai import Agent, Task, Crew

# Agent 1: The Ideator
ideator = Agent(
    role="Content Strategist",
    goal="Generate 10 high-signal topic ideas for technical readers interested in AI agents",
    backstory="""You specialize in identifying gaps in existing coverage. 
    You avoid rehashed topics and focus on practical, underexplored angles.
    You think in terms of reader value: what will someone DO differently after reading this?""",
    verbose=True,
    allow_delegation=False,
    llm="gpt-4o"
)

# Agent 2: The Critic
critic = Agent(
    role="Senior Editor",
    goal="Evaluate topic ideas for originality, audience fit, and commercial viability",
    backstory="""You've edited for major tech publications. You're ruthlessly practical.
    You kill ideas that are either too broad, too niche, or already well-covered.
    You score each idea 1-10 on: originality, audience demand, depth potential, 
    and competitive differentiation.""",
    verbose=True,
    allow_delegation=False,
    llm="claude-3-5-sonnet-20241022"  # Using a different model for diverse perspective
)

# Task 1: Generate ideas
ideation_task = Task(
    description="""Generate 10 article ideas about AI agents for a publication called DriftSeas.
    Our audience: senior developers, ML engineers, technical leads.
    Avoid: beginner tutorials, product announcements, hype pieces.
    Focus on: practical implementation, architectural decisions, failure modes, cost analysis.
    
    For each idea, provide:
    - Working title
    - One-sentence thesis
    - Why this matters now
    - Estimated word count""",
    expected_output="10 scored and annotated topic ideas",
    agent=ideator
)

# Task 2: Critique and rank
critique_task = Task(
    description="""Review the 10 topic ideas. Score each 1-10 on:
    1. Originality (has this angle been done to death?)
    2. Audience fit (would a senior dev actually click?)
    3. Depth potential (can we fill 2000+ words without padding?)
    4. Competitive differentiation (how does this stand out?)
    
    Kill any idea scoring below 25/40. For surviving ideas, suggest 
    specific improvements to the angle or framing.""",
    expected_output="Ranked and refined topic shortlist with scores and editorial notes",
    agent=critic,
    context=[ideation_task]  # Critic sees ideator's output
)

# Run the crew
crew = Crew(
    agents=[ideator, critic],
    tasks=[ideation_task, critique_task],
    verbose=True
)

result = crew.kickoff()
print(result)

The Human Checkpoint

This is critical: you pick the topic. The agents generate and filter, but the final selection requires editorial judgment that models still lack — specifically, the ability to sense what your specific audience needs right now based on conversations, trends, and gaps you've noticed firsthand.

What Doesn't Work

  • Asking a single model to "suggest blog topics" — you'll get the same 10 ideas everyone else gets
  • Using the same model for generation and critique — it has blind spots that a second model can catch
  • Letting agents pick the final topic without human input — they optimize for surface-level metrics, not editorial instinct

Stage 2: Research Agents

The Problem with AI Research

Models hallucinate. This isn't news, but the implications for research workflows are severe. A research agent that confidently cites nonexistent papers or misquotes documentation is worse than no research at all.

A Better Approach: Structured Research with Verification

I use a three-component research pipeline:

1. Web Research Agent (using Perplexity API or Tavily)

import os
from tavily import TavilyClient

client = TavilyClient(api_key=os.environ["TAVILY_API_KEY"])

def research_topic(topic: str, sub_questions: list[str]) -> dict:
    """Research a topic by breaking it into sub-questions."""
    results = {"main_topic": topic, "findings": []}
    
    # Main topic search
    main_search = client.search(
        query=topic,
        search_depth="advanced",  # More thorough than "basic"
        max_results=8,
        include_raw_content=False
    )
    results["main_findings"] = main_search["results"]
    
    # Sub-question deep dives
    for question in sub_questions:
        search_result = client.search(
            query=question,
            search_depth="advanced",
            max_results=5
        )
        results["findings"].append({
            "question": question,
            "sources": search_result["results"]
        })
    
    return results

# Example usage
research = research_topic(
    topic="AI agent memory architectures 2024 2025 comparison",
    sub_questions=[
        "What are the main approaches to long-term memory in LLM agents?",
        "How does RAG compare to fine-tuning for agent memory?",
        "What are the cost implications of different agent memory solutions?",
        "What failure modes exist in current agent memory systems?"
    ]
)

2. Code/Documentation Research Agent

For technical articles, you need agents that can actually read and understand codebases. This is where tools like Aider or Sweep shine — they can navigate repos and extract relevant implementation details.

from langchain_community.tools import WikipediaQueryRun, ArxivQueryRun
from langchain_community.utilities import WikipediaAPIWrapper, ArxivAPIWrapper
from langchain.tools import Tool

# Combine multiple research tools
tools = [
    # Wikipedia for background context and definitions
    Tool(
        name="Wikipedia",
        func=WikipediaQueryRun(api_wrapper=WikipediaAPIWrapper()).run,
        description="Search Wikipedia for background information and definitions"
    ),
    # ArXiv for academic papers (requires the `arxiv` package)
    Tool(
        name="ArXiv",
        func=ArxivQueryRun(api_wrapper=ArxivAPIWrapper()).run,
        description="Search arXiv for relevant academic papers and abstracts"
    ),
    # Add GitHub search for code examples and documentation search
    # for specific frameworks as needed
]

3. The Verification Agent

This is the agent most people skip, and it's the one that matters most.

from pydantic import BaseModel, Field
from typing import Literal

class ClaimVerification(BaseModel):
    claim: str = Field(description="The factual claim being verified")
    source_url: str = Field(description="URL where this claim can be verified")
    confidence: Literal["verified", "likely_accurate", "unverified", "likely_false"]
    notes: str = Field(description="Any caveats or context needed")
    should_include: bool = Field(description="Whether to include this claim in the article")

# Use structured output to force the model to be explicit about verification status
verification_prompt = """You are a fact-checker. For each claim in the research summary below:

1. Identify the original source
2. Assess whether the source actually supports the claim
3. Flag any extrapolations, outdated information, or potential inaccuracies

Be conservative. If you cannot verify a claim, mark it as 'unverified'.
If a claim seems wrong, mark it as 'likely_false' even if you're not 100% sure.

Research Summary:
{research_summary}
"""

The Human Checkpoint

Before moving to drafting, you read the research summary. Specifically:

  • Check that key claims have sources
  • Verify any statistics or numbers cited
  • Identify gaps the agents missed (they always miss something)
  • Add your own firsthand knowledge and experience

Stage 3: Drafting Agents

The Architecture Problem

A single prompt producing a 2000-word article is a recipe for mediocrity. Long-form content has structural dependencies — the intro sets up expectations the body must fulfill, the argument needs consistent threading, the conclusion needs to pay off what was promised.

My Preferred Approach: Outline-First, Then Section-by-Section

from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

class ArticleDrafter:
    def __init__(self, model_name="gpt-4o"):
        self.llm = ChatOpenAI(model=model_name, temperature=0.7)
        self.parser = StrOutputParser()
    
    def create_outline(self, topic: str, research: str, target_word_count: int) -> str:
        """Step 1: Create a detailed outline."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an expert technical writer. Create a detailed article outline.
            
            Rules:
            - Every section heading must promise specific value (no generic headings)
            - Include estimated word counts per section
            - Note where code examples, tables, or diagrams should go
            - Identify the single most important insight and place it strategically
            - The opening must hook the reader with a specific, concrete observation — not a platitude"""),
            ("human", """Write an outline for: {topic}
            
            Target length: {word_count} words
            
            Research summary:
            {research}
            
            Return a structured outline with headings, subheadings, key points, 
            and placement notes for examples and visuals.""")
        ])
        
        chain = prompt | self.llm | self.parser
        return chain.invoke({
            "topic": topic,
            "word_count": target_word_count,
            "research": research
        })
    
    def draft_section(self, section_heading: str, section_brief: str, 
                       previous_context: str, research: str) -> str:
        """Step 2: Draft individual sections with context from previous sections."""
        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are writing one section of a longer technical article.
            
            Style rules:
            - Write like you're explaining to a smart colleague, not a student
            - Use specific examples, not abstract descriptions
            - Every claim needs either evidence or a clear "in my experience" qualifier
            - Vary sentence length. Short punchy sentences. Then longer ones that 
              develop a thought with nuance and detail. Mix it up.
            - No filler phrases: "In today's fast-paced world", "It's worth noting that", 
              "Let's dive in" — all banned
            - If you include code, make it runnable, not pseudocode
            - End each section with either a concrete takeaway or a natural transition"""),
            ("human", """Write the section: {heading}
            
            Section brief: {brief}
            
            What came before (for continuity):
            {previous_context}
            
            Research material:
            {research}
            
            Write the section now. Be specific. Be concrete. Don't pad.""")
        ])
        
        chain = prompt | self.llm | self.parser
        return chain.invoke({
            "heading": section_heading,
            "brief": section_brief,
            "previous_context": previous_context,
            "research": research
        })
    
    def draft_full_article(self, topic: str, research: str, 
                           target_word_count: int = 2000) -> str:
        """Full pipeline: outline → section-by-section drafting."""
        # Step 1: Create outline
        outline = self.create_outline(topic, research, target_word_count)
        print("=== OUTLINE ===")
        print(outline)
        
        # Step 2: Parse outline into sections (simplified — in practice, 
        # use a structured output parser)
        sections = self._parse_outline(outline)
        
        # Step 3: Draft each section with accumulated context
        full_article = ""
        for section in sections:
            draft = self.draft_section(
                section_heading=section["heading"],
                section_brief=section["brief"],
                previous_context=full_article[-2000:] if full_article else "",
                research=research
            )
            full_article += f"\n\n{draft}"
            print(f"✓ Drafted: {section['heading']}")
        
        return full_article
    
    def _parse_outline(self, outline: str) -> list[dict]:
        """Parse outline text into structured sections.
        In production, use a structured output parser or regex."""
        # Simplified — replace with proper parsing
        sections = []
        current_section = None
        for line in outline.split("\n"):
            if line.startswith("## "):
                if current_section:
                    sections.append(current_section)
                current_section = {
                    "heading": line[3:].strip(),
                    "brief": ""
                }
            elif current_section and line.strip():
                current_section["brief"] += line + "\n"
        if current_section:
            sections.append(current_section)
        return sections

Why Section-by-Section Matters

I tested this extensively. A single-shot 2000-word article from GPT-4o has a median quality score (based on my internal rubric) of 5.8/10. The same model, same topic, section-by-section with accumulated context: 7.4/10. The difference is structural coherence and specificity.

The key is passing previous_context to each section draft. This prevents the model from repeating points, maintains voice consistency, and enables natural transitions.

The Human Checkpoint

After drafting, you need to:

  1. Read the whole thing — not skimming, reading
  2. Add your voice — AI drafts are a starting point. Inject your opinions, your experience, your specific examples
  3. Check the argument flow — does the article actually prove what it claims to?
  4. Kill your darlings — if a section doesn't serve the reader, cut it, even if the AI wrote it beautifully

Stage 4: Editing Agents

Multi-Pass Editing Architecture

Professional editors don't do a single pass. Neither should your agents. I run three distinct editing passes:

Pass 1: Structural Edit

structural_editor = Agent(
    role="Structural Editor",
    goal="Evaluate and improve article structure and argument flow",
    backstory="""You focus on the big picture. You ask:
    - Does the opening hook work?
    - Does each section earn its place?
    - Is the argument logical and progressive?
    - Are there gaps in reasoning?
    - Does the conclusion deliver on the opening's promise?
    
    You don't fix sentences. You fix architecture.""",
    verbose=True
)

structural_task = Task(
    description="""Perform a structural edit on this article:
    
    {article}
    
    Provide:
    1. Overall structural assessment (what works, what doesn't)
    2. Specific reorganization suggestions (move X before Y, merge A and B)
    3. Missing sections or arguments
    4. Sections that should be cut
    5. A revised outline if changes are significant""",
    expected_output="Structural edit report with specific, actionable recommendations",
    agent=structural_editor
)

Pass 2: Line Edit

line_editor = Agent(
    role="Line Editor",
    goal="Improve prose quality, clarity, and readability at the sentence level",
    backstory="""You are a meticulous line editor. You fix:
    - Unclear or ambiguous sentences
    - Passive voice where active is stronger
    - Redundancy and wordiness
    - Jargon without explanation
    - Inconsistent tone
    - Weak verbs and vague qualifiers
    
    You preserve the author's voice while elevating the prose.
    You use the Hemingway principle: if a sentence doesn't add value, delete it.""",
    verbose=True
)

line_edit_task = Task(
    description="""Line edit this article. For each change you make:
    1. Show the original text
    2. Show your revision
    3. Explain why you made the change
    
    Focus especially on:
    - Cutting filler phrases and empty calories
    - Strengthening weak constructions
    - Ensuring technical accuracy in language
    - Maintaining consistent voice throughout
    
    Article:
    {article}""",
    expected_output="Line-edited article with tracked changes and explanations",
    agent=line_editor
)

Pass 3: SEO and Metadata

seo_agent = Agent(
    role="SEO Specialist",
    goal="Optimize article for discoverability without sacrificing quality",
    backstory="""You understand that SEO and quality are not enemies.
    You suggest: natural keyword integration, meta descriptions, 
    internal linking opportunities, and header optimization.
    You NEVER suggest keyword stuffing or sacrificing readability for rankings.""",
    verbose=True
)

seo_task = Task(
    description="""Review this article and provide:
    1. Primary keyword target (2-4 word phrase)
    2. Secondary keywords (5-8 related terms)
    3. Optimized title (under 60 characters, includes primary keyword)
    4. Meta description (under 155 characters)
    5. Suggested internal/external links
    6. Header optimization suggestions
    
    Article:
    {article}""",
    expected_output="SEO optimization report with specific recommendations",
    agent=seo_agent
)

Quality Control: The Scoring Rubric

Here's the rubric I use to evaluate AI-assisted content before publication. I run this as an automated check:

from pydantic import BaseModel, Field

class ArticleScore(BaseModel):
    """Structured scoring for article quality assessment."""
    
    # Content Quality (40 points total)
    originality: int = Field(description="1-10: Does this say something new or say something old in a new way?")
    specificity: int = Field(description="1-10: Are claims backed by specific examples, data, or code?")
    depth: int = Field(description="1-10: Does it go beyond surface-level treatment?")
    practical_value: int = Field(description="1-10: Can a reader DO something differently after reading?")
    
    # Writing Quality (30 points total)
    clarity: int = Field(description="1-10: Is every sentence unambiguous?")
    engagement: int = Field(description="1-10: Would you keep reading past the first paragraph?")
    structure: int = Field(description="1-10: Does the article flow logically?")
    
    # Technical Quality (20 points total)
    accuracy: int = Field(description="1-10: Are all technical claims correct?")
    code_quality: int = Field(description="1-10: If code is included, is it correct and runnable?")
    
    # Polish (10 points total)
    grammar_and_style: int = Field(description="1-10: Grammar, punctuation, consistent style?")
    
    # Threshold
    total: int = Field(description="Sum of all scores")
    publish_ready: bool = Field(description="True if total >= 70")
    major_issues: list[str] = Field(description="List of issues that must be fixed before publishing")

def score_article(article: str) -> ArticleScore:
    """Use a strong model to score the article against the rubric."""
    llm = ChatOpenAI(model="gpt-4o", temperature=0)  # Low temp for consistent scoring
    
    scoring_prompt = ChatPromptTemplate.from_messages([
        ("system", """You are a strict editorial quality assessor. 
        Score the article honestly — most articles score 5-7 on individual metrics.
        A 9 or 10 should be rare and exceptional.
        A score below 5 on any metric means that area needs significant work.
        
        Be specific in your major_issues list. Not "improve the intro" but 
        "The opening paragraph uses a generic platitude about AI instead of 
        a specific hook. Replace with a concrete observation or data point."
        
        Apply the rubric strictly. Do not inflate scores."""),
        ("human", "Score this article:\n\n{article}")
    ])
    
    # Use structured output to get a reliable score
    structured_llm = llm.with_structured_output(ArticleScore)
    chain = scoring_prompt | structured_llm
    
    return chain.invoke({"article": article})

I set the publish threshold at 70/100. In practice, AI-drafted content that goes through the full pipeline (human revisions included) typically scores 72-85. Content that skips the human checkpoint averages 55-65. That gap is the difference between content that builds audience trust and content that erodes it.
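The threshold itself is easy to enforce as a hard gate in the pipeline. A minimal sketch of that check (the `publish_gate` helper is illustrative):

```python
def publish_gate(total: int, major_issues: list[str],
                 threshold: int = 70) -> tuple[bool, list[str]]:
    """Block publishing when the rubric score is below threshold or the
    scorer reported unresolved major issues. Mirrors the 70/100 bar."""
    blockers = list(major_issues)
    if total < threshold:
        blockers.append(f"score {total} is below the {threshold}-point threshold")
    return (not blockers, blockers)

ok, blockers = publish_gate(78, [])
ok2, blockers2 = publish_gate(64, ["Opening paragraph is a generic platitude"])
```

The gate returns the reasons alongside the verdict, so a failed run tells the human editor exactly what to fix on the next revision pass.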


Stage 5: Publishing Agents

What Can Be Automated

Publishing is where automation provides the clearest ROI because it's mostly mechanical:

import frontmatter
import datetime

def prepare_for_publish(article_text: str, metadata: dict) -> str:
    """Prepare article with frontmatter for static site generators."""
    
    post = frontmatter.Post(article_text)
    post["title"] = metadata["title"]
    post["date"] = datetime.date.today().isoformat()
    post["author"] = metadata["author"]
    post["tags"] = metadata["tags"]
    post["description"] = metadata["meta_description"]
    post["draft"] = False
    
    # Generate filename from title
    slug = metadata["title"].lower().replace(" ", "-")[:60]
    filename = f"{datetime.date.today().isoformat()}-{slug}.md"
    
    with open(f"content/posts/{filename}", "w") as f:
        f.write(frontmatter.dumps(post))
    
    return filename

def generate_social_posts(article_title: str, article_summary: str, 
                          article_url: str) -> str:
    """Generate platform-specific social media posts as a single text block."""
    
    llm = ChatOpenAI(model="gpt-4o-mini", temperature=0.8)  # Mini is fine for this
    prompt = ChatPromptTemplate.from_messages([
        ("system", """Write social media posts to promote this article.
        Create one version each for:
        
        1. Twitter/X (under 280 chars, punchy, include a hook)
        2. LinkedIn (professional tone, 2-3 sentences, end with question for engagement)
        3. Hacker News (title only — factual, no hype, under 80 chars)
        
        Be specific about the article's value. No generic "check out our latest post" language."""),
        ("human", """Title: {title}
        Summary: {summary}
        URL: {url}""")
    ])
    
    chain = prompt | llm | StrOutputParser()
    return chain.invoke({
        "title": article_title,
        "summary": article_summary,
        "url": article_url
    })

What Shouldn't Be Automated

  • Final publish button — always eyeball the preview one more time
  • Responses to comments — AI-generated comment responses are immediately obvious and actively harmful to community trust
  • Cross-posting decisions — not every article belongs on every platform

Putting It All Together: The Full Orchestration

Here's how I tie it all together using CrewAI's orchestration:

from crewai import Crew, Process

def run_content_pipeline(topic: str, research_urls: list[str] | None = None):
    """Full content creation pipeline."""
    
    # Phase 1: Research (can run in parallel with outline)
    research_results = research_topic(topic, sub_questions=[
        f"What is the current state of {topic}?",
        f"What are the main challenges with {topic}?",
        f"What tools and frameworks exist for {topic}?",
        f"What are real-world examples of {topic} in production?"
    ])
    
    # Phase 2: Draft
    drafter = ArticleDrafter(model_name="gpt-4o")
    draft = drafter.draft_full_article(
        topic=topic,
        research=str(research_results),
        target_word_count=2000
    )
    
    # Phase 3: Edit (multi-agent crew)
    editing_crew = Crew(
        agents=[structural_editor, line_editor, seo_agent],
        tasks=[structural_task, line_edit_task, seo_task],
        process=Process.sequential,  # Structural → Line → SEO
        verbose=True
    )
    
    edited_result = editing_crew.kickoff(inputs={"article": draft})
    
    # Phase 4: Score
    final_article = str(edited_result)
    score = score_article(final_article)
    
    print(f"\n{'='*50}")
    print(f"ARTICLE SCORE: {score.total}/100")
    print(f"Publish Ready: {score.publish_ready}")
    if score.major_issues:
        print(f"\nIssues to fix:")
        for issue in score.major_issues:
            print(f"  - {issue}")
    print(f"{'='*50}\n")
    
    return final_article, score

Cost Analysis

Let's talk money. Here's what a typical 2000-word article costs through this pipeline:

Stage                   Model                  Est. Tokens (in+out)   Est. Cost
Ideation                GPT-4o + Claude 3.5    ~8,000                 $0.15
Research                Tavily API             15 queries             $0.15
Verification            GPT-4o                 ~4,000                 $0.08
Outline                 GPT-4o                 ~3,000                 $0.06
Drafting (6 sections)   GPT-4o                 ~24,000                $0.48
Structural Edit         GPT-4o                 ~6,000                 $0.12
Line Edit               Claude 3.5             ~8,000                 $0.15
SEO                     GPT-4o-mini            ~4,000                 $0.01
Scoring                 GPT-4o                 ~4,000                 $0.08
Social Posts            GPT-4o-mini            ~2,000                 $0.01
Total                                                                 ~$1.29

That's roughly $1.29 in API costs per article. At an hourly rate of $100/hour for a senior writer, you need to save 47 seconds of their time to break even. The real cost is the human time — reading research, revising drafts, adding your expertise. Budget 45-90 minutes of human time per article, down from the 4-6 hours a fully manual approach typically requires.
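The per-article numbers are easy to recompute as model prices change. A small helper makes that explicit; note the prices in `PRICE_PER_M_TOKENS` are placeholder values, not current list prices, so substitute your provider's real rates.

```python
# Hypothetical blended (input+output) prices per 1M tokens. These are
# NOT current list prices; check your provider's pricing page.
PRICE_PER_M_TOKENS = {
    "gpt-4o": 10.00,
    "claude-3-5-sonnet": 12.00,
    "gpt-4o-mini": 0.45,
}

def stage_cost(model: str, tokens: int) -> float:
    """Estimated cost of one pipeline stage from a blended token count."""
    return tokens / 1_000_000 * PRICE_PER_M_TOKENS[model]

def pipeline_cost(stages: list[tuple[str, int]]) -> float:
    """Sum estimated API cost across all stages of one article."""
    return round(sum(stage_cost(model, tokens) for model, tokens in stages), 2)

cost = pipeline_cost([
    ("gpt-4o", 3_000),       # outline
    ("gpt-4o", 24_000),      # drafting
    ("gpt-4o-mini", 4_000),  # SEO pass
])
```

Rerunning this with updated prices each quarter keeps the break-even math honest as providers cut (or raise) rates.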


Honest Limitations

I'd be doing you a disservice if I didn't call out where this pipeline breaks down:

Voice consistency is hard. Even with detailed style guides in system prompts, AI agents drift. Your 50th article will sound different from your 1st unless you actively maintain voice guidelines and periodically audit output.
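One cheap way to catch drift is to track simple style metrics across published articles and flag outliers. A crude sketch; these metrics are illustrative, and a real voice audit would also track vocabulary and tone:

```python
import statistics

def style_fingerprint(text: str) -> dict[str, float]:
    """Crude stylometric fingerprint for detecting voice drift over time.
    Compare each new article's numbers against your rolling baseline."""
    # Naive sentence split; good enough for trend tracking, not parsing
    normalized = text.replace("!", ".").replace("?", ".")
    sentences = [s.strip() for s in normalized.split(".") if s.strip()]
    words = text.split()
    lengths = [len(s.split()) for s in sentences]
    return {
        "avg_sentence_len": round(statistics.mean(lengths), 1),
        "sentence_len_stdev": round(statistics.stdev(lengths), 1) if len(lengths) > 1 else 0.0,
        "avg_word_len": round(statistics.mean(len(w) for w in words), 1),
    }
```

If article 50's average sentence length sits three standard deviations from articles 1-10, your agents have drifted and the style prompt needs refreshing.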

Novel insights are rare. AI agents are excellent synthesizers but poor originators. If your article's core value proposition is a genuinely new idea, the AI can't generate that. It can help you articulate it, but the insight has to come from you.

Context windows are still a constraint. For very long articles (5000+ words), section-by-section drafting becomes essential but you lose some global coherence. Anthropic's 200K context helps, but even then, attention degrades over long inputs.

Code examples need manual testing. I've caught AI-generated code that looks correct but would fail at runtime due to deprecated APIs, incorrect parameter names, or subtle logic errors. Always run the code.
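A lightweight harness that extracts an article's fenced Python blocks and executes each one catches the worst of these before publication. A sketch under two assumptions: drafts are stored as markdown with fenced python blocks, and the code runs unsandboxed, so use it only on your own drafts (blocks that need API keys will simply report failures).

```python
import re
import subprocess
import sys
import tempfile

def smoke_test_python_blocks(markdown: str, timeout: int = 30) -> list[str]:
    """Extract fenced python blocks from article markdown and run each one.
    Returns one error string per failing block."""
    errors = []
    fence = "`" * 3  # avoid writing a literal fence inside this example
    pattern = re.compile(fence + r"python\n(.*?)" + fence, re.DOTALL)
    for i, code in enumerate(pattern.findall(markdown)):
        with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
            f.write(code)
            path = f.name
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, text=True, timeout=timeout)
        if proc.returncode != 0:
            stderr = proc.stderr.strip()
            last = stderr.splitlines()[-1] if stderr else "unknown error"
            errors.append(f"block {i}: {last}")
    return errors
```

Even this naive version catches deprecated imports and typo'd parameter names, which are exactly the failures that look fine on the page.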

The "uncanny valley" of quality. AI-edited content sometimes lands in an awkward middle ground — too polished to feel authentic, but not polished enough to feel professional. This is where your human editorial pass matters most: adding back the rough edges that make writing feel real.


The Bottom Line

The best content pipeline in 2025 isn't "AI writes, human publishes." It's a structured collaboration where AI handles the high-volume, repeatable tasks — research synthesis, first drafts, structural editing, SEO optimization, publishing mechanics — while humans provide the things AI still can't: original insights, editorial judgment, voice, and the willingness to say "this isn't good enough yet."

Build the pipeline. Set the quality bar high. Keep your hands on the wheel. The throughput gains are real — but only if you refuse to ship anything that doesn't meet your standards, regardless of how efficiently it was produced.

Keywords

AI agent · creative-agents