AI Data Agents: How to Analyze Datasets Without Writing SQL

Sarah Kim

Quantitative researcher turned AI writer. Specializes in financial AI agents.

April 27, 2026 · 14 min read


You have a CSV with 50,000 rows of sales data. Your VP wants to know which customer segments are underperforming by region, with a visual breakdown, by end of day. You could write SQL, wrangle pandas, build charts in matplotlib, and assemble a report in Google Slides. Or you could have an AI agent do most of that in minutes.

This guide covers the real tools, the actual workflows, and the honest limitations of using AI agents for data analysis without touching SQL. No hype—just what works, what doesn't, and how to get the most out of these tools.


Natural Language Querying: Ask Questions, Get Data

Natural language querying lets you ask questions like "What were total sales by region in Q3?" and get structured data back without writing SQL or pandas code. The AI agent translates your intent into executable code, runs it, and returns results.

Tools Worth Using

PandasAI is a Python library that wraps a pandas DataFrame with an LLM-powered interface. You ask questions in plain English, and it generates and executes pandas code behind the scenes.

import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI

llm = OpenAI()  # reads OPENAI_API_KEY from the environment
sales_df = pd.read_csv("sales_data.csv")
sdf = SmartDataframe(sales_df, config={"llm": llm})

# Ask in plain English
result = sdf.chat("What are the top 5 products by revenue?")
print(result)

# Follow up with context
result2 = sdf.chat("Now show me the month-over-month growth for those products")

PandasAI is genuinely useful for exploratory analysis. It handles follow-up questions reasonably well because it maintains conversational context. The main limitation: it struggles with complex multi-step logic. Ask it to do a window function equivalent or a multi-table merge, and it often hallucinates incorrect code.

Vanna.ai takes a different approach—it generates SQL for you against your actual database, but you never write the SQL yourself. It learns your schema and builds a retrieval-augmented generation (RAG) model over your database metadata.

from vanna.chromadb import ChromaDB_VectorStore
from vanna.openai import OpenAI_Chat

class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)

vn = MyVanna(config={'model': 'gpt-4'})
vn.connect_to_sqlite('sales.db')

# Train on your schema
vn.train(ddl="CREATE TABLE sales (id INT, product TEXT, region TEXT, revenue FLOAT, date DATE)")

# Ask questions
sql = vn.generate_sql("Which region had the highest average order value last quarter?")
df = vn.run_sql(sql)

Vanna's strength is accuracy on structured database queries. Because it's grounded in your actual schema and trains on your DDL, it produces more reliable SQL than a generic LLM. The tradeoff: initial setup requires training it on your schema, and it's still fundamentally a SQL generator—you just don't have to write the SQL.

Julius AI is a hosted platform (no code required) where you upload a file and chat with it. It's the fastest path from "I have data" to "I have answers." The free tier handles most basic queries; paid plans unlock larger datasets and more complex analysis.

What Actually Works Well

Simple aggregation queries are the sweet spot. "Total revenue by month," "average order value by customer segment," "count of transactions by day of week"—these work reliably across all the tools above.

What fails: queries requiring domain-specific logic. If "active customer" in your business means "purchased in the last 90 days and has a subscription status of 'pro'," you need to tell the agent that definition explicitly. It won't infer it.
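One workaround is to bake the definition into the question itself. A minimal sketch using the PandasAI SmartDataframe from earlier (the column names here are hypothetical):

definition = (
    "For this analysis, an 'active customer' means: purchased in the last "
    "90 days AND subscription_status == 'pro'."
)
result = sdf.chat(f"{definition} How many active customers do we have per region?")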

Practical Tip

Always validate the generated code. PandasAI and Vanna both let you inspect what code they actually ran:

# PandasAI - see the generated code
sdf.chat("What is the average revenue per transaction by region?")
print(sdf.last_code_generated)  # Shows the actual pandas code

This is non-negotiable in any professional context. I've seen agents silently drop null values, apply wrong groupings, or misinterpret date formats. The code inspection is your QA layer.


Automatic Visualization: From Question to Chart in Seconds

The leap from "show me a table" to "show me a chart" is where AI agents start feeling like magic—when they pick the right chart type.

How It Works

Most tools follow the same pattern: the agent analyzes the data shape (categorical vs. continuous, number of unique values, time-series nature) and selects a visualization accordingly. Under the hood, they're generating matplotlib, plotly, or seaborn code.
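A stripped-down sketch of that selection logic (the thresholds are illustrative, not what any particular tool actually uses):

import pandas as pd

def pick_chart(df: pd.DataFrame, x: str, y: str) -> str:
    # Time on the x-axis usually signals a trend: line chart
    if pd.api.types.is_datetime64_any_dtype(df[x]):
        return "line"
    # A handful of categories: bar chart
    if df[x].nunique() <= 12:
        return "bar"
    # Two continuous variables: scatter plot
    if pd.api.types.is_numeric_dtype(df[x]) and pd.api.types.is_numeric_dtype(df[y]):
        return "scatter"
    return "bar"  # fallback: aggregate first, then bar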

Tools and Examples

ChatGPT with Advanced Data Analysis (Code Interpreter) remains the gold standard for ad-hoc visualization. Upload a CSV, describe what you want, and it generates interactive Plotly charts or publication-quality matplotlib figures.

The workflow looks like this:

You: "Here's my sales data. Show me revenue trends by product category 
      over the last 12 months, highlighting any seasonal patterns."

ChatGPT: [Generates Plotly line chart with multiple traces, adds annotations 
          for peak months, includes a moving average trendline]

What makes it good: it makes reasonable aesthetic choices (color palettes, chart types, labeling) without you specifying them. What makes it frustrating: you can't easily iterate on the styling in a programmatic, reproducible way.

For reproducible pipelines, build visualization into a LangChain agent:

from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd

df = pd.read_csv("sales_data.csv")
llm = ChatOpenAI(model="gpt-4", temperature=0)

agent = create_pandas_dataframe_agent(
    llm,
    df,
    verbose=True,
    allow_dangerous_code=True  # Required for code execution
)

response = agent.invoke({
    "input": """Create a grouped bar chart showing total revenue by region 
    and product category for 2024. Use plotly, save it as 'revenue_chart.html'. 
    Include proper titles and axis labels."""
})

This generates a self-contained HTML file with an interactive Plotly chart. The agent handles the entire matplotlib-vs-plotly decision, data transformation, and rendering.

For teams that need a no-code interface, Julius AI and Observable's AI assistant both provide drag-and-drop-plus-chat interfaces. Julius is better for quick analysis; Observable is better if you want to build shareable, interactive data apps.

Chart Selection: What the Agents Get Right and Wrong

| Data Pattern | Agent's Typical Choice | Correct? |
|---|---|---|
| Trend over time | Line chart | ✅ Yes |
| Comparison across categories | Bar chart | ✅ Yes |
| Distribution of values | Histogram | ✅ Usually |
| Relationship between two variables | Scatter plot | ✅ Yes |
| Part-of-whole (5+ categories) | Pie chart | ❌ Should use bar chart |
| High-cardinality categorical | Bar chart (unreadable) | ❌ Should aggregate or use treemap |
| Time series with seasonality | Line chart (no decomposition) | ⚠️ Misses the insight |

The pie chart problem is persistent. Agents default to pie charts for proportional data even when there are too many categories. You'll often need to explicitly say "don't use a pie chart" or "use a horizontal bar chart instead."

Practical Tip

When requesting visualizations, specify the output format explicitly:

# Vague - agent picks whatever
"Show me revenue by region"

# Better - you control the output
"Create a horizontal bar chart of total revenue by region using plotly. 
 Sort bars in descending order. Use a blue color gradient. 
 Save as 'region_revenue.html'."

The more specific you are, the less iteration you'll need. Think of it like giving specs to a junior analyst—ambiguity leads to rework.


Statistical Analysis: Beyond Basic Aggregations

This is where AI agents start to both impress and concern me. They can run sophisticated statistical tests, but interpreting results correctly requires understanding that LLMs sometimes lack.

What Agents Can Do Well

Descriptive statistics and distribution analysis are reliable:

from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd

df = pd.read_csv("customer_data.csv")
agent = create_pandas_dataframe_agent(
    ChatOpenAI(model="gpt-4"), df, verbose=True, allow_dangerous_code=True
)

# This works well
response = agent.invoke({
    "input": """Analyze the distribution of customer lifetime value. 
    Calculate mean, median, standard deviation, skewness, and kurtosis. 
    Create a histogram with a KDE overlay. Test for normality using 
    the Shapiro-Wilk test and report the results."""
})

The agent will correctly import scipy.stats, run the test, and report the statistic and p-value. It'll even explain what the results mean in context.
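If you want to double-check it, the generated code usually boils down to a few lines of scipy (the column name here is assumed):

from scipy import stats

values = df["lifetime_value"].dropna()
w_stat, p_value = stats.shapiro(values)  # null hypothesis: data is normally distributed
print(f"Shapiro-Wilk: W={w_stat:.4f}, p={p_value:.4g}")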

Correlation analysis and feature relationships are also solid ground:

response = agent.invoke({
    "input": """Compute the correlation matrix for all numeric columns. 
    Identify the top 5 strongest correlations. For each, create a scatter 
    plot and calculate the Pearson r and p-value. Flag any correlations 
    above 0.7 that might indicate multicollinearity."""
})

Where You Need to Be Careful

Hypothesis testing is where I've seen agents make subtle errors. Common issues:

  1. Choosing the wrong test. Ask it to compare means between two groups, and it might use a t-test when the data is heavily skewed and a Mann-Whitney U test is more appropriate. It defaults to parametric tests.

  2. Misinterpreting p-values. I've seen agents describe a p-value of 0.06 as "marginally significant" and suggest the result "approaches significance"—language that reflects common misuse in published research, not sound statistical practice.

  3. Ignoring assumptions. A t-test assumes normality and equal variances. Agents often run the test without checking these assumptions unless you explicitly ask.

Here's how to prompt more defensively:

response = agent.invoke({
    "input": """Compare average order value between the 'premium' and 
    'standard' customer segments.
    
    Before running any test:
    1. Check the distribution of each group (normality test, visual inspection)
    2. Check for equal variances (Levene's test)
    3. Based on the results, choose the appropriate test (t-test vs Mann-Whitney)
    4. Report effect size (Cohen's d) alongside the p-value
    5. State your conclusions conservatively"""
})
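If you'd rather verify the test choice yourself, the same decision logic is a short scipy script (column and segment names assumed from the prompt above):

from scipy import stats

premium = df.loc[df["segment"] == "premium", "order_value"].dropna()
standard = df.loc[df["segment"] == "standard", "order_value"].dropna()

# Check the assumptions the agent tends to skip
_, p_norm_prem = stats.shapiro(premium)
_, p_norm_std = stats.shapiro(standard)
_, p_equal_var = stats.levene(premium, standard)

if min(p_norm_prem, p_norm_std) > 0.05 and p_equal_var > 0.05:
    result = stats.ttest_ind(premium, standard)
else:
    result = stats.mannwhitneyu(premium, standard, alternative="two-sided")
print(result)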

Regression analysis works but requires careful prompting:

response = agent.invoke({
    "input": """Build a linear regression model to predict customer lifetime 
    value based on: tenure_months, support_tickets, product_count, and 
    avg_session_duration.
    
    Steps:
    1. Check for and handle missing values
    2. Check VIF for multicollinearity
    3. Fit the model with train/test split (80/20)
    4. Report R², adjusted R², RMSE, and MAE
    5. Show coefficient significance and confidence intervals
    6. Plot residuals vs fitted values to check homoscedasticity
    7. Flag any concerning patterns"""
})
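The multicollinearity step is worth spot-checking by hand. A statsmodels sketch using the feature names from the prompt above:

import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

features = df[["tenure_months", "support_tickets", "product_count",
               "avg_session_duration"]].dropna()
vif = pd.Series(
    [variance_inflation_factor(features.values, i) for i in range(features.shape[1])],
    index=features.columns,
)
print(vif)  # values above ~5-10 are the usual multicollinearity red flag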

Tools for Statistical Depth

Julius AI handles statistical analysis better than most chat-based tools because it runs code in a controlled Python environment and can produce formatted statistical reports. It's particularly good at choosing appropriate tests based on data characteristics.

For production pipelines, I'd recommend wrapping statistical analysis in a custom LangChain agent with a specialized prompt:

from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI

STATISTICIAN_SYSTEM_PROMPT = """You are a senior data analyst. When performing 
statistical tests:
- Always check assumptions before running parametric tests
- Report effect sizes alongside p-values
- Use 95% confidence intervals
- Flag small sample sizes (n < 30) explicitly
- Never claim causation from observational data
- If a test result is ambiguous, say so"""

# Inject the prompt as the agent's prefix so every analysis inherits the rules
stats_agent = create_pandas_dataframe_agent(
    ChatOpenAI(model="gpt-4", temperature=0),
    df,
    prefix=STATISTICIAN_SYSTEM_PROMPT,
    verbose=True,
    allow_dangerous_code=True,
)

System prompts like this materially improve the quality of statistical analysis. Without them, agents default to the most common (often sloppy) analytical patterns they've seen in training data.


Report Generation: From Analysis to Deliverable

This is the final mile, and it's where AI agents save the most time. The gap between "I have charts and numbers" and "I have a polished report" is traditionally hours of formatting work.

Approaches That Work

Approach 1: Jupyter Notebook → PDF/HTML

The simplest workflow: have the agent generate a complete Jupyter notebook with analysis, visualizations, and narrative text, then convert it.

from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd

df = pd.read_csv("quarterly_sales.csv")
agent = create_pandas_dataframe_agent(
    ChatOpenAI(model="gpt-4"), df, verbose=True, allow_dangerous_code=True
)

response = agent.invoke({
    "input": """Create a complete quarterly sales analysis report as a 
    Python script that generates a Jupyter notebook (.ipynb). Include:
    
    1. Executive summary (3-4 bullet points with key findings)
    2. Revenue analysis by region with charts
    3. Product performance comparison
    4. Customer segment trends
    5. Statistical tests on key metrics
    6. Recommendations section
    
    Use markdown cells for narrative text and code cells for analysis.
    All charts should use plotly for interactivity."""
})

Then convert: jupyter nbconvert --to html report.ipynb
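One common way the agent's script assembles the .ipynb is with nbformat; a minimal sketch of that pattern:

import nbformat as nbf

nb = nbf.v4.new_notebook()
nb.cells = [
    nbf.v4.new_markdown_cell("# Quarterly Sales Report\n\n**Executive summary:** ..."),
    nbf.v4.new_code_cell("import pandas as pd\ndf = pd.read_csv('quarterly_sales.csv')"),
]
nbf.write(nb, "report.ipynb")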

Approach 2: LangChain + Document Generation

For more polished output, combine LangChain with document generation tools:

from langchain.tools import Tool
import json
import plotly.graph_objects as go

# Define tools
def generate_chart(chart_spec_json):
    """Generate a plotly bar chart from a JSON specification."""
    spec = json.loads(chart_spec_json)
    fig = go.Figure(go.Bar(x=spec["x"], y=spec["y"]))  # minimal spec: x, y, title, output
    fig.update_layout(title=spec.get("title", ""))
    fig.write_image(spec.get("output", "chart.png"))  # static export requires the kaleido package
    return f"Chart saved to {spec.get('output', 'chart.png')}"

def write_markdown_report(content):
    """Write content to a markdown file"""
    with open("report.md", "w") as f:
        f.write(content)
    return "Report written to report.md"

tools = [
    Tool(name="generate_chart", func=generate_chart, description="Generate a chart"),
    Tool(name="write_report", func=write_markdown_report, description="Write report content")
]
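The snippet above only defines the tools. One way to wire them into a working agent, sketched with LangChain's stock ReAct prompt (pulling it requires the langchainhub package):

from langchain import hub
from langchain.agents import AgentExecutor, create_react_agent
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4", temperature=0)
prompt = hub.pull("hwchase17/react")  # standard ReAct prompt from LangChain Hub
agent = create_react_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

executor.invoke({"input": "Chart total revenue by region, then write a short markdown report."})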

Approach 3: PandasAI + Streamlit for Interactive Reports

If your audience needs to explore the data themselves, have the agent build a Streamlit app:

response = agent.invoke({
    "input": """Generate a complete Streamlit app (save as 'dashboard.py') 
    that provides:
    - A sidebar with filters for date range, region, and product category
    - KPI cards showing total revenue, average order value, and transaction count
    - A revenue trend line chart
    - A breakdown table by product category
    - Download button for filtered data as CSV
    
    Use st.cache_data for performance. Make it production-ready."""
})

Then run: streamlit run dashboard.py

This approach turns a one-time report into an interactive tool. The agent-generated Streamlit code works surprisingly well for standard dashboard patterns.
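For reference, a condensed sketch of the kind of dashboard.py the agent produces (file and column names assumed):

import pandas as pd
import streamlit as st

@st.cache_data
def load_data():
    return pd.read_csv("sales_2024.csv", parse_dates=["date"])

df = load_data()
regions = st.sidebar.multiselect("Region", sorted(df["region"].unique()))
view = df[df["region"].isin(regions)] if regions else df

col1, col2 = st.columns(2)
col1.metric("Total revenue", f"${view['revenue'].sum():,.0f}")
col2.metric("Transactions", f"{len(view):,}")

st.line_chart(view.groupby("date")["revenue"].sum())
st.download_button("Download CSV", view.to_csv(index=False), "filtered.csv")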

Report Quality: Honest Assessment

AI-generated reports are good enough for internal team reviews and quick stakeholder updates. They are not good enough for board presentations, investor materials, or published research without human editing.

Common issues I've seen:

  • Narrative doesn't match the data. The agent will sometimes write "revenue increased significantly" when the chart shows a 2% change.
  • Boilerplate recommendations. "Consider investing in underperforming regions" without specifics.
  • Inconsistent formatting. Number formats, date formats, and decimal places vary across sections.
  • Missing context. The agent doesn't know that Q3 was historically slow because of your industry's seasonal pattern.

Budget 30-60 minutes of human editing for every AI-generated report. That's still a massive time savings compared to building from scratch.


Putting It All Together: A Complete Workflow

Here's a realistic end-to-end workflow using open-source tools:

import pandas as pd
from pandasai import SmartDataframe
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI

# Step 1: Load and explore with PandasAI
df = pd.read_csv("sales_2024.csv")
sdf = SmartDataframe(df, config={"llm": "openai"})

# Quick exploration
print(sdf.chat("Summarize this dataset: columns, row count, date range, and any missing values"))
print(sdf.chat("What are the main trends in the data?"))

# Step 2: Deep analysis with LangChain agent
llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = create_pandas_dataframe_agent(llm, df, verbose=True, allow_dangerous_code=True)

analysis = agent.invoke({
    "input": """Perform a complete analysis:
    1. Revenue trends by month with YoY comparison
    2. Top and bottom performing products (Pareto analysis)
    3. Customer segmentation using RFM analysis
    4. Statistical test: is there a significant difference in AOV between segments?
    5. Forecast next quarter's revenue using simple linear extrapolation
    
    Save all charts as HTML files in a 'charts/' directory."""
})

# Step 3: Generate report
report = agent.invoke({
    "input": """Based on the analysis just completed, write a markdown report 
    saved as 'Q3_report.md'. Include embedded references to the chart files 
    in the charts/ directory. Write an executive summary suitable for a 
    non-technical VP of Sales."""
})

Limitations You Should Know About

Cost. Running GPT-4 on large DataFrames is expensive. A single complex analysis session can cost $1-5 in API calls. For routine analysis, use GPT-3.5-turbo or local models (Llama 3, Mistral) with Ollama.

Data size. Most tools struggle beyond ~100K rows. The data gets summarized or sampled before being sent to the LLM, which means you lose granularity. For large datasets, use Vanna.ai (which pushes computation to the database) or pre-aggregate your data.
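Pre-aggregation is often a few lines before you hand the frame to the agent; a sketch assuming date, region, and revenue columns:

import pandas as pd

# Collapse millions of raw rows into a few hundred monthly aggregates first
monthly = (
    df.assign(month=pd.to_datetime(df["date"]).dt.to_period("M").astype(str))
      .groupby(["month", "region"], as_index=False)["revenue"].sum()
)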

Reproducibility. LLM outputs are non-deterministic. The same prompt can produce different code on different runs. For reproducible analysis, save the generated code and re-run it directly rather than re-prompting.
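With PandasAI, that's the last_code_generated attribute shown earlier; persist it and re-run the file, not the prompt:

# Freeze the generated code so the analysis is re-runnable and reviewable
with open("analysis_step1.py", "w") as f:
    f.write(sdf.last_code_generated)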

Privacy. Your data goes to OpenAI's API (or whichever LLM you use). For sensitive data, run local models via Ollama or use Azure OpenAI with data processing agreements.
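Both the cost and privacy concerns point to the same fix: swap in a local model. A sketch using the langchain-ollama package (assumes Ollama is running locally with llama3 pulled):

from langchain_ollama import ChatOllama
from langchain_experimental.agents import create_pandas_dataframe_agent

local_llm = ChatOllama(model="llama3", temperature=0)  # nothing leaves your machine
agent = create_pandas_dataframe_agent(local_llm, df, verbose=True,
                                      allow_dangerous_code=True)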


The Bottom Line

AI agents for data analysis are genuinely useful—not as replacements for data analysts, but as force multipliers. They compress the exploration phase from hours to minutes, make statistical analysis accessible to non-specialists, and automate the tedious parts of report generation.

The tools that deliver the most value right now: PandasAI for quick exploration, Vanna.ai for database-backed querying, ChatGPT Code Interpreter for one-off analysis, and LangChain custom agents for reproducible pipelines.

Start with a simple workflow—upload a CSV, ask five questions, generate a chart—and iterate from there. The technology is good enough to be useful today and improving fast enough to be significantly better in six months.
