AI Data Agents: How to Analyze Datasets Without Writing SQL
Sarah Kim
Quantitative researcher turned AI writer. Specializes in financial AI agents.
You have a CSV with 50,000 rows of sales data. Your VP wants to know which customer segments are underperforming by region, with a visual breakdown, by end of day. You could write SQL, wrangle pandas, build charts in matplotlib, and assemble a report in Google Slides. Or you could have an AI agent do most of that in minutes.
This guide covers the real tools, the actual workflows, and the honest limitations of using AI agents for data analysis without touching SQL. No hype—just what works, what doesn't, and how to get the most out of these tools.
Natural Language Querying: Ask Questions, Get Data
Natural language querying lets you ask questions like "What were total sales by region in Q3?" and get structured data back without writing SQL or pandas code. The AI agent translates your intent into executable code, runs it, and returns results.
Tools Worth Using
PandasAI is a Python library that wraps a pandas DataFrame with an LLM-powered interface. You ask questions in plain English, and it generates and executes pandas code behind the scenes.
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI
sales_df = pd.read_csv("sales_data.csv")
llm = OpenAI(api_token="YOUR_API_KEY")
sdf = SmartDataframe(sales_df, config={"llm": llm})
# Ask in plain English
result = sdf.chat("What are the top 5 products by revenue?")
print(result)
# Follow up with context
result2 = sdf.chat("Now show me the month-over-month growth for those products")
PandasAI is genuinely useful for exploratory analysis. It handles follow-up questions reasonably well because it maintains conversational context. The main limitation: it struggles with complex multi-step logic. Ask it to do a window function equivalent or a multi-table merge, and it often hallucinates incorrect code.
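For reference, the "window function equivalent" the agent has to produce is a groupby-plus-transform in pandas, and getting both the partitioning and the ranking right is exactly the multi-step logic where hallucinations creep in. A minimal sketch with made-up column names:

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "East", "West", "West", "West"],
    "revenue": [100.0, 300.0, 200.0, 50.0, 250.0],
})

# SQL: SUM(revenue) OVER (PARTITION BY region) -> groupby + transform
sales["region_total"] = sales.groupby("region")["revenue"].transform("sum")

# SQL: RANK() OVER (PARTITION BY region ORDER BY revenue DESC)
sales["rank_in_region"] = (
    sales.groupby("region")["revenue"].rank(ascending=False, method="dense")
)

print(sales)
```

If you spot-check a couple of rows by hand (East should total 400 here), you can catch a mis-partitioned transform in seconds.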
Vanna.ai takes a different approach—it generates SQL for you against your actual database, but you never write the SQL yourself. It learns your schema and builds a retrieval-augmented generation (RAG) model over your database metadata.
from vanna.chromadb import ChromaDB_VectorStore
from vanna.openai import OpenAI_Chat
class MyVanna(ChromaDB_VectorStore, OpenAI_Chat):
    def __init__(self, config=None):
        ChromaDB_VectorStore.__init__(self, config=config)
        OpenAI_Chat.__init__(self, config=config)
vn = MyVanna(config={'model': 'gpt-4'})
vn.connect_to_sqlite('sales.db')
# Train on your schema
vn.train(ddl="CREATE TABLE sales (id INT, product TEXT, region TEXT, revenue FLOAT, date DATE)")
# Ask questions
sql = vn.generate_sql("Which region had the highest average order value last quarter?")
df = vn.run_sql(sql)
Vanna's strength is accuracy on structured database queries. Because it's grounded in your actual schema and trains on your DDL, it produces more reliable SQL than a generic LLM. The tradeoff: initial setup requires training it on your schema, and it's still fundamentally a SQL generator—you just don't have to write the SQL.
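Under the hood, `run_sql` is just executing plain SQL against your connection, which means you can always inspect the generated query before it touches real data. A stand-in using stdlib sqlite3 and the same schema as the DDL above (the query here plays the role of `vn.generate_sql` output):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INT, product TEXT, region TEXT, revenue FLOAT, date DATE)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "widget", "East", 120.0, "2024-07-01"),
        (2, "widget", "West", 300.0, "2024-07-02"),
        (3, "gadget", "West", 180.0, "2024-08-01"),
    ],
)

# Stand-in for generated SQL: read it before you run it
generated_sql = """
    SELECT region, AVG(revenue) AS avg_order_value
    FROM sales
    GROUP BY region
    ORDER BY avg_order_value DESC
"""
rows = conn.execute(generated_sql).fetchall()
print(rows)
```

The habit of printing the SQL and eyeballing the GROUP BY before execution carries over directly to Vanna.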
Julius AI is a hosted platform (no code required) where you upload a file and chat with it. It's the fastest path from "I have data" to "I have answers." The free tier handles most basic queries; paid plans unlock larger datasets and more complex analysis.
What Actually Works Well
Simple aggregation queries are the sweet spot. "Total revenue by month," "average order value by customer segment," "count of transactions by day of week"—these work reliably across all the tools above.
What fails: queries requiring domain-specific logic. If "active customer" in your business means "purchased in the last 90 days and has a subscription status of 'pro'," you need to tell the agent that definition explicitly. It won't infer it.
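The fix is to state the definition in the prompt, or better, pin it down in code the agent can reuse. A sketch with illustrative column names showing what "active customer" looks like once it is explicit:

```python
import pandas as pd

customers = pd.DataFrame({
    "customer_id":         [1, 2, 3, 4],
    "days_since_purchase": [10, 120, 45, 80],
    "subscription_status": ["pro", "pro", "basic", "pro"],
})

# The business definition, stated explicitly -- the agent won't infer it:
# purchased in the last 90 days AND subscription_status == 'pro'
is_active = (
    (customers["days_since_purchase"] <= 90)
    & (customers["subscription_status"] == "pro")
)
active_customers = customers[is_active]
print(active_customers["customer_id"].tolist())
```

Once this filter exists, every follow-up question ("revenue from active customers by region") inherits the correct definition instead of a guessed one.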
Practical Tip
Always validate the generated code. PandasAI and Vanna both let you inspect what code they actually ran:
# PandasAI - see the generated code
sdf.chat("What is the average revenue per transaction by region?")
print(sdf.last_code_generated) # Shows the actual pandas code
This is non-negotiable in any professional context. I've seen agents silently drop null values, apply wrong groupings, or misinterpret date formats. The code inspection is your QA layer.
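The null-dropping failure is easy to reproduce: pandas aggregations skip NaN silently, so generated code can report a mean computed over a fraction of your rows without any warning. A two-minute illustration:

```python
import pandas as pd

revenue = pd.Series([100.0, None, 300.0, None])

# pandas silently skips NaN: this is the mean of only 2 observed values
print(revenue.mean())

# Treating missing as zero tells a very different story
print(revenue.fillna(0).mean())

# Always check how much data the aggregate is actually based on
print(f"{revenue.isna().sum()} missing of {len(revenue)}")
```

Whether skipping or zero-filling is correct depends on what a missing value means in your data; the point is that the generated code made that choice for you, and inspection is how you find out.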
Automatic Visualization: From Question to Chart in Seconds
The leap from "show me a table" to "show me a chart" is where AI agents start feeling like magic—when they pick the right chart type.
How It Works
Most tools follow the same pattern: the agent analyzes the data shape (categorical vs. continuous, number of unique values, time-series nature) and selects a visualization accordingly. Under the hood, they're generating matplotlib, plotly, or seaborn code.
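That data-shape heuristic can be sketched in a few lines. This is a toy illustration of the selection logic described above, not any specific tool's implementation:

```python
def suggest_chart(is_temporal: bool, x_is_categorical: bool,
                  y_is_numeric: bool, n_categories: int = 0) -> str:
    """Pick a chart type from the shape of the data (toy heuristic)."""
    if is_temporal and y_is_numeric:
        return "line"
    if x_is_categorical and y_is_numeric:
        # High-cardinality categories make bar charts unreadable
        return "bar" if n_categories <= 15 else "aggregated bar / treemap"
    if not x_is_categorical and y_is_numeric:
        return "scatter"
    return "histogram"

print(suggest_chart(is_temporal=True, x_is_categorical=False, y_is_numeric=True))
```

Real agents add more signals (distribution skew, part-of-whole semantics), but as the table later in this section shows, they still fumble the high-cardinality and part-of-whole branches.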
Tools and Examples
ChatGPT with Advanced Data Analysis (Code Interpreter) remains the gold standard for ad-hoc visualization. Upload a CSV, describe what you want, and it generates interactive Plotly charts or publication-quality matplotlib figures.
The workflow looks like this:
You: "Here's my sales data. Show me revenue trends by product category
over the last 12 months, highlighting any seasonal patterns."
ChatGPT: [Generates Plotly line chart with multiple traces, adds annotations
for peak months, includes a moving average trendline]
What makes it good: it makes reasonable aesthetic choices (color palettes, chart types, labeling) without you specifying them. What makes it frustrating: you can't easily iterate on the styling in a programmatic, reproducible way.
For reproducible pipelines, build visualization into a LangChain agent:
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd
df = pd.read_csv("sales_data.csv")
llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = create_pandas_dataframe_agent(
llm,
df,
verbose=True,
allow_dangerous_code=True # Required for code execution
)
response = agent.invoke({
"input": """Create a grouped bar chart showing total revenue by region
and product category for 2024. Use plotly, save it as 'revenue_chart.html'.
Include proper titles and axis labels."""
})
This generates a self-contained HTML file with an interactive Plotly chart. The agent handles the entire matplotlib-vs-plotly decision, data transformation, and rendering.
For teams that need a no-code interface, Julius AI and Observable's AI assistant both provide drag-and-drop-plus-chat interfaces. Julius is better for quick analysis; Observable is better if you want to build shareable, interactive data apps.
Chart Selection: What the Agents Get Right and Wrong
| Data Pattern | Agent's Typical Choice | Correct? |
|---|---|---|
| Trend over time | Line chart | ✅ Yes |
| Comparison across categories | Bar chart | ✅ Yes |
| Distribution of values | Histogram | ✅ Usually |
| Relationship between two variables | Scatter plot | ✅ Yes |
| Part-of-whole (5+ categories) | Pie chart | ❌ Should use bar chart |
| High-cardinality categorical | Bar chart (unreadable) | ❌ Should aggregate or use treemap |
| Time series with seasonality | Line chart (no decomposition) | ⚠️ Misses the insight |
The pie chart problem is persistent. Agents default to pie charts for proportional data even when there are too many categories. You'll often need to explicitly say "don't use a pie chart" or "use a horizontal bar chart instead."
Practical Tip
When requesting visualizations, specify the output format explicitly:
# Vague - agent picks whatever
"Show me revenue by region"
# Better - you control the output
"Create a horizontal bar chart of total revenue by region using plotly.
Sort bars in descending order. Use a blue color gradient.
Save as 'region_revenue.html'."
The more specific you are, the less iteration you'll need. Think of it like giving specs to a junior analyst—ambiguity leads to rework.
Statistical Analysis: Beyond Basic Aggregations
This is where AI agents start to both impress and concern me. They can run sophisticated statistical tests, but interpreting results correctly requires understanding that LLMs sometimes lack.
What Agents Can Do Well
Descriptive statistics and distribution analysis are reliable:
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd
df = pd.read_csv("customer_data.csv")
agent = create_pandas_dataframe_agent(ChatOpenAI(model="gpt-4"), df, verbose=True, allow_dangerous_code=True)
# This works well
response = agent.invoke({
"input": """Analyze the distribution of customer lifetime value.
Calculate mean, median, standard deviation, skewness, and kurtosis.
Create a histogram with a KDE overlay. Test for normality using
the Shapiro-Wilk test and report the results."""
})
The agent will correctly import scipy.stats, run the test, and report the statistic and p-value. It'll even explain what the results mean in context.
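The scipy call the agent generates is straightforward; what matters is that the conclusion follows from the test. A self-contained sketch on synthetic lifetime-value data (real CLV is typically right-skewed, which is exactly when normality fails):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
# Synthetic right-skewed CLV; stands in for the real column
clv = rng.exponential(scale=500, size=200)

stat, p_value = stats.shapiro(clv)
print(f"Shapiro-Wilk W = {stat:.4f}, p = {p_value:.2e}")
if p_value < 0.05:
    print("Reject normality: report medians, prefer non-parametric tests")
```

A good agent reports both the statistic and what it implies for downstream test selection; if yours only prints a p-value, prompt for the interpretation explicitly.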
Correlation analysis and feature relationships are also solid ground:
response = agent.invoke({
"input": """Compute the correlation matrix for all numeric columns.
Identify the top 5 strongest correlations. For each, create a scatter
plot and calculate the Pearson r and p-value. Flag any correlations
above 0.7 that might indicate multicollinearity."""
})
Where You Need to Be Careful
Hypothesis testing is where I've seen agents make subtle errors. Common issues:
Choosing the wrong test. Ask it to compare means between two groups, and it might use a t-test when the data is heavily skewed and a Mann-Whitney U test is more appropriate. It defaults to parametric tests.
Misinterpreting p-values. I've seen agents describe a p-value of 0.06 as "marginally significant" and suggest the result "approaches significance"—language that reflects common misuse in published research, not sound statistical practice.
Ignoring assumptions. A t-test assumes normality and equal variances. Agents often run the test without checking these assumptions unless you explicitly ask.
Here's how to prompt more defensively:
response = agent.invoke({
"input": """Compare average order value between the 'premium' and
'standard' customer segments.
Before running any test:
1. Check the distribution of each group (normality test, visual inspection)
2. Check for equal variances (Levene's test)
3. Based on the results, choose the appropriate test (t-test vs Mann-Whitney)
4. Report effect size (Cohen's d) alongside the p-value
5. State your conclusions conservatively"""
})
Regression analysis works but requires careful prompting:
response = agent.invoke({
"input": """Build a linear regression model to predict customer lifetime
value based on: tenure_months, support_tickets, product_count, and
avg_session_duration.
Steps:
1. Check for and handle missing values
2. Check VIF for multicollinearity
3. Fit the model with train/test split (80/20)
4. Report R², adjusted R², RMSE, and MAE
5. Show coefficient significance and confidence intervals
6. Plot residuals vs fitted values to check homoscedasticity
7. Flag any concerning patterns"""
})
Tools for Statistical Depth
Julius AI handles statistical analysis better than most chat-based tools because it runs code in a controlled Python environment and can produce formatted statistical reports. It's particularly good at choosing appropriate tests based on data characteristics.
For production pipelines, I'd recommend wrapping statistical analysis in a custom LangChain agent with a specialized prompt:
from langchain.agents import AgentExecutor
from langchain.agents.openai_functions_agent.base import OpenAIFunctionsAgent
from langchain.tools import Tool
from langchain.prompts import MessagesPlaceholder
STATISTICIAN_SYSTEM_PROMPT = """You are a senior data analyst. When performing
statistical tests:
- Always check assumptions before running parametric tests
- Report effect sizes alongside p-values
- Use 95% confidence intervals
- Flag small sample sizes (n < 30) explicitly
- Never claim causation from observational data
- If a test result is ambiguous, say so"""
System prompts like this materially improve the quality of statistical analysis. Without them, agents default to the most common (often sloppy) analytical patterns they've seen in training data.
Report Generation: From Analysis to Deliverable
This is the final mile, and it's where AI agents save the most time. The gap between "I have charts and numbers" and "I have a polished report" is traditionally hours of formatting work.
Approaches That Work
Approach 1: Jupyter Notebook → PDF/HTML
The simplest workflow: have the agent generate a complete Jupyter notebook with analysis, visualizations, and narrative text, then convert it.
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
import pandas as pd
df = pd.read_csv("quarterly_sales.csv")
agent = create_pandas_dataframe_agent(ChatOpenAI(model="gpt-4"), df, verbose=True, allow_dangerous_code=True)
response = agent.invoke({
"input": """Create a complete quarterly sales analysis report as a
Python script that generates a Jupyter notebook (.ipynb). Include:
1. Executive summary (3-4 bullet points with key findings)
2. Revenue analysis by region with charts
3. Product performance comparison
4. Customer segment trends
5. Statistical tests on key metrics
6. Recommendations section
Use markdown cells for narrative text and code cells for analysis.
All charts should use plotly for interactivity."""
})
Then convert: jupyter nbconvert --to html report.ipynb
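The "generate a notebook" step is less magical than it sounds: a .ipynb file is plain JSON following the nbformat 4 schema, which is why agents can emit one reliably. A minimal valid skeleton built with only the stdlib (cell contents are placeholders):

```python
import json

# A .ipynb file is just JSON: markdown cells + code cells (nbformat 4)
notebook = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Quarterly Sales Report\n", "Key findings below."],
        },
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,
            "outputs": [],
            "source": ["import pandas as pd\n",
                       "df = pd.read_csv('quarterly_sales.csv')"],
        },
    ],
}

with open("report.ipynb", "w") as f:
    json.dump(notebook, f, indent=1)
print("wrote report.ipynb")
```

Because the format is just JSON, you can also post-process agent-generated notebooks programmatically, e.g. to enforce a house style on the markdown cells.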
Approach 2: LangChain + Document Generation
For more polished output, combine LangChain with document generation tools:
from langchain.tools import Tool
from langchain_openai import ChatOpenAI
from langchain.agents import AgentExecutor, create_react_agent
from langchain.prompts import PromptTemplate
import json
# Define tools
def generate_chart(chart_spec_json):
    """Generate a plotly chart from a JSON specification."""
    import plotly.graph_objects as go
    spec = json.loads(chart_spec_json)
    # Minimal example: build a bar chart from the spec; extend per chart type
    fig = go.Figure(go.Bar(x=spec["x"], y=spec["y"]))
    fig.write_image(spec.get("output", "chart.png"))
    return f"Chart saved to {spec.get('output', 'chart.png')}"

def write_markdown_report(content):
    """Write content to a markdown file."""
    with open("report.md", "w") as f:
        f.write(content)
    return "Report written to report.md"

tools = [
    Tool(name="generate_chart", func=generate_chart, description="Generate a chart"),
    Tool(name="write_report", func=write_markdown_report, description="Write report content"),
]
Approach 3: PandasAI + Streamlit for Interactive Reports
If your audience needs to explore the data themselves, have the agent build a Streamlit app:
response = agent.invoke({
"input": """Generate a complete Streamlit app (save as 'dashboard.py')
that provides:
- A sidebar with filters for date range, region, and product category
- KPI cards showing total revenue, average order value, and transaction count
- A revenue trend line chart
- A breakdown table by product category
- Download button for filtered data as CSV
Use st.cache_data for performance. Make it production-ready."""
})
Then run: streamlit run dashboard.py
This approach turns a one-time report into an interactive tool. The agent-generated Streamlit code works surprisingly well for standard dashboard patterns.
Report Quality: Honest Assessment
AI-generated reports are good enough for internal team reviews and quick stakeholder updates. They are not good enough for board presentations, investor materials, or published research without human editing.
Common issues I've seen:
- Narrative doesn't match the data. The agent will sometimes write "revenue increased significantly" when the chart shows a 2% change.
- Boilerplate recommendations. "Consider investing in underperforming regions" without specifics.
- Inconsistent formatting. Number formats, date formats, and decimal places vary across sections.
- Missing context. The agent doesn't know that Q3 was historically slow because of your industry's seasonal pattern.
Budget 30-60 minutes of human editing for every AI-generated report. That's still a massive time savings compared to building from scratch.
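A cheap guard against the first issue is to derive the wording from the number instead of letting the narrative free-float. A toy sketch of the idea (the thresholds are illustrative, not a standard):

```python
def describe_change(pct_change: float) -> str:
    """Map a percent change to wording the narrative is allowed to use."""
    magnitude = abs(pct_change)
    if magnitude < 1:
        direction = "was roughly flat"
    elif magnitude < 5:
        direction = "changed slightly"
    elif magnitude < 20:
        direction = "changed moderately"
    else:
        direction = "changed significantly"
    return f"revenue {direction} ({pct_change:+.1f}%)"

print(describe_change(2.0))
```

Feeding the agent pre-computed phrases like this, rather than raw numbers plus "write a summary," keeps a 2% change from becoming "significant growth" in the executive summary.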
Putting It All Together: A Complete Workflow
Here's a realistic end-to-end workflow using open-source tools:
import pandas as pd
from pandasai import SmartDataframe
from pandasai.llm import OpenAI
from langchain_experimental.agents import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
# Step 1: Load and explore with PandasAI
df = pd.read_csv("sales_2024.csv")
sdf = SmartDataframe(df, config={"llm": OpenAI(api_token="YOUR_API_KEY")})
# Quick exploration
print(sdf.chat("Summarize this dataset: columns, row count, date range, and any missing values"))
print(sdf.chat("What are the main trends in the data?"))
# Step 2: Deep analysis with LangChain agent
llm = ChatOpenAI(model="gpt-4", temperature=0)
agent = create_pandas_dataframe_agent(llm, df, verbose=True, allow_dangerous_code=True)
analysis = agent.invoke({
"input": """Perform a complete analysis:
1. Revenue trends by month with YoY comparison
2. Top and bottom performing products (Pareto analysis)
3. Customer segmentation using RFM analysis
4. Statistical test: is there a significant difference in AOV between segments?
5. Forecast next quarter's revenue using simple linear extrapolation
Save all charts as HTML files in a 'charts/' directory."""
})
# Step 3: Generate report
report = agent.invoke({
"input": """Based on the analysis just completed, write a markdown report
saved as 'Q3_report.md'. Include embedded references to the chart files
in the charts/ directory. Write an executive summary suitable for a
non-technical VP of Sales."""
})
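For reference, the RFM segmentation requested in Step 2 is only a few lines of pandas once you define a snapshot date. A minimal sketch with toy order data (column names are illustrative):

```python
import pandas as pd

orders = pd.DataFrame({
    "customer_id": [1, 1, 2, 3, 3, 3],
    "order_date":  pd.to_datetime(["2024-01-05", "2024-03-01", "2024-02-10",
                                   "2024-01-20", "2024-02-15", "2024-03-10"]),
    "revenue":     [50.0, 80.0, 200.0, 30.0, 45.0, 60.0],
})
snapshot = pd.Timestamp("2024-03-15")  # "as of" date for recency

# Recency = days since last order, Frequency = order count,
# Monetary = total revenue per customer
rfm = orders.groupby("customer_id").agg(
    recency=("order_date", lambda d: (snapshot - d.max()).days),
    frequency=("order_date", "count"),
    monetary=("revenue", "sum"),
)
print(rfm)
```

Knowing this baseline makes it easy to verify whether the agent's RFM code binned recency in the right direction (lower days = better score), a detail agents frequently invert.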
Limitations You Should Know About
Cost. Running GPT-4 on large DataFrames is expensive. A single complex analysis session can cost $1-5 in API calls. For routine analysis, use GPT-3.5-turbo or local models (Llama 3, Mistral) with Ollama.
Data size. Most tools struggle beyond ~100K rows. The data gets summarized or sampled before being sent to the LLM, which means you lose granularity. For large datasets, use Vanna.ai (which pushes computation to the database) or pre-aggregate your data.
Reproducibility. LLM outputs are non-deterministic. The same prompt can produce different code on different runs. For reproducible analysis, save the generated code and re-run it directly rather than re-prompting.
Privacy. Your data goes to OpenAI's API (or whichever LLM you use). For sensitive data, run local models via Ollama or use Azure OpenAI with data processing agreements.
The Bottom Line
AI agents for data analysis are genuinely useful—not as replacements for data analysts, but as force multipliers. They compress the exploration phase from hours to minutes, make statistical analysis accessible to non-specialists, and automate the tedious parts of report generation.
The tools that deliver the most value right now: PandasAI for quick exploration, Vanna.ai for database-backed querying, ChatGPT Code Interpreter for one-off analysis, and LangChain custom agents for reproducible pipelines.
Start with a simple workflow—upload a CSV, ask five questions, generate a chart—and iterate from there. The technology is good enough to be useful today and improving fast enough to be significantly better in six months.