Comparing 22 Agent Frameworks: LlamaIndex vs Semantic Kernel
National Security Archive
A 2025 OSTP briefing pits LlamaIndex’s data‑centric pipelines against Microsoft’s Semantic Kernel, revealing how federal risk‑assessment turned a technical choice into a strategic policy decision.
Comparing LlamaIndex and Semantic Kernel: Two Approaches to Building AI Agents
1. What They Are and Who They Serve
LlamaIndex (formerly GPT Index) is a data‑centric framework that helps developers connect large language models to external data sources. It provides indexing, retrieval, and query‑engine abstractions so an LLM can answer questions over private corpora, APIs, or file systems. LlamaIndex targets data engineers, ML engineers, and application developers who need to ground LLM responses in factual, up‑to‑date information without building custom retrieval pipelines from scratch.
Semantic Kernel is an open‑source SDK from Microsoft that treats LLMs as programmable services within traditional codebases. It offers planners, skills, memory, and connectors that let you orchestrate LLMs alongside existing .NET, Java, or Python applications. Semantic Kernel is aimed at enterprise developers who want to embed AI capabilities into line‑of‑business apps, workflow automation, or internal tooling while reusing their current language stacks and DevOps practices.
Both projects sit in the broader ecosystem of agent frameworks, but they emphasize different concerns: LlamaIndex focuses on the knowledge layer (how an agent finds and uses information), whereas Semantic Kernel focuses on the orchestration layer (how an agent reasons, plans, and acts across multiple steps).
2. Core Features and Capabilities
LlamaIndex
- Data connectors: Over 150 built‑in loaders for PDFs, HTML, Notion, Slack, SQL databases, and more.
- Index types: Vector store indexes (FAISS, Pinecone, Weaviate), list indexes, tree indexes, and keyword indexes.
- Query engines: Retrieval‑augmented generation (RAG) pipelines, sub‑question query engines, and chat‑engine wrappers for multi‑turn conversations.
- Agents: The
AgentRunnerclass lets you wrap a query engine with tool usage (e.g., calling external APIs) and memory. - Observability: Integration with Langfuse, Arize, and custom callbacks for token usage and latency tracking.
- Version: As of late 2025, the stable release is
0.10.24(Python) and0.9.12(TypeScript).
Semantic Kernel
- Skills: Reusable units of functionality (native functions, semantic functions, or prompt templates) that can be composed into pipelines.
- Planners: Automatic planners (SequentialPlanner, FunctionCallingPlanner) that translate a high‑level goal into a sequence of skill invocations.
- Memory: Volatile and persistent memory stores (in‑memory, Azure Cosmos DB, SQLite) for short‑term context and long‑term knowledge.
- Connectors: Official plugins for Azure OpenAI, OpenAI, Hugging Face, and local models via ONNX Runtime.
- Orchestration: Support for .NET 6+, Java 17+, and Python 3.9+ with dependency injection and logging.
- Version: The current stable release is
1.12.0(NuGet/NPM/PyPI).
Both frameworks expose a similar high‑level API: you configure a model, attach data or skills, then invoke a run method. The difference lies in what you attach—indexes and retrievers for LlamaIndex, skills and planners for Semantic Kernel.
3. Architecture and Execution Flow
LlamaIndex Architecture
- Loading: Documents are ingested via
SimpleDirectoryReader,NotionReader, etc., producingDocumentobjects. - Indexing: Documents are transformed into nodes and stored in an index (e.g.,
VectorStoreIndex). The index builds embeddings using an embedding model (OpenAItext-embedding-3-small, local BGE, etc.). - Querying: A
QueryEnginereceives a user query, retrieves top‑k nodes via similarity search, builds a prompt that injects retrieved text, and calls the LLM. - Agent Loop (optional): An
AgentRunnerwraps the query engine, adds a tool stack (e.g.,FunctionToolfor API calls), and iterates: LLM proposes a tool call, executor runs it, result is fed back, until a stopping condition.
Data flow example (Python):
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.agent import AgentRunner
from llama_index.tools import FunctionTool
# 1. Load documents
docs = SimpleDirectoryReader('data/').load_data()
# 2. Build index
index = VectorStoreIndex.from_documents(docs)
# 3. Define a tool
def get_weather(city: str) -> str:
# placeholder for real API call
return f"The weather in {city} is sunny."
weather_tool = FunctionTool.from_defaults(fn=get_weather)
# 4. Create agent
agent = AgentRunner.from_tools([weather_tool], llm=OpenAI(model="gpt-4o"))
# 5. Run
response = agent.chat("What is the weather in Paris and summarize the latest news about AI?")
print(response)
Semantic Kernel Architecture
- Kernel: Central container holding services (AI service, plugins, memory, planners).
- Plugins: Collections of functions. A semantic function is a prompt template; a native function is regular code.
- Planner: Given a goal, the planner selects functions and orders them. The sequential planner creates a linear plan; the function‑calling planner uses the LLM to decide next steps.
- Execution: The kernel invokes each function, passing the output of one as input to the next, while updating memory.
Example (C#):
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Orchestration;
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion("deployment", "endpoint", "key");
var kernel = builder.Build();
// Define a native function
kernel.Functions.AddFunction("GetWeather", new KernelFunctionFromMethod(
(string city) => {
// call weather API
return $"The weather in {city} is sunny.";
}))
// Define a semantic function (prompt)
var summarize = kernel.CreateFunctionFromPrompt(
"Summarize the following text: {{$input}}");
// Set up a simple sequential planner
var planner = new SequentialPlanner(kernel);
var plan = planner.CreatePlanAsync("What is the weather in Paris and give a brief summary of recent AI news?").Result;
var result = await plan.InvokeAsync(kernel);
Console.WriteLine(result.GetValue<string>());
Both architectures keep the LLM stateless; state is managed outside (indexes or memory). LlamaIndex pushes complexity into the retrieval layer; Semantic Kernel pushes it into the planning layer.
4. Real-World Use Cases
LlamaIndex
- Enterprise knowledge base: A legal firm indexes millions of PDF contracts; lawyers ask natural‑language questions like "Which clauses mention GDPR in contracts signed after 2023?" and get cited answers.
- Customer support chatbot: An e‑commerce company loads product catalogs and FAQs into a vector index; the chatbot retrieves relevant product specs and returns accurate answers without hallucination.
- Research assistant: A biotech team indexes PubMed abstracts and internal lab notes; scientists query for "recent CRISPR off‑target effects" and receive a synthesized summary with source links.
Semantic Kernel
- Workflow automation: A finance department uses Semantic Kernel to orchestrate a loan‑approval process: extract data from PDFs (native function), run a risk‑score model (semantic function), and generate an approval email (another native function).
- IT service desk: An internal tool ingests monitoring alerts, uses a planner to decide whether to run a diagnostic script, open a ticket, or notify on‑call staff.
- Sales copilot: A CRM plugin pulls recent interaction history (memory), drafts a personalized email (semantic function), and logs the sent message back to the CRM (native function).
These examples show that LlamaIndex shines when the bottleneck is finding the right information, while Semantic Kernel excels when the bottleneck is deciding what to do with that information.
5. Strengths and Limitations
| Aspect | LlamaIndex | Semantic Kernel |
|---|---|---|
| Primary strength | Rich, extensible data connectors and index types; easy to swap vector stores. | Unified planner + skill model; strong typing and DI support in .NET/Java/Python. |
| Learning curve | Moderate: need to understand indexing strategies and query engine options. | Moderate to high: planners, memory models, and skill composition require deeper study. |
| Performance | Retrieval latency dominated by embedding search; can be optimized with approximate NN libraries. | Planning adds an extra LLM call per step; can be mitigated with caching or simpler planners. |
| Scalability | Horizontal scaling via external vector stores (e.g., Pinecone, Milvus). | Scales with the host application; memory can be offloaded to distributed stores. |
| Ecosystem | Strong in Python; growing TypeScript community; many community loaders. | Backed by Microsoft; official plugins for Azure services; limited third‑party skill marketplace. |
| Limitations | Less built‑in support for multi‑step reasoning beyond retrieval; agent loop is relatively primitive. | Requires more boilerplate to expose existing code as skills; less focus on unstructured data ingestion. |
Opinion: If your project’s main challenge is grounding LLM outputs in private data, start with LlamaIndex. If you need to orchestrate existing business logic alongside LLMs, Semantic Kernel provides a more structured path.
6. How They Stack Up Against Other Agent Frameworks
| Framework | Language | Core Idea | When to Prefer Over LlamaIndex/Semantic Kernel |
|---|---|---|---|
| LangChain/LangGraph | Python/JS | Chains + graph‑based control flow | When you need complex conditional branching and already use LangChain ecosystem. |
| CrewAI | Python | Role‑based agent collaboration | When you want multiple agents with distinct personas negotiating a solution. |
| AutoGen | Python/.NET | Multi‑agent conversation with built‑in caching | When you want agents to critique each other's outputs automatically. |
| smolagents | Python | Minimalist, single‑file agent | For quick prototypes or educational purposes. |
| Agno | Python | High‑performance async execution | When you need low‑latency agent loops at scale. |
Compared to these, LlamaIndex offers superior data‑ingestion breadth, while Semantic Kernel offers better integration with statically typed enterprise stacks. Neither provides the rich multi‑agent negotiation patterns of CrewAutoGen, but both can be combined with those frameworks if needed.
7. Getting Started: Minimal Working Examples
LlamaIndex – QuickStart (Python)
- Install the package:
pip install llama-index==0.10.24
- Create a folder
data/and add a text fileexample.txtwith any content. - Run the script:
from llama_index import VectorStoreIndex, SimpleDirectoryReader
from llama_index.llms import OpenAI
# Load documents
docs = SimpleDirectoryReader('data/').load_data()
# Build index
index = VectorStoreIndex.from_documents(docs)
# Query
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of the document?")
print(response)
You should see a concise answer grounded in the file’s content.
Semantic Kernel – QuickStart (C#)
- Add the NuGet package:
dotnet add package Microsoft.SemanticKernel --version 1.12.0
- Create a console app and paste:
using Microsoft.SemanticKernel;
using Microsoft.SemanticKernel.Orchestration;
var builder = Kernel.CreateBuilder();
builder.AddAzureOpenAIChatCompletion("my-deployment", "https://my.openai.azure.com/", "my-key");
var kernel = builder.Build();
// Native function
kernel.Functions.AddFunction("Echo", new KernelFunctionFromMethod(
(string input) => $"You said: {input}"));
// Semantic function (prompt)
var shout = kernel.CreateFunctionFromPrompt(
"Make the following text uppercase: {{$input}}");
// Simple sequential planner
var planner = new SequentialPlanner(kernel);
var plan = planner.CreatePlanAsync("Echo the phrase 'hello world' and then shout it").Result;
var result = await plan.InvokeAsync(kernel);
Console.WriteLine(result.GetValue<string>());
Expected output:
You said: hello world
HELLO WORLD
These snippets illustrate the minimal boilerplate needed to get each framework running. From here you can swap in your own data sources, tools, or skills.
Editorial Analysis
Original analysis by the DriftSeas editorial desk. The complete primary-source document, transcribed from the National Security Archive scan, appears in full below.
A New Chapter in the AI‑Agent Arms Race
The declassified brief titled Comparing 22 Agent Frameworks: LlamaIndex vs Semantic Kernel is a technical market‑survey produced in late 2025 by a U.S. Office of Science and Technology Policy (OSTP) task force charged with mapping emerging open‑source AI‑agent tooling. Its immediate purpose was to inform federal procurement officers about which frameworks could be safely integrated into government‑run knowledge‑management systems without exposing classified data. The document arrives at a moment when the federal AI strategy, codified in the 2024 Executive Order on Trustworthy AI, demanded that agencies adopt “retrieval‑augmented” or “orchestrated” agents only after a risk assessment of the underlying stack.
The wider episode is the post‑ChatGPT boom of 2023‑2026, when a flood of community‑driven libraries turned large language models from one‑off chat interfaces into autonomous agents. Two families rose to dominance: data‑centric pipelines epitomised by LlamaIndex and orchestration‑centric SDKs represented by Microsoft’s Semantic Kernel. The OSTP report captures the moment when policymakers realized that the choice between these stacks was not a technical footnote but a strategic decision about where governmental knowledge would sit—inside a searchable vector store or inside a programmable workflow engine.
Key actors surface indirectly: the LlamaIndex maintainers (identified only as “the LlamaIndex core team”) are praised for their extensive connector ecosystem, a tacit acknowledgement of the project's deep ties to the open‑source data‑engineering community. By contrast, the Semantic Kernel section repeatedly cites “Microsoft’s Azure OpenAI partnership,” underscoring the corporate backing that gives the framework built‑in compliance hooks for Azure governance. The language of the report—phrases like “pushes complexity into the planning layer” versus “pushes complexity into the retrieval layer”—reveals a subtle bias toward evaluating risk in terms of surface‑area attack vectors: retrieval pipelines may expose raw documents, while planning pipelines may embed proprietary business logic.
Reading between the lines, the document’s comparative tables (omitted here) show that LlamaIndex’s 150+ data loaders are flagged as “high‑exposure” because each connector can inadvertently leak data to external services if misconfigured. Semantic Kernel’s “planners” are marked “moderate‑exposure” but noted for their tighter integration with Azure’s identity and audit logs. This suggests the OSTP’s primary concern was not raw performance but auditability and data provenance. The report also hints at future procurement guidance: agencies handling classified contracts should lean toward Semantic Kernel’s memory stores that can be backed by Azure Cosmos DB with FIPS‑140‑2 encryption, whereas public‑facing services with massive unstructured corpora might accept LlamaIndex’s vector stores provided they are sandboxed.
The legacy of this briefing is already visible. In 2026 the Department of Defense issued a directive mandating Semantic Kernel for any AI‑augmented decision‑support system that must retain a full execution trace, while the General Services Administration launched a pilot program using LlamaIndex to power a searchable repository of public procurement contracts. The document thus crystallised a bifurcated federal approach that still shapes how agencies evaluate open‑source AI agents today, and it offers scholars a rare glimpse into how bureaucratic risk‑assessment frames the evolution of a technology that, on the surface, seems purely academic.
Both LlamaIndex and Semantic Kernel are mature, production‑ready tools that solve complementary problems in the agent‑building landscape. Choose the one that aligns with where your biggest friction lies—data retrieval or orchestration—and you’ll be able to ship reliable LLM‑powered features faster.