Building a Knowledge Graph with ChatGPT and VoltAgent
Mei-Lin Zhang
Overview
VoltAgent is an open‑source framework designed to help developers construct knowledge graphs using large language models such as ChatGPT. It targets data engineers, researchers, and AI‑application builders who need to extract entities and relations from unstructured text and store them in a graph database.
Key Features and Capabilities
- LLM‑driven extraction: Uses prompts sent to ChatGPT to identify nodes (entities) and edges (relations) in input documents.
- Schema‑guided mode: Allows users to define a custom ontology (node types, edge types) via a YAML file; the framework validates LLM output against this schema.
- Batch processing: Supports parallel processing of large corpora through asyncio workers.
- Graph export: Directly writes results to Neo4j, Amazon Neptune, or a simple NetworkX representation for prototyping.
- Human‑in‑the‑loop review: Provides a lightweight web UI (built with FastAPI and React) for inspecting and correcting extracted triples before commit.
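The batch-processing feature can be pictured as a pool of asyncio tasks with bounded concurrency. The following is a minimal sketch of that general technique using only the standard library, not VoltAgent's actual implementation; `process_corpus`, `fake_extract`, and `max_workers` are hypothetical names introduced for illustration.

```python
import asyncio
from typing import Awaitable, Callable, List


async def process_corpus(
    chunks: List[str],
    extract: Callable[[str], Awaitable[dict]],
    max_workers: int = 4,
) -> List[dict]:
    """Run `extract` over every chunk with at most `max_workers` calls in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(chunk: str) -> dict:
        # The semaphore caps concurrent calls, which helps respect API rate limits.
        async with semaphore:
            return await extract(chunk)

    # gather() preserves input order, so results line up with `chunks`.
    return await asyncio.gather(*(bounded(c) for c in chunks))


async def fake_extract(chunk: str) -> dict:
    """Hypothetical stand-in for an LLM call; it only counts words."""
    await asyncio.sleep(0.01)  # simulate network latency
    return {"chunk": chunk, "n_words": len(chunk.split())}


results = asyncio.run(
    process_corpus(["a b c", "d e", "f"], fake_extract, max_workers=2)
)
```

In a real run, `fake_extract` would be replaced by a coroutine that calls the LLM, and `max_workers` would be tuned to the provider's rate limits.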
Architecture and How It Works
VoltAgent follows a pipeline architecture:
- Ingestor – reads plain‑text, PDF, or HTML files and splits them into chunks.
- Extractor – for each chunk, constructs a prompt that instructs ChatGPT to return JSON with entities and relations. The prompt includes the user‑defined schema if present.
- Validator – checks the JSON against the schema, discards malformed entries, and optionally asks the LLM to retry.
- Persister – maps validated triples to graph‑database commands (Cypher for Neo4j, Gremlin for Neptune) and executes them in a transaction.
- UI Service – exposes a REST API and a React frontend showing the current graph, allowing users to approve or edit triples.
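The Validator and Persister stages described above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not VoltAgent's code: the JSON field names (`name`, `type`, `source`, `target`), the `SCHEMA` dictionary, and the functions `validate` and `to_cypher` are all hypothetical. One detail that does hold in general is that Cypher cannot parameterize node labels or relationship types, so they are interpolated only after being checked against the schema, while property values are passed as parameters.

```python
import json
from typing import Dict, List, Set, Tuple

# Hypothetical schema; the shape is an assumption made for this example.
SCHEMA: Dict[str, Set[str]] = {
    "entities": {"Protein", "Disease"},
    "relations": {"treats", "interacts_with"},
}


def validate(raw_json: str, schema: Dict[str, Set[str]]) -> List[dict]:
    """Parse LLM output and keep only relations that conform to the schema."""
    try:
        payload = json.loads(raw_json)
    except json.JSONDecodeError:
        return []  # malformed JSON: a caller could ask the LLM to retry here
    if not isinstance(payload, dict):
        return []

    # Map entity name -> type, keeping only entities with an allowed type.
    entity_types: Dict[str, str] = {}
    for ent in payload.get("entities", []):
        if not isinstance(ent, dict):
            continue
        name, ent_type = ent.get("name"), ent.get("type")
        if isinstance(name, str) and isinstance(ent_type, str) \
                and ent_type in schema["entities"]:
            entity_types[name] = ent_type

    triples = []
    for rel in payload.get("relations", []):
        if not isinstance(rel, dict):
            continue
        src, dst, rel_type = rel.get("source"), rel.get("target"), rel.get("type")
        if not all(isinstance(v, str) for v in (src, dst, rel_type)):
            continue
        # Both endpoints must be valid entities and the type must be allowed.
        if rel_type in schema["relations"] and src in entity_types \
                and dst in entity_types:
            triples.append({
                "source": src, "source_type": entity_types[src],
                "type": rel_type,
                "target": dst, "target_type": entity_types[dst],
            })
    return triples


def to_cypher(triple: dict) -> Tuple[str, dict]:
    """Build an idempotent MERGE statement plus its parameters."""
    # Labels and relationship types are safe to interpolate only because
    # validate() has already restricted them to the schema whitelist.
    query = (
        f"MERGE (a:{triple['source_type']} {{name: $source}}) "
        f"MERGE (b:{triple['target_type']} {{name: $target}}) "
        f"MERGE (a)-[:{triple['type']}]->(b)"
    )
    return query, {"source": triple["source"], "target": triple["target"]}


raw = json.dumps({
    "entities": [
        {"name": "ProteinX", "type": "Protein"},
        {"name": "DiseaseY", "type": "Disease"},
        {"name": "Aspirin", "type": "Drug"},  # type not in schema
    ],
    "relations": [
        {"source": "ProteinX", "type": "interacts_with", "target": "DiseaseY"},
        {"source": "ProteinX", "type": "causes", "target": "DiseaseY"},  # bad type
        {"source": "Aspirin", "type": "treats", "target": "DiseaseY"},   # bad entity
    ],
})
triples = validate(raw, SCHEMA)
statements = [to_cypher(t) for t in triples]
```

Using MERGE rather than CREATE means re-running the same chunk does not duplicate nodes or edges, which fits a pipeline that may be resumed.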
The framework stores intermediate results in a local SQLite cache to enable resumable runs.
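One plausible way to make runs resumable with SQLite is to key each chunk by a content hash and skip the LLM call when a result is already stored. The sketch below assumes that design; the `ChunkCache` class, the table layout, and `extract_with_cache` are hypothetical and are not taken from VoltAgent's source.

```python
import hashlib
import json
import sqlite3
from typing import Callable, Optional


class ChunkCache:
    """Tiny SQLite-backed cache keyed by the SHA-256 of the chunk text."""

    def __init__(self, path: str = ":memory:") -> None:
        # Use a file path instead of ":memory:" to persist across runs.
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS results ("
            "chunk_hash TEXT PRIMARY KEY, payload TEXT NOT NULL)"
        )

    @staticmethod
    def _key(chunk: str) -> str:
        return hashlib.sha256(chunk.encode("utf-8")).hexdigest()

    def get(self, chunk: str) -> Optional[dict]:
        row = self.conn.execute(
            "SELECT payload FROM results WHERE chunk_hash = ?",
            (self._key(chunk),),
        ).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, chunk: str, result: dict) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT OR REPLACE INTO results VALUES (?, ?)",
                (self._key(chunk), json.dumps(result)),
            )


def extract_with_cache(
    chunk: str, extract: Callable[[str], dict], cache: ChunkCache
) -> dict:
    """Return the cached result if present; otherwise extract and store it."""
    cached = cache.get(chunk)
    if cached is not None:
        return cached
    result = extract(chunk)
    cache.put(chunk, result)
    return result


calls = []


def fake_extract(chunk: str) -> dict:
    """Hypothetical stand-in for the LLM call; records each invocation."""
    calls.append(chunk)
    return {"entities": [], "relations": []}


cache = ChunkCache()
first = extract_with_cache("some text", fake_extract, cache)
second = extract_with_cache("some text", fake_extract, cache)  # served from cache
```

Hashing the chunk text means an interrupted run can restart from the beginning and only pay for chunks that were never processed.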
Real‑World Use Cases
- Academic literature mapping: A research team used VoltAgent to pull protein‑disease interactions from 10,000 PubMed abstracts, producing a Neo4j graph that was later queried for drug‑repurposing hypotheses.
- Customer‑support knowledge base: A SaaS company ingested support tickets and product manuals to build an internal FAQ graph, enabling a ChatGPT‑powered agent to retrieve precise answers.
- Legal contract analysis: A law firm extracted clauses, parties, and obligations from PDF contracts, storing them in a Neptune graph to automate compliance checks.
Strengths and Limitations
Strengths
- Reduces manual effort in entity‑relation extraction by leveraging ChatGPT’s reasoning.
- Schema guidance improves output consistency compared to zero‑shot prompting.
- Modular design lets developers swap the LLM backend (e.g., replace ChatGPT with a local Llama model) without changing the pipeline code.
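A swappable LLM backend of the kind mentioned in the last point is commonly achieved by having the pipeline depend on a small interface rather than a concrete client. The sketch below shows that pattern with `typing.Protocol`; the names `LLMBackend`, `complete`, `run_extraction`, and both canned backends are hypothetical and do not describe VoltAgent's plugin API.

```python
from typing import List, Protocol


class LLMBackend(Protocol):
    """Anything with a `complete(prompt) -> str` method qualifies as a backend."""

    def complete(self, prompt: str) -> str:
        ...


class CannedHostedBackend:
    """Stand-in for a hosted model client; returns a fixed JSON string."""

    def complete(self, prompt: str) -> str:
        return '{"entities": [], "relations": [], "backend": "hosted"}'


class CannedLocalBackend:
    """Stand-in for a locally served model; also returns a fixed JSON string."""

    def complete(self, prompt: str) -> str:
        return '{"entities": [], "relations": [], "backend": "local"}'


def run_extraction(chunks: List[str], backend: LLMBackend) -> List[str]:
    """Pipeline code depends only on the protocol, not on a concrete backend."""
    return [
        backend.complete(f"Extract entities and relations:\n{chunk}")
        for chunk in chunks
    ]


# Swapping the backend requires no change to run_extraction().
hosted_out = run_extraction(["chunk one"], CannedHostedBackend())
local_out = run_extraction(["chunk one"], CannedLocalBackend())
```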
Limitations
- Dependence on an external API (ChatGPT) introduces latency and cost; processing 1 million tokens can exceed $20, depending on the model and current pricing.
- The extractor’s reliability varies with prompt quality; ambiguous domains may require extensive prompt engineering.
- As of version 0.3.2 (latest release on PyPI), the framework lacks built‑in support for temporal graphs or confidence scoring on extracted triples.
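To make the cost limitation concrete, a back-of-the-envelope estimate can be computed from the token count and a per-token price. The rate used below is an assumption chosen only so the arithmetic lines up with the $20 figure mentioned above; it is not a quoted price for any model, and `estimate_cost` is a hypothetical helper.

```python
def estimate_cost(total_tokens: int, usd_per_1k_tokens: float) -> float:
    """Estimated spend in USD for a given number of tokens."""
    return total_tokens / 1000 * usd_per_1k_tokens


# ASSUMPTION: illustrative rate only; check the provider's pricing page.
ASSUMED_USD_PER_1K_TOKENS = 0.02

cost = estimate_cost(1_000_000, ASSUMED_USD_PER_1K_TOKENS)
print(f"Estimated cost for 1M tokens: ${cost:.2f}")
```

Because prompts that embed the schema are sent with every chunk, the billed token count is typically higher than the raw corpus size, so estimates like this are best treated as a lower bound.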
Comparison to Alternatives
| Feature | VoltAgent (0.3.2) | LangChain + LLMGraphTransformer | LlamaIndex Knowledge Graph |
|---|---|---|---|
| LLM agnostic | Yes (plugin) | Yes | Yes |
| Schema validation | Built‑in YAML | Requires custom validators | None |
| Built‑in graph export | Neo4j, Neptune, NetworkX | Neo4j only (via Neo4jGraph) | NetworkX only |
| Human‑review UI | Included (FastAPI+React) | None | None |
| Async batch processing | Yes | Limited | No |
| License | MIT | MIT | MIT |
VoltAgent stands out when a review step and multi‑target export are needed; for pure prototyping, LangChain’s LLMGraphTransformer may be the lighter option.
Getting Started Guide
- Install the package (requires Python 3.9+):
    pip install voltagent==0.3.2
- Set your OpenAI API key:
    export OPENAI_API_KEY=sk-...
- Create a schema file schema.yaml:
    entities:
      - Protein
      - Disease
    relations:
      - treats
      - interacts_with
- Run the pipeline on a folder of txt files:
    voltagent ingest --input ./papers --schema schema.yaml --output neo4j://localhost:7687
  The command will prompt for Neo4j credentials and begin extraction.
- Launch the review UI:
    voltagent ui --port 8000
  Open http://localhost:8000 to inspect and correct triples.
For detailed configuration options, consult the project’s README: https://github.com/voltagent/voltagent
Further Reading
- LangChain documentation on graph transformations: https://python.langchain.com/docs/modules/graphs/
- OpenAI Assistants API guide: https://platform.openai.com/docs/assistants/overview
- Neo4j Cypher manual: https://neo4j.com/docs/cypher-manual/current/