Building a Quant Trading Bot with FinGPT and Semantic Kernel
Emma Liu
Overview
FinGPT is an open‑source large language model family fine‑tuned on financial text, news, and time‑series data. It excels at tasks such as sentiment extraction from earnings calls, forecasting price moves, and generating trading signals. Semantic Kernel is Microsoft’s SDK for orchestrating LLMs with plugins, memory, and planners. Together they let developers create agents that can reason about market data, execute trades, and adapt strategies without hard‑coding every rule.
This guide targets quantitative researchers, data engineers, and algo‑trading developers who already have Python experience and access to a brokerage API or historical data feed. It assumes you can run a GPU‑enabled environment (e.g., an AWS g5.xlarge instance) and are comfortable installing Python packages via pip or conda.
Key Features and Capabilities
- FinGPT‑v2 (the latest released checkpoint) has ~7B parameters and was trained on a mix of Bloomberg news, SEC filings, and high‑frequency tick data up to Q4‑2024.
- Semantic Kernel supplies a plug‑in model where each capability (data fetch, indicator calculation, order execution) is a native Python function that the LLM can call via function calling.
- The agent can maintain a short‑term memory of recent market events and a long‑term memory of past trade outcomes, enabling simple reinforcement‑learning‑style adjustments.
- Built‑in planners (SequentialPlanner, FunctionCallingStepwisePlanner) let the model decompose a high‑level goal like "achieve 5% monthly return with max drawdown <10%" into concrete steps: data collection, signal generation, risk check, order placement.
- The framework supports both synchronous backtesting (using historical CSVs) and live paper‑trading modes via connectors to Alpaca, Interactive Brokers, or CCXT for crypto.
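The long-term memory of past trade outcomes can be as small as a single SQLite table. The following is a minimal sketch under stated assumptions, not the framework's own memory plugin: the table name `trade_outcomes`, the helper names, and the rule that shrinks position size after a poor recent win rate are all hypothetical and chosen only for illustration.

```python
import sqlite3
from typing import Optional


def open_memory(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the trade-outcome store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trade_outcomes ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "symbol TEXT NOT NULL, side TEXT NOT NULL, pnl REAL NOT NULL)"
    )
    return conn


def record_outcome(conn: sqlite3.Connection, symbol: str, side: str, pnl: float) -> None:
    """Append one closed trade to long-term memory."""
    conn.execute(
        "INSERT INTO trade_outcomes (symbol, side, pnl) VALUES (?, ?, ?)",
        (symbol, side, pnl),
    )
    conn.commit()


def recent_win_rate(conn: sqlite3.Connection, symbol: str, last_n: int = 20) -> Optional[float]:
    """Fraction of the last `last_n` trades in `symbol` with positive P&L, or None if no history."""
    rows = conn.execute(
        "SELECT pnl FROM trade_outcomes WHERE symbol = ? ORDER BY id DESC LIMIT ?",
        (symbol, last_n),
    ).fetchall()
    if not rows:
        return None
    return sum(1 for (pnl,) in rows if pnl > 0) / len(rows)


def size_multiplier(win_rate: Optional[float], floor: float = 0.5) -> float:
    """Illustrative adjustment: scale size by 2 * win rate, clamped to [floor, 1.0]."""
    if win_rate is None:
        return 1.0  # no history yet, trade at the baseline size
    return max(floor, min(1.0, 2 * win_rate))
```

The multiplier could be passed into the reasoning prompt or applied to the order quantity directly; either way the feedback stays outside the model, so it remains easy to audit.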
Architecture and How It Works
The bot consists of four layers:
1. Data Layer – a set of Semantic Kernel plugins that pull price bars, order‑book snapshots, and news headlines. Example plugin for fetching 1‑minute bars from Alpaca:

    import alpaca_trade_api as tradeapi
    import semantic_kernel as sk
    from semantic_kernel.functions import kernel_function

    class MarketDataPlugin:
        @kernel_function(description="Get recent OHLCV bars for a symbol")
        def get_bars(self, symbol: str, limit: int = 100) -> str:
            # REST() reads the API key, secret, and base URL from the APCA_* environment variables
            api = tradeapi.REST()
            bars = api.get_bars(symbol, tradeapi.TimeFrame.Minute, limit=limit).df
            # Serialize the DataFrame so the LLM receives plain JSON
            return bars.to_json()

    kernel = sk.Kernel()
    kernel.add_plugin(MarketDataPlugin(), plugin_name="market")

2. Reasoning Layer – FinGPT is loaded via Hugging Face Transformers with 4‑bit quantization to fit a single GPU:

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_name = "FinGPT/fingpt-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        device_map="auto",
        load_in_4bit=True
    )

The model receives a prompt that includes the latest market data (as JSON), a natural‑language goal, and a list of available plugin functions. FinGPT outputs a structured JSON action, e.g., {"action":"call_plugin","plugin":"market","function":"get_bars","args":{"symbol":"AAPL","limit":20}}.
3. Planning & Execution Layer – Semantic Kernel’s FunctionCallingStepwisePlanner parses the model’s JSON, invokes the corresponding plugin, and feeds the result back into the prompt for the next iteration. This loop continues until the planner returns a final action such as placing an order.
4. Action Layer – a thin wrapper around the brokerage API that translates Semantic Kernel’s output into real orders. Example order plugin:
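
    import alpaca_trade_api as tradeapi
    from semantic_kernel.functions import kernel_function

    class OrderPlugin:
        @kernel_function(description="Submit a market order")
        def create_order(self, symbol: str, qty: int, side: str) -> str:
            # side is "buy" or "sell"; credentials come from the APCA_* environment variables
            api = tradeapi.REST()
            order = api.submit_order(
                symbol=symbol,
                qty=qty,
                side=side,
                type='market',
                time_in_force='day'
            )
            # Return the broker-assigned order ID so the agent can track the order
            return order.id
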
The agent’s state (recent prices, open positions, performance metrics) is stored in a simple SQLite database that Semantic Kernel’s memory plugin can read and write.
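The reason-act cycle that runs across layers 2 to 4 can be illustrated without an LLM or a broker. The sketch below is not Semantic Kernel's planner; it is a plain-Python stand-in that assumes the JSON action shape shown above. The names `parse_action`, `dispatch`, and `run_loop`, and the registry of callables keyed by (plugin, function), are hypothetical.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

# Maps (plugin, function) to a callable that returns a JSON string.
Registry = Dict[Tuple[str, str], Callable[..., str]]


def parse_action(action_json: str) -> Optional[dict]:
    """Return the action dict, or None if the model output is not a valid call_plugin action."""
    try:
        action = json.loads(action_json)
    except json.JSONDecodeError:
        return None
    if isinstance(action, dict) and action.get("action") == "call_plugin":
        return action
    return None


def dispatch(action: dict, registry: Registry) -> str:
    """Run the requested plugin function; report problems as JSON instead of raising."""
    plugin, function = action.get("plugin"), action.get("function")
    if not isinstance(plugin, str) or not isinstance(function, str):
        return json.dumps({"error": "plugin and function must be strings"})
    func = registry.get((plugin, function))
    if func is None:
        return json.dumps({"error": f"unknown function {plugin}.{function}"})
    args = action.get("args", {})
    if not isinstance(args, dict):
        return json.dumps({"error": "args must be a JSON object"})
    try:
        return func(**args)
    except TypeError as exc:  # typically wrong or missing argument names
        return json.dumps({"error": str(exc)})


def run_loop(next_action: Callable[[str], str], registry: Registry,
             max_steps: int = 5) -> List[Tuple[str, str, str]]:
    """Feed each observation back to the model until an order call is attempted."""
    context = "{}"
    history: List[Tuple[str, str, str]] = []
    for _ in range(max_steps):
        action = parse_action(next_action(context))
        if action is None:
            # Tell the model its last output was unusable and let it try again
            context = json.dumps({"error": "could not parse action"})
            continue
        context = dispatch(action, registry)
        history.append((str(action.get("plugin")), str(action.get("function")), context))
        if action.get("plugin") == "order":
            break
    return history
```

Because errors are returned as observations rather than raised, the model sees what went wrong on the next iteration, which is the self-correction behaviour the loop is meant to provide.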
Real‑World Use Cases
- Equities Mean‑Reversion: The agent scans the S&P 500 for stocks whose 20‑minute RSI falls below 30, uses FinGPT to assess recent news sentiment, and places a long position if sentiment is neutral or positive.
- Crypto Arbitrage: By pulling order‑book depth from Binance and Coinbase via CCXT plugins, the model identifies cross‑exchange price discrepancies >0.15% and executes simultaneous buy/sell orders, hedging with futures to mitigate execution risk.
- Event‑Driven Trading: Before an earnings release, the agent ingests the last 8‑K filing, asks FinGPT to predict the likely price move magnitude, and sizes a straddle option position accordingly.
- Portfolio Rebalancing: A weekly goal prompt tells the model to bring sector weights back to target allocations; the planner calls plugins that compute current weights, generate trade lists, and execute them while respecting a max‑turnover constraint.
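The mean-reversion case above reduces to a small, deterministic gate that can run before or alongside the model. This is a sketch under assumptions: the RSI here uses simple averages of gains and losses over the window (Wilder's smoothed variant would give slightly different values), "20-minute RSI" is read as a 20-period RSI on 1-minute closes, and sentiment is assumed to arrive as a score in [-1, 1] where 0 is neutral. The function names are hypothetical.

```python
from typing import Sequence


def rsi(closes: Sequence[float], period: int = 14) -> float:
    """Relative Strength Index over the last `period` price changes (simple-average variant)."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closes")
    window = closes[-(period + 1):]
    changes = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(c for c in changes if c > 0) / period
    avg_loss = sum(-c for c in changes if c < 0) / period
    if avg_loss == 0:
        # No losses in the window: fully overbought, or flat prices
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def mean_reversion_signal(closes: Sequence[float], sentiment: float,
                          period: int = 20, oversold: float = 30.0) -> str:
    """Return 'buy' when RSI is below `oversold` and sentiment is neutral or positive."""
    if rsi(closes, period) < oversold and sentiment >= 0:
        return "buy"
    return "hold"
```

Keeping the indicator outside the LLM means the model only has to supply the sentiment score, which is the part a rule-based system cannot compute.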
Strengths and Limitations
Strengths
- The LLM can ingest unstructured text (news, filings) alongside numeric data, something traditional rule‑based bots struggle with.
- Semantic Kernel’s plugin abstraction lets you swap data sources or execution venues without rewriting the agent logic.
- FinGPT’s finance‑specific pretraining reduces hallucination rates on financial queries compared to generic LLMs.
- The iterative planning loop provides a form of self‑correction: if a plugin returns an error, the model can retry with altered parameters.
Limitations
- Latency: each reasoning step involves a forward pass through a 7B model; end‑to‑end decision latency averages 1.2 seconds on an A10G GPU, which may be too slow for sub‑second scalping.
- Model size: even with 4‑bit quantization the model consumes ~5 GB of VRAM, so every deployment target needs a GPU with at least that much free memory.
- Explainability: while the agent’s actions are logged, tracing why FinGPT chose a particular plugin call requires inspecting raw prompt tokens, which is less transparent than a deterministic rule set.
- Dependency on plugin correctness: if a data‑plugin returns stale or malformed data, the LLM may propagate the error; robust validation layers are still needed.
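A validation layer of the kind the last limitation calls for can sit between a data plugin and the prompt. The sketch below assumes bars have already been converted to a list of dicts with the keys `t` (epoch seconds), `open`, `high`, `low`, `close`, and `volume`; that record layout, the 120-second staleness threshold, and the function name `validate_bars` are assumptions for illustration, not something the plugins above produce as-is.

```python
import math
from typing import List, Sequence

REQUIRED = ("t", "open", "high", "low", "close", "volume")


def validate_bars(bars: Sequence[dict], now: float, max_age_s: float = 120.0) -> List[str]:
    """Return a list of problems found in `bars`; an empty list means the data looks usable."""
    if not bars:
        return ["no bars returned"]
    problems: List[str] = []
    prev_t = None
    for i, bar in enumerate(bars):
        missing = [k for k in REQUIRED if k not in bar]
        if missing:
            problems.append(f"bar {i}: missing {missing}")
            continue
        values = [bar[k] for k in REQUIRED]
        if not all(isinstance(v, (int, float)) and math.isfinite(v) for v in values):
            problems.append(f"bar {i}: non-numeric or non-finite field")
            continue
        if bar["low"] > min(bar["open"], bar["close"]) or bar["high"] < max(bar["open"], bar["close"]):
            problems.append(f"bar {i}: high/low inconsistent with open/close")
        if bar["low"] <= 0 or bar["volume"] < 0:
            problems.append(f"bar {i}: non-positive price or negative volume")
        if prev_t is not None and bar["t"] <= prev_t:
            problems.append(f"bar {i}: timestamp not increasing")
        prev_t = bar["t"]
    # Staleness is judged on the most recent well-formed bar
    if prev_t is not None and now - prev_t > max_age_s:
        problems.append("latest bar is stale")
    return problems
```

If the list is non-empty, the safer choice is to return the problems to the model as an error observation, or skip the iteration, rather than let it reason over bad data.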
Comparison with Alternatives
| Feature | FinGPT + Semantic Kernel | LangChain + FinBERT | AutoGen + GPT‑4‑Turbo | CrewAI + Llama‑3‑70B |
|---|---|---|---|---|
| Domain‑specific pretraining | Yes (FinGPT) | Partial (FinBERT) | No (generic GPT‑4) | No (Llama‑3) |
| Plugin‑based tool use | Native via Semantic Kernel | Requires custom Tool agents | Built‑in function calling | Manual tool wrapping |
| Memory handling | Short‑term + long‑term SQLite plugin | Conversation memory only | Conversation memory + external store | Shared blackboard |
| Planner options | Sequential, Stepwise | LLM‑driven chains | AutoGen’s group chat | Role‑based conversation |
| Typical latency (per step) | 1.2 s (7B 4‑bit) | 0.8 s (FinBERT) | 2.5 s (GPT‑4‑Turbo) | 3.0 s (Llama‑3‑70B) |
| Deployment cost (GPU‑hour) | Low‑moderate | Low | High (API) | High (large model) |
| Ease of adding new data source | Simple plugin | Moderate (custom chain) | Moderate (function definition) | Moderate (agent definition) |
The table shows that FinGPT + Semantic Kernel offers a strong trade‑off between financial domain awareness and operational flexibility, while remaining cheaper to run than API‑heavy alternatives.
Getting Started Guide
- Environment setup

    # Create a conda environment with Python 3.11
    conda create -n fingpt-sk python=3.11 -y
    conda activate fingpt-sk
    # Install core packages
    pip install torch==2.4.0 transformers==4.41.0 semantic_kernel==0.9.0
    # accelerate and bitsandbytes are needed for device_map="auto" and 4-bit loading;
    # sqlite3 ships with Python and is not installed via pip
    pip install accelerate bitsandbytes alpaca-trade-api ccxt pandas

- Obtain FinGPT weights
The model is hosted on Hugging Face under FinGPT/fingpt-v2. Accept the license and generate an access token, then run:

    huggingface-cli login  # enter your token
    git lfs install
    git clone https://huggingface.co/FinGPT/fingpt-v2

- Create a plugin for market data (save as market_plugin.py):

    import alpaca_trade_api as tradeapi
    import semantic_kernel as sk
    from semantic_kernel.functions import kernel_function

    class MarketDataPlugin:
        @kernel_function(description="Fetch recent minute bars for a symbol")
        def get_bars(self, symbol: str, limit: int = 200) -> str:
            api = tradeapi.REST()
            bars = api.get_bars(symbol, tradeapi.TimeFrame.Minute, limit=limit).df
            return bars.to_json()

    kernel = sk.Kernel()
    kernel.add_plugin(MarketDataPlugin(), plugin_name="market")

- Load FinGPT and wrap it as a Semantic Kernel skill (fingpt_skill.py):

    import re
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from semantic_kernel.functions import kernel_function
    from market_plugin import kernel  # reuse the kernel that already holds the market plugin

    class FinGPTLLM:
        def __init__(self):
            self.tokenizer = AutoTokenizer.from_pretrained("FinGPT/fingpt-v2")
            self.model = AutoModelForCausalLM.from_pretrained(
                "FinGPT/fingpt-v2",
                torch_dtype=torch.float16,
                device_map="auto",
                load_in_4bit=True
            )

        def generate(self, prompt: str, max_new_tokens: int = 150) -> str:
            inputs = self.tokenizer(prompt, return_tensors="pt").to(self.model.device)
            output = self.model.generate(**inputs, max_new_tokens=max_new_tokens)
            # Decode only the newly generated tokens; the prompt itself contains JSON
            new_tokens = output[0][inputs["input_ids"].shape[1]:]
            text = self.tokenizer.decode(new_tokens, skip_special_tokens=True)
            # Extract the first JSON-like block
            match = re.search(r'\{.*\}', text, re.DOTALL)
            return match.group(0) if match else "{}"

    # Register as a Semantic Kernel function
    class FinGPTSkill:
        def __init__(self):
            self.llm = FinGPTLLM()

        @kernel_function(description="Reason over market data and return a trading action")
        def reason(self, context: str) -> str:
            # Literal braces are doubled so that only {context} is interpolated
            prompt = f"""You are a quant trading agent.
    Goal: achieve positive returns with controlled risk.
    Available plugins: market.get_bars, order.create_order
    Context (JSON): {context}
    Respond with a single JSON object describing the next plugin call, e.g.,
    {{"action":"call_plugin","plugin":"market","function":"get_bars","args":{{"symbol":"AAPL","limit":50}}}}
    If you are ready to place an order, use the order plugin.
    """
            return self.llm.generate(prompt)

    skill = FinGPTSkill()
    kernel.add_plugin(skill, plugin_name="reasoning")

- Define the planner and run a simple loop (run_bot.py):

    import asyncio
    from semantic_kernel.planners import FunctionCallingStepwisePlanner
    from fingpt_skill import kernel  # kernel with the market and reasoning plugins registered

    async def main():
        # The order plugin from the Action Layer must also be registered on this kernel
        # before the model can place trades.
        # Planner constructor and method signatures vary between semantic-kernel releases;
        # check these calls against the version you installed.
        planner = FunctionCallingStepwisePlanner(kernel)
        context = "{}"  # initial context is empty; it is refreshed each iteration
        for _ in range(5):  # limit iterations for demo
            # Ask the reasoning plugin for the next step (argument name matches reason(context))
            result = await kernel.invoke(
                plugin_name="reasoning", function_name="reason", context=context
            )
            action_json = result.value
            print("LLM decided:", action_json)
            # Hand the decided action to the planner, which invokes the matching plugin
            plan = await planner.create_plan(
                goal=f"Execute this action: {action_json}",
                available_plugins=kernel.plugins
            )
            plan_result = await plan.invoke()
            print("Plan result:", plan_result.value)
            # Update context with the newest observation.
            # For simplicity we just re-fetch the latest bars for AAPL.
            bars = await kernel.invoke(
                plugin_name="market", function_name="get_bars", symbol="AAPL", limit=10
            )
            context = bars.value
            await asyncio.sleep(1)  # throttle

    if __name__ == "__main__":
        asyncio.run(main())

- Run the bot

    python run_bot.py

You should see the LLM request market data, receive a JSON bar series, and then either request more data or propose an order. Replace the dummy loop with your own risk checks, position sizing, and execution logic to move from paper‑trading to live deployment.
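As one possible starting point for the position sizing mentioned above, the sketch below uses fixed-fractional sizing: risk a set fraction of equity per trade, based on the distance between entry and stop, and cap the result by a maximum notional exposure. The 1% risk fraction, the 20% notional cap, and the function name `position_size` are illustrative assumptions, not recommendations from the guide.

```python
import math


def position_size(equity: float, entry: float, stop: float,
                  risk_fraction: float = 0.01,
                  max_notional_fraction: float = 0.2) -> int:
    """Whole-share quantity that risks at most `risk_fraction` of equity if the stop is hit."""
    if equity <= 0 or entry <= 0:
        raise ValueError("equity and entry must be positive")
    risk_per_share = abs(entry - stop)
    if risk_per_share == 0:
        raise ValueError("stop must differ from entry")
    # Shares allowed by the per-trade risk budget
    by_risk = math.floor(equity * risk_fraction / risk_per_share)
    # Shares allowed by the cap on position notional
    by_notional = math.floor(equity * max_notional_fraction / entry)
    return max(0, min(by_risk, by_notional))
```

The quantity proposed by the model can then be clamped to this value before it reaches the order plugin, so a bad model output cannot exceed the risk budget.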
Next Steps
- Integrate a risk‑management plugin that computes VaR or max‑drawdown limits and feeds it back into the reasoning prompt.
- Replace the simple SQLite memory with a vector store (e.g., FAISS) to enable similarity search over past trade narratives.
- Experiment with FinGPT‑v3 (expected release Q2‑2026) which adds reinforcement‑learning fine‑tuning on P&L curves.
- Deploy the agent as a Kubernetes pod with GPU autoscaling to handle bursts of market volatility.
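The risk-management step in the first item could begin with two standard statistics computed from the account's equity curve and per-period returns. This is a sketch under assumptions: VaR is estimated by plain historical simulation (one of several methods), the 10% drawdown limit is borrowed from the example goal earlier in the guide, and `risk_summary` is a hypothetical helper whose JSON output would be appended to the reasoning prompt.

```python
import json
import math
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline as a fraction of the peak (assumes positive equity)."""
    if not equity:
        raise ValueError("equity curve is empty")
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def historical_var(returns: Sequence[float], confidence: float = 0.95) -> float:
    """Historical-simulation VaR, returned as a positive loss fraction."""
    if not returns:
        raise ValueError("no returns supplied")
    ordered = sorted(returns)
    # Index of the return at the (1 - confidence) tail; epsilon guards float rounding
    index = min(len(ordered) - 1, math.floor((1 - confidence) * len(ordered) + 1e-9))
    return -ordered[index]


def risk_summary(equity: Sequence[float], returns: Sequence[float],
                 max_dd_limit: float = 0.10) -> str:
    """JSON snippet describing current risk, suitable for inclusion in a prompt."""
    dd = max_drawdown(equity)
    return json.dumps({
        "max_drawdown": round(dd, 4),
        "var_95": round(historical_var(returns), 4),
        "drawdown_limit_breached": dd >= max_dd_limit,
    })
```

A breached limit is better enforced in code (for example by refusing to call the order plugin) than left to the model's judgment; the prompt feedback then serves only as context.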
By combining FinGPT’s finance‑tuned language understanding with Semantic Kernel’s plugin‑driven orchestration, you obtain a flexible foundation for building quant trading bots that can ingest both numbers and news, adapt their plans, and execute trades through a unified interface.