Home

4. Function Calling & MCP

Li

Li Wei

February 22, 202611 min read

Title: 4. Function Calling & MCP

Function Calling

The Problem to Solve

Traditional chat LLMs can only talk; they lack tool‑calling capability, which means they:

  • Cannot perceive the environment: they cannot interact with external data sources, such as querying the web via API, reading a user’s local files, accessing remote databases, etc.
  • Cannot change the environment: they cannot actually perform tasks for the user, such as running code, sending emails, uploading assignments, and so on.

How to Solve It

Backend + LLM

Traditional Approach

Workflow

Issues

  • Whether to call a tool and which tool to call is decided by the backend, leading to complex logic and easy mis‑judgment.
  • The AI is so smart—why not let it help decide?
  • The backend builds the tool‑call parameters, which is difficult.
  • The AI is so smart—why not let it generate the parameters?

Function Calling Approach

What Is Function Calling?

Broadly, Function Calling refers to any technique that enables a large model to invoke external tools: you give the model a list of available functions and their descriptions, and during the conversation the model intelligently decides whether a function is needed, automatically generates the required arguments, and finally returns a textual request that follows a predefined function‑call format.

Narrowly, Function Calling is a capability that model providers have built into the model itself and exposed via the API. It was first introduced by OpenAI:

  • Model‑level: providers must specially fine‑tune the model (e.g., supervised fine‑tuning, reinforcement learning) so that it can correctly select the appropriate function from context and generate valid arguments.
  • API‑level: providers must expose an extra parameter for Function Calling (e.g., the functions parameter in the GPT API).

Prompt‑Based Function Calling

Workflow

Problems

  • Output format is unstable. Extra natural‑language text may appear in the call instruction.

  • Hallucinations are common. The model may fabricate nonexistent function names or arguments.

    Can the model provider fine‑tune or RL‑train the model to improve its performance in this area?

  • High developer dependency. Function descriptions, call‑instruction format, and prompt logic are all designed by the developer.

    Can the provider dictate the function description and call‑instruction format? Could the “explanations and rules” in the system prompt be handled by the provider?

  • Context becomes verbose, consuming many tokens. To ensure correct calling logic, a large amount of description and rules often need to be placed in the system prompt.

API‑Based Function Calling

Workflow

  • User asks a question
    The user asks in natural language, e.g., “What’s the weather in Guangzhou today? Is it good to go out?”

  • Backend makes the first API request to obtain a function‑call instruction
    The backend sends the user’s raw input, function descriptions, and any other context to the LLM API. Function descriptions include the function name, purpose, and parameter schema.

    • Model generates a call instruction
      The model decides whether a function is needed, selects the appropriate one, and automatically creates a structured call instruction (function name + arguments), e.g.:

    • Backend parses the instruction and executes the real function
      After receiving the model’s call instruction, the backend parses it, extracts the function name and arguments, invokes the corresponding method (e.g., a weather‑lookup function), and obtains the result. An example call instruction might look like:

  • Backend makes a second API request, sending the function result plus other context to generate the final reply
    The backend sends the function’s output together with the original user input and any other context back to the model. The model now has enough information to answer the question without further calls, and it replies, for example: “Guangzhou is 35 °C with heavy rain today; it’s best to stay indoors.”

Issues

  • When adapting to different LLMs, the backend code becomes heavily duplicated.
  • The choice of models that support this feature is limited.

MCP Protocol

The Problem to Solve

  • Redundant development when integrating tools
    When an AI application wants to use a new tool created by someone else, developers must copy the tool’s code and its function description. Adding several tools means copying several times.

  • Tool reuse is difficult
    Environment differences may make the copied code fail to run; many companies do not provide source code that can be copied; cross‑language code that is copied is often unusable.

How to Solve It

If you were to solve these problems from scratch (assuming no MCP protocol exists yet), what would you do?

Starting from the Problem

→ Redundant development and reuse difficulties stem from copying code. How can we use others’ methods without copying their code?

  • Idea 1: “Import‑style” integration – AI developers pull the tool’s code into their own project and call it locally.

    • Feasible? ⚠️ Cross‑language calls cannot be solved directly.
    • Is the cross‑language issue unsolvable? → Could we launch a separate process that runs the tool in its native language and communicate via IPC (pipes, sockets) to get standardized results?
    • If so, ✅ this becomes a “local‑service‑style” integration.
  • Idea 2: Remote‑service‑style integration – Tool developers deploy the tool as a standalone service with a standardized API. AI developers only need to send parameters and parse the response, without caring about the implementation language, runtime, or internal logic.

    • Feasible? ✅

Starting from the Goal

→ What is the ideal way for developers to integrate and reuse tools?

  • Developers should be able to add a single configuration entry (e.g., a tool’s unique identifier or endpoint) and instantly gain access to their own or others’ tools.

→ What is the current non‑ideal state?

  • For each new tool, developers must perform two manual adaptations in the pipeline:
    1. Add the tool’s description.
    2. Add the tool’s code.

→ What conditions must be met to move from the non‑ideal to the ideal state?

  • “Configuration replaces manual adaptation” requires:
    1. After adding a configuration, the AI‑app backend must automatically fetch the tool’s description.
    2. After adding a configuration, the backend must automatically locate the tool’s entry point and execute the call.

From Problem to Technical Requirements

→ Local‑service scenario

  • How can any AI application, given a single standardized configuration,
    • Pull any tool’s package locally, start a process that runs the tool as a service (provided the tool author publishes the package), and
    • Automatically obtain the tool’s description and perform calls via local IPC?

→ Remote‑service scenario

  • How can any AI application, given a single standardized configuration,
    • Access any tool’s remote service (provided the tool author exposes one), and
    • Automatically obtain the tool’s description and perform calls via remote service invocation?

Technical Solution Sketch

  • Tools and AI applications must be decoupled.

  • Interaction between tools and AI applications must be standardized.

    • A unified communication protocol is needed (local IPC protocol / remote service protocol).
    • A unified interface definition is needed (which endpoints exist, what parameters they require).
    • A unified data‑exchange format is needed (request/response JSON schema, etc.).
    • The configuration format for adding a tool must be standardized.
    • Every tool service must expose a standardized access method so that it can be loaded via configuration alone.
    • Every AI application must implement a standardized tool‑loading‑and‑calling logic to support configuration‑driven loading.

System Architecture Design

What Is the MCP Protocol?

  • Origin:
    Proposed in November 2024 by Anthropic (a U.S. AI startup), see the official documentation.

  • Definition:

MCP is an open protocol that standardizes how applications provide context to large language models (LLMs). Think of MCP like a USB‑C port for AI applications. Just as USB‑C provides a standardized way to connect your devices to various peripherals and accessories, MCP provides a standardized way to connect AI models to different data sources and tools. MCP enables you to build agents and complex workflows on top of LLMs and connects your models with the world. [1]

MCP is an open protocol that standardizes how applications supply context to large language models (LLMs). You can think of MCP as the USB‑C port for AI apps—just as USB‑C offers a universal way to attach peripherals, MCP offers a universal way to attach data sources and tools to AI models. With MCP you can construct agents and sophisticated workflows atop LLMs and link your models to the external world.

How to Understand It:

  • Application: A concrete product that embeds an LLM—e.g., any online chat site for a major model, IDEs that embed a model (such as Claude Desktop), various Agents (Cursor is an example), and any other software that uses a model.
  • Context: All information the model can access when making decisions, including the current user input, conversation history, external tool information, external data‑source information, prompt information, etc. (here we focus on tools).

How does this differ from a traditional API?

Core MCP Architecture

MCP follows a client‑server architecture where an MCP host — an AI application like Claude Code or Claude Desktop — establishes connections to one or more MCP servers. The MCP host accomplishes this by creating one MCP client for each MCP server. Each MCP client maintains a dedicated one‑to‑one connection with its corresponding MCP server.
The key participants in the MCP architecture are:
MCP Host: The AI application that coordinates and manages one or multiple MCP clients
MCP Client: A component that maintains a connection to an MCP server and obtains context from an MCP server for the MCP host to use
MCP Server: A program that provides context to MCP clients
For example: Visual Studio Code acts as an MCP host. When Visual Studio Code establishes a connection to an MCP server, such as the Sentry MCP server, the Visual Studio Code runtime instantiates an MCP client object that maintains the connection to the Sentry MCP server. When Visual Studio Code subsequently connects to another MCP server, such as the local filesystem server, the Visual Studio Code runtime instantiates an additional MCP client object to maintain this connection, hence maintaining a one‑to‑one relationship of MCP clients to MCP servers.
Note that MCP server refers to the program that serves context data, regardless of where it runs. MCP servers can execute locally or remotely. For example, when Claude Desktop launches the filesystem server, the server runs locally on the same machine because it uses the STDIO transport. This is commonly referred to as a “local” MCP server. The official Sentry MCP server runs on the Sentry platform, and uses the Streamable HTTP transport. This is commonly referred to as a “remote” MCP server. [2]

MCP follows a client‑server model. An MCP Host (e.g., Claude Code https://www.anthropic.com/claude-code or Claude Desktop https://www.claude.ai/download) creates an MCP Client for each MCP Server it wants to talk to. Each client holds a dedicated one‑to‑one connection with its server.

  • MCP Host: The AI application that coordinates one or more MCP Clients.
  • MCP Client: Maintains a connection to an MCP Server and fetches context for the Host.
  • MCP Server: Supplies context data to Clients (can run locally or remotely).

Example: Visual Studio Code acts as an MCP Host. When it connects to the Sentry MCP Server https://docs.sentry.io/product/sentry-mcp/, VS Code creates an MCP Client that talks to that server. If it later connects to a local filesystem server https://github.com/modelcontextprotocol/servers/tree/main/src/filesystem, it creates a second client, preserving a one‑to‑one mapping between clients and servers.

MCP Transport Protocol

MCP supports two transport mechanisms:
Stdio transport: Uses standard input/output streams for direct process communication between local processes on the same machine, providing optimal performance with no network overhead.
Streamable HTTP transport: Uses HTTP POST for client‑to‑server messages with optional Server‑Sent Events for streaming capabilities. This transport enables remote server communication and supports standard HTTP authentication methods including bearer tokens, API keys, and custom headers. MCP recommends using OAuth to obtain authentication tokens.
The transport layer abstracts communication details from the protocol layer, enabling the same JSON‑RPC 2.0 message format across all transport mechanisms. [3]

Stdio Transport

Stdio transport is essentially a form of local inter‑process communication (IPC), most commonly implemented with pipes.

  • What is stdio?
    Stdio (standard I/O) is the standard input/output interface of a process. When a process starts, the operating system assigns it three file descriptors:

    • In code, printf, scanf, cin, cout, read, write all communicate with the outside world through these descriptors.
  • What is a pipe?
    A pipe is an OS‑provided IPC mechanism that lets the output of one process flow directly into the input of another, enabling data to move between the two processes.

  • Summary: What is S… (content truncated)


Originally written by Li Wei (李唯_) and published in Chinese on 后端技术栈全书 (Full-Stack Backend Engineering). Translated and adapted for DriftSeas with permission.

Keep reading

More related articles from DriftSeas.