What protocol does MCP use under the hood?

MCP is built on JSON-RPC 2.0, a lightweight remote procedure call protocol where the client sends a JSON object with a method name and parameters and the server returns a result or error with the same request ID. Three message types exist: requests which expect responses, responses which carry results, and notifications which are fire-and-forget. MCP adds session state and tool discovery on top of this base protocol.

What is Streamable HTTP transport in MCP?

Streamable HTTP is the current standard MCP transport for remote servers. Clients send JSON-RPC requests as HTTP POST bodies to the server's MCP endpoint. The server can respond with a direct JSON object for fast operations or open an SSE stream for long-running operations, sending progress notifications before the final result. This transport enables standard infrastructure patterns: load balancers, CDNs, and containerized deployments all work without modification.

How does tool discovery work in an MCP server?

Tool discovery happens right after the initialize handshake that runs when a client first connects. The handshake itself only negotiates capabilities; once it completes, the client calls tools/list and receives the complete catalog of available tools with their names, descriptions, and JSON Schema input specifications. The description field is read by the AI agent to decide when to call each tool, so it functions as a contract between the server author and any agent that connects. Some servers support dynamic tools that change based on session state, requiring clients to refresh the tool list after state changes.

How does session management work in MCP?

When a client connects, the server assigns a session ID typically returned in an Mcp-Session-Id header. Subsequent requests include this ID so the server can associate state with the session: authenticated user, resolved project context, cached data. Sessions have a time-to-live and are cleaned up after inactivity. Session state enables multi-step workflows where an agent resolves a project once and subsequent tool calls are automatically scoped to that project without repeating the resolution on every call.

How do MCP servers handle scaling for many concurrent agent sessions?

MCP servers face unique scaling challenges because sessions are long-lived and SSE connections consume persistent server resources. The production approach is stateless session rehydration: session data is stored in a database rather than server memory so any instance can handle any request. This enables horizontal scaling behind a load balancer. Active sessions are drained gracefully during deploys by stopping new connection acceptance, completing in-flight operations, and migrating sessions before restart.

MCP Server Architecture Explained


The Model Context Protocol has become the standard way AI agents access external tools and data. But most guides focus on configuring MCP servers — like adding an MCP server to Cursor — not understanding how they work. This post is the deep dive: the protocol layer, the transport mechanism, the server lifecycle, and the architectural decisions that make MCP servers reliable at scale. If you are building an MCP server, extending one, or debugging why your agent's tool calls are failing, this is the reference.

The Protocol Layer: JSON-RPC 2.0

MCP is built on JSON-RPC 2.0, a lightweight remote procedure call protocol. Every interaction between client (the AI agent or editor) and server follows the same pattern: the client sends a JSON object with a method name and parameters, the server processes it and returns a JSON object with the result or an error. The protocol is stateless at the message level — each request-response pair is independent — though MCP adds session state on top.

Three message types exist. Requests have an id, method, and params — these expect a response. Responses have the same id and either a result or error. Notifications have a method and params but no id — these are fire-and-forget messages that the receiver should not reply to. MCP uses all three: tool calls are requests, tool results are responses, and progress updates are notifications.

// JSON-RPC 2.0 request (tool call)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_api_spec",
    "arguments": { "project_id": "abc-123" }
  }
}

// JSON-RPC 2.0 response (tool result)
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "{\"endpoints\": [...]}" }
    ]
  }
}
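The third message type, the notification, is not shown above. The following is a minimal TypeScript sketch of how the three types can be told apart by the presence of id and method. The type names and the classify helper are illustrative assumptions, not exports of any SDK; the notifications/progress method and its progressToken, progress, and total fields follow the MCP specification.

```typescript
// Illustrative types; the names are assumptions, not SDK exports.
type JsonRpcId = string | number;

interface JsonRpcRequest {
  jsonrpc: "2.0";
  id: JsonRpcId;
  method: string;
  params?: unknown;
}

interface JsonRpcNotification {
  jsonrpc: "2.0";
  method: string;
  params?: unknown;
}

interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: JsonRpcId;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}

type JsonRpcMessage = JsonRpcRequest | JsonRpcNotification | JsonRpcResponse;

// A message with a method and an id is a request; a method without an id
// is a notification; anything else is treated as a response.
function classify(msg: JsonRpcMessage): "request" | "notification" | "response" {
  if ("method" in msg) {
    return "id" in msg ? "request" : "notification";
  }
  return "response";
}

// A progress notification: note the absence of an id.
const progress: JsonRpcNotification = {
  jsonrpc: "2.0",
  method: "notifications/progress",
  params: { progressToken: "op-1", progress: 40, total: 100 },
};
```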

Transport: Streamable HTTP and SSE

MCP's first transport was stdio: the editor launches the MCP server as a child process and communicates with it over stdin/stdout. This remains the usual choice for local servers, but it does not work for remote hosted services. For remote servers the current standard is Streamable HTTP: the client sends JSON-RPC requests as HTTP POST bodies to the server's MCP endpoint, and the server responds with either a direct JSON response or an SSE (Server-Sent Events) stream for long-running operations.

SSE is critical for operations that take more than a few seconds. When an agent runs a long operation, such as a bulk ticket update or a large context aggregation, the server opens an SSE stream and sends progress notifications followed by the final result. The client displays progress to the user and can cancel the operation by sending a notifications/cancelled message; under the MCP specification, a dropped stream on its own should not be treated as a cancellation. Without SSE, long-running tools would appear to hang with no feedback.
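As a rough illustration of what travels over such a stream, the sketch below frames two progress notifications and a final result as Server-Sent Events and parses them back. The toSseEvent and parseSseBody helpers are hypothetical, and the parser assumes a fully buffered body with one single-line data field per event, which a production client would not rely on.

```typescript
// Frame one JSON-RPC message as a Server-Sent Event. SSE events are
// "field: value" lines terminated by a blank line.
function toSseEvent(message: object): string {
  return `event: message\ndata: ${JSON.stringify(message)}\n\n`;
}

// Parse a buffered SSE body back into JSON-RPC messages.
// Sketch only: assumes one single-line data field per event.
function parseSseBody(body: string): unknown[] {
  return body
    .split("\n\n")
    .map((chunk) => chunk.split("\n").find((line) => line.startsWith("data: ")))
    .filter((line): line is string => line !== undefined)
    .map((line) => JSON.parse(line.slice("data: ".length)));
}

// Two progress notifications, then the response that ends the stream.
const stream =
  toSseEvent({
    jsonrpc: "2.0",
    method: "notifications/progress",
    params: { progressToken: "op-1", progress: 1, total: 2 },
  }) +
  toSseEvent({
    jsonrpc: "2.0",
    method: "notifications/progress",
    params: { progressToken: "op-1", progress: 2, total: 2 },
  }) +
  toSseEvent({
    jsonrpc: "2.0",
    id: 7,
    result: { content: [{ type: "text", text: "done" }] },
  });
```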

The HTTP transport also enables standard infrastructure patterns. MCP servers can sit behind load balancers, CDNs, and API gateways. They can be deployed to any platform that supports HTTP servers — containers, serverless functions (with caveats for SSE), traditional VMs. The protocol is transport-agnostic by design; adding a new transport (WebSocket, gRPC) requires only implementing the message framing layer.

Server Lifecycle

An MCP server goes through a defined lifecycle. On startup, the server initializes its tool registry, connects to backing services (databases, APIs), and begins listening for connections. When a client connects, the server performs capability negotiation, the initialize handshake, where both sides declare what they support: which feature sets the server offers (tools, prompts, resources), which optional features the client supports (such as sampling), and whether the server supports argument auto-completion. The handshake exchanges capability flags only; the tool catalog itself is fetched afterwards.
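The handshake can be pictured as three messages. The sketch below writes them as TypeScript literals; the message shapes follow the MCP specification, while the client name, version strings, and the supports helper are placeholders chosen for illustration.

```typescript
// Client -> server: propose a protocol version and declare client capabilities.
const initializeRequest = {
  jsonrpc: "2.0",
  id: 0,
  method: "initialize",
  params: {
    protocolVersion: "2025-03-26",
    capabilities: {},
    clientInfo: { name: "example-editor", version: "1.0.0" }, // placeholder
  },
};

// Server -> client: agree on a version and declare server capabilities.
// Capabilities are feature flags, not the tool catalog itself.
const initializeResponse = {
  jsonrpc: "2.0",
  id: 0,
  result: {
    protocolVersion: "2025-03-26",
    capabilities: { tools: { listChanged: true }, completions: {} },
    serverInfo: { name: "apphandoff", version: "2.0.0" },
  },
};

// Client -> server: sent once the client has processed the response.
const initializedNotification = {
  jsonrpc: "2.0",
  method: "notifications/initialized",
};

// Hypothetical helper: check whether a feature was declared.
function supports(capabilities: Record<string, unknown>, feature: string): boolean {
  return feature in capabilities;
}
```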

After initialization, the server enters the active phase where it processes tool calls. Each tool call is dispatched to the registered handler, which executes the operation and returns a result. The server manages concurrency — multiple tool calls can be in flight simultaneously from the same client — and enforces timeouts to prevent hung operations from blocking the session.
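One common way to enforce such a timeout is to race the handler against a timer. The withTimeout helper below is a hypothetical sketch, not an SDK function. It only stops waiting for the handler; it does not cancel the underlying work, which would need cooperative cancellation (for example an AbortSignal).

```typescript
// Reject if the handler does not settle within timeoutMs.
function withTimeout<T>(work: Promise<T>, timeoutMs: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`Tool call timed out after ${timeoutMs} ms`)),
      timeoutMs
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

Several calls wrapped this way can be in flight at once, since each has its own timer and none blocks the others.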

Shutdown involves draining active connections. The server stops accepting new tool calls, waits for in-flight operations to complete (with a deadline), sends close notifications to connected clients, and then exits. Graceful shutdown is essential for remote servers that may be redeployed during active agent sessions.
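The drain step can be sketched as a small tracker that refuses new calls once shutdown begins and waits for in-flight calls up to a deadline. DrainTracker and its methods are hypothetical names; a real server would also close transports and notify clients, which is omitted here.

```typescript
class DrainTracker {
  private inFlight = new Set<Promise<unknown>>();
  private draining = false;

  // Register a tool call; refuses new work once draining has begun.
  track<T>(start: () => Promise<T>): Promise<T> {
    if (this.draining) {
      return Promise.reject(new Error("Server is shutting down"));
    }
    const call = start();
    this.inFlight.add(call);
    const forget = () => {
      this.inFlight.delete(call);
    };
    call.then(forget, forget);
    return call;
  }

  // Stop accepting work, then wait for in-flight calls or the deadline.
  // Resolves true if every call settled before the deadline.
  async drain(deadlineMs: number): Promise<boolean> {
    this.draining = true;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const deadline = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), deadlineMs);
    });
    const settled = Promise.all(
      Array.from(this.inFlight).map((p) => p.then(() => undefined, () => undefined))
    ).then(() => true);
    const clean = await Promise.race([settled, deadline]);
    clearTimeout(timer);
    return clean;
  }
}
```

A process signal handler would typically call drain with the deploy platform's grace period and exit once it resolves.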

Tool Registration and Discovery

Tools are the core abstraction in MCP. Each tool has a name, a description (used by agents to decide when to call it), and a typed input schema defined in JSON Schema (typically authored with Zod in TypeScript servers). The schema is not just documentation: clients can validate arguments against it before sending the request, and servers should validate again on receipt.

Tool discovery happens right after the initialize handshake. Once initialization completes, the client calls tools/list and receives the full catalog of available tools with their schemas. Some servers support dynamic tools that change based on session state: for example, after an agent resolves a project, additional project-specific tools may become available. The client can re-fetch the tool list at any time to discover newly available tools.
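For reference, a tools/list exchange might look like the following, written as TypeScript literals. The message shapes and the notifications/tools/list_changed method follow the MCP specification; the JSON Schema shown for get_project_summary is an assumption about what such a tool could advertise, and toolNames is a hypothetical helper.

```typescript
// Client -> server: ask for the catalog.
const listRequest = { jsonrpc: "2.0", id: 2, method: "tools/list" };

// Server -> client: each tool carries a name, description, and input schema.
const listResponse = {
  jsonrpc: "2.0",
  id: 2,
  result: {
    tools: [
      {
        name: "get_project_summary",
        description: "Get full project context including tickets, roles, and milestones",
        inputSchema: {
          type: "object",
          properties: {
            project_id: { type: "string", format: "uuid", description: "Project ID" },
            include_schema: { type: "boolean", description: "Include DB schema" },
          },
          required: ["project_id"],
        },
      },
    ],
  },
};

// Server -> client: a server that declared tools.listChanged can signal
// that the catalog changed, prompting the client to call tools/list again.
const listChanged = { jsonrpc: "2.0", method: "notifications/tools/list_changed" };

// Hypothetical helper: the names an agent can currently call.
function toolNames(response: typeof listResponse): string[] {
  return response.result.tools.map((tool) => tool.name);
}
```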

// Tool registration in @modelcontextprotocol/sdk
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';

const server = new McpServer(
  { name: 'apphandoff', version: '2.0.0' },
  // Capabilities go in the second (options) argument, not in the server info
  { capabilities: { tools: {}, prompts: {}, resources: {} } }
);

server.tool(
  'get_project_summary',
  'Get full project context including tickets, roles, and milestones',
  {
    project_id: z.string().uuid().describe('Project ID'),
    include_schema: z.boolean().optional().describe('Include DB schema')
  },
  async ({ project_id, include_schema }, extra) => {
    // Handler implementation: access backing services, build the
    // response, and return structured content. buildProjectSummary is
    // an application-specific helper that is not shown here.
    const summary = await buildProjectSummary(project_id, { include_schema });
    return {
      content: [{ type: 'text', text: JSON.stringify(summary) }]
    };
  }
);

Session Management

MCP sessions maintain state across tool calls. When a client connects, the server assigns a session ID (typically returned in an Mcp-Session-Id header on the initialize response). Subsequent requests include this session ID, allowing the server to associate state (authenticated user, resolved project, cached data) with the session. Sessions have a TTL and are cleaned up after inactivity.
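The TTL behavior can be sketched as a registry that refreshes a session's last-seen time on every lookup and drops it once it has been idle for too long. SessionRegistry and its fields are hypothetical names, and the in-memory Map is for illustration only; the clock is injected so that expiry is easy to test.

```typescript
import { randomUUID } from "node:crypto";

interface Session {
  id: string;
  userId: string;
  projectId?: string;
  lastSeen: number; // epoch milliseconds of the last request
}

class SessionRegistry {
  private sessions = new Map<string, Session>();

  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  // Called on initialize; the id would be returned in Mcp-Session-Id.
  create(userId: string): Session {
    const session: Session = { id: randomUUID(), userId, lastSeen: this.now() };
    this.sessions.set(session.id, session);
    return session;
  }

  // Called on every later request; refreshes the TTL or expires the session.
  get(id: string): Session | undefined {
    const session = this.sessions.get(id);
    if (!session) return undefined;
    if (this.now() - session.lastSeen > this.ttlMs) {
      this.sessions.delete(id);
      return undefined;
    }
    session.lastSeen = this.now();
    return session;
  }
}
```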

Session state is particularly important for multi-step agent workflows. An agent might call resolve_project to set the current project, then call get_api_spec, get_db_schema, and get_handoff_requests — all of which need to know which project to query. Rather than passing project_id in every call, the session remembers the resolved project. AppHandoff's MCP server uses AsyncLocalStorage in Node.js to thread session context through the entire request handling chain without explicit parameter passing.
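The AsyncLocalStorage pattern looks roughly like the sketch below. This is not AppHandoff's code: SessionContext, handleToolCall, and the body of getApiSpec are hypothetical, and only AsyncLocalStorage itself (from node:async_hooks, with its run and getStore methods) is a real Node.js API.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface SessionContext {
  sessionId: string;
  userId: string;
  projectId?: string;
}

const sessionContext = new AsyncLocalStorage<SessionContext>();

// Deep in the call chain: no session or project parameter is passed in.
async function getApiSpec(): Promise<string> {
  await Promise.resolve(); // stands in for real I/O; context survives awaits
  const ctx = sessionContext.getStore();
  if (!ctx?.projectId) throw new Error("No project resolved for this session");
  return `spec for ${ctx.projectId}`;
}

// At the edge: look up the session, then run the handler inside its context.
function handleToolCall<T>(ctx: SessionContext, handler: () => Promise<T>): Promise<T> {
  return sessionContext.run(ctx, handler);
}
```

Because each run call gets its own store, concurrent tool calls from different sessions do not see each other's project.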

Authentication Patterns

MCP servers support multiple authentication patterns depending on the deployment model. API key authentication is simplest: the client includes a bearer token in the Authorization header, and the server validates it against a database of issued keys. This works well for headless CI agents and personal developer setups.
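A minimal sketch of the API key check follows. The key table, the demo key, and the choice to store SHA-256 hashes rather than raw keys are assumptions made for illustration; the source does not describe how issued keys are stored.

```typescript
import { createHash } from "node:crypto";

// Extract the token from an Authorization header value, if well formed.
function parseBearerToken(header: string | undefined): string | null {
  if (!header) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

const sha256 = (value: string): string =>
  createHash("sha256").update(value).digest("hex");

// Hypothetical key table standing in for a database:
// hash of issued key -> owning user ID.
const issuedKeys = new Map<string, string>([[sha256("demo-key-123"), "user-1"]]);

// Returns the user ID for a valid key, or null if the request is unauthenticated.
function authenticate(authorization: string | undefined): string | null {
  const token = parseBearerToken(authorization);
  if (!token) return null;
  return issuedKeys.get(sha256(token)) ?? null;
}
```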

OAuth authentication is used for interactive editor integrations where users need to authorize access without sharing API keys. The MCP server acts as an OAuth resource server, the editor acts as the OAuth client, and a separate authorization server handles the consent flow. AppHandoff implements a full OAuth consent screen at /oauth/consent that lets users authorize editor connections with scoped permissions.

For multi-tenant servers, authentication also determines authorization — which projects, tools, and data the caller can access. A single MCP server might serve hundreds of teams, each with their own projects and tickets. The auth layer maps the token to a user, the user to their authorized projects, and scopes all tool responses accordingly.
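The token-to-user-to-projects mapping can be sketched as two lookups followed by a filter. The tables, tokens, and the Ticket shape below are hypothetical stand-ins for database queries.

```typescript
interface Ticket {
  id: string;
  projectId: string;
  title: string;
}

// Hypothetical lookup tables standing in for database queries.
const tokenToUser = new Map<string, string>([
  ["token-a", "alice"],
  ["token-b", "bob"],
]);
const userToProjects = new Map<string, Set<string>>([
  ["alice", new Set(["proj-1", "proj-2"])],
  ["bob", new Set(["proj-3"])],
]);

// Token -> user -> authorized project IDs.
function authorizedProjects(token: string): Set<string> {
  const user = tokenToUser.get(token);
  if (!user) throw new Error("Unknown token");
  return userToProjects.get(user) ?? new Set<string>();
}

// Scope a tool response so callers only see their own projects' data.
function scopeTickets(token: string, tickets: Ticket[]): Ticket[] {
  const allowed = authorizedProjects(token);
  return tickets.filter((ticket) => allowed.has(ticket.projectId));
}
```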

Scaling Considerations

MCP servers face unique scaling challenges compared to typical APIs. Agent sessions are long-lived (minutes to hours), tool calls can be computationally expensive (large context aggregations), and SSE connections consume server resources for the duration of the session. A server handling 1,000 concurrent agent sessions with SSE streams needs to manage 1,000 persistent HTTP connections — fundamentally different from a REST API handling 1,000 short-lived requests per second.

AppHandoff's production MCP server addresses this with several strategies: stateless session rehydration (session data stored in the database, not in server memory, so any instance can handle any request), connection draining during deploys (active sessions are migrated gracefully), and progressive tool responses (large results like full API specs are streamed rather than buffered). The server runs on Fly.io with a minimum of two always-on machines, which allows zero-downtime deploys.
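The idea behind stateless rehydration is that an instance keeps no session state of its own and loads it from shared storage on every request. The sketch below is an assumption-laden illustration, not AppHandoff's implementation: SessionStore, InMemoryStore (a stand-in for a database), and ServerInstance are all hypothetical.

```typescript
interface SessionRecord {
  id: string;
  userId: string;
  projectId?: string;
}

// The contract a shared database would fulfil.
interface SessionStore {
  load(id: string): Promise<SessionRecord | undefined>;
  save(record: SessionRecord): Promise<void>;
}

// In-memory stand-in for the database, for illustration only.
class InMemoryStore implements SessionStore {
  private rows = new Map<string, SessionRecord>();
  async load(id: string): Promise<SessionRecord | undefined> {
    return this.rows.get(id);
  }
  async save(record: SessionRecord): Promise<void> {
    this.rows.set(record.id, { ...record });
  }
}

// An instance holds no session state; it rehydrates from the store per request.
class ServerInstance {
  constructor(readonly name: string, private store: SessionStore) {}

  async resolveProject(sessionId: string, projectId: string): Promise<void> {
    const session = await this.store.load(sessionId);
    if (!session) throw new Error("Unknown session");
    await this.store.save({ ...session, projectId });
  }

  async currentProject(sessionId: string): Promise<string | undefined> {
    return (await this.store.load(sessionId))?.projectId;
  }
}
```

With this shape, a load balancer can route consecutive requests from one session to different instances, at the cost of a storage round trip per request.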

Debugging MCP Connections

When MCP tool calls fail, the issue is almost always in one of four places: transport (the client cannot reach the server), authentication (the token is invalid or expired), tool dispatch (the server received the call but the handler errored), or serialization (the response is too large or contains invalid JSON). MCP servers should log the full JSON-RPC request and response at debug level, and return structured error objects with error codes and messages that agents can interpret.
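Structured errors come in two flavors in MCP: protocol-level failures use the JSON-RPC error object, while failures inside a tool handler are usually returned as a normal result flagged with isError so the agent can read the message. The numeric codes below are the standard JSON-RPC 2.0 codes; the protocolError and toolError helper names are hypothetical.

```typescript
// Standard JSON-RPC 2.0 error codes.
const JsonRpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

type RequestId = string | number;

// Protocol-level failure: unknown method, malformed params, and so on.
function protocolError(id: RequestId | null, code: number, message: string, data?: unknown) {
  return {
    jsonrpc: "2.0" as const,
    id,
    error: { code, message, ...(data !== undefined ? { data } : {}) },
  };
}

// Tool-level failure: the call was dispatched but the handler failed.
function toolError(id: RequestId, message: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    result: { content: [{ type: "text" as const, text: message }], isError: true },
  };
}
```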

For hands-on debugging, AppHandoff provides an MCP Inspector tool that lets you connect to any MCP server, browse available tools, and execute test calls with live request/response inspection. To build and run your own MCP server, start with the @modelcontextprotocol/sdk documentation. For a practical example of MCP in action, the full MCP server reference at /mcp-server documents every tool AppHandoff exposes.