The Model Context Protocol has become the standard way AI agents access external tools and data. But most guides focus on configuring MCP servers, not understanding how they work. This post is the deep dive: the protocol layer, the transport mechanism, the server lifecycle, and the architectural decisions that make MCP servers reliable at scale. If you are building an MCP server, extending one, or debugging why your agent's tool calls are failing, this is the reference.
The Protocol Layer: JSON-RPC 2.0
MCP is built on JSON-RPC 2.0, a lightweight remote procedure call protocol. Every interaction between client (the AI agent or editor) and server follows the same pattern: the client sends a JSON object with a method name and parameters, the server processes it and returns a JSON object with the result or an error. The protocol is stateless at the message level — each request-response pair is independent — though MCP adds session state on top.
Three message types exist. Requests have an id, method, and params — these expect a response. Responses have the same id and either a result or error. Notifications have a method and params but no id — these are fire-and-forget messages that the receiver should not reply to. MCP uses all three: tool calls are requests, tool results are responses, and progress updates are notifications.
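The three message shapes can be told apart purely by which fields are present. The following is a minimal sketch of that rule in TypeScript; the type names and the classifyMessage helper are illustrative assumptions, not part of any SDK.

```typescript
// The three JSON-RPC 2.0 message shapes described above.
type JsonRpcId = string | number;

interface JsonRpcRequest { jsonrpc: "2.0"; id: JsonRpcId; method: string; params?: unknown }
interface JsonRpcNotification { jsonrpc: "2.0"; method: string; params?: unknown }
interface JsonRpcResponse {
  jsonrpc: "2.0";
  id: JsonRpcId;
  result?: unknown;
  error?: { code: number; message: string; data?: unknown };
}
type JsonRpcMessage = JsonRpcRequest | JsonRpcNotification | JsonRpcResponse;

// Hypothetical helper: classify a message by the fields it carries.
function classifyMessage(msg: JsonRpcMessage): "request" | "notification" | "response" {
  if ("method" in msg) {
    // A method with an id expects a reply; a method without one does not.
    return "id" in msg ? "request" : "notification";
  }
  return "response";
}
```

A receiver can use a check like this to decide whether it owes the sender a reply.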
// JSON-RPC 2.0 request (tool call)
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "get_api_spec",
    "arguments": { "project_id": "abc-123" }
  }
}
// JSON-RPC 2.0 response (tool result)
{
  "jsonrpc": "2.0",
  "id": 1,
  "result": {
    "content": [
      { "type": "text", "text": "{\"endpoints\": [...]}" }
    ]
  }
}
Transport: Streamable HTTP and SSE
MCP defines two standard transports. With stdio, the editor launches the MCP server as a child process and communicates over stdin/stdout. This works for local servers but not for remote hosted services. For remote servers the current standard is Streamable HTTP, which replaced the earlier HTTP+SSE transport: the client sends JSON-RPC requests as HTTP POST bodies to the server's MCP endpoint, and the server responds with either a direct JSON response or an SSE (Server-Sent Events) stream for long-running operations.
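As a rough sketch of the client side of this exchange, the helpers below build the POST and decide how to read the reply. The function names are hypothetical; the header values follow my reading of the Streamable HTTP transport, where the client advertises both response styles in its Accept header.

```typescript
// Hypothetical helpers sketching the client side of Streamable HTTP.
interface PostInit {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildMcpPost(message: object, sessionId?: string): PostInit {
  const headers: Record<string, string> = {
    "Content-Type": "application/json",
    // The client states that it can accept either response style.
    Accept: "application/json, text/event-stream",
  };
  if (sessionId !== undefined) headers["Mcp-Session-Id"] = sessionId;
  return { method: "POST", headers, body: JSON.stringify(message) };
}

// Decide how to read the reply based on its Content-Type header.
function responseKind(contentType: string | null): "json" | "sse" | "unknown" {
  const [first = ""] = (contentType ?? "").split(";");
  const type = first.trim().toLowerCase();
  if (type === "application/json") return "json";
  if (type === "text/event-stream") return "sse";
  return "unknown";
}
```

The returned object has the shape that fetch accepts as its second argument, so a client could pass it along with the server's MCP endpoint URL.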
SSE is critical for operations that take more than a few seconds. When an agent triggers a full project scan, the server opens an SSE stream and sends progress notifications ('scanning file 42 of 300...') followed by the final result. The client displays progress to the user and can cancel the operation by sending a notifications/cancelled message. Closing the stream alone should not be treated as cancellation, because a dropped connection may simply be a network failure. Without SSE, long-running tools would appear to hang with no feedback.
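To make the stream format concrete, here is a small sketch that extracts the JSON-RPC messages from a buffered SSE body and picks out progress notifications. parseSseData and isProgress are hypothetical helpers, and a production client would parse incrementally rather than buffer the whole body.

```typescript
interface ProgressNotification {
  jsonrpc: "2.0";
  method: "notifications/progress";
  params: { progressToken: string | number; progress: number; total?: number; message?: string };
}

// Hypothetical helper: extract the JSON payload of each SSE event from a
// fully buffered stream body. Events are separated by blank lines.
function parseSseData(body: string): unknown[] {
  const messages: unknown[] = [];
  for (const event of body.split(/\r?\n\r?\n/)) {
    const data = event
      .split(/\r?\n/)
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, "")) // drop one leading space
      .join("\n");
    if (data.length > 0) messages.push(JSON.parse(data));
  }
  return messages;
}

// Type guard for progress notifications among the parsed messages.
function isProgress(msg: unknown): msg is ProgressNotification {
  return (
    typeof msg === "object" &&
    msg !== null &&
    (msg as { method?: unknown }).method === "notifications/progress"
  );
}
```

A client would typically route messages that pass isProgress to its progress UI and treat the message carrying the original request id as the final result.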
The HTTP transport also enables standard infrastructure patterns. MCP servers can sit behind load balancers, CDNs, and API gateways. They can be deployed to any platform that supports HTTP servers — containers, serverless functions (with caveats for SSE), traditional VMs. The protocol is transport-agnostic by design; adding a new transport (WebSocket, gRPC) requires only implementing the message framing layer.
Server Lifecycle
An MCP server goes through a defined lifecycle. On startup, the server initializes its tool registry, connects to backing services (databases, APIs), and begins listening for connections. When a client connects, the two sides perform capability negotiation (the initialize handshake), where each declares which protocol features it supports: whether the server offers tools, resources, and prompts, whether it supports argument auto-completion, and whether the client supports features such as sampling. The tool catalog itself is fetched separately once the handshake completes.
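The handshake can be pictured as three messages. The sketch below shows their approximate shape; the protocol version string, the client and server names, and the serverSupports helper are illustrative assumptions, and the capability keys reflect my understanding of the MCP specification.

```typescript
// Shapes of the initialize exchange. The version string and all names are
// illustrative assumptions.
const initializeRequest = {
  jsonrpc: "2.0",
  id: 0,
  method: "initialize",
  params: {
    protocolVersion: "2025-06-18",
    capabilities: { sampling: {} },
    clientInfo: { name: "example-editor", version: "1.0.0" },
  },
} as const;

const initializeResult = {
  jsonrpc: "2.0",
  id: 0,
  result: {
    protocolVersion: "2025-06-18",
    capabilities: { tools: { listChanged: true }, completions: {} },
    serverInfo: { name: "example-server", version: "1.0.0" },
  },
} as const;

// After a successful response the client confirms with a notification.
const initializedNotification = {
  jsonrpc: "2.0",
  method: "notifications/initialized",
} as const;

// Hypothetical helper: a client should only use features the server declared.
function serverSupports(
  result: { capabilities: Record<string, unknown> },
  feature: string,
): boolean {
  return feature in result.capabilities;
}
```

With these values, serverSupports(initializeResult.result, "tools") is true while a check for "prompts" is false, so this client would skip prompt-related requests.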
After initialization, the server enters the active phase where it processes tool calls. Each tool call is dispatched to the registered handler, which executes the operation and returns a result. The server manages concurrency — multiple tool calls can be in flight simultaneously from the same client — and enforces timeouts to prevent hung operations from blocking the session.
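One common way to get both properties is to give every call its own promise and race it against a timer. This is a sketch under that assumption; the handlers map, dispatch, and withTimeout are hypothetical names, and the 30-second default is arbitrary.

```typescript
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

// Reject if the work does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`tool call timed out after ${ms} ms`)), ms);
  });
  // Clear the timer either way so it cannot keep the process alive.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}

// Hypothetical tool registry keyed by tool name.
const handlers = new Map<string, ToolHandler>();

// Each call gets its own promise, so several can be in flight at once.
async function dispatch(
  name: string,
  args: Record<string, unknown>,
  timeoutMs = 30_000,
): Promise<unknown> {
  const handler = handlers.get(name);
  if (handler === undefined) throw new Error(`unknown tool: ${name}`);
  return withTimeout(handler(args), timeoutMs);
}
```

Note that the timeout only stops the caller from waiting. The handler itself keeps running unless it also observes a cancellation signal.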
Shutdown involves draining active connections. The server stops accepting new tool calls, waits for in-flight operations to complete (with a deadline), closes its connections to clients, and then exits. MCP defines no dedicated shutdown message, so clients learn of the shutdown from the transport closing. Graceful shutdown is essential for remote servers that may be redeployed during active agent sessions.
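The drain step can be sketched as a counter of in-flight calls plus a deadline. DrainTracker below is a hypothetical class, not something the SDK provides. A real server would call drain from its termination signal handler and close its listeners afterward.

```typescript
// Hypothetical in-flight tracker for draining before exit.
class DrainTracker {
  private inFlight = 0;
  private accepting = true;
  private waiters: Array<() => void> = [];

  // Wrap a tool call; refuses new work once shutdown has begun.
  async track<T>(work: () => Promise<T>): Promise<T> {
    if (!this.accepting) throw new Error("server is shutting down");
    this.inFlight++;
    try {
      return await work();
    } finally {
      this.inFlight--;
      if (this.inFlight === 0) this.waiters.splice(0).forEach((wake) => wake());
    }
  }

  // Resolves true if all work finished, false if the deadline passed first.
  async drain(deadlineMs: number): Promise<boolean> {
    this.accepting = false;
    if (this.inFlight === 0) return true;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const idle = new Promise<boolean>((resolve) => {
      this.waiters.push(() => resolve(true));
    });
    const expired = new Promise<boolean>((resolve) => {
      timer = setTimeout(() => resolve(false), deadlineMs);
    });
    const drained = await Promise.race([idle, expired]);
    clearTimeout(timer);
    return drained;
  }
}
```

If drain resolves false, the deadline expired with work still running, and the server has to decide whether to abandon those calls or extend the deadline.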
Tool Registration and Discovery
Tools are the core abstraction in MCP. Each tool has a name, a description (used by agents to decide when to call it), and a typed input schema defined in JSON Schema (typically authored with Zod in TypeScript servers). The schema is not just documentation: clients can validate arguments against it before sending the request, and servers should validate again on receipt rather than trust the caller.
Tool discovery happens right after the initialize handshake. The client calls tools/list and receives the catalog of available tools with their schemas. Some servers support dynamic tools that change based on session state: for example, after an agent resolves a project, additional project-specific tools may become available. A server that declared the listChanged capability sends a notifications/tools/list_changed notification when this happens, and the client can re-fetch the tool list at any time to discover newly available tools.
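On the client side, discovery amounts to keeping a cache of tool descriptors and refreshing it when the server signals a change. The sketch below assumes a hypothetical ToolCatalog class. Its listTools callback stands in for sending a tools/list request and reading result.tools, and is synchronous here only to keep the example short.

```typescript
interface ToolDescriptor {
  name: string;
  description?: string;
  inputSchema: { type: "object"; properties?: Record<string, unknown>; required?: string[] };
}

// Hypothetical client-side cache of the server's tool catalog.
class ToolCatalog {
  private tools = new Map<string, ToolDescriptor>();
  private listTools: () => ToolDescriptor[];

  constructor(listTools: () => ToolDescriptor[]) {
    this.listTools = listTools;
  }

  // Replace the cache with the server's current catalog.
  refresh(): void {
    this.tools = new Map(
      this.listTools().map((t): [string, ToolDescriptor] => [t.name, t]),
    );
  }

  // Re-fetch when the server announces that its tool list changed.
  onNotification(method: string): void {
    if (method === "notifications/tools/list_changed") this.refresh();
  }

  has(name: string): boolean {
    return this.tools.has(name);
  }
}
```

An agent that consults has before calling a tool will then pick up project-specific tools as soon as the change notification arrives.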
// Tool registration in @modelcontextprotocol/sdk
import { McpServer } from '@modelcontextprotocol/sdk/server/mcp.js';
import { z } from 'zod';
// buildProjectSummary is the application's own helper; its import is not shown in this post.
const server = new McpServer(
  { name: 'apphandoff', version: '2.0.0' },
  // Capabilities belong in the second (options) argument, not in the server info.
  { capabilities: { tools: {}, prompts: {}, resources: {} } }
);
server.tool(
  'get_project_summary',
  'Get full project context including API spec, DB schema, and mismatches',
  {
    project_id: z.string().uuid().describe('Project ID'),
    include_schema: z.boolean().optional().describe('Include DB schema')
  },
  async ({ project_id, include_schema }, extra) => {
    // Handler implementation: access backing services,
    // build response, return structured content
    const summary = await buildProjectSummary(project_id, { include_schema });
    return {
      content: [{ type: 'text', text: JSON.stringify(summary) }]
    };
  }
);
Session Management
MCP sessions maintain state across tool calls. When a client initializes, the server can assign a session ID (over Streamable HTTP, returned in an Mcp-Session-Id header on the initialize response). Subsequent requests include this session ID, allowing the server to associate state (authenticated user, resolved project, cached data) with the session. Sessions have a TTL and are cleaned up after inactivity.
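A minimal sketch of TTL-based session bookkeeping follows. SessionStore is a hypothetical in-memory class with an injectable clock so that expiry can be tested. The session ID is passed in here, whereas a real server would generate a cryptographically random one.

```typescript
interface Session {
  id: string;
  data: Record<string, unknown>;
  expiresAt: number;
}

// Hypothetical in-memory session store with sliding expiry.
class SessionStore {
  private sessions = new Map<string, Session>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  create(id: string): Session {
    const session: Session = { id, data: {}, expiresAt: this.now() + this.ttlMs };
    this.sessions.set(id, session);
    return session;
  }

  // Look up a session and extend its TTL; expired sessions count as missing.
  touch(id: string): Session | undefined {
    const session = this.sessions.get(id);
    if (session === undefined) return undefined;
    if (session.expiresAt <= this.now()) {
      this.sessions.delete(id);
      return undefined;
    }
    session.expiresAt = this.now() + this.ttlMs;
    return session;
  }

  // Remove every expired session; returns how many were dropped.
  sweep(): number {
    let removed = 0;
    for (const [id, session] of this.sessions) {
      if (session.expiresAt <= this.now()) {
        this.sessions.delete(id);
        removed++;
      }
    }
    return removed;
  }
}
```

Calling touch on every request gives sliding expiry, and running sweep on a timer bounds memory use for sessions that never return.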
Session state is particularly important for multi-step agent workflows. An agent might call resolve_project to set the current project, then call get_api_spec, get_db_schema, and get_handoff_requests — all of which need to know which project to query. Rather than passing project_id in every call, the session remembers the resolved project. AppHandoff's MCP server uses AsyncLocalStorage in Node.js to thread session context through the entire request handling chain without explicit parameter passing.
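The general pattern looks roughly like the sketch below, which uses Node's AsyncLocalStorage from node:async_hooks. The SessionContext shape and the handler names are assumptions made for illustration and are not AppHandoff's actual code.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";

interface SessionContext {
  sessionId: string;
  userId: string;
  projectId?: string;
}

const sessionContext = new AsyncLocalStorage<SessionContext>();

// Read the context from anywhere in the call chain, no parameter needed.
function currentProjectId(): string {
  const projectId = sessionContext.getStore()?.projectId;
  if (projectId === undefined) throw new Error("no project resolved for this session");
  return projectId;
}

// Hypothetical handler: note that it takes no project_id argument.
async function getApiSpec(): Promise<string> {
  await Promise.resolve(); // the context survives async boundaries
  return `spec for ${currentProjectId()}`;
}

// The request entry point wraps all handling in run().
function handleRequest<T>(ctx: SessionContext, handler: () => Promise<T>): Promise<T> {
  return sessionContext.run(ctx, handler);
}
```

The trade-off is that the dependency on session state becomes implicit, so handlers called outside handleRequest fail at runtime rather than at compile time.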
Authentication Patterns
MCP servers support multiple authentication patterns depending on the deployment model. API key authentication is simplest: the client includes a bearer token in the Authorization header, and the server validates it against a database of issued keys. This works well for headless CI agents and personal developer setups.
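The API key check can be as small as the sketch below. Both functions are hypothetical, and the issuedKeys map stands in for the database of issued keys. A real server would store only a hash of each key and compare in constant time.

```typescript
// Extract the token from an Authorization header, or null if malformed.
function parseBearer(header: string | undefined): string | null {
  if (header === undefined) return null;
  const match = /^Bearer\s+(\S+)$/i.exec(header.trim());
  return match?.[1] ?? null;
}

// issuedKeys is a hypothetical stand-in for the key database, mapping each
// key to a user ID.
function authenticate(
  header: string | undefined,
  issuedKeys: Map<string, string>,
): string | null {
  const token = parseBearer(header);
  if (token === null) return null;
  return issuedKeys.get(token) ?? null;
}
```

A null result would normally translate to an HTTP 401 response before any JSON-RPC processing happens.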
OAuth authentication is used for interactive editor integrations where users need to authorize access without sharing API keys. The MCP server acts as an OAuth resource server, the editor acts as the OAuth client, and a separate authorization server handles the consent flow. AppHandoff implements a full OAuth consent screen at /oauth/consent that lets users authorize editor connections with scoped permissions.
For multi-tenant servers, authentication also determines authorization — which projects, tools, and data the caller can access. A single MCP server might serve hundreds of teams, each with their own projects and tickets. The auth layer maps the token to a user, the user to their authorized projects, and scopes all tool responses accordingly.
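The mapping from user to authorized projects might look like the following sketch. The lookup tables, IDs, and function names are all hypothetical stand-ins for database queries.

```typescript
interface Project {
  id: string;
  teamId: string;
  name: string;
}

// Hypothetical lookup tables standing in for database queries.
const userTeams = new Map<string, string[]>([
  ["user-1", ["team-a"]],
  ["user-2", ["team-b"]],
]);
const projects: Project[] = [
  { id: "p1", teamId: "team-a", name: "Storefront" },
  { id: "p2", teamId: "team-b", name: "Billing" },
];

// Every tool response should be built from this filtered view.
function authorizedProjects(userId: string): Project[] {
  const teams = new Set(userTeams.get(userId) ?? []);
  return projects.filter((p) => teams.has(p.teamId));
}

function requireProject(userId: string, projectId: string): Project {
  const project = authorizedProjects(userId).find((p) => p.id === projectId);
  // Same error for "missing" and "forbidden" so callers cannot probe for IDs.
  if (project === undefined) throw new Error("project not found");
  return project;
}
```

Routing every handler through a check like requireProject keeps tenant isolation in one place instead of scattering it across tools.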
Scaling Considerations
MCP servers face unique scaling challenges compared to typical APIs. Agent sessions are long-lived (minutes to hours), tool calls can be computationally expensive (scanning entire codebases), and SSE connections consume server resources for the duration of the session. A server handling 1,000 concurrent agent sessions with SSE streams needs to manage 1,000 persistent HTTP connections — fundamentally different from a REST API handling 1,000 short-lived requests per second.
AppHandoff's production MCP server addresses this with several strategies: stateless session rehydration (session data stored in the database, not in server memory, so any instance can handle any request), connection draining during deploys (active sessions are migrated gracefully), and progressive tool responses (large results like full API specs are streamed rather than buffered). The server runs on Fly.io with a minimum of two always-on machines, ensuring zero-downtime deploys.
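Stateless session rehydration can be illustrated with two server instances sharing one backend. This is only a sketch of the idea, not AppHandoff's implementation: the SessionBackend interface and both classes are hypothetical, and the Map-based backend stands in for a shared database.

```typescript
interface SessionRecord {
  sessionId: string;
  userId: string;
  projectId?: string;
}

interface SessionBackend {
  load(sessionId: string): Promise<SessionRecord | undefined>;
  save(record: SessionRecord): Promise<void>;
}

// Map-based stand-in for a shared database table (illustration only).
class InMemoryBackend implements SessionBackend {
  private rows = new Map<string, string>();

  async load(sessionId: string): Promise<SessionRecord | undefined> {
    const row = this.rows.get(sessionId);
    return row === undefined ? undefined : (JSON.parse(row) as SessionRecord);
  }

  async save(record: SessionRecord): Promise<void> {
    this.rows.set(record.sessionId, JSON.stringify(record));
  }
}

// A server instance keeps no session state in memory between requests.
class ServerInstance {
  private backend: SessionBackend;

  constructor(backend: SessionBackend) {
    this.backend = backend;
  }

  async resolveProject(sessionId: string, projectId: string): Promise<void> {
    const record = await this.backend.load(sessionId);
    if (record === undefined) throw new Error("unknown session");
    await this.backend.save({ ...record, projectId });
  }

  async currentProject(sessionId: string): Promise<string | undefined> {
    return (await this.backend.load(sessionId))?.projectId;
  }
}
```

Because each request reloads the record, a load balancer can send consecutive calls from one session to different instances, at the cost of one extra read per request.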
Debugging MCP Connections
When MCP tool calls fail, the issue is almost always in one of four places: transport (the client cannot reach the server), authentication (the token is invalid or expired), tool dispatch (the server received the call but the handler errored), or serialization (the response is too large or contains invalid JSON). MCP servers should log the full JSON-RPC request and response at debug level, and return structured error objects with error codes and messages that agents can interpret.
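Structured errors come in two flavors, sketched below with hypothetical helper names. Protocol-level failures use the standard JSON-RPC 2.0 error codes, while a failure inside a tool handler is, as I understand the MCP specification, reported as a normal result with isError set so that the agent can read the message.

```typescript
// Standard JSON-RPC 2.0 error codes.
const JsonRpcErrorCode = {
  ParseError: -32700,
  InvalidRequest: -32600,
  MethodNotFound: -32601,
  InvalidParams: -32602,
  InternalError: -32603,
} as const;

type ErrorCode = (typeof JsonRpcErrorCode)[keyof typeof JsonRpcErrorCode];

// Protocol-level failure: the request itself could not be handled.
function protocolError(
  id: string | number | null,
  code: ErrorCode,
  message: string,
  data?: unknown,
) {
  return {
    jsonrpc: "2.0" as const,
    id,
    error: { code, message, ...(data === undefined ? {} : { data }) },
  };
}

// Tool-level failure: the call was dispatched but the handler failed.
function toolError(id: string | number, message: string) {
  return {
    jsonrpc: "2.0" as const,
    id,
    result: { content: [{ type: "text" as const, text: message }], isError: true },
  };
}
```

Keeping the two paths separate helps with the triage above: an error object points at transport, dispatch, or serialization problems, while isError points at the handler.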
For hands-on debugging, AppHandoff provides an MCP Inspector tool that lets you connect to any MCP server, browse available tools, and execute test calls with live request/response inspection. To build and run your own MCP server, start with the @modelcontextprotocol/sdk documentation. For a practical example of MCP in action, the setup guide at /blog/how-to-set-up-mcp-with-lovable walks through connecting an agent to a production MCP server, and the full MCP server reference at /mcp-server documents every tool AppHandoff exposes.