What is the gateway pattern for MCP tools?

Instead of registering every operation as its own MCP tool, you expose a small intent surface — often a single tool like ask_apphandoff. The agent sends natural-language intent, and a router model resolves it to the right underlying call(s), invoking them behind the gateway.

Why does exposing every MCP tool degrade at scale?

Three reasons: context cost scales with the catalogue because every tool's schema loads on every request; routing ambiguity grows as near-synonym tools (get_ticket vs get_ticket_thread) multiply; and the security surface is the whole catalogue at once, since every mutating tool is one confused inference away from firing.

What are the costs of a gateway?

A gateway adds an extra LLM hop (the router is its own inference, with its own latency, token cost, and failure modes), adds latency because each request makes two hops instead of one, requires a confirm flow to stash and apply pending mutations, and moves debugging from reading one call to tracing intent through routing to invocation.

How does the confirm flow protect mutations?

Read-only calls run directly, but any state change funnels through one gate: the router paraphrases what it's about to do, stashes the pending action, and confirms before committing. That policy is enforced once at the gateway rather than bolted onto every dangerous tool.

Why a Gateway Beats Exposing Every MCP Tool

When you wire an agent up to a system over MCP, the obvious move is to expose everything. You have 50 operations the agent might want — list projects, read tickets, open tickets, publish contracts, close handoffs — so you register 50 tools. Each one gets a name, a description, and a JSON schema. The agent reads the catalogue and picks what it needs.

This works beautifully at five tools. It starts to wobble at twenty. By fifty it has quietly become one of the biggest reliability and cost problems in your stack — and the failure mode is subtle enough that most teams don't trace it back to the tool surface.

This piece argues for a different default: put a gateway in front of your tools. Expose one (or a few) intent-shaped entry points, and let a router resolve natural-language intent to the right underlying calls. It's not free, and it's not always right. But for large, mutation-heavy tool catalogues, it's the better starting point.

The instinct to expose everything is reasonable — at first

Direct tool exposure has real virtues. The schema is the contract: the agent sees exactly what arguments each tool takes and what it returns. There's no extra inference layer to debug, no second model deciding what you meant. Latency is one hop. When something breaks, you can read the tool call and the response and know what happened.

For a small, stable, mostly-read-only surface, this is the correct design. Don't add a gateway to wrap three tools. You'd be paying for machinery you don't need.

The problem is that tool surfaces rarely stay small.

Where direct exposure degrades

Context cost scales with the catalogue, not with the task. Every registered tool's name, description, and schema gets loaded into the model's context — on every request — whether or not the agent uses it. A rich tool with nested arguments and enum constraints can run hundreds of tokens. Fifty of those is a standing tax on every turn, eating context budget that should belong to the actual work. The agent pays to know about tools it will never call this turn.
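
To make the standing tax concrete, here is the arithmetic as a tiny Python helper. The per-tool figures are assumptions chosen for illustration (about 300 tokens for a rich schema, about 150 for a single intent-shaped tool), not measurements; substitute counts from your own tokenizer.

```python
# Back-of-envelope estimate of tool-definition tokens loaded into context.
# ASSUMPTION: the per-tool token counts below are illustrative, not measured.

def per_request_tax(tool_tokens: list[int]) -> int:
    """Tokens spent on tool definitions in every request, used or not."""
    return sum(tool_tokens)


def session_tax(tool_tokens: list[int], requests: int) -> int:
    """Definition tokens paid across a session of `requests` model calls."""
    return per_request_tax(tool_tokens) * requests


flat_catalogue = [300] * 50  # 50 directly exposed tools, ~300 tokens each (assumed)
gateway = [150]              # one intent-shaped tool, ~150 tokens (assumed)

print(per_request_tax(flat_catalogue))  # 15000
print(session_tax(flat_catalogue, 20))  # 300000
print(session_tax(gateway, 20))         # 3000
```

At these assumed sizes the flat catalogue spends 100 times more definition tokens per request than the single gateway tool, before the agent has done any real work.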

Routing ambiguity grows faster than the catalogue. With fifty tools, you inevitably have clusters that look alike: get_ticket, get_ticket_thread, get_ticket_activity, get_ticket_audit_log. The agent now has to disambiguate between near-synonyms based on terse descriptions. It picks the plausible-but-wrong one, gets a response that sort of looks right, and proceeds on bad data. More tools means more of these collisions, and they're hard to catch because nothing errors — the agent just quietly does the wrong thing.
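
The most obvious collisions can be surfaced mechanically. The sketch below groups tool names by a shared verb-noun prefix. It is a crude heuristic (it will not catch synonyms with different names, such as a hypothetical fetch_ticket next to get_ticket), and the function name and grouping rule are illustrative rather than part of any MCP SDK.

```python
from collections import defaultdict


def confusable_families(tool_names: list[str], prefix_parts: int = 2) -> dict[str, list[str]]:
    """Group tool names that share their first `prefix_parts` underscore-separated
    tokens (roughly verb + noun) and return only groups with two or more members."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name in tool_names:
        prefix = "_".join(name.split("_")[:prefix_parts])
        groups[prefix].append(name)
    return {prefix: sorted(names) for prefix, names in groups.items() if len(names) > 1}


catalogue = [
    "get_ticket", "get_ticket_thread", "get_ticket_activity",
    "get_ticket_audit_log", "list_projects", "publish_contract",
]
print(confusable_families(catalogue))
# {'get_ticket': ['get_ticket', 'get_ticket_activity', 'get_ticket_audit_log', 'get_ticket_thread']}
```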

Security surface is the whole catalogue, all the time. If every mutating tool is directly callable, every one of them is one confused inference away from firing. A delete_handoff_request sitting next to get_handoff_requests in the catalogue is a live wire. You can add per-tool guards, but now you're enforcing safety in fifty places instead of one.

The gateway alternative

A gateway inverts the arrangement. Instead of exposing every operation, you expose a small intent surface — in AppHandoff's MCP server, that's a single ask_apphandoff tool. The agent sends natural-language intent ("close the ticket about the broken login redirect"). A router model interprets it, selects the right underlying tool(s), invokes them, paraphrases what it's about to do, and — for anything that mutates state — confirms before committing.
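
Here is the shape of that flow as a minimal Python sketch. Everything concrete in it is hypothetical: route is a keyword stub standing in for the router model, and get_ticket, close_ticket, and the in-memory TICKETS store are placeholders, not AppHandoff's actual tools. What it does show is the structure: one entry point, a routing step, direct execution for reads, and a paraphrased, stashed pending action for writes.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Tool:
    fn: Callable[..., Any]
    mutates: bool  # True means the call changes state and must be confirmed


# Hypothetical in-memory backend standing in for real operations.
TICKETS = {"T-17": {"title": "Broken login redirect", "status": "open"}}


def get_ticket(ticket_id: str) -> dict:
    return dict(TICKETS[ticket_id])


def close_ticket(ticket_id: str) -> dict:
    TICKETS[ticket_id]["status"] = "closed"
    return dict(TICKETS[ticket_id])


CATALOGUE = {
    "get_ticket": Tool(get_ticket, mutates=False),
    "close_ticket": Tool(close_ticket, mutates=True),
}
PENDING: dict[str, tuple[str, dict]] = {}  # token -> (tool name, args)


def route(intent: str) -> tuple[str, dict]:
    """Keyword stub in place of the router model (a real gateway makes an LLM call here)."""
    name = "close_ticket" if "close" in intent.lower() else "get_ticket"
    return name, {"ticket_id": "T-17"}


def ask(intent: str) -> dict:
    """The single entry point: route, then run reads directly and stash writes."""
    name, args = route(intent)
    tool = CATALOGUE[name]
    if not tool.mutates:
        return {"status": "done", "result": tool.fn(**args)}
    token = uuid.uuid4().hex
    PENDING[token] = (name, args)
    return {"status": "confirm", "token": token,
            "paraphrase": f"About to run {name} with {args}. Proceed?"}


def confirm(token: str, approve: bool) -> dict:
    """Apply or drop a stashed write; pop() means a token cannot be reused."""
    name, args = PENDING.pop(token)
    if not approve:
        return {"status": "cancelled"}
    return {"status": "done", "result": CATALOGUE[name].fn(**args)}


reply = ask("close the ticket about the broken login redirect")
print(reply["status"], TICKETS["T-17"]["status"])  # confirm open
done = confirm(reply["token"], approve=True)
print(done["result"]["status"])                    # closed
```

Note that nothing is written until confirm runs: the routed close sits in PENDING with a paraphrase the agent can show or check.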

The wins map directly onto the problems above:

Context stays flat. The agent loads one tool schema, not fifty. The full catalogue lives behind the gateway and is consulted by the router, not carried in the calling agent's context every turn. Adding a 51st tool costs the agent nothing.
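
For comparison, this is roughly everything the calling agent has to carry. A hedged sketch: the field layout (name, description, inputSchema as JSON Schema) follows MCP's tool definitions, but the description text and the single intent argument are assumptions, not AppHandoff's published schema.

```python
import json

# Hypothetical definition of the one tool the agent sees, in the shape MCP
# uses for tool listings. ASSUMPTION: the "intent" argument is illustrative.
ASK_TOOL = {
    "name": "ask_apphandoff",
    "description": (
        "Describe what you want done in AppHandoff in plain language. "
        "Reads run immediately; changes are paraphrased and need confirmation."
    ),
    "inputSchema": {
        "type": "object",
        "properties": {
            "intent": {
                "type": "string",
                "description": "What you want, in natural language.",
            },
        },
        "required": ["intent"],
    },
}

print(json.dumps(ASK_TOOL, indent=2))
```

Adding a 51st tool behind the router leaves this definition, and so the agent's per-request cost, unchanged.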

Routing becomes a first-class job. Disambiguating get_ticket from get_ticket_thread is now the router's explicit responsibility, with the full request in view — not a side effect of an agent skimming descriptions while juggling everything else.
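
Much of the router's advantage comes from what it gets to see. A minimal sketch of assembling its input, assuming a plain-text prompt and a JSON reply format of our own choosing (neither is prescribed by MCP or taken from AppHandoff's implementation): the full catalogue, near-synonyms included, sits next to the whole request, and the router is allowed to ask instead of guessing.

```python
# Hypothetical backend catalogue: name -> (description, mutates).
CATALOGUE = {
    "get_ticket": ("Fetch a ticket's fields (title, status, assignee).", False),
    "get_ticket_thread": ("Fetch the full comment thread on a ticket.", False),
    "close_ticket": ("Mark a ticket closed.", True),
}


def build_router_prompt(intent: str, catalogue: dict[str, tuple[str, bool]]) -> str:
    """Assemble the router's input: every backend tool plus the full request,
    with an instruction to answer in a fixed JSON shape."""
    tool_lines = [f"- {name}: {desc}" for name, (desc, _) in sorted(catalogue.items())]
    return "\n".join([
        "You route requests to exactly one tool. Available tools:",
        *tool_lines,
        "",
        f"Request: {intent}",
        'Reply with JSON only: {"tool": <name>, "args": {...}} '
        'or {"clarify": <question>} if two tools fit equally well.',
    ])


print(build_router_prompt("show me everyone's comments on T-17", CATALOGUE))
```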

Mutations funnel through one gate. Every state change passes the same confirm step. Read-only calls run directly; writes stop and ask. You enforce that policy once, at the gateway, instead of bolting guards onto every dangerous tool. It's the same instinct behind the confirm flow everywhere in AppHandoff — the system checks intent against reality before it commits.
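
One way to keep that policy in a single place is to make "does this mutate?" part of tool registration and check it in one gate function, so a new tool inherits the policy by declaring a flag rather than carrying its own guard. A hedged sketch: the tool decorator, REGISTRY, and gate are illustrative names, and the tool bodies are placeholders.

```python
from __future__ import annotations

from typing import Any, Callable

REGISTRY: dict[str, tuple[Callable[..., Any], bool]] = {}  # name -> (fn, mutates)


def tool(mutates: bool = False):
    """Register a backend tool and declare whether it changes state."""
    def register(fn: Callable[..., Any]) -> Callable[..., Any]:
        REGISTRY[fn.__name__] = (fn, mutates)
        return fn
    return register


@tool()
def get_handoff_requests() -> list[str]:
    return ["HR-1", "HR-2"]  # placeholder data


@tool(mutates=True)
def delete_handoff_request(request_id: str) -> str:
    return f"deleted {request_id}"  # placeholder side effect


def gate(name: str, args: dict, confirmed: bool = False) -> Any:
    """The single enforcement point: reads pass through, writes need confirmation."""
    fn, mutates = REGISTRY[name]
    if mutates and not confirmed:
        raise PermissionError(f"{name} changes state; confirm first")
    return fn(**args)


print(gate("get_handoff_requests", {}))  # ['HR-1', 'HR-2']
try:
    gate("delete_handoff_request", {"request_id": "HR-1"})
except PermissionError as exc:
    print(exc)  # delete_handoff_request changes state; confirm first
```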

The tradeoffs — paid honestly

A gateway is not a free lunch. Be clear-eyed about what it costs.

An extra LLM hop. Routing is itself an inference. You've added a model call between intent and execution, with its own latency, its own token cost, and its own failure modes. The router can misroute just as a direct agent can mispick — you've relocated the ambiguity, not abolished it. The bet is that a model whose only job is routing, with the full request in front of it, does that job better than an agent doing it as a distraction. That bet usually pays off, but it is a bet.
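
Because the router can misroute, its output deserves the same suspicion as any other model output. A sketch of a validation step, assuming the router replies with {"tool": ..., "args": ...} or {"clarify": ...} (a format we chose for illustration, not a standard): malformed replies, unknown tools, and missing required arguments become a clarifying question instead of a call.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Hypothetical catalogue: tool name -> required argument names.
REQUIRED_ARGS = {
    "get_ticket": {"ticket_id"},
    "get_ticket_thread": {"ticket_id"},
    "close_ticket": {"ticket_id"},
}


@dataclass
class Decision:
    tool: str | None = None
    args: dict | None = None
    clarify: str | None = None


def parse_decision(raw: str) -> Decision:
    """Treat router output as untrusted; anything doubtful becomes a question."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        return Decision(clarify="Could not parse a routing decision; please rephrase.")
    if "clarify" in data:
        return Decision(clarify=str(data["clarify"]))
    name, args = data.get("tool"), data.get("args", {})
    if not isinstance(name, str) or name not in REQUIRED_ARGS or not isinstance(args, dict):
        return Decision(clarify=f"Unrecognised action {name!r}; what did you want to do?")
    missing = REQUIRED_ARGS[name] - set(args)
    if missing:
        return Decision(clarify=f"{name} needs: {', '.join(sorted(missing))}")
    return Decision(tool=name, args=args)


print(parse_decision('{"tool": "close_ticket", "args": {"ticket_id": "T-17"}}'))
print(parse_decision('{"tool": "get_tickets", "args": {}}').clarify)
```

This does not remove misrouting (a valid but wrong tool still passes), but it keeps malformed or invented calls from reaching the backend.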

Latency. Two hops are slower than one. For interactive, read-heavy use this is often invisible against network and tool time. For tight latency budgets it matters, and you should measure rather than assume.
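
Measuring is cheap: time each hop separately so you can see what the router adds relative to the tool call itself. A standard-library sketch; route and invoke are placeholders with simulated delays, not real calls, and should be swapped for your router and tool invocations.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label: str, timings: dict[str, float]):
    """Record wall-clock milliseconds for one stage of a request."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = (time.perf_counter() - start) * 1000


def route(intent: str) -> str:
    time.sleep(0.05)  # simulated router inference (placeholder)
    return "get_ticket"


def invoke(tool_name: str) -> dict:
    time.sleep(0.02)  # simulated tool and network time (placeholder)
    return {"tool": tool_name}


timings: dict[str, float] = {}
with timed("route", timings):
    tool_name = route("what's the status of the login ticket?")
with timed("invoke", timings):
    invoke(tool_name)
print({stage: round(ms) for stage, ms in timings.items()})
```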

You now need a confirm flow. Routing to a mutating action means you need somewhere to stash the pending action, surface the paraphrase, and apply or drop it on the agent's response. That's real machinery (state for the pending action, a clean way to confirm or cancel), and it stops being optional the moment a router can trigger writes on inferred intent.
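
A sketch of the pending-action state, with two properties we would suggest it needs (our suggestion, not a requirement of MCP): tokens that work only once, and an expiry so an old confirmation cannot commit a stale plan. PendingStore, PendingAction, and the 300-second TTL are hypothetical, and persistence across server restarts is left out.

```python
from __future__ import annotations

import secrets
import time
from dataclasses import dataclass, field
from typing import Any


@dataclass
class PendingAction:
    tool: str
    args: dict[str, Any]
    paraphrase: str
    created: float = field(default_factory=time.monotonic)


class PendingStore:
    """Holds routed-but-unconfirmed writes; tokens are single-use and expire."""

    def __init__(self, ttl_seconds: float = 300.0) -> None:
        self.ttl = ttl_seconds
        self._items: dict[str, PendingAction] = {}

    def stash(self, tool: str, args: dict[str, Any], paraphrase: str) -> str:
        token = secrets.token_urlsafe(16)
        self._items[token] = PendingAction(tool, args, paraphrase)
        return token

    def resolve(self, token: str, approve: bool) -> PendingAction | None:
        """Remove the action; return it only if approved and still fresh."""
        action = self._items.pop(token, None)  # pop: a token works at most once
        if action is None or not approve:
            return None
        if time.monotonic() - action.created > self.ttl:
            return None
        return action


store = PendingStore(ttl_seconds=300)
token = store.stash("close_ticket", {"ticket_id": "T-17"}, "Close T-17 (broken login redirect)?")
action = store.resolve(token, approve=True)
print(action.tool if action else "nothing to apply")  # close_ticket
print(store.resolve(token, approve=True))              # None (token already used)
```

The caller would then pass the returned action to whatever executes writes; a None result means there is nothing to apply.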

Debuggability moves. With direct tools you read one call. With a gateway you trace intent → routing decision → invocation → result. Good logging at the routing layer makes this tractable; without it, you've added a black box.
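
A sketch of that routing-layer logging: one JSON line per stage, all sharing a trace ID, so a single request can be followed end to end. The stage names and fields are illustrative; you may also want to record the router's raw output and model version.

```python
import json
import logging
import uuid

logging.basicConfig(level=logging.INFO, format="%(message)s")
log = logging.getLogger("gateway")


def trace_event(trace_id: str, stage: str, **fields) -> dict:
    """Emit one structured log line per stage so a request can be replayed
    as intent -> routing decision -> invocation -> result."""
    event = {"trace_id": trace_id, "stage": stage, **fields}
    log.info(json.dumps(event, default=str))
    return event


trace_id = uuid.uuid4().hex
trace_event(trace_id, "intent", text="close the ticket about the broken login redirect")
trace_event(trace_id, "route", tool="close_ticket", args={"ticket_id": "T-17"}, mutates=True)
trace_event(trace_id, "invoke", tool="close_ticket", confirmed=True)
trace_event(trace_id, "result", ok=True)
```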

When each approach fits

Reach for direct exposure when the surface is small and stable, mostly read-only, latency-critical, or when callers genuinely need to compose raw operations themselves. Wrapping a handful of safe tools in a gateway is over-engineering.

Reach for a gateway when the catalogue is large or growing, when tools cluster into confusable families, when a meaningful share of operations mutate state, or when context budget is tight because the agent is already doing heavy work. The more your tool count climbs and the more of it is destructive, the more the gateway's fixed costs amortize into a clear net win.

A useful rule of thumb: if you find yourself writing per-tool documentation explaining which of several similar tools to use when, the model is going to have the same trouble you're having — and that's the signal to put a router in front.

AppHandoff's MCP server runs the gateway pattern in production at https://api.apphandoff.com/api/mcp-bot: one ask_apphandoff entry point, an LLM router, paraphrase-and-confirm on every mutation, direct execution for reads. If you're designing an MCP server and your tool list is past twenty and climbing, the rest of the blog walks through more of the patterns behind it.