MCP Gateway

The tools — how MCP Gateway connects agents to any API through eight server types and a unified gateway endpoint.

Overview

MCP servers are the first pillar of MCP Gateway — the tools. Each MCP server exposes tools (functions) that connect to external systems: GitHub, Slack, databases, internal APIs, cloud services. An agent equipped with MCP servers can take action in the real world.

Instead of configuring each AI agent to connect to each MCP server individually, you register servers with the gateway once. The gateway aggregates all tools into a unified catalog, handles authentication and connection management, routes tool calls to the correct server, and records everything through the observability pipeline.

This design solves several problems: agents no longer need to manage multiple connections, server credentials are centralized and encrypted, connection pools are shared across requests, and every tool call passes through a consistent audit pipeline.

Server Types

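MCP Gateway supports eight server types, covering externally hosted services, npm and PyPI packages, containers, AI-generated servers, gateway-internal tool definitions, and bundles: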

Remote

External MCP servers accessible over HTTP/SSE. These are always-running services hosted externally — third-party APIs, cloud-hosted MCP servers, or self-hosted servers on your network. The gateway connects using the Streamable HTTP transport with HTTP/2 connection pooling.

NPX

Node.js MCP servers installed and run via npx. The gateway spawns the process on demand using stdio transport (stdin/stdout). This covers MCP server packages published to the npm registry.

UVX

Python MCP servers installed and run via uvx. Like NPX servers, these are spawned on demand with stdio transport, covering PyPI-hosted MCP server packages.
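For illustration, here is a minimal sketch of how an NPX or UVX registration might be turned into a spawned stdio process. The function names are hypothetical, and the sketch assumes the gateway runs `npx -y <package>` and `uvx <package>`; the gateway's actual launch flags are not documented here.

```python
import subprocess
from typing import Optional


def build_stdio_command(server_type: str, package: str, args: Optional[list[str]] = None) -> list[str]:
    """Return the argv for launching an NPX or UVX MCP server over stdio."""
    extra = list(args or [])
    if server_type == "npx":
        # -y answers npx's install prompt so the spawn never blocks on stdin
        return ["npx", "-y", package, *extra]
    if server_type == "uvx":
        # uvx runs the package in a cached, isolated Python environment
        return ["uvx", package, *extra]
    raise ValueError(f"not a stdio package server type: {server_type!r}")


def spawn_stdio_server(argv: list[str]) -> subprocess.Popen:
    # stdio transport: JSON-RPC messages are written to the child's stdin
    # and responses are read from its stdout
    return subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
```

For example, `build_stdio_command("uvx", "mcp-server-fetch")` yields `["uvx", "mcp-server-fetch"]`, which `spawn_stdio_server` would start only when a tool on that server is first needed.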

Container

Docker containers in local development, or Kubernetes Deployments in production (via the gateway's K8sHostedServerService). Containers provide full isolation, which suits servers with complex dependencies or strict security requirements. The gateway manages the container lifecycle and supports both stdio and HTTP transports.

Generated

AI-created MCP servers built from API documentation. When you provide an API spec (OpenAPI, documentation URL, or plain text), MCP Gateway's AI generation pipeline creates a working FastMCP server and loads it in-process — no container, no subprocess. This is the most efficient runtime: tool execution is a direct function call with zero IPC overhead.

Virtual

Curated catalog entries: REST and connector tool definitions materialized into MCP tools in-process. The gateway hosts the REST-to-MCP mapping and injects credentials from the resolved connection on each call — no child process, no container.

Local

Metadata-only tool registry for tools authored inside the gateway (bulk-synced via PUT /api/v1/servers/{server_id}/tools). No runtime deployment — the tool rows act as attachment targets for script tools and gateway-internal tooling.
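A bulk sync is a single PUT against the endpoint above. The sketch below only builds the request with Python's standard library; the base URL, the `{"tools": [...]}` payload shape, and the bearer-token header are assumptions for illustration, not the documented contract.

```python
import json
import urllib.request

# Hypothetical gateway base URL; substitute your deployment's address.
GATEWAY_URL = "https://gateway.example.com"


def build_tool_sync_request(server_id: str, tools: list[dict], token: str) -> urllib.request.Request:
    """Build (but do not send) the PUT that bulk-syncs a Local server's tool rows."""
    body = json.dumps({"tools": tools}).encode("utf-8")  # assumed payload shape
    return urllib.request.Request(
        url=f"{GATEWAY_URL}/api/v1/servers/{server_id}/tools",
        data=body,
        method="PUT",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )
```

Passing the result to `urllib.request.urlopen` would send it.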

Bundle

A logical grouping of multiple servers presented as a single unit. Bundles aggregate tools from their member servers and keep each tool's server-qualified name (e.g. GITHUB__create_issue, GITLAB__create_mr); bundles do not re-qualify them.
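Because qualified names are assigned when each tool is created, bundle aggregation can be a plain merge. The following is a minimal sketch with a hypothetical `Tool` type; the duplicate check is defensive, since globally unique names should never collide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    qualified_name: str  # e.g. "GITHUB__create_issue", assigned when the tool was created
    description: str = ""


def aggregate_bundle(members: dict[str, list[Tool]]) -> list[Tool]:
    """Merge the tools of a bundle's member servers, keeping every name unchanged."""
    merged: dict[str, Tool] = {}
    for server_name, tools in members.items():
        for tool in tools:
            # Names are already server-qualified, so no prefix is added here;
            # a collision would indicate a catalog bug, not a naming clash.
            if tool.qualified_name in merged:
                raise ValueError(f"duplicate tool {tool.qualified_name!r} from {server_name!r}")
            merged[tool.qualified_name] = tool
    return list(merged.values())
```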

How It Works

Registration

Servers are registered via the REST API or the web UI. Each registration includes the server type, connection details (URL, package name, Docker image, etc.), and optional configuration like environment variables, authentication credentials, and resource limits.

Once registered, the gateway connects to the server, discovers its tools, and adds them to the aggregated tool catalog.

Runtime Adapters

The gateway uses the adapter pattern to normalize all server types behind a unified interface. Each server type has a dedicated adapter that handles connection, tool listing, tool execution, and disconnection. This means the rest of the system — routing, caching, session management, observability — works identically regardless of the underlying server type.
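The adapter pattern described above can be sketched as an abstract interface plus one concrete adapter. The class and method names are hypothetical, and the sketch is synchronous for brevity (a production gateway would more likely use async I/O). `InProcessAdapter` mirrors how Generated servers run tools as direct function calls.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable


class ServerAdapter(ABC):
    """Hypothetical interface shared by every server-type adapter."""

    @abstractmethod
    def connect(self) -> None: ...

    @abstractmethod
    def list_tools(self) -> list[str]: ...

    @abstractmethod
    def call_tool(self, name: str, arguments: dict[str, Any]) -> Any: ...

    @abstractmethod
    def disconnect(self) -> None: ...


class InProcessAdapter(ServerAdapter):
    """Adapter whose tools are plain Python callables, as with in-process servers."""

    def __init__(self, tools: dict[str, Callable[..., Any]]) -> None:
        self._tools = tools
        self._connected = False

    def connect(self) -> None:
        self._connected = True

    def list_tools(self) -> list[str]:
        return sorted(self._tools)

    def call_tool(self, name: str, arguments: dict[str, Any]) -> Any:
        if not self._connected:
            raise RuntimeError("adapter is not connected")
        return self._tools[name](**arguments)  # direct function call, no IPC

    def disconnect(self) -> None:
        self._connected = False


def run_tool(adapter: ServerAdapter, name: str, arguments: dict[str, Any]) -> Any:
    """Caller code depends only on the ServerAdapter interface."""
    adapter.connect()
    try:
        return adapter.call_tool(name, arguments)
    finally:
        adapter.disconnect()
```

A remote, stdio, or container adapter would implement the same four methods, so `run_tool` and everything layered above it stay unchanged.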

Gateway Modes — Progressive Tool Loading

The biggest challenge in production AI tooling is context engineering: giving the agent exactly the right tools at the right time without overwhelming its context window. Anthropic's research shows that tool selection accuracy degrades significantly past 30-50 tools. A typical setup with 5 MCP servers exposes 150+ tools, consuming 55,000+ tokens in tool definitions alone — before the agent does any work.

MCP Gateway offers three configurable modes to solve this:

Mode	How it works	Best for
LIST	Returns all tool definitions directly	Small catalogs (under the AUTO threshold — default 20 tools)
SEARCH_EXECUTE	Returns two meta-tools: `SEARCH_TOOLS` and `EXECUTE_TOOL`. The agent searches by intent, gets back the top-K matches with names and schemas, then executes the best match.	Large catalogs. Benchmarks show 100-160x token reduction.
AUTO	Switches between LIST and SEARCH_EXECUTE dynamically based on a configurable tool count threshold (default 20, range 1–1000)	Mixed deployments where tool counts vary
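The AUTO decision reduces to a threshold comparison. Below is a minimal sketch using the default (20) and range (1 to 1000) from the table; treating "under the threshold" as strictly fewer tools is an assumption.

```python
from enum import Enum


class GatewayMode(Enum):
    LIST = "LIST"
    SEARCH_EXECUTE = "SEARCH_EXECUTE"
    AUTO = "AUTO"


def effective_mode(configured: GatewayMode, tool_count: int, threshold: int = 20) -> GatewayMode:
    """Resolve the mode actually served for a catalog of tool_count tools."""
    if not 1 <= threshold <= 1000:
        raise ValueError("threshold must be between 1 and 1000")
    if configured is not GatewayMode.AUTO:
        return configured
    # Assumption: "under the threshold" means strictly fewer tools than it.
    if tool_count < threshold:
        return GatewayMode.LIST
    return GatewayMode.SEARCH_EXECUTE
```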

In SEARCH_EXECUTE mode, the gateway uses pgvector-powered semantic search to match the agent's intent against tool names, descriptions, and parameter schemas. The administrator configures how many results (top-K) the search returns, balancing accuracy against token cost.
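In pgvector, this ranking is typically expressed as `ORDER BY embedding <=> query_embedding LIMIT k`, where `<=>` is cosine distance. The sketch below performs the same top-K cosine ranking in plain Python. The three-dimensional vectors are made-up stand-ins for real embeddings of each tool's name, description, and parameter schema.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_tools(query: list[float], catalog: dict[str, list[float]], top_k: int = 3) -> list[tuple[str, float]]:
    """Rank tools by cosine similarity to the query embedding and keep the top K."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in catalog.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy "embeddings"; a real deployment would produce these with an embedding model.
CATALOG = {
    "GITHUB__create_issue": [0.9, 0.1, 0.0],
    "SLACK__post_message": [0.1, 0.9, 0.1],
    "GITHUB__list_issues": [0.8, 0.2, 0.1],
}
```

Raising `top_k` returns more candidates (better recall) at the cost of more schema tokens in the agent's context, which is the trade-off the administrator tunes.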

This is the same pattern Anthropic built into their Tool Search Tool — but MCP Gateway applies it at the infrastructure level, so every connected agent benefits automatically.

Tool Routing

When an agent calls a tool, the gateway resolves which server owns that tool and routes the request through the appropriate adapter. Every tool is stored with a globally unique qualified name of the form server_name__tool_name; the qualification happens at creation time, and the server name is immutable once set. In SEARCH_EXECUTE mode, the agent first searches for tools by intent, then executes a specific tool by its qualified name, and the gateway handles the routing transparently.
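One simple way to resolve ownership is to split the qualified name at its first double underscore. The sketch assumes server names never contain `__` and that the upstream server receives its own unqualified tool name; the gateway may instead look ownership up in its catalog. The `servers` mapping of callables is a hypothetical stand-in for adapters.

```python
from typing import Any, Callable

ServerCall = Callable[[str, dict], Any]


def split_qualified_name(qualified: str) -> tuple[str, str]:
    """Split 'server_name__tool_name' at the first double underscore."""
    server, sep, tool = qualified.partition("__")
    if not sep or not server or not tool:
        raise ValueError(f"not a qualified tool name: {qualified!r}")
    return server, tool


def route_tool_call(qualified: str, arguments: dict, servers: dict[str, ServerCall]) -> Any:
    server, tool = split_qualified_name(qualified)
    try:
        call = servers[server]
    except KeyError:
        raise LookupError(f"no registered server named {server!r}") from None
    # Assumption: the upstream server receives its own, unqualified tool name
    return call(tool, arguments)
```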

Connection Pooling

For remote servers, the gateway maintains HTTP/2 connection pools with configurable limits. Pools use LRU eviction when the maximum count is reached. HTTP/2 multiplexing allows multiple concurrent requests over a single connection, significantly reducing latency and resource usage compared to HTTP/1.1.
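LRU eviction over a bounded set of pools can be sketched with `collections.OrderedDict`. The `PoolRegistry` name and its `create`/`close` callbacks are hypothetical; in practice `create` might build an HTTP/2 client such as `httpx.Client(base_url=url, http2=True)` (which requires the `httpx[http2]` extra), and `close` would call its `close()` method.

```python
from collections import OrderedDict
from typing import Callable, Generic, TypeVar

P = TypeVar("P")


class PoolRegistry(Generic[P]):
    """Holds at most max_pools connection pools, evicting the least recently used."""

    def __init__(self, max_pools: int, create: Callable[[str], P], close: Callable[[P], None]) -> None:
        if max_pools < 1:
            raise ValueError("max_pools must be at least 1")
        self._max_pools = max_pools
        self._create = create
        self._close = close
        self._pools: "OrderedDict[str, P]" = OrderedDict()

    def get(self, server_url: str) -> P:
        if server_url in self._pools:
            self._pools.move_to_end(server_url)  # mark as most recently used
            return self._pools[server_url]
        if len(self._pools) >= self._max_pools:
            _, oldest = self._pools.popitem(last=False)  # least recently used
            self._close(oldest)
        pool = self._create(server_url)
        self._pools[server_url] = pool
        return pool
```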

Session Management

The gateway tracks sessions using the Mcp-Session-Id header on the MCP Streamable HTTP transport (protocol version 2025-03-26). For backwards compatibility it also accepts the legacy X-MCP-Session-ID header on input. Each session records which servers were accessed and maintains per-server external session state, enabling stateful conversations where agents maintain context across multiple tool calls.
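A minimal sketch of session lookup and per-session bookkeeping follows. The header names come from the text above. The precedence rule (standard header wins when both are present) and the `Session` fields are assumptions for illustration.

```python
from collections.abc import Mapping
from dataclasses import dataclass, field
from typing import Optional

SESSION_HEADER = "Mcp-Session-Id"
LEGACY_SESSION_HEADER = "X-MCP-Session-ID"


def resolve_session_id(headers: Mapping[str, str]) -> Optional[str]:
    """Return the session ID, preferring the standard header over the legacy one."""
    # HTTP header names are case-insensitive, so compare them lowercased
    lowered = {name.lower(): value for name, value in headers.items()}
    for name in (SESSION_HEADER, LEGACY_SESSION_HEADER):
        value = lowered.get(name.lower(), "").strip()
        if value:
            return value
    return None


@dataclass
class Session:
    session_id: str
    servers_accessed: set[str] = field(default_factory=set)
    # Per-server upstream session IDs, so stateful servers see one continuous session
    external_sessions: dict[str, str] = field(default_factory=dict)

    def record_call(self, server: str, upstream_session_id: Optional[str] = None) -> None:
        self.servers_accessed.add(server)
        if upstream_session_id:
            self.external_sessions[server] = upstream_session_id
```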

Credential Management

Most MCP servers require authentication — OAuth tokens for GitHub, Google, Slack, Microsoft 365, or API keys for internal services. In a typical setup, tokens expire and require a human to re-authenticate, which defeats the purpose of autonomous agents.

MCP Gateway solves this with automatic background token refresh:

How It Works

A user authenticates once with an OAuth provider (GitHub, Google, Microsoft Entra, Okta, Slack, etc.)
The gateway stores the encrypted access and refresh tokens in a per-user token vault (AES-256-GCM encryption with unique IVs)
A background daemon runs every 60 seconds, checking which tokens are due for refresh
Tokens are refreshed 5 minutes before expiry, with random jitter to prevent a thundering herd on provider endpoints
Even if a provider doesn't return expiry information, tokens are refreshed at least every 24 hours (safety net against silent revocation)
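The per-token decision the 60-second daemon makes on each pass can be sketched as a pure function. The 5-minute lead and 24-hour ceiling come from the steps above. The 60-second maximum jitter and redrawing the jitter on every pass are assumptions; `rng` is injectable so the logic can be tested deterministically.

```python
import random
from datetime import datetime, timedelta
from typing import Callable, Optional

REFRESH_LEAD = timedelta(minutes=5)  # refresh this long before expiry
MAX_TOKEN_AGE = timedelta(hours=24)  # refresh at least this often, whatever the expiry


def refresh_due(
    now: datetime,
    expires_at: Optional[datetime],
    last_refreshed: datetime,
    max_jitter: timedelta = timedelta(seconds=60),  # hypothetical jitter bound
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide on one daemon pass whether a token should be refreshed now."""
    if now - last_refreshed >= MAX_TOKEN_AGE:
        return True
    if expires_at is None:
        return False
    # Jitter moves the refresh point earlier by a random amount so that many
    # tokens expiring together do not hit the provider at the same moment.
    jitter = max_jitter * rng()
    return now >= expires_at - REFRESH_LEAD - jitter
```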

Intelligent Error Handling

The refresh service distinguishes between recoverable and unrecoverable failures:

Recoverable (network timeout, rate limit) — retries with exponential backoff, respects Retry-After headers
Unrecoverable (invalid_grant, access_denied, token revoked) — immediately invalidates the connection and flags it for user attention

Connections auto-invalidate after 3 unrecoverable or 10 consecutive recoverable failures, preventing credential debt from accumulating silently.
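The classification and retry timing can be sketched as follows. `invalid_grant` and `access_denied` are standard OAuth 2.0 error codes named above. Treating other 4xx responses as unrecoverable, the 1-second base and 300-second cap, and the "full jitter" backoff variant are all assumptions. Only the delta-seconds form of `Retry-After` is handled; the HTTP-date form falls through to backoff.

```python
import random
from typing import Callable, Optional

UNRECOVERABLE_OAUTH_ERRORS = {"invalid_grant", "access_denied"}


def classify_refresh_failure(status: Optional[int], oauth_error: Optional[str]) -> str:
    """Label a failed refresh attempt as 'recoverable' or 'unrecoverable'."""
    if oauth_error in UNRECOVERABLE_OAUTH_ERRORS:
        return "unrecoverable"
    # No response at all (e.g. a timeout), rate limiting, or a provider outage
    if status is None or status == 429 or status >= 500:
        return "recoverable"
    # Assumption: any other client error means the grant itself is unusable
    return "unrecoverable"


def retry_delay(
    attempt: int,
    retry_after: Optional[str] = None,
    base: float = 1.0,
    cap: float = 300.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None and retry_after.strip().isdigit():
        return float(retry_after.strip())  # the provider's Retry-After wins
    # Exponential backoff with "full jitter": uniform in [0, min(cap, base * 2**attempt))
    return rng() * min(cap, base * 2 ** attempt)
```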

Per-User Isolation

Each user maintains isolated OAuth connections to each server. This means:

User A's GitHub token is separate from User B's GitHub token
Token refresh happens independently per user
Revoking one user's access doesn't affect others
The audit trail shows exactly which user's credentials were used for each tool call
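Per-user isolation follows from keying every credential by both user and server. The in-memory sketch below uses hypothetical names and stores opaque ciphertext bytes; the AES-256-GCM encryption step and persistent storage are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredTokens:
    # Ciphertexts only; encryption happens before tokens reach the vault
    access_token_ct: bytes
    refresh_token_ct: bytes


class TokenVault:
    """Tokens keyed by (user_id, server_id), so users never share a credential."""

    def __init__(self) -> None:
        self._entries: dict[tuple[str, str], StoredTokens] = {}

    def put(self, user_id: str, server_id: str, tokens: StoredTokens) -> None:
        self._entries[(user_id, server_id)] = tokens

    def get(self, user_id: str, server_id: str) -> Optional[StoredTokens]:
        return self._entries.get((user_id, server_id))

    def revoke(self, user_id: str, server_id: str) -> None:
        # Removes only this user's connection; other users are untouched
        self._entries.pop((user_id, server_id), None)
```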

Credential Health Dashboard

The gateway provides real-time credential health monitoring:

Status	Meaning
Healthy	Token valid, not near expiry
Auto-refresh active	Expiring soon, but refresh is scheduled and working
Warning	Less than 24 hours until expiry
Critical	Less than 1 hour until expiry
Expired	Token expired or connection revoked — needs re-authentication
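The table's statuses overlap (a token within 24 hours of expiry may also have a working refresh scheduled), so any implementation must choose a precedence. The sketch below assumes that "expiring soon" means within 24 hours and that a working scheduled refresh outranks the time-based Warning and Critical states.

```python
from datetime import datetime, timedelta
from enum import Enum


class CredentialHealth(Enum):
    HEALTHY = "Healthy"
    AUTO_REFRESH_ACTIVE = "Auto-refresh active"
    WARNING = "Warning"
    CRITICAL = "Critical"
    EXPIRED = "Expired"


def credential_health(
    now: datetime, expires_at: datetime, *, revoked: bool = False, refresh_working: bool = False
) -> CredentialHealth:
    if revoked or expires_at <= now:
        return CredentialHealth.EXPIRED
    remaining = expires_at - now
    if remaining >= timedelta(hours=24):
        return CredentialHealth.HEALTHY
    # Assumption: a working scheduled refresh outranks the time-based warnings
    if refresh_working:
        return CredentialHealth.AUTO_REFRESH_ACTIVE
    if remaining < timedelta(hours=1):
        return CredentialHealth.CRITICAL
    return CredentialHealth.WARNING
```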

How MCP Servers Connect to Skills and Sandboxes

MCP servers don't work in isolation — they form part of the complete agent runtime:

Skills teach agents how to use the tools that MCP servers provide. A skill can reference specific tool names, define workflows that chain multiple tool calls, and provide domain context that makes tool usage more effective. Skills can be linked to specific servers for tool validation.
Sandboxes provide the execution environment where agents can run code alongside tool calls. An agent might call a tool to fetch data from an API (MCP server), then execute a Python script to analyze it (sandbox), following a workflow defined by a skill.

MCP servers can also host script tools — user-authored Python scripts packaged as skills and attached to a server. Script tools appear alongside REST tools in tools/list and SEARCH_TOOLS. With script_only exposure mode, raw REST tools are hidden — the agent can only interact through curated script tools. See Scripting Layer for details.
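The effect of script_only exposure on a server's tool listing can be sketched as a filter. The `kind` field and the `"all"` mode name used in the example are hypothetical; only `script_only` is named in the text.

```python
from typing import Iterable


def visible_tools(tools: Iterable[dict], exposure_mode: str) -> list[dict]:
    """Filter the tools returned by tools/list and SEARCH_TOOLS for one server."""
    if exposure_mode == "script_only":
        # Raw REST tools stay registered but are hidden from the agent
        return [tool for tool in tools if tool["kind"] == "script"]
    # Any other mode (assumed) exposes script and REST tools side by side
    return list(tools)
```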

Key Features

Eight server types covering remote, Node.js, Python, Docker, AI-generated, virtual (curated REST), local (metadata-only), and bundled servers
Automatic tool discovery — the gateway connects to each server and discovers available tools
Connection pooling with HTTP/2 multiplexing and LRU eviction
Stateful sessions with per-server external session tracking
Encrypted credentials — server authentication details encrypted at rest with AES-256-GCM
Health monitoring — server status tracking with connection validation
On-demand lifecycle — NPX and UVX servers are started only when needed and stopped when idle
In-process execution — generated servers run as direct function calls for zero-overhead tool execution
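On-demand lifecycle for NPX and UVX servers amounts to "start on first use, stop after an idle period". Below is a minimal sketch with hypothetical names and a hypothetical 300-second default timeout. The clock is injectable, and `reap_idle` would be called periodically by a background task.

```python
import time
from typing import Callable


class OnDemandLifecycle:
    """Start stdio servers on first use and stop them after an idle period."""

    def __init__(
        self,
        start: Callable[[str], None],
        stop: Callable[[str], None],
        idle_timeout: float = 300.0,  # seconds; hypothetical default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._start, self._stop = start, stop
        self._idle_timeout = idle_timeout
        self._clock = clock
        self._last_used: dict[str, float] = {}

    def acquire(self, server_id: str) -> None:
        """Call before each tool call; starts the server only if it is not running."""
        if server_id not in self._last_used:
            self._start(server_id)
        self._last_used[server_id] = self._clock()

    def reap_idle(self) -> list[str]:
        """Stop every server unused for at least idle_timeout seconds."""
        now = self._clock()
        idle = [sid for sid, used in self._last_used.items() if now - used >= self._idle_timeout]
        for sid in idle:
            self._stop(sid)
            del self._last_used[sid]
        return idle
```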

API Reference

List servers — retrieve all registered MCP servers
Register a server — add a new MCP server to the gateway
Server details — get configuration, tools, and status for a specific server
