OBSERVABILITY

See everything your agents do

Distributed tracing, Prometheus metrics, and a full audit trail across all three pillars: tools, skills, and sandboxes. Monitor every tool call, sandbox execution, and skill invocation — with automatic secret masking built in.

MCP Gateway — Observability Dashboard

Real-time dashboard with tool call volume, success rates, and latency percentiles

HOW IT WORKS

Full visibility across tools, skills, and sandboxes

Distributed Tracing

Push traces to any OpenTelemetry-compatible backend — Datadog, Azure Application Insights, AWS CloudWatch, Google Cloud Operations, Grafana Cloud, New Relic, Splunk, or any custom OTLP endpoint. Auto-instruments FastAPI, SQLAlchemy, and HTTPX with zero code changes. Swap providers at runtime without restarting.

Telemetry Export settings — choose from 8 observability providers
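The runtime provider swap can be pictured as an exporter held behind a single mutable reference: repointing the reference reroutes all subsequent spans with no restart. The sketch below is illustrative only — the `TelemetryRouter` name and its methods are assumptions, not the gateway's actual internals.

```python
class TelemetryRouter:
    """Routes finished spans to whichever exporter is currently active.

    Swapping the exporter redirects all subsequent spans immediately;
    nothing needs to restart. (Illustrative sketch, not gateway internals.)
    """

    def __init__(self, exporter):
        self._exporter = exporter  # e.g. an OTLP, Datadog, or CloudWatch sender

    def swap(self, exporter):
        """Point new spans at a different backend at runtime."""
        self._exporter = exporter

    def export(self, span):
        self._exporter(span)


# Two stand-in backends that just collect spans in memory.
datadog_spans, grafana_spans = [], []
router = TelemetryRouter(datadog_spans.append)
router.export({"name": "tool_call", "tool": "github__create_pr"})
router.swap(grafana_spans.append)  # switch providers without restarting
router.export({"name": "tool_call", "tool": "slack__post_message"})
```

Each span lands in the backend that was active when it was exported, which is exactly the behavior a runtime swap needs.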

Prometheus Metrics

Pull-based metrics at /api/v1/metrics in Prometheus exposition format. HTTP request counts, latency percentile histograms (p50/p90/p95/p99), tool call success rates, active connections, and token usage — all with automatic UUID normalization to prevent high-cardinality explosion.

Latency percentile monitoring — p50 through p99 over 24 hours
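Because the endpoint speaks the standard Prometheus exposition format, any client can consume it. Here is a minimal, standard-library-only sketch of parsing a scraped payload (a real integration would let Prometheus or an existing client library do this):

```python
def parse_exposition(text):
    """Parse Prometheus exposition text into {metric{labels}: value}.

    Skips # HELP / # TYPE comment lines; on a sample line the value is
    always the last whitespace-separated token.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name_and_labels, value = line.rsplit(" ", 1)
        samples[name_and_labels] = float(value)
    return samples


# Sample payload in the same shape the /api/v1/metrics endpoint returns.
scrape = """\
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/api/v1/search",method="POST",status="200"} 156.0
# HELP server_status MCP server health
# TYPE server_status gauge
server_status{server="github",type="remote"} 1.0
"""
metrics = parse_exposition(scrape)
```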

Full Audit Trail

Every MCP tool call, sandbox command, and skill execution is logged to PostgreSQL with full-text search. See the operation type, source, arguments, response, latency, status, and user — with automatic secret masking for API keys, tokens, and passwords. Filter by status, source, date range, and export to CSV.

Request history — tool calls with arguments, responses, and latency
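The secret-masking step can be illustrated as a key-based redaction pass over tool-call arguments before they are written to the log. This is a sketch of the general technique, not the gateway's actual masking logic; the key patterns and the `***` mask string are made up for illustration.

```python
import re

# Key names treated as sensitive; any match is redacted before logging.
SENSITIVE_KEY = re.compile(r"(api[_-]?key|token|password|secret)", re.IGNORECASE)


def mask_secrets(arguments):
    """Return a copy of tool-call arguments with sensitive values redacted."""
    masked = {}
    for key, value in arguments.items():
        if SENSITIVE_KEY.search(key):
            masked[key] = "***"
        elif isinstance(value, dict):
            masked[key] = mask_secrets(value)  # mask nested argument objects too
        else:
            masked[key] = value
    return masked


safe = mask_secrets({
    "repo": "octocat/hello-world",
    "api_key": "sk-live-abc123",
    "auth": {"token": "ghp_xyz", "user": "octocat"},
})
```

Masking by key name at write time means the secret never reaches the audit store, so full-text search and CSV export stay safe by construction.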

Token Usage Analytics

Track LLM token consumption across every MCP server. Input tokens, output tokens, and total usage with time-range filtering and per-server breakdowns. Know exactly where your AI spend is going and optimize accordingly.

Token usage breakdown — input vs output tokens across servers
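The per-server breakdown amounts to summing input and output tokens over logged calls. A minimal sketch, assuming audit records carry `server`, `input_tokens`, and `output_tokens` fields (the field names are assumptions, not the documented schema):

```python
from collections import defaultdict


def per_server_totals(records):
    """Sum input/output tokens per MCP server from audit-style records."""
    totals = defaultdict(lambda: {"input": 0, "output": 0})
    for record in records:
        bucket = totals[record["server"]]
        bucket["input"] += record["input_tokens"]
        bucket["output"] += record["output_tokens"]
    # Add a derived total so callers can rank servers by overall spend.
    return {
        server: {**t, "total": t["input"] + t["output"]}
        for server, t in totals.items()
    }


usage = per_server_totals([
    {"server": "github", "input_tokens": 1200, "output_tokens": 300},
    {"server": "github", "input_tokens": 800, "output_tokens": 200},
    {"server": "slack", "input_tokens": 400, "output_tokens": 100},
])
```

Time-range filtering then reduces to selecting which records feed the aggregation.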

API REFERENCE

Complete Metrics API

Every metric is queryable. Build dashboards, set up alerts, or integrate with your existing monitoring stack.

GET /api/v1/metrics
# HELP http_requests_total Total HTTP requests
# TYPE http_requests_total counter
http_requests_total{endpoint="/api/v1/search",method="POST",status="200"} 156.0
# HELP tool_latency_seconds Tool execution latency
# TYPE tool_latency_seconds histogram
tool_latency_seconds_bucket{server="github",tool="create_pr",le="0.5"} 42.0
# HELP server_status MCP server health
# TYPE server_status gauge
server_status{server="github",type="remote"} 1.0
server_status{server="slack",type="remote"} 1.0
Prometheus exposition format
GET /api/v1/audit/tool-calls?status=error
{
  "items": [
    {
      "tool": "github__create_pr",
      "server": "github",
      "status": "error",
      "latency_ms": 2341,
      "error": "rate_limit_exceeded",
      "created_at": "2026-02-26T14:30:00Z"
    }
  ],
  "total": 3,
  "page": 1
}
Full-text search with pagination

Ready to see what your agents are doing?

Deploy MCP Gateway and get full visibility into every tool call, sandbox execution, skill invocation, and token spent.