Telemetry
What you'll learn here: what observability signals the adapter emits, how to enable them, and what to look for once your collector is receiving data.
Signals
The adapter emits two types of OpenTelemetry signals:
- Metrics — counters, histograms, and gauges tracking requests, tool calls, uploads, artifacts, circuit breaker state, session lifecycle, and cleanup activity. These are always available when telemetry is enabled.
- Logs — optionally, application log records can be forwarded to your collector as OTel log records. This is controlled separately by telemetry.emit_logs. A dedicated logs_endpoint requires HTTP transport.
Distributed traces are not currently emitted. Do not configure a trace exporter — there is nothing to receive.
Enabling telemetry
Telemetry is off by default. To enable it, add a telemetry section to your config.yaml pointing at your OTLP collector:
telemetry:
  enabled: true
  transport: "grpc"
  endpoint: "http://otel-collector:4317"
  insecure: true
  service_name: "remote-mcp-adapter"
  export_interval_seconds: 15
For HTTP transport (useful with managed observability platforms that accept OTLP/HTTP):
telemetry:
  enabled: true
  transport: "http"
  endpoint: "https://otel.example.com/v1/metrics"
  insecure: false
  headers:
    Authorization: "Bearer ${OTEL_API_TOKEN}"
  emit_logs: true
  logs_endpoint: "https://otel.example.com/v1/logs"
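The one cross-field rule stated on this page (a dedicated logs_endpoint needs HTTP transport) can be checked before deployment. The following is a minimal sketch, not the adapter's own validation: `validate_telemetry_config` is a hypothetical helper, and it assumes the `telemetry` section has already been parsed into a plain dictionary.

```python
from typing import Any


def validate_telemetry_config(cfg: dict[str, Any]) -> list[str]:
    """Return human-readable problems found in a parsed `telemetry` mapping.

    Hypothetical helper: it checks only the rule documented on this page.
    """
    problems: list[str] = []

    # Telemetry is off by default, so a disabled section has nothing to check.
    if not cfg.get("enabled", False):
        return problems

    # Log forwarding to a dedicated logs_endpoint requires HTTP transport.
    wants_dedicated_logs = bool(cfg.get("emit_logs")) and bool(cfg.get("logs_endpoint"))
    if wants_dedicated_logs and cfg.get("transport") != "http":
        problems.append("logs_endpoint is set but transport is not 'http'")

    return problems


if __name__ == "__main__":
    bad = {
        "enabled": True,
        "transport": "grpc",
        "emit_logs": True,
        "logs_endpoint": "https://otel.example.com/v1/logs",
    }
    for problem in validate_telemetry_config(bad):
        print(problem)
```

Returning a list rather than raising makes it easy to extend the check with further rules and report them all at once.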
The adapter lazy-imports the OpenTelemetry SDK at startup. If the SDK is not installed, telemetry is disabled at runtime and a warning is logged. Make sure opentelemetry-api, opentelemetry-sdk, and the OTLP exporter for your transport (opentelemetry-exporter-otlp-proto-grpc or opentelemetry-exporter-otlp-proto-http) are installed in your environment.
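Because a missing SDK only produces a log warning, it can help to check the environment before starting the adapter. This is a sketch using only the standard library; `missing_packages` and `is_importable` are hypothetical helpers, and the import paths mapped to each pip package are assumptions about how those packages are laid out.

```python
import importlib.util

# Assumed import path for each pip package named above.
CORE_MODULES = {
    "opentelemetry-api": "opentelemetry.metrics",
    "opentelemetry-sdk": "opentelemetry.sdk.metrics",
}
EXPORTER_MODULES = {
    "grpc": ("opentelemetry-exporter-otlp-proto-grpc", "opentelemetry.exporter.otlp.proto.grpc"),
    "http": ("opentelemetry-exporter-otlp-proto-http", "opentelemetry.exporter.otlp.proto.http"),
}


def is_importable(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec raises when a parent package of a dotted name is missing.
        return False


def missing_packages(transport: str) -> list[str]:
    """List pip package names that appear to be missing for the given transport."""
    required = dict(CORE_MODULES)
    package, module = EXPORTER_MODULES[transport]
    required[package] = module
    return [pkg for pkg, mod in required.items() if not is_importable(mod)]


if __name__ == "__main__":
    print(missing_packages("grpc"))
```

An empty list suggests the packages for that transport are present; any names printed are candidates for `pip install`.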
What to look for
Once data is flowing, these metrics give you the most useful operational picture:
Request throughput and latency
adapter_http_requests_total counts every HTTP request handled by the adapter, labelled by server, HTTP method, route group, and response status class. adapter_http_request_duration_seconds is the matching latency histogram. Sudden spikes in 5xx responses or elevated p99 latency here usually point to an upstream problem.
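Most observability backends compute p99 from the histogram for you, but it is useful to know what that number means. The sketch below estimates a quantile from explicit-bucket histogram data by linear interpolation inside the bucket that contains the target rank. The bucket bounds and counts are made-up illustration values, not the adapter's actual bucket layout, and `estimate_quantile` is a hypothetical helper.

```python
import math
from typing import Sequence


def estimate_quantile(q: float, bounds: Sequence[float], counts: Sequence[int]) -> float:
    """Estimate quantile q (0..1) from explicit-bucket histogram data.

    bounds: ascending upper bounds of the finite buckets (at least one).
    counts: per-bucket counts, len(bounds) + 1 entries; the last entry is
            the overflow bucket for values above bounds[-1].
    """
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if not bounds or len(counts) != len(bounds) + 1:
        raise ValueError("need at least one bound and len(bounds) + 1 counts")

    total = sum(counts)
    if total == 0:
        return math.nan  # no observations recorded yet

    rank = q * total
    cumulative = 0
    for i, count in enumerate(counts):
        previous = cumulative
        cumulative += count
        if count > 0 and cumulative >= rank:
            if i == len(bounds):
                # Overflow bucket has no upper bound; report the highest finite one.
                return bounds[-1]
            lower = bounds[i - 1] if i > 0 else 0.0
            upper = bounds[i]
            # Assume observations are spread evenly inside the bucket.
            return lower + (upper - lower) * (rank - previous) / count
    return bounds[-1]


if __name__ == "__main__":
    # 90 requests <= 0.1 s, 8 in (0.1, 0.5], 2 in (0.5, 1.0], none above 1.0 s.
    print(estimate_quantile(0.99, [0.1, 0.5, 1.0], [90, 8, 2, 0]))
```

The estimate can never be more precise than the bucket boundaries, so a p99 that lands in a wide bucket should be read as a range rather than an exact value.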
Upstream tool call performance
adapter_upstream_tool_calls_total counts proxied calls by server, tool name, and outcome (ok or the error type). adapter_upstream_tool_call_duration_seconds histograms the round-trip time. Use these to find slow or flaky tools on specific upstream servers.
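To find flaky tools, the per-outcome counts can be collapsed into a failure rate for each server and tool pair. This is a minimal sketch over already-exported samples; the `ToolCallSample` shape and its field names are assumptions for illustration, not the adapter's actual attribute keys.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class ToolCallSample(NamedTuple):
    """One counter sample (hypothetical shape)."""
    server: str
    tool: str
    outcome: str  # "ok" or an error type
    count: int


def failure_rates(samples: Iterable[ToolCallSample]) -> dict[tuple[str, str], float]:
    """Return the fraction of non-ok calls for each (server, tool) pair."""
    totals: dict[tuple[str, str], int] = defaultdict(int)
    failures: dict[tuple[str, str], int] = defaultdict(int)
    for sample in samples:
        key = (sample.server, sample.tool)
        totals[key] += sample.count
        if sample.outcome != "ok":
            failures[key] += sample.count
    # Skip pairs with no calls to avoid dividing by zero.
    return {key: failures[key] / total for key, total in totals.items() if total > 0}


if __name__ == "__main__":
    samples = [
        ToolCallSample("playwright", "screenshot", "ok", 30),
        ToolCallSample("playwright", "screenshot", "timeout", 10),
        ToolCallSample("playwright", "navigate", "ok", 50),
    ]
    for (server, tool), rate in sorted(failure_rates(samples).items()):
        print(f"{server}/{tool}: {rate:.0%} failed")
```

Sorting the result by rate, and cross-checking against the duration histogram for the same tool, separates tools that fail outright from tools that are merely slow.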
Circuit breaker state
adapter_upstream_circuit_breaker_state is a gauge per server (0 = closed, 1 = half-open, 2 = open). An open state means the adapter is rejecting all calls to that server without trying to reach it. Alert on this metric if you need to know when an upstream becomes unavailable.
adapter_upstream_ping_total counts health pings by result. A rising failure count is an early warning before the breaker opens.
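The gauge values map directly onto an enumeration, which keeps alerting code readable. The sketch below uses the documented encoding (0 = closed, 1 = half-open, 2 = open); `BreakerState` and `open_servers` are hypothetical names, and the input dictionary stands in for whatever your metrics backend returns.

```python
from enum import IntEnum


class BreakerState(IntEnum):
    """Documented values of adapter_upstream_circuit_breaker_state."""
    CLOSED = 0
    HALF_OPEN = 1
    OPEN = 2


def open_servers(gauge_by_server: dict[str, int]) -> list[str]:
    """Return the servers whose breaker is currently open, sorted by name."""
    return sorted(
        server
        for server, value in gauge_by_server.items()
        if value == BreakerState.OPEN
    )


if __name__ == "__main__":
    latest = {"playwright": 0, "filesystem": 2, "search": 1}
    print(open_servers(latest))
```

Whether half-open should also alert is a judgment call: it indicates the adapter is probing an upstream that was recently unavailable.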
Upload activity
adapter_upload_batches_total, adapter_upload_files_total, and adapter_upload_bytes_total track staged file volume. adapter_upload_failures_total counts rejections by reason (size exceeded, expired nonce, etc.).
Artifact downloads
adapter_artifact_downloads_total counts resource-read and HTTP download requests for artifacts. adapter_artifact_download_bytes_total and adapter_artifact_download_duration_seconds give volume and latency.
Cleanup
adapter_cleanup_cycles_total and adapter_cleanup_removed_records_total confirm that the background cleanup loop is running and removing expired records. A stalled cleanup loop (no cycles for several minutes) usually means the process is overloaded.
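A stalled loop shows up as a cycle counter that stops increasing. The check below is a sketch that compares two readings of adapter_cleanup_cycles_total; the `cleanup_stalled` name and the 300-second default threshold are assumptions chosen to match "several minutes", not values taken from the adapter.

```python
def cleanup_stalled(
    previous_total: int,
    current_total: int,
    seconds_between: float,
    threshold_seconds: float = 300.0,
) -> bool:
    """Return True if no cleanup cycle completed over a long enough window.

    A lower current_total than previous_total is treated as a counter reset
    (for example a process restart) and is not reported as a stall.
    """
    if seconds_between < threshold_seconds:
        return False  # window too short to draw a conclusion
    return current_total == previous_total


if __name__ == "__main__":
    print(cleanup_stalled(previous_total=120, current_total=120, seconds_between=600))
    print(cleanup_stalled(previous_total=120, current_total=125, seconds_between=600))
```

Choose a threshold comfortably larger than your configured cleanup interval so that a single slow cycle does not trigger the check.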
Session lifecycle
adapter_sessions_lifecycle_total counts session create, expire, and revival transitions. Use this to understand session churn in multi-user deployments.
Metric catalog
The following table lists every metric name emitted by the adapter.
Note
This catalog must be kept in sync with src/remote_mcp_adapter/telemetry/otel_bootstrap.py. If the code changes, this table may become stale.
| Metric | Type | Description |
|---|---|---|
| adapter_http_requests_total | Counter | Total HTTP requests by server, method, route group, and status class |
| adapter_http_request_duration_seconds | Histogram | HTTP request latency |
| adapter_upload_batches_total | Counter | Upload batches accepted |
| adapter_upload_files_total | Counter | Files accepted by upload endpoint |
| adapter_upload_bytes_total | Counter | Total bytes persisted by upload endpoint |
| adapter_auth_rejections_total | Counter | Auth-related rejections by reason and route group |
| adapter_upstream_tool_calls_total | Counter | Proxied upstream tool calls by tool name and outcome |
| adapter_upstream_tool_call_duration_seconds | Histogram | Upstream tool call latency |
| adapter_upstream_ping_total | Counter | Active upstream pings by result |
| adapter_upstream_ping_latency_seconds | Histogram | Upstream ping latency |
| adapter_upstream_circuit_breaker_state | Gauge | Circuit breaker state per server (0=closed, 1=half_open, 2=open) |
| adapter_persistence_policy_transitions_total | Counter | Persistence policy transitions by action and source |
| adapter_nonce_operations_total | Counter | Upload nonce operations by backend and result |
| adapter_upload_credentials_total | Counter | Signed upload credential issue/validate outcomes |
| adapter_artifact_downloads_total | Counter | Artifact download attempts by result |
| adapter_artifact_download_bytes_total | Counter | Total bytes served by artifact download endpoint |
| adapter_artifact_download_duration_seconds | Histogram | Artifact download latency |
| adapter_upload_failures_total | Counter | Upload endpoint failures by reason |
| adapter_request_rejections_total | Counter | Non-auth rejections by reason and route group |
| adapter_adapter_wiring_runs_total | Counter | Adapter wiring pass outcomes |
| adapter_adapter_wiring_not_ready_servers | Gauge | Number of servers not yet wired after last wiring run |
| adapter_cleanup_cycles_total | Counter | Completed cleanup cycles by outcome |
| adapter_cleanup_removed_records_total | Counter | Records/files removed per cleanup cycle by bucket |
| adapter_sessions_lifecycle_total | Counter | Session lifecycle transitions |
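Since the table has to track otel_bootstrap.py by hand, a small script can flag drift between the two. The sketch below extracts every token that looks like an `adapter_...` metric name from both texts and reports the differences. It assumes metric names appear verbatim in the source file with the `adapter_` prefix, which may not hold; any other identifier in the source that starts with `adapter_` would show up as a false positive. `catalog_drift` is a hypothetical helper.

```python
import re

# Matches tokens such as adapter_http_requests_total. The leading \b keeps
# paths like remote_mcp_adapter/... from matching.
METRIC_NAME = re.compile(r"\badapter_[a-z0-9_]+\b")


def metric_names(text: str) -> set[str]:
    """Return every adapter_* token found in the text."""
    return set(METRIC_NAME.findall(text))


def catalog_drift(doc_text: str, source_text: str) -> tuple[set[str], set[str]]:
    """Return (names missing from the docs, names missing from the code)."""
    documented = metric_names(doc_text)
    in_code = metric_names(source_text)
    return in_code - documented, documented - in_code


if __name__ == "__main__":
    doc = "| adapter_http_requests_total | Counter |\n| adapter_old_metric_total | Counter |"
    src = 'meter.create_counter("adapter_http_requests_total")\n' \
          'meter.create_counter("adapter_new_metric_total")'
    undocumented, stale = catalog_drift(doc, src)
    print("undocumented:", sorted(undocumented))
    print("stale:", sorted(stale))
```

In practice the two strings would be read from this page and from src/remote_mcp_adapter/telemetry/otel_bootstrap.py, for example as a CI step that fails when either set is non-empty.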
Next steps
- See also: Configuration — add the telemetry block to your config.
- See also: Config Reference — all telemetry.* fields.
- See also: Health — the health endpoint for operational diagnostics.