Skip to content

Telemetry

What you'll learn here: what observability signals the adapter emits, how to enable them, and what to look for once your collector is receiving data.


Signals

The adapter emits two types of OpenTelemetry signals:

  • Metrics — counters, histograms, and gauges tracking requests, tool calls, uploads, artifacts, circuit breaker state, session lifecycle, and cleanup activity. These are always available when telemetry is enabled.
  • Logs — optionally, application log records can be forwarded as OTel log records to your collector. This is controlled separately by telemetry.emit_logs. It requires HTTP transport when using a dedicated logs_endpoint.

Distributed traces are not currently emitted. Do not configure a trace exporter — there is nothing to receive.


Enabling telemetry

Telemetry is off by default. To enable it, add a telemetry section to your config.yaml pointing at your OTLP collector:

telemetry:
  enabled: true
  transport: "grpc"
  endpoint: "http://otel-collector:4317"
  insecure: true
  service_name: "remote-mcp-adapter"
  export_interval_seconds: 15

For HTTP transport (useful with managed observability platforms that accept OTLP/HTTP):

telemetry:
  enabled: true
  transport: "http"
  endpoint: "https://otel.example.com/v1/metrics"
  insecure: false
  headers:
    Authorization: "Bearer ${OTEL_API_TOKEN}"
  emit_logs: true
  logs_endpoint: "https://otel.example.com/v1/logs"

The adapter lazy-imports the OpenTelemetry SDK at startup. If the SDK is not installed, telemetry is silently disabled at runtime with a log warning. Make sure opentelemetry-sdk, opentelemetry-exporter-otlp-proto-grpc (or -http), and opentelemetry-api are in your environment.


What to look for

Once data is flowing, these metrics give you the most useful operational picture:

Request throughput and latency

adapter_http_requests_total counts every HTTP request handled by the adapter, labelled by server, HTTP method, route group, and response status class. adapter_http_request_duration_seconds is the matching latency histogram. Sudden spikes in 5xx responses or elevated p99 latency here usually point to an upstream problem.

Upstream tool call performance

adapter_upstream_tool_calls_total counts proxied calls by server, tool name, and outcome (ok or the error type). adapter_upstream_tool_call_duration_seconds histograms the round-trip time. Use these to find slow or flaky tools on specific upstream servers.

Circuit breaker state

adapter_upstream_circuit_breaker_state is a gauge per server (0 = closed, 1 = half-open, 2 = open). An open state means the adapter is rejecting all calls to that server without trying to reach it. Alert on this metric if you need to know when an upstream becomes unavailable.

adapter_upstream_ping_total counts health pings by result. A rising failure count is an early warning before the breaker opens.

Upload activity

adapter_upload_batches_total, adapter_upload_files_total, and adapter_upload_bytes_total track staged file volume. adapter_upload_failures_total counts rejections by reason (size exceeded, expired nonce, etc.).

Artifact downloads

adapter_artifact_downloads_total counts resource-read and HTTP download requests for artifacts. adapter_artifact_download_bytes_total and adapter_artifact_download_duration_seconds give volume and latency.

Cleanup

adapter_cleanup_cycles_total and adapter_cleanup_removed_records_total confirm that the background cleanup loop is running and removing expired records. A stalled cleanup loop (no cycles for several minutes) usually means the process is overloaded.

Session lifecycle

adapter_sessions_lifecycle_total counts session create, expire, and revival transitions. Use this to understand session churn in multi-user deployments.


Metric catalog

The following table lists every metric name emitted by the adapter.

Note

This catalog must be kept in sync with src/remote_mcp_adapter/telemetry/otel_bootstrap.py. If the code changes, this table may become stale.

Metric Type Description
adapter_http_requests_total Counter Total HTTP requests by server, method, route group, and status class
adapter_http_request_duration_seconds Histogram HTTP request latency
adapter_upload_batches_total Counter Upload batches accepted
adapter_upload_files_total Counter Files accepted by upload endpoint
adapter_upload_bytes_total Counter Total bytes persisted by upload endpoint
adapter_auth_rejections_total Counter Auth-related rejections by reason and route group
adapter_upstream_tool_calls_total Counter Proxied upstream tool calls by tool name and outcome
adapter_upstream_tool_call_duration_seconds Histogram Upstream tool call latency
adapter_upstream_ping_total Counter Active upstream pings by result
adapter_upstream_ping_latency_seconds Histogram Upstream ping latency
adapter_upstream_circuit_breaker_state Gauge Circuit breaker state per server (0=closed, 1=half_open, 2=open)
adapter_persistence_policy_transitions_total Counter Persistence policy transitions by action and source
adapter_nonce_operations_total Counter Upload nonce operations by backend and result
adapter_upload_credentials_total Counter Signed upload credential issue/validate outcomes
adapter_artifact_downloads_total Counter Artifact download attempts by result
adapter_artifact_download_bytes_total Counter Total bytes served by artifact download endpoint
adapter_artifact_download_duration_seconds Histogram Artifact download latency
adapter_upload_failures_total Counter Upload endpoint failures by reason
adapter_request_rejections_total Counter Non-auth rejections by reason and route group
adapter_adapter_wiring_runs_total Counter Adapter wiring pass outcomes
adapter_adapter_wiring_not_ready_servers Gauge Number of servers not yet wired after last wiring run
adapter_cleanup_cycles_total Counter Completed cleanup cycles by outcome
adapter_cleanup_removed_records_total Counter Records/files removed per cleanup cycle by bucket
adapter_sessions_lifecycle_total Counter Session lifecycle transitions

Next steps

  • See also: Configuration — add the telemetry block to your config.
  • See also: Config Reference — all telemetry.* fields.
  • See also: Health — the health endpoint for operational diagnostics.