
Overview

API2OTEL is a YAML-driven async scraper that turns uninstrumented HTTP/JSON APIs into first-class OpenTelemetry metrics and logs, without building one-off exporters or wiring custom code. Point it at the APIs that hide your operational or business state and it will poll, extract, shape, deduplicate, and emit telemetry through the OTEL pipeline you already run.

Most API surfaces (SaaS, internal platforms, scheduled batch endpoints) already contain answers to questions teams ask in dashboards: queue depth, job runtimes, sync failures, external SLAs, integration throughput. They rarely expose native OTEL or Prometheus signals. The usual "solution" becomes a patchwork of cron scripts, throwaway Python, or bespoke collectors that are hard to extend and impossible to standardize.

API2OTEL focuses on turning that glue work into a declarative layer:

  • Define sources, auth, scrape cadence, and time windows in one file.
  • Map raw JSON fields to gauges, counters, histograms, and structured logs.
  • Apply record filtering, volume caps, and fingerprint‑based deduplication so backends stay lean.
  • Run historical backfills (range scrapes) and ongoing incremental polls side by side.
  • Observe the scraper itself (self‑telemetry) to catch stalls, slow scrapes, or ineffective dedupe.

Instead of writing a mini integration for every API, you version a config, commit it, and gain portable, reviewable observability coverage.

💡 The Problem

Most teams run critical flows on systems they don't control:

  • SaaS platforms: Workday, ServiceNow, Jira, GitHub, Salesforce, …
  • Internal tools: Only expose REST/HTTP APIs or "download report" endpoints
  • Batch runners: Emit JSON or CSV, not OTEL signals

These teams typically already have an observability stack built on OpenTelemetry, but bridging those APIs into it usually ends up as messy one-offs:

  • Python scripts + cron that nobody owns
  • SaaS-specific "exporters" that can't be reused across products
  • JSON dumps and screenshots instead of real metrics

🎯 The Solution

Make this reusable and standard:

API data → extract records → emit OTLP → your collector

No code changes. No vendor lock-in. Everything flows through your existing OTEL stack.

📋 What It Does

API2OTEL is a config-driven async service that:

  • Polls any HTTP API or data endpoint
  • Extracts records from JSON responses
  • Maps them to OTEL metrics (gauges, counters, histograms) and logs
  • Emits everything via OTLP to your collector

       [ APIs / data endpoints ]
                ↓ HTTP
       API2OTEL (this service)
                ↓ OTLP (gRPC/HTTP)
      OpenTelemetry Collector
                ↓
      Prometheus / Grafana / Loki / …

Entirely YAML-driven. Add or update sources by editing config; no code needed.
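
On the collector side, nothing special is required: any OpenTelemetry Collector with an OTLP receiver can accept what API2OTEL sends. A minimal, illustrative collector config might look like the following (standard collector components; the endpoints and exporters shown are examples, not requirements):

    receivers:
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318

    exporters:
      prometheus:              # scrape endpoint for Prometheus/Grafana
        endpoint: 0.0.0.0:8889
      debug: {}                # print received logs while testing

    service:
      pipelines:
        metrics:
          receivers: [otlp]
          exporters: [prometheus]
        logs:
          receivers: [otlp]
          exporters: [debug]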

βš™οΈ Key Features

🔧 Config-Driven Scraping

Declare every source in YAML with:

  • Frequency (5min, 1h, 1d, …)
  • Scrape mode (range with start/end or relative windows; instant snapshots)
  • Time formats (global + per-source)
  • Query params (time keys, extra args, URL encoding rules)

Add or change sources by editing config; no code changes required.
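
As a rough sketch of what that looks like (key names here are illustrative; the config reference and template linked below are authoritative):

    sources:
      - name: jira_issue_sync              # example source name
        frequency: 5min                    # how often to poll
        url: https://jira.example.com/rest/api/search    # endpoint to scrape
        scrapeMode: range                  # or: instant (assumed key/values)
        window:
          relative: last_15m               # relative time window (assumed syntax)
        timeFormat: "%Y-%m-%dT%H:%M:%SZ"   # per-source time format
        queryParams:
          startKey: updatedAfter           # time keys passed as query args
          endKey: updatedBefore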
Full config explained: Click here

Download Config Template

πŸ” Rich Authentication Strategies

Built-in auth support:

  • Basic Auth: Username/password via environment variables
  • API Key Headers: Static or environment-sourced keys (e.g., X-API-Key)
  • OAuth: Static token or runtime fetch with configurable HTTP GET/POST body and response parsing
  • Azure AD: Client credentials flow for enterprise identity

Tokens are fetched asynchronously and reused per source.
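
As an illustration (exact keys belong to the config reference, not this sketch), an OAuth runtime-fetch block for a source might look roughly like:

    auth:
      type: oauth                            # basic | apiKey | oauth | azuread (illustrative values)
      tokenUrl: https://login.example.com/oauth/token
      method: POST                           # runtime fetch via GET or POST
      body:
        client_id: ${OAUTH_CLIENT_ID}        # resolved from environment variables
        client_secret: ${OAUTH_CLIENT_SECRET}
        grant_type: client_credentials
      tokenField: access_token               # where the token lives in the response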

⚡ Async Concurrency

  • Asyncio/httpx end-to-end
  • Global concurrency limit plus per-source limits
  • Range scrapes can split into sub-windows and run in parallel within limits
  • Stay within rate caps while scraping multiple systems
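
Sketched with illustrative keys (the config reference has the real ones), those limits might be declared like:

    concurrency:
      globalLimit: 10          # maximum in-flight requests across all sources
    sources:
      - name: servicenow_jobs
        maxParallel: 3         # per-source cap; also bounds parallel sub-windows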

🧹 Filtering & Volume Control

  • Drop rules: Exclude records matching conditions
  • Keep rules: Only include records matching conditions
  • Per-scrape caps: Limit records emitted per execution
  • Protects metrics backends and logging costs from noisy sources
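
For example, a source could carry rules like the following (illustrative keys; conditions and caps are configurable per source):

    filters:
      drop:
        - field: status
          equals: SKIPPED          # exclude records matching this condition
      keep:
        - field: environment
          equals: prod             # only include production records
    maxRecordsPerScrape: 500       # per-scrape volume cap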

🔄 Delta Detection via Fingerprinting

  • Fingerprints stored in SQLite or Valkey (Redis-compatible)
  • Configurable TTL and fingerprint keys/modes
  • Historical scrapes and frequent "last N hours" polls without duplicate spam
  • Scheduler state (last-success timestamps) shares the same backend
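
A sketch of what the delta-detection settings could look like (the store types come from the docs; the key names are illustrative):

    deltaDetection:
      store: valkey                          # or: sqlite
      url: valkey://dedupe-cache:6379        # connection for the fingerprint store
      ttl: 7d                                # how long fingerprints are remembered
      fingerprintKeys: [id, status, updated_at]   # fields hashed into the fingerprint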

📊 Metrics Mapping

  • Gauges, counters, histograms from dataKey or fixedValue
  • Attributes can emit counters via asMetric
  • Per-source logs with configurable emission
  • Severity mapping from record fields
  • Labels derived from attributes and optional metric labels
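
Roughly, a metrics block might map record fields like this (dataKey, fixedValue, and asMetric are the concepts named above; the surrounding structure is illustrative):

    metrics:
      - name: job_duration_seconds
        type: histogram
        dataKey: durationSeconds       # value read from each record
      - name: sync_runs_total
        type: counter
        fixedValue: 1                  # emit 1 per record
    attributes:
      - name: status
        dataKey: status                # becomes a label on the metrics above
        asMetric: true                 # additionally emit a counter per status value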

πŸ“ Log Emission with Severity Mapping

  • Records become OTEL logs with severity derived from a configured field
  • Attributes align with metrics for easy pivots in observability tools
  • Per-source opt-out for logs where they're not needed
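
A minimal sketch of the log side, assuming illustrative key names:

    logs:
      enabled: true              # per-source opt-out when logs are not needed
      severityField: level       # record field mapped to OTEL severity
      severityMap:
        ERROR: ERROR
        WARN: WARN
        SUCCESS: INFO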

βš–οΈ When to Use

✅ Perfect For:

  • Metrics/logs about business processes only available as API responses
  • Adding new sources to an existing OTEL collector
  • Complex auth (OAuth, Azure AD) and time windows (historical backfills, relative ranges)
  • Data deduplication and volume control

❌ Not Needed For:

  • Systems already emitting OTLP or Prometheus natively
  • Simple uptime checks (use the collector's httpcheckreceiver)
  • One-off custom exporters for specific vendors

🚀 Quick Concepts

Sources

A source is a single API endpoint to scrape. Each source:

  • Has a name and frequency (how often to poll)
  • Uses an auth strategy (or none)
  • Defines scrape mode (instant or range-based)
  • Specifies how to extract records from the response (via dataKey)
  • Maps records to metrics and logs

Scrape Modes

  • Instant: Snapshot at a point in time. No time windows involved.
  • Range: Scrape a time range (e.g., "last 15 minutes"). Supports parallel sub-windows for efficiency.
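
The two modes side by side, as an illustrative sketch (key names assumed, not taken from the config reference):

    sources:
      - name: queue_depth_snapshot
        scrapeMode: instant              # point-in-time snapshot, no time window
      - name: job_history_backfill
        scrapeMode: range
        window:
          start: "2024-01-01T00:00:00Z"  # historical backfill start
          end: "2024-03-01T00:00:00Z"
          splitInto: 1d                  # sub-windows scraped in parallel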

Fingerprinting & Deduplication

Each record is fingerprinted (MD5 hash). On scrape:

  1. Extract records from API
  2. Pass through filters (drop/keep rules)
  3. Check fingerprint store: hit = skip (seen before), miss = emit
  4. Store new fingerprints with TTL

Prevents duplicate metrics while enabling historical backfills.

Self-Telemetry

When enabled, API2OTEL emits its own metrics about scraping health:

  • Scrape duration and success/error rates
  • Deduplication hit/miss rates
  • Cleanup job performance

Monitor the scraper itself, not just the data it extracts.

πŸ—οΈ Architecture at a Glance

┌──────────────────────────────────────────────────┐
│          Configuration (YAML)                    │
│  - Sources, auth, metrics, filters, attributes   │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│        Scheduler (APScheduler)                   │
│  - Frequency-based job scheduling                │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│      Scraper Engine (AsyncIO)                    │
│  - HTTP fetching, window calculation, concurrency│
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│    Record Pipeline                               │
│  - Filtering, limits, delta detection            │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│      Telemetry (OTEL SDK)                        │
│  - Metrics (gauges, counters, histograms)        │
│  - Logs with severity mapping                    │
│  - Self-telemetry (optional)                     │
└──────────────────────────────────────────────────┘
                         ↓
┌──────────────────────────────────────────────────┐
│    OTLP Exporters (gRPC or HTTP)                 │
│  - Send to OpenTelemetry Collector               │
└──────────────────────────────────────────────────┘

Ready to turn your APIs into observable signals? Let's go! 🚀