Global Configuration Reference
This document describes all global-level configuration options under the scraper section. These settings control the overall behavior of the API-to-OTEL scraper controller.
Table of Contents
- Telemetry and Observability
- Service Configuration
- Time and Formatting
- Concurrency Control
- Fingerprint Store (Delta Detection)
- Source-Level Overrides
Telemetry and Observability
enableSelfTelemetry
- Type: `boolean`
- Default: `false`
- Source Override: ❌ No
- Description: When enabled, the controller generates telemetry (metrics and logs) about its own operation and sends them to the configured OTEL collector. This includes internal metrics like scrape durations, error counts, and resource usage.
- Use Case: Enable this in production to monitor the health and performance of the scraper itself.
serviceName
- Type: `string`
- Default: `"otel-api-scrapper"`
- Source Override: ✅ Yes (via `name` field per source)
- Description: The default service name used in OTEL resource attributes when emitting telemetry. Each API source can override this with its own `name` field, which becomes the service name for that source's telemetry.
- Use Case: Set a meaningful global name for your scraper deployment. Individual sources will use their own names to distinguish telemetry streams.
logLevel
- Type: `string`
- Default: `"debug"`
- Options: `debug`, `info`, `warn`, `error`
- Source Override: ❌ No
- Description: Controls the verbosity of controller logs. This affects logs from the scraper framework itself, not the logs generated from API data (which are controlled per source).
- Use Case: Use `debug` during development, `info` or `warn` in production.
dryRun
- Type: `boolean`
- Default: `false`
- Source Override: ❌ No
- Description: When enabled, the scraper will NOT send any telemetry to the OTEL collector. Instead, it logs what would have been emitted (metrics and logs) to stdout for validation and debugging. No OTEL exporters are initialized.
- Use Case: Test your configuration and see what metrics/logs would be generated without actually sending them.
Service Configuration
otelCollectorEndpoint
- Type: `string`
- Required: ✅ Yes
- Source Override: ❌ No
- Description: The endpoint URL of your OpenTelemetry collector where all generated metrics and logs are sent.
- Examples:
  - `"http://otel-collector:4317"` (gRPC in Docker network)
  - `"http://localhost:4318"` (HTTP on host machine)
  - `"https://otel-collector.example.com:4317"` (TLS-enabled)
- Use Case: Point to your OTEL collector. Use container DNS names when running in Docker, `localhost` when running on the host.
otelTransport
- Type: `string`
- Default: `"grpc"`
- Options: `grpc`, `http`
- Source Override: ❌ No
- Description: The transport protocol to use for OTLP exports. gRPC is more efficient; HTTP is more firewall-friendly.
- Use Case: Use `grpc` for better performance (port 4317), `http` if gRPC is blocked (port 4318).
enforceTls
- Type: `boolean`
- Default: `true`
- Source Override: ❌ No
- Description: When `true`, enforces TLS/SSL when connecting to the OTEL collector. Set to `false` for local development with `http://` endpoints.
- Use Case: Set to `false` for local testing, `true` in production with HTTPS endpoints.
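The interaction between `enforceTls` and the endpoint scheme can be sanity-checked before deployment. The sketch below is only an illustration: `check_endpoint` is a hypothetical helper, and the rule it applies (an `https://` URL is required whenever `enforceTls` is true) is an assumption drawn from the description above, not the controller's actual validation logic.

```python
from urllib.parse import urlparse


def check_endpoint(endpoint: str, enforce_tls: bool) -> None:
    """Raise ValueError if the endpoint scheme conflicts with enforce_tls.

    Hypothetical pre-flight check; the real controller may behave differently.
    """
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"unsupported scheme in {endpoint!r}")
    if enforce_tls and scheme != "https":
        raise ValueError(f"enforceTls is true but {endpoint!r} is not an https:// URL")


# Local development: plain HTTP with TLS enforcement switched off.
check_endpoint("http://localhost:4318", enforce_tls=False)
# Production: TLS-enabled endpoint with enforcement on.
check_endpoint("https://otel-collector.example.com:4317", enforce_tls=True)
```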
terminateGracefully
- Type: `boolean`
- Default: `true`
- Source Override: ❌ No
- Description: When `true`, the scraper waits for in-flight scrapes to complete before exiting after it receives a shutdown signal. When `false`, it terminates immediately.
- Use Case: Keep as `true` to avoid data loss. Set to `false` if you need fast shutdowns.
servicePort
- Type: `integer`
- Default: `80`
- Source Override: ❌ No
- Description: The HTTP port on which the scraper exposes its admin API, health checks, and other HTTP endpoints.
- Use Case: Change this if port 80 conflicts with other services (e.g., use 8080).
enableAdminApi
- Type: `boolean`
- Default: `false`
- Source Override: ❌ No
- Description: Enables the admin REST API which allows you to trigger manual scrapes, list configured sources, and view the effective configuration at runtime.
- Use Case: Enable this for operational control. Secure it with `adminSecretEnv`.
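Securing the admin API comes down to reading the secret from the environment variable named by `adminSecretEnv` and comparing it with the token a client presents. The following is a minimal sketch of that idea. The function names are hypothetical, and how clients actually transmit the token (header, query parameter, etc.) is not specified here, so the sketch only covers the lookup and comparison.

```python
import hmac
import os


def load_admin_secret(env_name: str) -> str:
    """Read the admin secret from the environment variable named by adminSecretEnv."""
    secret = os.environ.get(env_name, "")
    if not secret:
        raise RuntimeError(f"environment variable {env_name} is not set or empty")
    return secret


def is_authorized(presented_token: str, secret: str) -> bool:
    """Compare in constant time so timing does not reveal how much of the token matched."""
    return hmac.compare_digest(presented_token.encode(), secret.encode())
```

Failing fast when the variable is missing avoids running an admin API that accepts an empty token.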
adminSecretEnv
- Type: `string`
- Required: Only when `enableAdminApi: true`
- Source Override: ❌ No
- Description: Name of the environment variable containing the secret token for authenticating admin API requests.
- Example: `"ADMIN_SECRET"` → the scraper reads `os.environ["ADMIN_SECRET"]`
- Use Case: Set this to secure your admin API. Clients must send this token in requests.
Time and Formatting
defaultTimeFormat
- Type: `string` (Python `strftime` format)
- Default: `"%Y-%m-%dT%H:%M:%S%z"`
- Source Override: ✅ Yes (via `scrape.timeFormat` or `rangeKeys.dateFormat`)
- Description: The default format string used to format datetime values when making API requests (start/end times). This is a Python `strftime` format string.
- Examples:
  - `"%Y-%m-%dT%H:%M:%S%z"` → `2025-11-28T10:15:00+0000`
  - `"%Y-%m-%d %H:%M:%S"` → `2025-11-28 10:15:00`
  - `"%s"` → Unix timestamp (seconds since epoch)
- Use Case: Set this to match what your APIs expect. Override per source if different APIs need different formats.
Concurrency Control
allowOverlapScans
- Type: `boolean`
- Default: `false`
- Source Override: ✅ Yes (via source-level `allowOverlapScans`)
- Description: Global default for whether scrapes can overlap. If `false`, a source will not start a new scrape while a previous one is still running. Each source can override this.
- Use Case: Set to `true` if your APIs can handle overlapping requests. Override per source for fine-grained control.
maxGlobalConcurrency
- Type: `integer`
- Default: `10`
- Source Override: ❌ No (but sources have their own limits)
- Description: Maximum number of concurrent HTTP requests allowed across ALL sources at any given time. This is a hard global limit enforced by the controller.
- Use Case: Prevent the scraper from overwhelming your network or downstream APIs. Adjust based on your infrastructure capacity.
defaultSourceConcurrency
- Type: `integer`
- Default: `4`
- Source Override: ✅ Yes (via `scrape.maxConcurrency`)
- Description: Default maximum number of concurrent HTTP requests per individual source. Each source can override this with its own `scrape.maxConcurrency` setting.
- Use Case: Set a reasonable default. Override for sources that can handle higher concurrency or need to be throttled.
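To see how a global cap and a per-source cap combine, consider this sketch built on `asyncio.Semaphore`. It is an illustration of two-level limiting in general, assuming the documented defaults of 10 and 4. The source names, the `run_demo` function, and the sleep standing in for an HTTP call are all hypothetical, and the controller's real implementation may differ.

```python
import asyncio


async def run_demo(max_global: int = 10, per_source: int = 4) -> int:
    """Run 8 fake requests for each of 3 sources and return the peak number in flight."""
    global_limit = asyncio.Semaphore(max_global)
    source_limits = {
        name: asyncio.Semaphore(per_source) for name in ("api-a", "api-b", "api-c")
    }
    in_flight = 0
    peak = 0

    async def fetch(source: str) -> None:
        nonlocal in_flight, peak
        # A request needs a slot from its own source and one from the global pool.
        async with source_limits[source], global_limit:
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # stand-in for the HTTP call
            in_flight -= 1

    await asyncio.gather(*(fetch(s) for s in source_limits for _ in range(8)))
    return peak


peak = asyncio.run(run_demo())
print(peak)  # never exceeds max_global
```

Three sources at 4 requests each could reach 12 in flight on their own; the global semaphore is what holds the total at or below 10.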
Fingerprint Store (Delta Detection)
The fingerprint store is used for delta detection: preventing duplicate records from being emitted when the same data is scraped multiple times.
fingerprintStore.backend
- Type: `string`
- Default: `"sqlite"`
- Options: `sqlite`, `valkey`, `redis`
- Source Override: ❌ No
- Description: The storage backend for fingerprints.
  - `sqlite`: Local file-based storage, suitable for single-instance deployments
  - `valkey` / `redis`: External Valkey/Redis instance, suitable for distributed/HA deployments
- Use Case: Use `sqlite` for simple deployments. Use `valkey` / `redis` if running multiple scraper instances that need to share fingerprint state.
fingerprintStore.maxEntriesPerSource
- Type: `integer`
- Default: `50000`
- Source Override: ✅ Yes (via `deltaDetection.maxEntries`)
- Description: Maximum number of fingerprints to store per source. When this limit is reached, older fingerprints are evicted using an LRU policy.
- Use Case: Adjust based on your data volume. Higher values mean longer duplicate detection windows but more memory/storage usage.
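The effect of a per-source entry limit with LRU eviction can be illustrated with a small in-memory model. This is a sketch of the general technique only: `FingerprintCache` is a hypothetical class, and the real store lives in SQLite or Valkey/Redis rather than in a Python dictionary.

```python
from collections import OrderedDict


class FingerprintCache:
    """In-memory stand-in for one source's fingerprint set (hypothetical)."""

    def __init__(self, max_entries: int) -> None:
        self.max_entries = max_entries
        self._seen = OrderedDict()  # fingerprint -> None, oldest use first

    def is_duplicate(self, fingerprint: str) -> bool:
        """Return True if the fingerprint was already stored; record it either way."""
        if fingerprint in self._seen:
            self._seen.move_to_end(fingerprint)  # mark as most recently used
            return True
        self._seen[fingerprint] = None
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)  # evict the least recently used entry
        return False


cache = FingerprintCache(max_entries=2)
first_a = cache.is_duplicate("a")   # False: new record
first_b = cache.is_duplicate("b")   # False: new record
again_a = cache.is_duplicate("a")   # True: already seen, now most recently used
first_c = cache.is_duplicate("c")   # False: new record, evicts "b"
again_b = cache.is_duplicate("b")   # False: "b" was evicted, so it is emitted again
```

The last call shows the trade-off named in the use case: once a fingerprint is evicted, the same record is no longer recognized as a duplicate.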
fingerprintStore.defaultTtlSeconds
- Type: `integer`
- Default: `86400` (24 hours)
- Source Override: ✅ Yes (via `deltaDetection.ttlSeconds`)
- Description: How long fingerprints are kept before being eligible for cleanup. After TTL expires, duplicates may be emitted again.
- Use Case: Set based on how long you want to suppress duplicates. 24 hours is reasonable for daily scrapes.
fingerprintStore.cleanupIntervalSeconds
- Type: `integer`
- Default: `3600` (1 hour)
- Source Override: ❌ No
- Description: How often the background cleanup job runs to remove expired fingerprints.
- Use Case: Lower values = more frequent cleanup (less storage) but more overhead. Adjust based on your TTL and data volume.
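TTL and cleanup interval work together: the TTL decides when a fingerprint becomes eligible for removal, and the cleanup job decides when it is actually removed. The sketch below models one cleanup pass over an in-memory mapping of fingerprint to storage time. Both function names and the dictionary layout are assumptions for illustration, not the store's real schema.

```python
def is_expired(stored_at: float, ttl_seconds: int, now: float) -> bool:
    """A fingerprint is eligible for cleanup once it is at least ttl_seconds old."""
    return now - stored_at >= ttl_seconds


def cleanup(store: dict, ttl_seconds: int, now: float) -> int:
    """Remove expired fingerprints from the store and return how many were removed."""
    expired = [
        fingerprint
        for fingerprint, stored_at in store.items()
        if is_expired(stored_at, ttl_seconds, now)
    ]
    for fingerprint in expired:
        del store[fingerprint]
    return len(expired)


# Timestamps are seconds on an arbitrary clock; "old" is past the 24-hour TTL.
store = {"old": 0.0, "recent": 90_000.0}
removed = cleanup(store, ttl_seconds=86_400, now=100_000.0)
print(removed, sorted(store))  # 1 ['recent']
```

Because removal only happens when the job runs, an expired fingerprint can linger for up to one cleanup interval beyond its TTL.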
fingerprintStore.lockRetries
- Type: `integer`
- Default: `5`
- Source Override: ❌ No
- Description: Number of retries when SQLite reports "database is locked" (applies to sqlite backend only). This helps prevent failures when multiple processes access the database concurrently.
- Use Case: Increase this value if you experience lock contention with SQLite. Set to 0 to disable retries.
fingerprintStore.lockBackoffSeconds
- Type: `float`
- Default: `0.1` (100ms)
- Source Override: ❌ No
- Description: Initial backoff, in seconds, between lock retries. The delay doubles after each retry, up to a maximum of 1 second.
- Use Case: Adjust this if you need to tune retry timing. Lower values retry faster, higher values reduce CPU spinning.
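With the defaults for `lockRetries` and `lockBackoffSeconds`, the retry delays follow a doubling schedule capped at 1 second. The helper below computes that schedule. `lock_backoff_delays` is a hypothetical name, and the sketch assumes the first retry waits exactly the initial backoff, which the description implies but does not state outright.

```python
def lock_backoff_delays(retries: int = 5, initial: float = 0.1, cap: float = 1.0) -> list:
    """Return the wait before each retry: doubling from `initial`, never above `cap`."""
    delays = []
    delay = initial
    for _ in range(retries):
        delays.append(min(delay, cap))
        delay *= 2
    return delays


print(lock_backoff_delays())  # [0.1, 0.2, 0.4, 0.8, 1.0]
```

Under these assumptions, a write that stays locked through all five retries waits about 2.5 seconds in total before failing.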
fingerprintStore.sqlite.path
- Type: `string`
- Default: `"./scraper_fingerprints.db"`
- Source Override: ❌ No
- Description: File path for the SQLite database (only used when `backend: sqlite`). The file is created if it doesn't exist.
- Use Case: Change this if you want to store the database in a different location (e.g., a persistent volume in Docker).
fingerprintStore.valkey.*
- Source Override: ❌ No
- Description: Valkey/Redis connection settings (only used when `backend: valkey` or `redis`).
  - `host`: Redis server hostname/IP
  - `port`: Redis server port (default 6379)
  - `db`: Redis database index (default 0)
  - `password`: Authentication password (optional)
  - `ssl`: Enable SSL/TLS connection (default false)
- Use Case: Configure these when using an external Redis/Valkey instance.
Source-Level Overrides
The following global settings can be overridden at the individual source level:
| Global Setting | Source-Level Override | Field Path |
|---|---|---|
| `serviceName` | ✅ Yes | `sources[].name` |
| `defaultTimeFormat` | ✅ Yes | `sources[].scrape.timeFormat` or `sources[].scrape.rangeKeys.dateFormat` |
| `allowOverlapScans` | ✅ Yes | `sources[].allowOverlapScans` |
| `defaultSourceConcurrency` | ✅ Yes | `sources[].scrape.maxConcurrency` |
| `fingerprintStore.maxEntriesPerSource` | ✅ Yes | `sources[].deltaDetection.maxEntries` |
| `fingerprintStore.defaultTtlSeconds` | ✅ Yes | `sources[].deltaDetection.ttlSeconds` |
Override Priority: Source-level settings always take precedence over global defaults when both are specified.
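The precedence rule can be expressed as a small lookup: walk the source's field path and fall back to the global value if any part of the path is missing. The sketch below assumes the configuration has already been parsed into plain dictionaries. `effective_value` is a hypothetical helper, not part of the scraper.

```python
def effective_value(global_value, source: dict, path: tuple):
    """Return the source-level value at `path` if present, else the global default."""
    node = source
    for key in path:
        if not isinstance(node, dict) or key not in node:
            return global_value  # no override configured for this source
        node = node[key]
    return node


# One source that overrides concurrency but nothing else.
source = {"name": "my-api", "scrape": {"maxConcurrency": 10}}

concurrency = effective_value(5, source, ("scrape", "maxConcurrency"))
ttl_seconds = effective_value(86400, source, ("deltaDetection", "ttlSeconds"))
print(concurrency, ttl_seconds)  # 10 86400
```

Checking for key presence rather than truthiness matters here: a source that explicitly sets `allowOverlapScans: false` still overrides a global `true`.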
Example Configuration
scraper:
  enableSelfTelemetry: true
  serviceName: "my-api-scraper"
  logLevel: "info"
  otelCollectorEndpoint: "http://otel-collector:4318"
  otelTransport: "http"
  enforceTls: false
  defaultTimeFormat: "%Y-%m-%dT%H:%M:%S%z"
  maxGlobalConcurrency: 20
  defaultSourceConcurrency: 5
  enableAdminApi: true
  adminSecretEnv: "SCRAPER_ADMIN_SECRET"
  fingerprintStore:
    backend: "sqlite"
    maxEntriesPerSource: 100000
    defaultTtlSeconds: 86400
    cleanupIntervalSeconds: 3600
    sqlite:
      path: "/data/fingerprints.db"
sources:
  - name: "my-api"
    # This source overrides the default concurrency
    scrape:
      maxConcurrency: 10
    # ... rest of source config