Telemetry & Observability

Monitor request flow, model performance, and infrastructure health inside your deployment boundary.

Last updated on February 16, 2026

Atlas emits structured logs and metrics, with optional tracing, so your team can monitor and operate deployments reliably at scale. Telemetry is generated locally and stays within your infrastructure unless you explicitly export it.

Signals You Can Track

Request Metrics

Latency, throughput, token counts, and routing decisions.

Infrastructure Health

GPU/CPU utilization, memory pressure, queue depth, and errors.

Model Behavior

Completion length, streaming usage, cache hits, and fallbacks.

Activity Journaling

The infrastructure activity journal records API usage, configuration changes, and operational events. Journaling is optional, and retention follows the policies you set.

Enable journaling for production audits, incident response, or cost attribution. Disable it for strict zero‑retention environments.

Exporting Metrics

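Metrics and logs are routed to your observability stack through deployment configuration. The available export destinations depend on your environment and security posture. The example below enables metrics with an "internal" sink, retains logs for 14 days, and leaves tracing disabled.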

yaml

telemetry:
  metrics:
    enabled: true
    sink: "internal"
  logs:
    enabled: true
    retention_days: 14
  tracing:
    enabled: false
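
Before rolling out a change to this block, it can help to sanity-check the values. The sketch below is a hypothetical helper, not part of Atlas. It assumes the YAML has already been parsed into a Python dictionary by a parser of your choice, and it checks only the keys shown above. The validation rules themselves are illustrative assumptions.

```python
from typing import Any


def check_telemetry_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a parsed telemetry block.

    Hypothetical helper: these rules are assumptions for illustration,
    not Atlas's own validation logic.
    """
    problems: list[str] = []
    telemetry = config.get("telemetry", {})

    # Every section shown in the example carries an explicit on/off switch.
    for section in ("metrics", "logs", "tracing"):
        enabled = telemetry.get(section, {}).get("enabled")
        if not isinstance(enabled, bool):
            problems.append(f"telemetry.{section}.enabled should be true or false")

    # When logs are on, expect a positive whole number of retention days.
    logs = telemetry.get("logs", {})
    retention = logs.get("retention_days")
    retention_ok = (
        isinstance(retention, int)
        and not isinstance(retention, bool)
        and retention > 0
    )
    if logs.get("enabled") is True and not retention_ok:
        problems.append("telemetry.logs.retention_days should be a positive integer")

    return problems


# The example configuration from this page, as a parsed dictionary.
example = {
    "telemetry": {
        "metrics": {"enabled": True, "sink": "internal"},
        "logs": {"enabled": True, "retention_days": 14},
        "tracing": {"enabled": False},
    }
}
print(check_telemetry_config(example))
```

An empty list means none of the assumed rules were violated. Extend the checks to match whatever constraints your own deployment enforces.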

Operational Dashboards

SLO Monitoring

Track p50/p95 latency per model
Monitor queue depth and backpressure
Detect abnormal error spikes
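
As a rough illustration of the SLO panels listed above, the following sketch computes p50/p95 latency per model and flags an error spike against a baseline. The record fields (`model`, `latency_ms`), the nearest-rank percentile method, and the spike thresholds are all assumptions chosen for the example. A production dashboard would normally rely on the aggregation functions of your metrics backend.

```python
import math
from collections import defaultdict


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile of a non-empty list; pct is in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def latency_by_model(records: list[dict]) -> dict[str, dict[str, float]]:
    """Group latencies by model and report p50 and p95 for each group."""
    grouped: dict[str, list[float]] = defaultdict(list)
    for record in records:
        grouped[record["model"]].append(record["latency_ms"])
    return {
        model: {"p50": percentile(vals, 50), "p95": percentile(vals, 95)}
        for model, vals in grouped.items()
    }


def is_error_spike(recent_rate: float, baseline_rate: float,
                   factor: float = 3.0, floor: float = 0.01) -> bool:
    """Flag a spike when the recent error rate is at least `floor` and more
    than `factor` times the baseline. Both thresholds are illustrative."""
    return recent_rate >= floor and recent_rate > factor * baseline_rate


# Hypothetical request records: ten requests with latencies 10..100 ms.
sample_requests = [
    {"model": "small", "latency_ms": float(ms)} for ms in range(10, 110, 10)
]
print(latency_by_model(sample_requests))
print(is_error_spike(recent_rate=0.08, baseline_rate=0.01))
```

The `floor` argument keeps very quiet periods from raising alerts when a single failure would otherwise look like a large relative jump.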

Cost & Efficiency

Token usage by team or API key
Routing distribution across models
Cache hit rate and batching efficiency
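
The cost and efficiency views above reduce to a few simple aggregations. The sketch below shows one way to derive token usage per API key, routing share per model, and cache hit rate from exported request records. The field names (`api_key`, `model`, `prompt_tokens`, `completion_tokens`, `cache_hit`) are hypothetical and would need to be mapped to whatever your export actually contains.

```python
from collections import Counter


def summarize_usage(records: list[dict]) -> dict:
    """Aggregate token usage, routing share, and cache hit rate."""
    tokens_by_key: Counter = Counter()
    routed_to: Counter = Counter()
    cache_hits = 0

    for record in records:
        tokens_by_key[record["api_key"]] += (
            record["prompt_tokens"] + record["completion_tokens"]
        )
        routed_to[record["model"]] += 1
        if record["cache_hit"]:
            cache_hits += 1

    total = len(records)
    return {
        "tokens_by_key": dict(tokens_by_key),
        # Fraction of requests served by each model.
        "routing_share": {m: n / total for m, n in routed_to.items()} if total else {},
        "cache_hit_rate": cache_hits / total if total else 0.0,
    }


# Hypothetical exported records for two API keys and two models.
sample_records = [
    {"api_key": "team-a", "model": "small", "prompt_tokens": 100, "completion_tokens": 50, "cache_hit": True},
    {"api_key": "team-a", "model": "small", "prompt_tokens": 200, "completion_tokens": 80, "cache_hit": False},
    {"api_key": "team-b", "model": "large", "prompt_tokens": 300, "completion_tokens": 120, "cache_hit": False},
    {"api_key": "team-b", "model": "small", "prompt_tokens": 50, "completion_tokens": 20, "cache_hit": True},
]
print(summarize_usage(sample_records))
```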

Data Retention

Telemetry data stays inside your infrastructure by default. Retention windows and export rules are set by your security and operations teams.
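
To make the idea of a retention window concrete, here is a minimal sketch of pruning records older than a configured number of days. It is not Atlas's retention mechanism. The `timestamp` field and the `prune_expired` helper are assumptions for illustration, and the 14-day value simply mirrors the `retention_days` setting in the configuration example.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def prune_expired(records: list[dict], retention_days: int,
                  now: Optional[datetime] = None) -> list[dict]:
    """Keep only records whose timestamp falls inside the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    return [record for record in records if record["timestamp"] >= cutoff]


# Hypothetical log records with timezone-aware timestamps.
reference_time = datetime(2026, 2, 16, tzinfo=timezone.utc)
sample_logs = [
    {"id": 1, "timestamp": reference_time - timedelta(days=20)},
    {"id": 2, "timestamp": reference_time - timedelta(days=3)},
]
kept = prune_expired(sample_logs, retention_days=14, now=reference_time)
print([record["id"] for record in kept])
```

Passing `now` explicitly keeps the function deterministic, which makes a retention policy easier to test before it is applied to real data.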