Atlas Core API v1.0

The secure gateway for Sovereign AI

OpenAI-compatible API gateway for sovereign models. Route, secure, and observe every request from one endpoint.

POST /v1/chat/completions HTTP/1.1
Host: api.internal.mx4
Authorization: Bearer mx4_sk_...
Content-Type: application/json

{
  "model": "auto-router",
  "messages": [{"role": "user", "content": "..."}]
}

CORE API

Built for sovereign production

Routing and observability baked in for enterprise deployments.

Unified Interface

One endpoint for Falcon, Jais, Llama, Qwen, and your fine-tuned checkpoints.

Predictable Costs

Per-team budgets, token caps, and attribution you can review.

Activity Journal

Local infrastructure activity journal with residency evidence.

Low Latency

On-prem inference next to your data, not across oceans.

GET STARTED

Launch in 3 steps

Your first request in about five minutes.

Step 1 (1 min)

Provision an API key

Create team-scoped keys, rotate on schedule, and set IP allowlists.

Step 2 (2 min)

Point your base URL

Use the OpenAI SDKs and switch `baseURL` to your Atlas endpoint.

Step 3 (2 min)

Send your first prompt

Pick a model or let routing handle cost, latency, and sovereignty.

INTEGRATION

Drop-in compatibility

Switch in minutes. Point `baseURL` to Atlas and keep the same OpenAI tooling.

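Client Integration (Node.js, TypeScript)
import OpenAI from 'openai';

// 1) Point the OpenAI client to Atlas
const client = new OpenAI({
  apiKey: 'mx4_sk_live_...',
  baseURL: 'https://api.internal.mx4/v1',
});

// 2) Use standard Chat Completions
const completion = await client.chat.completions.create({
  model: 'mx4-atlas-core',
  messages: [
    // Arabic prompt: "Summarize this financial report"
    { role: 'user', content: 'لخص هذا التقرير المالي' }
  ],
  // Optional Atlas routing hint. The Node SDK has no `extra_body` option
  // (that is Python-only), so the hint is sent as an extra top-level field.
  // This assumes Atlas reads it there, as Python's extra_body would send it.
  // @ts-expect-error routing_preference is not part of the OpenAI request types
  routing_preference: 'cost', // cost | performance | balanced
});

console.log(completion.choices[0].message);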

Supported Frameworks

LangChain
LlamaIndex
Semantic Kernel
AutoGen
Vercel AI SDK
Flowise

Supported Models

| Model | Sizes | Specialization |
| --- | --- | --- |
| Falcon-7B | 7B, 40B | Arabic Native |
| JAIS-13B | 13B | Arabic Native |
| Llama 3.1 | 8B, 70B | English / Code |
| Qwen-1.5-7B | 14B, 72B | Multilingual |

DEVELOPER TOOLS

Official SDKs

Start fast with officially supported client libraries.

Python SDK (v1.2.4)

`pip install mx4-atlas`

Node.js SDK (v1.1.0)

`npm install @mx4/atlas`

Go SDK (v0.9.8)

`go get github.com/mx4/atlas-go`

Java SDK (v1.0.1)

maven: `com.mx4.atlas`

ROUTING

Intelligent model routing

Rule-driven routing across cost, performance, language, and residency requirements.

  • Cost Optimization

    Route simple queries to smaller, cheaper models like Llama 3 8B.

  • Data Sovereignty

    Ensure sensitive data stays on-prem by forcing local routes.

  • Language Specialization

    Automatically route Arabic prompts to Jais or Falcon-7B.

routing_config.yaml (YAML)
routes:
  # Rule 1: Route Arabic to specialized model
  - name: "arabic-native"
    condition: "language == 'ar'"
    model: "jais-13b"
    
  # Rule 2: Cost optimization for simple queries
  - name: "fast-path"
    condition: "prompt_tokens < 100"
    model: "llama-3-8b-quantized"
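
The rules above read as an ordered list in which a matching condition selects a model. The TypeScript sketch below illustrates that idea under two assumptions that the configuration itself does not confirm: rules are evaluated top to bottom with the first match winning, and each `condition` string is modelled here as a predicate function rather than parsed. `RouteInput`, `RouteRule`, and `pickModel` are illustrative names, not part of the Atlas API.

```typescript
// Hypothetical request attributes that a routing rule can inspect.
interface RouteInput {
  language: string;     // e.g. "ar", "en"
  promptTokens: number; // token count of the prompt
}

// One rule: a name, a predicate standing in for the YAML `condition`, and a target model.
interface RouteRule {
  name: string;
  matches: (input: RouteInput) => boolean;
  model: string;
}

// Mirrors routing_config.yaml: Arabic first, then the short-prompt fast path.
const rules: RouteRule[] = [
  { name: "arabic-native", matches: (r) => r.language === "ar", model: "jais-13b" },
  { name: "fast-path", matches: (r) => r.promptTokens < 100, model: "llama-3-8b-quantized" },
];

// Assumption: rules are evaluated in order and the first match wins.
function pickModel(input: RouteInput, fallbackModel: string): string {
  for (const rule of rules) {
    if (rule.matches(input)) return rule.model;
  }
  return fallbackModel;
}
```

Under these assumptions a short Arabic prompt goes to `jais-13b` because the language rule is listed first, even though it would also satisfy the fast-path rule.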

API REFERENCE

Core endpoints

Production-ready endpoints with OpenAI-compatible semantics.

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /v1/chat/completions | Standard chat generation with intelligent routing |
| POST | /v1/embeddings | Vector embeddings (Multilingual-e5, Jais-embed) |
| POST | /v1/documents/upload | Ingest PDF/DOCX into sovereign vector store |
| POST | /v1/rag/query | Retrieve context with citations and confidence scores |
| GET | /v1/models | List available sovereign models (Falcon, Jais, Llama) |
| POST | /v1/activity/logs | Retrieve infrastructure activity journal entries |

Security & sovereignty headers

| Header | Description |
| --- | --- |
| X-Sovereign-ID | Unique identifier for the sovereign enclave processing the request. |
| X-Audit-Trace-ID | Trace ID linking the request to the infrastructure activity journal. |
| X-Route-Policy | Routing policy applied (e.g., local-only, latency-optimized). |
| X-Data-Residency | Confirmed location of data processing (e.g., "KSA-Riyadh-Local"). |
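
A client can read these headers from each response and compare the reported residency against what it expects. The sketch below is one way to do that in TypeScript. It assumes the headers arrive on the response, which the description suggests but does not state outright. `HeaderReader`, `readSovereigntyHeaders`, and `assertResidency` are illustrative names; `HeaderReader` is a minimal interface that the standard `Headers` object from `fetch` already satisfies.

```typescript
// Minimal shape shared by fetch's Headers and simple test doubles.
interface HeaderReader {
  get(name: string): string | null;
}

interface SovereigntyInfo {
  sovereignId: string | null;
  auditTraceId: string | null;
  routePolicy: string | null;
  dataResidency: string | null;
}

// Collect the four sovereignty headers from a response.
function readSovereigntyHeaders(headers: HeaderReader): SovereigntyInfo {
  return {
    sovereignId: headers.get("X-Sovereign-ID"),
    auditTraceId: headers.get("X-Audit-Trace-ID"),
    routePolicy: headers.get("X-Route-Policy"),
    dataResidency: headers.get("X-Data-Residency"),
  };
}

// Hypothetical client-side check: fail loudly if processing happened elsewhere.
function assertResidency(info: SovereigntyInfo, expected: string): void {
  if (info.dataResidency !== expected) {
    const got = info.dataResidency ?? "missing";
    const trace = info.auditTraceId ?? "n/a";
    throw new Error(`Unexpected residency: ${got} (trace ${trace})`);
  }
}
```

Including the trace ID in the error message makes it easier to look up the matching activity journal entry.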

Standard Error Codes

| Code | Meaning |
| --- | --- |
| 400 Bad Request | Invalid input or malformed JSON. |
| 401 Unauthorized | Invalid or missing API key. |
| 403 Forbidden | Insufficient permissions (RBAC) or data sovereignty rule violation. |
| 429 Too Many Requests | Rate limit exceeded (per-key or per-IP). |
| 451 Unavailable For Legal Reasons | Blocked by Sovereignty Guard (e.g., data residency violation). |
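
On the client side, these codes can be grouped into a handful of actions. The sketch below shows one possible mapping in TypeScript. The action names and the choice to treat only 429 as retryable are assumptions made for illustration, not documented Atlas behaviour.

```typescript
type ErrorAction =
  | "fix_request"
  | "check_credentials"
  | "check_policy"
  | "retry_later"
  | "unknown";

// Map the documented status codes to a coarse client-side action.
function classifyStatus(status: number): ErrorAction {
  switch (status) {
    case 400:
      return "fix_request"; // invalid input or malformed JSON
    case 401:
      return "check_credentials"; // invalid or missing API key
    case 403: // RBAC or sovereignty rule violation
    case 451: // blocked by Sovereignty Guard
      return "check_policy";
    case 429:
      return "retry_later"; // rate limit exceeded
    default:
      return "unknown";
  }
}

// Assumption: only rate-limit errors are worth retrying automatically.
function isRetryable(status: number): boolean {
  return classifyStatus(status) === "retry_later";
}
```

Grouping 403 and 451 together reflects that both point at a permission or residency policy, which a retry will not fix.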

SECURITY

Authentication

API Key Authentication

Every request must include a valid API key. Keys are scoped by team and tracked for usage and quotas.

Authentication Example (Bash)
curl -X POST https://api.mx4.ai/v1/chat/completions \
  -H "Authorization: Bearer mx4_sk_live_abc123..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "falcon-h1-7b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Key Management

  • Create multiple keys per team for rotation
  • Set expiration dates and IP allowlists
  • Monitor usage per key in real-time
  • Revoke compromised keys instantly

mTLS (Enterprise)

  • Mutual TLS for zero-trust environments
  • Certificate-based authentication
  • Integrates with internal PKI
  • Required for air-gapped deployments
Sovereignty-grade logging

OBSERVABILITY

Infrastructure activity journal

Requests can be logged locally with a cryptographic hash chain and residency evidence.

Append-only

Entries are stored locally with hash-chaining for integrity.

Local retention

You control retention, access, and export on your infrastructure.

{
  "timestamp": "2026-02-04T14:23:45Z",
  "request_id": "req-uuid-12345",
  "actor_id": "user-xyz",
  "action": "chat.completions",
  "model": "falcon-h1-7b",
  "prompt_hash": "sha256:a3f7d2...",
  "routing_policy": "sovereign_enforcement",
  "residency_boundary": "UAE On-Prem",
  "journal_hash": "sha256:8b12c4...",
  "status": "success"
}
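
To illustrate how hash-chaining makes an append-only journal tamper-evident, here is a minimal TypeScript sketch. The exact scheme Atlas uses is not described here, so everything about the hash input is an assumption: each `journal_hash` is taken to be SHA-256 over the previous entry's hash plus a few of the entry's fields, with a fixed `sha256:genesis` value before the first entry. `chainHash`, `appendEntry`, and `verifyJournal` are illustrative names, and the entry is trimmed to three fields for brevity.

```typescript
import { createHash } from "node:crypto";

// Trimmed-down journal entry; the real entries carry more fields.
interface JournalEntry {
  request_id: string;
  action: string;
  status: string;
  journal_hash: string; // "sha256:<hex>"
}

type EntryBody = Omit<JournalEntry, "journal_hash">;

const GENESIS = "sha256:genesis"; // assumed starting value for the chain

// Assumed scheme: SHA-256 over the previous hash plus the entry's own fields.
function chainHash(prevHash: string, entry: EntryBody): string {
  const payload = JSON.stringify([prevHash, entry.request_id, entry.action, entry.status]);
  return "sha256:" + createHash("sha256").update(payload).digest("hex");
}

// Append an entry, linking it to the hash of the last entry.
function appendEntry(journal: JournalEntry[], entry: EntryBody): JournalEntry[] {
  const last = journal[journal.length - 1];
  const prev = last ? last.journal_hash : GENESIS;
  return [...journal, { ...entry, journal_hash: chainHash(prev, entry) }];
}

// Recompute every hash in order; an edited or reordered entry breaks the chain.
function verifyJournal(journal: JournalEntry[]): boolean {
  let prev = GENESIS;
  for (const entry of journal) {
    if (entry.journal_hash !== chainHash(prev, entry)) return false;
    prev = entry.journal_hash;
  }
  return true;
}
```

Because each hash depends on the one before it, changing any stored field invalidates that entry and every later one, so a verifier only needs to walk the journal once from the start.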

RATE LIMITS

Rate limits & quotas

Control spend with flexible limits per API key, team, or endpoint.

| Plan | Requests/min | Tokens/day | Burst |
| --- | --- | --- | --- |
| Development | 60 | 100K | 2x for 10s |
| Professional | 600 | 10M | 5x for 30s |
| Enterprise | Custom | Custom | Custom |

Rate Limit Headers

Every response includes headers showing your current limits and remaining quota:

X-RateLimit-Limit: 600
X-RateLimit-Remaining: 542
X-RateLimit-Reset: 1675432800
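
A client can use these three headers to decide how long to pause before its next request. The TypeScript sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds, which the example value suggests but the text does not state. `parseRateLimit` and `msUntilReset` are illustrative helper names.

```typescript
interface RateLimitState {
  limit: number;
  remaining: number;
  resetEpochSeconds: number;
}

// Convert a header value to a number; missing or empty values become NaN.
function toNumber(value: string | null): number {
  return value === null || value.trim() === "" ? NaN : Number(value);
}

// Parse the three header values; returns null if any is missing or malformed.
function parseRateLimit(
  limit: string | null,
  remaining: string | null,
  reset: string | null
): RateLimitState | null {
  const state: RateLimitState = {
    limit: toNumber(limit),
    remaining: toNumber(remaining),
    resetEpochSeconds: toNumber(reset),
  };
  const ok =
    Number.isFinite(state.limit) &&
    Number.isFinite(state.remaining) &&
    Number.isFinite(state.resetEpochSeconds);
  return ok ? state : null;
}

// Milliseconds to wait before sending again: zero while quota remains,
// otherwise the time left until the reset (assumed to be in epoch seconds).
function msUntilReset(state: RateLimitState, nowMs: number): number {
  if (state.remaining > 0) return 0;
  return Math.max(0, state.resetEpochSeconds * 1000 - nowMs);
}
```

Passing the current time in as `nowMs` (for example `Date.now()`) keeps the function deterministic and easy to test.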

WEBHOOKS

Webhooks (beta)

Real-time notifications for async operations, routing rules, and cost thresholds.

Event Types

  • request.completed: Long-running request finished
  • routing.rule: Routing rule triggered
  • cost.threshold: Budget limit approached
  • model.fallback: Primary model failed, fallback used

Webhook Payload

Example Payload (JSON)
{
  "event": "guardrail.violation",
  "timestamp": "2026-02-04T14:30:00Z",
  "request_id": "req_abc123",
  "details": {
    "violation_type": "pii_detected",
    "entities": ["email", "phone"],
    "action_taken": "request_blocked"
  },
  "metadata": {
    "team_id": "team_xyz",
    "user_id": "user_789"
  }
}
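
A receiver would typically validate the payload shape before acting on it. The TypeScript sketch below checks the envelope fields shown in the example and branches on `event`. The type and function names are illustrative, and the handling text for each event is a placeholder; signature verification is omitted because no signing scheme is described here.

```typescript
// Envelope fields taken from the example payload; `details` varies by event.
interface WebhookEvent {
  event: string;
  timestamp: string;
  request_id: string;
  details: Record<string, unknown>;
  metadata: Record<string, unknown>;
}

// Runtime check for untrusted JSON before treating it as a WebhookEvent.
function isWebhookEvent(value: unknown): value is WebhookEvent {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return (
    typeof v["event"] === "string" &&
    typeof v["timestamp"] === "string" &&
    typeof v["request_id"] === "string" &&
    typeof v["details"] === "object" && v["details"] !== null &&
    typeof v["metadata"] === "object" && v["metadata"] !== null
  );
}

// Turn a raw request body into a short description of the action to take.
function describeEvent(rawBody: string): string {
  let parsed: unknown;
  try {
    parsed = JSON.parse(rawBody);
  } catch {
    return "rejected: body is not JSON";
  }
  if (!isWebhookEvent(parsed)) return "rejected: unexpected shape";
  switch (parsed.event) {
    case "request.completed":
      return `request ${parsed.request_id} finished`;
    case "routing.rule":
      return `routing rule triggered for ${parsed.request_id}`;
    case "cost.threshold":
      return `budget limit approached (request ${parsed.request_id})`;
    case "model.fallback":
      return `fallback model used for ${parsed.request_id}`;
    case "guardrail.violation":
      return `guardrail violation on ${parsed.request_id}`;
    default:
      return `unhandled event type: ${parsed.event}`;
  }
}
```

Keeping a `default` branch means new event types added during the beta are logged rather than silently dropped.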

VERSIONING

Versioning & changelog

Current Version: v1.0

Atlas follows semantic versioning. Minor updates are backwards compatible; major versions require migration.

All endpoints stable and production-ready

Recent Changes

v1.0.3

2026-02-01

  • Added Webhooks beta
  • Improved activity journal export
  • Fixed routing preference header edge case

v1.0.2

2026-01-15

  • New models: Qwen 2.5 72B
  • Streaming support for RAG endpoints
  • Performance improvements

v1.0.1

2026-01-01

  • Launch: Chat completions API
  • Document ingestion pipeline
  • mTLS authentication

FAQ

API FAQs

Common questions about integrating with Atlas.

Can I use the OpenAI Python library directly?

Yes. Set `base_url` to your Atlas endpoint and use your Atlas API key. Our API is 100% compatible with OpenAI's chat completions format.

How do I handle streaming responses?

Set `stream: true` in your request. Atlas returns Server-Sent Events (SSE) compatible with OpenAI's streaming format.
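
For readers handling the stream without an SDK, the sketch below shows how OpenAI-style SSE data can be turned back into text in TypeScript. It relies on the OpenAI streaming conventions: each event is a `data:` line carrying a JSON chunk with `choices[0].delta.content`, and the stream ends with `data: [DONE]`. It also assumes the input contains only complete lines; a real client would buffer partial network chunks first. `collectStreamText` is an illustrative name.

```typescript
// Shape of the part of each streamed chunk that this sketch reads.
interface StreamChunk {
  choices?: { delta?: { content?: string | null } }[];
}

// Concatenate the text deltas from a buffer of OpenAI-style SSE lines.
function collectStreamText(sseText: string): string {
  let out = "";
  for (const line of sseText.split(/\r?\n/)) {
    if (!line.startsWith("data:")) continue; // skip blank lines and comments
    const data = line.slice("data:".length).trim();
    if (data === "[DONE]") break; // end-of-stream sentinel
    const chunk = JSON.parse(data) as StreamChunk;
    out += chunk.choices?.[0]?.delta?.content ?? "";
  }
  return out;
}
```

When using the OpenAI SDKs, this parsing is handled for you and the stream can be iterated directly.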

What happens if my quota is exceeded?

You'll receive a 429 error with an `X-RateLimit-Reset` header indicating when your quota resets. Enterprise plans include burst allowances.

Can I test the API without deploying on-prem?

Yes. We offer a managed sandbox environment for development and testing. Contact sales for access.

How are embeddings priced differently from chat?

Embeddings are charged per million tokens at 1/10th the cost of chat completions for the same model size.
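
As a worked example of that ratio, the TypeScript sketch below computes both costs from a per-million-token chat price. The price used in the usage note is a made-up figure for illustration only; no actual Atlas prices are given here.

```typescript
// Cost of chat tokens, in the same currency unit as the per-million price.
function chatCost(tokens: number, chatPricePerMillion: number): number {
  return (tokens / 1e6) * chatPricePerMillion;
}

// Embeddings are billed at one tenth of the chat rate for the same model size.
function embeddingCost(tokens: number, chatPricePerMillion: number): number {
  return chatCost(tokens, chatPricePerMillion) / 10;
}
```

With a hypothetical chat price of 2 per million tokens, 5 million chat tokens would cost 10 and 5 million embedding tokens would cost 1.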