
Sovereign Architecture

Deep dive into the Atlas Runtime environment and how we guarantee data sovereignty at every layer.

Last updated on February 2, 2026

Design Principles

Atlas is built on four non-negotiable principles that inform every architectural decision—from network topology to GPU scheduling.

Zero Data Egress

No data leaves the deployment boundary. Ever. Not for telemetry, not for error reporting, not for model improvement.

Defense in Depth

mTLS, RBAC, and infrastructure activity journaling at every layer—not just the perimeter.

Hardware Isolation

Dedicated GPU memory per tenant. No shared caches, no shared VRAM, no cross-tenant inference.

Operational Visibility

Cryptographically signed activity journal entries give operators a verifiable record of infrastructure events.

The Zero-Trust Enclave

Unlike traditional API providers that process data in a shared, multi-tenant public cloud, MX4 Atlas is built on a "Sovereign Enclave" architecture. The entire inference stack—from the load balancer to the GPU memory—runs within a strictly defined security boundary.

Data Flow Diagram

Every request passes through five security layers before reaching the model

  • Client → Gateway: mTLS (TLS 1.3) with HMAC request signing
  • Runtime (inside the air-gapped boundary):
    ① API Key Validation & RBAC
    ② Residency Router
    ③ Rate Limiter & Guardrails
    ④ Activity Journal
    ⑤ Model Inference (GPU) on the H100 cluster
  • Storage: Encrypted Activity Store (AES-256-GCM), tamper-proof
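To make the first hop concrete, the sketch below shows how a client might sign a request with HMAC and present a client certificate for mTLS. The header names, signing scheme, and certificate paths are illustrative assumptions, not the documented Atlas wire format.

```python
# Illustrative client-side request signing; header names, the signing scheme,
# and the certificate paths are assumptions, not the Atlas wire format.
import hashlib
import hmac
import time

import requests

API_KEY = "atlas_example_key"               # hypothetical tenant API key
HMAC_SECRET = b"per-tenant-shared-secret"   # hypothetical signing secret


def signed_post(url: str, body: bytes) -> requests.Response:
    # Sign timestamp + body so the gateway can detect tampering and replays.
    timestamp = str(int(time.time()))
    signature = hmac.new(HMAC_SECRET, timestamp.encode() + body, hashlib.sha256).hexdigest()
    return requests.post(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "X-Timestamp": timestamp,
            "X-Signature": signature,
        },
        cert=("client.crt", "client.key"),  # client certificate for mTLS
        verify="enclave-ca.pem",            # pin the enclave's CA bundle
    )
```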

Runtime Components

The Atlas Runtime is a self-contained inference engine. These are the key subsystems that handle every request.

API Gateway

<2ms overhead

Terminates mTLS, validates API keys, enforces tenant-level RBAC policies

Stack: Envoy proxy with custom auth filter
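As a rough illustration of the tenant-level RBAC step, the sketch below checks an API key's roles against a route policy. The role names and policy shape are assumptions; in production this logic lives in the Envoy auth filter, not in Python.

```python
# Minimal RBAC sketch: the policy shape and role names are illustrative assumptions.
from dataclasses import dataclass


@dataclass(frozen=True)
class ApiKeyRecord:
    tenant_id: str
    roles: frozenset[str]


# Hypothetical in-memory policy: which roles may call which routes.
ROLE_POLICY = {
    "inference:user": {"/v1/completions", "/v1/chat"},
    "tenant:admin": {"/v1/completions", "/v1/chat", "/v1/admin"},
}


def authorize(key_record: ApiKeyRecord, path: str) -> bool:
    """Return True if any of the key's roles grants access to the route."""
    return any(path in ROLE_POLICY.get(role, set()) for role in key_record.roles)


# Example: an inference-only key may not hit admin routes.
key = ApiKeyRecord(tenant_id="acme", roles=frozenset({"inference:user"}))
assert authorize(key, "/v1/chat")
assert not authorize(key, "/v1/admin")
```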

Residency Router

~5ms per request

Routes sensitive workflows to local models to keep data within the boundary

Stack: Regex + NER model pipeline with Arabic morphology support
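For intuition, a stripped-down routing decision might look like the sketch below, combining regex matches with NER labels. The patterns, entity labels, and routing targets are placeholders; the production pipeline (including Arabic morphology handling) is considerably more involved.

```python
# Illustrative routing sketch: patterns, entity labels, and targets are assumptions.
import re

# Hypothetical patterns for identifiers that must never leave the boundary.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{4}-\d{7}\b"),   # a national-ID-like format
    re.compile(r"\b[A-Z]{2}\d{20,22}\b"),   # an IBAN-like format
]


def route(prompt: str, ner_entities: list[str]) -> str:
    """Return 'local' if the prompt looks sensitive, else 'default'.

    `ner_entities` stands in for labels produced by the NER stage
    (e.g. PERSON, NATIONAL_ID); the label set here is assumed.
    """
    if any(p.search(prompt) for p in SENSITIVE_PATTERNS):
        return "local"
    if {"PERSON", "NATIONAL_ID"} & set(ner_entities):
        return "local"
    return "default"


print(route("Transfer 500 to SA4420000001234567891234", ner_entities=[]))  # -> local
```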

Guardrail Layer

~8ms per request

Enforces per-tenant rate limits, guardrail policies, and workload isolation rules

Stack: Classifier ensemble + rule engine
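As a sketch of the rate-limiting half of this layer, here is a simple per-tenant token bucket; the limits and in-memory storage are assumptions, and the real layer also combines classifier verdicts with its rule engine.

```python
# Per-tenant token-bucket rate limiter sketch; limits and storage are assumptions.
import time
from dataclasses import dataclass, field


@dataclass
class TokenBucket:
    rate: float          # tokens added per second
    capacity: float      # burst size
    tokens: float = 0.0
    updated: float = field(default_factory=time.monotonic)

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


buckets: dict[str, TokenBucket] = {}


def check_rate_limit(tenant_id: str, rate: float = 10.0, burst: float = 20.0) -> bool:
    bucket = buckets.setdefault(tenant_id, TokenBucket(rate=rate, capacity=burst, tokens=burst))
    return bucket.allow()
```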

Inference Engine

TTFT <200ms (Atlas Core, 8B)

Model loading, batching, KV-cache management, and GPU scheduling across multi-GPU clusters

Stack: vLLM-based serving with continuous batching
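For readers unfamiliar with vLLM, a minimal offline-serving example is shown below. The model path, parallelism setting, and sampling parameters are placeholders for locally stored weights, not the Atlas production configuration.

```python
# Minimal vLLM usage sketch; model path and settings are placeholders only.
from vllm import LLM, SamplingParams

# Load weights from local storage; no network access is needed at inference time.
llm = LLM(model="/models/atlas-core-8b", tensor_parallel_size=2)  # adjust to your GPUs

params = SamplingParams(temperature=0.2, max_tokens=256)
outputs = llm.generate(["Summarize the data residency policy."], params)

for output in outputs:
    print(output.outputs[0].text)
```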

Activity Journal

Asynchronous; adds no latency to the inference path

Writes cryptographically signed activity records for infrastructure events

Stack: Append-only log with SHA-256 chain
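The hash-chaining idea is simple to sketch: each record's digest covers the previous record's digest, so any in-place edit breaks every later link. The record fields below are simplified assumptions, and the signing step the real journal applies is omitted.

```python
# Simplified SHA-256 hash-chain sketch; record fields are assumptions and the
# per-record signature used in production is omitted here.
import hashlib
import json
import time


class ActivityJournal:
    def __init__(self) -> None:
        self.entries: list[dict] = []
        self.last_digest = "0" * 64  # genesis value

    def append(self, event: dict) -> dict:
        record = {"timestamp": time.time(), "event": event, "prev": self.last_digest}
        payload = json.dumps(record, sort_keys=True).encode()
        record["digest"] = hashlib.sha256(payload).hexdigest()
        self.last_digest = record["digest"]
        self.entries.append(record)
        return record

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered record breaks it."""
        prev = "0" * 64
        for record in self.entries:
            body = {k: v for k, v in record.items() if k != "digest"}
            expected = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
            if record["prev"] != prev or record["digest"] != expected:
                return False
            prev = record["digest"]
        return True
```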

Deployment Models

Atlas deploys exclusively on customer infrastructure. Choose a deployment model based on your connectivity and sovereignty requirements.

Private Cloud

Deploy Atlas Runtime into your existing VPC (AWS, Azure, GCP, Oracle). Full stack on your infrastructure, managed by your team.

Your VPC · Zero Peering · Full Control
  • Terraform / Helm deployment in < 2 hours
  • Auto-scaling based on GPU utilization
  • Integrated with your existing IAM & secrets management

Air-Gapped

Physical deployment on your own hardware with zero internet connectivity. Maximum sovereignty for classified environments.

On-Premise · No Egress · Complete Isolation
  • USB-based model distribution & updates
  • Hardware Security Module (HSM) key storage
  • Offline license validation via signed tokens
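The last point can be illustrated with a small offline verification routine using a vendor public key embedded in the runtime image. The token layout, claim names, and the Ed25519 choice are assumptions made for this sketch, not the actual Atlas license format.

```python
# Offline license-validation sketch; the token layout and Ed25519 choice are
# assumptions, not the actual Atlas license format.
import base64
import json
import time

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import (
    Ed25519PrivateKey,
    Ed25519PublicKey,
)


def validate_license(token: str, vendor_public_key: Ed25519PublicKey) -> dict:
    """Verify a `payload.signature` token (both parts base64) with no network call."""
    payload_b64, signature_b64 = token.split(".")
    payload = base64.urlsafe_b64decode(payload_b64)
    signature = base64.urlsafe_b64decode(signature_b64)
    try:
        vendor_public_key.verify(signature, payload)
    except InvalidSignature as exc:
        raise ValueError("license signature invalid") from exc
    claims = json.loads(payload)
    if claims["expires_at"] < time.time():
        raise ValueError("license expired")
    return claims


# Self-contained demo: in a real deployment only the public key ships with the runtime.
vendor_key = Ed25519PrivateKey.generate()
claims = json.dumps({"tenant": "acme", "expires_at": time.time() + 86400}).encode()
token = (
    base64.urlsafe_b64encode(claims).decode()
    + "."
    + base64.urlsafe_b64encode(vendor_key.sign(claims)).decode()
)
print(validate_license(token, vendor_key.public_key()))
```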

Atlas vs. Traditional API Providers

How a sovereign architecture differs from mainstream LLM API services.

Characteristic         | MX4 Atlas                          | Typical Cloud API
Data residency         | Customer-controlled                | Provider region
Tenancy                | Single-tenant or dedicated GPU     | Multi-tenant shared
Network egress         | Zero — air-gap capable             | Required for every call
Activity journal       | Local append-only journal          | Provider-managed logs
Data boundary control  | Local routing rules                | Sent to cloud
Model updates          | Customer-approved via USB/private  | Auto-updated by provider
Deployment control     | Customer-owned stack               | Provider-managed

Data Sovereignty Guarantees

Atlas enforces data sovereignty at the infrastructure level—your data never leaves your boundary.

  • Data Residency: All processing occurs on your infrastructure—cloud, on-prem, or air-gapped. No cross-border data movement.
  • Infrastructure Activity Journal: Cryptographically signed SHA-256 chain for operational visibility and integrity.
  • Zero External Calls: No outbound connections to MX4 or third parties during inference. No telemetry, no phone-home, no metrics collection.
  • Encryption at Every Layer: TLS 1.3 in transit, AES-256-GCM at rest. Model weights encrypted with customer-managed keys (BYOK). A minimal at-rest encryption sketch follows this list.
  • No Training on Your Data: Inference data is never used for model improvement. Zero data retention post-response unless you configure local activity journaling.
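To make the at-rest guarantee concrete, here is a minimal AES-256-GCM sketch using the cryptography library. The key handling is purely illustrative; in a BYOK deployment the key would come from the customer's own KMS or HSM rather than being generated in place.

```python
# AES-256-GCM at-rest encryption sketch; key handling is illustrative only.
# In a BYOK deployment the key would come from the customer's HSM or KMS.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def encrypt_record(key: bytes, plaintext: bytes, associated_data: bytes) -> bytes:
    # 96-bit nonce per NIST recommendation; stored alongside the ciphertext.
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, plaintext, associated_data)


def decrypt_record(key: bytes, blob: bytes, associated_data: bytes) -> bytes:
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, associated_data)


key = AESGCM.generate_key(bit_length=256)   # customer-managed in practice
blob = encrypt_record(key, b"journal entry", b"tenant:acme")
assert decrypt_record(key, blob, b"tenant:acme") == b"journal entry"
```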

Recommended Hardware

Atlas Runtime is optimized for NVIDIA H100 and A100 GPUs. Minimum specifications for production deployments:

Component | Minimum              | Recommended
GPU       | 2× NVIDIA A100 80GB  | 4× NVIDIA H100 80GB
CPU       | 32 vCPUs             | 64 vCPUs (AMD EPYC)
RAM       | 256 GB               | 512 GB
Storage   | 1 TB NVMe SSD        | 2 TB NVMe RAID-1
Network   | 10 Gbps              | 25 Gbps (RDMA for multi-node)

What Atlas Does Not Do

Clarity about boundaries is as important as feature lists.

  • Atlas does not send telemetry or crash reports to MX4 servers.
  • Atlas does not auto-update models — all updates require explicit customer approval.
  • Atlas does not share GPU memory between tenants in any deployment mode.
  • Atlas does not retain prompt or completion data after response delivery.
  • Atlas does not require internet access for inference — air-gap is a first-class mode.