Native Arabic intelligence. Built, not translated.
Most "Arabic" AI is just English models forced to translate. MX4 Atlas is different. We rebuild open-source foundations with native Arabic tokenization and cultural alignment, delivering the MENA region's most capable and compliant LLMs.
The MX4 Methodology
From Generalist to Specialist
Generic models treat Arabic as a second-class citizen. We rebuild them from the token level up.
Open Source Base
We start with world-class open-weights models (Llama 3, Mistral) as our cognitive engine.
- 7B-70B Parameters
- English Fluency
- Reasoning Core
Vocabulary Expansion
We reconstruct the tokenizer, adding 20,000+ native Arabic tokens to reduce fragmentation (see the sketch after this list).
- +250% Efficiency
- Native Script Support
- Dialect Coverage
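As a rough illustration of how this step can work, the sketch below uses the Hugging Face transformers library to add new tokens to a base tokenizer and grow the model's embedding matrix to match. The model name and token list are placeholders, not MX4's production pipeline.

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load an open-weights base model and its original tokenizer.
# "meta-llama/Meta-Llama-3-8B" is an illustrative choice, not MX4's actual base.
base = "meta-llama/Meta-Llama-3-8B"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical list of native Arabic tokens mined from a corpus
# (whole words and frequent subwords; in practice 20,000+ entries).
arabic_tokens = ["مدرسة", "الحكومة", "يستطيع", "العربية"]

# Add only tokens the base vocabulary does not already contain,
# then resize the embedding matrix to the new vocabulary size.
num_added = tokenizer.add_tokens(arabic_tokens)
model.resize_token_embeddings(len(tokenizer))

print(f"Added {num_added} new tokens; vocab size is now {len(tokenizer)}")
```

Newly added embeddings start out untrained, which is one reason the continued pre-training stage that follows is essential.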
Continued Pre-training
We inject 100 billion tokens of high-quality Arabic data spanning Modern Standard Arabic and regional dialects (see the sketch after this list).
- Regional History
- Legal Frameworks
- Cultural Nuance
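For a concrete picture of this stage, here is a minimal continued pre-training sketch using a standard causal language modeling objective with the Hugging Face Trainer. The corpus file, base checkpoint, and hyperparameters are illustrative assumptions, not MX4's actual configuration.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "meta-llama/Meta-Llama-3-8B"  # illustrative base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # reuse EOS for padding if no pad token is set
model = AutoModelForCausalLM.from_pretrained(base)

# Hypothetical Arabic corpus file; in practice this is the curated
# 100B-token mix of Modern Standard Arabic and dialect data.
corpus = load_dataset("text", data_files={"train": "arabic_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=2048)

tokenized = corpus.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="atlas-cpt",
        per_device_train_batch_size=4,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=tokenized,
    # Standard causal LM objective: no masking, predict the next token.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```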
Cultural Fine-Tuning
Instruction tuning and RLHF designed specifically for MENA cultural and ethical values (see the sketch after this list).
- Sovereign-ready
- Safety tuning
- Regional Values
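A minimal sketch of the supervised instruction-tuning half of this stage, assuming the TRL library's SFTTrainer; the dataset file and checkpoint name are hypothetical. The preference-alignment step that follows (a reward model or RLHF-style optimization) is omitted here.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical Arabic instruction dataset: each record has a "text" field
# containing a fully formatted prompt/response pair.
dataset = load_dataset("json", data_files="arabic_instructions.jsonl")["train"]

trainer = SFTTrainer(
    model="mx4/atlas-cpt",  # hypothetical checkpoint from the continued pre-training stage
    train_dataset=dataset,
    args=SFTConfig(output_dir="atlas-sft"),
)
trainer.train()
```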
Performance Metrics
Sovereign, Yet Superior
MX4 Atlas outperforms standard open-source models on Arabic tasks and rivals proprietary cloud models.
Linguistic Diversity
One Model, Many Voices
The Arab world is not a monolith. MX4 Atlas is the first foundation model trained on a balanced corpus of Modern Standard Arabic and regional dialects.
From formal government decrees in MSA to customer service chatbots in Saudi dialect, we cover the full spectrum of communication.
Challenge
Why standard models fail
Standard models (like GPT-4 or Llama base) chop Arabic words into many small, meaningless fragments. This increases cost, latency, and hallucination rates.
The MX4 Solution: We expanded the vocabulary by 20,000+ native tokens. Our models "see" whole Arabic words, not just letters (see the comparison sketch below).
- Standard Model: 4.2 tokens per Arabic word
- MX4 Atlas: 1.6 tokens per Arabic word
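Figures like these can be reproduced with a simple measurement: tokenize a sample and divide the token count by the word count. The sketch below assumes Hugging Face tokenizers; the second model id is a hypothetical placeholder for the expanded-vocabulary tokenizer.

```python
from transformers import AutoTokenizer

# Illustrative model ids; substitute the two tokenizers being compared.
tokenizers = {
    "standard base": AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B"),
    "expanded vocab": AutoTokenizer.from_pretrained("mx4/atlas"),  # hypothetical id
}

# Short Arabic sample sentence for the comparison.
sample = "تعمل الوزارة على تطوير الخدمات الرقمية للمواطنين"
num_words = len(sample.split())

for name, tok in tokenizers.items():
    num_tokens = len(tok.tokenize(sample))
    print(f"{name}: {num_tokens / num_words:.1f} tokens per word")
```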
Open Source
Powered by open source
We don't reinvent the wheel; we reinforce it. By building upon the world's best open-weights models—Meta's Llama 3, Mistral, and others—we focus our energy on the last mile: Cultural Alignment and Sovereign Deployment.
Deploy Arabic-First AI
Bring sovereign Arabic intelligence on-prem in weeks, not months.
Talk to MX4 Atlas specialists to scope a dialect-focused deployment, benchmarking, and a data-residency plan.