Chat Completions
Generate responses for a given conversation.
Creates a model response for the given chat conversation. This endpoint is fully compatible with the OpenAI Chat Completions API.
Request Body
model
ID of the model to use. See the Model Garden for available options (e.g., 'mx4-atlas-core').
messages
A list of messages comprising the conversation so far.
temperature
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
max_tokens
The maximum number of tokens that can be generated in the chat completion. The total length of input tokens and generated tokens is limited by the model's context length.
top_p
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
stream
If set, partial message deltas will be sent. Tokens will be sent as data-only server-sent events as they become available.
presence_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
frequency_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text, decreasing the model's likelihood to repeat the same line verbatim.
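The Example Request below covers the required fields; the sketch here uses the OpenAI Python SDK against the base URL shown elsewhere on this page and illustrates how the optional sampling and penalty parameters combine in a single call (the specific values are illustrative, not recommendations):

import os
import openai

# Point the OpenAI SDK at the MX4 endpoint, as in the streaming example below.
client = openai.OpenAI(
    api_key=os.getenv("MX4_API_KEY"),
    base_url="https://api.mx4.ai/v1",
)

response = client.chat.completions.create(
    model="mx4-atlas-core",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize the benefits of AI in e-commerce."},
    ],
    temperature=0.3,        # lower values give more deterministic output
    max_tokens=300,         # cap on generated tokens
    top_p=0.9,              # nucleus sampling cutoff
    presence_penalty=0.5,   # encourage new topics
    frequency_penalty=0.5,  # discourage verbatim repetition
    stream=False,           # set True to receive server-sent events
)

print(response.choices[0].message.content)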
Use Cases
Q&A Systems
Build smart question-answering systems, customer support chatbots, and knowledge base assistants.
Translation
Translate content between Arabic dialects and other languages with cultural context awareness.
Summarization
Create concise summaries of long documents, reports, and conversations in real-time.
Content Generation
Generate contextual responses for creative writing, code completion, and content creation.
Example Request
curl https://api.mx4.ai/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $MX4_API_KEY" \
  -d '{
    "model": "mx4-atlas-core",
    "messages": [
      {
        "role": "system",
        "content": "You are a helpful assistant specialized in Arabic language and MENA markets."
      },
      {
        "role": "user",
        "content": "اشرح لي كيفية استخدام الذكاء الاصطناعي في التجارة الإلكترونية"
      }
    ],
    "temperature": 0.7,
    "max_tokens": 500
  }'
Example Response
1{2 "id": "chatcmpl-123",3 "object": "chat.completion",4 "created": 1677652288,5 "model": "mx4-atlas-core",6 "system_fingerprint": "fp_44709d6fcb",7 "choices": [{8 "index": 0,9 "message": {10 "role": "assistant",11 "content": "يمكن استخدام الذكاء الاصطناعي في التجارة الإلكترونية بعدة طرق... "12 },13 "logprobs": null,14 "finish_reason": "stop"15 }],16 "usage": {17 "prompt_tokens": 42,18 "completion_tokens": 156,19 "total_tokens": 19820 }21}
Streaming Example
Enable streaming to receive tokens in real-time as they are generated, perfect for responsive chat interfaces and live content generation.
import os
import openai

client = openai.OpenAI(
    api_key=os.getenv("MX4_API_KEY"),
    base_url="https://api.mx4.ai/v1"
)

# Stream the chat completion and print tokens as they arrive
stream = client.chat.completions.create(
    model="mx4-atlas-core",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "اكتب مقالة قصيرة عن الابتكار"}  # "Write a short article about innovation"
    ],
    stream=True,
    temperature=0.7
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
Best Practices
System Prompts
Use clear system prompts to define the assistant's behavior, tone, and expertise level for consistent responses.
Context Management
Maintain conversation history but trim old messages to stay within token limits and reduce costs.
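A minimal sketch of one way to trim history, assuming you keep messages in a plain list and approximate length by character count (a production version would count tokens with the model's tokenizer):

# Hypothetical helper: keep the system prompt, drop the oldest turns when the
# conversation grows past a rough size budget. Character count is a crude
# stand-in for tokens.
def trim_history(messages, max_chars=8000):
    system = [m for m in messages if m["role"] == "system"]
    turns = [m for m in messages if m["role"] != "system"]
    while turns and sum(len(m["content"]) for m in system + turns) > max_chars:
        turns.pop(0)  # drop the oldest non-system message first
    return system + turns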
Temperature Tuning
Use lower temperature (0.3-0.5) for factual tasks, higher (0.8-1.0) for creative generation.
Streaming for UX
Implement streaming for chat applications to show token-by-token generation and improve perceived responsiveness.
Troubleshooting
Tokens Exceeded
The API returns a "context_length_exceeded" error when the prompt tokens plus max_tokens exceed the model's context length.
Solution: Reduce conversation history, use summarization for old messages, or lower the max_tokens parameter.
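One way to apply the summarization approach is sketched below: older turns are compressed into a single summary message while recent turns are kept verbatim. The helper name, thresholds, and the assumption that the first message is the system prompt are illustrative, not part of the API.

# Illustrative sketch: when the history gets long, ask the model to summarize
# the oldest turns and keep only the most recent ones verbatim.
def compact_history(client, messages, keep_recent=6):
    if len(messages) <= keep_recent + 1:
        return messages
    system, old, recent = messages[:1], messages[1:-keep_recent], messages[-keep_recent:]
    summary = client.chat.completions.create(
        model="mx4-atlas-core",
        messages=old + [{"role": "user", "content": "Summarize the conversation so far in a few sentences."}],
        max_tokens=200,
        temperature=0.2,
    ).choices[0].message.content
    return system + [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent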
Rate Limit Errors
API returns 429 status code when rate limits are exceeded.
Solution: Implement exponential backoff retry logic, or upgrade your plan for higher limits. See the Rate Limits documentation.
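A minimal retry sketch with exponential backoff and jitter (the wrapper below is illustrative; the OpenAI Python SDK also exposes a built-in max_retries client option):

import random
import time
import openai

# Illustrative wrapper: back off exponentially (with jitter) on rate-limit
# errors and give up after a handful of attempts.
def create_with_backoff(client, max_attempts=5, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except openai.RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep((2 ** attempt) + random.random())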
Inconsistent Responses
Same prompt produces different outputs across requests.
Solution: Lower the temperature (0.0-0.3 recommended) for more consistent outputs, or reduce top_p to narrow the sampling pool.