API Documentation
One key, one base URL — every model. Set base_url to https://api.routeall.ai/v1, swap in your sk-ra- key, and your existing OpenAI SDK code runs against every supported model.
Introduction
RouteAll exposes one unified, OpenAI-compatible API in front of many upstream providers (OpenAI, Anthropic, Gemini, DeepSeek and more). You never juggle per-provider keys or endpoints: pick any model from the marketplace, call it by its canonical name through the same base URL, and requests are routed by priority, weight and health with automatic failover. Prefer the Anthropic or Gemini SDKs? Native-format endpoints are built in too.
Authentication
Create an API key in Console → API Keys (shown once, prefixed sk-ra-). Send it as a Bearer token in the Authorization header. One key works for every model and every endpoint format. Keys are scoped to your account and user group; never embed them in client-side code.
Authorization: Bearer sk-ra-your-key
Three-line integration
Point your OpenAI SDK base_url at https://api.routeall.ai/v1 and set api_key to your key — that is the whole migration. Then call /v1/chat/completions exactly as you would with OpenAI, switching models by name only.
curl https://api.routeall.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ra-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Chat completions
/v1/chat/completions
POST /v1/chat/completions is request-for-request identical to OpenAI. Pass a model name (a unified canonical name routed to whichever upstream can serve it) and a messages array.
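For callers who are not using an SDK, the request described above can be assembled with the Python standard library alone. This is a minimal sketch: the helper names build_chat_request and send are hypothetical, and the response shape is assumed to match OpenAI's, as the text above states.

```python
import json
import urllib.request

BASE_URL = "https://api.routeall.ai/v1"  # base URL given on this page


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/chat/completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the first choice's text.

    Assumes an OpenAI-shaped response body.
    """
    with urllib.request.urlopen(request) as response:
        reply = json.load(response)
    return reply["choices"][0]["message"]["content"]
```

Calling send(build_chat_request(key, "deepseek-chat", "Hello")) performs the request. The same request with curl: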
curl https://api.routeall.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ra-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
Streaming
/v1/chat/completions (stream)
Set stream: true to receive Server-Sent Events. Each event is an OpenAI-style chunk with a delta; the stream ends with data: [DONE]. Billing is settled on the tokens actually produced.
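To make the wire format concrete, here is a small sketch of parsing such a stream by hand. The function name iter_deltas is hypothetical, and the sample lines are hand-written illustrations of the OpenAI chunk shape, not output captured from the gateway.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from the text lines of an SSE stream."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get("delta", {}).get("content")
            if content:
                yield content


# Hand-written sample events (an assumption about typical chunk contents).
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
print("".join(iter_deltas(sample)))  # prints "Hello"
```

With the OpenAI SDK, the client does this parsing for you: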
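from openai import OpenAI

# Point the OpenAI SDK at the gateway; only base_url and api_key change.
client = OpenAI(base_url="https://api.routeall.ai/v1", api_key="sk-ra-...")
stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Some chunks may carry no choices or no text, so guard before printing.
    if chunk.choices:
        print(chunk.choices[0].delta.content or "", end="")
Native Anthropic & Gemini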
/v1/messages · /v1beta/...
Already using the Anthropic or Gemini SDK? Call POST /v1/messages (Anthropic Messages) or POST /v1beta/models/{model}:generateContent (Gemini) and keep your existing code; the gateway converts to/from the internal format and bills the same way.
Anthropic · /v1/messages
from anthropic import Anthropic

client = Anthropic(base_url="https://api.routeall.ai", api_key="sk-ra-...")
msg = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
)
print(msg.content[0].text)
Gemini · /v1beta
curl "https://api.routeall.ai/v1beta/models/gemini-1.5-pro:generateContent" \
  -H "x-goog-api-key: sk-ra-..." -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}'
Parameters
Common parameters: temperature (0–2), top_p (0–1), max_tokens, stop. Support varies by model — reasoning models may ignore temperature; vision models accept image parts. Unknown parameters are passed through to the upstream where safe.
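The ranges above can be checked on the client before a request is sent. The sketch below is an optional convenience, not gateway behavior: sampling_params is a hypothetical helper, and the gateway or the upstream model may apply its own validation or ignore a value.

```python
from typing import Any, Dict, List, Optional


def sampling_params(
    temperature: Optional[float] = None,
    top_p: Optional[float] = None,
    max_tokens: Optional[int] = None,
    stop: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Return only the parameters that were set, after range checks."""
    params: Dict[str, Any] = {}
    if temperature is not None:
        if not 0 <= temperature <= 2:
            raise ValueError("temperature must be between 0 and 2")
        params["temperature"] = temperature
    if top_p is not None:
        if not 0 <= top_p <= 1:
            raise ValueError("top_p must be between 0 and 1")
        params["top_p"] = top_p
    if max_tokens is not None:
        if max_tokens < 1:
            raise ValueError("max_tokens must be a positive integer")
        params["max_tokens"] = max_tokens
    if stop:
        params["stop"] = stop
    return params


# Merge the checked parameters into a chat completions request body.
body = {
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}],
    **sampling_params(temperature=0.7, max_tokens=256),
}
```

Leaving unset parameters out of the body, rather than sending nulls, lets each model fall back to its own defaults.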
Pricing: per-token, cache & per-call
Most models bill per token. When an upstream serves cached input tokens, they are billed at the model’s cache-read rate, and cache creation is billed at the cache-write rate; the X-RouteAll-Usage-Cached-Tokens response header reports how many tokens were cache hits. Reasoning models that emit thinking tokens bill those at the model’s reasoning rate, or at the output rate when no reasoning rate is set; your usage page reports them as reasoningTokens. Some models also bill per call: image generation charges per image (POST /v1/images/generations, header X-RouteAll-Usage-Calls), and web_search-style tools add a per-search fee on top of tokens. Your price = official price × your user-group multiplier; cost is never exposed.
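As a worked illustration of the per-token rule, the sketch below applies the multiplier to a token breakdown. Everything numeric here is hypothetical: the rates, the multiplier, and the output token count are made up, the unit (USD per million tokens) is an assumption, and so is the reading that the prompt count includes the cached tokens. It does not attempt to reproduce the credit figure the gateway reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical official prices, in USD per million tokens."""
    input: float
    cache_read: float
    output: float


def estimate_price(
    rates: Rates,
    prompt_tokens: int,
    cached_tokens: int,
    output_tokens: int,
    multiplier: float,
) -> float:
    """Official per-token price times the user-group multiplier.

    Assumes cached_tokens is a subset of prompt_tokens.
    """
    if not 0 <= cached_tokens <= prompt_tokens:
        raise ValueError("cached_tokens must be between 0 and prompt_tokens")
    uncached = prompt_tokens - cached_tokens
    official = (
        uncached * rates.input
        + cached_tokens * rates.cache_read
        + output_tokens * rates.output
    ) / 1_000_000
    return official * multiplier


# Made-up rates and multiplier, for illustration only.
demo = estimate_price(
    Rates(input=1.0, cache_read=0.1, output=2.0),
    prompt_tokens=972,
    cached_tokens=896,
    output_tokens=100,
    multiplier=1.2,
)
```

In this made-up case, 896 of the 972 prompt tokens are priced at the lower cache-read rate and only 76 at the full input rate. Usage headers on a response look like this: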
X-RouteAll-Usage-Prompt: 972
X-RouteAll-Usage-Cached-Tokens: 896
X-RouteAll-Charge-Credit: 74
Errors
Errors use the OpenAI envelope: { "error": { "type", "message", "code" } }. 401 = bad/missing key, 402 = insufficient balance, 429 = rate limited, 503 = no available channel. Retryable upstream errors fail over silently.
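A client can map the envelope and status code onto a small record before deciding what to do next. This is a sketch under assumptions: GatewayError and parse_error are hypothetical names, and treating 429 and 503 as worth retrying later is a judgment call, not something this page specifies.

```python
import json
from dataclasses import dataclass

# Status codes and meanings as listed in the Errors section.
STATUS_MEANING = {
    401: "bad or missing key",
    402: "insufficient balance",
    429: "rate limited",
    503: "no available channel",
}
# Assumption: only these two are worth retrying after a pause.
RETRY_LATER = {429, 503}


@dataclass(frozen=True)
class GatewayError:  # hypothetical name, not part of any SDK
    status: int
    type: str
    message: str
    retry_later: bool


def parse_error(status: int, body: str) -> GatewayError:
    """Turn an HTTP status and an OpenAI-style error body into a GatewayError."""
    try:
        parsed = json.loads(body)
    except json.JSONDecodeError:
        parsed = None  # body was not JSON; fall back to the status code alone
    err = parsed.get("error") if isinstance(parsed, dict) else None
    if not isinstance(err, dict):
        err = {}
    return GatewayError(
        status=status,
        type=err.get("type", "unknown"),
        message=err.get("message") or STATUS_MEANING.get(status, "unexpected error"),
        retry_later=status in RETRY_LATER,
    )
```

Falling back to the status code keeps the handler usable even when a proxy in between returns a non-JSON body. An example 402 body: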
{ "error": { "type": "insufficient_balance", "message": "Insufficient balance", "code": 402 } }
Rate limits
Requests are rate-limited per API key (requests per minute). Exceeding the limit returns 429 in OpenAI style. Keep concurrency reasonable and back off on 429.
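One common way to back off on 429 is exponential delay with random jitter. The sketch below assumes a caller-supplied send function that returns a (status, body) pair; call_with_backoff, the delay values, and the attempt limit are all hypothetical choices. This page does not document a retry hint header, so the sketch uses a fixed schedule.

```python
import random
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def call_with_backoff(
    send: Callable[[], Tuple[int, T]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, T]:
    """Call send() until it returns something other than 429, or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429 or attempt == max_attempts - 1:
            return status, body
        # Double the ceiling on each attempt, then pick a random delay below it
        # so that many clients do not retry at the same instant.
        ceiling = min(max_delay, base_delay * 2 ** attempt)
        sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real waiting.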