API Documentation

One key, one base URL — every model. Set base_url to https://api.routeall.ai/v1, swap in your sk-ra- key, and your existing OpenAI SDK code runs against every supported model.

Base URL: https://api.routeall.ai/v1 · Key: sk-ra-...

Introduction

RouteAll exposes one unified, OpenAI-compatible API in front of many upstream providers (OpenAI, Anthropic, Gemini, DeepSeek and more). You never juggle per-provider keys or endpoints: pick any model from the marketplace, call it by its canonical name through the same base URL, and requests are routed by priority, weight and health with automatic failover. Prefer the Anthropic or Gemini SDKs? Native-format endpoints are built in too.

Authentication

Create an API key in Console → API Keys (shown once, prefixed sk-ra-). Send it as a Bearer token in the Authorization header. One key works for every model and every endpoint format. Keys are scoped to your account and user group; never embed them in client-side code.

Authorization: Bearer sk-ra-your-key

Three-line integration

Point your OpenAI SDK base_url at https://api.routeall.ai/v1 and set api_key to your key — that is the whole migration. Then call /v1/chat/completions exactly as you would with OpenAI, switching models by name only.

curl https://api.routeall.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ra-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'

Chat completions

POST /v1/chat/completions

POST /v1/chat/completions accepts the same request body and returns the same response shape as OpenAI's Chat Completions endpoint. Pass a model name (a unified canonical name, routed to whichever upstream can serve it) and a messages array.

curl https://api.routeall.ai/v1/chat/completions \
  -H "Authorization: Bearer sk-ra-..." \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-chat",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
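For readers not using an SDK, the same request can be sketched with only the Python standard library. This is a minimal illustration, not an official client: build_chat_request is a hypothetical helper name, and the request is built but deliberately not sent.

```python
import json
import urllib.request

BASE_URL = "https://api.routeall.ai/v1"  # base URL given on this page


def build_chat_request(api_key: str, model: str, user_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /chat/completions.

    Hypothetical helper for illustration; mirrors the curl example above.
    """
    body = {"model": model, "messages": [{"role": "user", "content": user_text}]}
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # Bearer token, as in Authentication
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request("sk-ra-your-key", "deepseek-chat", "Hello")
# To send it: urllib.request.urlopen(req), then json-decode the response body.
```

Switching models only changes the model argument; the URL and headers stay the same.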

Streaming

POST /v1/chat/completions (stream)

Set stream: true to receive Server-Sent Events. Each event is an OpenAI-style chunk with a delta; the stream ends with data: [DONE]. Billing is settled on the tokens actually produced.

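from openai import OpenAI

# Same client setup as for non-streaming calls: only base_url and api_key change.
client = OpenAI(base_url="https://api.routeall.ai/v1", api_key="sk-ra-...")

stream = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Hello"}],
    stream=True,
)
for chunk in stream:
    # Some chunks carry no choices or an empty delta; skip them.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)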
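If you read the event stream without an SDK, the wire format described above (data: lines carrying OpenAI-style chunks, terminated by data: [DONE]) can be parsed roughly as follows. This is a sketch: iter_text_deltas is a hypothetical helper, and the sample lines are hand-written to match the described format, not captured from the service.

```python
import json
from typing import Iterable, Iterator


def iter_text_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments from OpenAI-style SSE lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separators, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text


# Hand-written sample in the described format (illustrative only).
sample = [
    'data: {"choices":[{"delta":{"role":"assistant"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    "",
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_text_deltas(sample))
```

The first chunk in the sample carries only a role, which is why empty deltas are skipped rather than treated as errors.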

Native Anthropic & Gemini

POST /v1/messages · /v1beta/...

Already using the Anthropic or Gemini SDK? Call POST /v1/messages (Anthropic Messages) or POST /v1beta/models/{model}:generateContent (Gemini) and keep your existing code — the gateway converts to/from the internal format and bills the same way.

Anthropic · /v1/messages

from anthropic import Anthropic

client = Anthropic(base_url="https://api.routeall.ai", api_key="sk-ra-...")
msg = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=256,
    messages=[{"role": "user", "content": "Hello"}],
)
print(msg.content[0].text)

Gemini · /v1beta

curl "https://api.routeall.ai/v1beta/models/gemini-1.5-pro:generateContent" \
  -H "x-goog-api-key: sk-ra-..." -H "Content-Type: application/json" \
  -d '{"contents":[{"role":"user","parts":[{"text":"Hello"}]}]}'
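To see how the two request shapes relate, the sketch below maps an OpenAI-style messages list onto the Gemini contents shape used in the curl example. It is not the gateway's converter, whose internals are not documented here; to_gemini_contents and ROLE_MAP are hypothetical names, and only plain-text user and assistant turns are handled.

```python
from typing import Dict, List

# Gemini uses "model" where OpenAI-style messages use "assistant".
ROLE_MAP = {"user": "user", "assistant": "model"}


def to_gemini_contents(messages: List[Dict[str, str]]) -> List[dict]:
    """Map plain-text OpenAI-style messages to Gemini 'contents' entries."""
    contents = []
    for message in messages:
        role = message["role"]
        if role not in ROLE_MAP:
            # System prompts, tool calls, etc. are out of scope for this sketch.
            raise ValueError(f"unsupported role in this sketch: {role}")
        contents.append({"role": ROLE_MAP[role], "parts": [{"text": message["content"]}]})
    return contents


contents = to_gemini_contents([{"role": "user", "content": "Hello"}])
```

The result for a single user message matches the contents array in the request body above.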

Parameters

Common parameters: temperature (0–2), top_p (0–1), max_tokens, stop. Support varies by model — reasoning models may ignore temperature; vision models accept image parts. Unknown parameters are passed through to the upstream where safe.
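A client-side range check can catch out-of-range values before a request is sent. The sketch below encodes only the ranges stated above; check_sampling_params is a hypothetical helper, and the gateway's own validation may differ per model.

```python
from typing import Dict, List

# Ranges as stated in this section.
RANGES = {"temperature": (0.0, 2.0), "top_p": (0.0, 1.0)}


def check_sampling_params(params: Dict[str, float]) -> List[str]:
    """Return a list of human-readable problems; empty means the values are in range."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name in params and not (low <= params[name] <= high):
            problems.append(f"{name} must be between {low} and {high}, got {params[name]}")
    return problems


ok = check_sampling_params({"temperature": 0.7, "top_p": 1})
bad = check_sampling_params({"temperature": 2.5, "top_p": 0.9})
```

Parameters not listed in RANGES are left alone, in line with the pass-through behavior described above.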

Pricing: per-token, cache & per-call

Most models bill per token. When an upstream serves cached input tokens, they are billed at the model’s cache-read rate, and cache creation is billed at the cache-write rate; the X-RouteAll-Usage-Cached-Tokens response header reports how many tokens were served from cache. Reasoning models that emit thinking tokens bill those at the model’s reasoning rate (or at the output rate when no reasoning rate is set); your usage page reports them as reasoningTokens. Some models also bill per call: image generation charges per image (POST /v1/images/generations, reported in the X-RouteAll-Usage-Calls header), and web_search-style tools add a per-search fee on top of tokens. Your price = official price × your user-group multiplier; cost is never exposed.

X-RouteAll-Usage-Prompt: 972
X-RouteAll-Usage-Cached-Tokens: 896
X-RouteAll-Charge-Credit: 74
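The headers above can be collected into a dictionary, and the price formula applied directly. This is a sketch: parse_usage_headers and user_price are hypothetical helpers, header values are assumed to be integers as in the sample, and the official price and multiplier in the usage lines are made-up numbers for illustration.

```python
from decimal import Decimal
from typing import Dict, Mapping

PREFIX = "x-routeall-"


def parse_usage_headers(headers: Mapping[str, str]) -> Dict[str, int]:
    """Collect X-RouteAll-* headers, keyed by the lower-cased name without the prefix."""
    usage = {}
    for name, value in headers.items():
        key = name.lower()  # header names are case-insensitive
        if key.startswith(PREFIX):
            usage[key[len(PREFIX):]] = int(value)  # assumes integer values, as in the sample
    return usage


def user_price(official_price: Decimal, group_multiplier: Decimal) -> Decimal:
    """Your price = official price x your user-group multiplier."""
    return official_price * group_multiplier


usage = parse_usage_headers({
    "X-RouteAll-Usage-Prompt": "972",
    "X-RouteAll-Usage-Cached-Tokens": "896",
    "X-RouteAll-Charge-Credit": "74",
    "Content-Type": "application/json",
})
# Hypothetical numbers: an official price of 2.00 and a group multiplier of 0.8.
price = user_price(Decimal("2.00"), Decimal("0.8"))
```

Decimal is used instead of float so that prices do not pick up binary rounding noise.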

Errors

Errors use the OpenAI envelope: { "error": { "type", "message", "code" } }. 401 = bad/missing key, 402 = insufficient balance, 429 = rate limited, 503 = no available channel. Retryable upstream errors fail over silently.

{ "error": { "type": "insufficient_balance", "message": "Insufficient balance", "code": 402 } }
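Because every error shares this envelope, a client can decode it once and branch on the HTTP status. The sketch below is illustrative: ApiError and parse_error are hypothetical names, and the status descriptions are copied from the list above.

```python
import json
from dataclasses import dataclass
from typing import Optional

# Status meanings as listed in this section.
STATUS_MEANING = {
    401: "bad or missing key",
    402: "insufficient balance",
    429: "rate limited",
    503: "no available channel",
}


@dataclass
class ApiError:
    status: int
    type: Optional[str]
    message: Optional[str]
    meaning: str


def parse_error(status: int, body: str) -> ApiError:
    """Decode an OpenAI-style error envelope into a small record."""
    error = json.loads(body).get("error", {})
    return ApiError(
        status=status,
        type=error.get("type"),
        message=error.get("message"),
        meaning=STATUS_MEANING.get(status, "unknown"),
    )


err = parse_error(
    402,
    '{ "error": { "type": "insufficient_balance", "message": "Insufficient balance", "code": 402 } }',
)
```

Unlisted statuses fall back to "unknown" rather than raising, so unexpected upstream errors still produce a usable record.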

Rate limits

Requests are rate-limited per API key, measured in requests per minute. Exceeding the limit returns HTTP 429 with an OpenAI-style error body. Keep concurrency reasonable and back off on 429.
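One common way to back off on 429 is exponential delay with jitter. The sketch below assumes a caller-supplied send function that returns a (status, body) pair; call_with_backoff, its parameters, and the delay values are illustrative choices, not limits published by RouteAll.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Retry send() while it returns 429, doubling the wait each time."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body  # success or a non-rate-limit error: stop retrying
        if attempt < max_attempts - 1:
            # 1s, 2s, 4s, ... plus random jitter so clients do not retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
    return status, body  # still 429 after the final attempt
```

The sleep parameter is injectable so the retry logic can be exercised in tests without real waiting.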