CtlTower

Headless, OpenAI-compatible AI router. Point your existing OpenAI client at CtlTower and it dispatches each request to the cheapest capable model across Anthropic, OpenAI, Google, and xAI — with per-key routing preferences and automatic provider fallback. Same request shape, same response shape.

Base URL https://ctltower.com/v1 · OpenAI-compatible · no SDK required

Quickstart

CtlTower speaks the OpenAI Chat Completions API. If your code already calls api.openai.com, change two things: the base URL and the API key. That's the whole integration: the fetch call is the API, and there is no client library.

const res = await fetch('https://ctltower.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.CTLTOWER_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'auto',                 // let CtlTower pick the model (see Model selection)
    messages: [{ role: 'user', content: 'Hello' }],
  }),
})

const data = await res.json()
console.log(data.choices[0].message.content)

The response is the standard OpenAI chat.completion object, so parse data.choices[0].message exactly as you would OpenAI's. The official OpenAI SDKs work too: set baseURL to https://ctltower.com/v1 and apiKey to your secret.

Authentication

Every request needs a bearer token: Authorization: Bearer <secret>. You do not create your own — the CtlTower operator mints a labeled key and hands you the secret. Store it as a private env var (convention: CTLTOWER_API_KEY) and never ship it client-side.

A missing or invalid bearer returns 401. The same secret works on every endpoint below.
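A minimal sketch of how a caller might fail fast on auth problems. The helper names buildHeaders and checkAuth are hypothetical (they are not part of CtlTower); the only facts taken from this page are the bearer header format and the 401 status.

```javascript
// Hypothetical helpers; the names are illustrative, not a CtlTower API.

// Build the request headers, refusing to send a request without a secret.
function buildHeaders(secret) {
  if (!secret) throw new Error('CTLTOWER_API_KEY is not set')
  return {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${secret}`,
  }
}

// Turn a 401 into an explicit error; pass every other response through.
function checkAuth(res) {
  if (res.status === 401) {
    throw new Error('CtlTower returned 401: bearer token missing or invalid')
  }
  return res
}
```

Typical use would be fetch(url, { method: 'POST', headers: buildHeaders(process.env.CTLTOWER_API_KEY), body }) followed by checkAuth(res) before reading the body.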

Model selection

The model field controls routing. The field is optional: omit it and CtlTower classifies the request for you.

omit, "auto", or ""

CtlTower’s classifier picks the right tier for the request.

a tier name — "simple", "moderate", "orchestrator"

Skip the classifier; route to that tier. Provider fallback chain still applies.

a qualified id — "anthropic/claude-opus-4-7"

Pin to exactly that provider+model. Single attempt, no fallback.

anything else

400 Bad Request with the list of valid values.

Tiers map to model classes (e.g. simple → Haiku/Flash-class, moderate → Sonnet/GPT-4o-class, orchestrator → Opus-class). Your operator may also set a default_tier or min_tier on your key. Ask them if you need a guaranteed floor.
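The four cases above can be mirrored client-side to catch typos before a request is sent. This is only a sketch: describeModel is a hypothetical helper, and the "provider/model" pattern is an assumption inferred from the single example id on this page. The server remains the authority, so a value that passes this check can still come back as 400.

```javascript
// Tier names as listed in this section.
const TIERS = ['simple', 'moderate', 'orchestrator']

// Hypothetical helper: predict how CtlTower will treat a `model` value.
function describeModel(model) {
  // Omitted, empty, or "auto": the classifier picks the tier.
  if (model === undefined || model === '' || model === 'auto') return 'classifier'
  // A tier name: classifier skipped, fallback chain still applies.
  if (TIERS.includes(model)) return 'tier'
  // Assumed shape of a qualified id: "<provider>/<model>", no spaces.
  if (/^[^/\s]+\/[^/\s]+$/.test(model)) return 'pinned'
  // Anything else is expected to be rejected with 400.
  return 'invalid'
}
```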

Streaming

Pass stream: true for an OpenAI-compatible Server-Sent Events response. Chunks are data: {...} lines; the stream ends with data: [DONE].

const res = await fetch('https://ctltower.com/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.CTLTOWER_API_KEY}`,
  },
  body: JSON.stringify({ messages, stream: true }),
})

const reader = res.body.getReader()
const decoder = new TextDecoder()
for (;;) {
  const { done, value } = await reader.read()
  if (done) break
  const text = decoder.decode(value, { stream: true })
  // text holds zero or more OpenAI-shape "data: {...}\n\n" events; a read can end mid-event, so buffer before parsing
}

Resolved model + fallbacks ride in the first chunk's non-standard x_routing field (routing isn't known until the first chunk, so it can't be a header).
Streaming requests skip the classifier (latency would defeat the UX). Default tier is moderate; pass model: "orchestrator" or a qualified id for a stronger model.
Provider fallback happens only before the first byte. After bytes are sent, a mid-stream provider failure arrives as a final error chunk.
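Because a network read can end in the middle of an event, the decoded text usually needs buffering before it is parsed. The following is a sketch of one way to do that, not CtlTower's own code: createSseParser is a hypothetical name, and it assumes events are separated by "\n\n" with one data: line each, as described above.

```javascript
// Hypothetical incremental parser: feed it decoded text, get parsed chunks back.
function createSseParser() {
  let buffer = ''
  let finished = false
  return {
    // Append text and return every complete chunk parsed so far.
    push(text) {
      const chunks = []
      if (finished) return chunks
      buffer += text
      let end
      // Consume one complete event ("...\n\n") at a time; keep the remainder.
      while ((end = buffer.indexOf('\n\n')) !== -1) {
        const block = buffer.slice(0, end)
        buffer = buffer.slice(end + 2)
        for (const line of block.split('\n')) {
          if (!line.startsWith('data:')) continue
          const payload = line.slice(5).trim()
          if (payload === '[DONE]') {
            finished = true
            return chunks
          }
          chunks.push(JSON.parse(payload)) // throws on malformed JSON
        }
      }
      return chunks
    },
    // True once "data: [DONE]" has been seen.
    get done() {
      return finished
    },
  }
}
```

Inside the read loop this would be called as parser.push(decoder.decode(value, { stream: true })). The first chunk returned is where the x_routing field would appear, and a final error chunk, if any, arrives through the same path.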

Tools (function calling)

Send tools in the standard OpenAI shape. CtlTower forwards them to whichever provider serves the request and returns tool_calls the same way OpenAI does. You execute the tool (CtlTower never runs it), then send the result back as a role: "tool" message on the next turn.

// Turn 1: model returns tool_calls
const data = await res.json()
const choice = data.choices[0]

if (choice.finish_reason === 'tool_calls') {
  messages.push(choice.message)                       // the assistant turn
  for (const call of choice.message.tool_calls) {
    const args = JSON.parse(call.function.arguments)
    const result = await yourTools[call.function.name](args)
    messages.push({
      role: 'tool',
      tool_call_id: call.id,
      content: JSON.stringify(result),
    })
  }
  // Turn 2: POST again with the appended messages (same tools array)
}

Tool-bearing requests automatically route to a capable model (Sonnet-class or better) — small models with tools are unreliable.
Echo the same tools array on every turn of the conversation, and include the full message history (CtlTower is stateless).
parallel_tool_calls: false is honored natively on Anthropic / OpenAI / Grok; Gemini truncates extras post-hoc.
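The two turns above generalize to a loop that keeps posting until the model stops asking for tools. This is a sketch under assumptions: runWithTools, post, handlers, and maxTurns are hypothetical names, and post stands in for whatever function sends the body to /v1/chat/completions and returns the parsed JSON.

```javascript
// Hypothetical loop. `post` is assumed to be (body) => Promise<chat.completion>.
async function runWithTools({ post, messages, tools, handlers, maxTurns = 5 }) {
  const history = [...messages] // full history is resent each turn (stateless API)
  for (let turn = 0; turn < maxTurns; turn++) {
    const data = await post({ messages: history, tools }) // same tools array every turn
    const choice = data.choices[0]
    if (choice.finish_reason !== 'tool_calls') return choice.message
    history.push(choice.message) // the assistant turn that requested the tools
    for (const call of choice.message.tool_calls) {
      const handler = handlers[call.function.name]
      // Report unknown tools back to the model instead of crashing.
      const result = handler
        ? await handler(JSON.parse(call.function.arguments))
        : { error: `unknown tool: ${call.function.name}` }
      history.push({
        role: 'tool',
        tool_call_id: call.id,
        content: JSON.stringify(result),
      })
    }
  }
  throw new Error(`no final answer after ${maxTurns} turns`)
}
```

The turn cap is a design choice of this sketch, there to stop a model that keeps requesting tools from looping forever.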

Image input

User messages may carry multi-part content mixing text and images:

{
  role: 'user',
  content: [
    { type: 'text', text: 'What is in this image?' },
    { type: 'image_url', image_url: { url: 'data:image/png;base64,...' } },
  ],
}

OpenAI, Grok, and Anthropic accept both data: and https:// URLs. Gemini accepts data: URLs only, so a Gemini-served request with an https:// image will fall through to the next provider in the chain.
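Since data: URLs are accepted by every provider listed, inlining the image avoids the Gemini fall-through. A small sketch, assuming Node (it uses the global Buffer); toImagePart is a hypothetical helper name.

```javascript
// Hypothetical helper: wrap raw image bytes as an image_url content part.
// Assumes Node.js, where Buffer is a global.
function toImagePart(bytes, mimeType = 'image/png') {
  const base64 = Buffer.from(bytes).toString('base64')
  return {
    type: 'image_url',
    image_url: { url: `data:${mimeType};base64,${base64}` },
  }
}
```

The returned object goes into the content array next to the text part, for example content: [{ type: 'text', text: '...' }, toImagePart(bytes)].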

Structured JSON output

response_format is supported in both forms:

response_format: { type: 'json_object' }
// or
response_format: {
  type: 'json_schema',
  json_schema: { name: 'my_schema', schema: { /* JSON Schema */ }, strict: true },
}

Honored natively on OpenAI / Grok / Gemini. Anthropic has no equivalent, so CtlTower synthesizes a forced tool internally and unwraps the JSON back into the response text — the caller sees a uniform JSON-string body regardless of which provider answered.
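Whichever provider answers, the JSON arrives as a string in the message content, so the caller still has to parse it. A defensive sketch follows; parseJsonReply is a hypothetical name, and the try/catch covers cases such as output cut off by a token limit, where the string may not be complete JSON.

```javascript
// Hypothetical helper: parse the JSON string carried in the reply content.
function parseJsonReply(data) {
  const content = data.choices[0].message.content
  try {
    return { ok: true, value: JSON.parse(content) }
  } catch (err) {
    // Keep the raw text so the caller can log or retry.
    return { ok: false, error: err.message, raw: content }
  }
}
```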

Embeddings

POST /v1/embeddings — OpenAI-compatible vector embeddings, through your existing bearer. input is a string or an array of strings:

const res = await fetch('https://ctltower.com/v1/embeddings', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.CTLTOWER_API_KEY}`,
  },
  body: JSON.stringify({
    input: ['first text', 'second text'],   // string or string[]
    // model: omit for the default, or "text-embedding-3-small", etc.
    // dimensions: 512,                       // optional (3-* models only)
  }),
})

const { data } = await res.json()
// data[i].embedding is the vector for input[i], in order.

Response is the standard OpenAI embeddings envelope: { object: "list", data: [{ object, index, embedding }], model, usage }. encoding_format (float default, or base64) and dimensions pass through. Token-array input is not supported; send text.
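To keep each vector attached to the text it came from, the index field of each item can be used rather than array position. A minimal sketch; pairEmbeddings is a hypothetical helper name.

```javascript
// Hypothetical helper: join input texts with their vectors via `index`.
function pairEmbeddings(inputs, response) {
  const byIndex = new Map(
    response.data.map((item) => [item.index, item.embedding]),
  )
  return inputs.map((text, i) => ({ text, embedding: byIndex.get(i) }))
}
```

Matching on index is slightly more defensive than relying on order, and costs nothing when the items already arrive in order.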

Audio (STT + TTS)

Two more OpenAI-compatible endpoints, same bearer:

POST /v1/audio/transcriptions — speech to text

Multipart form: file + optional model / prompt / language / response_format. Returns { text } (or plain text for srt/vtt). Whisper-compatible.

POST /v1/audio/speech — text to speech

JSON: { input, voice, response_format?, speed? }. Returns raw audio bytes with the matching Content-Type. tts-1-compatible.
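The transcription endpoint is the one place on this page where the body is multipart rather than JSON. A sketch of building that form with the FormData and Blob globals (available in browsers and Node 18+); buildTranscriptionForm is a hypothetical helper, and the field names are the ones listed above.

```javascript
// Hypothetical helper: build the multipart body for /v1/audio/transcriptions.
function buildTranscriptionForm(audioBytes, filename, options = {}) {
  const form = new FormData()
  form.append('file', new Blob([audioBytes]), filename)
  // Only send the optional fields the caller actually set.
  for (const key of ['model', 'prompt', 'language', 'response_format']) {
    if (options[key] !== undefined) form.append(key, String(options[key]))
  }
  return form
}
```

Pass the form as the fetch body with only the Authorization header set. Leave Content-Type unset so that fetch can add the multipart boundary itself.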

Response headers

Non-streaming responses carry routing metadata you can log or surface:

X-TskPilot-Request-Id

Unique per request. Log it to correlate with CtlTower’s server logs.

X-TskPilot-Resolved-Model

Which provider/model actually served (e.g. anthropic/claude-sonnet-4-6).

X-TskPilot-Complexity

The tier the request landed on (or "pinned" when you pinned a model).

X-TskPilot-Fallbacks

Present only when the chain fell through — comma-separated ids that failed first. Alert on this to catch provider degradation.

(Header names carry the X-TskPilot- prefix for historical reasons: CtlTower was formerly TskPilot. The names are stable, so don't expect them to change.)
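A sketch of collecting these headers into one object for logging. readRouting and its output field names are hypothetical; the header names are the four listed above. It takes any Headers object, such as res.headers from fetch.

```javascript
// Hypothetical helper: gather CtlTower routing metadata from response headers.
function readRouting(headers) {
  const fallbacks = headers.get('X-TskPilot-Fallbacks')
  return {
    requestId: headers.get('X-TskPilot-Request-Id'),
    resolvedModel: headers.get('X-TskPilot-Resolved-Model'),
    complexity: headers.get('X-TskPilot-Complexity'),
    // The header is absent unless the chain fell through; normalize to a list.
    fallbacks: fallbacks ? fallbacks.split(',').map((id) => id.trim()) : [],
  }
}
```

A non-empty fallbacks list is the signal to alert on, for example if (readRouting(res.headers).fallbacks.length > 0) { /* raise an alert */ }.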

Endpoints

POST /v1/chat/completions · Bearer

OpenAI-compatible chat. The endpoint your app calls.

POST /v1/embeddings · Bearer

OpenAI-compatible vector embeddings.

POST /v1/audio/transcriptions · Bearer

Speech to text (Whisper-compatible).

POST /v1/audio/speech · Bearer

Text to speech (tts-1-compatible).

GET /api/config · Bearer

Diagnostic: active profile, configured providers, per-tier resolution.

GET /api/recent · Bearer

Last 50 requests served by the current instance (in-memory; resets on cold start).

Caching

CtlTower caches its own internal routing decisions, which is invisible to you — the response shape is identical whether the routing was cached or computed fresh. It does not cache model responses, so you never get a stale answer to a time- or context-sensitive question.

A cache_hint: 'stable' | 'volatile' | 'auto' request field is accepted and validated but currently inert — reserved for a future response cache. Sending it has no effect today.