Proxy Server

The liter-llm binary ships with a production proxy that speaks the OpenAI REST API on top of any of the 143 supported providers. It terminates Bearer auth, routes by model name, applies the full Tower middleware stack (cache, budget, rate limit, cooldown, health, fallback), and exposes OpenTelemetry spans.

Point any OpenAI SDK at the proxy URL and it works unchanged.
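
Recent versions of the official OpenAI SDKs also read their base URL and key from environment variables, so an existing application can usually be redirected without a code change. A minimal sketch, assuming the proxy runs on the default port and you authenticate with the master key:

export OPENAI_BASE_URL="http://localhost:4000/v1"
export OPENAI_API_KEY="$LITER_LLM_MASTER_KEY"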

Quick start

Start the server against an auto-discovered liter-llm-proxy.toml:

liter-llm api

Start with an explicit config and a master key from the environment:

export LITER_LLM_MASTER_KEY="sk-proxy-master-$(openssl rand -hex 16)"
liter-llm api --config ./liter-llm-proxy.toml

The default bind address is 0.0.0.0:4000. Override with --host and --port.
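
For example, to bind only to the loopback interface on a different port:

liter-llm api --host 127.0.0.1 --port 8080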

A minimal config that exposes one OpenAI model:

[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"

Send a request:

curl http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITER_LLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "user", "content": "Summarize the CAP theorem in one sentence."}
    ]
  }'

See Proxy Configuration for every config field.

Command-line flags

| Flag | Default | Purpose |
|---|---|---|
| --config, -c | auto-discover | Path to liter-llm-proxy.toml. Walks from the current directory up to the filesystem root when omitted. |
| --host | 0.0.0.0 | Bind address. Overrides [server].host. |
| --port, -p | 4000 | Bind port. Overrides [server].port. |
| --master-key | reads LITER_LLM_MASTER_KEY from the environment | Master API key. Overrides [general].master_key. |
| --debug | off | Enable debug-level tracing. Equivalent to RUST_LOG=debug. |

CLI flags take precedence over config file values, which take precedence over defaults.

Endpoints

The proxy exposes 23 API routes with 26 endpoints (verb+path combinations). All /v1/* routes require a Bearer token. Health and OpenAPI endpoints are public.
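
You can confirm the auth requirement by hitting a protected route without a token; expect an authentication error rather than a model list (the exact status code and error body are not documented here):

curl -i http://localhost:4000/v1/models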

LLM operations

| Method | Path | Request body | Notes |
|---|---|---|---|
| POST | /v1/chat/completions | ChatCompletionRequest | Supports SSE streaming when stream: true (see the example below the table). |
| POST | /v1/embeddings | EmbeddingRequest | |
| GET | /v1/models | n/a | Lists configured [[models]] and aliases. |
| POST | /v1/images/generations | CreateImageRequest | |
| POST | /v1/audio/speech | CreateSpeechRequest | Returns audio bytes. |
| POST | /v1/audio/transcriptions | multipart | Speech to text. |
| POST | /v1/moderations | ModerationRequest | |
| POST | /v1/rerank | RerankRequest | Extended endpoint, not in the OpenAI API. |
| POST | /v1/search | SearchRequest | Extended endpoint. |
| POST | /v1/ocr | OcrRequest | Extended endpoint. |
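
A minimal streaming sketch, reusing the model from the quick start; stream: true switches the response to Server-Sent Events, and curl -N disables output buffering:

curl -N http://localhost:4000/v1/chat/completions \
  -H "Authorization: Bearer $LITER_LLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "stream": true,
    "messages": [
      {"role": "user", "content": "Count to five."}
    ]
  }'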

Files

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/files | Upload a file (multipart; see the sketch below the table). |
| GET | /v1/files | List files. |
| GET | /v1/files/{file_id} | Retrieve file metadata. |
| DELETE | /v1/files/{file_id} | Delete a file. |
| GET | /v1/files/{file_id}/content | Retrieve raw file bytes. |
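
A file-upload sketch, assuming the endpoint accepts the OpenAI-style multipart fields file and purpose (input.jsonl is a placeholder):

curl http://localhost:4000/v1/files \
  -H "Authorization: Bearer $LITER_LLM_MASTER_KEY" \
  -F "purpose=batch" \
  -F "file=@input.jsonl"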

Batches

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/batches | Create a batch job (see the sketch below the table). |
| GET | /v1/batches | List batch jobs. |
| GET | /v1/batches/{batch_id} | Retrieve a batch job. |
| POST | /v1/batches/{batch_id}/cancel | Cancel an in-progress batch. |
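
A batch-creation sketch, assuming the request body mirrors the OpenAI Batches API; file-abc123 stands in for the ID returned by a prior file upload:

curl http://localhost:4000/v1/batches \
  -H "Authorization: Bearer $LITER_LLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "input_file_id": "file-abc123",
    "endpoint": "/v1/chat/completions",
    "completion_window": "24h"
  }'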

Responses

| Method | Path | Purpose |
|---|---|---|
| POST | /v1/responses | Create a response (Responses API; see the sketch below the table). |
| GET | /v1/responses/{response_id} | Retrieve a response. |
| POST | /v1/responses/{response_id}/cancel | Cancel a response. |
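
A response-creation sketch, assuming the request body mirrors the OpenAI Responses API:

curl http://localhost:4000/v1/responses \
  -H "Authorization: Bearer $LITER_LLM_MASTER_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-4o", "input": "Say hello."}'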

Health and discovery

| Method | Path | Auth | Purpose |
|---|---|---|---|
| GET | /health | public | Full status including the configured model list. |
| GET | /health/liveness | public | Always returns 200 while the process is alive. Use as a Kubernetes liveness probe. |
| GET | /health/readiness | public | Returns 200 once the service pool is initialised. Use as a readiness probe. |
| GET | /openapi.json | public | Machine-readable OpenAPI 3.1 schema for every /v1/* route (see the example below the table). |
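
The schema is handy for discovering routes programmatically; for example, listing every documented path (assumes jq is installed):

curl -fsS http://localhost:4000/openapi.json | jq '.paths | keys'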

Request lifecycle

Every authenticated request passes through this pipeline:

flowchart LR
    A[HTTP client] --> B[CORS]
    B --> C[Body limit]
    C --> D[Sensitive header strip]
    D --> E[Bearer auth]
    E --> F{Model allowed?}
    F -->|no| G[403 Forbidden]
    F -->|yes| H[Service pool dispatch]
    H --> I[Tower middleware stack]
    I --> J[Provider client]
    J --> K[Response]

The Tower stack composes tracing, cost tracking, cache, budget, rate limiting, cooldown, health check, and fallback layers. The layer order is fixed in service_pool.rs when the pool is built.

Deployment

Docker

A 35 MB statically-linked image is published on every release:

docker run --rm -it \
  -p 4000:4000 \
  -v "$PWD/liter-llm-proxy.toml:/etc/liter-llm-proxy.toml:ro" \
  -e LITER_LLM_MASTER_KEY \
  -e OPENAI_API_KEY \
  ghcr.io/kreuzberg-dev/liter-llm:latest \
  api --config /etc/liter-llm-proxy.toml

Docker Compose

services:
  liter-llm:
    image: ghcr.io/kreuzberg-dev/liter-llm:latest
    command: api --config /etc/liter-llm-proxy.toml
    ports:
      - "4000:4000"
    volumes:
      - ./liter-llm-proxy.toml:/etc/liter-llm-proxy.toml:ro
    environment:
      LITER_LLM_MASTER_KEY: ${LITER_LLM_MASTER_KEY}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY}
    healthcheck:
      test: ["CMD", "curl", "-fsS", "http://localhost:4000/health/readiness"]
      interval: 10s
      timeout: 3s
      retries: 3

Kubernetes

Point the liveness and readiness probes at the dedicated health endpoints:

livenessProbe:
  httpGet:
    path: /health/liveness
    port: 4000
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /health/readiness
    port: 4000
  initialDelaySeconds: 2
  periodSeconds: 5

HTTP behaviour

| Concern | Default | Controlled by |
|---|---|---|
| Request timeout | 600 s | [server].request_timeout_secs |
| Body size limit | 10 MiB | [server].body_limit_bytes |
| CORS origins | * | [server].cors_origins |
| Response compression | always on | built in (tower_http::CompressionLayer) |
| Panic handling | caught and returned as 500 | built in (tower_http::CatchPanicLayer) |
| Authorization redaction in logs | always on | built in (SetSensitiveHeadersLayer) |
CORS is wide open by default so the proxy works from any browser app during development. Restrict it to a known origin list before shipping to production.
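
A hedged [server] sketch that tightens these defaults; the key names come from the table above, but the values are illustrative and the exact accepted formats are documented in Proxy Configuration:

[server]
request_timeout_secs = 120
body_limit_bytes = 2097152  # 2 MiB
cors_origins = ["https://app.example.com"]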

Verify a running instance

curl -fsS http://localhost:4000/health/readiness && echo "ready"
curl -fsS http://localhost:4000/v1/models \
  -H "Authorization: Bearer $LITER_LLM_MASTER_KEY"

If the readiness probe returns 200 and /v1/models lists the expected models, the proxy is serving traffic.