Proxy Configuration

The proxy loads a single TOML file named liter-llm-proxy.toml. Pass --config <path> to the liter-llm api command or place the file in the current working directory (or any parent) and the server will discover it automatically.

Every string value supports ${VAR_NAME} environment variable interpolation. Unknown fields are rejected at parse time, so typos fail fast instead of silently being ignored.

A minimal file that exposes one model:

[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"

Discovery and overrides

The proxy resolves config in this order, later values winning:

  1. Defaults from each struct's Default impl.
  2. The config file: either liter-llm-proxy.toml auto-discovered by walking from the working directory up to the filesystem root, or an explicit --config <path>. The two are mutually exclusive; passing --config disables auto-discovery.
  3. CLI flags (--host, --port, --master-key).
  4. Environment variables read during ${VAR} interpolation.

See Proxy Server > Command-line flags for the flag list.
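
For example, given the file below, starting the proxy with --port 8080 binds 127.0.0.1:8080: the host comes from the file (layer 2), while the --port flag (layer 3) wins over the file's port value.

[server]
host = "127.0.0.1"  # layer 2: file value beats the struct default "0.0.0.0"
port = 4000         # layer 2: a --port 8080 flag (layer 3) beats this value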

[server]

HTTP listener settings.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| host | string | "0.0.0.0" | Bind address. |
| port | u16 | 4000 | Bind port. |
| request_timeout_secs | u64 | 600 | Upper bound on request duration before the proxy returns 504. |
| body_limit_bytes | usize | 10485760 (10 MiB) | Maximum request body size. Requests larger than this return 413. |
| cors_origins | list | ["*"] | Allowed CORS origins. ["*"] is wide open; replace with an explicit list in production. |

[server]
host = "0.0.0.0"
port = 4000
request_timeout_secs = 600
body_limit_bytes = 10_485_760
cors_origins = ["https://app.example.com", "https://admin.example.com"]

[general]

Proxy-wide behaviour.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| master_key | string? | none | Superuser API key. Requests with this Bearer token bypass virtual-key restrictions. Can also be set via LITER_LLM_MASTER_KEY. |
| default_timeout_secs | u64 | 120 | Default per-request timeout used when a model does not set its own. |
| max_retries | u32 | 3 | Retry attempts per upstream request. Only 429 and 5xx trigger a retry. |
| enable_cost_tracking | bool | false | Record gen_ai.usage.cost on every response using embedded pricing data. |
| enable_tracing | bool | false | Emit OpenTelemetry spans with GenAI semantic conventions. Set to true when exporting to an OTEL collector. |

[general]
master_key = "${LITER_LLM_MASTER_KEY}"
default_timeout_secs = 120
max_retries = 3
enable_cost_tracking = true
enable_tracing = true

[[models]]

Named model entries. A single model name may appear multiple times to define an active-active load-balanced pool.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| name | string | required | Alias clients send in the model field. |
| provider_model | string | required | Fully-qualified provider model, like openai/gpt-4o. |
| api_key | string? | none | Provider API key. Falls back to the environment variable the provider crate expects. |
| base_url | string? | none | Override the provider endpoint. |
| timeout_secs | u64? | general.default_timeout_secs | Per-model timeout. |
| fallbacks | list | [] | Named models to try in order when the primary returns a transient error. |

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
timeout_secs = 60
fallbacks = ["claude-sonnet", "llama3-groq"]

[[models]]
name = "claude-sonnet"
provider_model = "anthropic/claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[models]]
name = "llama3-groq"
provider_model = "groq/llama3-70b-8192"
api_key = "${GROQ_API_KEY}"

[[aliases]]

Glob-pattern credential overrides. Aliases apply to models that match the pattern and are not already defined as a [[models]] entry. Useful when you want a single Anthropic key to cover every Anthropic model without listing them individually.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| pattern | string | required | Glob pattern such as anthropic/* or openai/gpt-4*. |
| api_key | string? | none | Credential override for matching models. |
| base_url | string? | none | Endpoint override for matching models. |

# Any model matching "anthropic/*" uses the shared Anthropic key.
[[aliases]]
pattern = "anthropic/*"
api_key = "${ANTHROPIC_API_KEY}"

# Route all OpenAI models through an Azure deployment.
[[aliases]]
pattern = "openai/*"
api_key = "${AZURE_OPENAI_KEY}"
base_url = "https://my-azure.openai.azure.com"

[[keys]]

Virtual API keys. Each key is a Bearer token with its own model allowlist, rate limit, and spend cap. The master key bypasses all of these.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| key | string | required | The Bearer token clients present. |
| description | string? | none | Free-text label surfaced in logs and the admin API. |
| models | list | [] | Allowed model names. Empty means all models are allowed. |
| rpm | u32? | none | Per-key requests-per-minute cap. |
| tpm | u64? | none | Per-key tokens-per-minute cap. |
| budget_limit | f64? | none | Lifetime spend cap in USD. Requests that would exceed the cap return 402. |

[[keys]]
key = "vk-team-frontend"
description = "Frontend team, chat-only, capped spend"
models = ["gpt-4o", "claude-sonnet"]
rpm = 60
tpm = 200_000
budget_limit = 100.0

[[keys]]
key = "vk-batch-worker"
description = "Overnight batch jobs, unrestricted model access"
rpm = 10
budget_limit = 500.0

[rate_limit]

Global request-per-minute and token-per-minute caps applied on top of per-key limits. Omit the table to disable global limiting.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| rpm | u32? | none | Global requests-per-minute across all keys. |
| tpm | u64? | none | Global tokens-per-minute across all keys. |

[rate_limit]
rpm = 600
tpm = 1_000_000

[budget]

Aggregate spend enforcement. When enforcement = "hard", requests that would cross the limit are rejected with 402. Under "soft", they are logged and passed through.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| global_limit | f64? | none | Total lifetime spend cap in USD. |
| model_limits | map | {} | Per-model spend caps keyed by provider/model. |
| enforcement | "hard" or "soft" | "hard" | Whether to reject or log over-budget requests. |

[budget]
global_limit = 1000.0
enforcement = "hard"  # or "soft" to log but allow through

[budget.model_limits]
"openai/gpt-4o" = 500.0
"anthropic/claude-opus-4-20250514" = 200.0

[cache]

Response cache for non-streaming completions and embeddings. Keys include the model name, request body, and any relevant headers.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| max_entries | usize? | none | In-memory LRU capacity. Required for the memory backend. |
| ttl_seconds | u64? | none | Entry lifetime. Entries are evicted after this many seconds. |
| backend | string | "memory" | Backend identifier: memory, or any OpenDAL scheme (redis, s3, fs, gcs, azblob, and more). |
| backend_config | map | {} | Backend-specific key/value options. See the OpenDAL docs for each scheme. |

# In-memory cache (default).
[cache]
max_entries = 4096
ttl_seconds = 900
backend = "memory"

# Redis cache via OpenDAL.
[cache]
max_entries = 10_000
ttl_seconds = 3600
backend = "redis"

[cache.backend_config]
endpoint = "redis://cache.internal:6379"

[files]

Storage backend for the /v1/files endpoints.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| backend | string | "memory" | Backend identifier. memory is volatile; use s3, gcs, azblob, or fs for persistence. |
| prefix | string | "liter-llm-files/" | Object key prefix under the backend's root. |
| backend_config | map | {} | Backend-specific options (bucket, region, credentials). |

# In-memory (default). Files are lost on restart.
[files]
backend = "memory"

# S3-backed file store.
[files]
backend = "s3"
prefix = "liter-llm-files/"

[files.backend_config]
bucket = "my-llm-files"
region = "us-west-2"
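
For persistence without an object store, a local-disk sketch assuming the OpenDAL fs scheme's root option:

# Files survive restarts under /var/lib/liter-llm/liter-llm-files/.
[files]
backend = "fs"
prefix = "liter-llm-files/"

[files.backend_config]
root = "/var/lib/liter-llm"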

[health]

Periodic upstream probes. When configured, the proxy sends a small request to probe_model every interval_secs seconds and marks failing providers unhealthy so the fallback layer can skip them.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| interval_secs | u64? | none | Probe interval. Disabled when omitted. |
| probe_model | string? | none | Model name used for the probe. Usually a cheap chat model. |

[health]
interval_secs = 30
probe_model = "openai/gpt-4o-mini"

[cooldown]

Circuit-breaker duration. After a provider returns a transient error, the proxy refuses to send it traffic for duration_secs seconds and routes to fallbacks instead.

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| duration_secs | u64 | required | Cooldown window in seconds. |

[cooldown]
duration_secs = 60
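
Cooldown composes with per-model fallbacks: while a provider is cooling down, its traffic flows to the fallbacks instead. A sketch reusing the models from the [[models]] section:

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
fallbacks = ["claude-sonnet"]

[cooldown]
# After openai/gpt-4o returns a transient error, requests for "gpt-4o"
# are routed to "claude-sonnet" for the next 60 seconds.
duration_secs = 60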

Environment variable interpolation

Any ${VAR_NAME} pattern inside a string value is replaced with the environment variable's value before parsing. Unknown variables expand to an empty string, which is usually what you want for Option<String> fields. The interpolation runs on the raw TOML source, so nested tables and array values are expanded uniformly.

# Any ${VAR} pattern in a string value is replaced with the env var at load time.
# Unknown variables expand to an empty string.
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
base_url = "${OPENAI_BASE_URL}"  # empty if unset

Unclosed braces are treated as literals

If a ${ is missing its closing }, the proxy leaves the text as-is rather than silently truncating. That makes typos easy to spot in logs.
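
For example, with a missing closing brace (hypothetical typo):

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
# Missing "}": the value is kept as the literal string "${OPENAI_API_KEY",
# neither expanded nor truncated, so the typo is visible in logs.
api_key = "${OPENAI_API_KEY"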

Validation

The parser sets deny_unknown_fields on every struct. Any typo or unsupported field raises an "invalid TOML config" error with the line and column. Fix the typo and restart; the proxy does not hot-reload.
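
For example, a misspelled field name (hypothetical typo) stops the proxy at startup instead of being silently dropped:

[server]
host = "0.0.0.0"
prot = 4000  # typo for "port": rejected at parse time with line and column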