# Proxy Configuration
The proxy loads a single TOML file named `liter-llm-proxy.toml`. Pass `--config <path>` to the `liter-llm api` command, or place the file in the current working directory (or any parent) and the server will discover it automatically.

Every string value supports `${VAR_NAME}` environment variable interpolation. Unknown fields are rejected at parse time, so typos fail fast instead of being silently ignored.
A minimal file that exposes one model:
```toml
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
```
## Discovery and overrides
The proxy resolves configuration in this order, with later sources overriding earlier ones:

1. Defaults from each struct's `Default` impl.
2. Either auto-discovery of `liter-llm-proxy.toml` (the working directory, walked upward to the filesystem root) or an explicit `--config <path>` file. The two are mutually exclusive: `--config` disables auto-discovery.
3. CLI flags (`--host`, `--port`, `--master-key`).
4. Environment variables read during `${VAR}` interpolation.
See Proxy Server > Command-line flags for the flag list.
## `[server]`
HTTP listener settings.
| Field | Type | Default | Description |
|---|---|---|---|
| `host` | string | `"0.0.0.0"` | Bind address. |
| `port` | u16 | `4000` | Bind port. |
| `request_timeout_secs` | u64 | `600` | Upper bound on request duration before the proxy returns 504. |
| `body_limit_bytes` | usize | `10485760` (10 MiB) | Maximum request body size. Requests larger than this return 413. |
| `cors_origins` | list | `["*"]` | Allowed CORS origins. `["*"]` is wide open; replace with an explicit list in production. |
```toml
[server]
host = "0.0.0.0"
port = 4000
request_timeout_secs = 600
body_limit_bytes = 10_485_760
cors_origins = ["https://app.example.com", "https://admin.example.com"]
```
## `[general]`
Proxy-wide behaviour.
| Field | Type | Default | Description |
|---|---|---|---|
| `master_key` | string? | none | Superuser API key. Requests with this Bearer token bypass virtual-key restrictions. Can also be set via `LITER_LLM_MASTER_KEY`. |
| `default_timeout_secs` | u64 | `120` | Default per-request timeout used when a model does not set its own. |
| `max_retries` | u32 | `3` | Retry attempts per upstream request. Only 429 and 5xx trigger a retry. |
| `enable_cost_tracking` | bool | `false` | Record `gen_ai.usage.cost` on every response using embedded pricing data. |
| `enable_tracing` | bool | `false` | Emit OpenTelemetry spans with GenAI semantic conventions. Set to `true` when exporting to an OTEL collector. |
```toml
[general]
master_key = "${LITER_LLM_MASTER_KEY}"
default_timeout_secs = 120
max_retries = 3
enable_cost_tracking = true
enable_tracing = true
```
## `[[models]]`
Named model entries. A single model name may appear multiple times to define an active-active load-balanced pool.
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Alias clients send in the `model` field. |
| `provider_model` | string | required | Fully qualified provider model, like `openai/gpt-4o`. |
| `api_key` | string? | none | Provider API key. Falls back to the environment variable the provider crate expects. |
| `base_url` | string? | none | Override the provider endpoint. |
| `timeout_secs` | u64? | `general.default_timeout_secs` | Per-model timeout. |
| `fallbacks` | list | `[]` | Named models to try in order when the primary returns a transient error. |
```toml
[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
timeout_secs = 60
fallbacks = ["claude-sonnet", "llama3-groq"]

[[models]]
name = "claude-sonnet"
provider_model = "anthropic/claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[models]]
name = "llama3-groq"
provider_model = "groq/llama3-70b-8192"
api_key = "${GROQ_API_KEY}"
```
## `[[aliases]]`
Glob-pattern credential overrides. Aliases apply to models that match the pattern and are not already defined as a `[[models]]` entry. Useful when you want a single Anthropic key to cover every Anthropic model without listing them individually.
| Field | Type | Default | Description |
|---|---|---|---|
| `pattern` | string | required | Glob pattern such as `anthropic/*` or `openai/gpt-4*`. |
| `api_key` | string? | none | Credential override for matching models. |
| `base_url` | string? | none | Endpoint override for matching models. |
```toml
# Any model matching "anthropic/*" uses the shared Anthropic key.
[[aliases]]
pattern = "anthropic/*"
api_key = "${ANTHROPIC_API_KEY}"

# Route all OpenAI models through an Azure deployment.
[[aliases]]
pattern = "openai/*"
api_key = "${AZURE_OPENAI_KEY}"
base_url = "https://my-azure.openai.azure.com"
```
## `[[keys]]`
Virtual API keys. Each key is a Bearer token with its own model allowlist, rate limit, and spend cap. The master key bypasses all of these.
| Field | Type | Default | Description |
|---|---|---|---|
| `key` | string | required | The Bearer token clients present. |
| `description` | string? | none | Free-text label surfaced in logs and the admin API. |
| `models` | list | `[]` | Allowed model names. Empty means all models are allowed. |
| `rpm` | u32? | none | Per-key requests-per-minute cap. |
| `tpm` | u64? | none | Per-key tokens-per-minute cap. |
| `budget_limit` | f64? | none | Lifetime spend cap in USD. Requests that would exceed the cap return 402. |
```toml
[[keys]]
key = "vk-team-frontend"
description = "Frontend team, chat-only, capped spend"
models = ["gpt-4o", "claude-sonnet"]
rpm = 60
tpm = 200_000
budget_limit = 100.0

[[keys]]
key = "vk-batch-worker"
description = "Overnight batch jobs, unrestricted model access"
rpm = 10
budget_limit = 500.0
```
## `[rate_limit]`
Global requests-per-minute and tokens-per-minute caps applied on top of per-key limits. Omit the table to disable global limiting.
| Field | Type | Default | Description |
|---|---|---|---|
| `rpm` | u32? | none | Global requests-per-minute across all keys. |
| `tpm` | u64? | none | Global tokens-per-minute across all keys. |
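Since the table is optional, enabling global limiting takes only a few lines. A sketch with illustrative caps (the numbers are placeholders, not recommendations):

```toml
[rate_limit]
rpm = 600         # at most 600 requests per minute across all keys
tpm = 1_000_000   # at most one million tokens per minute across all keys
```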
## `[budget]`
Aggregate spend enforcement. When `enforcement = "hard"`, requests that would cross the limit are rejected with 402. Under `"soft"`, they are logged and passed through.
| Field | Type | Default | Description |
|---|---|---|---|
| `global_limit` | f64? | none | Total lifetime spend cap in USD. |
| `model_limits` | map | `{}` | Per-model spend caps keyed by `provider/model`. |
| `enforcement` | `"hard"` or `"soft"` | `"hard"` | Whether to reject or log over-budget requests. |
```toml
[budget]
global_limit = 1000.0
enforcement = "hard" # or "soft" to log but allow through

[budget.model_limits]
"openai/gpt-4o" = 500.0
"anthropic/claude-opus-4-20250514" = 200.0
```
## `[cache]`
Response cache for non-streaming completions and embeddings. Keys include the model name, request body, and any relevant headers.
| Field | Type | Default | Description |
|---|---|---|---|
| `max_entries` | usize? | none | In-memory LRU capacity. Required for the memory backend. |
| `ttl_seconds` | u64? | none | Entry lifetime. Entries are evicted after this many seconds. |
| `backend` | string | `"memory"` | Backend identifier: `memory`, or any OpenDAL scheme (`redis`, `s3`, `fs`, `gcs`, `azblob`, and more). |
| `backend_config` | map | `{}` | Backend-specific key/value options. See the OpenDAL docs for each scheme. |
```toml
# Redis cache via OpenDAL.
[cache]
max_entries = 10_000
ttl_seconds = 3600
backend = "redis"

[cache.backend_config]
endpoint = "redis://cache.internal:6379"
```
## `[files]`
Storage backend for the `/v1/files` endpoints.
| Field | Type | Default | Description |
|---|---|---|---|
| `backend` | string | `"memory"` | Backend identifier. `memory` is volatile; use `s3`, `gcs`, `azblob`, or `fs` for persistence. |
| `prefix` | string | `"liter-llm-files/"` | Object key prefix under the backend's root. |
| `backend_config` | map | `{}` | Backend-specific options (bucket, region, credentials). |
```toml
# S3-backed file store.
[files]
backend = "s3"
prefix = "liter-llm-files/"

[files.backend_config]
bucket = "my-llm-files"
region = "us-west-2"
```
## `[health]`
Periodic upstream probes. When configured, the proxy sends a small request to `probe_model` every `interval_secs` seconds and marks failing providers unhealthy so the fallback layer can skip them.
| Field | Type | Default | Description |
|---|---|---|---|
| `interval_secs` | u64? | none | Probe interval. Disabled when omitted. |
| `probe_model` | string? | none | Model name used for the probe. Usually a cheap chat model. |
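A minimal sketch; the thirty-second interval and the choice of probe model are illustrative, not defaults:

```toml
[health]
interval_secs = 30       # send a probe every 30 seconds
probe_model = "gpt-4o"   # model name used for the probe
```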
## `[cooldown]`
Circuit-breaker duration. After a provider returns a transient error, the proxy refuses to send it traffic for `duration_secs` seconds and routes to fallbacks instead.
| Field | Type | Default | Description |
|---|---|---|---|
| `duration_secs` | u64 | required | Cooldown window in seconds. |
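For example, a one-minute circuit-breaker window (the value itself is illustrative):

```toml
[cooldown]
duration_secs = 60   # skip a provider for 60 seconds after a transient error
```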
## Environment variable interpolation
Any `${VAR_NAME}` pattern inside a string value is replaced with the environment variable's value before parsing. Unknown variables expand to an empty string, which is usually what you want for `Option<String>` fields. The interpolation runs on the raw TOML source, so nested tables and array values are expanded uniformly.
```toml
# Any ${VAR} pattern in a string value is replaced with the env var at load time.
# Unknown variables expand to an empty string.
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
base_url = "${OPENAI_BASE_URL}" # empty if unset
```
**Unclosed braces are treated as literals.** If a `${` is missing its closing `}`, the proxy leaves the text as-is rather than silently truncating it. That makes typos easy to spot in logs.
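For instance, a hypothetical typo that the loader keeps verbatim:

```toml
# Missing closing brace: the value stays the literal string
# "${OPENAI_API_KEY" instead of expanding to an empty string.
api_key = "${OPENAI_API_KEY"
```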
## Validation
The parser sets `deny_unknown_fields` on every struct. Any typo or unsupported field raises an `invalid TOML config` error with the line and column. Fix the typo and restart; the proxy does not hot-reload.
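A sketch of the failure mode, assuming a misspelled `port`:

```toml
[server]
prot = 4000   # unknown field: rejected at startup with its line and column
```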