# Proxy Configuration
The proxy loads a single TOML file named `liter-llm-proxy.toml`. Pass `--config <path>` to the `liter-llm api` command, or place the file in the current working directory (or any parent) and the server will discover it automatically.

Every string value supports `${VAR_NAME}` environment variable interpolation. Unknown fields are rejected at parse time, so typos fail fast instead of being silently ignored.
A minimal file that exposes one model:
```toml
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
```
## Discovery and overrides
The proxy resolves configuration in this order, with later sources overriding earlier ones:

1. Defaults from each struct's `Default` impl.
2. Either auto-discovery of `liter-llm-proxy.toml` (the working directory, walked upward to the filesystem root) or an explicit `--config <path>` file. The two are mutually exclusive: `--config` disables auto-discovery.
3. CLI flags (`--host`, `--port`, `--master-key`).
4. Environment variables read during `${VAR}` interpolation.
See Proxy Server > Command-line flags for the flag list.
## `[server]`
HTTP listener settings.
| Field | Type | Default | Description |
|---|---|---|---|
| `host` | string | `"0.0.0.0"` | Bind address. |
| `port` | u16 | `4000` | Bind port. |
| `request_timeout_secs` | u64 | `600` | Upper bound on request duration before the proxy returns 504. |
| `body_limit_bytes` | usize | `10485760` (10 MiB) | Maximum request body size. Requests larger than this return 413. |
| `cors_origins` | list | `["*"]` | Allowed CORS origins. `["*"]` is wide open; replace with an explicit list in production. |
```toml
[server]
host = "0.0.0.0"
port = 4000
request_timeout_secs = 600
body_limit_bytes = 10_485_760
cors_origins = ["https://app.example.com", "https://admin.example.com"]
```
## `[general]`
Proxy-wide behaviour.
| Field | Type | Default | Description |
|---|---|---|---|
| `master_key` | string? | none | Superuser API key. Requests with this Bearer token bypass virtual-key restrictions. Can also be set via `LITER_LLM_MASTER_KEY`. |
| `default_timeout_secs` | u64 | `120` | Default per-request timeout used when a model does not set its own. |
| `max_retries` | u32 | `3` | Retry attempts per upstream request. Only 429 and 5xx trigger a retry. |
| `enable_cost_tracking` | bool | `false` | Record `gen_ai.usage.cost` on every response using embedded pricing data. |
| `enable_tracing` | bool | `false` | Emit OpenTelemetry spans with GenAI semantic conventions. Set to `true` when exporting to an OTEL collector. |
```toml
[general]
master_key = "${LITER_LLM_MASTER_KEY}"
default_timeout_secs = 120
max_retries = 3
enable_cost_tracking = true
enable_tracing = true
```
## `[[models]]`
Named model entries. A single model name may appear multiple times to define an active-active load-balanced pool.
| Field | Type | Default | Description |
|---|---|---|---|
| `name` | string | required | Alias clients send in the `model` field. |
| `provider_model` | string | required | Fully qualified provider model, like `openai/gpt-4o`. |
| `api_key` | string? | none | Provider API key. Falls back to the environment variable the provider crate expects. |
| `base_url` | string? | none | Override the provider endpoint. |
| `timeout_secs` | u64? | `general.default_timeout_secs` | Per-model timeout. |
| `fallbacks` | list | `[]` | Named models to try in order when the primary returns a transient error. |
```toml
[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
timeout_secs = 60
fallbacks = ["claude-sonnet", "llama3-groq"]

[[models]]
name = "claude-sonnet"
provider_model = "anthropic/claude-sonnet-4-20250514"
api_key = "${ANTHROPIC_API_KEY}"

[[models]]
name = "llama3-groq"
provider_model = "groq/llama3-70b-8192"
api_key = "${GROQ_API_KEY}"
```
## `[[aliases]]`
Glob-pattern credential overrides. Aliases apply to models that match the pattern and are not already defined as a `[[models]]` entry. Useful when you want a single Anthropic key to cover every Anthropic model without listing them individually.
| Field | Type | Default | Description |
|---|---|---|---|
| `pattern` | string | required | Glob pattern such as `anthropic/*` or `openai/gpt-4*`. |
| `api_key` | string? | none | Credential override for matching models. |
| `base_url` | string? | none | Endpoint override for matching models. |
```toml
# Any model matching "anthropic/*" uses the shared Anthropic key.
[[aliases]]
pattern = "anthropic/*"
api_key = "${ANTHROPIC_API_KEY}"

# Route all OpenAI models through an Azure deployment.
[[aliases]]
pattern = "openai/*"
api_key = "${AZURE_OPENAI_KEY}"
base_url = "https://my-azure.openai.azure.com"
```
## `[[keys]]`
Virtual API keys. Each key is a Bearer token with its own model allowlist, rate limit, and spend cap. The master key bypasses all of these.
| Field | Type | Default | Description |
|---|---|---|---|
| `key` | string | required | The Bearer token clients present. |
| `description` | string? | none | Free-text label surfaced in logs and the admin API. |
| `models` | list | `[]` | Allowed model names. Empty means all models are allowed. |
| `rpm` | u32? | none | Per-key requests-per-minute cap. |
| `tpm` | u64? | none | Per-key tokens-per-minute cap. |
| `budget_limit` | f64? | none | Lifetime spend cap in USD. Requests that would exceed the cap return 402. |
```toml
[[keys]]
key = "vk-team-frontend"
description = "Frontend team, chat-only, capped spend"
models = ["gpt-4o", "claude-sonnet"]
rpm = 60
tpm = 200_000
budget_limit = 100.0

[[keys]]
key = "vk-batch-worker"
description = "Overnight batch jobs, unrestricted model access"
rpm = 10
budget_limit = 500.0
```
## `[rate_limit]`
Global requests-per-minute and tokens-per-minute caps applied on top of per-key limits. Omit the table to disable global limiting.
| Field | Type | Default | Description |
|---|---|---|---|
| `rpm` | u32? | none | Global requests-per-minute across all keys. |
| `tpm` | u64? | none | Global tokens-per-minute across all keys. |
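Since the table is optional, enabling global limiting takes only a few lines. A sketch with illustrative caps (the numbers are placeholders, not recommendations):

```toml
[rate_limit]
rpm = 600         # at most 600 requests per minute across all keys
tpm = 1_000_000   # at most one million tokens per minute across all keys
```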
## `[budget]`
Aggregate spend enforcement. When `enforcement = "hard"`, requests that would cross the limit are rejected with 402. Under `"soft"`, they are logged and passed through.
| Field | Type | Default | Description |
|---|---|---|---|
| `global_limit` | f64? | none | Total lifetime spend cap in USD. |
| `model_limits` | map | `{}` | Per-model spend caps keyed by `provider/model`. |
| `enforcement` | `"hard"` or `"soft"` | `"hard"` | Whether to reject or log over-budget requests. |
```toml
[budget]
global_limit = 1000.0
enforcement = "hard" # or "soft" to log but allow through

[budget.model_limits]
"openai/gpt-4o" = 500.0
"anthropic/claude-opus-4-20250514" = 200.0
```
## `[cache]`
Response cache for non-streaming completions and embeddings. Keys include the model name, request body, and any relevant headers.
| Field | Type | Default | Description |
|---|---|---|---|
| `max_entries` | usize? | none | In-memory LRU capacity. Required for the memory backend. |
| `ttl_seconds` | u64? | none | Entry lifetime. Entries are evicted after this many seconds. |
| `backend` | string | `"memory"` | Backend identifier: `memory`, or any OpenDAL scheme (`redis`, `s3`, `fs`, `gcs`, `azblob`, and more). |
| `backend_config` | map | `{}` | Backend-specific key/value options. See the OpenDAL docs for each scheme. |
```toml
# Redis cache via OpenDAL.
[cache]
max_entries = 10_000
ttl_seconds = 3600
backend = "redis"

[cache.backend_config]
endpoint = "redis://cache.internal:6379"
```
## `[files]`
Storage backend for the `/v1/files` endpoints.
| Field | Type | Default | Description |
|---|---|---|---|
| `backend` | string | `"memory"` | Backend identifier. `memory` is volatile; use `s3`, `gcs`, `azblob`, or `fs` for persistence. |
| `prefix` | string | `"liter-llm-files/"` | Object key prefix under the backend's root. |
| `backend_config` | map | `{}` | Backend-specific options (bucket, region, credentials). |
```toml
# S3-backed file store.
[files]
backend = "s3"
prefix = "liter-llm-files/"

[files.backend_config]
bucket = "my-llm-files"
region = "us-west-2"
```
## `[health]`
Periodic upstream probes. When configured, the proxy sends a small request to `probe_model` every `interval_secs` seconds and marks failing providers unhealthy so the fallback layer can skip them.
| Field | Type | Default | Description |
|---|---|---|---|
| `interval_secs` | u64? | none | Probe interval. Disabled when omitted. |
| `probe_model` | string? | none | Model name used for the probe. Usually a cheap chat model. |
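A minimal sketch; the thirty-second interval and the choice of probe model are illustrative, not defaults:

```toml
[health]
interval_secs = 30       # send a probe every 30 seconds
probe_model = "gpt-4o"   # model name used for the probe
```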
## `[cooldown]`
Circuit-breaker duration. After a provider returns a transient error, the proxy refuses to send it traffic for `duration_secs` seconds and routes to fallbacks instead.
| Field | Type | Default | Description |
|---|---|---|---|
| `duration_secs` | u64 | required | Cooldown window in seconds. |
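For example, a one-minute circuit-breaker window (the value itself is illustrative):

```toml
[cooldown]
duration_secs = 60   # skip a provider for 60 seconds after a transient error
```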
## Environment variable interpolation
Any `${VAR_NAME}` pattern inside a string value is replaced with the environment variable's value before parsing. Unknown variables expand to an empty string, which is usually what you want for `Option<String>` fields. The interpolation runs on the raw TOML source, so nested tables and array values are expanded uniformly.
```toml
# Any ${VAR} pattern in a string value is replaced with the env var at load time.
# Unknown variables expand to an empty string.
[general]
master_key = "${LITER_LLM_MASTER_KEY}"

[[models]]
name = "gpt-4o"
provider_model = "openai/gpt-4o"
api_key = "${OPENAI_API_KEY}"
base_url = "${OPENAI_BASE_URL}" # empty if unset
```
**Unclosed braces are treated as literals.** If a `${` is missing its closing `}`, the proxy leaves the text as-is rather than silently truncating it. That makes typos easy to spot in logs.
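For instance, a hypothetical typo that the loader keeps verbatim:

```toml
# Missing closing brace: the value stays the literal string
# "${OPENAI_API_KEY" instead of expanding to an empty string.
api_key = "${OPENAI_API_KEY"
```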
## Validation
The parser sets `deny_unknown_fields` on every struct. Any typo or unsupported field raises an `invalid TOML config` error with the line and column. Fix the typo and restart; the proxy does not hot-reload.
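A sketch of the failure mode, assuming a misspelled `port`:

```toml
[server]
prot = 4000   # unknown field: rejected at startup with its line and column
```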