Architecture
liter-llm is a Rust-first monorepo. A small core library provides the LLM client and all provider integrations; a proxy and a CLI layer on top; eleven language bindings wrap the core via native extension mechanisms.
Crate graph
graph TB
CORE["liter-llm\ncore library"]
CORE --> PROXY["liter-llm-proxy\nHTTP proxy · MCP server"]
CORE --> CLI["liter-llm-cli\nbinary"]
CORE --> BC["bindings-core\nshared helpers"]
BC --> NAT["Native extensions\nPython · Node.js · WebAssembly · Ruby · Elixir"]
BC --> FFI["C FFI\nGo · Java · C# · PHP"]
The liter-llm core crate never depends on any binding. All knowledge flows in one direction: core outward to bindings, CLI, and proxy.
Native-extension bindings (Python, Node.js, WebAssembly, Ruby, Elixir) use Rust procedural macros (PyO3, napi-rs, wasm-bindgen, magnus, rustler). The remaining four languages (Go, Java, C#, PHP) call the liter-llm-ffi shared library via their native FFI mechanism (cgo, Panama FFM, P/Invoke).
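As a rough illustration of the C FFI path, the sketch below shows the shape of a function that a shared library could expose to cgo, Panama FFM, or P/Invoke callers. The function name `example_validate_model` and the status codes are hypothetical and are not taken from liter-llm-ffi; only the standard library is used.

```rust
use std::ffi::{c_char, CStr};

// Hypothetical status codes returned across the C boundary.
pub const STATUS_OK: i32 = 0;
pub const STATUS_NULL_POINTER: i32 = 1;
pub const STATUS_INVALID_UTF8: i32 = 2;

/// Hypothetical entry point: checks that a model name passed as a C string
/// is non-null and valid UTF-8.
///
/// A real shared library would also give the symbol a stable, unmangled
/// name so that foreign callers can look it up.
///
/// # Safety
/// `model` must be null or point to a NUL-terminated string.
pub unsafe extern "C" fn example_validate_model(model: *const c_char) -> i32 {
    if model.is_null() {
        return STATUS_NULL_POINTER;
    }
    // SAFETY: the caller guarantees a valid NUL-terminated string.
    let name = unsafe { CStr::from_ptr(model) };
    match name.to_str() {
        Ok(_) => STATUS_OK,
        Err(_) => STATUS_INVALID_UTF8,
    }
}
```

Returning plain integers and accepting raw pointers keeps the boundary expressible in every C-compatible FFI, which is presumably why one shared library can serve several languages.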
Core library structure
crates/liter-llm/src/
client/ # LlmClient trait + DefaultClient
error.rs # LiterLlmError enum (17 variants)
cost.rs # completion_cost() - pricing registry
tokenizer.rs # count_tokens() - HuggingFace tokenizer bridge
auth/ # CredentialProvider trait + Azure, Bedrock, Vertex providers
http/ # reqwest-backed HTTP client, SSE parser, retry logic
provider/ # Per-provider adapters (143 providers)
tower/ # Tower middleware layers
types/ # Request and response types (OpenAI wire format)
crates/liter-llm/schemas/
pricing.json # Provider pricing registry embedded at compile time
providers.json # 142+ provider configurations
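To make the role of the pricing registry concrete, here is a minimal sketch of a per-token cost lookup. The types `ModelPrice` and `PricingRegistry`, the field names, and the method signature are assumptions for illustration; the actual `completion_cost()` in cost.rs and the layout of pricing.json may differ.

```rust
use std::collections::HashMap;

/// Hypothetical per-model prices, in USD per token.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ModelPrice {
    pub input_per_token: f64,
    pub output_per_token: f64,
}

/// Hypothetical in-memory registry. The real registry is described as being
/// embedded from pricing.json; here entries are inserted by hand.
pub struct PricingRegistry {
    prices: HashMap<String, ModelPrice>,
}

impl PricingRegistry {
    pub fn new() -> Self {
        PricingRegistry { prices: HashMap::new() }
    }

    pub fn insert(&mut self, model: &str, price: ModelPrice) {
        self.prices.insert(model.to_string(), price);
    }

    /// Returns the cost of one completion, or `None` when the model has no
    /// pricing entry.
    pub fn completion_cost(
        &self,
        model: &str,
        prompt_tokens: u64,
        completion_tokens: u64,
    ) -> Option<f64> {
        let price = self.prices.get(model)?;
        Some(
            prompt_tokens as f64 * price.input_per_token
                + completion_tokens as f64 * price.output_per_token,
        )
    }
}
```

Returning `Option` rather than defaulting to zero lets a caller distinguish a free request from a model that is missing from the registry.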
Tower middleware stack
Every LlmRequest flows through a chain of composable Tower layers. The proxy builds the full stack; library users can assemble any subset.
graph LR
REQ([Request])
subgraph obs ["Observability"]
T[Tracing] --> CT[Cost tracking]
end
subgraph ctrl ["Traffic control"]
B[Budget] --> RL[Rate limit] --> CD[Cooldown]
end
subgraph route ["Routing"]
H[Health check] --> FB[Fallback / Router]
end
REQ --> T
CT --> B
CD --> H
FB --> PROV([Provider])
Layers run outermost to innermost. CacheLayer short-circuits the stack on a hit (before traffic control). BudgetLayer rejects a request before it reaches the network if it would exceed the configured spend cap.
| Layer | File | Purpose |
|---|---|---|
| TracingLayer | tower/tracing.rs | Emits OpenTelemetry gen_ai spans |
| CostTrackingLayer | tower/cost.rs | Records gen_ai.usage.cost on the span |
| CacheLayer | tower/cache.rs | Response cache (LRU, configurable TTL) |
| BudgetLayer | tower/budget.rs | Hard/soft spend caps per key |
| RateLimitLayer | tower/rate_limit.rs | RPM / TPM sliding-window limits |
| CooldownLayer | tower/cooldown.rs | Per-provider backoff after transient errors |
| HealthLayer | tower/health.rs | Marks providers unhealthy after failure threshold |
| FallbackLayer | tower/fallback.rs | Primary-plus-backup failover on transient errors |
| Router | tower/router.rs | Multi-deployment load distribution (5 strategies) |
| LlmService | tower/service.rs | Bridges LlmClient into the Tower Service trait |
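The layering idea can be sketched without the tower crate. The example below uses a simplified, synchronous `Service` trait as a stand-in (the real Tower trait is asynchronous and has a readiness check), and the `Cache`, `Budget`, and `Provider` types are hypothetical miniatures, not the library's layers. It shows a cache wrapping a budget check wrapping a provider, so a cache hit never reaches the budget or the network.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
pub struct Request {
    pub prompt: String,
    pub estimated_cost: f64,
}

#[derive(Debug, Clone, PartialEq)]
pub enum StackError {
    BudgetExceeded,
}

/// Simplified synchronous stand-in for a Tower-style service.
pub trait Service {
    fn call(&mut self, req: &Request) -> Result<String, StackError>;
}

/// Innermost service: pretends to call a provider and counts the calls.
pub struct Provider {
    pub calls: u32,
}

impl Service for Provider {
    fn call(&mut self, req: &Request) -> Result<String, StackError> {
        self.calls += 1;
        Ok(format!("echo: {}", req.prompt))
    }
}

/// Rejects a request that would push total spend over the cap.
pub struct Budget<S> {
    pub inner: S,
    pub cap: f64,
    pub spent: f64,
}

impl<S: Service> Service for Budget<S> {
    fn call(&mut self, req: &Request) -> Result<String, StackError> {
        if self.spent + req.estimated_cost > self.cap {
            return Err(StackError::BudgetExceeded);
        }
        let resp = self.inner.call(req)?;
        self.spent += req.estimated_cost;
        Ok(resp)
    }
}

/// Returns a stored response on a hit without calling the inner service.
pub struct Cache<S> {
    pub inner: S,
    pub entries: HashMap<String, String>,
}

impl<S: Service> Service for Cache<S> {
    fn call(&mut self, req: &Request) -> Result<String, StackError> {
        if let Some(hit) = self.entries.get(&req.prompt) {
            return Ok(hit.clone());
        }
        let resp = self.inner.call(req)?;
        self.entries.insert(req.prompt.clone(), resp.clone());
        Ok(resp)
    }
}

/// Assembles the stack outermost-first: cache, then budget, then provider.
pub fn build_stack(cap: f64) -> Cache<Budget<Provider>> {
    Cache {
        inner: Budget { inner: Provider { calls: 0 }, cap, spent: 0.0 },
        entries: HashMap::new(),
    }
}
```

Because each wrapper is generic over its inner service, a caller can drop or reorder wrappers, which mirrors the claim that library users can assemble any subset of the stack.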
Request lifecycle
flowchart TD
A([App]) -->|LlmRequest| B{Cache hit?}
B -->|yes| C([LlmResponse — cached])
B -->|no| D[Budget · Rate limit · Health]
D -->|rejected| E([Error returned])
D -->|pass| F[Provider adapter]
F -->|HTTP POST| G([Provider API])
G -->|HTTP response| F
F -->|LlmResponse| A
On a transient provider error, FallbackLayer replays the request on the configured backup. If a Router is configured, requests are distributed across deployments before reaching LlmService.
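The failover rule can be illustrated with a small function. This is a sketch under assumptions: `CallError` and `call_with_fallback` are hypothetical names, providers are modeled as plain closures, and only one backup is tried, which matches the primary-plus-backup description but not necessarily the real FallbackLayer's behavior in every detail.

```rust
#[derive(Debug, Clone, PartialEq)]
pub enum CallError {
    /// Worth retrying elsewhere, for example a timeout or a 503.
    Transient(String),
    /// Not worth retrying, for example an authentication failure.
    Permanent(String),
}

/// Calls `primary`; replays the same prompt on `backup` only when the
/// primary fails with a transient error.
pub fn call_with_fallback<P, B>(
    mut primary: P,
    mut backup: B,
    prompt: &str,
) -> Result<String, CallError>
where
    P: FnMut(&str) -> Result<String, CallError>,
    B: FnMut(&str) -> Result<String, CallError>,
{
    match primary(prompt) {
        Err(CallError::Transient(_)) => backup(prompt),
        other => other,
    }
}
```

Permanent errors are returned unchanged, since replaying a request that failed for a non-transient reason would most likely fail again on the backup.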
Language binding strategy
All eleven bindings share the same Rust core. Six approaches cover the binding surface: five native-extension frameworks and one C FFI layer.
| Approach | Used by | Mechanism |
|---|---|---|
| PyO3 | Python | Rust procedural macros generate Python module + exception classes |
| napi-rs | Node.js | Rust procedural macros generate N-API addon |
| wasm-bindgen | WebAssembly | Compiles to WASM + JS glue; fetch API replaces reqwest |
| magnus | Ruby | Rust procedural macros generate Ruby C extension |
| rustler | Elixir | Rust procedural macros generate Elixir NIF |
| extern "C" FFI | Go, Java, C#, PHP | Single shared library; language calls via cgo/Panama FFM/P/Invoke |
crates/liter-llm-bindings-core provides two helpers shared by every non-FFI binding: error_kind_label() maps a LiterLlmError variant to a stable string label, and format_error() produces the [Label] message prefix used by TypeScript and the C FFI.
The WASM binding is the only one that does not use format_error(). It calls the browser fetch API directly and produces HTTP {status}: {message} errors from the raw HTTP response, bypassing the Rust error enum entirely.
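A minimal sketch of the two helpers follows. The enum `ExampleError` and its three variants are hypothetical stand-ins for `LiterLlmError`, and the function signatures and label strings are assumptions; only the `[Label] message` shape comes from the description above.

```rust
/// Hypothetical stand-in for the core error enum.
#[derive(Debug, Clone, PartialEq)]
pub enum ExampleError {
    Authentication(String),
    RateLimited(String),
    Timeout(String),
}

/// Maps a variant to a stable label that bindings can match on.
pub fn error_kind_label(err: &ExampleError) -> &'static str {
    match err {
        ExampleError::Authentication(_) => "Authentication",
        ExampleError::RateLimited(_) => "RateLimited",
        ExampleError::Timeout(_) => "Timeout",
    }
}

/// Produces the `[Label] message` form.
pub fn format_error(err: &ExampleError) -> String {
    let message = match err {
        ExampleError::Authentication(m)
        | ExampleError::RateLimited(m)
        | ExampleError::Timeout(m) => m,
    };
    format!("[{}] {}", error_kind_label(err), message)
}
```

Keeping the label separate from the message means a binding can map labels to its own exception classes without parsing free-form text.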
Proxy structure
crates/liter-llm-proxy/src/
auth/ # Auth key store + validation
config/ # TOML config structs + env-var interpolation
routes/ # Axum route handlers (23 unique routes)
mcp/ # MCP server (22 tools via rmcp)
error.rs # Error types and mapping
file_store.rs # OpenDAL file storage backend
lib.rs # Module exports
openapi.rs # OpenAPI spec generation
service_pool.rs # Builds the Tower stack per model
state.rs # Shared application state
streaming.rs # SSE response streaming
The proxy builds one Tower stack per [[models]] entry at startup. Stacks are stored in a ServicePool indexed by model name and alias. Incoming requests authenticate via the master key or a virtual key ([[keys]]), then route to the matching stack.
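The lookup by model name and alias can be sketched as a map in which several keys share one stack. The types `ModelEntry` and `ModelStack` and the `build`/`get` methods are hypothetical; the real ServicePool in service_pool.rs may be structured differently.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical view of one [[models]] entry.
pub struct ModelEntry {
    pub name: String,
    pub aliases: Vec<String>,
}

/// Placeholder for a fully built middleware stack.
pub struct ModelStack {
    pub model: String,
}

/// Hypothetical pool: every name and alias points at a shared stack.
pub struct ServicePool {
    by_name: HashMap<String, Arc<ModelStack>>,
}

impl ServicePool {
    /// Builds one stack per entry and registers it under its name and
    /// under each of its aliases.
    pub fn build(entries: &[ModelEntry]) -> Self {
        let mut by_name = HashMap::new();
        for entry in entries {
            let stack = Arc::new(ModelStack { model: entry.name.clone() });
            by_name.insert(entry.name.clone(), Arc::clone(&stack));
            for alias in &entry.aliases {
                by_name.insert(alias.clone(), Arc::clone(&stack));
            }
        }
        ServicePool { by_name }
    }

    /// Returns the stack registered for a model name or alias, if any.
    pub fn get(&self, name: &str) -> Option<Arc<ModelStack>> {
        self.by_name.get(name).cloned()
    }
}
```

Sharing one `Arc` per entry means an alias resolves to the same stack instance as the model name, so any per-stack state such as budgets or rate limits would be shared rather than duplicated.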
See Proxy Server and Proxy Configuration for operational details.