Architecture
Liter-llm is a Rust-first polyglot library. A small core crate provides the client surface and all provider adapters; a proxy crate and a CLI layer on top; 14 language bindings wrap the core via native extensions or the C FFI shared library.
Crate graph
```mermaid
graph TB
    CORE["liter-llm\ncore library"]
    CORE --> PROXY["liter-llm-proxy\nHTTP proxy · MCP server"]
    CORE --> CLI["liter-llm-cli\nbinary"]
    CORE --> PY["liter-llm-py\nPyO3"]
    CORE --> NODE["liter-llm-node\nNAPI-RS"]
    CORE --> WASM["liter-llm-wasm\nwasm-bindgen"]
    CORE --> PHP["liter-llm-php\next-php-rs"]
    CORE --> FFI["liter-llm-ffi\nC shared library"]
    FFI --> GO["Go (cgo)"]
    FFI --> JAVA["Java / Kotlin JVM interop"]
    FFI --> CS["C# (P/Invoke)"]
    FFI --> RB["Ruby (Fiddle)"]
    FFI --> EX["Elixir (NIF)"]
    FFI --> ZIG["Zig"]
    FFI --> DART["Dart (dart:ffi)"]
    FFI --> SWIFT["Swift"]
```
The liter-llm core crate never depends on any binding. Dependencies point in one direction only: the bindings, the CLI, and the proxy depend on the core, never the reverse.
Core library structure
```text
crates/liter-llm/src/
  client/        # LlmClient + FileClient + BatchClient + ResponseClient + DefaultClient
  error.rs       # LiterLlmError enum (17 variants)
  cost.rs        # Per-call cost estimation
  tokenizer.rs   # HuggingFace tokenizer bridge (feature-gated)
  auth/          # Credential providers: Azure AD, AWS SigV4, Vertex OAuth2, Copilot
  http/          # reqwest-backed HTTP client, SSE parser, retry logic
  provider/      # Per-provider adapters (OpenAI, Anthropic, Google, Bedrock, Vertex, …)
  tower/         # Tower middleware layers (feature `tower`)
  types/         # Request and response types (OpenAI wire format)
crates/liter-llm/schemas/
  providers.json # 143 runtime providers plus schema metadata and complex-provider names
```
Public surface
The core crate re-exports a curated set of symbols at its root:
| Surface | Items |
|---|---|
| Client traits | LlmClient, LlmClientRaw, BatchClient, FileClient, ResponseClient |
| Default impl | DefaultClient, ManagedClient (with tower feature) |
| Constructors | create_client(...), create_client_from_json(...) |
| Config | ClientConfig, ClientConfigBuilder, FileConfig |
| Custom providers | register_custom_provider, unregister_custom_provider, CustomProviderConfig, AuthHeaderFormat |
| Provider registry | ProviderConfig, all_providers(), complex_provider_names() |
| Errors | LiterLlmError, Result<T> |
| Types | All types::* submodules — chat, embedding, image, audio, files, batch, responses, rerank, search, ocr, moderation, models, raw, common |
Internal modules (http) are pub(crate) and not part of the public API.
LlmClient covers the core model operations: chat, chat_stream, embed, list_models, image_generate, speech, transcribe, moderate, rerank, search, and ocr. File uploads and retrieval use FileClient; batch jobs use BatchClient; OpenAI Responses API operations use ResponseClient.
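Splitting the client surface into separate traits lets calling code depend only on the capability it needs. The sketch below illustrates that idea with synchronous stand-in traits; `ChatCapability`, `FileCapability`, and `FakeClient` are hypothetical names, and the real `LlmClient` and `FileClient` signatures (which are not reproduced in this document) will differ.

```rust
// Stand-in traits mirroring the capability split described above.
// Hypothetical: the real liter-llm traits use the crate's own request and
// response types, which are not reproduced here.
trait ChatCapability {
    fn chat(&self, prompt: &str) -> Result<String, String>;
}

trait FileCapability {
    fn upload(&self, name: &str, bytes: &[u8]) -> Result<String, String>;
}

/// A fake in-memory client that implements both capabilities.
struct FakeClient;

impl ChatCapability for FakeClient {
    fn chat(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}

impl FileCapability for FakeClient {
    fn upload(&self, name: &str, bytes: &[u8]) -> Result<String, String> {
        Ok(format!("file-{name}-{}", bytes.len()))
    }
}

/// Needs only chat, so it accepts any type with the chat capability.
fn summarize(client: &impl ChatCapability, text: &str) -> Result<String, String> {
    client.chat(&format!("Summarize: {text}"))
}

/// Needs only file upload, so it never sees the chat surface.
fn store_report(client: &impl FileCapability, body: &str) -> Result<String, String> {
    client.upload("report.txt", body.as_bytes())
}
```

A function such as `summarize` can then be tested against a small fake that implements one trait, without stubbing file or batch operations.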
Tower middleware stack
When the tower feature is enabled, every request flows through a chain of composable Tower layers. The proxy builds the full stack; library users assemble any subset they need.
```mermaid
graph LR
    REQ([Request])
    subgraph obs ["Observability"]
        T[Tracing] --> CT[Cost tracking]
    end
    subgraph ctrl ["Traffic control"]
        CACHE[Cache] --> B[Budget] --> RL[Rate limit] --> CD[Cooldown]
    end
    subgraph route ["Routing"]
        H[Health check] --> FB[Fallback / Router]
    end
    REQ --> T
    CT --> CACHE
    CD --> H
    FB --> PROV([Provider])
```
Layers run outermost to innermost. CacheLayer short-circuits the stack on a hit, so a cached response skips the budget, rate-limit, and cooldown checks as well as the provider call. BudgetLayer rejects a request before it reaches the network if it would exceed the configured spend cap.
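The ordering rules above can be pictured with a small standard-library model. This is a sketch under stated assumptions, not the crate's Tower implementation: the `Handler` trait stands in for Tower's `Service`, and `Provider`, `Budget`, `Cache`, and the flat per-call cost are all hypothetical.

```rust
use std::collections::HashMap;

/// Minimal stand-in for a Tower `Service`: one synchronous call per request.
trait Handler {
    fn call(&mut self, req: &str) -> Result<String, String>;
}

/// Innermost stage: pretends to be the provider and counts network calls.
struct Provider {
    calls: u32,
}

impl Handler for Provider {
    fn call(&mut self, req: &str) -> Result<String, String> {
        self.calls += 1;
        Ok(format!("reply to {req}"))
    }
}

/// Rejects a request when its estimated cost would exceed the remaining cap.
struct Budget<H> {
    inner: H,
    remaining_cents: u32,
    cost_per_call_cents: u32, // hypothetical flat estimate
}

impl<H: Handler> Handler for Budget<H> {
    fn call(&mut self, req: &str) -> Result<String, String> {
        if self.cost_per_call_cents > self.remaining_cents {
            return Err("budget exceeded".to_string());
        }
        let out = self.inner.call(req)?;
        self.remaining_cents -= self.cost_per_call_cents;
        Ok(out)
    }
}

/// Outer stage: returns a stored response without touching inner stages.
struct Cache<H> {
    inner: H,
    store: HashMap<String, String>,
}

impl<H: Handler> Handler for Cache<H> {
    fn call(&mut self, req: &str) -> Result<String, String> {
        if let Some(hit) = self.store.get(req) {
            return Ok(hit.clone());
        }
        let out = self.inner.call(req)?;
        self.store.insert(req.to_string(), out.clone());
        Ok(out)
    }
}

/// Builds Cache -> Budget -> Provider with a cap that covers exactly one call.
fn build_stack() -> Cache<Budget<Provider>> {
    Cache {
        inner: Budget {
            inner: Provider { calls: 0 },
            remaining_cents: 5,
            cost_per_call_cents: 5,
        },
        store: HashMap::new(),
    }
}
```

With this stack, a repeated request is served from the cache without spending budget, while a new request after the cap is exhausted fails before the provider is called.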
| Layer | File | Purpose |
|---|---|---|
| TracingLayer | tower/tracing.rs | Emits OpenTelemetry gen_ai spans |
| CostTrackingLayer | tower/cost.rs | Records gen_ai.usage.cost on the span |
| CacheLayer | tower/cache.rs | In-memory response cache (LRU, configurable TTL) |
| BudgetLayer | tower/budget.rs | Hard/soft spend caps per key |
| ModelRateLimitLayer | tower/rate_limit.rs | RPM / TPM sliding-window limits |
| CooldownLayer | tower/cooldown.rs | Per-provider backoff after transient errors |
| HealthCheckLayer | tower/health.rs | Marks providers unhealthy after failure threshold |
| HooksLayer | tower/hooks.rs | Pre/post-request hook execution |
| FallbackLayer | tower/fallback.rs | Primary-plus-backup failover on transient errors |
| Router | tower/router.rs | Multi-deployment load distribution |
| LlmService | tower/service.rs | Bridges LlmClient into the Tower Service trait |
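The sliding-window limits listed for ModelRateLimitLayer can be illustrated with a simple request counter. The following is a generic sketch of the technique, not the layer's actual code: `SlidingWindow` is a hypothetical type, and timestamps are passed in as plain seconds so the behaviour is deterministic.

```rust
use std::collections::VecDeque;

/// Sliding-window request counter. Timestamps are plain seconds supplied by
/// the caller (an assumption made to keep this sketch deterministic).
struct SlidingWindow {
    window_secs: u64,
    max_requests: usize,
    seen: VecDeque<u64>,
}

impl SlidingWindow {
    fn new(window_secs: u64, max_requests: usize) -> Self {
        Self { window_secs, max_requests, seen: VecDeque::new() }
    }

    /// Returns true and records the request if it fits in the current window.
    fn try_acquire(&mut self, now_secs: u64) -> bool {
        // Drop timestamps that have fallen out of the window.
        while let Some(&oldest) = self.seen.front() {
            if now_secs.saturating_sub(oldest) >= self.window_secs {
                self.seen.pop_front();
            } else {
                break;
            }
        }
        if self.seen.len() < self.max_requests {
            self.seen.push_back(now_secs);
            true
        } else {
            false
        }
    }
}
```

A tokens-per-minute limit would follow the same shape, storing a token count alongside each timestamp and comparing the windowed sum against the cap.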
The optional opendal-cache feature swaps the in-memory cache for an OpenDAL-backed store (S3, GCS, Azure Blob, Redis, filesystem) via OpenDalCacheStore.
Router supports four strategies: RoundRobin, LatencyBased, CostBased, and WeightedRandom. Ordered-fallback is provided separately by FallbackLayer.
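To make the four strategy names concrete, here is a minimal sketch of how each one might choose a deployment. The strategy names come from the text; the `Deployment` fields, the `pick` function, and the selection logic are assumptions that follow the most direct reading of each name, not the Router's actual implementation.

```rust
/// One deployment behind the router. All fields are illustrative.
#[derive(Debug, Clone)]
struct Deployment {
    name: &'static str,
    avg_latency_ms: u32,
    cost_per_call_cents: u32,
    weight: u32,
}

/// Strategy names as listed in the text.
enum Strategy {
    RoundRobin,
    LatencyBased,
    CostBased,
    WeightedRandom,
}

/// Picks a deployment. `counter` is a per-router request counter and `roll`
/// is a caller-supplied random number, so the sketch needs no RNG crate.
fn pick<'a>(
    strategy: &Strategy,
    deployments: &'a [Deployment],
    counter: usize,
    roll: u32,
) -> Option<&'a Deployment> {
    if deployments.is_empty() {
        return None;
    }
    match strategy {
        // Cycle through deployments in order.
        Strategy::RoundRobin => deployments.get(counter % deployments.len()),
        // Prefer the lowest observed latency.
        Strategy::LatencyBased => deployments.iter().min_by_key(|d| d.avg_latency_ms),
        // Prefer the cheapest deployment.
        Strategy::CostBased => deployments.iter().min_by_key(|d| d.cost_per_call_cents),
        // Walk the cumulative weights until the roll falls inside one bucket.
        Strategy::WeightedRandom => {
            let total: u32 = deployments.iter().map(|d| d.weight).sum();
            if total == 0 {
                return None;
            }
            let mut point = roll % total;
            for d in deployments {
                if point < d.weight {
                    return Some(d);
                }
                point -= d.weight;
            }
            None
        }
    }
}
```

Taking `counter` and `roll` as parameters keeps the selection a pure function, which makes each strategy straightforward to unit test.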
Request lifecycle
```mermaid
flowchart TD
    A([App]) -->|ChatCompletionRequest| B{Cache hit?}
    B -->|yes| C([cached response])
    B -->|no| D[Budget · Rate limit · Health]
    D -->|rejected| E([Error returned])
    D -->|pass| F[Provider adapter]
    F -->|HTTP POST| G([Provider API])
    G -->|HTTP response| F
    F -->|response| A
```
On a transient provider error, FallbackLayer replays the request on the configured backup. If a Router is configured, requests are distributed across deployments before reaching LlmService.
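The failover rule described here (replay on the backup only when the primary fails transiently) can be sketched as a small function. `CallError` and `with_fallback` are hypothetical names for illustration; the real LiterLlmError variants and the FallbackLayer internals are not shown in this document.

```rust
/// Error classes for this sketch; the real error enum is not reproduced here.
#[derive(Debug, PartialEq)]
enum CallError {
    Transient(String), // e.g. a timeout or an overloaded provider
    Permanent(String), // e.g. an invalid request
}

/// Calls `primary`; on a transient error, replays the same request on `backup`.
/// Successes and permanent errors from the primary are returned unchanged.
fn with_fallback<P, B>(req: &str, primary: P, backup: B) -> Result<String, CallError>
where
    P: FnOnce(&str) -> Result<String, CallError>,
    B: FnOnce(&str) -> Result<String, CallError>,
{
    match primary(req) {
        Err(CallError::Transient(_)) => backup(req),
        other => other,
    }
}
```

Permanent errors are deliberately not retried in this sketch, since a request that one provider rejects as malformed would likely be rejected by the backup as well.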
Language binding strategy
All 14 bindings share the same Rust core. Four native-extension crates and one C FFI crate cover the binding surface:
| Approach | Crate | Used by |
|---|---|---|
| PyO3 | liter-llm-py | Python |
| napi-rs | liter-llm-node | TypeScript / Node.js |
| wasm-bindgen | liter-llm-wasm | WebAssembly (browsers, Cloudflare Workers, Deno, Bun) |
| ext-php-rs | liter-llm-php | PHP |
| C ABI shared lib | liter-llm-ffi | Go, Java, Kotlin Android, C#, Ruby, Elixir, Dart, Swift, Zig |
The C FFI surface is the only one that exposes Rust types as opaque handles. All FFI-consuming bindings use their language-native FFI mechanism — cgo, Panama FFM, P/Invoke, Fiddle, NIF, dart:ffi, Swift's C interop, Zig's @cImport — to call the shared library. Error context (variant label, numeric code, message) is preserved across the boundary so each binding can throw a typed exception.
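The opaque-handle pattern and the error context carried across the boundary might look roughly like the sketch below. Every name here (`SketchClient`, the `sketch_client_*` functions, the `SKETCH_*` codes) is hypothetical and chosen for illustration; the actual liter-llm-ffi symbols and error codes are not shown in this document.

```rust
use std::ffi::{c_char, CStr, CString};

/// Hypothetical client state hidden behind the opaque handle.
pub struct SketchClient {
    last_error: Option<CString>,
}

// Hypothetical numeric codes; the library's real codes are not shown here.
pub const SKETCH_OK: i32 = 0;
pub const SKETCH_ERR_NULL_ARG: i32 = 1;
pub const SKETCH_ERR_INVALID_UTF8: i32 = 2;
pub const SKETCH_ERR_EMPTY_MODEL: i32 = 3;

/// Allocates a client and hands ownership to the caller as a raw pointer.
/// A real export would also carry a `no_mangle` attribute.
pub extern "C" fn sketch_client_new() -> *mut SketchClient {
    Box::into_raw(Box::new(SketchClient { last_error: None }))
}

/// Validates a model name. On failure, stores a message on the handle and
/// returns a numeric code the binding can map to a typed exception.
pub unsafe extern "C" fn sketch_client_check_model(
    handle: *mut SketchClient,
    model: *const c_char,
) -> i32 {
    if handle.is_null() || model.is_null() {
        return SKETCH_ERR_NULL_ARG;
    }
    // SAFETY: nulls were rejected above; the caller guarantees validity.
    let client = unsafe { &mut *handle };
    let parsed = unsafe { CStr::from_ptr(model) }.to_str();
    let (code, message) = match parsed {
        Ok(name) if !name.is_empty() => (SKETCH_OK, None),
        Ok(_) => (SKETCH_ERR_EMPTY_MODEL, Some("model name is empty")),
        Err(_) => (SKETCH_ERR_INVALID_UTF8, Some("model name is not valid UTF-8")),
    };
    client.last_error = message.and_then(|m| CString::new(m).ok());
    code
}

/// Returns the last error message, or null if the last call succeeded.
/// The pointer stays valid until the next call on the same handle.
pub unsafe extern "C" fn sketch_client_last_error(handle: *const SketchClient) -> *const c_char {
    if handle.is_null() {
        return std::ptr::null();
    }
    // SAFETY: null was rejected above; the caller guarantees validity.
    let client = unsafe { &*handle };
    match &client.last_error {
        Some(msg) => msg.as_ptr(),
        None => std::ptr::null(),
    }
}

/// Reclaims ownership of the handle and frees it.
pub unsafe extern "C" fn sketch_client_free(handle: *mut SketchClient) {
    if !handle.is_null() {
        // SAFETY: the pointer came from `sketch_client_new`.
        drop(unsafe { Box::from_raw(handle) });
    }
}
```

The caller never sees the struct layout, only a pointer, so the Rust side can change its fields without breaking the consuming languages.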
The WASM binding compiles to a JS bundle and uses the browser/Node fetch API in place of reqwest; this is gated by the mutually-exclusive wasm-http feature instead of native-http.
Proxy structure
```text
crates/liter-llm-proxy/src/
  auth/            # Auth key store + validation
  config/          # TOML config structs + env-var interpolation
  routes/          # Axum route handlers (23 routes)
  mcp/             # MCP server (22 tools via rmcp)
  error.rs         # Error types and HTTP mapping
  file_store.rs    # OpenDAL file storage backend
  lib.rs           # Module exports
  openapi.rs       # OpenAPI spec generation
  service_pool.rs  # Builds the Tower stack per model
  state.rs         # Shared application state
  streaming.rs     # SSE response streaming
```
The proxy builds one Tower stack per [[models]] entry at startup. Stacks are stored in a ServicePool indexed by model name and alias. Incoming requests authenticate via the master key or a virtual key ([[keys]]), then route to the matching stack.
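Indexing one stack under both its model name and its aliases can be modelled with a map of shared pointers. This is a minimal sketch of the idea only: `SketchServicePool` and `ModelStack` are hypothetical stand-ins, and the real ServicePool's fields and methods are not shown in this document.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Placeholder for a fully built Tower stack (hypothetical).
#[derive(Debug)]
struct ModelStack {
    model: String,
}

/// Maps both model names and aliases to the same shared stack.
#[derive(Default)]
struct SketchServicePool {
    stacks: HashMap<String, Arc<ModelStack>>,
}

impl SketchServicePool {
    /// Builds one stack and registers it under its name and every alias.
    fn register(&mut self, name: &str, aliases: &[&str]) {
        let stack = Arc::new(ModelStack { model: name.to_string() });
        self.stacks.insert(name.to_string(), Arc::clone(&stack));
        for alias in aliases {
            self.stacks.insert((*alias).to_string(), Arc::clone(&stack));
        }
    }

    /// Resolves the model string from a request to its stack, if any.
    fn resolve(&self, requested: &str) -> Option<Arc<ModelStack>> {
        self.stacks.get(requested).cloned()
    }
}
```

Because aliases share the same `Arc`, per-model state such as rate-limit counters would be shared no matter which name a request uses.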
See Proxy Server and Proxy Configuration for operational details.