Architecture¶

liter-llm is a Rust-core library with native bindings for 11 languages. All business logic lives in the Rust core; each language binding is a thin wrapper that translates types and errors across the FFI boundary.

System Overview¶

graph TD
    App["Your Application"]
    App --> Binding

    subgraph Binding["Language Binding"]
        direction LR
        Py["Python<br/>(PyO3)"]
        TS["TypeScript<br/>(NAPI-RS)"]
        Go["Go<br/>(cgo)"]
        Rb["Ruby<br/>(Magnus)"]
        Java["Java<br/>(Panama FFM)"]
        CS["C#<br/>(P/Invoke)"]
        Elixir["Elixir<br/>(Rustler)"]
        PHP["PHP<br/>(ext-php-rs)"]
        WASM["WASM<br/>(wasm-bindgen)"]
    end

    Binding --> Core

    subgraph Core["Rust Core (liter-llm)"]
        Client["DefaultClient"]
        Tower["Tower Middleware"]
        HTTP["HTTP / SSE Layer"]
        Client --> Tower --> HTTP
    end

    HTTP --> Provider["Provider API<br/>(OpenAI, Anthropic, Groq, ...)"]

Crate Layout¶

The workspace is split into a core crate and per-language binding crates:

Crate	Purpose
`crates/liter-llm`	Core library: client, providers, types, HTTP, errors, Tower middleware
`crates/liter-llm-py`	Python bindings via PyO3
`crates/liter-llm-node`	Node.js bindings via NAPI-RS
`crates/liter-llm-ffi`	C FFI layer consumed by Go, Java, and C#
`crates/liter-llm-php`	PHP bindings via ext-php-rs
`crates/liter-llm-wasm`	WebAssembly bindings via wasm-bindgen

Language packages in packages/ wrap the compiled artifacts into idiomatic packages for each ecosystem:

Package	Ecosystem
`packages/go`	Go module (cgo, wraps C FFI)
`packages/java`	Maven artifact (Panama FFM, wraps C FFI)
`packages/csharp`	NuGet package (P/Invoke, wraps C FFI)
`packages/ruby`	RubyGem (Magnus)
`packages/elixir`	Hex package (Rustler NIF)

Provider Resolution¶

Providers are resolved once at client construction, not on every request. The DefaultClient::new() constructor accepts an optional model_hint that selects a provider from the embedded registry (schemas/providers.json, compiled into the binary).

At request time, the model string prefix (e.g. openai/ in openai/gpt-4o) routes to the correct provider. Since the registry is already loaded, this is a simple prefix lookup with no I/O.

sequenceDiagram
    participant App
    participant Client as DefaultClient
    participant Registry as Provider Registry
    participant HTTP as HTTP Layer
    participant API as Provider API

    App->>Client: new(config, model_hint)
    Client->>Registry: resolve provider (once)
    Registry-->>Client: provider config

    App->>Client: chat("openai/gpt-4o", ...)
    Client->>Client: prefix lookup → OpenAI
    Client->>HTTP: POST /v1/chat/completions
    HTTP->>API: request + auth header
    API-->>HTTP: response
    HTTP-->>Client: parsed response
    Client-->>App: ChatCompletionResponse

Tower Middleware Stack¶

The Rust core uses Tower for composable middleware. Each layer wraps the LlmService and can inspect or modify requests and responses.

graph TD
    Request["Incoming Request"]
    Request --> Tracing
    Tracing["TracingLayer<br/><small>OpenTelemetry GenAI spans</small>"]
    Tracing --> Cost
    Cost["CostTrackingLayer<br/><small>USD cost estimation</small>"]
    Cost --> RateLimit
    RateLimit["ModelRateLimitLayer<br/><small>per-model RPM / TPM</small>"]
    RateLimit --> Cache
    Cache["CacheLayer<br/><small>LRU with TTL</small>"]
    Cache --> Cooldown
    Cooldown["CooldownLayer<br/><small>backoff after errors</small>"]
    Cooldown --> Fallback
    Fallback["FallbackLayer<br/><small>backup service on failure</small>"]
    Fallback --> Health
    Health["HealthCheckLayer<br/><small>periodic probes</small>"]
    Health --> Service
    Service["LlmService<br/><small>DefaultClient</small>"]

Layers are optional and composable. Use tower::ServiceBuilder to stack only the layers you need:

use liter_llm::tower::{CostTrackingLayer, LlmService, TracingLayer};
use tower::ServiceBuilder;

let client = liter_llm::DefaultClient::new(config, None)?;
let service = ServiceBuilder::new()
    .layer(TracingLayer)
    .layer(CostTrackingLayer)
    .service(LlmService::new(client));

Layer	Purpose	Default
`TracingLayer`	OpenTelemetry GenAI semantic conventions	Off
`CostTrackingLayer`	USD cost estimation via embedded pricing	Off
`ModelRateLimitLayer`	Per-model RPM and TPM enforcement	Off
`CacheLayer`	In-memory LRU (256 entries, 300s TTL)	Off
`CooldownLayer`	Deployment cooldowns after transient errors	Off
`FallbackLayer`	Route to backup service on failure	Off
`HealthCheckLayer`	Periodic health probes	Off

How Bindings Work¶

Each binding crate is a thin wrapper. It:

Accepts language-native types (Python dicts, JS objects, Go structs)
Converts them to Rust core types (ChatCompletionRequest, etc.)
Calls the Rust DefaultClient methods
Converts the Rust response back to language-native types
Maps Rust Result::Err to language-native exceptions/errors

No business logic lives in the binding layer. If a bug is fixed in the Rust core, all bindings get the fix automatically.

Async bridging

Each binding bridges Rust async (Tokio) to the host language's concurrency model: Python asyncio, Node.js Promises, Go goroutines, C# async/await, Elixir processes, etc. See Streaming for details.