Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[Unreleased]
Changed
- Rebrand to Xberg. All published package coordinates move off the kreuzberg brand: npm scope @kreuzberg/* → @xberg-io/* (main, the six napi platform packages, -wasm, and -cli); the Java/Maven and Kotlin Android namespace dev.kreuzberg.literllm[.android] → io.xberg.literllm[.android] (group id dev.kreuzberg → io.xberg), which relocates the generated Java/Kotlin source trees from dev/kreuzberg/… to io/xberg/…. Docs, badges, the ecosystem block (now links the Xberg product at github.com/xberg-io/xberg), the brand heading ("Part of Xberg.io"), and CI publishing references (owner: xberg-io, ghcr.io/xberg-io, packagist xberg-io, the xberg-dev-publisher app, xberg-io/homebrew-tap) are updated to match. The legal entity name "Kreuzberg, Inc." is unchanged. (alef.toml, templates, generated bindings, docs, publish.yaml)
Removed
- Drop the unused legacy packages/typescript wrapper package. It was a re-export barrel of the native @xberg-io/liter-llm binding and was never published. alef no longer generates it (the napi backend stopped emitting packages/typescript); the canonical TypeScript surface is the native package's bundled index.d.ts. Removed the directory and its references in the workspace list, dependabot.yaml, the validate-versions manifest path (now crates/liter-llm-node/package.json), the node:typecheck task, and the oxlint exclude.
[1.8.2] - 2026-06-23
Fixed
- Publish (zig): the published zig package compiles again. The "Stage FFI artifacts" step bundled a stale top-level crates/liter-llm-ffi/liter_llm.h (old ABI, missing literllm_create_client/*_from_json) into the ffi-* artifact, so the zig wrapper's @cImport resolved against an out-of-date header and every test failed to compile. The step now stages the canonical include/liter_llm.h (and fails hard if absent); the orphaned top-level header is removed.
- Publish (kotlin-android): the AAR no longer corrupts on download. Both ABI matrix legs emitted the same filename liter-llm-android-release.aar; the publish job's download-artifact … merge-multiple: true collided them into one torn zip (bad zipfile offset). Each leg now uploads an ABI-suffixed liter-llm-android-<abi>.aar.
- Publish (swift): a rerun can no longer desync the artifact-bundle checksum. The swift-artifactbundle job now reuses an existing release asset and its checksum instead of rebuilding a non-reproducible bundle, keeping the hosted bundle consistent with the swift-<version> tag.
- Swift test-app scaffold pins the checksum-bearing ref. Regenerated from alef 0.26.7: the registry-mode Package.swift now uses .package(url:, branch: "release/swift/<version>") (which carries the substituted checksum) instead of from: "<version>" (which resolved the placeholder-bearing SemVer tag).
Changed
- Bindings regenerated from alef 0.26.7 (from 0.26.5): swift e2e scaffold branch pin and kotlin-android Gradle plugin/dependency bumps. Generated node e2e/test_apps emit an .npmrc that disables frozen-lockfile.
[1.8.1] - 2026-06-23
Fixed
- CI E2E (kotlin_android): the host-JVM test project compiles again. The generated MockServerListener implements the JUnit Platform LauncherSessionListener SPI (referencing LauncherSession/LauncherSessionListener at compile time), but the e2e build.gradle.kts scoped junit-platform-launcher as testRuntimeOnly, so compileDebugUnitTestKotlin failed with "Unresolved reference 'launcher'". Regenerated from alef 0.26.5, which now scopes it testImplementation.
- CI E2E (swift): the e2e link step now finds liter_llm_ffi. The swift e2e before step built liter-llm-swift and the mock server but never liter-llm-ffi, so libliter_llm_ffi.a was absent from the linker search path (ld: library 'liter_llm_ffi' not found). The [crates.test.swift].before step now also runs cargo build --release -p liter-llm-ffi.
- CI E2E (node): the lockfile is back in sync. pnpm-lock.yaml was stale against crates/liter-llm-node/package.json (missing @napi-rs/cli ^3.6.2) after the 1.8.0 bump, failing pnpm install --frozen-lockfile. Regenerated.
Changed
- Bindings regenerated from alef 0.26.5 (from 0.26.3): Swift bridge-glue re-materialization runs after the bridge crate is built, opaque-handle aliasing avoids capsule import collisions, and JSON-string overloads emit positional arguments for underscore-prefixed parameters.
[1.8.0] - 2026-06-22
Added
- MCP server: tool annotations on every tool. Each MCP tool now advertises rmcp ToolAnnotations (a human-readable title plus readOnlyHint/destructiveHint/idempotentHint/openWorldHint) so clients can present them and decide auto-approval. Query tools are read-only; create_* mutate without being destructive; delete_*/cancel_* are destructive and idempotent; all reach external providers (openWorldHint).
- MCP server: prompts, resources, and argument completion. Beyond tools, the server now exposes reusable prompt templates (summarize, translate, extract), catalog resources (liter-llm://models, liter-llm://providers, and the liter-llm://pricing/{model} / liter-llm://provider/{name} templates), and argument completion for model (from the configured models) and provider name (from the registry). get_info advertises tools, prompts, resources, and completions.
Fixed
- Budget middleware: concurrent spend is no longer lost across a window rollover. On weakly-ordered architectures (arm64) the window-reset path could drop a racing fetch_add; the rollover now subtracts the snapshotted prior total instead of storing zero, preserving every concurrent charge.
- CI: the Kotlin Android build provisions the wrapper-declared Gradle version (gradle-version: wrapper) in ci-mobile and ci-e2e. AGP 9.2.0 requires Gradle >= 9.4.1; the previous hardcoded 8.13 pin silently overrode the 9.6.0 wrapper and broke assembleRelease.
- Swift & Dart: the untagged content-union text() accessor now references the generated payload field instead of a non-existent binding, mirroring the 1.7.6 Kotlin fix. Regenerated from alef 0.26.3.
- CI/release: create the draft GitHub Release once in the prepare job so the Swift artifact-bundle upload (the only upload job without its own ensure-release step) can no longer race ahead of release creation and fail with "release not found". Mirrors the html-to-markdown publish pipeline.
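The budget rollover fix above can be sketched with standard atomics. This is a minimal illustration under stated assumptions, not the middleware's actual code: the BudgetWindow type, its micro-dollar unit, and the method names are hypothetical. It shows why subtracting the snapshotted total keeps a charge that lands between the snapshot and the reset, whereas storing zero would discard it.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-window spend counter, in micro-dollars.
pub struct BudgetWindow {
    spent: AtomicU64,
}

impl BudgetWindow {
    pub fn new() -> Self {
        Self { spent: AtomicU64::new(0) }
    }

    /// Record a charge; safe to call concurrently with a rollover.
    pub fn charge(&self, amount: u64) {
        self.spent.fetch_add(amount, Ordering::AcqRel);
    }

    /// Read the total accumulated so far.
    pub fn snapshot(&self) -> u64 {
        self.spent.load(Ordering::Acquire)
    }

    /// Close the window by subtracting the snapshot instead of storing zero,
    /// so charges added after the snapshot carry into the new window.
    pub fn roll_over(&self, prior_total: u64) {
        self.spent.fetch_sub(prior_total, Ordering::AcqRel);
    }
}

/// Replays the racy interleaving on one thread and returns what is left
/// in the counter once the rollover has finished.
pub fn demo_rollover() -> u64 {
    let window = BudgetWindow::new();
    window.charge(700);
    let prior = window.snapshot(); // rollover begins
    window.charge(300); // a racing charge lands mid-rollover
    window.roll_over(prior); // rollover finishes
    window.snapshot()
}
```

With a plain store of zero in place of the subtraction, the 300 charged mid-rollover would be wiped out along with the closed window's total.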
Changed
- Bindings regenerated from alef 0.26.3 (from 0.25.60): Swift capsule-pointer bridging via usize plus .product dependency wiring, RustBridgeC.h preserved across alef all --clean, and the content-union accessors above.
[1.7.6] - 2026-06-22
Fixed
- Kotlin Android: the untagged content-union text() accessor now references the generated value property instead of the non-existent field0, fixing a :compileReleaseKotlin failure (Unresolved reference 'field0') that broke the Android AAR build. Regenerated from alef 0.25.60; the legitimate field0 data-class properties (e.g. LiterLlmError.Serialization) are unaffected.
[1.7.5] - 2026-06-21
Changed
- npm: the primary package is now the bare @kreuzberg/liter-llm (matching the crawlberg convention), renamed from @kreuzberg/liter-llm-node. The napi packageName and all per-platform sub-packages use the bare prefix (@kreuzberg/liter-llm-<rid>); the .node binaryName stays liter-llm-node. (alef.toml, generated node binding, all README badges, docs, pnpm-lock.yaml)
Fixed
- Packagist: publish under the canonical xberg-io/liter-llm vendor (matching html-to-markdown, crawlberg, tree-sitter-language-pack) instead of the legacy kreuzberg/liter-llm. Fixes the registry-check coordinate and the publish-packagist step (packagist-username, package-name) in publish.yaml, plus the stale reference in llms.txt. The Maven registry-check coordinate is corrected to dev.kreuzberg.literllm:liter-llm.
- CI/release: @kreuzberg/liter-llm-cli now publishes on the main release at the release version (previously decoupled on a separate cli-proxy-v* tag), gated behind the CLI-binary upload so the npm wrapper never ships ahead of the binaries it downloads.
- CI/release: disable sccache for release build jobs so a transient sccache cache/DNS failure can no longer block a publish run (publish.yaml).
- docs: refresh stale install snippets (Java/Kotlin/Swift/Zig version pins, the Java Maven coordinate, and the Elixir version range) in docs/getting-started/installation.md, docs/index.md, and llms.txt.
[1.7.4] - 2026-06-19
Changed
- chore(precommit,alef): standardize kotlin-android formatting on ktfmt --kotlinlang-style. Drop the conflicting prek ktlint hook (it ran a destructive --format that fought ktfmt and rewrote alef's /// doc comments), scope ktfmt to packages/kotlin-android with --kotlinlang-style, switch alef.toml kotlin format/check from gradle-ktlintFormat to ktfmt so alef and prek agree, and exclude the vendored Gradle wrapper from shellcheck. detekt remains. (.pre-commit-config.yaml, alef.toml)
Fixed
- Content-union e2e gate completed across the remaining bindings (regenerated with alef 0.25.49). The Dart binding now injects the AssistantContent.text() extension even when FRB emits the freezed mixin clause (sealed class … with _$… {); the PHP flat-enum conversion always emits an exhaustive wildcard for the &str tag match, so alef(skip)'d-Default enums (Message, ContentPart, AssistantPart, OcrDocument) compile; and the Node/NAPI binding keeps both From conversion directions for plain data enums used as struct fields (e.g. AuthHeaderFormat in CustomProviderConfig). Local e2e green for Python, Node, WASM, Elixir, and PHP.
Documentation
- Bindings parity docs. Added a feature-by-language support matrix (chat, streaming, tool calling, embeddings, multimodal in/out, call idiom) across all 14 native bindings plus the C/FFI surface, expanded the multimodal cookbook with idiomatic examples in nine languages, and reconciled the binding count to a single "14 native bindings + C/FFI surface" across the README and docs (added Dart, Swift, Kotlin Android, and Zig to the README Language READMEs table).
[1.7.3] - 2026-06-19
Fixed
- Cross-language e2e for multimodal message.content. The remaining bindings can now string-assert the assistant content union (AssistantContent), fixing the e2e suites that broke on it: Kotlin and Dart get a text() accessor on the sealed class; WASM returns the display text (String) instead of a discriminant; Swift renders property access for first-class result structs; Elixir reads the NIF struct's .text; PHP calls the message's text() accessor. Generated via alef 0.25.48 (untagged_union_text_types + fields_display_as_text extended to all backends).
- Android AAR packaging guard: the Kotlin/Android publish now stages the cross-compiled JNI .so libs into the AAR and fails loudly if jni/ would be empty, so a jni-less AAR can never be published.
[1.7.2] - 2026-06-18
Added
- Display for AssistantContent: renders the message text (Text variant verbatim; Parts variant concatenates its text segments, skipping non-text parts), enabling string assertions on message.content across the polyglot e2e suites.
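The rendering rule described above can be sketched as follows. The enums here are simplified stand-ins with plain String payloads, not the crate's real definitions (which carry richer payload types), and demo_display is a hypothetical helper added only for illustration.

```rust
use std::fmt;

/// Simplified stand-in for the real part type; payloads are plain strings here.
pub enum AssistantPart {
    Text(String),
    Refusal(String),
    OutputImage(String),
}

/// Simplified stand-in for the assistant content union.
pub enum AssistantContent {
    Text(String),
    Parts(Vec<AssistantPart>),
}

impl fmt::Display for AssistantContent {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            // Text variant: emitted verbatim.
            AssistantContent::Text(text) => f.write_str(text),
            // Parts variant: concatenate text segments, skip everything else.
            AssistantContent::Parts(parts) => {
                for part in parts {
                    if let AssistantPart::Text(text) = part {
                        f.write_str(text)?;
                    }
                }
                Ok(())
            }
        }
    }
}

/// Builds a mixed-parts message and renders it through Display.
pub fn demo_display() -> String {
    let content = AssistantContent::Parts(vec![
        AssistantPart::Text("Hello, ".to_string()),
        AssistantPart::OutputImage("data:image/png;base64,...".to_string()),
        AssistantPart::Text("world".to_string()),
    ]);
    content.to_string()
}
```

Because to_string() comes for free with Display, a test suite can compare message content against an expected string without matching on the variant first.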
Fixed
- Cross-language e2e content assertions: the generated e2e suites stringified choices[0].message.content with plain-string casts that fail for the multimodal AssistantContent union (Go string(*ptr), Java Objects.toString, C# .ToString(), Rust .as_deref()). alef 0.25.45's fields_display_as_text config now emits the per-language text accessor so the assertions compile and assert text content.
[1.7.1] - 2026-06-18
Fixed
- FFI build under -D warnings: the regenerated liter-llm-ffi crate referenced #[cfg(feature = "wasm-http")] without declaring the feature, producing unexpected cfg condition value: wasm-http errors that broke CI Rust, CI E2E (Build FFI), CI Mobile, and the crates.io publish in v1.7.0. alef now declares non-default passthrough features via the configurable [crates.ffi] extra_features key, so wasm-http is declared (but not enabled) and survives regeneration.
- Java PMD on generated DTOs: the new DecodedDataUrl { mime, byte[] data } value object tripped ArrayIsStoredDirectly/MethodReturnsInternalArray; alef's PMD ruleset now excludes these for generated DTOs.
- Docs strict build: fixed a broken intra-doc link in docs/usage/multimodal.md (../concepts/providers.md → ../providers.md).
[1.7.0] - 2026-06-18
Added
- Typed multimodal builders: liter_llm::image::{encode_data_url, decode_data_url, DecodedDataUrl} with IMAGE_PNG/IMAGE_JPEG/IMAGE_WEBP/IMAGE_TIFF MIME constants. decode_data_url returns a named DecodedDataUrl { mime, bytes } struct rather than a tuple so polyglot bindings extract it as a typed object.
- Message::user_with_parts(parts): ergonomic constructor for multimodal user messages.
- ContentPart::{text, image_data_url, image_url, image_with_detail, image_png, image_jpeg, image_webp, image_tiff}: typed constructors replacing hand-rolled struct construction.
- ResponseFormat::{json_schema, json_object, text} + JsonSchemaFormat::new(name, schema).strict(bool).description(d) fluent builder. new defaults to strict = Some(true). Provider-mapping rustdoc on ResponseFormat (OpenAI passthrough, Gemini/Vertex responseMimeType + responseSchema, Anthropic system-instruction injection).
- Multimodal output: AssistantContent enum (Text/Parts) with #[serde(untagged)] back-compat; AssistantPart (Text/Refusal/OutputImage/OutputAudio) with #[serde(tag = "type", rename_all = "snake_case")]; AssistantMessage::{text, refusal_text, output_images, output_audio} accessors; Message::{assistant_with_parts, system_with_parts} constructors; ChatCompletionRequest.modalities: Option<Vec<Modality>> with Modality::{Text, Audio, Image}. Vertex transform_response preserves inline_data as OutputImage/OutputAudio (no base64 re-encode); OpenAI transform_response hoists message.audio into Parts([Text(transcript), OutputAudio]).
Changed
- BREAKING: AssistantMessage.content: Option<String> → Option<AssistantContent>. Back-compat via From<String>/From<&str> for AssistantContent and the untagged serde variant: providers returning scalar content strings still deserialize as Text(_).
- BREAKING: SystemMessage.content: String → UserContent. Back-compat via From<String>/From<&str> for UserContent.
- liter-llm makes base64 an unconditional dependency (previously gated behind native-http/wasm-http), since the liter_llm::image::* helpers are transport-agnostic.
- All polyglot bindings regenerated for the multimodal surface (alef 0.25.40 pin). PHP/WASM extraction of #[serde(untagged)] enums (AssistantContent) and nested complex enum-variant types (OutputImage { image_url: ImageUrl }, OutputAudio { audio: AudioContent }) is resolved upstream; bindings construct/access multimodal types via each language's native idiom.
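For the breaking content-type change above, the From-based back-compat path can be illustrated with a small sketch. The types below are simplified placeholders (Parts holds plain strings, and serde is left out entirely), and message_from_text is a hypothetical helper, so treat this as a picture of the conversion pattern rather than the crate's definitions.

```rust
/// Placeholder for the content union; the real Parts payload is richer.
#[derive(Debug, PartialEq)]
pub enum AssistantContent {
    Text(String),
    Parts(Vec<String>),
}

impl From<String> for AssistantContent {
    fn from(text: String) -> Self {
        AssistantContent::Text(text)
    }
}

impl From<&str> for AssistantContent {
    fn from(text: &str) -> Self {
        AssistantContent::Text(text.to_owned())
    }
}

/// Placeholder message with the post-change field type.
pub struct AssistantMessage {
    pub content: Option<AssistantContent>,
}

/// Call sites that used to assign a string can switch to `.into()`.
pub fn message_from_text(text: &str) -> AssistantMessage {
    AssistantMessage { content: Some(text.into()) }
}
```

Code that previously read the field as a string needs a match on the variant (or the text accessors listed under Added) after the change; only construction is covered by the From impls.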
[1.6.4] - 2026-06-17
Changed
- Gate etcd-client behind the optional etcd-watch feature; default CLI builds no longer require protoc. Resolves the Homebrew bottle source-build failure on v1.6.3.
- Bump alef codegen pin to 0.25.24 (swift opaque-handle class triples, dart () -> () cleanup, kotlin per-file ktfmt invocation, java PMD/palantir-java-format alignment, c e2e FFI tarball-name alignment, php exclude_functions filtering through to user-facing wrapper).
- liter_llm::tower::metrics::global_meter() is now pub so downstream crates (liter-llm-proxy) can read the shared OTel meter without re-initialising.
Fixed
- ContentPart crate-root shadow: src/lib.rs did pub use types::* (which brings types::ContentPart to the crate root) and then explicitly pub use realtime::ContentPart, causing the realtime variant to shadow the types one. Downstream consumers writing use liter_llm::ContentPart; received the realtime variant, which has no ImageUrl variant, producing E0599: no variant named 'ImageUrl' found for enum 'liter_llm::ContentPart' in any VLM-OCR call site. The realtime ContentPart re-export is removed from lib.rs; callers that need it must import it explicitly as liter_llm::realtime::ContentPart. A compile-time regression test in src/tests.rs asserts that crate::types::ContentPart::ImageUrl is constructible through the crate root.
- rustdoc ICE in liter-llm: bracketed intra-doc links to private items (MAX_POOL_BUFFER_CAPACITY, post_stream_with_cancel, post_json, post_json_raw, retry::should_retry, LiterLlmError, ConfigDrivenProvider::transform_request) in http/ and provider/ triggered an internal compiler error on rustc 1.95.0 during doc_link_resolutions. Stripped the link brackets so the doc strings reference the names as plain code identifiers. Also fixed a bare URL in CustomProviderConfig::base_url.
- Missing-docs cleanup: added documentation for the UsageSinkErased trait and method, the four tenant modules (context, etcd, in_memory, resolver), the http::transport module, and the RouterError::Discover { source } struct field.
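The ContentPart shadowing above follows from Rust's name-resolution rule that an explicit import takes precedence over a glob import of the same name. A minimal sketch, using made-up modules in place of the crate's real layout (root plays the part of lib.rs, and the enum variants are illustrative):

```rust
pub mod types {
    #[derive(Debug, PartialEq)]
    pub enum ContentPart {
        Text,
        ImageUrl,
    }
}

pub mod realtime {
    #[derive(Debug, PartialEq)]
    pub enum ContentPart {
        Text,
        Audio,
    }
}

/// Stand-in for the crate root of the library.
pub mod root {
    pub use super::types::*;
    // Re-adding the next line would make `root::ContentPart` resolve to the
    // realtime enum, because an explicit import shadows a glob import:
    // pub use super::realtime::ContentPart;
}

/// Compiles only while `root::ContentPart` is the `types` enum.
pub fn image_part() -> root::ContentPart {
    root::ContentPart::ImageUrl
}
```

Uncommenting the explicit re-export makes image_part fail to compile with the same E0599 shape quoted in the entry, which is the property a compile-time regression test can pin down.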
[1.6.3] - 2026-06-16
Fixed
- WASM build: tokio::time inside wait_for_batch_impl. crates/liter-llm/src/client/mod.rs polled with tokio::time::Instant and tokio::time::sleep unconditionally, but tokio is gated behind the native-http feature. Under wasm-http-only builds (the WASM crate) compilation failed with E0433: cannot find module or crate 'tokio'. The function now switches to web_time::Instant + gloo_timers::future::sleep on target_arch = "wasm32", with web-time added as an optional dep on the wasm-http feature.
- WASM build: leaked tower DTOs. alef regen at v1.6.0 started emitting From<liter_llm::tower::{CircuitState, HealthStatus, IntentPrototype, SingleflightResult}> impls in crates/liter-llm-wasm/src/lib.rs, but tower is not enabled under wasm-http. Added the four types to alef.toml [crates.wasm].exclude_types alongside the existing config-DTO exclusions.
- Ruby gem build: missing feature declarations. packages/ruby/ext/liter_llm_rb/native/Cargo.toml did not declare native-http/opendal-cache/wasm-http, so alef-emitted #[cfg(feature = "native-http")] gates in src/lib.rs resolved as false and the ensure_crypto_provider call site failed with E0425: cannot find value in this scope. Regen now writes the features section. Affected every Ruby gem variant (linux-x86_64, linux-aarch64, macos-arm64).
- PHP windows builds: ext-php-rs macro lookup. LiterLlmApi::ensure_crypto_provider resolution failed inside #[php_impl]-generated code on target_os = "windows" (E0599: no associated function or constant). Other bindings (PyO3, NAPI, Rustler) handle the same pattern correctly, so this looks like an ext-php-rs upstream gap. Excluded ensure_crypto_provider from the PHP binding entirely via alef.toml; the function is a no-op on Windows anyway, and downstream PHP users rely on transitive invocation from internal reqwest::Client constructors.
- PHP PIE matrix: php8.5 macos-arm64. shivammathur/setup-php@v2 does not yet ship a PHP 8.5 image for macOS arm64 runners (PHP 8.5 was released November 2025). Excluded that matrix cell from publish.yaml until the upstream action catches up. Linux + Windows PHP 8.5 builds remain in the matrix.
[1.6.2] - 2026-06-16
Fixed
- to_singleflight_error dead-code lint: gated LiterLlmError::to_singleflight_error with #[cfg(feature = "tower")] so the method is not emitted when cargo publish --verify builds with default features (default = ["native-http"]). The method's only call sites live in src/tower/cache_singleflight.rs, which is itself feature-gated. Under -D warnings the dead-code lint rejected the build, blocking Publish crates.io and every per-platform Build CLI binary, Build WASM, and Build Kotlin Android natives job in the v1.6.0 and v1.6.1 release workflows. No artifacts reached PyPI, crates.io, npm, RubyGems, Maven Central, Packagist, Hex, NuGet, pub.dev, or the Homebrew tap from those tags.
- Release-runner protoc toolchain dep: xberg-io/actions/setup-rust@v1.8.70 now installs protobuf-compiler on every Linux/macOS/Windows runner. etcd-client v0.15's build.rs shells out to protoc; liter-llm-cli pulls etcd-client transitively through liter-llm-proxy, so the v1.6.1 CLI binary builds panicked with Failed to compile proto files: Could not find protoc. v1.6.2's publish workflow consumes the floating v1 tag, which now carries the fix.
[1.6.1] - 2026-06-16
Fixed
- FFI crate wasm-http feature: declared wasm-http as a no-op feature on liter-llm-ffi/Cargo.toml so the alef-emitted #[cfg(any(feature = "native-http", feature = "wasm-http"))] gates in crates/liter-llm-ffi/src/lib.rs resolve under cargo build -D warnings. v1.6.0's publish workflow failed on every Rust FFI, CLI, WASM, and Kotlin Android native build with unexpected_cfgs errors because the gate was emitted without the corresponding feature declaration.
[1.6.0] - 2026-06-16
Bindings
- All 14 language bindings regenerated against alef 0.25.18. Covers Python (PyO3), TypeScript (NAPI-RS), Ruby (Magnus), PHP (ext-php-rs), Elixir (Rustler), Go (cgo), Java (Panama FFM), C# (P/Invoke), Swift (swift-bridge), Dart (flutter_rust_bridge), Kotlin Android (JNI), Zig, C, and WASM.
- Tier A/B/C API triage applied to the v1.6.0 surface: internal types stay Rust-only (Tier A), trait-generic helpers like wait_for_batch_impl and the BatchRetriever trait stay Rust-only (Tier B), and binding-exposed concrete methods like DefaultClient::wait_for_batch cross the FFI boundary (Tier C). Tier B trait methods (BatchRetriever::fetch_batch_for_polling) now emit correctly in JNI without a workaround.
- Swift chat_stream streaming adapter restored across all bindings. The swift-bridge extern block emitted by alef now declares type DefaultClient; inside the streaming extern block so the owner reference resolves.
Tooling
- alef bumped from 0.23.7 to 0.25.18. New upstream fixes consumed:
  - JNI backend emits per-trait use core_crate::{path}; clauses for non-root trait methods.
  - swift-bridge streaming-adapter extern block declares owner types alongside handle types.
  - cleanup_orphaned_files recognises hash-less stale files (self-referential // auto-generated by alef header).
  - Dart and PHP cfg-feature forwarding fixes for the wasm-http feature gate.
Added
- KeyContext::tenant_id: TenantId. Tenant identifier carried in every resolved auth context. Master-key auth always sets TenantId("master") (see the MASTER_TENANT_ID constant); virtual-key auth propagates the tenant_id returned by the configured KeyResolver.
- KeyContext::from_resolved(key_id, &ResolvedKey): constructor that builds a KeyContext from a KeyResolver-resolved record, preserving the canonical tenant_id from the resolver rather than falling back to the raw key token.
- auth::MASTER_TENANT_ID: &str. Well-known constant ("master") used by KeyContext::master() so downstream budget and usage layers can identify master-key traffic without a special-case enum.
Changed
- KeyContext now carries tenant_id: TenantId resolved by the configured KeyResolver. Master-key auth resolves to TenantId("master").
- Every proxy HTTP handler now propagates tenant_id into LlmRequest::with_tenant_id, activating BudgetLedger::Tenant, TenantScopedStrategy, and UsageEvent.tenant_id for all in-proxy traffic. The dispatch helper in routes/mod.rs applies with_tenant_id centrally; chat.rs (which bypasses dispatch for stream/non-stream branching) sets it inline.
- validate_api_key middleware now resolves virtual keys via AppState.key_resolver.resolve() (previously used KeyStore.get() directly), so custom KeyResolver backends injected via ProxyServer::with_key_resolver receive calls for all virtual-key traffic.
- ProxyServer::with_key_resolver and ProxyServer::with_usage_sink builder methods for embedder-supplied dependencies. Default behaviour unchanged.
- AppState.usage_sink: Option<Arc<dyn UsageSinkErased>>; HooksLayer is now wired outermost in the Tower stack when a sink is configured.
- UsageEvent.effective_model: Option<String>. Provider-echoed model name from the response, distinct from model, which is the requested name. Populated for Chat, Embed, Moderate, Ocr, and Search response variants; None for streaming, speech, transcription, rerank, image generation, and list-models variants, and on error/timeout paths.
- UsageEvent.cache_state is now accurate. Previously hardcoded Bypass; now reflects Miss, ExactHit, SemanticHit, or StaleHit set by CacheLayer and SingleflightService (followers) via a tokio::task_local! cell read by HooksLayer after the inner service resolves. Requests with no CacheLayer in the stack continue to report Bypass.
- DefaultClient::wait_for_batch(batch_id, WaitForBatchConfig): poll a batch until terminal status (Completed, Failed, Expired, Cancelled) with exponential backoff, configurable intervals, and optional timeout. Returns Ok(BatchObject) on completion, Err(BatchWaitError::{Failed, Timeout, Client}) on failure.
- WaitForBatchConfig: configuration struct with initial_interval_secs: f64, max_interval_secs: f64, backoff_multiplier: f32, and optional timeout_secs: Option<f64> fields; implements Default with 5.0s initial, 60.0s max, 1.5x multiplier, no timeout. Interval and timeout fields use f64 seconds (not Duration) for FFI bridgeability.
- BatchWaitError: error enum with struct-style variants Failed { status: BatchStatus }, Timeout { timeout_secs: f64 }, and Client { message: String, code: u32 } for batch polling failures. All fields are FFI-friendly primitives.
- liter_llm::observability::{UsageEvent, UsageSink, CacheState, UsageEventOutcome, UsageSinkError, LoggingUsageSink, MultiUsageSink}: canonical per-request usage events with pluggable sinks. UsageEvent is billing/observability-vendor-agnostic; downstream consumers translate it into metrics, ledgers, or OTel events.
- HooksLayer::with_usage_sink: wires a UsageSink into the Tower stack; emits one event per request completion (success or error). Sink errors are best-effort: logged and not propagated to callers.
- liter_llm::tower::IdempotencyLayer: Idempotency-Key dedup with pluggable IdempotencyStore (default InMemoryIdempotencyStore via DashMap, 24h TTL). Mismatched body for same key returns LiterLlmError::IdempotencyConflict; in-flight key returns LiterLlmError::IdempotencyInFlight (error-out, not sleep-poll).
- LlmRequest::idempotency_key: Option<String> field and LlmRequest::with_idempotency_key builder for opt-in idempotency.
- LiterLlmError::IdempotencyConflict { key: String } and LiterLlmError::IdempotencyInFlight { key: String } error variants (HTTP 409 equivalent).
- liter_llm::tower::FallbackChainLayer: walk an ordered Vec<S> of services, advancing on transient errors via pluggable RetryPolicy. DefaultRetryPolicy treats 5xx/timeouts/429 as transient; auth and validation errors as terminal. Exports RetryClass, RetryPolicy, DefaultRetryPolicy, FallbackChainLayer, and FallbackChainService.
- liter_llm::tenant::{TenantId, TenantContext, KeyResolver, ResolvedKey, KeyResolverError, InMemoryKeyResolver}: generic multi-tenant primitives.
- liter_llm::tenant::EtcdKeyResolver: distributed KeyResolver backed by an etcd cluster. Behind feature etcd-key-resolver. Reads JSON-serialised ResolvedKey at {prefix}/{sha256(api_key)}. Configure via EtcdKeyResolverConfig (endpoints, prefix, timeouts, optional auth).
- LlmRequest::with_tenant_id / LlmRequest::tenant_id for tenant propagation through the Tower stack.
- LlmRequestKind: the discriminant enum extracted from LlmRequest to carry the variant payload; re-exported from liter_llm::tower.
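To make the documented wait_for_batch defaults concrete (5.0s initial, 60.0s max, 1.5x multiplier), here is a rough sketch of the polling schedule they imply. PollSchedule is a hypothetical mirror of the documented fields, not the library's WaitForBatchConfig, and the exact point at which the real client applies the cap is an assumption.

```rust
/// Hypothetical mirror of the documented config fields; not the library type.
pub struct PollSchedule {
    pub initial_interval_secs: f64,
    pub max_interval_secs: f64,
    pub backoff_multiplier: f32,
}

impl Default for PollSchedule {
    fn default() -> Self {
        Self {
            initial_interval_secs: 5.0,
            max_interval_secs: 60.0,
            backoff_multiplier: 1.5,
        }
    }
}

impl PollSchedule {
    /// Returns the first `count` sleep intervals, in seconds.
    /// Assumption: each interval is the previous one times the multiplier,
    /// capped at the maximum.
    pub fn intervals(&self, count: usize) -> Vec<f64> {
        let mut out = Vec::with_capacity(count);
        let mut next = self.initial_interval_secs.min(self.max_interval_secs);
        for _ in 0..count {
            out.push(next);
            next = (next * f64::from(self.backoff_multiplier)).min(self.max_interval_secs);
        }
        out
    }
}
```

Under these assumptions the defaults reach the 60-second ceiling on the eighth poll, so a long-running batch settles into one status check per minute.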
Changed
- Migration (RealtimeEvent::RateLimitsUpdated): field reset_at: SystemTime renamed to reset_at_unix_ms: i64 (Unix milliseconds). Pattern-match sites must update the field name; the value can be reconstructed with SystemTime::UNIX_EPOCH + Duration::from_millis(reset_at_unix_ms as u64) if needed.
- Migration (WaitForBatchConfig): fields renamed for FFI bridgeability. initial_interval: Duration → initial_interval_secs: f64, max_interval: Duration → max_interval_secs: f64, timeout: Option<Duration> → timeout_secs: Option<f64>. Update all construction sites to use f64 seconds.
- Migration (BatchWaitError): variants changed to struct-style for FFI bridgeability. Failed(BatchStatus) → Failed { status: BatchStatus }, Timeout(Duration) → Timeout { timeout_secs: f64 }, Client(LiterLlmError) → Client { message: String, code: u32 }. Update all pattern-match and construction sites.
- LlmRequest is now a struct (kind: LlmRequestKind, tenant_id: Option<TenantId>) rather than a plain enum. All existing constructor call sites (LlmRequest::Chat(r), LlmRequest::Embed(r), etc.) continue to compile unchanged via #[allow(non_snake_case)] associated functions. Pattern-match sites that directly match on LlmRequest::Variant must be updated to match on req.kind.
- liter-llm-proxy AppState gains key_resolver: Arc<dyn KeyResolver> alongside the existing key_store: Arc<KeyStore>. KeyStore implements KeyResolver; behaviour is unchanged.
- Cache (CacheService, SingleflightService) reads tenant_id from the request via LlmRequest::tenant_id() so cached responses are scoped to the correct tenant automatically.
- KeyResolver::resolve now takes an owned String and returns a 'static future so the future can be spawned across tower::Service::call boundaries.
- ProviderCredential.api_key is now secrecy::SecretString (was String); zeroed on drop, Debug impl redacts to [REDACTED].
- liter-llm-proxy activates liter_llm::provider::set_outbound_policy at startup based on SecurityConfig.outbound_policy. Defaults to DenyPrivate (blocks RFC1918, loopback, link-local, multicast, unspecified); Allowlist parse errors are logged and skipped with a startup summary line.
- liter-llm-proxy virtual-key lookup uses subtle::ConstantTimeEq over a full iteration of the key store, preventing timing-side-channel inference of key existence.
- tower::HooksLayer::with_usage_sink now dispatches sink emit via tokio::spawn; slow sinks no longer add latency to caller-observed responses.
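The first two migration notes above come down to converting between the old std::time values and the new primitive fields. A small sketch of both directions, using only the standard library; the helper names are hypothetical, and handling of negative timestamps is an addition beyond the one-line reconstruction quoted in the note.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Rebuild a SystemTime from a `reset_at_unix_ms`-style value.
/// Negative inputs are treated as instants before the epoch.
pub fn reset_at_from_unix_ms(reset_at_unix_ms: i64) -> SystemTime {
    let magnitude = Duration::from_millis(reset_at_unix_ms.unsigned_abs());
    if reset_at_unix_ms >= 0 {
        UNIX_EPOCH + magnitude
    } else {
        UNIX_EPOCH - magnitude
    }
}

/// Convert an old Duration-typed setting into the f64 seconds that the
/// `*_secs` fields expect.
pub fn secs_from_duration(value: Duration) -> f64 {
    value.as_secs_f64()
}
```

A construction site that used Duration::from_secs(30) for an interval would pass secs_from_duration(Duration::from_secs(30)), or simply the literal 30.0.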
Fixed¶
tower::circuit—maybe_half_openis wired into the request path; circuits no longer get permanently stuckOpenafter tripping.probe_in_flightis now aProbeGuard<P>RAII enum so the probe slot is reclaimed even if the future panics or is cancelled.should_allowgates HalfOpen to exactly one probe via atomic CAS.tower::hedge—HedgeService::callusesmem::replacecorrectly so the polled-ready inner becomes the primary attempt and hedge attempts clone from a fresh standby; eliminates Tower-readiness contract violation andConcurrencyLimitpermit double-consumption.tower::cache::CacheService::call— applies themem::replace(&mut self.inner, self.inner.clone())Tower swap so permit-bearing inner services (e.g.ConcurrencyLimit,Buffer) are not double-consumed.tower::cache_singleflight—tx.send(result)now precedesmap.remove(&key)so late-arriving followers don't become duplicate leaders. AddedLeaderDropGuardto clean up the in-flight map when the leader is cancelled; leader errors propagate to followers with variant preserved (no longer mapped toInternalError).tower::cache_key::ExactHashStrategy— usesahash::RandomState::generate_with(fixed_seeds)viaOnceLockso cache keys are deterministic across process restarts and distributed nodes (wasstd::hash::DefaultHasher, which is randomized per process).tower::idempotency::compute_body_hash— same deterministic-ahash fix as the cache key strategy; required for distributedIdempotencyStorebackends to function.tower::idempotency::IdempotencyService::call— store key is nowformat!("{tenant_id}:{idempotency_key}")so two tenants with the same idempotency key value do not cross-pollinate responses.tower::fallback_chain::FallbackChainService::call— drivessvc.ready().await?before eachsvc.call(...), restoring the Tower readiness contract for chain elements (previously silently bypassedConcurrencyLimitpermits).FallbackChainLayerimplementsLayer<()>with aprepend(head)helper forServiceBuildercomposition.tower::budget— window rollover is atomic via CAS; concurrent 
writers across the rollover boundary cannot torn-read a $0 window mid-zero-then-add.liter-llm-proxy::routes::realtime— the WebSocket upgrade now resolves the upstream credential via the per-modelProviderCredentialregistry instead ofgeneral.master_key; restricted virtual keys can no longer trigger Realtime sessions that bill the master key. The handler also callskey_ctx.can_access_model(&model)beforews.on_upgrade, so the model allowlist is enforced before any upstream connection is opened.liter-llm-proxy::secrets(aws.rs,vault.rs) — cache values are stored assecrecy::SecretStringso secrets are zeroed on TTL eviction; previously plainStringleft key material on the heap.liter-llm-proxy::secrets::vault—HashCorpVaultProviderBuilder::buildvalidates the address against the outbound policy (validate_outbound_url_sync) before constructing theVaultClient; misconfigured addresses pointing at internal endpoints (link-local, loopback, RFC1918) are rejected withSecretError::Forbidden.guardrail::cel— CEL evaluation errors are no longer returned to the caller verbatim (could leak expression internals); callers see a fixed"policy evaluation error"reason while the full error is logged server-side viatracing::error!.http::transport::TransportConfig— re-exported at the crate root so the rustdoc example compiles; previously the type was reachable only via thepub(crate)httpmodule.- Trait re-exports —
CacheKeyStrategy,Guardrail,GuardrailContext,GuardrailDecision,GuardrailStage,VectorStore,VectorMatch,EmbeddingProvider,NoOpEmbeddingProvider,TenantId,TenantContext,KeyResolver,ResolvedKey,KeyResolverError,InMemoryKeyResolver,IdempotencyStoreErrorare now reachable fromliter_llm::*orliter_llm::tower::*without spelling out the full module path.
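The half-open gating described in the tower::circuit entry above (one probe admitted via atomic CAS, slot reclaimed by an RAII guard) can be sketched with the standard library alone. This is a minimal illustration, not the crate's code: `ProbeSlot` and `try_acquire_probe` are hypothetical names, and the guard here is a plain struct rather than the `ProbeGuard<P>` enum the entry mentions.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for the half-open probe slot of a circuit.
#[derive(Default)]
pub struct ProbeSlot {
    in_flight: AtomicBool,
}

/// RAII guard: the slot is released when the guard is dropped, which also
/// happens during panic unwinding or when the owning future is cancelled.
pub struct ProbeGuard {
    slot: Arc<ProbeSlot>,
}

impl Drop for ProbeGuard {
    fn drop(&mut self) {
        self.slot.in_flight.store(false, Ordering::Release);
    }
}

/// Try to claim the single half-open probe. Only the caller whose
/// compare-and-swap flips `false -> true` gets a guard; everyone else gets None.
pub fn try_acquire_probe(slot: &Arc<ProbeSlot>) -> Option<ProbeGuard> {
    slot.in_flight
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .ok()
        .map(|_| ProbeGuard { slot: Arc::clone(slot) })
}
```

Because release happens in `Drop`, no code path has to remember to clear the flag, which is the property the fix relies on.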
Security¶
- All findings from three rounds of critical security audit are resolved: per-model credential isolation on the Realtime route, model-allowlist enforcement before WebSocket upgrade, constant-time virtual-key lookup, SecretString for cached secrets and provider credentials (zeroed on drop), SSRF outbound-policy guard activated at proxy startup, Vault address validation, CEL error redaction, fail-closed CEL guardrails, and JSON-tree redaction for regex guardrails.
[1.6.0-rc.0] - 2026-06-15¶
Added¶
tower::circuitmodule —CircuitPolicytrait withExponentialBackoffCircuitdefault impl,CircuitStateenum (Closed→Open→HalfOpen),CircuitLayerandCircuitServicefor fault isolation. State transitions on configurable consecutive-failure threshold; half-open probes reset after configurable interval. (crates/liter-llm/src/tower/circuit.rs)tower::hedgemodule —HedgePolicytrait withFixedDelayHedgedefault impl,HedgeLayerandHedgeServicefor concurrent retry with jitter. Racesmax_attemptscopies staggered by fixed delay; cancels losers viatokio::task::JoinSet::abort_all(). Fast path whenmax_attempts == 1skipsJoinSetentirely. (crates/liter-llm/src/tower/hedge.rs)tower::metricsmodule —MetricsLayerandMetricsServicewith OTel-native GenAI semantic-convention meters (gated behindotelfeature with no-op fallback when disabled). Emits:gen_ai.client.operation.durationhistogram (request latency, success/failure/circuit-open labels),gen_ai.cache.{hit,miss,stale}counters,gen_ai.circuit.tripcounter,gen_ai.retry.attemptcounter, plusgen_ai.client.token.usagehistogram andgen_ai.request.cost_usdhistogram. Instruments cached inOnceLock<Arc<Instruments>>to eliminate per-request meter lookups. (crates/liter-llm/src/tower/metrics.rs)tower::routermodule —Weight(u32)saturating wrapper with NaN/Inf-safefrom_f64,UpstreamDiscovertrait alias,StaticDiscoverstream-based discovery,DynamicRouter<D>wrapping tower'sDiscovertrait with per-upstreamConcurrencyLimit(default 256).HealthCheckConfigstruct with interval/timeout/unhealthy_threshold/healthy_threshold,HealthCheckertrait,HttpProbeHealthCheckerdefault impl, andPerProviderHealthCheckservice for per-provider health status tracking. 
(crates/liter-llm/src/tower/health.rs,crates/liter-llm/src/tower/router.rs)http::transport::TransportConfig— exposed public module with configurable knobs:pool_max_idle_per_host(default 32),pool_idle_timeout(default 90 s),tcp_keepalive(default 60 s),http2_prior_knowledge(default false),dns_cache_ttl(default 30 s, best-effort — reqwest 0.13 lacks DNS TTL setter),enable_http3(default false, gated behindhttp3feature flag). Builder pattern with sensible defaults. Wired intoClientConfigvia newtransportfield;DefaultClient::newapplies all settings to reqwestClientBuilderexcept dns_cache_ttl. (crates/liter-llm/src/http/transport.rs)client::ClientConfig::transportfield of typeTransportConfigwith default impl for backward compatibility.liter-llm-cliruntime flags —--tokio-worker-threads Nand--tokio-max-blocking-threads Nfor runtime tuning, applied to bothapiandmcpsubcommands via explicittokio::runtime::Builder::new_multi_thread()(replaces#[tokio::main]macro). Defaults: physical CPU count (workers), 512 (blocking threads).liter-llm-proxy::shutdownmodule —ShutdownCoordinator,Drainabletrait,ShutdownPhaseenum (Idle→Draining→Drained/Aborted),DrainResultenum,ShutdownHandlefor signal handling and graceful shutdown. Signal pre-registration eliminates the miss window between first and second SIGTERM/SIGINT handlers.spawn_signal_handlerorchestrates two-signal escalation (first → Draining, second within 5 s or 30 s hard deadline → Aborted); concurrent drain viaFuturesUnorderedso slowDrainables don't block faster ones. (crates/liter-llm-proxy/src/shutdown.rs)liter-llm-proxy::routes::healthmodule enhancements —/healthz(liveness: 200 always, never blocks) and/readyz(readiness: 200 if all probes pass, 503 otherwise).ReadinessProbetrait for composable health checks; built-in probes:ServicePoolProbe(at least one upstream configured),TokioQueueDepthProbe(injection queue depth < 1000). Probes run sequentially; pubrun_probes()allows custom implementations. 
(crates/liter-llm-proxy/src/routes/health.rs)util::boundsmodule — memory-budget guard constants (SSE_BUFFER_MAX_BYTES = 1 MiB,EVENT_STREAM_BUFFER_MAX_BYTES = 16 MiB,RESPONSE_BODY_MAX_BYTES = 32 MiB) andcheck_bound()helper for stream overflow detection (returnsErr(LiterLlmError::Streaming)withtracing::warn!if exceeded). (crates/liter-llm/src/util/bounds.rs)- Workspace
[lints.clippy]policy — denycorrectness/suspicious/perf; warnstyle. Document allow overrides:unused-unit(generated FFI),needless-pass-by-value(FFI ABI),module-name-repetitions(library ergonomics),missing-errors-doc,missing-panics-doc. (Cargo.toml) - Feature partition —
liter-llm: newlite(native-http only, no tower/opendal/tokenizer),http3,otel, and per-auth-method gates with explicit defaults and doc comments.liter-llm-proxy:otel,opendal-cache,proxy(named surface); default =proxy.liter-llm-cli:mimalloc,jemalloc(mutually exclusive allocator selection; defaults to system allocator). Workspace dependency pinning added for allocators. (Cargo.toml,crates/*/Cargo.toml) - Global allocator selection —
crates/liter-llm-cli/src/allocator.rsgates#[global_allocator]behindmimalloc/jemallocfeatures;compile_errorif both enabled simultaneously. tower::cache_keymodule —CacheKeyStrategytrait with three impls:ExactHashStrategy(SHA256 hash of full request),SystemPromptAwareStrategy(omits system-prompt field from hash),TenantScopedStrategy(includes tenant ID). All three hash deterministically viaserde_jsonto stable JSON. (crates/liter-llm/src/tower/cache_key.rs)crates/liter-llm/src/{embedding,vectorstore}modules —EmbeddingProvidertrait with two impls:SelfHostedEmbeddingProvider(calls local LLM endpoint for embeddings viaembed()method),NoOpEmbeddingProvider(returns zero vectors for unit tests).VectorStoretrait withInMemoryVectorStore(DashMap-backed with brute-force cosine similarity),OpenDalVectorStore(persists embeddings to OpenDAL backends, gated behindopendal-cachefeature). Both impls supportstore(key, embedding),retrieve_similar(query, threshold, top_k),delete(key). (crates/liter-llm/src/tower/vectorstore/{mod,memory,opendal}.rs,crates/liter-llm/src/tower/embedding.rs)tower::cache.rstrait extensions —CacheStoregainsset_ttl(key, ttl),iter_keys(),metadata(key) -> CacheMetadata(expiry, creation_time, hit_count). Default no-op bodies preserve backward compat for existing impls.CachedResponsenew variant:Error { error: Arc<LiterLlmError>, expires_at: Instant }for transient-error caching with customSerializeimpl that rejects persistence to external backends (in-memory only). (crates/liter-llm/src/tower/cache.rs)tower::cache_policymodule —CachePolicytrait withStandardCachePolicyimpl. Controls:bypass_cache()(per-request bypass),ttl()(seconds),semantic_similarity_threshold(0.85),stale_while_revalidate(5 minutes).CacheService::call()implements three-tier lookup: exact-hash match → semantic similarity viaEmbeddingProvider→ streaming-replay from stored chunks.warm(requests)async warming hook for batch pre-population. 
(crates/liter-llm/src/tower/cache_policy.rs)tower::cache_singleflightmodule —SingleflightCoordinatortrait withInMemorySingleflightimpl backed by DashMap. Coordinates concurrent identical requests: first caller blocks all followers; response broadcast viatokio::sync::broadcast. Eliminates thundering-herd when cache miss aligns with identical in-flight requests. (crates/liter-llm/src/tower/cache_singleflight.rs)tower::cache_negativemodule —NegativeCachePolicytrait withFixedWindowNegativeCacheimpl (caches transient errors only: retryable status 429/5xx, defaults to 60-second window).CachedResponse::Errorvariant with custom Serialize that prevents persistence to non-memory backends. (crates/liter-llm/src/tower/cache_negative.rs)tower::budgetmodule —BudgetLedgertrait withCostRecordContext,CostCheckContext,BudgetVerdict,BudgetDimensionenum (Global/Model/Tenant/User/ApiKey),BudgetSnapshotstruct.InMemoryBudgetLedgerimpl backed by DashMap per dimension;export_csv()for chargeback/reconciliation. Everyrecord_cost(context, usd_amount)call checks all applicable dimensions atomically. (crates/liter-llm/src/tower/budget.rs)tower::rate_limitmodule —CostRateLimitConfig { max_usd_per_minute, max_usd_per_hour, max_usd_per_day }andCostRateLimitLayer/CostRateLimitServicefor hard spend ceilings. Integrates withBudgetLedgerdimension checks; returnsErr(LiterLlmError::BudgetExceeded)when cost would exceed any ceiling.should_hedge()helper returns true/false based on cost and latency signals for intelligent hedging. (crates/liter-llm/src/tower/rate_limit.rs)tower::metricsOTel additions — new meters:gen_ai.budget.spend_usd(histogram, labeled by dimension),gen_ai.budget.rejection(counter, labeled by dimension + reason). Emitted byBudgetLedgerimpls andCostRateLimitService. 
(crates/liter-llm/src/tower/metrics.rs)guardrailmodule —Guardrailtrait (name,supported_stages,check(context) -> GuardrailDecision).GuardrailStageenum (Input/Output/OutputChunk).GuardrailDecisionenum (Allow/Block/Mutate).GuardrailContextstruct (request, response, reason). Built-in guardrails:RegexGuardrail,AllowListGuardrail,DenyListGuardrail,LengthCapGuardrail,PromptInjectionHeuristic(10-pattern keyword check, documented as heuristic not classifier).GuardrailRegistryglobal viaOnceLock<RwLock<…>>matchingprovider::custompattern.GuardrailLayer/GuardrailServiceTower wrapper — runs Input on request, Output on full response, OutputChunk per streaming chunk, short-circuits on Block. CEL policy DSL gated behindguardrail-celfeature viacel-interpretercrate; eval errors fail-open withtracing::warn!. (crates/liter-llm/src/guardrail/{mod,builtin,registry,cel,tests}.rs,crates/liter-llm/src/tower/guardrail.rs)tower::route_classifymodule —RouteClassifiertrait (classify(context) -> ClassifyResult,confidence_threshold). Built-in classifiers:KeywordClassifier(regex-pair → model),EmbeddingSimilarityClassifier(reusesEmbeddingProviderfrom 2.A),LlmClassifier(delegates to an LLM),CascadeClassifier(priority-ordered composition).ClassifierVerdictCachecaches verdicts viaCacheStore.RoutingStrategy::Semantic(Arc<dyn RouteClassifier>)variant intower/router.rs; falls back to round-robin when classifier defers. OTel meters:gen_ai.route.classify.durationhistogram +gen_ai.route.classify.tier{keyword,embedding,llm}.hitcounters. (crates/liter-llm/src/tower/route_classify.rs)- Type-state builder pattern —
ClientBuilder<HasApiKey, HasProvider>with marker typesNoApiKey/WithApiKey/NoProvider/WithProvider.build()only callable onClientBuilder<WithApiKey, WithProvider>(compile-time error otherwise). Enforces API key and provider selection before use. (crates/liter-llm/src/client/builder.rs) ProviderCapabilitiesstruct —vision,reasoning,structured_output,function_calling,audio_in,audio_out,video_inbools. Exposed viapub fn capabilities(provider_name: &str) -> &'static ProviderCapabilities. (crates/liter-llm/src/provider/mod.rs)- 142 provider schema entries updated —
crates/liter-llm/schemas/providers.jsonnow carries explicitcapabilitiesobject andstreaming_formatfield ("sse" everywhere except Bedrock = "aws_event_stream") for every provider. Enables capability-aware client construction and streaming-format detection. (crates/liter-llm/schemas/providers.json)
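The type-state builder entry in the list above can be illustrated with a small sketch. It reuses the marker names from the entry (NoApiKey/WithApiKey/NoProvider/WithProvider), but the `Client` struct, its fields, and the setter names `api_key` and `provider` are assumptions made for illustration, not the crate's actual API.

```rust
use std::marker::PhantomData;

// Zero-sized marker types that encode builder progress in the type system.
pub struct NoApiKey;
pub struct WithApiKey;
pub struct NoProvider;
pub struct WithProvider;

/// Hypothetical built value; the real client carries far more state.
pub struct Client {
    pub api_key: String,
    pub provider: String,
}

pub struct ClientBuilder<K, P> {
    api_key: Option<String>,
    provider: Option<String>,
    _state: PhantomData<(K, P)>,
}

impl ClientBuilder<NoApiKey, NoProvider> {
    pub fn new() -> Self {
        ClientBuilder { api_key: None, provider: None, _state: PhantomData }
    }
}

impl<P> ClientBuilder<NoApiKey, P> {
    /// Setting the key moves the first type parameter to WithApiKey.
    pub fn api_key(self, key: impl Into<String>) -> ClientBuilder<WithApiKey, P> {
        ClientBuilder { api_key: Some(key.into()), provider: self.provider, _state: PhantomData }
    }
}

impl<K> ClientBuilder<K, NoProvider> {
    /// Setting the provider moves the second type parameter to WithProvider.
    pub fn provider(self, name: impl Into<String>) -> ClientBuilder<K, WithProvider> {
        ClientBuilder { api_key: self.api_key, provider: Some(name.into()), _state: PhantomData }
    }
}

impl ClientBuilder<WithApiKey, WithProvider> {
    /// Only this fully configured state exposes build().
    pub fn build(self) -> Client {
        Client {
            api_key: self.api_key.expect("set by api_key()"),
            provider: self.provider.expect("set by provider()"),
        }
    }
}
```

With this shape, `ClientBuilder::new().build()` is rejected by the compiler because `build` does not exist on `ClientBuilder<NoApiKey, NoProvider>`, while the two setters can be called in either order.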
Phase 3 — Realtime streaming, secret backends, credential rotation, and config hot-reload¶
-
streamingmodule — unified ingress/egress streaming with three composable layers:IngressStream<S, P>(typed SSE decoder),StreamPipeline<S>(ordered per-chunk middleware viaChunkMiddlewaretrait),EgressStream<S>(typed OpenAI SSE encoder). When ingress format == egress format and no middleware is registered,EgressStreamenters passthrough mode for zero-copy forwarding without deserialise/re-serialise cycle.StreamFormat(SSE vs. AWS EventStream) promoted topubfor explicit wire-format selection. Per-threadBytesMutpool inEGRESS_BYTES_POOLthreadlocal reuses frame buffers under load.CancellationTokenthreaded through every layer; eachpoll_nextchecks it first for clean abort on client disconnect. (crates/liter-llm/src/streaming.rs) -
liter-llm-proxy::secretsmodule —SecretManagertrait (object-safe viaPin<Box<dyn Future>>) withget(name) -> SecretValue(field: zeroedSecretString+SecretMetadata),set(name, value, tags),delete(name). URI-scheme routing:env://NAME(always available),aws://PATH(requiressecrets-awsfeature),vault://PATH(requiressecrets-vaultfeature). Built-in impls:EnvVarSecretManager(environment variables),AwsSecretsManagerProvider(AWS Secrets Manager with key rotation warnings),HashCorpVaultProvider(Vault KV-v2 with expiry tracking).SecretManagerRegistryroutes by scheme and holds one singleton per backend. OTel gaugegen_ai.secret.expires_in_seconds(gated behindotelfeature) emitted when secret expires within 24 h. (crates/liter-llm-proxy/src/secrets/{mod,env,aws,vault}.rs,crates/liter-llm-proxy/src/secrets/registry.rs) -
liter-llm-proxy::config::ConfigProvidertrait —load() -> ProxyConfig(single snapshot) andwatch() -> mpsc::Receiver<ConfigEvent>(live updates). Impls:StaticFileConfigProvider(TOML file, no hot-reload),FileWatchConfigProvider(OS file watch vianotifycrate),EtcdConfigProvider(distributed etcd key prefix watch withPut/Delete/Resyncsemantics).ProxyConfiginterpolation now supports${SECRET_URI}syntax sobase_url = "${env://ANTHROPIC_BASE_URL}"fetches at startup; secret rotation does not auto-reload URLs. (crates/liter-llm-proxy/src/config/{provider,watcher}.rs) -
liter-llm-proxy::provider::CredentialPooltrait — rotates per-provider API keys on 429/5xx rate-limit signals. Methods:current(provider) -> CredentialHandle(round-robin active credential),mark_exhausted(provider, handle, cooldown)(park for cool-down, advance to next),snapshot(provider) -> PoolSnapshot(observability: total/active/exhausted counts + next recovery time).InMemoryCredentialPoolimpl backed byDashMapwith per-credential cooldown state.ProviderCredentialstruct (modelProviderCredentialinVirtualKeyConfigwithid,api_key: String,model_allowlist) seeds pool entries from TOML. Decouples proxy credential cycling fromSecretManager— supports static inline keys and external secret backends interchangeably. (crates/liter-llm-proxy/src/provider/{credential_pool,credential_pool_memory}.rs) -
liter-llm::realtime module: unified envelope + event types for vendor-neutral realtime streaming. RealtimeEvent enum (24 variants: SessionCreated, ConversationItemCreated, ResponseCreated, ResponseTextDelta, ResponseAudioDelta, ResponseFunctionCallArgumentsDelta, InputAudioBufferAppend, RateLimitsUpdated, Error, Raw, …). ContentPart enum (Text, Audio, ImageRef) used in conversation items. ResponseStatus enum (Completed, Cancelled, Failed, Incomplete). RealtimeEnvelope wraps event + optional event_id. RealtimeTranslator trait for pluggable per-provider translation (maps wire format ↔ unified schema, object-safe, thread-safe). Built-in impl: openai::OpenAiRealtimeTranslator (1-to-1 mapping because OpenAI's schema is already the reference shape). (crates/liter-llm/src/realtime/{mod,openai}.rs) -
AppStaterefactor —configfield changed fromArc<ProxyConfig>toArc<ArcSwap<ProxyConfig>>for atomic hot-reload without blocking in-flight requests. Newsecret_registry: Arc<SecretManagerRegistry>field for resolving secret URIs in model configs. Callers must callstate.config.load()to obtain a consistent snapshot per request. (crates/liter-llm-proxy/src/state.rs)
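The URI-scheme routing described for the secrets module (env://NAME, aws://PATH, vault://PATH) amounts to splitting on the scheme and dispatching to a backend. The sketch below shows only the parsing step; `SecretRef` and `parse_secret_uri` are hypothetical names, and the real registry presumably also handles feature gating and backend lookup.

```rust
/// Hypothetical parsed form of a secret URI; one variant per scheme named in the changelog.
#[derive(Debug, PartialEq, Eq)]
pub enum SecretRef<'a> {
    Env(&'a str),
    Aws(&'a str),
    Vault(&'a str),
}

/// Split `scheme://rest` and map the scheme to a backend variant.
/// Unknown schemes and empty paths yield None.
pub fn parse_secret_uri(uri: &str) -> Option<SecretRef<'_>> {
    let (scheme, rest) = uri.split_once("://")?;
    if rest.is_empty() {
        return None;
    }
    match scheme {
        "env" => Some(SecretRef::Env(rest)),
        "aws" => Some(SecretRef::Aws(rest)),
        "vault" => Some(SecretRef::Vault(rest)),
        _ => None,
    }
}
```

Returning `None` for unrecognised schemes keeps ordinary values such as plain https URLs from being mistaken for secret references.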
Migration notes¶
- AppState now requires secret_registry: Arc<SecretManagerRegistry> and config: Arc<ArcSwap<ProxyConfig>> fields. Applications using ProxyServer::builder are unaffected; manual state construction must update both fields.
- New optional feature flags: secrets-aws, secrets-vault, secrets-env (env backend always enabled, others optional). mimalloc, jemalloc for allocator selection. http3 for HTTP/3 support. tokenizer for count_tokens availability.
- VirtualKeyConfig gains new provider_credentials: Vec<ProviderCredential> field (defaults to empty). Inline credentials in TOML via repeated [[keys.provider_credentials]] blocks; proxy auto-rotates among them on 429/5xx.
- Workspace clippy is now -D warnings; downstream consumers compiling with strict lints should review suppressions. The main crate is now warnings-clean.
- LlmRequest pattern-matching change: LlmRequest was previously an enum; it is now a struct with a kind: LlmRequestKind field. match req { LlmRequest::Chat(r) => ... } no longer compiles. Migrate to match req.kind() { LlmRequestKind::Chat(r) => ... } (preferred, using the new kind() accessor) or match req.kind { LlmRequestKind::Chat(r) => ... } (direct field access). The PascalCase constructor aliases (LlmRequest::Chat(r) etc.) remain callable for constructing requests and continue to compile unchanged; they are marked #[doc(hidden)] and will be removed in a future minor release.
- KeyResolver::resolve signature change: the method now takes api_key: String (owned) instead of api_key: &str, and returns Pin<Box<dyn Future<Output = ...> + Send + 'static>> instead of ... + 'a. Custom KeyResolver implementations must update their signature. Call sites that previously passed a &str literal must add .to_owned(): resolver.resolve("sk-...".to_owned()).
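The LlmRequest migration in the notes above might look like the following in downstream code. The types here are local stand-ins that only mirror the names from the note (`LlmRequest`, `LlmRequestKind`, `kind()`); the payload structs, the `Embed` variant, and the accessor's return type are assumptions for illustration.

```rust
// Stand-in payload types; the real request structs carry many more fields.
pub struct ChatRequest {
    pub model: String,
}
pub struct EmbedRequest {
    pub model: String,
}

/// Stand-in for the enum that now holds the request variants.
pub enum LlmRequestKind {
    Chat(ChatRequest),
    Embed(EmbedRequest),
}

/// Stand-in for the struct that replaced the former enum.
pub struct LlmRequest {
    pub kind: LlmRequestKind,
}

impl LlmRequest {
    /// Accessor assumed to return a shared reference to the kind.
    pub fn kind(&self) -> &LlmRequestKind {
        &self.kind
    }
}

/// Post-migration matching: match on `req.kind()` rather than on `req` itself.
pub fn describe(req: &LlmRequest) -> String {
    match req.kind() {
        LlmRequestKind::Chat(r) => format!("chat:{}", r.model),
        LlmRequestKind::Embed(r) => format!("embed:{}", r.model),
    }
}
```

Code that previously matched `LlmRequest::Chat(r)` on the request itself needs only the scrutinee and the pattern path changed; the arm bodies can stay as they were.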
Changed¶
- Bindings regenerated against alef v0.25.9; refreshes all 16 language surfaces, e2e suites, and README templates. New [crates.e2e.fields_c_types] entry chat_completion_response.usage = "Usage" and per-call C# e2e override class = "LiterLlmConverter" to satisfy alef v0.25.9's stricter intermediate-accessor checks.
- tower/router.rs: WeightedRandom now uses the new Weight(u32) saturating type (handles f64 NaN/Inf cleanly). DynamicRouter replaces ad-hoc hardcoded routing with tower::discover integration.
- tower/health.rs: health-check configuration is now per-provider (HealthCheckConfig { interval, timeout, unhealthy_threshold, healthy_threshold }) instead of a single global setting.
- http/streaming.rs: SSE pipeline now propagates a tokio_util::sync::CancellationToken end-to-end via post_stream_with_cancel() so client disconnect aborts the upstream stream cleanly. Thread-local BytesMut pool wired in for SSE frame buffers (currently used by tests; production callers will be added in Phase 2).
- cli/main.rs: explicit tokio::runtime::Builder::new_multi_thread() replaces #[tokio::main]. Worker/blocking-thread counts now configurable.
- Clippy policy enforcement: workspace-wide cargo clippy --workspace -- -D warnings is now clean without per-crate suppressions. Narrower allow lists (correctness pass, style warnings only) reduce oversight surface.
- tower/cache.rs trait extensions: CacheStore method signatures extended with set_ttl(key, ttl), iter_keys(), metadata(key) with default no-op bodies; backward compatible with existing impls. CachedResponse gained an Error { error: Arc<LiterLlmError>, expires_at: Instant } variant with custom serialization.
- tower/router.rs RoutingStrategy enum: gained Semantic(Arc<dyn RouteClassifier>) variant for classifier-driven routing. The #[derive(Debug)] was removed in favour of a manual Debug impl (dyn Trait is not Debug). Falls back to round-robin when the classifier defers.
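One way a Weight(u32) wrapper can handle NaN and infinite f64 inputs cleanly is to lean on Rust's float-to-integer `as` cast, which saturates at the integer bounds and maps NaN to zero. The sketch below is an assumption about the approach, not the crate's implementation; in particular, the mapping of infinity to `u32::MAX` and the `saturating_add` helper are illustrative choices.

```rust
/// Hypothetical saturating routing weight.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Weight(pub u32);

impl Weight {
    /// Convert from f64 without panicking or wrapping.
    /// Rust's `as` cast truncates toward zero, clamps out-of-range values
    /// to the u32 bounds, and turns NaN into 0.
    pub fn from_f64(value: f64) -> Self {
        Weight(value as u32)
    }

    /// Summing weights clamps at u32::MAX instead of overflowing.
    pub fn saturating_add(self, other: Weight) -> Weight {
        Weight(self.0.saturating_add(other.0))
    }
}
```

A weighted-random picker built on this type never sees a negative or non-finite weight, so the cumulative-sum step needs no special cases.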
Fixed¶
- Test diagnostic clarity — all 417 test
.unwrap()calls replaced with.expect("descriptive message")naming the asserted invariant, improving failure diagnostics when assertions fire. Production code paths remain unwrap-clean. (crates/liter-llm-*/src/**) tower/circuit.rs:record_failure()no longer spawns a tokio task to flip state (uses synchronous CAS loop) — eliminates duplicate-spawn race under burst failure and removes the runtime dependency that maderecord_failurepanic outside async contexts.tower/hedge.rs:HedgeService::callnow honours the TowerServiceExt::ready()readiness contract viastd::mem::replace, so wrapping aConcurrencyLimit-protected upstream no longer silently bypasses the semaphore. Hedge fast-path (max_attempts == 1) skipsJoinSetentirely.tower/metrics.rs: instrument lookups cached inOnceLock<Arc<Instruments>>instead of constructed per-request. Removes ~8k redundant meter lookups/sec at 1k req/s production load.http/streaming.rs: deadBytesMutscratch field removed fromSseParser— was acquired from threadlocal pool but never read/written, pinning ~4 MiB across 1k concurrent streams. Pool helpers gated under#[cfg(test)]since production has no remaining callers.liter-llm-proxy/shutdown.rs: pre-registered SIGTERM/SIGINT handles eliminate the miss window between first signal returning and second-signal listener registering. Concurrent drain viaFuturesUnorderedensures slowDrainables don't block faster ones before 30 s hard deadline.liter-llm-proxy/routes/health.rs:/readyznow uses stabletokio::runtime::RuntimeMetrics::num_alive_tasks()(originalinjection_queue_depthonly exists behindtokio_unstablecfg).- 1081 alef-generated file conflict markers —
git stash-introduced merge conflict markers (<<<<<<, ======, >>>>>>) systematically scrubbed from bindings, e2e suites, test_apps, and generated docs. The C#LiterLlmConverter.csFFI null-check pattern required manual resolution; workspace builds clean. (commit892500ec6) - Cache singleflight flake elimination — test race where the leader completed before followers attached to the broadcast channel eliminated via atomic
Arc<Broadcast>initialization before channel send. Fast mock services under parallel load now stable. (commit4e3a3e51e)
Tooling¶
- Workspace clippy lint policy enforcement via [workspace.lints.clippy] blocks; per-crate suppressions consolidated at source.
- Feature flag audit: split composite features (e.g., native-http still depends on http2, now gated on both); avoid silent breakage from feature interaction.
- Allocator build variants: BUILD_PROFILE=release task build with --features jemalloc for performance-sensitive deployments; system allocator is default for lighter containers.
- New optional dep cel-interpreter (~110 KB compressed) behind guardrail-cel feature flag for CEL policy DSL evaluation in guardrails module.
- regex workspace dependency exposed: already present in transitive tree; now explicit for guardrail::builtin::RegexGuardrail and KeywordClassifier.
[1.5.1] - 2026-06-13¶
Changed¶
- publish workflow: migrate every push, release-asset upload, and homebrew-tap commit to the kreuzberg-dev-publisher[bot] GitHub App via actions/create-github-app-token@v2, replacing secrets.GITHUB_TOKEN and secrets.HOMEBREW_TOKEN with scoped app installation tokens.
- Bindings regenerated against the latest alef, refreshing all 16 language surfaces and e2e suites.
Fixed¶
- Dart binding: named parameters and null-safety annotations, plus per-language README sync and updated method/type counts (#133).
- PyO3 0.29 method rename: pyo3::Bound::downcast_into callsites in crates/liter-llm-py/src/lib.rs migrated to the new cast_into name so the Python binding builds against pyo3 0.29.
- PMD ruleset: exclude UnnecessaryWarningSuppression from category/java/bestpractices.xml. Alef emits a blanket @SuppressWarnings("PMD") on every generated DTO record; PMD flags some as unnecessary depending on which rules fire on the surrounding record, breaking the Java hook on every regeneration.
[1.5.0] - 2026-06-07¶
Security¶
External security audit identified six exploitable gaps in the v1.4.1 codebase. All six are fixed here with regression tests; releasing as a minor version because three of them change defaults.
- (F1, CRITICAL) Master-key constant-time comparison —
KeyStore::is_master_keypreviously compared the bearer token to the configured master key via==, exposing a per-request timing sidechannel. Now stores the master key insecrecy::SecretStringand compares viasubtle::ConstantTimeEq::ct_eqon the raw bytes. (crates/liter-llm-proxy/src/auth/key_store.rs, newsubtle = "2.6"dep incrates/liter-llm-proxy/Cargo.toml.) - (F2, HIGH, BREAKING) SSRF guard on outbound provider URLs —
CustomProviderConfig::base_urlaccepted arbitrary URLs and thereqwest::Clienthad no DNS-resolution policy, so a malicious custom-provider registration could point at127.0.0.1/169.254.169.254/ RFC1918 networks. Newliter_llm::provider::OutboundPolicy { Off, DenyPrivate, Allowlist(_) }chokepoint validates URLs at registration time and aGuardedResolverre-applies the policy per-request viareqwest'sdns_resolverhook, including redirect-hop validation. Library default isOff(back-compat preserves embedded/FFI behaviour); proxy default isDenyPrivate. NewLiterLlmError::OutboundForbiddenvariant maps to HTTP 502. New TOML key[security] outbound_policy = "deny_private" | "off" | { allowlist = ["…"] }. (crates/liter-llm/src/provider/outbound_policy.rs,crates/liter-llm/src/provider/custom.rs,crates/liter-llm/src/client/mod.rs,crates/liter-llm-proxy/src/config/server.rs,crates/liter-llm-cli/src/commands/serve.rs.) - (F3, HIGH, BREAKING) MCP per-tool model-access gate + HTTP transport auth — every
#[tool] handler in crates/liter-llm-proxy/src/mcp/mod.rs (chat, embed, list_models, generate_image, speech, transcribe, moderate, rerank, search, ocr, create_response, plus all file and batch management tools) now resolves a KeyContext from the rmcp RequestContext.extensions and pre-flight-checks can_access_model(&params.model) or is_master before routing through ServicePool/FileStore. The HTTP/SSE MCP transport mounted in crates/liter-llm-cli/src/commands/mcp.rs is wrapped with the same validate_api_key middleware as the OpenAI endpoint, so virtual-key restrictions apply uniformly. Stdio transport requires an explicit mcp.stdio_key_id/mcp.stdio_trust_local = true opt-in or refuses to start.
- (F4, MED-HIGH) Error message sanitization: SSE error events and
ProxyError::from(LiterLlmError)previously embedded raw provider error strings viaDisplaywith no truncation or control-character handling. Newcrates/liter-llm-proxy/src/error.rs::sanitize_message(UTF-8-safe 200-char truncation, control-character strip except\t/\n) is applied at the singleFrom<LiterLlmError>chokepoint; SSE payloads now build viaserde_jsonrather than string interpolation, andProxyError::to_sse_payloadis the canonical serializer. - (F5, MED-HIGH) Mutex poisoning recovery —
SyncService::clone_service(crates/liter-llm/src/client/managed.rs) previously panicked when the innerstd::sync::Mutexwas poisoned. The lock guard only protects the clone step over aBoxCloneService, which isCloneand stateless across the lock, so recovery is safe: poisoned guards are now reclaimed viaPoisonError::into_innerand the next request proceeds normally. - (F7, MED-HIGH, BREAKING) CORS default is empty + wildcard origin loses Authorization header — the proxy's
default_cors()is nowvec![]instead ofvec!["*"]; with nocors_origins, the router skips theCorsLayerentirely. Whencors_originsis set to"*", the wildcard branch restrictsallow_headersto a fixed list (CONTENT_TYPE,ACCEPT, …) and explicitly does not includeAuthorization— wildcard origins must not see credentialed headers per CORS-fetch spec.liter-llm-cli servealso logs atracing::warn!whencors_origins.contains("*") && host == "0.0.0.0".
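The sanitization rule stated under F4 (UTF-8-safe 200-character truncation, control characters stripped except tab and newline) can be sketched as a single iterator chain. This is an illustrative version only: the order of operations (strip first, then truncate) and the constant name `MAX_CHARS` are assumptions, and the proxy's actual sanitize_message may differ.

```rust
/// Assumed limit, taken from the "200-char truncation" wording in the F4 entry.
const MAX_CHARS: usize = 200;

/// Strip control characters (keeping '\t' and '\n'), then keep at most
/// MAX_CHARS characters. Working on `char`s rather than bytes means the
/// cut can never land inside a multi-byte UTF-8 sequence.
pub fn sanitize_message(raw: &str) -> String {
    raw.chars()
        .filter(|c| !c.is_control() || *c == '\t' || *c == '\n')
        .take(MAX_CHARS)
        .collect()
}
```

Truncating by characters instead of slicing bytes is what makes the operation UTF-8-safe; a byte-index slice at 200 could panic on a character boundary.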
Changed¶
- Bindings regenerated against alef v0.23.28 (was v0.23.16). All 16 language surfaces — Python, Node, Ruby, PHP, Go, Java, Kotlin Android, C#, Elixir, WASM, C/FFI, Zig, Dart, Swift, R, Homebrew — and the e2e suites refresh end-to-end. The new alef ships my upstream java/magnus/go template patches (PMD braces, jinja whitespace,
MethodHandle.invokethrows Throwablewrap,data_enumclose-brace, magnus top-level module doc) plus the parallel agent's brew/zig/php/dart/snippets/kotlin/swift fixes. - Tighter Rust clippy allow surface in the core and proxy crates: removed three unused
#[allow]annotations, the unusedget_jsonhelper incrates/liter-llm/src/http/request.rs, and a now-deadserde::de::DeserializeOwnedimport.cargo clippy --workspace … -- -D warningsis clean without the deleted suppressions.
Tooling¶
xberg-io/pre-commit-hooksbumped to v2.1.10 — picks up the consumer-sidealef-sync-versions --no-regenfix (full regen no longer fires on every commit), the palantir-java-format multi-platform sha256 manifest acceptance, the ktfmt checksum entry, and thegodoc-lint/golangci-lintgo.work-aware module discovery (no longer scans staletest_apps/swift_e2e/.build/checkouts/.../e2e/go/).- Project-local PMD ruleset at
packages/java/pmd-ruleset.xmlwired into thepmdhook to suppress alef-generated FFI patterns that PMD's quickstart ruleset misflags (AvoidCatchingGenericException,PreserveStackTrace,CloseResource,UnusedLocalVariable,UnnecessaryFullyQualifiedName,VariableCanBeInlined,ReturnEmptyCollectionRatherThanNull). deny.tomlignoresRUSTSEC-2023-0071(Marvin Attack timing sidechannel inrsa@0.9.x, transitive viaopendal -> reqsign-core). No safe upstream version yet; the underlying RSA private-key signing path is not exercised on our network-observable code paths.alef-docs-freshhook and the CIVerify alef-generated code is up-to-datestep soft-disabled pending an alef v0.23.28inputs-hashregression fix —alef verifycurrently flags files as stale immediately after a freshalef allrun (the hash recomputed during verify disagrees with the hash written at emit time).markdownlint-rumdl-strictexclude expanded to cover the rootREADME.md(alef-generated badge row uses inline HTML),CONTRIBUTING.md,templates/readme/, and.github/PULL_REQUEST_TEMPLATE.md.
Migration notes¶
The three behaviour-changing defaults above (cors_origins = [], outbound_policy = "deny_private", MCP per-tool model gate) are all reversible via explicit config. Operators who relied on the old defaults should add to their proxy config:
cors_origins = ["*"] # opt back into the v1.4.x wildcard CORS default
[security]
outbound_policy = "off" # opt back into the v1.4.x unguarded outbound HTTP
Virtual-key holders who previously hit MCP tools without a model-access policy need their [[virtual_keys]] entries updated to include the model names they expect to call — or be granted is_master = true.
[1.4.1] - 2026-06-05¶
Fixed¶
- Docker build: removed stale COPY tools/ tools/ from docker/Dockerfile. The tools/ directory was deleted in v1.3.0 and the leftover copy had been failing every Docker image build since.
- publish-crates job timeout: bumped .github/workflows/publish.yaml publish-crates timeout-minutes from 30 to 60. The 30-minute ceiling was cancelling mid-publish on busy crates.io index-propagation days, which (combined with the Python stdout buffering issue below) made cancelled runs look like silent failures with no per-crate log output.
- Upstream xberg-io/actions to v1.8.29: publish-crates/scripts/publish.py now line-buffers stdout/stderr (sys.stdout.reconfigure(line_buffering=True)), so per-crate "Publishing X (n/total)..." progress survives job cancellation. Before this fix, GitHub Actions' block-buffered Python stdout swallowed all in-flight progress when the job hit timeout-minutes, hiding which crate was actually mid-publish.
Notes¶
- v1.4.0 was a no-op release because task version:bump was not run before tagging. The tree still carried 1.4.0-rc.61 in Cargo.toml, so every publish job either re-shipped rc.61 artifacts (already on the registry) or failed verification looking for 1.4.0. v1.4.1 is the first real 1.4.x release.
- alef pin advanced to 0.23.16 (was 0.23.12): no functional codegen changes vs. 0.23.12; bump tracks the latest released 0.23.x.
1.4.0 - 2026-06-05¶
Added¶
- feat(provider/vertex): auto-install `VertexAdcCredentialProvider` in `DefaultClient::new` — when the resolved provider is `vertex_ai` and the caller supplied neither an explicit `api_key` nor a `credential_provider`, the client now auto-constructs `VertexAdcCredentialProvider::new()` and installs it on the config. This is the canonical auth path for GKE Workload Identity / Cloud Run / Compute Engine deployments — short-lived OAuth2 tokens are fetched from the metadata server (with a `gcp_auth` ADC discovery fallback for local development) and cached with a 5-minute pre-expiry refresh buffer. Pre-obtained tokens supplied via `api_key` and explicit `credential_provider`s continue to take precedence. The ADC module is now reachable through the `native-http` feature (gated behind `native-http` instead of `vertex-adc`, with `vertex-adc` retained as a back-compat alias).
- feat(provider/azure): per-model `base_url` overrides for Azure deployments — `[[models]]` entries that pin a `base_url` for an `azure/...` `provider_model` now route through `AzureProvider::with_base_url(...)`, producing the required `{base_url}/openai/deployments/{model}{path}?api-version=…` shape instead of the generic OpenAI-compatible URL. Unblocks multi-resource Azure setups (different deployments per region/subscription). Closes #83.
- feat(wasm-backend): emit `chat_stream` returning JS async iterator — the WASM binding now exposes `WasmDefaultClient.chat_stream(req)` alongside the existing `chat`, `embed`, etc. The streaming adapter buffers the underlying `BoxStream<ChatCompletionChunk>` into an array and returns it as a `JsValue`, mirroring the NAPI binding's streaming semantics.
- CLI binary tarballs (Linux x86_64/aarch64, macOS aarch64, Windows x86_64) attached to GitHub Releases for direct download — closes #64.
- `schemas/pricing.json` regenerated from models.dev and now covers 4,219 models (up from 35); `scripts/generate_pricing.py` wired into `task generate:pricing`, `task update`, and `task upgrade`. Closes #48.
- `Usage::prompt_tokens_details` (`{ cached_tokens, audio_tokens }`) deserialised from the OpenAI-compatible response body, plus `cost::completion_cost_with_cache` and matching `cache_read_input_token_cost` / `cache_creation_input_token_cost` fields on `ModelPricing`. `ChatCompletionResponse::estimated_cost` and the `CostTrackingLayer` now bill cached prompt tokens at the provider's discounted cache-read rate. `schemas/pricing.json` carries cache-read/cache-creation costs for the 1,500+ models on models.dev that publish them. Closes #65.
- ci-mobile: new `.github/workflows/ci-mobile.yaml` running `android-check` (ubuntu, `arm64-v8a` + `x86_64` via `cargo ndk`), `ios-check` (macos, `aarch64-apple-ios` + `aarch64-apple-ios-sim`), and `xcframework-build` (macos, SPM-ready XCFramework + SHA256 checksum). Uses shared composite actions from `xberg-io/actions@v1`.
- Alef migration to v0.23.11: the entire polyglot surface (16 language bindings — Python, Node, Ruby, PHP, Go, Java, C#, Kotlin Android, Elixir, WASM, C/FFI, Zig, Dart, Swift, Homebrew + Rust core) is regenerated end-to-end via alef. Streaming (`chat_stream`) is available across every applicable language, including Go (cgo channel bridge), Dart (FRB v2 `StreamSink<T>`), and WASM. Skipped-assertion total across e2e suites: 354 → 0.
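
As a rough illustration of the cache-aware billing mentioned in this list, the sketch below splits the prompt into cached and uncached tokens and prices each portion at its own rate. It is not the crate's `cost::completion_cost_with_cache` implementation: the `Pricing` class, the `input_cost_per_token` and `output_cost_per_token` field names, and every rate are assumptions made for illustration. Only `cache_read_input_token_cost` and `cached_tokens` are names taken from the entry above, and the sketch assumes cached tokens are counted inside the prompt token total.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    # Hypothetical per-token rates in USD. Only cache_read_input_token_cost
    # is a field name taken from the changelog entry.
    input_cost_per_token: float
    output_cost_per_token: float
    cache_read_input_token_cost: float


def cost_with_cache(pricing, prompt_tokens, cached_tokens, completion_tokens):
    """Bill cached prompt tokens at the cache-read rate, the rest at the input rate."""
    # Assumption: cached tokens are a subset of the prompt tokens.
    cached = min(cached_tokens, prompt_tokens)
    uncached = prompt_tokens - cached
    return (
        uncached * pricing.input_cost_per_token
        + cached * pricing.cache_read_input_token_cost
        + completion_tokens * pricing.output_cost_per_token
    )


# Made-up rates: cached input is billed at a tenth of the normal input rate.
pricing = Pricing(
    input_cost_per_token=2e-6,
    output_cost_per_token=8e-6,
    cache_read_input_token_cost=2e-7,
)
full = cost_with_cache(pricing, prompt_tokens=1000, cached_tokens=0, completion_tokens=100)
discounted = cost_with_cache(pricing, prompt_tokens=1000, cached_tokens=800, completion_tokens=100)
```

With these illustrative rates, the request with 800 cached prompt tokens costs less than the same request with none, because only the 200 uncached tokens are billed at the full input rate.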
Changed¶
- API rename: `ResponseClient::retrieve_response` / `cancel_response` now take a parameter named `response_id` (was `id`). Positional callers are unaffected; named-arg callers must update. Consistent with `file_id` / `batch_id` on the file and batch clients, and keeps the alef-generated Python binding from shadowing the `id` builtin.
- GitHub Release CLI assets ship a single sorted `SHA256SUMS-<version>.txt` instead of one `.sha256` per archive — closes #67.
- WebAssembly build verified `mio`-free. `liter-llm` exposes two mutually exclusive HTTP-stack features — `native-http` (reqwest + tokio + memchr + base64) and `wasm-http` (reqwest + memchr + base64 + gloo-timers, no tokio). `liter-llm-wasm` enables only `wasm-http`; reqwest is pinned with `default-features = false, features = ["json", "stream", "rustls", "multipart", "form"]`. `cargo build --target wasm32-unknown-unknown -p liter-llm-wasm` pulls neither `mio` nor `tokio` — reqwest auto-routes to the browser/Node `fetch` API on `wasm32` targets.
- Ruby publish vendors core crates exclusively via the shared `xberg-io/actions/rewrite-native-deps@v1` action (alef `publish prepare`, `vendor_mode = "core-only"`). The bespoke `scripts/ci/ruby/vendor-liter-llm-core.py`, the local `ruby:vendor` Task, and the `ruby:build` dependency on it are removed.
- Repo hygiene: `.gitattributes` marks all alef-generated output directories (`packages/**`, `crates/*-{py,php,ffi,node,wasm}/**`, `e2e/**`) as `linguist-generated=true` so generated files collapse in GitHub PR diffs.
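
The single checksum listing mentioned above can be pictured with the sketch below. The changelog does not specify the file's line format or sort key, so this assumes the `sha256sum`-style "digest, two spaces, file name" layout sorted by file name. The archive names and contents are made up.

```python
import hashlib


def sha256sums(archives):
    """Return one sorted checksum listing for a mapping of file name -> bytes."""
    lines = [
        f"{hashlib.sha256(data).hexdigest()}  {name}"
        for name, data in archives.items()
    ]
    # Sort by file name (assumed sort key) so the listing is stable across runs.
    lines.sort(key=lambda line: line.split("  ", 1)[1])
    return "\n".join(lines) + "\n"


# Hypothetical archive names and contents, for illustration only.
listing = sha256sums({"tool-b.tar.gz": b"b", "tool-a.tar.gz": b"a"})
```

A listing in this layout can be checked in one pass with `sha256sum -c`, which is one plausible reason to prefer a single file over one `.sha256` per archive.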
Fixed¶
- TLS ABI floor: reqwest crypto provider switched from `aws-lc-rs` to `ring` (`rustls-no-provider` feature + explicit `rustls` dep with `ring` backend). Eliminates `__isoc23_strtol` and related glibc 2.38+ symbols emitted by `aws-lc-sys` 0.40.0, restoring the GLIBC_2.28 ABI floor required by downstream users (e.g. Node.js aarch64 bindings).
- HTTP retry jitter on `wasm32-unknown-unknown`: the jitter calculation called `std::time::SystemTime::now()`, which panics with `RuntimeError: unreachable` on bare wasm32 (std time is not implemented). On `wasm32` the jitter step is skipped; native targets keep the existing `[0.5x, 1.0x]` jitter. Unblocks WASM e2e tests that exercise 429/5xx retry paths.
- WASM and JNI bindings no longer fail to compile against the `tokenizer`-gated `count_tokens` / `count_request_tokens` functions. Both are now listed under `exclude_functions` in `alef.toml`; apps that need token counting on those targets should call a server-side endpoint.
- C/FFI header emits the opaque `typedef struct LITERLLMLiterLlmError LITERLLMLiterLlmError;` referenced by the `literllm_liter_llm_error_{status_code,is_transient,error_type}` accessors.
- Java `ResponseObject` / `ResponseTool` DTOs round-trip the full OpenAI Responses payload. `ResponseOutputItem.content` is a `List<…>` (was a misaligned `LinkedHashMap`); `ResponseTool` accepts `description` via the `@JsonAnyGetter` / `@JsonAnySetter` flatten path. Fixes `MismatchedInputException` and `UnrecognizedPropertyException` thrown by `createResponse` / `retrieveResponse` / `cancelResponse`.
- Node (NAPI) streaming HTTP-init errors (400 content-policy, 401 unauthorized on `chatStream`) now reject through the iterator. Binding remains lazy (parity with Python's `async for _ in stream: pass`).
- Python `api.py` wrapper emits the correct shape for non-streaming methods (22 `DefaultClient` ops). Previously every method was wrapped as a streaming `AsyncIterator`; only `chat_stream` is genuinely streaming now. Also fixes `String` → `str` and `bytes::Bytes` → `bytes` mappings.
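
To make the `[0.5x, 1.0x]` jitter range from the retry fix concrete, here is a minimal sketch of multiplicative jitter on top of a backoff delay. The function name, the doubling schedule, and the `jitter` switch are assumptions for illustration. The real logic lives in the Rust core and may differ.

```python
import random


def backoff_delay(base_delay, attempt, jitter=True, rng=random):
    """Exponential backoff with optional multiplicative jitter in [0.5x, 1.0x]."""
    # Assumed schedule: the delay doubles on every attempt.
    delay = base_delay * (2 ** attempt)
    if not jitter:
        # Mirrors the wasm32 behaviour described above: the jitter step is skipped.
        return delay
    return delay * rng.uniform(0.5, 1.0)


plain = backoff_delay(0.5, attempt=2, jitter=False)
jittered = backoff_delay(0.5, attempt=2)
```

Here `plain` is the undisturbed delay, while `jittered` falls somewhere between half of that value and the full value.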
1.3.0 - 2026-04-23¶
Changed¶
- Alef migration: All language bindings are now auto-generated by alef instead of hand-written
- `BoxFuture` / `BoxStream` type aliases no longer wrap `Result<T>` — all method signatures now explicitly return `Result<T>`
- `provider` module is now public (was `pub(crate)`)
- `ChatCompletionRequest.stream` field is now public (was `pub(crate)`)
- Switched spell checker from codespell to typos
- CI no longer runs code generation — only `alef verify --exit-code` for freshness checks
- Updated alef to v0.5.9
Added¶
- `alef.toml` configuration for 10 language targets, 23 API method call configs, mock server support
- `bindings.rs` adapter module with `create_client` and `create_client_from_json` binding-friendly constructors
- `Default` derives on all public types for binding compatibility
- `Clone` derive on `DefaultClient`
- E2E test fixtures converted to alef format (167+ fixtures across 23 categories)
- E2E tests regenerated for 13 languages with mock HTTP server support
- Test apps generated with `alef e2e generate --registry`
- API reference documentation auto-generated with `alef docs` for all 10 languages
- Package READMEs generated with `alef readme` using restored Jinja templates
- `alef-verify` and `alef-sync-versions` pre-commit hooks
- `alef verify --exit-code` step in CI validation workflow
- `.lychee.toml` link checker configuration
- `_typos.toml` spell checker configuration
- Auto-load API keys from environment variables
- FFI callback streaming support
- `chat_stream` method across all bindings
Removed¶
- `liter-llm-bindings-core` crate — replaced by alef codegen
- `tools/e2e-generator` crate — replaced by `alef e2e generate`
- `scripts/sync_versions.py` — replaced by `alef sync-versions`
- `scripts/generate_readme.py` — replaced by `alef readme`
- `scripts/readme_config.yaml` and `scripts/readme_templates/` — replaced by `templates/readme/`
- `tests/test_apps/` — replaced by `test_apps/` (alef registry mode)
- Hand-written binding source in `crates/liter-llm-{py,node,ffi,wasm,php}/src/`
- Hand-written package source in `packages/{go,java,csharp,ruby,elixir}/`
1.2.2 - 2026-04-18¶
Added¶
- GitHub Copilot OAuth Device Flow credential provider (`copilot-auth` feature) — use your Copilot subscription as an LLM backend via `github_copilot/` model prefix (#12)
- GitHub Copilot provider with OpenAI-compatible routing, required Copilot headers, per-request UUID, and `X-Initiator` header
- E2E test fixtures for GitHub Copilot provider (chat + auth error)
Fixed¶
- Provider registry audit: corrected base URLs for 20 providers (aiml, assemblyai, clarifai, dashscope, deepseek, elevenlabs, firecrawl, friendliai, gradient_ai, gmi, helicone, lambda_ai, minimax, moonshot, morph, nlp_cloud, ollama, poe, stability, wandb)
- Provider registry audit: corrected env var names for 5 providers (cometapi, fal_ai, gradient_ai, jina_ai, venice)
- Provider registry audit: corrected endpoint lists for 6 providers (cometapi, deepinfra, elevenlabs, jina_ai, mistral, nvidia_nim)
- Added missing `base_url` and `auth` config for 11 previously non-functional providers (amazon_nova, baseten, compactifai, datarobot, docker_model_runner, duckduckgo, langgraph, lemonade, v0, vercel_ai_gateway, zai)
- Added 18 stub/infrastructure providers to `complex_providers` list to prevent incorrect config-driven routing
- Added `nanogpt` param mapping (`max_completion_tokens` → `max_tokens`)
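
A param mapping of this kind amounts to renaming request fields before the body is sent to the provider. The sketch below shows the idea under stated assumptions: `apply_param_mappings` is a hypothetical helper, not the library's API, and the request body and model name are made up. Only the `max_completion_tokens` → `max_tokens` pair comes from the entry above.

```python
def apply_param_mappings(body, mappings):
    """Return a copy of the request body with mapped field names renamed."""
    # Fields without a mapping keep their original name; the input is not mutated.
    return {mappings.get(key, key): value for key, value in body.items()}


# The mapping pair comes from the changelog entry; the request body is made up.
NANOGPT_MAPPINGS = {"max_completion_tokens": "max_tokens"}
request = {"model": "example-model", "max_completion_tokens": 256}
renamed = apply_param_mappings(request, NANOGPT_MAPPINGS)
```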
1.2.1 - 2026-04-17¶
Added¶
- `LlmClientRaw` trait with `_raw` variants of all `LlmClient` methods, returning `RawExchange<T>` that exposes the final request body and raw provider response before normalization (#13)
- `RawExchange<T>` and `RawStreamExchange<S>` types for wire-level debugging and custom parsing
- MCP & IDE integration documentation with setup guides for VS Code, GitHub Copilot, Claude Desktop, Cursor (#12)
Fixed¶
- Docker image now published to `ghcr.io/xberg-io/liter-llm` (#11)
- Docker publish workflow timeout increased from 60 to 360 minutes (multi-arch Rust builds via QEMU were timing out)
- Bedrock `build_url` tests no longer flake due to `BEDROCK_CROSS_REGION` env var race condition
1.2.0 - 2026-04-07¶
Added¶
- Local LLM provider support: Ollama, LM Studio, vLLM, llama.cpp, LocalAI, llamafile -- use any local inference engine via OpenAI-compatible API
- Docker Compose setup for local LLM integration testing with Ollama
- Integration test suite for local LLM providers
Fixed¶
- PHP `onError` hook now passes a proper `\Exception` object instead of a plain string (PHP strict types requires `\Throwable`)
- README templates fixed for rumdl compliance (MD040 code fence language, MD031 blank lines, MD032 list spacing, MD020 closed headings)
- Added 404 to all POST endpoint OpenAPI specs (model not found on default model names)
- Homebrew badge added to all READMEs
1.1.1 - 2026-03-29¶
Fixed¶
- Java Maven plugins downgraded to 3.x stable (was 4.0.0-beta, incompatible with Maven 3.9.x CI)
- PHP hook isolation (per-client instead of global), budget per-model enforcement, onError hook invocation, shutdown segfault
- PHP e2e tests set `max_retries=0` to prevent retry delays on mock 500s
- OpenAPI spec: added 400/415/422/503 status codes to all endpoints for schemathesis compliance
- `first_client()` returns 503 Service Unavailable instead of 500 for "no models configured"
- Schemathesis CI checks aligned (removed `content_type_conformance`, `not_a_server_error`)
- Docker cache: per-platform `TARGETARCH` cache IDs prevent multi-arch build races
Added¶
- Homebrew formula: `brew tap xberg-io/tap && brew install liter-llm`
- Homebrew bottle builds (arm64_sequoia) in publish workflow
- `liter-llm-proxy` and `liter-llm-cli` added to crates.io publish pipeline
- Installation docs: CLI/Docker/Homebrew tabs
- `scripts/publish/upload-homebrew-bottles.sh` and `ensure-github-release-exists.sh`
1.1.0 - 2026-03-29¶
OpenAI-compatible LLM proxy server with CLI, MCP tool server, and Docker support.
Proxy Server (liter-llm-proxy)¶
- 22 REST endpoints — full OpenAI-compatible API surface: chat completions (streaming + non-streaming), embeddings, models, images, audio (speech + transcription), moderations, rerank, search, OCR, files CRUD, batches CRUD, responses CRUD, health
- Tower middleware stack — reuses core middleware: cache, rate limit, budget, cost tracking, cooldown, health check, tracing
- Virtual API keys — in-memory key store with per-key model restrictions, RPM/TPM limits, budget limits
- Model routing — name-based routing to provider deployments, wildcard aliases, deterministic default client
- OpenDAL file storage — configurable backend (memory, S3, GCS, filesystem) for file operations
- SSE streaming — chat completion chunks proxied as Server-Sent Events with `[DONE]` sentinel
- OpenAPI 3.1 — utoipa-generated spec served at `/openapi.json` with bearer auth security scheme
- TOML configuration — `liter-llm-proxy.toml` with env var interpolation (`${VAR}`), auto-discovery, `deny_unknown_fields`
- CORS — configurable origins from config (default: allow all)
- Graceful shutdown — SIGINT/SIGTERM handling via `tokio::signal`
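
The SSE framing with a `[DONE]` sentinel listed above can be sketched as follows. This is a simplified illustration rather than the proxy's code: it assumes each chunk is sent as a single `data:` line followed by a blank line, and the helper names and chunk payloads are hypothetical.

```python
import json


def to_sse(chunks):
    """Yield one Server-Sent Events frame per chunk, then the [DONE] sentinel."""
    for chunk in chunks:
        # Each event is a single "data:" line terminated by a blank line.
        yield f"data: {json.dumps(chunk)}\n\n"
    yield "data: [DONE]\n\n"


def from_sse(frames):
    """Parse frames back into chunks, stopping at the sentinel."""
    for frame in frames:
        payload = frame.strip()[len("data: "):]
        if payload == "[DONE]":
            return
        yield json.loads(payload)


# Made-up chunk payloads, for illustration only.
chunks = [{"delta": "Hel"}, {"delta": "lo"}]
frames = list(to_sse(chunks))
decoded = list(from_sse(frames))
```

The sentinel lets a client stop reading without waiting for the connection to close, since the payload itself is not JSON and cannot be confused with a chunk.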
MCP Server (rmcp)¶
- 22 tools — full parity with REST API: chat, embed, list_models, generate_image, speech, transcribe, moderate, rerank, search, ocr, file CRUD (5), batch CRUD (4), response CRUD (3)
- Transports — stdio (default) and HTTP/SSE via `StreamableHttpService`
- Parameter schemas — `schemars::JsonSchema` derives for MCP tool discovery
CLI (liter-llm)¶
- `liter-llm api` — start proxy server with config, host/port overrides, debug logging
- `liter-llm mcp` — start MCP server with stdio or HTTP transport
- 3-tier config precedence: CLI flags > env vars > config file > defaults
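
The precedence order in the last item (CLI flags > env vars > config file > defaults) can be read as "first source that defines the setting wins". A minimal sketch, assuming each source is a plain mapping; the `resolve` helper, the key names, and the values are all hypothetical:

```python
def resolve(key, cli_flags, env_vars, config_file, defaults):
    """Return the first value found, checking sources from highest to lowest priority."""
    for source in (cli_flags, env_vars, config_file, defaults):
        if source.get(key) is not None:
            return source[key]
    raise KeyError(key)


# Hypothetical settings, for illustration only.
defaults = {"host": "127.0.0.1", "port": 4000}
config_file = {"port": 8080}
env_vars = {"port": 9000}
cli_flags = {"host": "0.0.0.0"}

port = resolve("port", cli_flags, env_vars, config_file, defaults)
host = resolve("host", cli_flags, env_vars, config_file, defaults)
```

In this example the port comes from the environment, because no CLI flag sets it, while the host comes from the CLI flag.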
Docker¶
- Multi-stage build: `rust:1.91-bookworm` builder, `cgr.dev/chainguard/glibc-dynamic` runtime (35MB)
- Non-root execution, OCI labels, port 4000 exposed
- `ENTRYPOINT ["liter-llm"]`, `CMD ["api", "--host", "0.0.0.0", "--port", "4000"]`
Testing¶
- 74 unit tests — config parsing, error mapping, auth key store, service pool, file store, streaming
- 32 integration tests — auth middleware, chat/embedding/models routes, error propagation, CORS, health, OpenAPI
- 12 proxy e2e fixtures — chat (basic + streaming), embeddings, models, auth errors, upstream errors, health, images, moderation, reranking
- Schemathesis — contract testing against OpenAPI spec via Docker (`task proxy:schemathesis`)
CI/CD¶
- `.github/workflows/ci-docker.yaml` — build + health test + schemathesis contract tests
- `.github/workflows/publish-docker.yaml` — multi-arch (amd64/arm64) publish to `ghcr.io/xberg-io/liter-llm`
- Taskfile: `proxy:test`, `proxy:schemathesis`
1.0.0 - 2026-03-28¶
Initial stable release. Universal LLM API client with native bindings for 11 languages and 142+ providers.
Core¶
- `LlmClient` trait with chat, chat_stream, embed, list_models, image_generate, speech, transcribe, moderate, rerank, search, ocr
- `FileClient`, `BatchClient`, `ResponseClient` traits for file/batch/response operations
- `DefaultClient` with reqwest + tokio, SSE streaming, retry with exponential backoff
- `ManagedClient` with composable Tower middleware stack
- 142 LLM providers embedded at compile time from `schemas/providers.json`
- Per-request provider routing from model name prefix (e.g. `anthropic/claude-sonnet-4-20250514`)
- `secrecy::SecretString` for API keys (zeroized on drop, never logged)
- TOML configuration file loading with auto-discovery (`liter-llm.toml`)
- Custom provider registration at runtime
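
Prefix-based routing, as in the `anthropic/claude-sonnet-4-20250514` example above, comes down to splitting the model string at the first slash. The sketch below is an assumption-laden illustration: `split_model` is a hypothetical helper, and the handling of names without a prefix (returning a caller-supplied default provider) is a guess, not documented behaviour.

```python
def split_model(model, default_provider=None):
    """Split 'provider/model-name' into (provider, model-name)."""
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix present: fall back to the caller's default (assumed behaviour).
        return default_provider, model
    return provider, name


routed = split_model("anthropic/claude-sonnet-4-20250514")
unprefixed = split_model("some-model", default_provider="openai")
```

Splitting only at the first slash keeps any later slashes inside the model name intact.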
Middleware (Tower)¶
- CacheLayer — in-memory LRU + pluggable backends via `CacheStore` trait
- OpenDAL cache — 40+ storage backends (Redis, S3, GCS, filesystem, etc.) via Apache OpenDAL
- BudgetLayer — global + per-model spending limits with hard/soft enforcement
- HooksLayer — request/response/error lifecycle callbacks with guardrail pattern
- CooldownLayer — circuit breaker after transient errors
- ModelRateLimitLayer — per-model RPM/TPM rate limiting
- HealthCheckLayer — background health probing
- CostTrackingLayer — per-request cost calculation from embedded pricing registry
- TracingLayer — OpenTelemetry GenAI semantic convention spans
- FallbackLayer — automatic failover to backup provider
- RouterLayer — multi-deployment load balancing (round-robin, latency, cost, weighted)
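
Of the load-balancing strategies named for the RouterLayer, round-robin is the simplest to picture: deployments are handed out in a fixed rotation. The class below is a minimal sketch of that strategy only, with hypothetical names. It says nothing about how the Tower layer itself is implemented.

```python
import itertools


class RoundRobinRouter:
    """Hand out deployments in a fixed rotation."""

    def __init__(self, deployments):
        deployments = list(deployments)
        if not deployments:
            raise ValueError("at least one deployment is required")
        self._cycle = itertools.cycle(deployments)

    def pick(self):
        """Return the next deployment in the rotation."""
        return next(self._cycle)


# Hypothetical deployment names, for illustration only.
router = RoundRobinRouter(["deploy-a", "deploy-b", "deploy-c"])
picks = [router.pick() for _ in range(5)]
```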
Language Bindings¶
All bindings expose the full API surface with language-idiomatic conventions:
- Python (PyO3) — async/await, typed kwargs, full .pyi stubs
- TypeScript / Node.js (NAPI-RS) — camelCase, .d.ts types, Promise-based
- Rust — native, zero-cost
- Go (cgo) — FFI wrapper with build tags, `context.Context` support
- Java (Panama FFM) — JDK 25+, `AutoCloseable`, builder pattern
- C# / .NET (P/Invoke) — async/await, `IAsyncEnumerable` streaming, `IDisposable`
- Ruby (Magnus) — RBS type signatures, Enumerator streaming
- Elixir (Rustler NIF) — `{:ok, result}` tuples, OTP-compatible
- PHP (ext-php-rs) — PHP 8.2+, JSON in/out, PIE packages
- WebAssembly (wasm-bindgen) — browser + Node.js, Fetch API
- C / FFI (cbindgen) — `extern "C"` with opaque handles
Authentication¶
- Static API keys (Bearer, x-api-key)
- Azure AD OAuth2 client credentials
- Vertex AI service account JWT
- AWS STS Web Identity (EKS/IRSA)
- AWS SigV4 signing for Bedrock
Provider Transforms¶
- Anthropic: message format, tool use v1, thinking blocks, max_tokens default
- AWS Bedrock: Converse API, EventStream binary framing, cross-region routing
- Vertex AI: Gemini format, embedding `:predict` endpoint
- Google AI: embedding/list_models response transforms
- Cohere: citation handling
- Mistral: API compatibility
- `param_mappings` for config-driven field renaming (8 providers)
Documentation¶
- MkDocs Material site at docs.liter-llm.xberg.io
- 170+ code snippets across 10 languages
- 11 API reference docs with full method coverage
- Usage pages: Chat & Streaming, Embeddings & Rerank, Media, Search & OCR, Files & Batches, Configuration
- TOML configuration reference
- llms.txt (218 lines) with capabilities, examples, provider list
- Skills directory (4,072 lines) for Claude Code integration
- README generation from Jinja templates via `scripts/generate_readme.py`
Testing¶
- 500+ unit and integration tests
- Middleware stack composition tests (cache + budget + hooks + rate limit + cooldown)
- Per-request provider routing tests
- File/batch/response CRUD operation tests
- Concurrency tests (budget atomicity, cache contention, rate limit fairness)
- Redis cache backend integration tests (Docker Compose)
- Live provider tests for 7 providers (OpenAI, Anthropic, Google AI, Vertex AI, Mistral, Azure, Bedrock)
- Smoke test apps for all 10 languages against real APIs
- E2E test generation from JSON fixtures across all languages
- Contract test fixtures for binding API parity
CI/CD¶
- Multi-platform publish pipeline: crates.io, PyPI, npm, RubyGems, Hex.pm, Maven Central, NuGet, Packagist, Go FFI, PHP PIE
- Pre-commit hooks: 43 linters across all languages
- Post-generation formatting in e2e-generator
- Version sync script across 27+ manifests with README regeneration