Error Handling¶
Every liter-llm client and the proxy return the same error taxonomy, defined by the `LiterLlmError` enum in `crates/liter-llm/src/error.rs`. Seventeen variants cover authentication, rate limits, payload problems, transport failures, and internal bugs. Language bindings map each variant to an idiomatic exception type but preserve the original semantics.
This page is the canonical reference. See API Reference for the per-language exception names.
Variants¶
The 17 variants, their typical cause, and whether the Tower middleware treats them as transient:
| Variant | Typical trigger | Transient? |
|---|---|---|
| `Authentication` | Provider rejected the API key or the token is missing. | no |
| `RateLimited` | Provider returned 429. Carries an optional `retry_after` parsed from the header. | yes |
| `BadRequest` | Malformed request, unsupported parameter, or a 4xx the proxy could not classify further. | no |
| `ContextWindowExceeded` | Prompt plus `max_tokens` exceeds the model context window. Subclass of `BadRequest` in most bindings. | no |
| `ContentPolicy` | Provider safety filter rejected the request or response. Subclass of `BadRequest`. | no |
| `NotFound` | Model name is unknown to the provider, or the file/batch/response ID does not exist. | no |
| `ServerError` | Provider returned 500 with an unexpected body. | yes |
| `ServiceUnavailable` | Provider returned 502, 503, or 504, or a health probe marked the upstream unhealthy. | yes |
| `Timeout` | Request exceeded `default_timeout_secs` or the per-model `timeout_secs`. | yes |
| `Network` | Transport-level failure from reqwest (connection reset, DNS, TLS). Only present with the `native-http` feature. | yes |
| `Streaming` | UTF-8 decode, CRC mismatch (AWS EventStream), malformed SSE chunk, or buffer overflow during streaming. | no |
| `EndpointNotSupported` | Provider crate does not implement the requested endpoint (e.g. embeddings on an audio-only provider). | no |
| `InvalidHeader` | A custom header name or value failed HTTP validation. | no |
| `Serialization` | `serde_json` failed to encode the request or decode the response. | no |
| `BudgetExceeded` | A `[budget]` or virtual-key `budget_limit` cap was hit. Returns 402 through the proxy. | no |
| `HookRejected` | A registered hook explicitly rejected the request. | no |
| `InternalError` | Library bug. Should never surface in normal operation. | no |
Transient variants trigger fallbacks and retries. The Fallback & Routing layer calls `LiterLlmError::is_transient()` to decide whether to try the next endpoint or return the error to the caller.
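The rule reduces to a five-variant check. As a reference point, here is a minimal sketch of an equivalent predicate, assuming `error_type()` returns the variant name as a `&str` (the Observability section below documents the value set); the canonical implementation is `LiterLlmError::is_transient()` itself:

```rust
use liter_llm::LiterLlmError;

// Sketch only: reproduces the "Transient?" column of the table above.
// Assumes error_type() returns the variant name as a &str; prefer the
// canonical LiterLlmError::is_transient() in real code.
fn should_retry(err: &LiterLlmError) -> bool {
    matches!(
        err.error_type(),
        "RateLimited" | "ServerError" | "ServiceUnavailable" | "Timeout" | "Network"
    )
}
```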
HTTP status mapping¶
`LiterLlmError::from_status` turns an HTTP status code and response body into the right variant. The mapping, from `error.rs:146`:
| Status | Variant |
|---|---|
| 401, 403 | `Authentication` |
| 429 | `RateLimited` (with `Retry-After` parsed) |
| 400, 422 | `ContextWindowExceeded` / `ContentPolicy` / `BadRequest` (selected by `code` or message heuristics) |
| 404 | `NotFound` |
| 405, 413 | `BadRequest` |
| 408 | `Timeout` |
| 500 | `ServerError` |
| 502, 503, 504 | `ServiceUnavailable` |
| Other 4xx | `BadRequest` |
| Anything else | `ServerError` |
The classification for 400 and 422 prefers the structured `code` field (`context_length_exceeded`, `content_policy_violation`, `content_filter`) and falls back to substring matching on the message for providers that do not populate `code`.
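To make the precedence concrete, here is a hedged sketch of the 400/422 branching. The function name, parameter shapes, and the substring patterns are illustrative, not the crate's actual internals:

```rust
// Illustrative sketch of the 400/422 classification described above.
// `code` models the provider's structured error code; the substring
// patterns are placeholders for the real heuristics in error.rs.
fn classify_bad_request(code: Option<&str>, message: &str) -> &'static str {
    match code {
        // The structured code field wins when the provider populates it.
        Some("context_length_exceeded") => "ContextWindowExceeded",
        Some("content_policy_violation") | Some("content_filter") => "ContentPolicy",
        // Otherwise fall back to substring matching on the message.
        _ if message.contains("context length") => "ContextWindowExceeded",
        _ if message.contains("content policy") => "ContentPolicy",
        _ => "BadRequest",
    }
}
```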
Retry behaviour¶
The built-in HTTP client retries only on transient status codes. From `crates/liter-llm/src/http/retry.rs`:

- Retries only on 429, 500, 502, 503, 504. Everything else fails fast.
- `max_retries` defaults to `3` and is set globally via `[general] max_retries` in the proxy config.
- Backoff is exponential: 1s, 2s, 4s, 8s, capped at 30s.
- Jitter scales each delay to a random value in [0.5x, 1.0x] of the capped base to avoid thundering herds.
- For 429, the `Retry-After` header takes precedence, capped at 60s. Integer seconds are parsed; HTTP-date format is logged and falls back to exponential backoff (the full schedule is combined in the sketch after this list).
- The loop honours the overall request timeout. A retry that would exceed the timeout is not attempted.
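Putting those numbers together, the delay for a given attempt can be sketched as follows. This is a model of the documented schedule, not the code in `retry.rs`; the `retry_after_secs` parameter stands in for a parsed `Retry-After` header:

```rust
use rand::Rng;
use std::time::Duration;

// Sketch of the documented schedule: exponential base (1s, 2s, 4s, 8s, ...)
// capped at 30s, jittered into [0.5x, 1.0x]. A parsed Retry-After from a
// 429 takes precedence, capped at 60s.
fn retry_delay(attempt: u32, retry_after_secs: Option<u64>) -> Duration {
    if let Some(secs) = retry_after_secs {
        return Duration::from_secs(secs.min(60));
    }
    let base = (1u64 << attempt.min(5)).min(30); // 1, 2, 4, 8, 16, 30
    let jittered = rand::thread_rng().gen_range(0.5..=1.0) * base as f64;
    Duration::from_secs_f64(jittered)
}
```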
Retries apply to single-endpoint calls. Cross-endpoint failover between models is handled by the separate Fallback & Routing layer.
Language bindings¶
Each binding exposes the Rust error taxonomy in whatever shape is idiomatic for the host language. Coverage is not uniform: some bindings mint one exception class per variant, others collapse related variants into broader categories. The table below shows how each binding surfaces errors today.
| Binding | Surface | Categories |
|---|---|---|
| Rust | `LiterLlmError` enum with 17 variants. | 1:1 with the canonical list. `is_transient()` and `error_type()` available. |
| Python | Exception hierarchy rooted at `LlmError`. | 16 classes (every variant except `InternalError`, which surfaces as the base `LlmError`). `ContextWindowExceededError` and `ContentPolicyError` inherit from `BadRequestError`. |
| TypeScript | Thrown JavaScript `Error` objects. | Single `Error` type. The message starts with a bracketed category label (`[Authentication]`, `[RateLimited]`, …). Match on the label rather than the class. |
| Go | Sentinel errors plus `*APIError` and `*StreamError` wrapper types. | 8 sentinels: `ErrInvalidRequest`, `ErrAuthentication`, `ErrRateLimit`, `ErrNotFound`, `ErrProviderError`, `ErrStream`, `ErrBudgetExceeded`, `ErrHookRejected`. Use `errors.Is` and `errors.As`. `*APIError` exposes `StatusCode` and `Message`. |
| Java | `LlmException` base plus seven inner subclasses and two standalone subclasses. | `InvalidRequestException`, `AuthenticationException`, `RateLimitException`, `NotFoundException`, `ProviderException`, `StreamException`, `SerializationException`, `BudgetExceededException`, `HookRejectedException`. Every subclass carries a stable `getErrorCode()`. |
| C# | `LlmException` base plus nine sealed subclasses. | Mirrors the Java layout. Numeric `ErrorCode` constants cover the same categories. |
| Ruby | Raises `RuntimeError` with a message. | No typed hierarchy today. Branch on the string message or the underlying HTTP status exposed by the error. |
| Elixir | `{:error, %LiterLlm.Error{kind: atom, code: int, http_status: int}}`. | 10 kinds: `:unknown`, `:invalid_request`, `:authentication`, `:not_found`, `:rate_limit`, `:provider_error`, `:stream_error`, `:serialization`, `:budget_exceeded`, `:hook_rejected`. Pattern match on `kind`. |
| PHP | Throws `\RuntimeException` for generic failures, plus `BudgetExceededException` and `HookRejectedException` for the two dedicated variants. | Two typed exceptions; everything else is a `RuntimeException` with a provider message. |
| WASM | Rejects the returned `Promise` with a plain JavaScript `Error`. | No typed hierarchy. Message is formatted as `HTTP {status}: {message}`. Parse the status to branch on category. |
| C FFI | Returns `NULL` (or `-1` for `int32_t` returns) and stores a thread-local error message. | Read via `literllm_last_error()`. The message is formatted as `<function>: [<Category>] <details>` using the same bracketed labels as the TypeScript binding. |
Per-language exception trees
See the Error Handling section of each language reference for full class inheritance, retry helpers, and runnable examples: Python, TypeScript, Rust, Go, Java, C#, Ruby, Elixir, PHP, WASM, C FFI.
Bindings that collapse variants still map the HTTP status code to the same category the Rust core would return. The branching in your code may look coarser, but the wire-level semantics (which response is retried, which is a hard failure) are identical.
Catching errors¶
Start by catching the base error type and branch on specific variants only where you need different behaviour.
```python
import asyncio
import os

from liter_llm import (
    LlmClient,
    LlmError,
    AuthenticationError,
    RateLimitedError,
    ContextWindowExceededError,
    BudgetExceededError,
)


async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    try:
        response = await client.chat(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.choices[0].message.content)
    except AuthenticationError as e:
        # 401/403 – rotate the key, do not retry.
        print(f"auth failed: {e}")
    except RateLimitedError as e:
        # 429 – transient, retry with backoff or fall back to another model.
        print(f"rate limited: {e}")
    except ContextWindowExceededError as e:
        # Trim the prompt or use a larger context window.
        print(f"prompt too long: {e}")
    except BudgetExceededError as e:
        # Virtual-key or global budget cap hit.
        print(f"budget exceeded: {e}")
    except LlmError as e:
        # Catch-all for the remaining liter-llm errors.
        print(f"llm error: {e}")


asyncio.run(main())
```
```typescript
import { LlmClient } from "liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });

try {
  const response = await client.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(response.choices[0].message.content);
} catch (err) {
  // All liter-llm errors surface as JavaScript Error objects. The message
  // carries a bracketed category label: "[RateLimited] Too many requests".
  if (err instanceof Error) {
    if (err.message.startsWith("[Authentication]")) {
      // 401/403 – rotate the key.
      console.error("auth failed:", err.message);
    } else if (err.message.startsWith("[RateLimited]")) {
      // 429 – transient, retry or fall back.
      console.error("rate limited:", err.message);
    } else if (err.message.startsWith("[BudgetExceeded]")) {
      console.error("budget exceeded:", err.message);
    } else {
      console.error("llm error:", err.message);
    }
  }
}
```
```rust
use liter_llm::{
    ChatCompletionRequest, ClientConfigBuilder, DefaultClient, LiterLlmError, LlmClient, Message,
    UserContent, UserMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ClientConfigBuilder::new(std::env::var("OPENAI_API_KEY")?).build();
    let client = DefaultClient::new(config);

    let request = ChatCompletionRequest {
        model: "openai/gpt-4o".to_owned(),
        messages: vec![Message::User(UserMessage {
            content: UserContent::Text("Hello".into()),
            name: None,
        })],
        ..Default::default()
    };

    match client.chat(request).await {
        Ok(response) => {
            if let Some(text) = response.choices[0].message.content.as_deref() {
                println!("{text}");
            }
        }
        // Transient errors — worth retrying or falling back to another model.
        Err(e) if e.is_transient() => eprintln!("transient failure: {e}"),
        // Terminal errors — branch on specific variants where the response differs.
        Err(LiterLlmError::Authentication { message }) => eprintln!("auth failed: {message}"),
        Err(LiterLlmError::ContextWindowExceeded { message }) => {
            eprintln!("prompt too long: {message}")
        }
        Err(LiterLlmError::BudgetExceeded { message, .. }) => {
            eprintln!("budget exceeded: {message}")
        }
        Err(e) => eprintln!("llm error ({}): {e}", e.error_type()),
    }
    Ok(())
}
```
```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os"

	literllm "github.com/kreuzberg-dev/liter-llm/packages/go"
)

func main() {
	client := literllm.NewClient(
		literllm.WithAPIKey(os.Getenv("OPENAI_API_KEY")),
	)

	_, err := client.Chat(context.Background(), &literllm.ChatCompletionRequest{
		Model:    "openai/gpt-4o",
		Messages: []literllm.Message{literllm.NewTextMessage(literllm.RoleUser, "Hello")},
	})
	if err == nil {
		return
	}

	switch {
	case errors.Is(err, literllm.ErrAuthentication):
		// 401/403 — rotate the key.
		fmt.Println("auth failed:", err)
	case errors.Is(err, literllm.ErrRateLimit):
		// 429 — transient, back off and retry.
		fmt.Println("rate limited:", err)
	case errors.Is(err, literllm.ErrBudgetExceeded):
		fmt.Println("budget exceeded:", err)
	case errors.Is(err, literllm.ErrProviderError):
		// 5xx — transient on the proxy, terminal from the caller's view.
		fmt.Println("provider error:", err)
	default:
		// Inspect the underlying HTTP status when present.
		var apiErr *literllm.APIError
		if errors.As(err, &apiErr) {
			fmt.Printf("HTTP %d: %s\n", apiErr.StatusCode, apiErr.Message)
			return
		}
		fmt.Println("llm error:", err)
	}
}
```
```java
import java.util.List;

import dev.kreuzberg.literllm.LlmClient;
import dev.kreuzberg.literllm.LlmException;
import dev.kreuzberg.literllm.BudgetExceededException;
import dev.kreuzberg.literllm.types.ChatCompletionRequest;
import dev.kreuzberg.literllm.types.Types;

public class ErrorHandling {
    public static void main(String[] args) {
        try (var client = LlmClient.builder().apiKey(System.getenv("OPENAI_API_KEY")).build()) {
            var response = client.chat(new ChatCompletionRequest(
                "openai/gpt-4o",
                List.of(new Types.UserMessage("Hello"))
            ));
            System.out.println(response.choices().get(0).message().content());
        } catch (LlmException.AuthenticationException e) {
            // 401/403 — rotate the key.
            System.err.println("auth failed: " + e.getMessage());
        } catch (LlmException.RateLimitException e) {
            // 429 — transient, retry with backoff.
            System.err.println("rate limited: " + e.getMessage());
        } catch (BudgetExceededException e) {
            System.err.println("budget exceeded: " + e.getMessage());
        } catch (LlmException.ProviderException e) {
            // 5xx — inspect getHttpStatus() to decide next step.
            System.err.printf("provider %d: %s%n", e.getHttpStatus(), e.getMessage());
        } catch (LlmException e) {
            // Catch-all for the remaining liter-llm errors.
            System.err.println("llm error (" + e.getErrorCode() + "): " + e.getMessage());
        } catch (Exception e) {
            System.err.println("unexpected: " + e.getMessage());
        }
    }
}
```
```csharp
using System;
using System.Threading.Tasks;
using LiterLlm;

class Program
{
    static async Task Main()
    {
        await using var client = new LlmClient(
            apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!
        );

        try
        {
            var response = await client.ChatAsync(new ChatCompletionRequest(
                model: "openai/gpt-4o",
                messages: new[] { new UserMessage("Hello") }
            ));
            Console.WriteLine(response.Choices[0].Message.Content);
        }
        catch (AuthenticationException e)
        {
            // 401/403 — rotate the key.
            Console.Error.WriteLine($"auth failed: {e.Message}");
        }
        catch (RateLimitException e)
        {
            // 429 — transient, retry with backoff.
            Console.Error.WriteLine($"rate limited: {e.Message}");
        }
        catch (BudgetExceededException e)
        {
            Console.Error.WriteLine($"budget exceeded: {e.Message}");
        }
        catch (ProviderException e)
        {
            Console.Error.WriteLine($"provider error: {e.Message}");
        }
        catch (LlmException e)
        {
            // Catch-all for the remaining liter-llm errors.
            Console.Error.WriteLine($"llm error ({e.ErrorCode}): {e.Message}");
        }
    }
}
```
```ruby
require 'liter_llm'
require 'json'

client = LiterLlm::LlmClient.new(
  api_key: ENV.fetch('OPENAI_API_KEY')
)

request = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }]
}.to_json

begin
  response = JSON.parse(client.chat(request))
  puts response.dig('choices', 0, 'message', 'content')
rescue RuntimeError => e
  # The Ruby binding raises plain RuntimeError. The message is the Rust
  # error's Display string — branch on its prefix to identify the category.
  case e.message
  when /\Arate limited:/ then warn "rate limited: #{e.message}"
  when /\Aauthentication failed:/ then warn "auth failed: #{e.message}"
  when /\Abudget exceeded:/ then warn "budget exceeded: #{e.message}"
  when /\Acontext window exceeded:/ then warn "prompt too long: #{e.message}"
  when /\Aservice unavailable:/ then warn "provider unavailable: #{e.message}"
  else warn "llm error: #{e.message}"
  end
end
```
```php
<?php

declare(strict_types=1);

use LiterLlm\LlmClient;
use LiterLlm\BudgetExceededException;
use LiterLlm\HookRejectedException;

$client = new LlmClient(apiKey: getenv('OPENAI_API_KEY'));

$request = [
    'model' => 'openai/gpt-4o',
    'messages' => [['role' => 'user', 'content' => 'Hello']],
];

try {
    $response = json_decode($client->chat(json_encode($request)), true);
    echo $response['choices'][0]['message']['content'] . PHP_EOL;
} catch (BudgetExceededException $e) {
    fwrite(STDERR, "budget exceeded: {$e->getMessage()}\n");
} catch (HookRejectedException $e) {
    fwrite(STDERR, "hook rejected: {$e->getMessage()}\n");
} catch (\RuntimeException $e) {
    // All other liter-llm errors surface as plain RuntimeException.
    // Branch on the provider message text.
    $msg = $e->getMessage();
    if (stripos($msg, 'authentication') !== false) {
        fwrite(STDERR, "auth failed: $msg\n");
    } elseif (stripos($msg, 'rate limit') !== false) {
        fwrite(STDERR, "rate limited: $msg\n");
    } else {
        fwrite(STDERR, "llm error: $msg\n");
    }
}
```
```elixir
client =
  LiterLlm.Client.new(
    api_key: System.fetch_env!("OPENAI_API_KEY")
  )

request = %{
  model: "openai/gpt-4o",
  messages: [%{role: "user", content: "Hello"}]
}

case LiterLlm.Client.chat(client, request) do
  {:ok, response} ->
    IO.puts(response["choices"] |> hd() |> get_in(["message", "content"]))

  # 401/403 — rotate the key.
  {:error, %LiterLlm.Error{kind: :authentication, message: message}} ->
    IO.warn("auth failed: #{message}")

  # 429 — transient, back off and retry or fall back.
  {:error, %LiterLlm.Error{kind: :rate_limit, message: message}} ->
    IO.warn("rate limited: #{message}")

  {:error, %LiterLlm.Error{kind: :budget_exceeded, message: message}} ->
    IO.warn("budget exceeded: #{message}")

  # 5xx — inspect http_status when present.
  {:error, %LiterLlm.Error{kind: :provider_error, http_status: status, message: message}} ->
    IO.warn("provider #{status}: #{message}")

  {:error, %LiterLlm.Error{kind: kind, message: message}} ->
    IO.warn("llm error (#{kind}): #{message}")
end
```
```typescript
import init, { LlmClient } from "@kreuzberg/liter-llm-wasm";

await init();
const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });

try {
  const response = await client.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(response.choices[0].message.content);
} catch (err) {
  // The WASM binding rejects with a plain Error whose message is formatted
  // as "HTTP {status}: {message}". Parse the status to branch on category.
  const message = err instanceof Error ? err.message : String(err);
  const match = message.match(/^HTTP (\d+):/);
  const status = match ? Number(match[1]) : null;

  if (status === 429) {
    console.error("rate limited:", message);
  } else if (status === 401 || status === 403) {
    console.error("auth failed:", message);
  } else if (status === 408 || (status !== null && status >= 500)) {
    console.error("transient error, retry with backoff:", message);
  } else if (message.includes("budget exceeded")) {
    console.error("budget exceeded:", message);
  } else {
    console.error("llm error:", message);
  }
}
```
Observability¶
The tracing middleware records an `error.type` span attribute on every failed request, set to the value returned by `LiterLlmError::error_type()`. The set of possible values matches the variant names in the table above. See Observability for the full span schema.
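Application code can key its own logs on the same taxonomy. A minimal sketch with the `tracing` crate; the flat `error_type` field name is this example's choice, while the middleware itself writes the dotted `error.type` attribute:

```rust
use liter_llm::LiterLlmError;

// Sketch: record failures under the same taxonomy the middleware uses
// for its error.type span attribute, so logs and traces join cleanly.
fn log_failure(err: &LiterLlmError) {
    tracing::error!(
        error_type = %err.error_type(),
        transient = err.is_transient(),
        "liter-llm request failed: {err}"
    );
}
```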