Error Handling
Every liter-llm client and the proxy return the same error taxonomy, defined by the LiterLlmError enum in crates/liter-llm/src/error.rs. Seventeen variants cover authentication, rate limits, payload problems, transport failures, and internal bugs. Each language binding surfaces these variants in whatever form is idiomatic for its host language. Some bindings use one exception type per variant and others use broader categories, but all of them preserve the underlying semantics.
This page is the canonical reference. See API Reference for the per-language exception names.
Variants
The 17 variants, their typical cause, and whether the Tower middleware treats them as transient:
| Variant | Typical trigger | Transient? |
|---|---|---|
| Authentication | Provider rejected the API key or the token is missing. | no |
| RateLimited | Provider returned 429. Carries an optional retry_after parsed from the Retry-After header. | yes |
| BadRequest | Malformed request, unsupported parameter, or a 4xx the proxy could not classify further. | no |
| ContextWindowExceeded | Prompt plus max_tokens exceeds the model context window. Subclass of BadRequest in most bindings. | no |
| ContentPolicy | Provider safety filter rejected the request or response. Subclass of BadRequest. | no |
| NotFound | Model name is unknown to the provider, or the file/batch/response ID does not exist. | no |
| ServerError | Provider returned 500 with an unexpected body. | yes |
| ServiceUnavailable | Provider returned 502, 503, or 504, or a health probe marked the upstream unhealthy. | yes |
| Timeout | Request exceeded default_timeout_secs or the per-model timeout_secs. | yes |
| Network | Transport-level failure from reqwest (connection reset, DNS, TLS). Only present with the native-http feature. | yes |
| Streaming | UTF-8 decode failure, CRC mismatch (AWS EventStream), malformed SSE chunk, or buffer overflow during streaming. | no |
| EndpointNotSupported | Provider crate does not implement the requested endpoint (e.g. embeddings on an audio-only provider). | no |
| InvalidHeader | A custom header name or value failed HTTP validation. | no |
| Serialization | serde_json failed to encode the request or decode the response. | no |
| BudgetExceeded | A [budget] or virtual-key budget_limit cap was hit. Returns 402 through the proxy. | no |
| HookRejected | A registered hook explicitly rejected the request. | no |
| InternalError | Library bug. Should never surface in normal operation. | no |
Transient variants trigger fallbacks and retries. The Fallback & Routing layer calls LiterLlmError::is_transient() to decide whether to try the next endpoint or return the error to the caller.
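The Transient? column and the fallback decision described above fit in a few lines of Rust. This is a minimal sketch and not the liter-llm source. ErrorKind is a hypothetical payload-free stand-in for LiterLlmError. call_with_fallback is a hypothetical helper that only approximates what the Fallback & Routing layer does: it moves to the next endpoint on a transient error and returns anything else immediately.

```rust
/// Hypothetical, payload-free stand-in for the 17 LiterLlmError variants.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ErrorKind {
    Authentication,
    RateLimited,
    BadRequest,
    ContextWindowExceeded,
    ContentPolicy,
    NotFound,
    ServerError,
    ServiceUnavailable,
    Timeout,
    Network,
    Streaming,
    EndpointNotSupported,
    InvalidHeader,
    Serialization,
    BudgetExceeded,
    HookRejected,
    InternalError,
}

impl ErrorKind {
    /// Mirrors the "Transient?" column of the variants table.
    fn is_transient(self) -> bool {
        matches!(
            self,
            ErrorKind::RateLimited
                | ErrorKind::ServerError
                | ErrorKind::ServiceUnavailable
                | ErrorKind::Timeout
                | ErrorKind::Network
        )
    }
}

/// Hypothetical fallback helper: tries each endpoint in order. A transient
/// error moves on to the next endpoint; any other error is returned at once.
fn call_with_fallback<T>(
    endpoints: &[&str],
    mut call: impl FnMut(&str) -> Result<T, ErrorKind>,
) -> Result<T, ErrorKind> {
    // This initial value is only returned if `endpoints` is empty.
    let mut last_err = ErrorKind::InternalError;
    for &endpoint in endpoints {
        match call(endpoint) {
            Ok(value) => return Ok(value),
            Err(e) if e.is_transient() => last_err = e,
            Err(e) => return Err(e),
        }
    }
    Err(last_err)
}

fn main() {
    // The primary model is rate limited, so the second endpoint answers.
    let result = call_with_fallback(&["openai/gpt-4o", "openai/gpt-4o-mini"], |model| {
        if model == "openai/gpt-4o" {
            Err(ErrorKind::RateLimited)
        } else {
            Ok(format!("answered by {model}"))
        }
    });
    println!("{result:?}"); // Ok("answered by openai/gpt-4o-mini")
}
```

A non-transient error such as Authentication stops the loop on the first endpoint. Trying another model would not help there, and it would only hide a configuration problem.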
HTTP status mapping
LiterLlmError::from_status turns an HTTP status code and response body into the right variant. The mapping, from error.rs:146:
| Status | Variant |
|---|---|
| 401, 403 | Authentication |
| 429 | RateLimited (with Retry-After parsed) |
| 400, 422 | ContextWindowExceeded / ContentPolicy / BadRequest (selected by code or message heuristics) |
| 404 | NotFound |
| 405, 413 | BadRequest |
| 408 | Timeout |
| 500 | ServerError |
| 502, 503, 504 | ServiceUnavailable |
| Other 4xx | BadRequest |
| Anything else | ServerError |
The classification for 400 and 422 prefers the structured code field (context_length_exceeded, content_policy_violation, content_filter) and falls back to substring matching on the message for providers that do not populate code.
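The status table and the 400/422 tie-breaking rule can be read as a single match expression. This is a minimal sketch under stated assumptions. Kind is a hypothetical payload-free subset of the variants, so the real RateLimited would also carry retry_after. The function takes the provider's code and message as separate arguments rather than a raw body. The substring phrases in classify_bad_request are illustrative placeholders and are not the heuristics error.rs actually uses.

```rust
/// Hypothetical subset of the variants, without payloads.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind {
    Authentication,
    RateLimited,
    BadRequest,
    ContextWindowExceeded,
    ContentPolicy,
    NotFound,
    Timeout,
    ServerError,
    ServiceUnavailable,
}

/// 400/422: trust the structured `code` first, then fall back to message substrings.
fn classify_bad_request(code: Option<&str>, message: &str) -> Kind {
    match code {
        Some("context_length_exceeded") => return Kind::ContextWindowExceeded,
        Some("content_policy_violation" | "content_filter") => return Kind::ContentPolicy,
        _ => {}
    }
    // Illustrative phrases only; the real heuristics may match different text.
    let lower = message.to_lowercase();
    if lower.contains("context length") || lower.contains("context window") {
        Kind::ContextWindowExceeded
    } else if lower.contains("content policy") || lower.contains("content filter") {
        Kind::ContentPolicy
    } else {
        Kind::BadRequest
    }
}

/// The status-code table as one match; earlier arms win over the 4xx range.
fn from_status(status: u16, code: Option<&str>, message: &str) -> Kind {
    match status {
        401 | 403 => Kind::Authentication,
        429 => Kind::RateLimited,
        400 | 422 => classify_bad_request(code, message),
        404 => Kind::NotFound,
        405 | 413 => Kind::BadRequest,
        408 => Kind::Timeout,
        500 => Kind::ServerError,
        502 | 503 | 504 => Kind::ServiceUnavailable,
        400..=499 => Kind::BadRequest, // any other 4xx
        _ => Kind::ServerError,        // anything else
    }
}

fn main() {
    println!("{:?}", from_status(400, Some("context_length_exceeded"), "")); // ContextWindowExceeded
    println!("{:?}", from_status(422, None, "Request blocked by content filter")); // ContentPolicy
    println!("{:?}", from_status(418, None, "I'm a teapot")); // BadRequest
}
```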
Retry behaviour
The built-in HTTP client retries only on transient status codes. From crates/liter-llm/src/http/retry.rs:
- Retries only on 429, 500, 502, 503, and 504. Everything else fails fast.
- max_retries defaults to 3 and is set globally via the max_retries key under [general] in the proxy config.
- Backoff is exponential: 1s, 2s, 4s, 8s, capped at 30s.
- Jitter scales each delay to a random value in [0.5x, 1.0x] of the capped base to avoid thundering herds.
- For 429, the Retry-After header takes precedence, capped at 60s. Integer seconds are parsed. An HTTP-date value is logged, and the client falls back to exponential backoff.
- The loop honours the overall request timeout. A retry that would exceed the timeout is not attempted.
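The rules above can be condensed into a delay function and a retry predicate. This is a minimal sketch, not the code in retry.rs. retry_delay and should_retry are hypothetical names. The jitter argument in [0.0, 1.0] stands in for the random draw so that the sketch stays deterministic, and a real client would sample it from an RNG on each attempt.

```rust
use std::time::Duration;

const BASE_DELAY: Duration = Duration::from_secs(1);
const MAX_BACKOFF: Duration = Duration::from_secs(30);
const MAX_RETRY_AFTER: Duration = Duration::from_secs(60);

/// Only these statuses are retried; everything else fails fast.
fn is_retryable(status: u16) -> bool {
    matches!(status, 429 | 500 | 502 | 503 | 504)
}

/// Delay before retry number `attempt` (0-based). `jitter` in [0.0, 1.0]
/// is a stand-in for a random draw.
fn retry_delay(attempt: u32, status: u16, retry_after: Option<&str>, jitter: f64) -> Duration {
    if status == 429 {
        // Integer-seconds Retry-After wins, capped at 60s. Anything else
        // (e.g. an HTTP-date) falls through to exponential backoff.
        if let Some(secs) = retry_after.and_then(|v| v.trim().parse::<u64>().ok()) {
            return Duration::from_secs(secs).min(MAX_RETRY_AFTER);
        }
    }
    // 1s, 2s, 4s, 8s, ... capped at 30s.
    let capped = BASE_DELAY
        .saturating_mul(2u32.saturating_pow(attempt))
        .min(MAX_BACKOFF);
    // Scale into [0.5x, 1.0x] of the capped base.
    capped.mul_f64(0.5 + 0.5 * jitter.clamp(0.0, 1.0))
}

/// Skip a retry that would push the request past its overall timeout.
fn should_retry(
    attempt: u32,
    max_retries: u32,
    status: u16,
    elapsed: Duration,
    delay: Duration,
    timeout: Duration,
) -> bool {
    attempt < max_retries && is_retryable(status) && elapsed + delay < timeout
}

fn main() {
    for attempt in 0..3 {
        println!("attempt {attempt}: up to {:?}", retry_delay(attempt, 503, None, 1.0));
    }
    let delay = retry_delay(0, 503, None, 0.5);
    let go = should_retry(0, 3, 503, Duration::from_secs(2), delay, Duration::from_secs(60));
    println!("retry after {delay:?}: {go}");
}
```

Keeping the jitter as an explicit argument also makes the cap and Retry-After precedence easy to unit-test without mocking a random source.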
Retries apply to single-endpoint calls. Cross-endpoint failover between models is handled by the separate Fallback & Routing layer.
Language bindings
Each binding exposes the Rust error taxonomy in whatever shape is idiomatic for the host language. Coverage is not uniform: some bindings mint one exception class per variant, others collapse related variants into broader categories. The table below shows how each binding surfaces errors today.
| Binding | Surface | Categories |
|---|---|---|
| Rust | LiterLlmError enum with 17 variants. | 1:1 with the canonical list. is_transient() and error_type() available. |
| Python | Exception hierarchy rooted at LlmError. | 16 classes (every variant except InternalError, which surfaces as the base LlmError). ContextWindowExceededError and ContentPolicyError inherit from BadRequestError. |
| TypeScript | Thrown JavaScript Error objects. | Single Error type. The message starts with a bracketed category label ([Authentication], [RateLimited], …). Match on the label rather than the class. |
| Go | Sentinel errors plus *APIError and *StreamError wrapper types. | 8 sentinels: ErrInvalidRequest, ErrAuthentication, ErrRateLimit, ErrNotFound, ErrProviderError, ErrStream, ErrBudgetExceeded, ErrHookRejected. Use errors.Is and errors.As. *APIError exposes StatusCode and Message. |
| Java | LlmException base plus seven inner subclasses and two standalone subclasses. | InvalidRequestException, AuthenticationException, RateLimitException, NotFoundException, ProviderException, StreamException, SerializationException, BudgetExceededException, HookRejectedException. Every subclass carries a stable getErrorCode(). |
| C# | LlmException base plus nine sealed subclasses. | Mirrors the Java layout. Numeric ErrorCode constants cover the same categories. |
| Ruby | Raises RuntimeError with a message. | No typed hierarchy today. Branch on the string message or the underlying HTTP status exposed by the error. |
| Elixir | {:error, %LiterLlm.Error{kind: atom, code: int, http_status: int}}. | 10 kinds: :unknown, :invalid_request, :authentication, :not_found, :rate_limit, :provider_error, :stream_error, :serialization, :budget_exceeded, :hook_rejected. Pattern match on kind. |
| PHP | Throws \RuntimeException for generic failures, plus BudgetExceededException and HookRejectedException for the two dedicated variants. | Two typed exceptions; everything else is a RuntimeException with a provider message. |
| WASM | Rejects the returned Promise with a plain JavaScript Error. | No typed hierarchy. Message is formatted as HTTP {status}: {message}. Parse the status to branch on category. |
| C FFI | Returns NULL (or -1 for int32_t returns) and stores a thread-local error message. | Read via literllm_last_error(). The message is formatted as <function>: [<Category>] <details> using the same bracketed labels as the TypeScript binding. |
Per-language exception trees: see the Error Handling section of each language reference for the full class inheritance, retry helpers, and runnable examples (Python, TypeScript, Rust, Go, Java, C#, Ruby, Elixir, PHP, WASM, C FFI).
Bindings that collapse variants still map the HTTP status code to the same category the Rust core would return. The branching in your code may look coarser, but the wire-level semantics (which response is retried, which is a hard failure) are identical.
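For bindings that expose only a message string, a small parser can recover the category. This is a minimal sketch in Rust, and the same logic ports directly to TypeScript or C. It assumes the message formats are exactly as described in the table: a bracketed [Category] label for TypeScript and C FFI, and HTTP {status}: {message} for WASM. category_label and wasm_status are hypothetical helpers and are not part of any liter-llm binding. literllm_chat in the example string is a hypothetical function name.

```rust
/// Extracts the first bracketed label, e.g. "RateLimited" from
/// "literllm_chat: [RateLimited] slow down" (C FFI style) or
/// "[Authentication] invalid api key" (TypeScript style).
fn category_label(message: &str) -> Option<&str> {
    let start = message.find('[')? + 1;
    let len = message[start..].find(']')?;
    let label = &message[start..start + len];
    if label.is_empty() {
        None
    } else {
        Some(label)
    }
}

/// Parses the status from a WASM-style "HTTP {status}: {message}" string.
fn wasm_status(message: &str) -> Option<u16> {
    let rest = message.strip_prefix("HTTP ")?;
    let (status, _) = rest.split_once(':')?;
    status.trim().parse().ok()
}

fn main() {
    let ffi = "literllm_chat: [RateLimited] slow down";
    let wasm = "HTTP 503: upstream unavailable";
    println!("{:?}", category_label(ffi)); // Some("RateLimited")
    println!("{:?}", wasm_status(wasm)); // Some(503)
}
```

Once you have the status code, you can map it back to a category with the HTTP status table. Bindings that collapse variants follow that same table.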
Catching errors
Start by catching the base error type and branch on specific variants only where you need different behaviour.
```python
import asyncio
import os

from liter_llm import (
    AuthenticationError,
    BudgetExceededError,
    ContextWindowExceededError,
    LiterLlmError,
    RateLimitedError,
    create_client,
)
from liter_llm._internal_bindings import ChatCompletionRequest


async def main() -> None:
    client = create_client(api_key=os.environ["OPENAI_API_KEY"])
    request = ChatCompletionRequest.from_json(
        '{"model":"openai/gpt-4o","messages":[{"role":"user","content":"Hello"}]}'
    )
    try:
        response = await client.chat(request)
        print(response.choices[0].message.content)
    except AuthenticationError as e:
        # 401/403 -- rotate the key, do not retry.
        print(f"auth failed: {e}")
    except RateLimitedError as e:
        # 429 -- transient, retry with backoff or fall back to another model.
        print(f"rate limited: {e}")
    except ContextWindowExceededError as e:
        # Trim the prompt or use a larger context window.
        print(f"prompt too long: {e}")
    except BudgetExceededError as e:
        # Virtual-key or global budget cap hit.
        print(f"budget exceeded: {e}")
    except LiterLlmError as e:
        # Catch-all for the remaining liter-llm errors.
        print(f"llm error: {e}")


asyncio.run(main())
```
```typescript
import { createClient } from "@xberg-io/liter-llm";

const client = createClient(process.env.OPENAI_API_KEY!);

try {
  const response = await client.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(response.choices[0].message.content);
} catch (err) {
  // Errors surface as plain JS Error objects -- the message is the Rust
  // error's Display form (e.g. "Authentication failed: invalid api key").
  // Match by substring or rely on the upstream HTTP status text.
  if (err instanceof Error) {
    const msg = err.message.toLowerCase();
    if (msg.includes("authentication")) {
      // 401/403 -- rotate the key.
      console.error("auth failed:", err.message);
    } else if (msg.includes("rate") || msg.includes("429")) {
      // 429 -- transient, retry or fall back.
      console.error("rate limited:", err.message);
    } else if (msg.includes("budget")) {
      console.error("budget exceeded:", err.message);
    } else {
      console.error("llm error:", err.message);
    }
  }
}
```
```rust
use liter_llm::{
    ChatCompletionRequest, ClientConfigBuilder, DefaultClient, LiterLlmError, LlmClient, Message,
    UserContent, UserMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ClientConfigBuilder::new(std::env::var("OPENAI_API_KEY")?).build();
    let client = DefaultClient::new(config, None)?;

    let request = ChatCompletionRequest {
        model: "openai/gpt-4o".to_owned(),
        messages: vec![Message::User(UserMessage {
            content: UserContent::Text("Hello".into()),
            name: None,
        })],
        ..Default::default()
    };

    match client.chat(request).await {
        Ok(response) => {
            if let Some(text) = response.choices[0].message.content.as_deref() {
                println!("{text}");
            }
        }
        // Transient errors -- worth retrying or falling back to another model.
        Err(e) if e.is_transient() => eprintln!("transient failure: {e}"),
        // Terminal errors -- branch on specific variants where the response differs.
        Err(LiterLlmError::Authentication { message }) => eprintln!("auth failed: {message}"),
        Err(LiterLlmError::ContextWindowExceeded { message }) => {
            eprintln!("prompt too long: {message}")
        }
        Err(LiterLlmError::BudgetExceeded { message, .. }) => {
            eprintln!("budget exceeded: {message}")
        }
        Err(e) => eprintln!("llm error ({}): {e}", e.error_type()),
    }

    Ok(())
}
```
```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"strings"

	llm "github.com/xberg-io/liter-llm/packages/go"
)

func main() {
	client, err := llm.CreateClient(os.Getenv("OPENAI_API_KEY"), nil, nil, nil, nil)
	if err != nil {
		panic(err)
	}

	var req llm.ChatCompletionRequest
	if err := json.Unmarshal([]byte(`{
		"model": "openai/gpt-4o-mini",
		"messages": [{"role": "user", "content": "Hello"}]
	}`), &req); err != nil {
		panic(err)
	}

	_, err = client.Chat(req)
	if err == nil {
		return
	}

	// Errors from the Go binding are plain `error` values formatted as
	// "[<code>] <message>". Lower-case the text and match on it to identify
	// the category until a structured error type is exposed.
	msg := strings.ToLower(err.Error())
	switch {
	case strings.Contains(msg, "authentication"):
		fmt.Println("auth failed:", err)
	case strings.Contains(msg, "rate limit"):
		fmt.Println("rate limited:", err)
	case strings.Contains(msg, "context window"):
		fmt.Println("prompt too long:", err)
	case strings.Contains(msg, "service unavailable"):
		fmt.Println("provider unavailable:", err)
	default:
		fmt.Println("llm error:", err)
	}
}
```
```java
import io.xberg.literllm.*;
import java.util.List;

public class ErrorHandling {
    public static void main(String[] args) {
        try (var client = LiterLlm.createClient(System.getenv("OPENAI_API_KEY"))) {
            var response = client.chat(ChatCompletionRequest.builder()
                    .withModel("openai/gpt-4o")
                    .withMessages(List.of(
                            new Message.User(new UserMessage(UserContent.of("Hello"), null))
                    ))
                    .build());
            System.out.println(response.choices().get(0).message().content());
        } catch (AuthenticationException e) {
            // 401/403 -- rotate the key.
            System.err.println("auth failed: " + e.getMessage());
        } catch (RateLimitedException e) {
            // 429 -- transient, retry with backoff.
            System.err.println("rate limited: " + e.getMessage());
        } catch (BudgetExceededException e) {
            System.err.println("budget exceeded: " + e.getMessage());
        } catch (ServerErrorException | ServiceUnavailableException e) {
            // 5xx -- usually transient.
            System.err.println("server error: " + e.getMessage());
        } catch (EndpointNotSupportedException e) {
            System.err.println("endpoint not supported by provider: " + e.getMessage());
        } catch (LiterLlmErrorException e) {
            // Catch-all for typed liter-llm errors.
            System.err.println("llm error: " + e.getMessage());
        } catch (LiterLlmRsException e) {
            // FFI-level error (carries a numeric code).
            System.err.println("ffi error (" + e.getCode() + "): " + e.getMessage());
        } catch (Exception e) {
            System.err.println("unexpected: " + e.getMessage());
        }
    }
}
```
```csharp
using LiterLlm;

using var client = LiterLlmLib.CreateClient(
    apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!,
    baseUrl: null, timeoutSecs: null, maxRetries: null, modelHint: null);

try
{
    var response = await client.ChatAsync(new ChatCompletionRequest
    {
        Model = "openai/gpt-4o",
        Messages = [new Message.User(new UserMessage { Content = UserContent.Of("Hello") })]
    });
    Console.WriteLine(response.Choices[0].Message.Content);
}
catch (AuthenticationException e)
{
    // 401/403 -- rotate the key.
    Console.Error.WriteLine($"auth failed: {e.Message}");
}
catch (RateLimitedException e)
{
    // 429 -- transient, retry with backoff.
    Console.Error.WriteLine($"rate limited: {e.Message}");
}
catch (BudgetExceededException e)
{
    Console.Error.WriteLine($"budget exceeded: {e.Message}");
}
catch (ServerErrorException e)
{
    // 5xx -- usually transient.
    Console.Error.WriteLine($"server error: {e.Message}");
}
catch (EndpointNotSupportedException e)
{
    Console.Error.WriteLine($"endpoint not supported by provider: {e.Message}");
}
catch (LiterLlmErrorException e)
{
    // Catch-all for typed liter-llm errors.
    Console.Error.WriteLine($"llm error: {e.Message}");
}
catch (LiterLlmException e)
{
    // FFI-level error (carries a numeric code).
    Console.Error.WriteLine($"ffi error ({e.Code}): {e.Message}");
}
```
```ruby
# frozen_string_literal: true

require 'liter_llm'

client = LiterLlm.create_client(ENV.fetch('OPENAI_API_KEY'))

begin
  result = client.chat_async(
    LiterLlm::ChatCompletionRequest.new(
      model: 'openai/gpt-4o-mini',
      messages: [{ 'role' => 'user', 'content' => 'Hello' }]
    )
  )
  puts result.choices[0].message.content
rescue RuntimeError => e
  # The Ruby binding raises plain RuntimeError. The message is the Rust
  # error's Display string; branch on its prefix to identify the category.
  case e.message
  when /\Arate limited:/ then warn "rate limited: #{e.message}"
  when /\Aauthentication failed:/ then warn "auth failed: #{e.message}"
  when /\Acontext window exceeded:/ then warn "prompt too long: #{e.message}"
  when /\Aservice unavailable:/ then warn "provider unavailable: #{e.message}"
  else warn "llm error: #{e.message}"
  end
end
```
```php
<?php

declare(strict_types=1);

use Liter\Llm\LiterLlm;
use Liter\Llm\ChatCompletionRequest;
use Liter\Llm\LiterLlmException;

$client = LiterLlm::createClient(getenv('OPENAI_API_KEY') ?: '');

// JSON_THROW_ON_ERROR makes json_encode return a string (never false),
// which matters under strict_types.
$request = ChatCompletionRequest::from_json(json_encode([
    'model' => 'openai/gpt-4o-mini',
    'messages' => [['role' => 'user', 'content' => 'Hello']],
], JSON_THROW_ON_ERROR));

try {
    $result = $client->chat($request);
    echo $result->choices[0]->message->content . PHP_EOL;
} catch (LiterLlmException $e) {
    // All liter-llm errors surface as a single LiterLlmException type.
    // The exception message is the Rust error's Display string; branch on it
    // to identify the category.
    $msg = $e->getMessage();
    if (stripos($msg, 'authentication') !== false) {
        fwrite(STDERR, "auth failed: $msg\n");
    } elseif (stripos($msg, 'rate limit') !== false) {
        fwrite(STDERR, "rate limited: $msg\n");
    } elseif (stripos($msg, 'context window') !== false) {
        fwrite(STDERR, "prompt too long: $msg\n");
    } else {
        fwrite(STDERR, "llm error: $msg\n");
    }
}
```
```elixir
{:ok, client} = LiterLlm.create_client(System.get_env("OPENAI_API_KEY"))

request =
  Jason.encode!(%{
    model: "openai/gpt-4o-mini",
    messages: [%{role: "user", content: "Hello"}]
  })

# Errors come back as `{:error, String.t()}`: the NIF returns the Rust
# error's Display string verbatim. Match on the prefix to identify the
# category.
case LiterLlm.defaultclient_chat_async(client, request) do
  {:ok, result} ->
    IO.puts(Enum.at(result.choices, 0).message.content)

  {:error, "authentication failed:" <> _ = reason} ->
    IO.warn("auth failed: #{reason}")

  {:error, "rate limited:" <> _ = reason} ->
    IO.warn("rate limited: #{reason}")

  {:error, "context window exceeded:" <> _ = reason} ->
    IO.warn("prompt too long: #{reason}")

  {:error, "service unavailable:" <> _ = reason} ->
    IO.warn("provider unavailable: #{reason}")

  {:error, reason} ->
    IO.warn("llm error: #{reason}")
end
```
```typescript
import init, { createClient, WasmChatCompletionRequest } from "@xberg-io/liter-llm-wasm";

await init();

const client = createClient(process.env.OPENAI_API_KEY!);

const request = WasmChatCompletionRequest.default();
request.model = "openai/gpt-4o";
request.messages = [{ role: "user", content: "Hello" }];

try {
  const response = await client.chat(request);
  console.log(response.choices[0].message.content);
} catch (err) {
  // The WASM binding rejects with a JsValue built from the Rust error's
  // Display impl -- a plain string message. Match on substrings.
  const message = err instanceof Error ? err.message : String(err);
  const lower = message.toLowerCase();
  if (lower.includes("authentication")) {
    console.error("auth failed:", message);
  } else if (lower.includes("rate") || lower.includes("429")) {
    console.error("rate limited:", message);
  } else if (lower.includes("budget")) {
    console.error("budget exceeded:", message);
  } else {
    console.error("llm error:", message);
  }
}
```
Observability
The tracing middleware records an error.type span attribute on every failed request, set to the value returned by LiterLlmError::error_type(). The set of possible values matches the variant names in the table above. See Observability for the full span schema.