Error Handling

Every liter-llm client and the proxy return the same error taxonomy, defined by the LiterLlmError enum in crates/liter-llm/src/error.rs. Seventeen variants cover authentication, rate limits, payload problems, transport failures, and internal bugs. Language bindings map each variant to an idiomatic exception type but preserve the original semantics.

This page is the canonical reference. See API Reference for the per-language exception names.

Variants

The 17 variants, the typical trigger for each, and whether the Tower middleware treats it as transient:

Variant	Typical trigger	Transient?
`Authentication`	Provider rejected the API key or the token is missing.	no
`RateLimited`	Provider returned 429. Carries an optional `retry_after` parsed from the header.	yes
`BadRequest`	Malformed request, unsupported parameter, or a 4xx the proxy could not classify further.	no
`ContextWindowExceeded`	Prompt plus `max_tokens` exceeds the model context window. Subclass of `BadRequest` in most bindings.	no
`ContentPolicy`	Provider safety filter rejected the request or response. Subclass of `BadRequest`.	no
`NotFound`	Model name is unknown to the provider, or the file/batch/response ID does not exist.	no
`ServerError`	Provider returned 500 with an unexpected body.	yes
`ServiceUnavailable`	Provider returned 502, 503, or 504, or a health probe marked the upstream unhealthy.	yes
`Timeout`	Request exceeded `default_timeout_secs` or the per-model `timeout_secs`.	yes
`Network`	Transport-level failure from `reqwest` (connection reset, DNS, TLS). Only present with the `native-http` feature.	yes
`Streaming`	UTF-8 decode, CRC mismatch (AWS EventStream), malformed SSE chunk, or buffer overflow during streaming.	no
`EndpointNotSupported`	Provider crate does not implement the requested endpoint (e.g. embeddings on an audio-only provider).	no
`InvalidHeader`	A custom header name or value failed HTTP validation.	no
`Serialization`	`serde_json` failed to encode the request or decode the response.	no
`BudgetExceeded`	A `[budget]` or virtual-key `budget_limit` cap was hit. Returns 402 through the proxy.	no
`HookRejected`	A registered hook explicitly rejected the request.	no
`InternalError`	Library bug. Should never surface in normal operation.	no

Transient variants trigger fallbacks and retries. The Fallback & Routing layer calls LiterLlmError::is_transient() to decide whether to try the next endpoint or return the error to the caller.
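As a rough illustration of that decision, the sketch below mirrors the "Transient?" column of the table in Python. It is not the library's code: the real check is `LiterLlmError::is_transient()` in Rust, and `next_action` with its return strings is a hypothetical helper invented for this example.

```python
# Variant names and transient flags copied from the Variants table.
# Illustrative only: the real check is LiterLlmError::is_transient() in Rust.
TRANSIENT_VARIANTS = frozenset({
    "RateLimited",
    "ServerError",
    "ServiceUnavailable",
    "Timeout",
    "Network",
})


def is_transient(variant: str) -> bool:
    """Return True when the table marks the variant as transient."""
    return variant in TRANSIENT_VARIANTS


def next_action(variant: str, endpoints_left: int) -> str:
    """Hypothetical routing decision: try another endpoint or give up."""
    if is_transient(variant) and endpoints_left > 0:
        return "try_next_endpoint"
    return "return_error"


if __name__ == "__main__":
    for name in ("RateLimited", "Authentication", "Timeout"):
        print(name, next_action(name, endpoints_left=1))
```

Non-transient variants such as `Authentication` or `Streaming` go straight back to the caller in this model, regardless of how many endpoints remain.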

HTTP status mapping

LiterLlmError::from_status turns an HTTP status code and response body into the right variant. The mapping, from error.rs:146:

Status	Variant
`401`, `403`	`Authentication`
`429`	`RateLimited` (with `Retry-After` parsed)
`400`, `422`	`ContextWindowExceeded` / `ContentPolicy` / `BadRequest` (selected by `code` or message heuristics)
`404`	`NotFound`
`405`, `413`	`BadRequest`
`408`	`Timeout`
`500`	`ServerError`
`502`, `503`, `504`	`ServiceUnavailable`
Other `4xx`	`BadRequest`
Anything else	`ServerError`

The classification for 400 and 422 prefers the structured code field (context_length_exceeded, content_policy_violation, content_filter) and falls back to substring matching on the message for providers that do not populate code.
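The mapping can be modelled in a few lines of Python. This is a sketch of the table and the 400/422 rule, not a port of `from_status`. The structured codes come from the paragraph above, but the substring hints (`CONTEXT_HINTS`, `POLICY_HINTS`) are placeholders, since this page does not list the real heuristics. The sketch also assumes the message fallback runs whenever `code` is missing or unrecognised.

```python
from typing import Optional

# Structured codes named in the text above.
CONTEXT_CODES = {"context_length_exceeded"}
POLICY_CODES = {"content_policy_violation", "content_filter"}

# ASSUMPTION: placeholder fragments; the real heuristics are not listed here.
CONTEXT_HINTS = ("context length", "context window")
POLICY_HINTS = ("content policy", "content filter")


def classify_bad_request(code: Optional[str], message: str) -> str:
    """Pick the variant for a 400/422: structured code first, then message text."""
    if code in CONTEXT_CODES:
        return "ContextWindowExceeded"
    if code in POLICY_CODES:
        return "ContentPolicy"
    lowered = message.lower()
    if any(hint in lowered for hint in CONTEXT_HINTS):
        return "ContextWindowExceeded"
    if any(hint in lowered for hint in POLICY_HINTS):
        return "ContentPolicy"
    return "BadRequest"


def classify_status(status: int, code: Optional[str] = None, message: str = "") -> str:
    """Map an HTTP status to a variant name, following the table row by row."""
    if status in (401, 403):
        return "Authentication"
    if status == 429:
        return "RateLimited"
    if status in (400, 422):
        return classify_bad_request(code, message)
    if status == 404:
        return "NotFound"
    if status == 408:
        return "Timeout"
    if status == 500:
        return "ServerError"
    if status in (502, 503, 504):
        return "ServiceUnavailable"
    if 400 <= status < 500:
        return "BadRequest"  # covers 405, 413, and any other 4xx
    return "ServerError"
```

Checking the structured code before the message keeps the result stable when a provider rewords its error text.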

Retry behaviour

The built-in HTTP client retries only on transient status codes. From crates/liter-llm/src/http/retry.rs:

Retries only on 429, 500, 502, 503, 504. Everything else fails fast.
max_retries defaults to 3 and is set globally via [general] max_retries in the proxy config.
Backoff is exponential: 1s, 2s, 4s, 8s, capped at 30s.
Jitter scales each delay to a random value in [0.5x, 1.0x] of the capped base to avoid thundering herds.
For 429, the Retry-After header takes precedence, capped at 60s. Only integer seconds are parsed; an HTTP-date value is logged and the retry falls back to exponential backoff.
The loop honours the overall request timeout. A retry that would exceed the timeout is not attempted.

Retries apply to single-endpoint calls. Cross-endpoint failover between models is handled by the separate Fallback & Routing layer.
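The retry rules above can be sketched as a delay calculation. The Python below is a model of the documented behaviour, not the contents of `retry.rs`. The function names, the 0-based `attempt` counter, and the way the remaining timeout budget is passed in are all assumptions made for illustration.

```python
import random
from typing import Optional

BASE_DELAY_SECS = 1.0
MAX_BACKOFF_SECS = 30.0
MAX_RETRY_AFTER_SECS = 60.0
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def backoff_delay(attempt: int, rng: random.Random) -> float:
    """Exponential base (1s, 2s, 4s, ...) capped at 30s, scaled by jitter in [0.5, 1.0]."""
    base = min(BASE_DELAY_SECS * (2 ** attempt), MAX_BACKOFF_SECS)
    return base * rng.uniform(0.5, 1.0)


def parse_retry_after(value: Optional[str]) -> Optional[float]:
    """Accept integer seconds only, capped at 60s; an HTTP-date yields None."""
    if value is None:
        return None
    text = value.strip()
    if not (text.isascii() and text.isdigit()):
        return None
    return min(float(text), MAX_RETRY_AFTER_SECS)


def next_delay(
    status: int,
    attempt: int,
    retry_after: Optional[str],
    remaining_secs: float,
    rng: random.Random,
    max_retries: int = 3,
) -> Optional[float]:
    """Return the sleep before the next attempt, or None to stop retrying."""
    if status not in RETRYABLE_STATUSES or attempt >= max_retries:
        return None
    delay = parse_retry_after(retry_after) if status == 429 else None
    if delay is None:
        delay = backoff_delay(attempt, rng)
    # ASSUMPTION: a retry is skipped when its delay alone would use up the budget.
    if delay >= remaining_secs:
        return None
    return delay
```

Passing a seeded `random.Random` keeps the jitter reproducible in tests; production code would normally use an unseeded generator.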

Language bindings

Each binding exposes the Rust error taxonomy in whatever shape is idiomatic for the host language. Coverage is not uniform: some bindings mint one exception class per variant, others collapse related variants into broader categories. The table below shows how each binding surfaces errors today.

Binding	Surface	Categories
Rust	`LiterLlmError` enum with 17 variants.	1:1 with the canonical list. `is_transient()` and `error_type()` available.
Python	Exception hierarchy rooted at `LlmError`.	16 classes (every variant except `InternalError`, which surfaces as the base `LlmError`). `ContextWindowExceededError` and `ContentPolicyError` inherit from `BadRequestError`.
TypeScript	Thrown JavaScript `Error` objects.	Single `Error` type. The message starts with a bracketed category label (`[Authentication]`, `[RateLimited]`, …). Match on the label rather than the class.
Go	Sentinel errors plus `APIError` and `StreamError` wrapper types.	8 sentinels: `ErrInvalidRequest`, `ErrAuthentication`, `ErrRateLimit`, `ErrNotFound`, `ErrProviderError`, `ErrStream`, `ErrBudgetExceeded`, `ErrHookRejected`. Use `errors.Is` and `errors.As`. `*APIError` exposes `StatusCode` and `Message`.
Java	`LlmException` base plus seven inner subclasses and two standalone subclasses.	`InvalidRequestException`, `AuthenticationException`, `RateLimitException`, `NotFoundException`, `ProviderException`, `StreamException`, `SerializationException`, `BudgetExceededException`, `HookRejectedException`. Every subclass carries a stable `getErrorCode()`.
C#	`LlmException` base plus nine sealed subclasses.	Mirrors the Java layout. Numeric `ErrorCode` constants cover the same categories.
Ruby	Raises `RuntimeError` with a message.	No typed hierarchy today. Branch on the string message or the underlying HTTP status exposed by the error.
Elixir	`{:error, %LiterLlm.Error{kind: atom, code: int, http_status: int}}`.	10 kinds: `:unknown`, `:invalid_request`, `:authentication`, `:not_found`, `:rate_limit`, `:provider_error`, `:stream_error`, `:serialization`, `:budget_exceeded`, `:hook_rejected`. Pattern match on `kind`.
PHP	Throws `\RuntimeException` for generic failures, plus `BudgetExceededException` and `HookRejectedException` for the two dedicated variants.	Two typed exceptions; everything else is a `RuntimeException` with a provider message.
WASM	Rejects the returned `Promise` with a plain JavaScript `Error`.	No typed hierarchy. Message is formatted as `HTTP {status}: {message}`. Parse the status to branch on category.
C FFI	Returns `NULL` (or `-1` for `int32_t` returns) and stores a thread-local error message.	Read via `literllm_last_error()`. The message is formatted as `<function>: [<Category>] <details>` using the same bracketed labels as the TypeScript binding.

Per-language exception trees

See the Error Handling section of each language reference for full class inheritance, retry helpers, and runnable examples: Python, TypeScript, Rust, Go, Java, C#, Ruby, Elixir, PHP, WASM, C FFI.

Bindings that collapse variants still map the HTTP status code to the same category the Rust core would return. The branching in your code may look coarser, but the wire-level semantics (which response is retried, which is a hard failure) are identical.
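For the bindings that encode the category in the message text (TypeScript, C FFI, WASM), branching comes down to string parsing. The sketch below shows one way to do that in Python, using only the message formats listed in the table. The helper names and the sample function name `literllm_chat` are hypothetical, and the regular expressions assume the documented formats hold exactly.

```python
import re
from typing import Optional, Tuple

# Optional "<function>: " prefix (C FFI), then "[Category] details".
LABEL_RE = re.compile(r"^(?:\w+: )?\[(?P<category>\w+)\]\s*(?P<details>.*)$", re.DOTALL)

# WASM format: "HTTP {status}: {message}".
HTTP_RE = re.compile(r"^HTTP (\d+):")


def parse_category(message: str) -> Optional[Tuple[str, str]]:
    """Return (category, details) for bracket-labelled messages, else None."""
    match = LABEL_RE.match(message)
    if match is None:
        return None
    return match.group("category"), match.group("details")


def parse_http_status(message: str) -> Optional[int]:
    """Return the status from a WASM-style message, else None."""
    match = HTTP_RE.match(message)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    print(parse_category("[RateLimited] Too many requests"))
    print(parse_category("literllm_chat: [Authentication] invalid key"))
    print(parse_http_status("HTTP 429: slow down"))
```

Both helpers return `None` on an unrecognised format, so callers can fall through to a generic error path.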

Catching errors

Start by catching the base error type and branch on specific variants only where you need different behaviour.

Python

import asyncio
import os

from liter_llm import (
    LlmClient,
    LlmError,
    AuthenticationError,
    RateLimitedError,
    ContextWindowExceededError,
    BudgetExceededError,
)

async def main() -> None:
    client = LlmClient(api_key=os.environ["OPENAI_API_KEY"])
    try:
        response = await client.chat(
            model="openai/gpt-4o",
            messages=[{"role": "user", "content": "Hello"}],
        )
        print(response.choices[0].message.content)
    except AuthenticationError as e:
        # 401/403 – rotate the key, do not retry.
        print(f"auth failed: {e}")
    except RateLimitedError as e:
        # 429 – transient, retry with backoff or fall back to another model.
        print(f"rate limited: {e}")
    except ContextWindowExceededError as e:
        # Trim the prompt or use a larger context window.
        print(f"prompt too long: {e}")
    except BudgetExceededError as e:
        # Virtual-key or global budget cap hit.
        print(f"budget exceeded: {e}")
    except LlmError as e:
        # Catch-all for the remaining liter-llm errors.
        print(f"llm error: {e}")

asyncio.run(main())

TypeScript

import { LlmClient } from "liter-llm";

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });

try {
  const response = await client.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(response.choices[0].message.content);
} catch (err) {
  // All liter-llm errors surface as JavaScript Error objects. The message
  // carries a bracketed category label: "[RateLimited] Too many requests".
  if (err instanceof Error) {
    if (err.message.startsWith("[Authentication]")) {
      // 401/403 – rotate the key.
      console.error("auth failed:", err.message);
    } else if (err.message.startsWith("[RateLimited]")) {
      // 429 – transient, retry or fall back.
      console.error("rate limited:", err.message);
    } else if (err.message.startsWith("[BudgetExceeded]")) {
      console.error("budget exceeded:", err.message);
    } else {
      console.error("llm error:", err.message);
    }
  }
}

Rust

use liter_llm::{
    ChatCompletionRequest, ClientConfigBuilder, DefaultClient, LiterLlmError, LlmClient, Message,
    UserContent, UserMessage,
};

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let config = ClientConfigBuilder::new(std::env::var("OPENAI_API_KEY")?).build();
    let client = DefaultClient::new(config);

    let request = ChatCompletionRequest {
        model: "openai/gpt-4o".to_owned(),
        messages: vec![Message::User(UserMessage {
            content: UserContent::Text("Hello".into()),
            name: None,
        })],
        ..Default::default()
    };

    match client.chat(request).await {
        Ok(response) => {
            if let Some(text) = response.choices[0].message.content.as_deref() {
                println!("{text}");
            }
        }
        // Transient errors — worth retrying or falling back to another model.
        Err(e) if e.is_transient() => eprintln!("transient failure: {e}"),
        // Terminal errors — branch on specific variants where the response differs.
        Err(LiterLlmError::Authentication { message }) => eprintln!("auth failed: {message}"),
        Err(LiterLlmError::ContextWindowExceeded { message }) => {
            eprintln!("prompt too long: {message}")
        }
        Err(LiterLlmError::BudgetExceeded { message, .. }) => {
            eprintln!("budget exceeded: {message}")
        }
        Err(e) => eprintln!("llm error ({}): {e}", e.error_type()),
    }

    Ok(())
}

Go

package main

import (
    "context"
    "errors"
    "fmt"
    "os"

    literllm "github.com/kreuzberg-dev/liter-llm/packages/go"
)

func main() {
    client := literllm.NewClient(
        literllm.WithAPIKey(os.Getenv("OPENAI_API_KEY")),
    )

    _, err := client.Chat(context.Background(), &literllm.ChatCompletionRequest{
        Model:    "openai/gpt-4o",
        Messages: []literllm.Message{literllm.NewTextMessage(literllm.RoleUser, "Hello")},
    })
    if err == nil {
        return
    }

    switch {
    case errors.Is(err, literllm.ErrAuthentication):
        // 401/403 — rotate the key.
        fmt.Println("auth failed:", err)
    case errors.Is(err, literllm.ErrRateLimit):
        // 429 — transient, back off and retry.
        fmt.Println("rate limited:", err)
    case errors.Is(err, literllm.ErrBudgetExceeded):
        fmt.Println("budget exceeded:", err)
    case errors.Is(err, literllm.ErrProviderError):
        // 5xx — transient on the proxy, terminal from the caller's view.
        fmt.Println("provider error:", err)
    default:
        // Inspect the underlying HTTP status when present.
        var apiErr *literllm.APIError
        if errors.As(err, &apiErr) {
            fmt.Printf("HTTP %d: %s\n", apiErr.StatusCode, apiErr.Message)
            return
        }
        fmt.Println("llm error:", err)
    }
}

Java

import java.util.List;

import dev.kreuzberg.literllm.LlmClient;
import dev.kreuzberg.literllm.LlmException;
import dev.kreuzberg.literllm.BudgetExceededException;
import dev.kreuzberg.literllm.types.ChatCompletionRequest;
import dev.kreuzberg.literllm.types.Types;

public class ErrorHandling {
    public static void main(String[] args) {
        try (var client = LlmClient.builder().apiKey(System.getenv("OPENAI_API_KEY")).build()) {
            var response = client.chat(new ChatCompletionRequest(
                "openai/gpt-4o",
                List.of(new Types.UserMessage("Hello"))
            ));
            System.out.println(response.choices().get(0).message().content());
        } catch (LlmException.AuthenticationException e) {
            // 401/403 — rotate the key.
            System.err.println("auth failed: " + e.getMessage());
        } catch (LlmException.RateLimitException e) {
            // 429 — transient, retry with backoff.
            System.err.println("rate limited: " + e.getMessage());
        } catch (BudgetExceededException e) {
            System.err.println("budget exceeded: " + e.getMessage());
        } catch (LlmException.ProviderException e) {
            // 5xx — inspect getHttpStatus() to decide next step.
            System.err.printf("provider %d: %s%n", e.getHttpStatus(), e.getMessage());
        } catch (LlmException e) {
            // Catch-all for the remaining liter-llm errors.
            System.err.println("llm error (" + e.getErrorCode() + "): " + e.getMessage());
        } catch (Exception e) {
            System.err.println("unexpected: " + e.getMessage());
        }
    }
}

C#

using System;
using System.Threading.Tasks;
using LiterLlm;

class Program
{
    static async Task Main()
    {
        await using var client = new LlmClient(
            apiKey: Environment.GetEnvironmentVariable("OPENAI_API_KEY")!
        );

        try
        {
            var response = await client.ChatAsync(new ChatCompletionRequest(
                model: "openai/gpt-4o",
                messages: new[] { new UserMessage("Hello") }
            ));
            Console.WriteLine(response.Choices[0].Message.Content);
        }
        catch (AuthenticationException e)
        {
            // 401/403 — rotate the key.
            Console.Error.WriteLine($"auth failed: {e.Message}");
        }
        catch (RateLimitException e)
        {
            // 429 — transient, retry with backoff.
            Console.Error.WriteLine($"rate limited: {e.Message}");
        }
        catch (BudgetExceededException e)
        {
            Console.Error.WriteLine($"budget exceeded: {e.Message}");
        }
        catch (ProviderException e)
        {
            Console.Error.WriteLine($"provider error: {e.Message}");
        }
        catch (LlmException e)
        {
            // Catch-all for the remaining liter-llm errors.
            Console.Error.WriteLine($"llm error ({e.ErrorCode}): {e.Message}");
        }
    }
}

Ruby

require 'liter_llm'
require 'json'

client = LiterLlm::LlmClient.new(
  api_key: ENV.fetch('OPENAI_API_KEY')
)

request = {
  model: 'openai/gpt-4o',
  messages: [{ role: 'user', content: 'Hello' }]
}.to_json

begin
  response = JSON.parse(client.chat(request))
  puts response.dig('choices', 0, 'message', 'content')
rescue RuntimeError => e
  # The Ruby binding raises plain RuntimeError. The message is the Rust
  # error's Display string — branch on its prefix to identify the category.
  case e.message
  when /\Arate limited:/            then warn "rate limited: #{e.message}"
  when /\Aauthentication failed:/   then warn "auth failed: #{e.message}"
  when /\Abudget exceeded:/         then warn "budget exceeded: #{e.message}"
  when /\Acontext window exceeded:/ then warn "prompt too long: #{e.message}"
  when /\Aservice unavailable:/     then warn "provider unavailable: #{e.message}"
  else warn "llm error: #{e.message}"
  end
end

PHP

<?php

declare(strict_types=1);

use LiterLlm\LlmClient;
use LiterLlm\BudgetExceededException;
use LiterLlm\HookRejectedException;

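// getenv() returns false when the variable is unset, which would fail under
// strict_types if the constructor expects a string, so check it first.
$apiKey = getenv('OPENAI_API_KEY');
if ($apiKey === false) {
    fwrite(STDERR, "OPENAI_API_KEY is not set\n");
    exit(1);
}

$client = new LlmClient(apiKey: $apiKey);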

$request = [
    'model' => 'openai/gpt-4o',
    'messages' => [['role' => 'user', 'content' => 'Hello']],
];

try {
    $response = json_decode($client->chat(json_encode($request)), true);
    echo $response['choices'][0]['message']['content'] . PHP_EOL;
} catch (BudgetExceededException $e) {
    fwrite(STDERR, "budget exceeded: {$e->getMessage()}\n");
} catch (HookRejectedException $e) {
    fwrite(STDERR, "hook rejected: {$e->getMessage()}\n");
} catch (\RuntimeException $e) {
    // All other liter-llm errors surface as plain RuntimeException.
    // Branch on the provider message text.
    $msg = $e->getMessage();
    if (stripos($msg, 'authentication') !== false) {
        fwrite(STDERR, "auth failed: $msg\n");
    } elseif (stripos($msg, 'rate limit') !== false) {
        fwrite(STDERR, "rate limited: $msg\n");
    } else {
        fwrite(STDERR, "llm error: $msg\n");
    }
}

Elixir

client =
  LiterLlm.Client.new(
    api_key: System.fetch_env!("OPENAI_API_KEY")
  )

request = %{
  model: "openai/gpt-4o",
  messages: [%{role: "user", content: "Hello"}]
}

case LiterLlm.Client.chat(client, request) do
  {:ok, response} ->
    IO.puts(response["choices"] |> hd() |> get_in(["message", "content"]))

  # 401/403 — rotate the key.
  {:error, %LiterLlm.Error{kind: :authentication, message: message}} ->
    IO.warn("auth failed: #{message}")

  # 429 — transient, back off and retry or fall back.
  {:error, %LiterLlm.Error{kind: :rate_limit, message: message}} ->
    IO.warn("rate limited: #{message}")

  {:error, %LiterLlm.Error{kind: :budget_exceeded, message: message}} ->
    IO.warn("budget exceeded: #{message}")

  # 5xx — inspect http_status when present.
  {:error, %LiterLlm.Error{kind: :provider_error, http_status: status, message: message}} ->
    IO.warn("provider #{status}: #{message}")

  {:error, %LiterLlm.Error{kind: kind, message: message}} ->
    IO.warn("llm error (#{kind}): #{message}")
end

WASM

import init, { LlmClient } from "@kreuzberg/liter-llm-wasm";

await init();

const client = new LlmClient({ apiKey: process.env.OPENAI_API_KEY! });

try {
  const response = await client.chat({
    model: "openai/gpt-4o",
    messages: [{ role: "user", content: "Hello" }],
  });
  console.log(response.choices[0].message.content);
} catch (err) {
  // The WASM binding rejects with a plain Error whose message is formatted
  // as "HTTP {status}: {message}". Parse the status to branch on category.
  const message = err instanceof Error ? err.message : String(err);
  const match = message.match(/^HTTP (\d+):/);
  const status = match ? Number(match[1]) : null;

  if (status === 429) {
    console.error("rate limited:", message);
  } else if (status === 401 || status === 403) {
    console.error("auth failed:", message);
  } else if (status === 408 || (status !== null && status >= 500)) {
    console.error("transient error, retry with backoff:", message);
  } else if (message.includes("budget exceeded")) {
    console.error("budget exceeded:", message);
  } else {
    console.error("llm error:", message);
  }
}

Observability

The tracing middleware records an error.type span attribute on every failed request, set to the value returned by LiterLlmError::error_type(). The possible values match the variant names in the Variants table. See Observability for the full span schema.