Python API Reference¶

Installation¶

pip install liter-llm

Client¶

Constructor¶

from liter_llm import LlmClient

LlmClient(
    *,
    api_key: str,
    base_url: str | None = None,
    model_hint: str | None = None,
    max_retries: int = 3,
    timeout: int = 60,
)

All parameters are keyword-only.

Parameter	Type	Default	Description
`api_key`	`str`	required	API key for authentication
`base_url`	`str | None`	`None`	Override provider base URL
`model_hint`	`str | None`	`None`	Hint for provider auto-detection (e.g. `"groq/llama3-70b"`)
`max_retries`	`int`	`3`	Retries on 429 / 5xx responses
`timeout`	`int`	`60`	Request timeout in seconds

The client is immutable after construction and safe to share across tasks.
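Because one client can be shared across tasks, a single instance can serve many concurrent requests. The following is a minimal sketch, assuming only the `chat()` method documented on this page; `ask_all` and its `concurrency` parameter are hypothetical names for illustration and are not part of liter-llm.

```python
import asyncio


async def ask_all(client, model, prompts, concurrency=4):
    """Send one chat request per prompt through a single shared client.

    `client` is expected to be an LlmClient (any object with an awaitable
    `chat(**kwargs)` works). `ask_all` and `concurrency` are hypothetical
    names used for illustration only.
    """
    limit = asyncio.Semaphore(concurrency)  # cap the number of in-flight requests

    async def ask(prompt):
        async with limit:
            response = await client.chat(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return response.choices[0].message.content

    # gather() returns results in the same order as `prompts`.
    return await asyncio.gather(*(ask(p) for p in prompts))
```

The semaphore is a design choice rather than a requirement: it keeps a large prompt list from opening every request at once and tripping provider rate limits.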

Methods¶

All methods are async and must be awaited.

`chat(**kwargs)`¶

Send a chat completion request.

async def chat(**kwargs) -> ChatCompletionResponse

Accepts the same keyword arguments as the OpenAI Chat Completions API (model, messages, temperature, max_tokens, etc.).

`chat_stream(**kwargs)`¶

Start a streaming chat completion. Returns an async iterator that yields ChatCompletionChunk objects. The HTTP request is issued immediately when the method is called.

async def chat_stream(**kwargs) -> ChatStreamIterator

Use with async for:

iterator = await client.chat_stream(model="gpt-4", messages=[...])
async for chunk in iterator:
    content = chunk.choices[0].delta.content
    if content:  # skip chunks whose delta carries no text
        print(content, end="")

The iterator supports async with for deterministic resource cleanup:

async with await client.chat_stream(model="gpt-4", messages=[...]) as stream:
    async for chunk in stream:
        content = chunk.choices[0].delta.content
        if content:  # skip chunks whose delta carries no text
            print(content, end="")

Call iterator.cancel() to signal the background task to stop early.
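The pieces above (awaiting `chat_stream()`, `async with`, and `cancel()`) can be combined into one helper that accumulates text and stops once enough has arrived. This is a sketch under stated assumptions: `collect_stream` and `max_chars` are hypothetical names, and since the text does not say whether `cancel()` is a plain or an awaitable method, the code handles both.

```python
import inspect


async def collect_stream(client, max_chars=None, **chat_kwargs):
    """Accumulate streamed text, optionally stopping early.

    Hypothetical helper. ASSUMPTION: `cancel()` may be either a plain or an
    awaitable method, so its result is awaited only when needed.
    """
    parts = []
    size = 0
    async with await client.chat_stream(**chat_kwargs) as stream:
        async for chunk in stream:
            if not chunk.choices:
                continue  # defensive: tolerate chunks that carry no choices
            text = chunk.choices[0].delta.content
            if not text:
                continue  # some chunks carry no text
            parts.append(text)
            size += len(text)
            if max_chars is not None and size >= max_chars:
                result = stream.cancel()  # ask the background task to stop
                if inspect.isawaitable(result):
                    await result
                break
    return "".join(parts)
```

A call such as `await collect_stream(client, max_chars=500, model="gpt-4", messages=[...])` would return at most slightly more than 500 characters, because the check runs after each chunk is appended.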

`embed(**kwargs)`¶

Send an embedding request.

async def embed(**kwargs) -> EmbeddingResponse

Accepts model, input, encoding_format, dimensions, user.
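For large corpora it can help to split the input into batches. The sketch below is illustrative only: `embed_in_batches` and `batch_size` are hypothetical names, and it assumes that `input` accepts a list of strings (as in the OpenAI Embeddings API) and that `response.data` preserves input order.

```python
async def embed_in_batches(client, model, texts, batch_size=64):
    """Embed `texts` in fixed-size batches and return all embedding objects.

    Hypothetical helper. ASSUMPTION: `input` accepts a list of strings and
    `response.data` comes back in the same order as the inputs.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    results = []
    for start in range(0, len(texts), batch_size):
        batch = texts[start:start + batch_size]
        response = await client.embed(model=model, input=batch)
        results.extend(response.data)  # list[EmbeddingObject]
    return results
```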

`list_models()`¶

List available models from the provider.

async def list_models() -> ModelsListResponse

`image_generate(**kwargs)`¶

Generate images from a text prompt.

async def image_generate(**kwargs) -> ImagesResponse

Accepts prompt, model, n, size, quality, response_format, style, user.

`speech(**kwargs)`¶

Generate speech audio from text.

async def speech(**kwargs) -> bytes

Accepts model, input, voice, response_format, speed. Returns raw audio bytes.
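Since `speech()` returns raw bytes, saving the audio is a matter of writing them to disk. A minimal sketch; `save_speech` is a hypothetical helper, and the keyword arguments are passed straight through to `client.speech()`.

```python
from pathlib import Path


async def save_speech(client, path, **speech_kwargs):
    """Synthesize speech and write the returned bytes to `path`.

    Hypothetical helper; returns the number of bytes written.
    """
    audio = await client.speech(**speech_kwargs)
    # The container format is whatever `response_format` requested.
    Path(path).write_bytes(audio)
    return len(audio)
```

The file write here is synchronous, which is usually acceptable for short clips; a latency-sensitive service might move it to a worker thread instead.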

`transcribe(**kwargs)`¶

Transcribe audio into text.

async def transcribe(**kwargs) -> TranscriptionResponse

Accepts model, file, language, prompt, response_format, temperature.

`moderate(**kwargs)`¶

Classify content for policy violations.

async def moderate(**kwargs) -> ModerationResponse

Accepts input, model.

`rerank(**kwargs)`¶

Rerank documents by relevance to a query.

async def rerank(**kwargs) -> RerankResponse

Accepts model, query, documents, top_n.

`create_file(**kwargs)`¶

Upload a file.

async def create_file(**kwargs) -> dict

Accepts file, purpose, filename. Returns a dict with file object fields.

`retrieve_file(file_id)`¶

Retrieve metadata about an uploaded file.

async def retrieve_file(file_id: str) -> dict

`delete_file(file_id)`¶

Delete an uploaded file.

async def delete_file(file_id: str) -> dict

`list_files(**kwargs)`¶

List uploaded files. Optional keyword arguments: purpose, limit, after.

async def list_files(**kwargs) -> dict
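The `limit` and `after` arguments suggest cursor-based pagination. The sketch below walks every page, but the shape of the returned dict is not specified on this page, so it rests on a loud assumption: that the dict follows the OpenAI list shape (`"data"`, `"has_more"`, and an `"id"` on each item). `iter_all_files` and `page_size` are hypothetical names.

```python
async def iter_all_files(client, purpose=None, page_size=100):
    """Yield every uploaded file by following `after` cursors.

    Hypothetical helper. ASSUMPTION: the returned dict looks like
    {"data": [...], "has_more": bool} and each item has an "id" key.
    """
    after = None
    while True:
        kwargs = {"limit": page_size}
        if purpose is not None:
            kwargs["purpose"] = purpose
        if after is not None:
            kwargs["after"] = after
        page = await client.list_files(**kwargs)
        items = page.get("data", [])
        for item in items:
            yield item
        if not items or not page.get("has_more", False):
            break
        after = items[-1]["id"]  # cursor is the ID of the last item seen
```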

`file_content(file_id)`¶

Download the content of an uploaded file.

async def file_content(file_id: str) -> bytes

`create_batch(**kwargs)`¶

Create a new batch.

async def create_batch(**kwargs) -> dict

Accepts input_file_id, endpoint, completion_window, metadata.

`retrieve_batch(batch_id)`¶

Retrieve a batch by ID.

async def retrieve_batch(batch_id: str) -> dict

`list_batches(**kwargs)`¶

List batches. Optional keyword arguments: limit, after.

async def list_batches(**kwargs) -> dict

`cancel_batch(batch_id)`¶

Cancel a batch.

async def cancel_batch(batch_id: str) -> dict
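Batches complete asynchronously, so callers typically poll `retrieve_batch()` until the batch finishes. The sketch below shows one way to do that. It assumes the returned dict has a `"status"` key whose terminal values match the OpenAI Batch API, which this page does not confirm; `wait_for_batch`, `poll_seconds`, and `max_polls` are hypothetical names.

```python
import asyncio

# ASSUMPTION: terminal status names follow the OpenAI Batch API.
TERMINAL_STATUSES = {"completed", "failed", "expired", "cancelled"}


async def wait_for_batch(client, batch_id, poll_seconds=5.0, max_polls=120):
    """Poll retrieve_batch() until the batch reaches a terminal status.

    Hypothetical helper. ASSUMPTION: the returned dict has a "status" key.
    If `max_polls` is exhausted, the batch is cancelled and TimeoutError
    is raised.
    """
    for _ in range(max_polls):
        batch = await client.retrieve_batch(batch_id)
        if batch.get("status") in TERMINAL_STATUSES:
            return batch
        await asyncio.sleep(poll_seconds)
    await client.cancel_batch(batch_id)
    raise TimeoutError(f"batch {batch_id} did not finish after {max_polls} polls")
```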

`create_response(**kwargs)`¶

Create a new response via the Responses API.

async def create_response(**kwargs) -> dict

Accepts model, input, instructions, max_output_tokens, temperature, top_p.

`retrieve_response(response_id)`¶

Retrieve a response by ID.

async def retrieve_response(response_id: str) -> dict

`cancel_response(response_id)`¶

Cancel a response.

async def cancel_response(response_id: str) -> dict

Types¶

`ChatCompletionResponse`¶

Field	Type	Description
`id`	`str`	Response ID
`model`	`str`	Model used
`choices`	`list[Choice]`	Completion choices
`usage`	`Usage | None`	Token usage
`created`	`int`	Unix timestamp

`Choice`¶

Field	Type	Description
`index`	`int`	Choice index
`message`	`AssistantMessage`	The assistant's message
`finish_reason`	`str | None`	Why generation stopped (`stop`, `length`, `tool_calls`)

`AssistantMessage`¶

Field	Type	Description
`content`	`str | None`	Text content
`tool_calls`	`list[ToolCall] | None`	Tool calls made by the assistant
`refusal`	`str | None`	Refusal message
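Because `content`, `tool_calls`, and `refusal` can each be `None`, code that reads a choice usually has to branch. The following sketch uses only the fields listed in the `Choice` and `AssistantMessage` tables; `summarize_choice` and its string labels are hypothetical.

```python
def summarize_choice(choice):
    """Return a (kind, payload) pair describing a Choice.

    Hypothetical helper; the labels are illustrative, not library constants.
    """
    message = choice.message
    if message.refusal:
        return ("refusal", message.refusal)
    if message.tool_calls:
        return ("tool_calls", message.tool_calls)
    if choice.finish_reason == "length":
        # Generation stopped at the token limit, so content may be partial.
        return ("truncated", message.content or "")
    return ("text", message.content or "")
```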

`ChatCompletionChunk`¶

Yielded by chat_stream().

Field	Type	Description
`id`	`str`	Response ID
`model`	`str`	Model used
`choices`	`list[StreamChoice]`	Stream choices with deltas
`usage`	`Usage | None`	Token usage (final chunk only)

`Usage`¶

Field	Type	Description
`prompt_tokens`	`int`	Tokens consumed by the prompt
`completion_tokens`	`int`	Tokens consumed by the completion
`total_tokens`	`int`	Total tokens
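Token counts are often aggregated across several calls for cost tracking. A minimal sketch, assuming response objects that expose the `usage` field described above (which may be `None`); `total_usage` is a hypothetical helper.

```python
def total_usage(responses):
    """Sum token counts over responses, skipping any whose usage is None."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0, "total_tokens": 0}
    for response in responses:
        usage = response.usage
        if usage is None:
            continue  # usage is optional on chat responses
        for field in totals:
            totals[field] += getattr(usage, field)
    return totals
```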

`EmbeddingResponse`¶

Field	Type	Description
`data`	`list[EmbeddingObject]`	Embedding vectors
`model`	`str`	Model used
`usage`	`Usage`	Token usage

`ModelsListResponse`¶

Field	Type	Description
`data`	`list[ModelObject]`	Available models

Error Handling¶

Provider and transport errors are raised as Python exceptions inheriting from liter_llm.LlmError (which itself inherits from Exception). Invalid arguments to the constructor or malformed keyword arguments are the exception: they raise the built-in ValueError, so catch it separately.

Exception	Trigger
`LlmError`	Base class for all liter-llm errors
`AuthenticationError`	API key rejected (HTTP 401/403)
`RateLimitedError`	Rate limit exceeded (HTTP 429)
`BadRequestError`	Malformed request (HTTP 400)
`ContextWindowExceededError`	Prompt exceeds context window (subclass of `BadRequestError`)
`ContentPolicyError`	Content policy violation (subclass of `BadRequestError`)
`NotFoundError`	Model/resource not found (HTTP 404)
`ServerError`	Provider 5xx error
`ServiceUnavailableError`	Provider temporarily unavailable (HTTP 502/503)
`LlmTimeoutError`	Request timed out
`NetworkError`	Network-level failure
`StreamingError`	Error reading streaming response
`EndpointNotSupportedError`	Provider does not support the endpoint
`InvalidHeaderError`	Custom header name or value is invalid
`SerializationError`	JSON serialization/deserialization failure

from liter_llm import LlmError, RateLimitedError, AuthenticationError

try:
    response = await client.chat(model="gpt-4", messages=[...])
except ValueError as e:
    # Invalid arguments (malformed keyword args, missing fields)
    print(f"Invalid arguments: {e}")
except RateLimitedError as e:
    print(f"Rate limited: {e}")
except AuthenticationError as e:
    print(f"Auth failed: {e}")
except LlmError as e:
    # Catch-all for other liter-llm errors
    print(f"Error: {e}")
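Because some exceptions subclass others, the order of `except` clauses matters: a subclass must be caught before its parent. The sketch below illustrates this with stand-in classes that mirror part of the hierarchy in the table, so it runs without the package installed. Real code would import the classes from `liter_llm` instead. `classify` and its return labels are hypothetical, and the direct parent of `RateLimitedError` is an assumption.

```python
# Stand-in classes mirroring part of the documented hierarchy so this
# snippet runs on its own; real code would import them from liter_llm.
class LlmError(Exception): ...
class BadRequestError(LlmError): ...
class ContextWindowExceededError(BadRequestError): ...
class RateLimitedError(LlmError): ...  # ASSUMPTION: direct child of LlmError


def classify(exc):
    """Map an exception to a coarse recovery action (hypothetical labels)."""
    try:
        raise exc
    except ContextWindowExceededError:
        return "shorten-prompt"  # must come before BadRequestError
    except BadRequestError:
        return "fix-request"
    except RateLimitedError:
        return "retry-later"
    except LlmError:
        return "report"  # catch-all for other library errors
    except ValueError:
        return "fix-arguments"  # raised for invalid arguments, not an LlmError
```

If the `BadRequestError` clause came first, a `ContextWindowExceededError` would be classified as `"fix-request"` and the more specific branch would never run.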

Example¶

import asyncio
from liter_llm import LlmClient

async def main():
    client = LlmClient(api_key="sk-...")

    # Non-streaming
    response = await client.chat(
        model="gpt-4",
        messages=[{"role": "user", "content": "Hello!"}],
        max_tokens=256,
    )
    print(response.choices[0].message.content)

    # Streaming
    async with await client.chat_stream(
        model="gpt-4",
        messages=[{"role": "user", "content": "Tell me a joke"}],
    ) as stream:
        async for chunk in stream:
            delta = chunk.choices[0].delta
            if delta.content:
                print(delta.content, end="", flush=True)

asyncio.run(main())
