Python API Reference¶
Installation¶
Client¶
Constructor¶
from liter_llm import LlmClient
client = LlmClient(
*,
api_key: str,
base_url: str | None = None,
model_hint: str | None = None,
max_retries: int = 3,
timeout: int = 60,
)
All parameters are keyword-only.
| Parameter | Type | Default | Description |
|---|---|---|---|
api_key |
str |
required | API key for authentication |
base_url |
str \| None |
None |
Override provider base URL |
model_hint |
str \| None |
None |
Hint for provider auto-detection (e.g. "groq/llama3-70b") |
max_retries |
int |
3 |
Retries on 429 / 5xx responses |
timeout |
int |
60 |
Request timeout in seconds |
The client is immutable after construction and safe to share across tasks.
Methods¶
All methods are async and must be awaited.
chat(**kwargs)¶
Send a chat completion request.
Accepts the same keyword arguments as the OpenAI Chat Completions API (model, messages, temperature, max_tokens, etc.).
chat_stream(**kwargs)¶
Start a streaming chat completion. Returns an async iterator that yields ChatCompletionChunk objects. The HTTP request is issued immediately when the method is called.
Use with async for:
iterator = await client.chat_stream(model="gpt-4", messages=[...])
async for chunk in iterator:
print(chunk.choices[0].delta.content, end="")
The iterator supports async with for deterministic resource cleanup:
async with await client.chat_stream(model="gpt-4", messages=[...]) as stream:
async for chunk in stream:
print(chunk.choices[0].delta.content, end="")
Call iterator.cancel() to signal the background task to stop early.
embed(**kwargs)¶
Send an embedding request.
Accepts model, input, encoding_format, dimensions, user.
list_models()¶
List available models from the provider.
image_generate(**kwargs)¶
Generate images from a text prompt.
Accepts prompt, model, n, size, quality, response_format, style, user.
speech(**kwargs)¶
Generate speech audio from text.
Accepts model, input, voice, response_format, speed. Returns raw audio bytes.
transcribe(**kwargs)¶
Transcribe audio into text.
Accepts model, file, language, prompt, response_format, temperature.
moderate(**kwargs)¶
Classify content for policy violations.
Accepts input, model.
rerank(**kwargs)¶
Rerank documents by relevance to a query.
Accepts model, query, documents, top_n.
create_file(**kwargs)¶
Upload a file.
Accepts file, purpose, filename. Returns a dict with file object fields.
retrieve_file(file_id)¶
Retrieve metadata about an uploaded file.
delete_file(file_id)¶
Delete an uploaded file.
list_files(**kwargs)¶
List uploaded files. Optional keyword arguments: purpose, limit, after.
file_content(file_id)¶
Download the content of an uploaded file.
create_batch(**kwargs)¶
Create a new batch.
Accepts input_file_id, endpoint, completion_window, metadata.
retrieve_batch(batch_id)¶
Retrieve a batch by ID.
list_batches(**kwargs)¶
List batches. Optional keyword arguments: limit, after.
cancel_batch(batch_id)¶
Cancel a batch.
create_response(**kwargs)¶
Create a new response via the Responses API.
Accepts model, input, instructions, max_output_tokens, temperature, top_p.
retrieve_response(response_id)¶
Retrieve a response by ID.
cancel_response(response_id)¶
Cancel a response.
Types¶
ChatCompletionResponse¶
| Field | Type | Description |
|---|---|---|
id |
str |
Response ID |
model |
str |
Model used |
choices |
list[Choice] |
Completion choices |
usage |
Usage \| None |
Token usage |
created |
int |
Unix timestamp |
Choice¶
| Field | Type | Description |
|---|---|---|
index |
int |
Choice index |
message |
AssistantMessage |
The assistant's message |
finish_reason |
str \| None |
Why generation stopped (stop, length, tool_calls) |
AssistantMessage¶
| Field | Type | Description |
|---|---|---|
content |
str \| None |
Text content |
tool_calls |
list[ToolCall] \| None |
Tool calls made by the assistant |
refusal |
str \| None |
Refusal message |
ChatCompletionChunk¶
Yielded by chat_stream().
| Field | Type | Description |
|---|---|---|
id |
str |
Response ID |
model |
str |
Model used |
choices |
list[StreamChoice] |
Stream choices with deltas |
usage |
Usage \| None |
Token usage (final chunk only) |
Usage¶
| Field | Type | Description |
|---|---|---|
prompt_tokens |
int |
Tokens consumed by the prompt |
completion_tokens |
int |
Tokens consumed by the completion |
total_tokens |
int |
Total tokens |
EmbeddingResponse¶
| Field | Type | Description |
|---|---|---|
data |
list[EmbeddingObject] |
Embedding vectors |
model |
str |
Model used |
usage |
Usage |
Token usage |
ModelsListResponse¶
| Field | Type | Description |
|---|---|---|
data |
list[ModelObject] |
Available models |
Error Handling¶
All errors are raised as Python exceptions inheriting from liter_llm.LlmError (which itself inherits from Exception). Invalid arguments to the constructor or malformed keyword arguments raise ValueError.
| Exception | Trigger |
|---|---|
LlmError |
Base class for all liter-llm errors |
AuthenticationError |
API key rejected (HTTP 401/403) |
RateLimitedError |
Rate limit exceeded (HTTP 429) |
BadRequestError |
Malformed request (HTTP 400) |
ContextWindowExceededError |
Prompt exceeds context window (subclass of BadRequestError) |
ContentPolicyError |
Content policy violation (subclass of BadRequestError) |
NotFoundError |
Model/resource not found (HTTP 404) |
ServerError |
Provider 5xx error |
ServiceUnavailableError |
Provider temporarily unavailable (HTTP 502/503) |
LlmTimeoutError |
Request timed out |
NetworkError |
Network-level failure |
StreamingError |
Error reading streaming response |
EndpointNotSupportedError |
Provider does not support the endpoint |
InvalidHeaderError |
Custom header name or value is invalid |
SerializationError |
JSON serialization/deserialization failure |
from liter_llm import LlmError, RateLimitedError, AuthenticationError
try:
response = await client.chat(model="gpt-4", messages=[...])
except ValueError as e:
# Invalid arguments (malformed keyword args, missing fields)
print(f"Bad request: {e}")
except RateLimitedError as e:
print(f"Rate limited: {e}")
except AuthenticationError as e:
print(f"Auth failed: {e}")
except LlmError as e:
# Catch-all for other liter-llm errors
print(f"Error: {e}")
Example¶
import asyncio
from liter_llm import LlmClient
async def main():
client = LlmClient(api_key="sk-...")
# Non-streaming
response = await client.chat(
model="gpt-4",
messages=[{"role": "user", "content": "Hello!"}],
max_tokens=256,
)
print(response.choices[0].message.content)
# Streaming
async with await client.chat_stream(
model="gpt-4",
messages=[{"role": "user", "content": "Tell me a joke"}],
) as stream:
async for chunk in stream:
delta = chunk.choices[0].delta
if delta.content:
print(delta.content, end="", flush=True)
asyncio.run(main())