# LiteLLM Provider
The LiteLLM provider offers unified access to 100+ LLM providers with automatic fallbacks, cost tracking, and streaming support.
## Features

- **100+ LLM Providers**: OpenAI, Anthropic, Google, Cohere, and more
- **Automatic Fallbacks**: Seamless failover between providers
- **Cost Tracking**: Real-time cost monitoring and optimization
- **Streaming Support**: Server-Sent Events and WebSocket streaming
- **Load Balancing**: Intelligent routing across providers
## Quick Start

```python
from packages.llm import LiteLLMProvider, LiteLLMConfig, RoutingStrategy

# Configure provider
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro"],
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_streaming=True,
)

# Create provider
llm = LiteLLMProvider(config)

# Use provider
response = await llm.ainvoke([
    {"role": "user", "content": "Hello, world!"}
])
```
## Configuration

### Basic Configuration

```python
config = LiteLLMConfig(
    model="gpt-4",                      # Primary model
    api_key="your-api-key",             # API key
    base_url="https://api.openai.com",  # Base URL
    temperature=0.7,                    # Sampling temperature
    max_tokens=500,                     # Maximum tokens to generate
    stream_timeout=60,                  # Stream timeout in seconds
    fallback_models=["claude-3"],       # Fallback models
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_streaming=True,
)
```
### Advanced Configuration

```python
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro", "llama-2"],
    routing_strategy=RoutingStrategy.COST,  # Cost-based routing
    custom_llm_params={
        "top_p": 0.9,
        "frequency_penalty": 0.1,
        "presence_penalty": 0.1,
    },
)
```
## Routing Strategies

### Fallback Strategy

Tries the primary model first and fails over to each fallback model in order when a call fails.

```python
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.FALLBACK,
)
```

### Cost-Based Routing

Routes each request to the cheapest configured model.

```python
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.COST,
)
```

### Latency-Based Routing

Routes each request to the configured model with the lowest observed latency.

```python
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.LATENCY,
)
```
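Conceptually, the fallback strategy walks an ordered list of models and returns the first successful response. The sketch below illustrates that behavior only; `route_with_fallback` and `call_model` are hypothetical stand-ins, not part of the package:

```python
# Sketch of fallback routing: try the primary model, then each
# fallback in order, returning the first successful response.
def route_with_fallback(models, call_model):
    errors = {}
    for model in models:
        try:
            return model, call_model(model)
        except Exception as exc:  # a failed provider call
            errors[model] = exc
    raise RuntimeError(f"All models failed: {errors}")

# Stand-in for a real provider call; here "gpt-4" is unavailable.
def call_model(model):
    if model == "gpt-4":
        raise TimeoutError("primary provider unavailable")
    return f"response from {model}"

used, reply = route_with_fallback(["gpt-4", "claude-3", "gemini-pro"], call_model)
print(used, reply)  # claude-3 response from claude-3
```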
## Streaming

### Server-Sent Events (SSE)

```python
# Enable streaming
config = LiteLLMConfig(
    model="gpt-4",
    enable_streaming=True,
)
llm = LiteLLMProvider(config)

# Stream response (the final chunk's delta content may be None)
async for chunk in llm.astream(messages):
    print(chunk.choices[0].delta.content or "", end="")
```
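The chunk layout above assumes an OpenAI-style delta, whose final chunk typically carries `None` as its content, so it is worth guarding when accumulating the full reply. A minimal sketch, using a stand-in async generator in place of `llm.astream(...)`:

```python
import asyncio

async def collect_stream(stream):
    """Accumulate streamed text, skipping empty or None deltas."""
    parts = []
    async for content in stream:
        if content:  # the final chunk's delta is typically None
            parts.append(content)
    return "".join(parts)

# Stand-in for the delta contents yielded by llm.astream(...):
async def fake_stream():
    for piece in ["Hel", "lo", None]:
        yield piece

result = asyncio.run(collect_stream(fake_stream()))
print(result)  # Hello
```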
### WebSocket Streaming

```python
from packages.llm import StreamingHandler, StreamFormat

handler = StreamingHandler(llm, StreamFormat.WEBSOCKET)

# Stream over WebSocket
await handler.stream_websocket(websocket, messages)
```
## Cost Tracking

```python
# Track costs automatically
config = LiteLLMConfig(
    model="gpt-4",
    enable_cost_tracking=True,
)
llm = LiteLLMProvider(config)

# Get cost information
cost_info = llm.get_cost_info()
print(f"Total cost: ${cost_info.total_cost}")
print(f"Token usage: {cost_info.total_tokens}")
```
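Under the hood, cost tracking amounts to accumulating token counts multiplied by per-token prices. A minimal sketch of that bookkeeping; the `CostTracker` class and the prices are illustrative only, not the package's implementation:

```python
# Illustrative per-1K-token prices; real tracking uses the
# provider's published pricing tables.
PRICES_PER_1K_TOKENS = {"gpt-4": 0.03, "claude-3": 0.015}

class CostTracker:
    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0

    def record(self, model, tokens):
        self.total_tokens += tokens
        self.total_cost += tokens / 1000 * PRICES_PER_1K_TOKENS[model]

tracker = CostTracker()
tracker.record("gpt-4", 500)      # 0.015
tracker.record("claude-3", 2000)  # 0.030
print(tracker.total_tokens, round(tracker.total_cost, 3))  # 2500 0.045
```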
## Error Handling

```python
try:
    response = await llm.ainvoke(messages)
except Exception as e:
    # Configured fallback models are tried automatically, so an
    # exception here means the primary and all fallbacks failed.
    print(f"All models failed: {e}")
```
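For transient failures (timeouts, rate limits) it can also help to retry the same model with backoff before letting the fallback chain take over. A sketch under that assumption; `invoke_with_retry` is a hypothetical helper, separate from the provider's own fallback logic:

```python
import time

def invoke_with_retry(call, attempts=3, base_delay=0.01):
    """Retry a provider call with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # retries exhausted; let the caller handle it
            time.sleep(base_delay * 2 ** attempt)

# Stand-in call that succeeds on the third attempt:
calls = {"count": 0}
def flaky_call():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = invoke_with_retry(flaky_call)
print(result)  # ok
```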
## Supported Providers

### OpenAI

```python
config = LiteLLMConfig(
    model="gpt-4",
    api_key="your-openai-key",
)
```

### Anthropic

```python
config = LiteLLMConfig(
    model="claude-3-opus",
    api_key="your-anthropic-key",
)
```

### Google

```python
config = LiteLLMConfig(
    model="gemini-pro",
    api_key="your-google-key",
)
```

### Local Models

```python
config = LiteLLMConfig(
    model="ollama/llama2",
    base_url="http://localhost:11434",
)
```
## Best Practices

- **Use Fallbacks**: Always configure fallback models for reliability.
- **Monitor Costs**: Enable cost tracking for budget management.
- **Optimize Routing**: Pick the strategy that matches your priority: reliability (fallback), cost, or latency.
- **Handle Errors**: Implement proper error handling for production.
- **Use Streaming**: Enable streaming for a more responsive user experience.
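Taken together, a production-leaning configuration might look like the sketch below. Every parameter shown appears elsewhere in this document; the specific values are illustrative:

```python
from packages.llm import LiteLLMConfig, RoutingStrategy

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro"],  # reliability
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_cost_tracking=True,                        # budget visibility
    enable_streaming=True,                            # responsive UX
    stream_timeout=60,
)
```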
## Migration from ProviderFactory

```python
# Old way
from packages.llm import ProviderFactory

factory = ProviderFactory(config)
llm = factory.get_provider()

# New way
from packages.llm import LiteLLMProvider

llm = LiteLLMProvider(config)
```
## API Reference

### LiteLLMConfig

| Parameter | Type | Description |
|---|---|---|
| `model` | `str` | Primary model name |
| `api_key` | `str` | API key for the model |
| `base_url` | `str` | Base URL for the API |
| `temperature` | `float` | Sampling temperature |
| `max_tokens` | `int` | Maximum tokens to generate |
| `stream_timeout` | `int` | Stream timeout in seconds |
| `fallback_models` | `List[str]` | Fallback model names |
| `routing_strategy` | `RoutingStrategy` | Routing strategy (fallback, cost, or latency) |
| `enable_streaming` | `bool` | Enable streaming support |
| `enable_cost_tracking` | `bool` | Enable cost tracking |
| `custom_llm_params` | `dict` | Extra parameters passed through to the model |
### LiteLLMProvider

| Method | Description |
|---|---|
| `invoke(messages)` | Synchronous invocation |
| `ainvoke(messages)` | Asynchronous invocation |
| `astream(messages)` | Asynchronous streaming |
| `get_cost_info()` | Get cost information |