LiteLLM Provider

The LiteLLM provider offers unified access to 100+ LLM providers with automatic fallbacks, cost tracking, and streaming support.

Features

  • 100+ LLM Providers: OpenAI, Anthropic, Google, Cohere, and more
  • Automatic Fallbacks: Seamless failover between providers
  • Cost Tracking: Real-time cost monitoring and optimization
  • Streaming Support: Server-Sent Events and WebSocket streaming
  • Load Balancing: Intelligent routing across providers

Quick Start

from packages.llm import LiteLLMProvider, LiteLLMConfig, RoutingStrategy

# Configure provider
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro"],
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_streaming=True
)

# Create provider
llm = LiteLLMProvider(config)

# Use provider
response = await llm.ainvoke([
    {"role": "user", "content": "Hello, world!"}
])

Configuration

Basic Configuration

config = LiteLLMConfig(
    model="gpt-4",                      # Primary model
    api_key="your-api-key",             # API key for the primary provider
    base_url="https://api.openai.com",  # Override the API base URL
    temperature=0.7,                    # Sampling temperature
    max_tokens=500,                     # Maximum tokens to generate
    stream_timeout=60,                  # Streaming timeout in seconds
    fallback_models=["claude-3"],       # Models to try if the primary fails
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_streaming=True
)

Advanced Configuration

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro", "llama-2"],
    routing_strategy=RoutingStrategy.COST,  # Cost-based routing
    custom_llm_params={
        "top_p": 0.9,
        "frequency_penalty": 0.1,
        "presence_penalty": 0.1
    }
)

Routing Strategies

Fallback Strategy

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.FALLBACK
)

Cost-Based Routing

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.COST
)

Latency-Based Routing

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.LATENCY
)
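
The three strategies differ only in how the list of candidate models is ordered before dispatch: `FALLBACK` keeps the declared order, `COST` sorts by price, and `LATENCY` sorts by observed response time. A minimal, self-contained sketch of that ordering logic (the per-model cost and latency figures below are illustrative placeholders, not LiteLLM internals or real provider pricing):

```python
# Illustrative sketch of routing as candidate ordering.
# Cost/latency figures are made-up placeholders, not real provider data.
MODELS = {
    "gpt-4":      {"cost_per_1k": 0.030, "avg_latency_ms": 900},
    "claude-3":   {"cost_per_1k": 0.015, "avg_latency_ms": 700},
    "gemini-pro": {"cost_per_1k": 0.001, "avg_latency_ms": 500},
}

def order_models(primary: str, fallbacks: list[str], strategy: str) -> list[str]:
    candidates = [primary] + fallbacks
    if strategy == "fallback":
        return candidates  # declared order, primary first
    if strategy == "cost":
        return sorted(candidates, key=lambda m: MODELS[m]["cost_per_1k"])
    if strategy == "latency":
        return sorted(candidates, key=lambda m: MODELS[m]["avg_latency_ms"])
    raise ValueError(f"unknown strategy: {strategy}")

print(order_models("gpt-4", ["claude-3", "gemini-pro"], "cost"))
```

Whichever strategy is chosen, a request is dispatched to the first model in the resulting order and falls through to the next on failure.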

Streaming

Server-Sent Events (SSE)

# Enable streaming
config = LiteLLMConfig(
    model="gpt-4",
    enable_streaming=True
)

llm = LiteLLMProvider(config)

# Stream response
async for chunk in llm.astream(messages):
    print(chunk.choices[0].delta.content)
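
Each streamed chunk carries only a delta of the response, so a client that needs the full text accumulates the pieces as they arrive. A self-contained sketch of that pattern, using a stub async generator in place of `llm.astream(...)` (with a real stream you would append `chunk.choices[0].delta.content` instead of the raw chunk):

```python
import asyncio

# Stub stream standing in for llm.astream(messages).
async def fake_stream():
    for piece in ["Hel", "lo, ", "world!"]:
        yield piece

async def collect(stream) -> str:
    parts = []
    async for chunk in stream:
        parts.append(chunk)  # with LiteLLM: chunk.choices[0].delta.content
    return "".join(parts)

full = asyncio.run(collect(fake_stream()))
print(full)  # Hello, world!
```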

WebSocket Streaming

from packages.llm import StreamingHandler, StreamFormat

handler = StreamingHandler(llm, StreamFormat.WEBSOCKET)

# Stream over WebSocket
await handler.stream_websocket(websocket, messages)

Cost Tracking

# Track costs automatically
config = LiteLLMConfig(
    model="gpt-4",
    enable_cost_tracking=True
)

# Get cost information
cost_info = llm.get_cost_info()
print(f"Total cost: ${cost_info.total_cost}")
print(f"Token usage: {cost_info.total_tokens}")
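
Under the hood, cost tracking amounts to multiplying token counts by a per-model rate and accumulating the totals. A minimal sketch of that bookkeeping (the rates are illustrative placeholders, not real pricing):

```python
# Illustrative cost accumulator; rates are placeholders, not real pricing.
class CostTracker:
    RATES_PER_1K = {"gpt-4": 0.030, "claude-3": 0.015}

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0

    def record(self, model: str, tokens: int) -> None:
        self.total_tokens += tokens
        self.total_cost += tokens / 1000 * self.RATES_PER_1K[model]

tracker = CostTracker()
tracker.record("gpt-4", 500)      # 500 tokens at $0.030/1k
tracker.record("claude-3", 2000)  # 2000 tokens at $0.015/1k
print(f"Total cost: ${tracker.total_cost:.3f}, tokens: {tracker.total_tokens}")
```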

Error Handling

try:
    response = await llm.ainvoke(messages)
except Exception as e:
    # LiteLLM has already tried the configured fallback models by this
    # point, so reaching this branch means every candidate failed.
    print(f"All models failed: {e}")
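
If you need this behavior outside the provider, for instance with heterogeneous clients, the underlying pattern is a simple loop over candidates. A self-contained sketch, where `call_model` is a stub standing in for a real provider call (here it simulates a primary-model outage):

```python
import asyncio

# Stub standing in for a real provider call; "gpt-4" simulates an outage.
async def call_model(model: str, messages: list) -> str:
    if model == "gpt-4":
        raise RuntimeError("rate limited")
    return f"{model}: ok"

async def invoke_with_fallback(models: list[str], messages: list) -> str:
    last_err = None
    for model in models:
        try:
            return await call_model(model, messages)
        except Exception as e:  # try the next candidate
            last_err = e
    raise RuntimeError("all models failed") from last_err

result = asyncio.run(invoke_with_fallback(["gpt-4", "claude-3"], []))
print(result)  # claude-3: ok
```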

Supported Providers

OpenAI

config = LiteLLMConfig(
    model="gpt-4",
    api_key="your-openai-key"
)

Anthropic

config = LiteLLMConfig(
    model="claude-3-opus",
    api_key="your-anthropic-key"
)

Google

config = LiteLLMConfig(
    model="gemini-pro",
    api_key="your-google-key"
)

Local Models

config = LiteLLMConfig(
    model="ollama/llama2",
    base_url="http://localhost:11434"
)

Best Practices

  1. Use Fallbacks: Always configure fallback models for reliability
  2. Monitor Costs: Enable cost tracking for budget management
  3. Optimize Routing: Choose routing strategy based on your needs
  4. Handle Errors: Implement proper error handling for production
  5. Use Streaming: Enable streaming for better user experience
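
Putting those recommendations together, a production-leaning configuration might look like the sketch below, using only the parameters documented on this page:

```python
from packages.llm import LiteLLMProvider, LiteLLMConfig, RoutingStrategy

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro"],  # reliability (1)
    routing_strategy=RoutingStrategy.COST,            # routing fit (3)
    enable_cost_tracking=True,                        # budget visibility (2)
    enable_streaming=True,                            # responsiveness (5)
    stream_timeout=60,
)
llm = LiteLLMProvider(config)
```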

Migration from ProviderFactory

# Old way
from packages.llm import ProviderFactory
factory = ProviderFactory(config)
llm = factory.get_provider()

# New way
from packages.llm import LiteLLMProvider
llm = LiteLLMProvider(config)

API Reference

LiteLLMConfig

Parameter          Type             Description
model              str              Primary model name
api_key            str              API key for the model
base_url           str              Base URL for the API
temperature        float            Temperature for generation
max_tokens         int              Maximum tokens to generate
fallback_models    List[str]        Fallback model names
routing_strategy   RoutingStrategy  Routing strategy
enable_streaming   bool             Enable streaming support

LiteLLMProvider

Method             Description
invoke(messages)   Synchronous invocation
ainvoke(messages)  Asynchronous invocation
astream(messages)  Asynchronous streaming
get_cost_info()    Get cost information