LiteLLM Provider

The LiteLLM provider offers unified access to 100+ LLM providers with automatic fallbacks, cost tracking, and streaming support.

Features

  • 100+ LLM Providers: OpenAI, Anthropic, Google, Cohere, and more
  • Automatic Fallbacks: Seamless failover between providers
  • Cost Tracking: Real-time cost monitoring and optimization
  • Streaming Support: Server-Sent Events and WebSocket streaming
  • Load Balancing: Intelligent routing across providers

Quick Start

from packages.llm import LiteLLMProvider, LiteLLMConfig, RoutingStrategy

# Configure provider
config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro"],
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_streaming=True
)

# Create provider
llm = LiteLLMProvider(config)

# Use provider
response = await llm.ainvoke([
    {"role": "user", "content": "Hello, world!"}
])

Configuration

Basic Configuration

config = LiteLLMConfig(
    model="gpt-4",                      # Primary model
    api_key="your-api-key",             # API key for the primary provider
    base_url="https://api.openai.com",  # Override the API base URL
    temperature=0.7,                    # Sampling temperature
    max_tokens=500,                     # Maximum tokens to generate
    stream_timeout=60,                  # Streaming timeout in seconds
    fallback_models=["claude-3"],       # Models to try if the primary fails
    routing_strategy=RoutingStrategy.FALLBACK,
    enable_streaming=True
)

Advanced Configuration

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro", "llama-2"],
    routing_strategy=RoutingStrategy.COST,  # Cost-based routing
    custom_llm_params={
        "top_p": 0.9,
        "frequency_penalty": 0.1,
        "presence_penalty": 0.1
    }
)

Routing Strategies

Fallback Strategy

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.FALLBACK
)

Cost-Based Routing

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.COST
)

Latency-Based Routing

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3", "gemini-pro"],
    routing_strategy=RoutingStrategy.LATENCY
)
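
The three strategies differ only in how the list of candidate models is ordered before dispatch: `FALLBACK` keeps the declared order, `COST` sorts by price, and `LATENCY` sorts by observed response time. A minimal, self-contained sketch of that ordering logic (the per-model cost and latency figures below are illustrative placeholders, not LiteLLM internals or real provider pricing):

```python
# Illustrative sketch of routing as candidate ordering.
# Cost/latency figures are made-up placeholders, not real provider data.
MODELS = {
    "gpt-4":      {"cost_per_1k": 0.030, "avg_latency_ms": 900},
    "claude-3":   {"cost_per_1k": 0.015, "avg_latency_ms": 700},
    "gemini-pro": {"cost_per_1k": 0.001, "avg_latency_ms": 500},
}

def order_models(primary: str, fallbacks: list[str], strategy: str) -> list[str]:
    candidates = [primary] + fallbacks
    if strategy == "fallback":
        return candidates  # declared order, primary first
    if strategy == "cost":
        return sorted(candidates, key=lambda m: MODELS[m]["cost_per_1k"])
    if strategy == "latency":
        return sorted(candidates, key=lambda m: MODELS[m]["avg_latency_ms"])
    raise ValueError(f"unknown strategy: {strategy}")

print(order_models("gpt-4", ["claude-3", "gemini-pro"], "cost"))
```

Whichever strategy is chosen, a request is dispatched to the first model in the resulting order and falls through to the next on failure.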

Streaming

Server-Sent Events (SSE)

# Enable streaming
config = LiteLLMConfig(
    model="gpt-4",
    enable_streaming=True
)

llm = LiteLLMProvider(config)

# Stream response
async for chunk in llm.astream(messages):
    print(chunk.choices[0].delta.content)
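
Each streamed chunk carries only a delta of the response, so a client that needs the full text accumulates the pieces as they arrive. A self-contained sketch of that pattern, using a stub async generator in place of `llm.astream(...)` (with a real stream you would append `chunk.choices[0].delta.content` instead of the raw chunk):

```python
import asyncio

# Stub stream standing in for llm.astream(messages).
async def fake_stream():
    for piece in ["Hel", "lo, ", "world!"]:
        yield piece

async def collect(stream) -> str:
    parts = []
    async for chunk in stream:
        parts.append(chunk)  # with LiteLLM: chunk.choices[0].delta.content
    return "".join(parts)

full = asyncio.run(collect(fake_stream()))
print(full)  # Hello, world!
```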

WebSocket Streaming

from packages.llm import StreamingHandler, StreamFormat

handler = StreamingHandler(llm, StreamFormat.WEBSOCKET)

# Stream over WebSocket
await handler.stream_websocket(websocket, messages)

Cost Tracking

# Track costs automatically
config = LiteLLMConfig(
    model="gpt-4",
    enable_cost_tracking=True
)

# Get cost information
cost_info = llm.get_cost_info()
print(f"Total cost: ${cost_info.total_cost}")
print(f"Token usage: {cost_info.total_tokens}")
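
Under the hood, cost tracking amounts to multiplying token counts by a per-model rate and accumulating the totals. A minimal sketch of that bookkeeping (the rates are illustrative placeholders, not real pricing):

```python
# Illustrative cost accumulator; rates are placeholders, not real pricing.
class CostTracker:
    RATES_PER_1K = {"gpt-4": 0.030, "claude-3": 0.015}

    def __init__(self):
        self.total_tokens = 0
        self.total_cost = 0.0

    def record(self, model: str, tokens: int) -> None:
        self.total_tokens += tokens
        self.total_cost += tokens / 1000 * self.RATES_PER_1K[model]

tracker = CostTracker()
tracker.record("gpt-4", 500)      # 500 tokens at $0.030/1k
tracker.record("claude-3", 2000)  # 2000 tokens at $0.015/1k
print(f"Total cost: ${tracker.total_cost:.3f}, tokens: {tracker.total_tokens}")
```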

Error Handling

try:
    response = await llm.ainvoke(messages)
except Exception as e:
    # LiteLLM has already tried the configured fallback models by this
    # point, so reaching this branch means every candidate failed.
    print(f"All models failed: {e}")
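
If you need this behavior outside the provider, for instance with heterogeneous clients, the underlying pattern is a simple loop over candidates. A self-contained sketch, where `call_model` is a stub standing in for a real provider call (here it simulates a primary-model outage):

```python
import asyncio

# Stub standing in for a real provider call; "gpt-4" simulates an outage.
async def call_model(model: str, messages: list) -> str:
    if model == "gpt-4":
        raise RuntimeError("rate limited")
    return f"{model}: ok"

async def invoke_with_fallback(models: list[str], messages: list) -> str:
    last_err = None
    for model in models:
        try:
            return await call_model(model, messages)
        except Exception as e:  # try the next candidate
            last_err = e
    raise RuntimeError("all models failed") from last_err

result = asyncio.run(invoke_with_fallback(["gpt-4", "claude-3"], []))
print(result)  # claude-3: ok
```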

Supported Providers

OpenAI

config = LiteLLMConfig(
    model="gpt-4",
    api_key="your-openai-key"
)

Anthropic

config = LiteLLMConfig(
    model="claude-3-opus",
    api_key="your-anthropic-key"
)

Google

config = LiteLLMConfig(
    model="gemini-pro",
    api_key="your-google-key"
)

Local Models

config = LiteLLMConfig(
    model="ollama/llama2",
    base_url="http://localhost:11434"
)

Best Practices

  1. Use Fallbacks: Always configure fallback models for reliability
  2. Monitor Costs: Enable cost tracking for budget management
  3. Optimize Routing: Choose routing strategy based on your needs
  4. Handle Errors: Implement proper error handling for production
  5. Use Streaming: Enable streaming for better user experience
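
Putting those recommendations together, a production-leaning configuration might look like the sketch below, using only the parameters documented on this page:

```python
from packages.llm import LiteLLMProvider, LiteLLMConfig, RoutingStrategy

config = LiteLLMConfig(
    model="gpt-4",
    fallback_models=["claude-3-opus", "gemini-pro"],  # reliability (1)
    routing_strategy=RoutingStrategy.COST,            # routing fit (3)
    enable_cost_tracking=True,                        # budget visibility (2)
    enable_streaming=True,                            # responsiveness (5)
    stream_timeout=60,
)
llm = LiteLLMProvider(config)
```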

Migration from ProviderFactory

# Old way
from packages.llm import ProviderFactory
factory = ProviderFactory(config)
llm = factory.get_provider()

# New way
from packages.llm import LiteLLMProvider
llm = LiteLLMProvider(config)

API Reference

LiteLLMConfig

Parameter          Type             Description
model              str              Primary model name
api_key            str              API key for the model
base_url           str              Base URL for the API
temperature        float            Temperature for generation
max_tokens         int              Maximum tokens to generate
fallback_models    List[str]        Fallback model names
routing_strategy   RoutingStrategy  Routing strategy
enable_streaming   bool             Enable streaming support

LiteLLMProvider

Method             Description
invoke(messages)   Synchronous invocation
ainvoke(messages)  Asynchronous invocation
astream(messages)  Asynchronous streaming
get_cost_info()    Get cost information