# Cache Configuration Guide

This guide covers all configuration options for the Intelligent Caching System, including cache layers, semantic matching, warming strategies, and performance tuning.
## Table of Contents

- Basic Configuration
- Cache Layer Settings
- Semantic Matching Configuration
- Cache Warming Settings
- Distributed Caching
- Performance Optimization
- Monitoring Configuration
- Environment Variables
- Configuration Examples
## Basic Configuration

### CacheConfig

The main configuration class that controls all caching behavior:

```python
from packages.caching import CacheConfig

config = CacheConfig(
    # General settings
    max_size_bytes=1024 * 1024 * 1024,   # 1 GB
    max_entries=10000,
    default_ttl_seconds=3600,            # 1 hour
    cleanup_interval_seconds=300,        # 5 minutes

    # Semantic matching
    semantic_threshold=0.85,
    max_semantic_candidates=10,
    embedding_dimension=1536,

    # Warming settings
    warming_enabled=True,
    warming_batch_size=100,
    warming_interval_seconds=1800,       # 30 minutes
    pattern_analysis_window_hours=24,

    # Compression
    compression_enabled=True,
    compression_threshold_bytes=1024,    # 1 KB
    compression_algorithm="gzip",

    # Eviction
    eviction_policy="lru",               # lru, lfu, ttl, hybrid
    eviction_threshold=0.8,              # Start evicting at 80% capacity

    # Distributed caching
    distributed_enabled=False,
    cluster_nodes=[],
    replication_factor=2,
    consistency_level="eventual",        # eventual, strong

    # Monitoring
    metrics_enabled=True,
    detailed_logging=False,
    performance_tracking=True
)
```
### Configuration Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `max_size_bytes` | `int` | 1 GB | Maximum memory usage for the cache |
| `max_entries` | `int` | 10000 | Maximum number of cache entries |
| `default_ttl_seconds` | `int` | 3600 | Default time-to-live for entries |
| `cleanup_interval_seconds` | `int` | 300 | Background cleanup interval |
| `semantic_threshold` | `float` | 0.85 | Minimum similarity for semantic matches |
| `max_semantic_candidates` | `int` | 10 | Maximum candidates for semantic search |
| `embedding_dimension` | `int` | 1536 | Expected embedding dimension |
| `warming_enabled` | `bool` | True | Enable cache warming |
| `warming_batch_size` | `int` | 100 | Batch size for warming operations |
| `warming_interval_seconds` | `int` | 1800 | Warming operation interval |
| `pattern_analysis_window_hours` | `int` | 24 | Time window for pattern analysis |
| `compression_enabled` | `bool` | True | Enable data compression |
| `compression_threshold_bytes` | `int` | 1024 | Minimum size for compression |
| `compression_algorithm` | `str` | "gzip" | Compression algorithm |
| `eviction_policy` | `str` | "lru" | Cache eviction policy |
| `eviction_threshold` | `float` | 0.8 | Eviction trigger threshold |
| `distributed_enabled` | `bool` | False | Enable distributed caching |
| `cluster_nodes` | `List[str]` | [] | List of cluster node addresses |
| `replication_factor` | `int` | 2 | Number of replicas per entry |
| `consistency_level` | `str` | "eventual" | Consistency level for the distributed cache |
| `metrics_enabled` | `bool` | True | Enable metrics collection |
| `detailed_logging` | `bool` | False | Enable detailed logging |
| `performance_tracking` | `bool` | True | Enable performance tracking |
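To make the interaction between `max_size_bytes` and `eviction_threshold` concrete: eviction begins once cache usage crosses `eviction_threshold × max_size_bytes`. A minimal, standalone sketch of that trigger check (the `should_evict` helper is illustrative, not part of the package API):

```python
def should_evict(current_size_bytes: int,
                 max_size_bytes: int = 1024 ** 3,    # 1 GB default
                 eviction_threshold: float = 0.8) -> bool:
    """Eviction starts once usage crosses threshold * capacity."""
    return current_size_bytes >= eviction_threshold * max_size_bytes

# With the defaults, eviction starts at 80% of 1 GB.
trigger_bytes = int(0.8 * 1024 ** 3)  # 858993459 bytes (~819 MB)
```

So with the defaults, a cache holding 900 MB would already be evicting, while one holding 700 MB would not.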
## Cache Layer Settings

### Embedding Cache Configuration

```python
from packages.caching import EmbeddingCache

# Create an embedding cache with custom settings
# (assumes an existing CacheManager instance named `cache_manager`)
embedding_cache = EmbeddingCache(
    cache_manager=cache_manager,
    config=CacheConfig(
        # Embedding-specific settings
        default_ttl_seconds=7200,    # 2 hours for embeddings
        semantic_threshold=0.9,      # Higher threshold for embeddings
        max_semantic_candidates=5    # Fewer candidates for embeddings
    )
)
```
### Search Result Cache Configuration

```python
from packages.caching import SearchResultCache

# Create a search result cache
search_cache = SearchResultCache(
    cache_manager=cache_manager,
    config=CacheConfig(
        # Search-specific settings
        default_ttl_seconds=1800,    # 30 minutes for search results
        semantic_threshold=0.8,      # Moderate threshold for search
        max_semantic_candidates=10
    )
)
```
### LLM Response Cache Configuration

```python
from packages.caching import LLMResponseCache

# Create an LLM response cache
llm_cache = LLMResponseCache(
    cache_manager=cache_manager,
    config=CacheConfig(
        # LLM-specific settings
        default_ttl_seconds=3600,    # 1 hour for LLM responses
        semantic_threshold=0.85,     # High threshold for LLM responses
        max_semantic_candidates=3    # Fewer candidates for LLM responses
    )
)
```
## Semantic Matching Configuration

### Similarity Metrics

```python
from packages.caching.semantic import SimilarityMetric, EmbeddingSimilarity, SemanticMatcher

# Configure the similarity metric
similarity = EmbeddingSimilarity(
    metric=SimilarityMetric.COSINE  # or EUCLIDEAN, DOT_PRODUCT, MANHATTAN
)

# Configure the semantic matcher
matcher = SemanticMatcher(
    similarity_threshold=0.85,
    max_candidates=10,
    metric=SimilarityMetric.COSINE
)
```
### Query Similarity Settings

```python
from packages.caching.semantic import QuerySimilarity

query_similarity = QuerySimilarity(
    similarity_threshold=0.8  # Threshold for query similarity
)
```
### Cache Key Generation

```python
from packages.caching.semantic import SemanticCacheKey

# Generate keys for different content types
embedding_key = SemanticCacheKey.generate_embedding_key(
    text="What is machine learning?",
    model_name="text-embedding-ada-002"
)

query_key = SemanticCacheKey.generate_query_key(
    query="What is ML?",
    context={"user_id": "123", "session": "abc"}
)

search_key = SemanticCacheKey.generate_search_key(
    query="machine learning",
    filters={"category": "AI", "year": 2023},
    limit=10
)

llm_key = SemanticCacheKey.generate_llm_key(
    prompt="Explain machine learning",
    model="gpt-4",
    temperature=0.1,
    max_tokens=1000
)
```
## Cache Warming Settings

### Warming Strategy Configuration

```python
from packages.caching.warming import WarmingStrategy, PredictiveWarmer

# Create a warming strategy
strategy = WarmingStrategy(
    enabled=True,
    batch_size=100,
    interval_seconds=1800,  # 30 minutes
    max_concurrent=10,
    priority_threshold=0.7,
    time_window_hours=24
)

# Register the strategy with a warmer
warmer = PredictiveWarmer(config)
warmer.add_warming_strategy("default", strategy)
```
### Pattern Analysis Configuration

```python
from packages.caching.warming import QueryPatternAnalyzer

analyzer = QueryPatternAnalyzer(config)

# Configure pattern analysis
analysis = analyzer.analyze_patterns(
    time_window_hours=24,
    min_frequency=2
)
```
### User Behavior Analysis

```python
from packages.caching.warming import UserBehaviorAnalyzer

behavior_analyzer = UserBehaviorAnalyzer(config)

# Track user actions
behavior_analyzer.track_user_action(
    user_id="user123",
    action="query",
    context={"query": "What is ML?", "source": "web"}
)

# Analyze behavior
behavior = behavior_analyzer.analyze_user_behavior("user123")
```
## Distributed Caching

### Cluster Configuration

```python
from packages.caching.distributed import CacheNode, CacheCluster

# Create cluster nodes
nodes = [
    CacheNode(node_id="node1", host="cache1.example.com", port=6379),
    CacheNode(node_id="node2", host="cache2.example.com", port=6379),
    CacheNode(node_id="node3", host="cache3.example.com", port=6379)
]

# Create the cluster and register the nodes
cluster = CacheCluster(config)
for node in nodes:
    await cluster.add_node(node)

# Start health monitoring
await cluster.start_health_checks()
```
### Replication Settings

```python
from packages.caching.distributed import CacheReplication

# Configure replication before starting it
config.replication_factor = 3
config.consistency_level = "strong"  # or "eventual"

replication = CacheReplication(config)
await replication.start()
```
### Distributed Cache Backend

```python
from packages.caching import CacheLayer  # import path for CacheLayer may differ in your setup
from packages.caching.distributed import DistributedCache

distributed_cache = DistributedCache(config)
await distributed_cache.initialize()

# Get the backend for a specific layer
backend = distributed_cache.get_backend(CacheLayer.EMBEDDING)
```
## Performance Optimization

### Memory Optimization

```python
from packages.caching.optimization import MemoryOptimizer, OptimizationConfig

opt_config = OptimizationConfig(
    memory_limit_bytes=2 * 1024 * 1024 * 1024,  # 2 GB
    memory_cleanup_interval=300,                # 5 minutes
    memory_pressure_threshold=0.8               # 80%
)

memory_optimizer = MemoryOptimizer(opt_config)
```
### Compression Settings

```python
from packages.caching.optimization import CompressionEngine, CompressionAlgorithm

compression_engine = CompressionEngine(OptimizationConfig(
    compression_enabled=True,
    compression_threshold_bytes=1024,
    compression_algorithm=CompressionAlgorithm.GZIP,
    compression_level=6
))
```
### Eviction Policies

```python
from packages.caching.optimization import EvictionPolicy, EvictionPolicyManager

eviction_manager = EvictionPolicyManager(OptimizationConfig(
    eviction_policy=EvictionPolicy.HYBRID,  # or LRU, LFU, TTL, SIZE_BASED, COST_BASED
    eviction_threshold=0.9,
    eviction_batch_size=100
))
```
### Cache Optimizer

```python
from packages.caching.optimization import CacheOptimizer

optimizer = CacheOptimizer(OptimizationConfig(
    compression_enabled=True,
    memory_limit_bytes=1024 * 1024 * 1024,
    eviction_policy=EvictionPolicy.HYBRID
))

await optimizer.start()
```
## Monitoring Configuration

### Cache Monitor

```python
from packages.caching import CacheLayer  # import path for CacheLayer may differ in your setup
from packages.caching.monitoring import CacheMonitor

monitor = CacheMonitor(config)
await monitor.start()

# Record cache operations
await monitor.record_request(
    layer=CacheLayer.EMBEDDING,
    is_hit=True,
    response_time_ms=50.0
)
```
### Analytics Configuration

```python
from packages.caching.monitoring import CacheAnalytics

analytics = CacheAnalytics(monitor)
await analytics.start()

# Get insights
insights = analytics.get_insights()
performance_summary = analytics.get_performance_summary()
```
### Dashboard Setup

```python
from packages.caching.monitoring import CacheDashboard

dashboard = CacheDashboard(monitor, analytics)
await dashboard.start()

# Get dashboard data
dashboard_data = dashboard.get_dashboard_data()
layer_data = dashboard.get_layer_dashboard(CacheLayer.EMBEDDING)
```
## Environment Variables

### Basic Settings

```bash
# Cache size and limits
CACHE_MAX_SIZE_BYTES=1073741824  # 1 GB
CACHE_MAX_ENTRIES=10000
CACHE_DEFAULT_TTL_SECONDS=3600

# Semantic matching
CACHE_SEMANTIC_THRESHOLD=0.85
CACHE_MAX_SEMANTIC_CANDIDATES=10
CACHE_EMBEDDING_DIMENSION=1536

# Warming
CACHE_WARMING_ENABLED=true
CACHE_WARMING_BATCH_SIZE=100
CACHE_WARMING_INTERVAL_SECONDS=1800

# Compression
CACHE_COMPRESSION_ENABLED=true
CACHE_COMPRESSION_THRESHOLD_BYTES=1024
CACHE_COMPRESSION_ALGORITHM=gzip

# Eviction
CACHE_EVICTION_POLICY=lru
CACHE_EVICTION_THRESHOLD=0.8

# Distributed caching
CACHE_DISTRIBUTED_ENABLED=false
CACHE_CLUSTER_NODES=node1:6379,node2:6379,node3:6379
CACHE_REPLICATION_FACTOR=2
CACHE_CONSISTENCY_LEVEL=eventual

# Monitoring
CACHE_METRICS_ENABLED=true
CACHE_DETAILED_LOGGING=false
CACHE_PERFORMANCE_TRACKING=true
```
### Advanced Settings

```bash
# Memory optimization
CACHE_MEMORY_LIMIT_BYTES=2147483648  # 2 GB
CACHE_MEMORY_CLEANUP_INTERVAL=300
CACHE_MEMORY_PRESSURE_THRESHOLD=0.8

# Performance tuning
CACHE_MAX_CONCURRENT_OPERATIONS=100
CACHE_OPERATION_TIMEOUT_SECONDS=30
CACHE_BATCH_PROCESSING_SIZE=50

# Pattern analysis
CACHE_PATTERN_ANALYSIS_WINDOW_HOURS=24
CACHE_MIN_FREQUENCY_THRESHOLD=2

# Health checks
CACHE_HEALTH_CHECK_INTERVAL=30
CACHE_NODE_TIMEOUT_SECONDS=30
```
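This guide does not document a built-in loader for these variables, so a hypothetical helper like the following could map a subset of them onto `CacheConfig` keyword arguments; the helper name and parsing rules are assumptions to adapt to your setup:

```python
import os

def _env_bool(name: str, default: bool) -> bool:
    """Parse a boolean environment variable ("true"/"false", case-insensitive)."""
    raw = os.environ.get(name)
    return default if raw is None else raw.strip().lower() == "true"

def load_cache_settings() -> dict:
    """Read a subset of the CACHE_* variables into a dict of keyword
    arguments suitable for CacheConfig(**settings)."""
    nodes = os.environ.get("CACHE_CLUSTER_NODES", "")
    return {
        "max_size_bytes": int(os.environ.get("CACHE_MAX_SIZE_BYTES", 1024 ** 3)),
        "max_entries": int(os.environ.get("CACHE_MAX_ENTRIES", 10000)),
        "default_ttl_seconds": int(os.environ.get("CACHE_DEFAULT_TTL_SECONDS", 3600)),
        "semantic_threshold": float(os.environ.get("CACHE_SEMANTIC_THRESHOLD", 0.85)),
        "warming_enabled": _env_bool("CACHE_WARMING_ENABLED", True),
        "distributed_enabled": _env_bool("CACHE_DISTRIBUTED_ENABLED", False),
        # "node1:6379,node2:6379" -> ["node1:6379", "node2:6379"]
        "cluster_nodes": [n for n in nodes.split(",") if n],
    }
```

Unset variables fall back to the defaults listed in the parameter table, so the same code runs unchanged across environments.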
## Configuration Examples

### Development Environment

```python
# Development configuration - smaller, faster
dev_config = CacheConfig(
    max_size_bytes=256 * 1024 * 1024,  # 256 MB
    max_entries=1000,
    default_ttl_seconds=1800,          # 30 minutes
    semantic_threshold=0.8,
    warming_enabled=False,             # Disable warming in dev
    compression_enabled=False,         # Disable compression in dev
    detailed_logging=True
)
```
### Production Environment

```python
# Production configuration - optimized for performance
prod_config = CacheConfig(
    max_size_bytes=4 * 1024 * 1024 * 1024,  # 4 GB
    max_entries=50000,
    default_ttl_seconds=7200,               # 2 hours
    semantic_threshold=0.85,
    warming_enabled=True,
    warming_batch_size=200,
    compression_enabled=True,
    eviction_policy="hybrid",
    distributed_enabled=True,
    # cluster_nodes takes "host:port" strings (List[str])
    cluster_nodes=[
        "cache1.prod.com:6379",
        "cache2.prod.com:6379",
        "cache3.prod.com:6379"
    ],
    replication_factor=3,
    consistency_level="strong",
    metrics_enabled=True,
    performance_tracking=True
)
```
### High-Performance Environment

```python
# High-performance configuration - maximum speed
perf_config = CacheConfig(
    max_size_bytes=8 * 1024 * 1024 * 1024,  # 8 GB
    max_entries=100000,
    default_ttl_seconds=14400,              # 4 hours
    semantic_threshold=0.9,
    max_semantic_candidates=5,
    warming_enabled=True,
    warming_batch_size=500,
    warming_interval_seconds=900,           # 15 minutes
    compression_enabled=True,
    compression_algorithm="lz4",            # Faster compression
    eviction_policy="cost_based",
    eviction_threshold=0.95,
    distributed_enabled=True,
    replication_factor=2,
    consistency_level="eventual",           # Faster, weaker consistency
    metrics_enabled=True,
    detailed_logging=False,
    performance_tracking=True
)
```
### Memory-Constrained Environment

```python
# Memory-constrained configuration - minimal memory usage
memory_config = CacheConfig(
    max_size_bytes=512 * 1024 * 1024,  # 512 MB
    max_entries=5000,
    default_ttl_seconds=1800,          # 30 minutes
    semantic_threshold=0.8,
    max_semantic_candidates=3,
    warming_enabled=True,
    warming_batch_size=50,
    compression_enabled=True,
    compression_threshold_bytes=512,   # Compress smaller items
    eviction_policy="lru",
    eviction_threshold=0.7,            # Evict earlier
    cleanup_interval_seconds=60,       # More frequent cleanup
    metrics_enabled=True,
    detailed_logging=False
)
```
## Best Practices

1. **Size Your Cache Appropriately**
   - Start with 1-2 GB for most applications
   - Monitor memory usage and adjust accordingly
   - Consider your data patterns and access frequency
2. **Tune Semantic Thresholds**
   - Higher thresholds (0.9+) for critical applications
   - Lower thresholds (0.7-0.8) for broader matching
   - Test with your specific data to find optimal values
3. **Configure Warming Strategically**
   - Enable warming in production environments
   - Adjust batch sizes based on your load patterns
   - Monitor warming effectiveness and adjust intervals
4. **Optimize Eviction Policies**
   - Use "hybrid" for balanced performance
   - Use "cost_based" for compute-intensive applications
   - Use "lru" for simple, predictable workloads
5. **Monitor and Adjust**
   - Enable comprehensive monitoring
   - Set up alerts for performance issues
   - Regularly review and adjust the configuration
6. **Test Configuration Changes**
   - Test in a development environment first
   - Use a gradual rollout for production changes
   - Monitor the performance impact of changes
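To illustrate the threshold tuning in item 2, here is a minimal, dependency-free sketch of how a cosine-similarity threshold gates semantic cache hits; the `is_semantic_hit` helper and the toy vectors are illustrative, not part of the caching API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def is_semantic_hit(query_vec: list[float],
                    cached_vec: list[float],
                    threshold: float = 0.85) -> bool:
    """A cached entry counts as a hit only if similarity >= threshold."""
    return cosine_similarity(query_vec, cached_vec) >= threshold

# Nearly parallel vectors pass a strict threshold; orthogonal ones never do.
close = is_semantic_hit([1.0, 0.1], [1.0, 0.0], threshold=0.9)
far = is_semantic_hit([1.0, 0.0], [0.0, 1.0], threshold=0.7)
```

Raising the threshold trades recall (fewer cache hits) for precision (fewer stale or wrong answers served from cache), which is why critical applications favor 0.9+.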
This configuration guide provides comprehensive coverage of all caching system options. Choose the settings that best match your application's requirements and performance goals.