Cache Configuration Guide

This guide covers all configuration options for the Intelligent Caching System, including cache layers, semantic matching, warming strategies, and performance tuning.

Basic Configuration

CacheConfig

The main configuration class that controls all caching behavior:

from packages.caching import CacheConfig

config = CacheConfig(
    # General settings
    max_size_bytes=1024 * 1024 * 1024,  # 1 GB
    max_entries=10000,
    default_ttl_seconds=3600,           # 1 hour
    cleanup_interval_seconds=300,       # 5 minutes

    # Semantic matching
    semantic_threshold=0.85,
    max_semantic_candidates=10,
    embedding_dimension=1536,

    # Warming settings
    warming_enabled=True,
    warming_batch_size=100,
    warming_interval_seconds=1800,      # 30 minutes
    pattern_analysis_window_hours=24,

    # Compression
    compression_enabled=True,
    compression_threshold_bytes=1024,   # 1 KB
    compression_algorithm="gzip",

    # Eviction
    eviction_policy="lru",              # lru, lfu, ttl, hybrid
    eviction_threshold=0.8,             # Start evicting at 80% capacity

    # Distributed caching
    distributed_enabled=False,
    cluster_nodes=[],
    replication_factor=2,
    consistency_level="eventual",       # eventual, strong

    # Monitoring
    metrics_enabled=True,
    detailed_logging=False,
    performance_tracking=True,
)

Configuration Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| max_size_bytes | int | 1 GB | Maximum memory usage for cache |
| max_entries | int | 10000 | Maximum number of cache entries |
| default_ttl_seconds | int | 3600 | Default time-to-live for entries |
| cleanup_interval_seconds | int | 300 | Background cleanup interval |
| semantic_threshold | float | 0.85 | Minimum similarity for semantic matches |
| max_semantic_candidates | int | 10 | Maximum candidates for semantic search |
| embedding_dimension | int | 1536 | Expected embedding dimension |
| warming_enabled | bool | True | Enable cache warming |
| warming_batch_size | int | 100 | Batch size for warming operations |
| warming_interval_seconds | int | 1800 | Warming operation interval |
| pattern_analysis_window_hours | int | 24 | Time window for pattern analysis |
| compression_enabled | bool | True | Enable data compression |
| compression_threshold_bytes | int | 1024 | Minimum size for compression |
| compression_algorithm | str | "gzip" | Compression algorithm |
| eviction_policy | str | "lru" | Cache eviction policy |
| eviction_threshold | float | 0.8 | Eviction trigger threshold |
| distributed_enabled | bool | False | Enable distributed caching |
| cluster_nodes | List[str] | [] | List of cluster node addresses |
| replication_factor | int | 2 | Number of replicas per entry |
| consistency_level | str | "eventual" | Consistency level for distributed cache |
| metrics_enabled | bool | True | Enable metrics collection |
| detailed_logging | bool | False | Enable detailed logging |
| performance_tracking | bool | True | Enable performance tracking |
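
Every parameter has the default shown above, so a configuration usually only needs to override a handful of values. Assuming CacheConfig fills in the documented defaults for anything you omit, a minimal override might look like:

from packages.caching import CacheConfig

# Only the listed parameters change; everything else keeps its documented default
config = CacheConfig(
    max_entries=20000,
    eviction_policy="lfu",
)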

Cache Layer Settings

Embedding Cache Configuration

from packages.caching import EmbeddingCache

# Create an embedding cache with custom settings
# (cache_manager is an existing cache manager instance created earlier)
embedding_cache = EmbeddingCache(
    cache_manager=cache_manager,
    config=CacheConfig(
        # Embedding-specific settings
        default_ttl_seconds=7200,   # 2 hours for embeddings
        semantic_threshold=0.9,     # Higher threshold for embeddings
        max_semantic_candidates=5,  # Fewer candidates for embeddings
    ),
)

Search Result Cache Configuration

from packages.caching import SearchResultCache

# Create search result cache
search_cache = SearchResultCache(
    cache_manager=cache_manager,
    config=CacheConfig(
        # Search-specific settings
        default_ttl_seconds=1800,   # 30 minutes for search results
        semantic_threshold=0.8,     # Moderate threshold for search
        max_semantic_candidates=10,
    ),
)

LLM Response Cache Configuration

from packages.caching import LLMResponseCache

# Create LLM response cache
llm_cache = LLMResponseCache(
    cache_manager=cache_manager,
    config=CacheConfig(
        # LLM-specific settings
        default_ttl_seconds=3600,   # 1 hour for LLM responses
        semantic_threshold=0.85,    # High threshold for LLM responses
        max_semantic_candidates=3,  # Fewer candidates for LLM responses
    ),
)

Semantic Matching Configuration

Similarity Metrics

from packages.caching.semantic import SimilarityMetric, EmbeddingSimilarity

# Configure the similarity metric
similarity = EmbeddingSimilarity(
    metric=SimilarityMetric.COSINE  # or EUCLIDEAN, DOT_PRODUCT, MANHATTAN
)

# Configure the semantic matcher
from packages.caching.semantic import SemanticMatcher

matcher = SemanticMatcher(
    similarity_threshold=0.85,
    max_candidates=10,
    metric=SimilarityMetric.COSINE,
)

Query Similarity Settings

from packages.caching.semantic import QuerySimilarity

query_similarity = QuerySimilarity(
    similarity_threshold=0.8  # Threshold for query similarity
)

Cache Key Generation

from packages.caching.semantic import SemanticCacheKey

# Generate keys for different content types
embedding_key = SemanticCacheKey.generate_embedding_key(
    text="What is machine learning?",
    model_name="text-embedding-ada-002",
)

query_key = SemanticCacheKey.generate_query_key(
    query="What is ML?",
    context={"user_id": "123", "session": "abc"},
)

search_key = SemanticCacheKey.generate_search_key(
    query="machine learning",
    filters={"category": "AI", "year": 2023},
    limit=10,
)

llm_key = SemanticCacheKey.generate_llm_key(
    prompt="Explain machine learning",
    model="gpt-4",
    temperature=0.1,
    max_tokens=1000,
)
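
These keys are what the individual cache layers use to look entries up and write them back. The following is a rough usage sketch only; it assumes the cache manager exposes async get and set methods, so check the actual CacheManager API before relying on it:

# Hypothetical usage of a generated key against the cache manager
cached = await cache_manager.get(embedding_key)
if cached is None:
    embedding = await embed_text("What is machine learning?")  # your embedding call, not part of the package
    await cache_manager.set(embedding_key, embedding, ttl_seconds=7200)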

Cache Warming Settings

Warming Strategy Configuration

from packages.caching.warming import WarmingStrategy

# Create a warming strategy
strategy = WarmingStrategy(
    enabled=True,
    batch_size=100,
    interval_seconds=1800,   # 30 minutes
    max_concurrent=10,
    priority_threshold=0.7,
    time_window_hours=24,
)

# Register the strategy with a warmer
from packages.caching.warming import PredictiveWarmer

warmer = PredictiveWarmer(config)
warmer.add_warming_strategy("default", strategy)

Pattern Analysis Configuration

from packages.caching.warming import QueryPatternAnalyzer

analyzer = QueryPatternAnalyzer(config)

# Analyze query patterns over the configured window
analysis = analyzer.analyze_patterns(
    time_window_hours=24,
    min_frequency=2,
)

User Behavior Analysis

from packages.caching.warming import UserBehaviorAnalyzer

behavior_analyzer = UserBehaviorAnalyzer(config)

# Track user actions
behavior_analyzer.track_user_action(
    user_id="user123",
    action="query",
    context={"query": "What is ML?", "source": "web"},
)

# Analyze behavior
behavior = behavior_analyzer.analyze_user_behavior("user123")

Distributed Caching

Cluster Configuration

from packages.caching.distributed import CacheNode, CacheCluster

# Create cluster nodes
nodes = [
    CacheNode(
        node_id="node1",
        host="cache1.example.com",
        port=6379,
    ),
    CacheNode(
        node_id="node2",
        host="cache2.example.com",
        port=6379,
    ),
    CacheNode(
        node_id="node3",
        host="cache3.example.com",
        port=6379,
    ),
]

# Create the cluster and add the nodes
cluster = CacheCluster(config)
for node in nodes:
    await cluster.add_node(node)

# Start health monitoring
await cluster.start_health_checks()

Replication Settings

from packages.caching.distributed import CacheReplication

# Configure replication settings before creating the replicator
config.replication_factor = 3
config.consistency_level = "strong"  # or "eventual"

replication = CacheReplication(config)

# Start replication
await replication.start()

Distributed Cache Backend

from packages.caching.distributed import DistributedCache
from packages.caching import CacheLayer  # assumed export path for CacheLayer; adjust if needed

distributed_cache = DistributedCache(config)
await distributed_cache.initialize()

# Get the backend for a specific layer
backend = distributed_cache.get_backend(CacheLayer.EMBEDDING)

Performance Optimization

Memory Optimization

from packages.caching.optimization import MemoryOptimizer, OptimizationConfig

opt_config = OptimizationConfig(
    memory_limit_bytes=2 * 1024 * 1024 * 1024,  # 2 GB
    memory_cleanup_interval=300,                # 5 minutes
    memory_pressure_threshold=0.8,              # 80%
)

memory_optimizer = MemoryOptimizer(opt_config)

Compression Settings

from packages.caching.optimization import CompressionEngine, CompressionAlgorithm, OptimizationConfig

compression_engine = CompressionEngine(OptimizationConfig(
    compression_enabled=True,
    compression_threshold_bytes=1024,
    compression_algorithm=CompressionAlgorithm.GZIP,
    compression_level=6,
))

Eviction Policies

from packages.caching.optimization import EvictionPolicy, EvictionPolicyManager, OptimizationConfig

eviction_manager = EvictionPolicyManager(OptimizationConfig(
    eviction_policy=EvictionPolicy.HYBRID,  # or LRU, LFU, TTL, SIZE_BASED, COST_BASED
    eviction_threshold=0.9,
    eviction_batch_size=100,
))

Cache Optimizer

from packages.caching.optimization import CacheOptimizer, EvictionPolicy, OptimizationConfig

optimizer = CacheOptimizer(OptimizationConfig(
    compression_enabled=True,
    memory_limit_bytes=1024 * 1024 * 1024,  # 1 GB
    eviction_policy=EvictionPolicy.HYBRID,
))

await optimizer.start()

Monitoring Configuration

Cache Monitor

from packages.caching.monitoring import CacheMonitor
from packages.caching import CacheLayer  # assumed export path for CacheLayer; adjust if needed

monitor = CacheMonitor(config)
await monitor.start()

# Record cache operations
await monitor.record_request(
    layer=CacheLayer.EMBEDDING,
    is_hit=True,
    response_time_ms=50.0,
)

Analytics Configuration

from packages.caching.monitoring import CacheAnalytics

analytics = CacheAnalytics(monitor)
await analytics.start()

# Get insights
insights = analytics.get_insights()
performance_summary = analytics.get_performance_summary()

Dashboard Setup

from packages.caching.monitoring import CacheDashboard

dashboard = CacheDashboard(monitor, analytics)
await dashboard.start()

# Get dashboard data
dashboard_data = dashboard.get_dashboard_data()
layer_data = dashboard.get_layer_dashboard(CacheLayer.EMBEDDING)

Environment Variables

Basic Settings

# Cache size and limits
CACHE_MAX_SIZE_BYTES=1073741824 # 1GB
CACHE_MAX_ENTRIES=10000
CACHE_DEFAULT_TTL_SECONDS=3600

# Semantic matching
CACHE_SEMANTIC_THRESHOLD=0.85
CACHE_MAX_SEMANTIC_CANDIDATES=10
CACHE_EMBEDDING_DIMENSION=1536

# Warming
CACHE_WARMING_ENABLED=true
CACHE_WARMING_BATCH_SIZE=100
CACHE_WARMING_INTERVAL_SECONDS=1800

# Compression
CACHE_COMPRESSION_ENABLED=true
CACHE_COMPRESSION_THRESHOLD_BYTES=1024
CACHE_COMPRESSION_ALGORITHM=gzip

# Eviction
CACHE_EVICTION_POLICY=lru
CACHE_EVICTION_THRESHOLD=0.8

# Distributed caching
CACHE_DISTRIBUTED_ENABLED=false
CACHE_CLUSTER_NODES=node1:6379,node2:6379,node3:6379
CACHE_REPLICATION_FACTOR=2
CACHE_CONSISTENCY_LEVEL=eventual

# Monitoring
CACHE_METRICS_ENABLED=true
CACHE_DETAILED_LOGGING=false
CACHE_PERFORMANCE_TRACKING=true

Advanced Settings

# Memory optimization
CACHE_MEMORY_LIMIT_BYTES=2147483648 # 2GB
CACHE_MEMORY_CLEANUP_INTERVAL=300
CACHE_MEMORY_PRESSURE_THRESHOLD=0.8

# Performance tuning
CACHE_MAX_CONCURRENT_OPERATIONS=100
CACHE_OPERATION_TIMEOUT_SECONDS=30
CACHE_BATCH_PROCESSING_SIZE=50

# Pattern analysis
CACHE_PATTERN_ANALYSIS_WINDOW_HOURS=24
CACHE_MIN_FREQUENCY_THRESHOLD=2

# Health checks
CACHE_HEALTH_CHECK_INTERVAL=30
CACHE_NODE_TIMEOUT_SECONDS=30
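
The caching system may pick these variables up on its own at startup; if you need to wire them into a CacheConfig yourself, a minimal sketch follows (the _env_bool helper and the exact variable handling are illustrative, not part of the package):

import os

from packages.caching import CacheConfig

def _env_bool(name: str, default: bool) -> bool:
    # Illustrative helper: treat "1", "true", "yes" (any case) as True
    return os.environ.get(name, str(default)).strip().lower() in ("1", "true", "yes")

# Build a CacheConfig from a few of the variables listed above
config = CacheConfig(
    max_size_bytes=int(os.environ.get("CACHE_MAX_SIZE_BYTES", 1024 * 1024 * 1024)),
    max_entries=int(os.environ.get("CACHE_MAX_ENTRIES", 10000)),
    default_ttl_seconds=int(os.environ.get("CACHE_DEFAULT_TTL_SECONDS", 3600)),
    semantic_threshold=float(os.environ.get("CACHE_SEMANTIC_THRESHOLD", 0.85)),
    warming_enabled=_env_bool("CACHE_WARMING_ENABLED", True),
    compression_enabled=_env_bool("CACHE_COMPRESSION_ENABLED", True),
    eviction_policy=os.environ.get("CACHE_EVICTION_POLICY", "lru"),
)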

Configuration Examples

Development Environment

# Development configuration - smaller, faster
dev_config = CacheConfig(
    max_size_bytes=256 * 1024 * 1024,  # 256 MB
    max_entries=1000,
    default_ttl_seconds=1800,          # 30 minutes
    semantic_threshold=0.8,
    warming_enabled=False,             # Disable warming in dev
    compression_enabled=False,         # Disable compression in dev
    detailed_logging=True,
)

Production Environment

# Production configuration - optimized for performance
prod_config = CacheConfig(
    max_size_bytes=4 * 1024 * 1024 * 1024,  # 4 GB
    max_entries=50000,
    default_ttl_seconds=7200,               # 2 hours
    semantic_threshold=0.85,
    warming_enabled=True,
    warming_batch_size=200,
    compression_enabled=True,
    eviction_policy="hybrid",
    distributed_enabled=True,
    cluster_nodes=[
        {"host": "cache1.prod.com", "port": 6379},
        {"host": "cache2.prod.com", "port": 6379},
        {"host": "cache3.prod.com", "port": 6379},
    ],
    replication_factor=3,
    consistency_level="strong",
    metrics_enabled=True,
    performance_tracking=True,
)

High-Performance Environment

# High-performance configuration - maximum speed
perf_config = CacheConfig(
    max_size_bytes=8 * 1024 * 1024 * 1024,  # 8 GB
    max_entries=100000,
    default_ttl_seconds=14400,              # 4 hours
    semantic_threshold=0.9,
    max_semantic_candidates=5,
    warming_enabled=True,
    warming_batch_size=500,
    warming_interval_seconds=900,           # 15 minutes
    compression_enabled=True,
    compression_algorithm="lz4",            # Faster compression
    eviction_policy="cost_based",
    eviction_threshold=0.95,
    distributed_enabled=True,
    replication_factor=2,
    consistency_level="eventual",           # Eventual consistency for speed
    metrics_enabled=True,
    detailed_logging=False,
    performance_tracking=True,
)

Memory-Constrained Environment

# Memory-constrained configuration - minimal memory usage
memory_config = CacheConfig(
    max_size_bytes=512 * 1024 * 1024,  # 512 MB
    max_entries=5000,
    default_ttl_seconds=1800,          # 30 minutes
    semantic_threshold=0.8,
    max_semantic_candidates=3,
    warming_enabled=True,
    warming_batch_size=50,
    compression_enabled=True,
    compression_threshold_bytes=512,   # Compress smaller items
    eviction_policy="lru",
    eviction_threshold=0.7,            # Evict earlier
    cleanup_interval_seconds=60,       # More frequent cleanup
    metrics_enabled=True,
    detailed_logging=False,
)

Best Practices

1. Size Your Cache Appropriately

  • Start with 1-2GB for most applications
  • Monitor memory usage and adjust accordingly
  • Consider your data patterns and access frequency

2. Tune Semantic Thresholds

  • Higher thresholds (0.9+) for critical applications
  • Lower thresholds (0.7-0.8) for broader matching
  • Test with your specific data to find the optimal value; a quick way to sanity-check a candidate threshold is sketched below
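
As a quick, package-independent sanity check, compute cosine similarity between embeddings of queries you consider equivalent and queries you consider unrelated, and see which side of a candidate threshold each pair lands on. The sketch below uses plain numpy with made-up vectors; substitute embeddings from your own model:

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity = dot product of the two vectors divided by the product of their norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder vectors; replace with real embeddings of paraphrased vs. unrelated queries
emb_a = np.random.rand(1536)
emb_b = emb_a + 0.05 * np.random.rand(1536)  # stand-in for a close paraphrase
emb_c = np.random.rand(1536)                 # stand-in for an unrelated query

threshold = 0.85
print(cosine_similarity(emb_a, emb_b) >= threshold)  # paraphrase pair: expect True
print(cosine_similarity(emb_a, emb_c) >= threshold)  # unrelated pair: expect False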

3. Configure Warming Strategically

  • Enable warming for production environments
  • Adjust batch sizes based on your load patterns
  • Monitor warming effectiveness and adjust intervals

4. Optimize Eviction Policies

  • Use "hybrid" for balanced performance
  • Use "cost_based" for compute-intensive applications
  • Use "lru" for simple, predictable workloads

5. Monitor and Adjust

  • Enable comprehensive monitoring
  • Set up alerts for performance issues
  • Regularly review and adjust configuration

6. Test Configuration Changes

  • Test in development environment first
  • Use gradual rollout for production changes
  • Monitor performance impact of changes

This configuration guide provides comprehensive coverage of all caching system options. Choose the settings that best match your application's requirements and performance goals.