Analytics System Documentation
Overview
The RecoAgent Analytics System provides comprehensive user behavior analytics, query analysis, and optimization insights for enterprise RAG applications. It offers real-time tracking, detailed reporting, and privacy-compliant data collection.
Features
Core Analytics
- Query Analytics: Track popular questions, success rates, and user satisfaction
- User Journey Analysis: Monitor information-seeking patterns and user behavior
- Performance Analytics: Measure response times and retrieval quality
- User Segmentation: Identify different user types and needs
- Feedback Analysis: Analyze user feedback and sentiment
- Predictive Analytics: Forecast capacity needs and feature priorities
Privacy & Compliance
- GDPR Compliance: Full data protection and user rights support
- CCPA Compliance: California Consumer Privacy Act compliance
- Data Anonymization: Automatic IP and personal data anonymization
- Consent Management: User consent tracking and management
- Data Retention: Configurable data retention policies
- Right to be Forgotten: Complete user data deletion
Reporting & Visualization
- Interactive Dashboards: Real-time analytics dashboards
- Automated Reports: Scheduled reports for different stakeholders
- A/B Testing: Framework for testing system changes against a control variant
- Export Capabilities: Data export in multiple formats
Quick Start
Installation
pip install -e packages/analytics
Basic Usage
import asyncio
from packages.analytics.integration import RecoAgentAnalytics
async def main():
    # Initialize analytics
    config = {
        'database_url': 'sqlite:///analytics.db',
        'redis_url': 'redis://localhost:6379/0',
        'enable_privacy_mode': True,
        'gdpr_compliant': True
    }
    analytics = RecoAgentAnalytics(config)
    # Track a query
    await analytics.track_query(
        user_id="user-123",
        session_id="session-456",
        query_text="What is machine learning?",
        response_text="Machine learning is a subset of artificial intelligence...",
        response_time_ms=1500,
        success=True,
        satisfaction_score=4.5
    )
    # Get analytics insights
    insights = await analytics.get_analytics_insights(days=30)
    print(insights)
# `await` is only valid inside a coroutine, so run the example with asyncio.run()
asyncio.run(main())
Architecture
Core Components
- Analytics Engine (core.py): Central data collection and storage
- Query Analytics (query_analytics.py): Query pattern analysis
- User Journey (user_journey.py): User behavior tracking
- Performance Analytics (performance.py): System performance monitoring
- User Segmentation (segmentation.py): User classification and clustering
- Feedback Analysis (feedback.py): Sentiment and feedback analysis
- Predictive Analytics (predictive.py): Forecasting and predictions
- Dashboard (dashboard.py): Interactive visualizations
- Reporting (reporting.py): Automated report generation
- Privacy Compliance (privacy.py): Data protection and compliance
Data Flow
User Query → Event Tracking → Data Collection → Analytics Processing → Insights & Reports
↓
Privacy Filtering → Anonymization → Storage → Retention Management
Configuration
Analytics Configuration
from packages.analytics.core import AnalyticsConfig
config = AnalyticsConfig(
database_url="postgresql://user:pass@localhost/analytics",
redis_url="redis://localhost:6379/0",
enable_privacy_mode=True,
data_retention_days=365,
batch_size=1000,
flush_interval_seconds=60,
enable_real_time=True,
anonymize_ips=True,
track_user_agents=True
)
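The `batch_size` and `flush_interval_seconds` options imply that events are buffered and written in groups. The exact flush logic is not documented here. The following is a minimal sketch of one common policy, assuming a flush happens when either the batch is full or the interval has elapsed since the last flush. `EventBatcher` and its `sink` callback are hypothetical names, not part of `packages.analytics`. The clock is injectable so the policy is easy to test.

```python
import time
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


class EventBatcher:
    """Hypothetical event buffer that flushes on batch size or elapsed time."""

    def __init__(
        self,
        sink: Callable[[List[Event]], None],
        batch_size: int = 1000,
        flush_interval_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._sink = sink  # receives each flushed batch (e.g. a bulk INSERT)
        self._batch_size = batch_size
        self._interval = flush_interval_seconds
        self._clock = clock
        self._buffer: List[Event] = []
        self._last_flush = clock()

    def add(self, event: Event) -> None:
        self._buffer.append(event)
        # Flush when the batch is full or the interval since the last flush has passed.
        if (len(self._buffer) >= self._batch_size
                or self._clock() - self._last_flush >= self._interval):
            self.flush()

    def flush(self) -> None:
        if self._buffer:
            self._sink(self._buffer)
            self._buffer = []  # start a new list; the sink keeps the old one
        self._last_flush = self._clock()
```

In this sketch the interval is only checked when an event arrives, so a quiet period can leave a partial batch buffered. A production version would also call `flush()` from a timer and on shutdown.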
Privacy Configuration
from packages.analytics.privacy import PrivacyConfig, PrivacyLevel, DataRetentionPolicy
privacy_config = PrivacyConfig(
privacy_level=PrivacyLevel.STANDARD,
data_retention_policy=DataRetentionPolicy.STANDARD,
anonymize_ips=True,
anonymize_user_agents=True,
hash_personal_data=True,
enable_consent_management=True,
gdpr_compliant=True,
ccpa_compliant=True,
data_retention_days=365,
consent_required=True,
allow_data_export=True,
allow_data_deletion=True
)
API Reference
Core Analytics
track_query()
Track a user query with optional response data.
event_id = await analytics.track_query(
user_id="user-123",
session_id="session-456",
query_text="What is AI?",
response_text="AI is artificial intelligence...",
response_time_ms=1200,
success=True,
satisfaction_score=4.0,
user_segment="power_user"
)
track_user_feedback()
Track user feedback for queries.
feedback_id = await analytics.track_user_feedback(
user_id="user-123",
session_id="session-456",
query_id="query-789",
feedback_type="rating",
rating=4,
comment="Very helpful response!"
)
track_user_journey_event()
Track user journey events.
journey_id = await analytics.track_user_journey_event(
user_id="user-123",
session_id="session-456",
event_type="document_view",
page_url="/documents/ml-guide",
time_spent_ms=30000
)
Analytics Insights
get_analytics_insights()
Get comprehensive analytics insights.
insights = await analytics.get_analytics_insights(days=30)
# Access specific insights
query_insights = insights['query_analytics']
user_journey = insights['user_journey']
performance = insights['performance']
get_dashboard_data()
Get dashboard data for visualization.
# Overview dashboard
overview = await analytics.get_dashboard_data('overview', days=30)
# Specific analytics dashboard
query_dashboard = await analytics.get_dashboard_data('query_analytics', days=30)
Reporting
generate_report()
Generate analytics reports.
# Executive summary report
report = await analytics.generate_report(
report_type='executive_summary',
stakeholder_type='executive',
days=30,
format='html'
)
Privacy & Compliance
record_user_consent()
Record user consent for data processing.
consent_id = await analytics.record_user_consent("user-123", {
'status': 'granted',
'version': '1.0',
'data_categories': ['analytics', 'performance'],
'purposes': ['analytics', 'improvement'],
'retention_days': 365
})
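The consent record above lists the data categories and purposes a user agreed to. A consent-aware pipeline would check those fields before it processes an event. The sketch below assumes the record is stored in exactly the shape passed to `record_user_consent()`. `is_processing_allowed` is a hypothetical helper, not part of the documented API.

```python
from typing import Any, Dict, Optional


def is_processing_allowed(
    consent: Optional[Dict[str, Any]], purpose: str, data_category: str
) -> bool:
    """Check a consent record (hypothetical helper; shape as passed to record_user_consent())."""
    if not consent or consent.get("status") != "granted":
        return False  # no record, or consent denied/withdrawn
    # Purpose limitation: both the purpose and the data category must be covered.
    return (purpose in consent.get("purposes", [])
            and data_category in consent.get("data_categories", []))


consent = {
    'status': 'granted',
    'version': '1.0',
    'data_categories': ['analytics', 'performance'],
    'purposes': ['analytics', 'improvement'],
    'retention_days': 365,
}
print(is_processing_allowed(consent, 'analytics', 'performance'))  # True
print(is_processing_allowed(consent, 'marketing', 'analytics'))    # False
```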
export_user_data()
Export all data held about a user (supports the GDPR right of access and data portability).
user_data = await analytics.export_user_data("user-123")
delete_user_data()
Delete all data held about a user (supports the GDPR right to erasure).
success = await analytics.delete_user_data("user-123")
Dashboard Usage
Overview Dashboard
The overview dashboard provides key metrics and visualizations:
- Query volume trends
- Success rate analysis
- User segment distribution
- Response time distribution
- Top queries table
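The success-rate and response-time panels can be derived from the fields passed to `track_query()`. The following is a minimal sketch. It assumes events are available as dictionaries with `success` and `response_time_ms` keys and uses the nearest-rank percentile method. The dashboard's actual aggregation method is not documented, and `overview_metrics` is a hypothetical name.

```python
import math
from typing import Any, Dict, Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile for p in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def overview_metrics(events: Sequence[Dict[str, Any]]) -> Dict[str, float]:
    """Summarize query events into the headline numbers of an overview dashboard."""
    if not events:
        return {"query_count": 0, "success_rate": 0.0, "p50_ms": 0.0, "p95_ms": 0.0}
    times = [float(e["response_time_ms"]) for e in events]
    successes = sum(1 for e in events if e["success"])
    return {
        "query_count": len(events),
        "success_rate": successes / len(events),
        "p50_ms": percentile(times, 50),  # median latency
        "p95_ms": percentile(times, 95),  # tail latency
    }
```

Percentiles are usually more informative than the mean for response times, because a few slow retrievals can dominate an average.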
Detailed Dashboards
Query Analytics Dashboard
- Query intent distribution
- Query complexity analysis
- Success rate by segment
- Popular query patterns
User Journey Dashboard
- Journey stage distribution
- Behavior pattern analysis
- Session duration trends
- User engagement metrics
Performance Dashboard
- Response time trends
- System resource usage
- Throughput analysis
- Performance alerts
Segmentation Dashboard
- User type distribution
- Engagement level analysis
- Behavior clustering
- Segment-specific metrics
Feedback Dashboard
- Sentiment distribution
- Feedback categorization
- Rating trends
- Common themes
Predictive Dashboard
- Capacity predictions
- User growth forecasts
- Feature priority rankings
- Performance projections
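The documentation does not say which forecasting model the predictive analytics module uses. As a deliberately simple stand-in, the sketch below fits an ordinary least-squares linear trend to daily query counts and extrapolates it. This is enough to illustrate a capacity projection. `forecast_linear` is a hypothetical name.

```python
from typing import List, Sequence


def forecast_linear(daily_counts: Sequence[float], days_ahead: int) -> List[float]:
    """Fit y = a + b*t by least squares (t = 0..n-1) and extrapolate days_ahead steps."""
    n = len(daily_counts)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_t = (n - 1) / 2
    mean_y = sum(daily_counts) / n
    cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(daily_counts))
    var = sum((t - mean_t) ** 2 for t in range(n))
    slope = cov / var
    intercept = mean_y - slope * mean_t
    # Future days continue the time index at t = n, n + 1, ...
    return [intercept + slope * (n + k) for k in range(days_ahead)]


print(forecast_linear([100, 110, 120, 130], 2))  # [140.0, 150.0]
```

A linear trend ignores weekly seasonality and growth curvature, so its output is a rough baseline rather than a capacity plan.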
A/B Testing
Creating A/B Tests
from packages.analytics.ab_testing import ABTestingFramework, ABTest, TestVariant, TestType, VariantType
# Initialize A/B testing framework
# (analytics_engine must be an already-initialized analytics engine instance)
ab_framework = ABTestingFramework(analytics_engine)
# Create test configuration
test = ABTest(
test_id="query-optimization-test",
name="Query Optimization A/B Test",
description="Test different query processing algorithms",
test_type=TestType.QUERY_OPTIMIZATION,
variants=[
TestVariant(
variant_id="control",
variant_type=VariantType.CONTROL,
name="Current Algorithm",
description="Current query processing algorithm",
configuration={"algorithm": "current"},
traffic_percentage=50.0,
is_control=True
),
TestVariant(
variant_id="treatment",
variant_type=VariantType.TREATMENT,
name="New Algorithm",
description="Improved query processing algorithm",
configuration={"algorithm": "new"},
traffic_percentage=50.0,
is_control=False
)
],
target_metrics=["response_time", "success_rate", "satisfaction_score"],
success_criteria={"response_time": 0.1, "success_rate": 0.05},
min_sample_size=1000,
confidence_level=0.95
)
# Create and start test
await ab_framework.create_test(test)
await ab_framework.start_test("query-optimization-test")
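The test above declares `min_sample_size=1000` and `confidence_level=0.95`, but the statistical test the framework applies is not documented. For a binary metric such as `success_rate`, one standard choice is a two-proportion z-test. The following sketch assumes that test. The function names are hypothetical.

```python
import math
from typing import Tuple


def two_proportion_z_test(
    successes_a: int, n_a: int, successes_b: int, n_b: int
) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants share one success rate."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)  # common rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # all successes or all failures: no evidence of a difference
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def is_significant(p_value: float, confidence_level: float = 0.95) -> bool:
    return p_value < 1 - confidence_level


# Control: 800/1000 successes; treatment: 850/1000 successes.
z, p = two_proportion_z_test(800, 1000, 850, 1000)
print(f"z={z:.2f}, p={p:.4f}, significant={is_significant(p)}")
```

Check significance only after each variant has reached `min_sample_size`. Repeatedly peeking at p-values while a test runs inflates the false-positive rate.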
Assigning Users to Variants
# Assign user to variant
variant = await ab_framework.assign_user_to_variant(
test_id="query-optimization-test",
user_id="user-123",
session_id="session-456"
)
# Track test results
await ab_framework.track_test_result(
test_id="query-optimization-test",
variant_id=variant.variant_id,
user_id="user-123",
session_id="session-456",
metric_name="response_time",
metric_value=1200.0
)
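How `assign_user_to_variant()` chooses a variant is not specified. A common approach, sketched here under that assumption, is to hash the test and user IDs into a point in [0, 100) and map that point onto the cumulative `traffic_percentage` ranges. With this scheme, the same user always lands in the same variant without storing the assignment. `assign_variant` is a hypothetical helper.

```python
import hashlib
from typing import Sequence, Tuple


def assign_variant(test_id: str, user_id: str,
                   variants: Sequence[Tuple[str, float]]) -> str:
    """Deterministically map a user to a variant id.

    `variants` holds (variant_id, traffic_percentage) pairs summing to 100.
    Hashing test_id with user_id keeps assignments stable within a test
    but independent across tests.
    """
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode("utf-8")).digest()
    # Scale the first 8 bytes of the hash to a point in [0, 100).
    point = int.from_bytes(digest[:8], "big") / 2**64 * 100
    cumulative = 0.0
    for variant_id, percentage in variants:
        cumulative += percentage
        if point < cumulative:
            return variant_id
    return variants[-1][0]  # guard against floating-point rounding near 100


print(assign_variant("query-optimization-test", "user-123",
                     [("control", 50.0), ("treatment", 50.0)]))
```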
Privacy Compliance
GDPR Compliance
The analytics system supports GDPR compliance through:
- Consent Management: Track and manage user consent
- Data Minimization: Collect only necessary data
- Purpose Limitation: Use data only for specified purposes
- Storage Limitation: Automatic data retention management
- Right to Access: Export user data on request
- Right to Rectification: Update user data
- Right to Erasure: Delete user data (right to be forgotten)
- Data Portability: Export data in machine-readable format
CCPA Compliance
The system also supports CCPA compliance with:
- Right to Know: Transparent data collection practices
- Right to Delete: User data deletion
- Right to Opt-Out: Opt-out of data collection
- Non-Discrimination: Equal service regardless of privacy choices
Data Anonymization
Automatic anonymization of sensitive data:
- IP addresses (remove last octet)
- User agent strings (generalize browser info)
- Personal data hashing
- Query text hashing (optional)
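The anonymization rules above can be sketched with the Python standard library. This is an illustration, not the package's implementation. The following choices are all assumptions: the IPv6 rule (keep the /48 prefix), the browser families recognized, and the use of HMAC-SHA256 with a secret key for hashing. A keyed hash is used because a plain hash of a low-entropy value such as an email address can be reversed by hashing candidate values.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip: str) -> str:
    """IPv4: zero the last octet (203.0.113.42 -> 203.0.113.0).
    IPv6: keep only the /48 prefix (an assumption; not specified above)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    network = ipaddress.ip_network(f"{addr}/{prefix}", strict=False)
    return str(network.network_address)


def generalize_user_agent(user_agent: str) -> str:
    """Reduce a user agent string to a browser family."""
    # Order matters: Edge UAs also contain "Chrome/", and Chrome UAs contain "Safari/".
    for token, family in (("Edg/", "Edge"), ("Firefox/", "Firefox"),
                          ("Chrome/", "Chrome"), ("Safari/", "Safari")):
        if token in user_agent:
            return family
    return "Other"


def hash_personal_data(value: str, secret_key: bytes) -> str:
    """Keyed HMAC-SHA256 digest; without the key, guessed values cannot be checked."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

Keyed hashing is pseudonymization. Under GDPR, pseudonymized data still counts as personal data, so hashed fields remain subject to retention and erasure rules.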
Performance Considerations
Data Collection Performance
- Batch Processing: Events are batched for efficient database writes
- Asynchronous Processing: Non-blocking event tracking
- Redis Caching: Fast data access and session management
- Connection Pooling: Efficient database connections
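Non-blocking tracking usually means the request path only enqueues an event while a background task writes batches to storage. The following is a minimal asyncio sketch under that assumption. `AsyncEventTracker` and the `write_batch` coroutine are hypothetical. When its bounded queue is full, this version drops events rather than slowing callers down.

```python
import asyncio
import contextlib
from typing import Any, Awaitable, Callable, Dict, List, Optional

Event = Dict[str, Any]


class AsyncEventTracker:
    """Hypothetical tracker: track() only enqueues; a background task writes batches."""

    def __init__(self, write_batch: Callable[[List[Event]], Awaitable[None]],
                 batch_size: int = 100, max_queue: int = 10_000) -> None:
        self._write_batch = write_batch
        self._batch_size = batch_size
        self._queue: "asyncio.Queue[Event]" = asyncio.Queue(maxsize=max_queue)
        self._worker: Optional["asyncio.Task[None]"] = None

    def start(self) -> None:
        self._worker = asyncio.create_task(self._run())

    def track(self, event: Event) -> bool:
        """Enqueue without waiting; returns False (event dropped) if the queue is full."""
        try:
            self._queue.put_nowait(event)
            return True
        except asyncio.QueueFull:
            return False

    async def _run(self) -> None:
        while True:
            batch = [await self._queue.get()]  # wait until at least one event arrives
            while len(batch) < self._batch_size and not self._queue.empty():
                batch.append(self._queue.get_nowait())  # drain what is already queued
            await self._write_batch(batch)
            for _ in batch:
                self._queue.task_done()

    async def close(self) -> None:
        await self._queue.join()  # wait until every queued event has been written
        if self._worker is not None:
            self._worker.cancel()
            with contextlib.suppress(asyncio.CancelledError):
                await self._worker
```

Dropping on overflow keeps request latency flat at the cost of losing events under sustained overload. Awaiting `queue.put()` instead would apply backpressure.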
Storage Optimization
- Data Compression: Compress stored data where possible
- Indexing: Optimized database indexes for fast queries
- Partitioning: Time-based data partitioning
- Cleanup: Automatic old data cleanup
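Time-based partitioning makes retention cleanup cheap, because an expired month can be dropped as a whole partition instead of deleting rows individually. The sketch below assumes monthly partitions named `events_YYYY_MM` (a hypothetical scheme) and applies the `data_retention_days` setting from the configuration. It only decides which partitions lie entirely before the cutoff and leaves the actual `DROP` to the database layer.

```python
from datetime import date, timedelta
from typing import Iterable, List


def partition_name(day: date, table: str = "events") -> str:
    """Monthly partition name, e.g. events_2024_03 (hypothetical naming scheme)."""
    return f"{table}_{day.year:04d}_{day.month:02d}"


def expired_partitions(partitions: Iterable[str], today: date,
                       retention_days: int, table: str = "events") -> List[str]:
    """Return partitions whose whole month falls before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for name in partitions:
        year, month = map(int, name[len(table) + 1:].split("_"))
        # First day of the following month, i.e. the day after the partition's last day.
        next_month = date(year + month // 12, month % 12 + 1, 1)
        if next_month <= cutoff:
            expired.append(name)
    return expired
```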
Query Performance
- Materialized Views: Pre-computed analytics views
- Caching: Redis caching for frequently accessed data
- Query Optimization: Optimized SQL queries
- Pagination: Large result set pagination
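For pagination of large result sets, keyset (seek) pagination avoids the cost of large `OFFSET` values, which force the database to read and discard every skipped row. The following is a minimal sketch using `sqlite3` from the standard library. The `query_events` table and `fetch_page` helper are hypothetical, and the default cursor of 0 assumes positive integer IDs.

```python
import sqlite3
from typing import List, Optional, Tuple


def fetch_page(conn: sqlite3.Connection, page_size: int,
               after_id: int = 0) -> Tuple[List[Tuple[int, str]], Optional[int]]:
    """Return one page of rows plus the cursor for the next page (None when done)."""
    # Seek past the last id seen instead of using OFFSET; because id is the
    # primary key, the database jumps straight to the first row of the page.
    rows = conn.execute(
        "SELECT id, query_text FROM query_events WHERE id > ? ORDER BY id LIMIT ?",
        (after_id, page_size),
    ).fetchall()
    next_cursor = rows[-1][0] if len(rows) == page_size else None
    return rows, next_cursor
```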
Monitoring & Alerting
System Metrics
- Query volume trends
- Response time monitoring
- Error rate tracking
- Resource usage monitoring
Alert Conditions
- High error rates
- Slow response times
- Low success rates
- Resource exhaustion
- Data quality issues
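Alert conditions like these reduce to comparing current metric values against thresholds. The following is a minimal sketch. The `AlertRule` type, the metric names, and the threshold values are hypothetical, since the documentation does not specify them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    above: bool  # True: alert when value > threshold; False: when value < threshold


def evaluate_alerts(metrics: Dict[str, float], rules: List[AlertRule]) -> List[str]:
    """Return a message for every rule whose metric breaches its threshold."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric missing; a data-quality check could flag this separately
        breached = value > rule.threshold if rule.above else value < rule.threshold
        if breached:
            op = ">" if rule.above else "<"
            fired.append(f"{rule.metric}={value} ({op} {rule.threshold})")
    return fired


rules = [
    AlertRule("error_rate", 0.05, above=True),          # high error rate
    AlertRule("p95_response_ms", 3000, above=True),     # slow responses
    AlertRule("success_rate", 0.9, above=False),        # low success rate
]
print(evaluate_alerts({"error_rate": 0.08, "p95_response_ms": 1200, "success_rate": 0.85}, rules))
```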
Health Checks
- Database connectivity
- Redis connectivity
- Data freshness
- System performance
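A health check typically runs each probe concurrently with a timeout, so that one hung dependency cannot stall the whole report. The sketch below assumes each check is an async callable that returns a boolean. `run_health_checks` and `is_fresh` are hypothetical names, and the returned dictionary does not reflect the actual output format of `get_system_health()`.

```python
import asyncio
import time
from typing import Awaitable, Callable, Dict, Optional

Check = Callable[[], Awaitable[bool]]


async def run_health_checks(checks: Dict[str, Check],
                            timeout_s: float = 2.0) -> Dict[str, str]:
    """Run named checks concurrently. A check that returns False, raises,
    or exceeds timeout_s is reported as 'unhealthy'."""

    async def guarded(check: Check) -> str:
        try:
            ok = await asyncio.wait_for(check(), timeout=timeout_s)
            return "healthy" if ok else "unhealthy"
        except Exception:  # includes timeouts and connection errors
            return "unhealthy"

    results = await asyncio.gather(*(guarded(c) for c in checks.values()))
    return dict(zip(checks, results))  # gather preserves the order of the checks


def is_fresh(last_event_ts: float, max_age_s: float, now: Optional[float] = None) -> bool:
    """Data freshness: the newest stored event must be at most max_age_s old."""
    current = time.time() if now is None else now
    return current - last_event_ts <= max_age_s
```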
Troubleshooting
Common Issues
Database Connection Errors
- Check database URL configuration
- Verify database server is running
- Check network connectivity
Redis Connection Errors
- Check Redis URL configuration
- Verify Redis server is running
- Check Redis authentication
Performance Issues
- Check database indexes
- Monitor query performance
- Consider data partitioning
Privacy Compliance Issues
- Verify consent management setup
- Check data anonymization settings
- Review retention policies
Debug Mode
Enable debug logging:
import logging
# Attach a root handler; without one, DEBUG records are not printed
logging.basicConfig()
logging.getLogger('packages.analytics').setLevel(logging.DEBUG)
Health Check
# Check system health
health = await analytics.get_system_health()
print(health)
Contributing
Development Setup
- Clone the repository
- Install development dependencies (quote the path so shells such as zsh do not expand the brackets):
pip install -e "packages/analytics[dev]"
- Run tests:
pytest packages/analytics/tests/
Code Style
- Follow PEP 8 guidelines
- Use type hints
- Write comprehensive docstrings
- Add unit tests for new features
Testing
- Unit tests for all modules
- Integration tests for workflows
- Privacy compliance tests
- Performance tests
License
This analytics system is part of the RecoAgent project and is licensed under the MIT License.
Support
For support and questions:
- Documentation: docs.recoagent.com/analytics
- Issues: GitHub Issues
- Email: analytics@recoagent.com