
Analytics System Documentation

Overview

The RecoAgent Analytics System provides user behavior analytics, query analysis, and optimization insights for enterprise RAG (retrieval-augmented generation) applications. It offers real-time event tracking, scheduled and on-demand reporting, and privacy-compliant data collection.

Features

Core Analytics

  • Query Analytics: Track popular questions, success rates, and user satisfaction
  • User Journey Analysis: Monitor information-seeking patterns and user behavior
  • Performance Analytics: Measure response times and retrieval quality
  • User Segmentation: Identify different user types and needs
  • Feedback Analysis: Analyze user feedback and sentiment
  • Predictive Analytics: Forecast capacity needs and feature priorities

Privacy & Compliance

  • GDPR Compliance: Consent tracking and support for data-subject rights (access, portability, erasure)
  • CCPA Compliance: Support for California Consumer Privacy Act rights (know, delete, opt out)
  • Data Anonymization: Automatic IP and personal data anonymization
  • Consent Management: User consent tracking and management
  • Data Retention: Configurable data retention policies
  • Right to be Forgotten: Complete user data deletion

Reporting & Visualization

  • Interactive Dashboards: Real-time analytics dashboards
  • Automated Reports: Scheduled reports for different stakeholders
  • A/B Testing: Framework for evaluating system changes with controlled experiments
  • Export Capabilities: Data export in multiple formats

Quick Start

Installation

pip install -e packages/analytics

Basic Usage

import asyncio

from packages.analytics.integration import RecoAgentAnalytics

# Analytics configuration
config = {
    'database_url': 'sqlite:///analytics.db',
    'redis_url': 'redis://localhost:6379/0',
    'enable_privacy_mode': True,
    'gdpr_compliant': True
}


async def main():
    analytics = RecoAgentAnalytics(config)

    # Track a query (the tracking methods are coroutines, so they must be awaited)
    await analytics.track_query(
        user_id="user-123",
        session_id="session-456",
        query_text="What is machine learning?",
        response_text="Machine learning is a subset of artificial intelligence...",
        response_time_ms=1500,
        success=True,
        satisfaction_score=4.5
    )

    # Get analytics insights for the last 30 days
    insights = await analytics.get_analytics_insights(days=30)
    print(insights)


asyncio.run(main())

Architecture

Core Components

  1. Analytics Engine (core.py): Central data collection and storage
  2. Query Analytics (query_analytics.py): Query pattern analysis
  3. User Journey (user_journey.py): User behavior tracking
  4. Performance Analytics (performance.py): System performance monitoring
  5. User Segmentation (segmentation.py): User classification and clustering
  6. Feedback Analysis (feedback.py): Sentiment and feedback analysis
  7. Predictive Analytics (predictive.py): Forecasting and predictions
  8. Dashboard (dashboard.py): Interactive visualizations
  9. Reporting (reporting.py): Automated report generation
  10. Privacy Compliance (privacy.py): Data protection and compliance

Data Flow

User Query → Event Tracking → Data Collection → Analytics Processing → Insights & Reports

Privacy pipeline for collected data: Privacy Filtering → Anonymization → Storage → Retention Management
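The privacy stages can be pictured as a chain of functions, each of which may transform an event or drop it. The following is a hypothetical sketch, not the package's implementation: the stage functions, the consent_granted flag, and the in-memory store are assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List, Optional

Event = Dict[str, Any]
Stage = Callable[[Event], Optional[Event]]  # a stage returns None to drop the event


def privacy_filter(event: Event) -> Optional[Event]:
    # Hypothetical rule: drop events from users who have not granted consent.
    return event if event.get("consent_granted") else None


def anonymize(event: Event) -> Optional[Event]:
    # Replace the last octet of an IPv4 address with 0.
    ip = event.get("ip")
    if ip:
        event = {**event, "ip": ".".join(ip.split(".")[:3] + ["0"])}
    return event


def run_pipeline(event: Event, stages: List[Stage]) -> Optional[Event]:
    # Apply each stage in order; stop as soon as one drops the event.
    current = event
    for stage in stages:
        result = stage(current)
        if result is None:
            return None
        current = result
    return current


store: List[Event] = []  # stand-in for the storage backend
for raw in [
    {"user_id": "user-123", "ip": "203.0.113.42", "consent_granted": True},
    {"user_id": "user-456", "ip": "198.51.100.7", "consent_granted": False},
]:
    processed = run_pipeline(raw, [privacy_filter, anonymize])
    if processed is not None:
        store.append(processed)
```

Only the consenting user's event reaches the store, and its IP address is stored as 203.0.113.0.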

Configuration

Analytics Configuration

from packages.analytics.core import AnalyticsConfig

config = AnalyticsConfig(
    database_url="postgresql://user:pass@localhost/analytics",
    redis_url="redis://localhost:6379/0",
    enable_privacy_mode=True,
    data_retention_days=365,
    batch_size=1000,
    flush_interval_seconds=60,
    enable_real_time=True,
    anonymize_ips=True,
    track_user_agents=True
)

Privacy Configuration

from packages.analytics.privacy import PrivacyConfig, PrivacyLevel, DataRetentionPolicy

privacy_config = PrivacyConfig(
    privacy_level=PrivacyLevel.STANDARD,
    data_retention_policy=DataRetentionPolicy.STANDARD,
    anonymize_ips=True,
    anonymize_user_agents=True,
    hash_personal_data=True,
    enable_consent_management=True,
    gdpr_compliant=True,
    ccpa_compliant=True,
    data_retention_days=365,
    consent_required=True,
    allow_data_export=True,
    allow_data_deletion=True
)

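API Reference

The methods below are coroutines, so call them with await inside an async function (see Basic Usage). The examples assume analytics is a RecoAgentAnalytics instance.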

Core Analytics

track_query()

Track a user query with optional response data.

event_id = await analytics.track_query(
    user_id="user-123",
    session_id="session-456",
    query_text="What is AI?",
    response_text="AI is artificial intelligence...",
    response_time_ms=1200,
    success=True,
    satisfaction_score=4.0,
    user_segment="power_user"
)
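The user_segment argument tags the query with a segment label. As a hypothetical illustration of how a caller might derive such a label from activity counts, the rule-based sketch below assigns one of four segments. Apart from power_user, which appears in the example above, the segment names, the UserActivity type, and every threshold are assumptions; the package's segmentation.py module, which also performs clustering, may work quite differently.

```python
from dataclasses import dataclass


@dataclass
class UserActivity:
    queries_last_30_days: int
    avg_session_minutes: float


def segment_user(activity: UserActivity) -> str:
    """Return a segment label; rules are checked from most to least engaged."""
    # Hypothetical thresholds: tune them against real usage data.
    if activity.queries_last_30_days >= 100 and activity.avg_session_minutes >= 10:
        return "power_user"
    if activity.queries_last_30_days >= 20:
        return "regular_user"
    if activity.queries_last_30_days > 0:
        return "casual_user"
    return "inactive_user"
```

The result could then be passed as user_segment=segment_user(activity) when calling track_query().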

track_user_feedback()

Track user feedback for queries.

feedback_id = await analytics.track_user_feedback(
    user_id="user-123",
    session_id="session-456",
    query_id="query-789",
    feedback_type="rating",
    rating=4,
    comment="Very helpful response!"
)

track_user_journey_event()

Track user journey events.

journey_id = await analytics.track_user_journey_event(
    user_id="user-123",
    session_id="session-456",
    event_type="document_view",
    page_url="/documents/ml-guide",
    time_spent_ms=30000
)

Analytics Insights

get_analytics_insights()

Return aggregated insights for a time window (here, the last 30 days), keyed by analytics area.

insights = await analytics.get_analytics_insights(days=30)

# Access specific insights
query_insights = insights['query_analytics']
user_journey = insights['user_journey']
performance = insights['performance']
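To make the kind of metrics reported under query_analytics concrete, here is a hypothetical sketch that derives a success rate, an average satisfaction score, and the most frequent questions from raw query events. The QueryEvent type, the normalization rule (trim and lowercase), and the output keys are assumptions, not the package's schema.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class QueryEvent:
    query_text: str
    success: bool
    satisfaction_score: Optional[float] = None


def summarize_queries(events: List[QueryEvent], top_n: int = 5) -> Dict[str, Any]:
    """Aggregate success rate, mean satisfaction, and most frequent questions."""
    if not events:
        return {"total": 0, "success_rate": 0.0, "avg_satisfaction": None, "top_queries": []}
    # Normalize so questions differing only in case or surrounding whitespace count together.
    counts = Counter(e.query_text.strip().lower() for e in events)
    # Satisfaction is optional per event, so average only the events that have a score.
    scores = [e.satisfaction_score for e in events if e.satisfaction_score is not None]
    return {
        "total": len(events),
        "success_rate": sum(e.success for e in events) / len(events),
        "avg_satisfaction": sum(scores) / len(scores) if scores else None,
        "top_queries": counts.most_common(top_n),
    }
```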

get_dashboard_data()

Get dashboard data for visualization.

# Overview dashboard
overview = await analytics.get_dashboard_data('overview', days=30)

# Specific analytics dashboard
query_dashboard = await analytics.get_dashboard_data('query_analytics', days=30)

Reporting

generate_report()

Generate an analytics report for a given stakeholder audience and output format.

# Executive summary report
report = await analytics.generate_report(
    report_type='executive_summary',
    stakeholder_type='executive',
    days=30,
    format='html'
)

Privacy & Compliance

record_user_consent()

Record user consent for data processing.

consent_id = await analytics.record_user_consent("user-123", {
    'status': 'granted',
    'version': '1.0',
    'data_categories': ['analytics', 'performance'],
    'purposes': ['analytics', 'improvement'],
    'retention_days': 365
})

export_user_data()

Export all data held about a user (supports the GDPR rights of access and data portability).

user_data = await analytics.export_user_data("user-123")

delete_user_data()

Delete all data held about a user (supports the GDPR right to erasure).

success = await analytics.delete_user_data("user-123")

Dashboard Usage

Overview Dashboard

The overview dashboard provides key metrics and visualizations:

  • Query volume trends
  • Success rate analysis
  • User segment distribution
  • Response time distribution
  • Top queries table

Detailed Dashboards

Query Analytics Dashboard

  • Query intent distribution
  • Query complexity analysis
  • Success rate by segment
  • Popular query patterns

User Journey Dashboard

  • Journey stage distribution
  • Behavior pattern analysis
  • Session duration trends
  • User engagement metrics

Performance Dashboard

  • Response time trends
  • System resource usage
  • Throughput analysis
  • Performance alerts

Segmentation Dashboard

  • User type distribution
  • Engagement level analysis
  • Behavior clustering
  • Segment-specific metrics

Feedback Dashboard

  • Sentiment distribution
  • Feedback categorization
  • Rating trends
  • Common themes

Predictive Dashboard

  • Capacity predictions
  • User growth forecasts
  • Feature priority rankings
  • Performance projections
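Capacity predictions start from a forecast of future load. As a deliberately simple stand-in (the forecasting method used by predictive.py is not documented here), this sketch fits an ordinary least-squares line to daily query counts and extrapolates it. The function name is hypothetical, and real traffic with weekly seasonality would need a richer model.

```python
from typing import Sequence


def forecast_linear(daily_counts: Sequence[float], days_ahead: int) -> float:
    """Fit count = intercept + slope * day_index by least squares and extrapolate."""
    n = len(daily_counts)
    if n < 2:
        raise ValueError("need at least two data points")
    xs = range(n)
    mean_x = (n - 1) / 2  # mean of 0, 1, ..., n - 1
    mean_y = sum(daily_counts) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, daily_counts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # The last observed day has index n - 1; forecast days_ahead beyond it.
    return intercept + slope * (n - 1 + days_ahead)
```

For daily counts of 100, 110, 120, and 130 queries, the fitted slope is 10 per day, so forecast_linear(counts, days_ahead=2) returns 150.0.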

A/B Testing

Creating A/B Tests

from packages.analytics.ab_testing import (
    ABTest,
    ABTestingFramework,
    TestType,
    TestVariant,
    VariantType,
)

# Initialize the A/B testing framework with an existing analytics engine instance
ab_framework = ABTestingFramework(analytics_engine)

# Create test configuration: a 50/50 split between control and treatment
test = ABTest(
    test_id="query-optimization-test",
    name="Query Optimization A/B Test",
    description="Test different query processing algorithms",
    test_type=TestType.QUERY_OPTIMIZATION,
    variants=[
        TestVariant(
            variant_id="control",
            variant_type=VariantType.CONTROL,
            name="Current Algorithm",
            description="Current query processing algorithm",
            configuration={"algorithm": "current"},
            traffic_percentage=50.0,
            is_control=True
        ),
        TestVariant(
            variant_id="treatment",
            variant_type=VariantType.TREATMENT,
            name="New Algorithm",
            description="Improved query processing algorithm",
            configuration={"algorithm": "new"},
            traffic_percentage=50.0,
            is_control=False
        )
    ],
    target_metrics=["response_time", "success_rate", "satisfaction_score"],
    success_criteria={"response_time": 0.1, "success_rate": 0.05},
    min_sample_size=1000,
    confidence_level=0.95
)

# Create and start the test (inside an async function)
await ab_framework.create_test(test)
await ab_framework.start_test("query-optimization-test")
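The confidence_level and min_sample_size fields imply that a significance check runs before a winner is declared. For a binary metric such as success_rate, one textbook choice is the two-sided two-proportion z-test sketched below. It is not necessarily the test ABTestingFramework applies, and the function names are hypothetical. Continuous metrics such as response_time would need a different test, for example Welch's t-test.

```python
import math
from typing import Tuple


def two_proportion_z_test(successes_a: int, n_a: int,
                          successes_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both variants share one success rate."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0, 1.0  # both samples are all successes or all failures
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


def is_significant(successes_a: int, n_a: int, successes_b: int, n_b: int,
                   confidence_level: float = 0.95, min_sample_size: int = 1000) -> bool:
    if min(n_a, n_b) < min_sample_size:
        return False  # not enough data to decide yet
    _, p_value = two_proportion_z_test(successes_a, n_a, successes_b, n_b)
    return p_value < 1 - confidence_level
```

With 500 of 1,000 control successes and 560 of 1,000 treatment successes, z is about 2.69 and the p-value about 0.007, below the 0.05 threshold implied by confidence_level=0.95.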

Assigning Users to Variants

# Assign user to variant
variant = await ab_framework.assign_user_to_variant(
    test_id="query-optimization-test",
    user_id="user-123",
    session_id="session-456"
)

# Track test results
await ab_framework.track_test_result(
    test_id="query-optimization-test",
    variant_id=variant.variant_id,
    user_id="user-123",
    session_id="session-456",
    metric_name="response_time",
    metric_value=1200.0
)
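A common way to implement this kind of assignment is deterministic hashing: the same user always lands in the same variant, and the overall split follows each variant's traffic_percentage. This is a hypothetical sketch of that technique; the framework's actual assignment logic is not documented here. Including test_id in the hash keeps assignments independent across tests.

```python
import hashlib
from typing import Sequence, Tuple


def assign_variant(test_id: str, user_id: str,
                   variants: Sequence[Tuple[str, float]]) -> str:
    """Map (test_id, user_id) to a variant; variants are (variant_id, traffic_percentage)."""
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode("utf-8")).digest()
    # First 6 bytes as an integer, scaled to a bucket in [0, 100).
    bucket = int.from_bytes(digest[:6], "big") / 2**48 * 100.0
    cumulative = 0.0
    for variant_id, traffic_percentage in variants:
        cumulative += traffic_percentage
        if bucket < cumulative:
            return variant_id
    return variants[-1][0]  # guard against percentages summing slightly under 100
```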

Privacy Compliance

GDPR Compliance

The analytics system provides features that support GDPR compliance:

  • Consent Management: Track and manage user consent
  • Data Minimization: Collect only necessary data
  • Purpose Limitation: Use data only for specified purposes
  • Storage Limitation: Automatic data retention management
  • Right to Access: Export user data on request
  • Right to Rectification: Update user data
  • Right to Erasure: Delete user data (right to be forgotten)
  • Data Portability: Export data in machine-readable format

CCPA Compliance

The system also provides features that support CCPA compliance:

  • Right to Know: Transparent data collection practices
  • Right to Delete: User data deletion
  • Right to Opt-Out: Opt-out of data collection
  • Non-Discrimination: Equal service regardless of privacy choices

Data Anonymization

Automatic anonymization of sensitive data:

  • IP addresses (remove last octet)
  • User agent strings (generalize browser info)
  • Personal data hashing
  • Query text hashing (optional)
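The following sketch shows one way to implement the first and third items. It is an illustration under stated assumptions, not the code behind the anonymize_ips and hash_personal_data options: the function names, the /48 prefix used for IPv6 (the list above only describes IPv4 octets), and the use of HMAC-SHA-256 with a secret key are all assumptions. A keyed hash is used because a plain hash of a low-entropy value such as an email address can be reversed by hashing candidate values.

```python
import hashlib
import hmac
import ipaddress


def anonymize_ip(ip: str) -> str:
    """Zero the host bits: the last octet for IPv4, everything past /48 for IPv6 (assumed)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)


def hash_personal_value(value: str, secret_key: bytes) -> str:
    """Keyed SHA-256 (HMAC): stable for joins, not reversible without the key."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
```

For example, anonymize_ip("192.168.1.42") returns "192.168.1.0".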

Performance Considerations

Data Collection Performance

  • Batch Processing: Events are batched for efficient database writes
  • Asynchronous Processing: Non-blocking event tracking
  • Redis Caching: Fast data access and session management
  • Connection Pooling: Efficient database connections
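The batching behaviour described above can be sketched as a buffer that flushes when it reaches batch_size events or when flush_interval_seconds have passed, mirroring the two AnalyticsConfig options of those names. The EventBatcher class and its sink callback are hypothetical. For brevity the age check runs only when a new event arrives; a production version would also flush from a background timer so a quiet period cannot strand events in the buffer.

```python
import asyncio
import time
from typing import Any, Awaitable, Callable, Dict, List

Event = Dict[str, Any]
Sink = Callable[[List[Event]], Awaitable[None]]  # e.g. a bulk database insert


class EventBatcher:
    """Buffer events and hand them to the sink in batches."""

    def __init__(self, sink: Sink, batch_size: int = 1000,
                 flush_interval_seconds: float = 60.0) -> None:
        self._sink = sink
        self._batch_size = batch_size
        self._interval = flush_interval_seconds
        self._buffer: List[Event] = []
        self._last_flush = time.monotonic()
        self._lock = asyncio.Lock()

    async def add(self, event: Event) -> None:
        async with self._lock:
            self._buffer.append(event)
            too_old = time.monotonic() - self._last_flush >= self._interval
            if len(self._buffer) >= self._batch_size or too_old:
                await self._flush_locked()

    async def flush(self) -> None:
        # Call on shutdown so buffered events are not lost.
        async with self._lock:
            await self._flush_locked()

    async def _flush_locked(self) -> None:
        if self._buffer:
            batch, self._buffer = self._buffer, []
            await self._sink(batch)
        self._last_flush = time.monotonic()
```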

Storage Optimization

  • Data Compression: Compress stored data where possible
  • Indexing: Optimized database indexes for fast queries
  • Partitioning: Time-based data partitioning
  • Cleanup: Automatic old data cleanup
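Automatic cleanup can be as simple as deleting rows older than the configured data_retention_days. This sketch uses Python's built-in sqlite3 module and assumes a hypothetical events table with a created_at column of UTC ISO-8601 strings; it is not the package's schema. With time-based partitioning in place, dropping whole expired partitions is usually cheaper than row-by-row deletes.

```python
import sqlite3
from datetime import datetime, timedelta, timezone
from typing import Optional


def purge_expired_events(conn: sqlite3.Connection, retention_days: int,
                         now: Optional[datetime] = None) -> int:
    """Delete events older than retention_days and return the number of rows removed."""
    now = now or datetime.now(timezone.utc)
    cutoff = (now - timedelta(days=retention_days)).isoformat()
    # ISO-8601 strings in one format with the same +00:00 offset sort chronologically as text.
    with conn:  # commits on success, rolls back on error
        cur = conn.execute("DELETE FROM events WHERE created_at < ?", (cutoff,))
    return cur.rowcount
```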

Query Performance

  • Materialized Views: Pre-computed analytics views
  • Caching: Redis caching for frequently accessed data
  • Query Optimization: Optimized SQL queries
  • Pagination: Large result set pagination

Monitoring & Alerting

System Metrics

  • Query volume trends
  • Response time monitoring
  • Error rate tracking
  • Resource usage monitoring

Alert Conditions

  • High error rates
  • Slow response times
  • Low success rates
  • Resource exhaustion
  • Data quality issues
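Three of these conditions (error rate, response time, success rate) reduce to threshold checks over a recent window of query statistics. The sketch below evaluates them; the QueryStat type, the alert names, and the default thresholds (5% errors, 3,000 ms 95th-percentile latency, 80% success) are hypothetical values to tune per deployment.

```python
from dataclasses import dataclass
from statistics import quantiles
from typing import List, Sequence


@dataclass
class QueryStat:
    response_time_ms: float
    success: bool
    error: bool = False


def check_alerts(stats: Sequence[QueryStat], max_error_rate: float = 0.05,
                 max_p95_ms: float = 3000.0, min_success_rate: float = 0.8) -> List[str]:
    """Return the names of alert conditions triggered by a window of query stats."""
    if not stats:
        return []
    n = len(stats)
    alerts: List[str] = []
    if sum(s.error for s in stats) / n > max_error_rate:
        alerts.append("high_error_rate")
    times = [s.response_time_ms for s in stats]
    # 95th percentile: the last of the 19 cut points that split the data into 20 groups.
    p95 = quantiles(times, n=20, method="inclusive")[-1] if n >= 2 else times[0]
    if p95 > max_p95_ms:
        alerts.append("slow_responses")
    if sum(s.success for s in stats) / n < min_success_rate:
        alerts.append("low_success_rate")
    return alerts
```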

Health Checks

  • Database connectivity
  • Redis connectivity
  • Data freshness
  • System performance
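These checks are independent, so they can run concurrently, each with its own timeout so that one hung dependency cannot stall the whole report. This hypothetical sketch shows the pattern with placeholder check functions; it is not the implementation behind get_system_health().

```python
import asyncio
from typing import Awaitable, Callable, Dict

HealthCheck = Callable[[], Awaitable[bool]]


async def run_health_checks(checks: Dict[str, HealthCheck],
                            timeout_s: float = 2.0) -> Dict[str, str]:
    """Run all checks concurrently; report 'ok', 'failed', 'timeout', or 'error: ...' per check."""

    async def run_one(check: HealthCheck) -> str:
        try:
            healthy = await asyncio.wait_for(check(), timeout=timeout_s)
            return "ok" if healthy else "failed"
        except asyncio.TimeoutError:
            return "timeout"
        except Exception as exc:  # e.g. a refused connection
            return f"error: {exc}"

    names = list(checks)
    results = await asyncio.gather(*(run_one(checks[name]) for name in names))
    return dict(zip(names, results))
```

In practice each entry would wrap a real probe, such as a trivial database query, a Redis PING, or a comparison of the newest event timestamp against the current time for data freshness.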

Troubleshooting

Common Issues

  1. Database Connection Errors

    • Check database URL configuration
    • Verify database server is running
    • Check network connectivity
  2. Redis Connection Errors

    • Check Redis URL configuration
    • Verify Redis server is running
    • Check Redis authentication
  3. Performance Issues

    • Check database indexes
    • Monitor query performance
    • Consider data partitioning
  4. Privacy Compliance Issues

    • Verify consent management setup
    • Check data anonymization settings
    • Review retention policies

Debug Mode

Enable debug logging:

import logging

# Without a configured handler, Python prints only WARNING and above
logging.basicConfig()
logging.getLogger('packages.analytics').setLevel(logging.DEBUG)

Health Check

# Check system health (inside an async function)
health = await analytics.get_system_health()
print(health)

Contributing

Development Setup

  1. Clone the repository
  2. Install development dependencies:
    pip install -e "packages/analytics[dev]"
  3. Run tests:
    pytest packages/analytics/tests/

Code Style

  • Follow PEP 8 guidelines
  • Use type hints
  • Write comprehensive docstrings
  • Add unit tests for new features

Testing

  • Unit tests for all modules
  • Integration tests for workflows
  • Privacy compliance tests
  • Performance tests

License

This analytics system is part of the RecoAgent project and is licensed under the MIT License.

Support

For support and questions: