Recommendation Bandits

Multi-armed bandit algorithms for adaptive and efficient recommendation systems.

Overview

The recommendation bandits system provides multi-armed bandit algorithms for adaptive recommendation strategies. A bandit treats each candidate recommendation as an "arm" and learns which arms to pull by balancing exploration (trying uncertain options to gather feedback) and exploitation (serving options already known to perform well).
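To make that trade-off concrete, here is a minimal, self-contained sketch of Bernoulli Thompson Sampling using NumPy. It illustrates the principle only and is not recoagent's implementation:

import numpy as np

rng = np.random.default_rng(42)

# One Beta(alpha, beta) posterior per arm; Beta(1, 1) is a uniform prior.
n_arms = 3
alpha = np.ones(n_arms)
beta = np.ones(n_arms)

def select_arm() -> int:
    # Sample a plausible reward rate from each arm's posterior and play the
    # best sample. Uncertain arms occasionally sample high (exploration);
    # arms with strong track records usually win (exploitation).
    samples = rng.beta(alpha, beta)
    return int(np.argmax(samples))

def update(arm: int, reward: float) -> None:
    # A binary reward shifts that arm's posterior toward the observed outcome.
    alpha[arm] += reward
    beta[arm] += 1.0 - reward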

Core Features

  • Multiple Algorithms: Thompson Sampling, UCB, Epsilon-Greedy (a minimal UCB1 sketch follows this list)
  • Adaptive Learning: strategies adjust dynamically as reward feedback arrives
  • Exploration vs Exploitation: principled balancing of trying uncertain items and serving known winners
  • Contextual Bandits: recommendations conditioned on user and item context
  • Performance Optimization: efficient bandit implementations
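As referenced above, UCB-style algorithms pick the arm with the highest optimistic reward estimate. A minimal UCB1 sketch, again an illustration rather than the library's code:

import math

counts = [0, 0, 0]        # plays per arm
values = [0.0, 0.0, 0.0]  # running mean reward per arm

def ucb1_select(t: int) -> int:
    # Play every arm once before applying the confidence bound.
    for arm, c in enumerate(counts):
        if c == 0:
            return arm
    # Mean reward plus an exploration bonus that shrinks as an arm is played more.
    scores = [v + math.sqrt(2 * math.log(t) / c) for v, c in zip(values, counts)]
    return scores.index(max(scores))

def ucb1_update(arm: int, reward: float) -> None:
    counts[arm] += 1
    values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean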

Usage Examples

Basic Bandit Algorithm

from recoagent.recommendations.bandits import ThompsonSamplingBandit

# Create a Thompson Sampling bandit with a uniform Beta(1, 1) prior per arm
bandit = ThompsonSamplingBandit(
    n_arms=10,
    alpha=1.0,
    beta=1.0
)

# Select an arm (i.e., a recommendation) for the given context
arm = bandit.select_arm(context={"user_id": "user_123"})

# Update the bandit with the observed reward for that arm
bandit.update(arm, reward=0.8)
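In practice the bandit runs in a feedback loop. A sketch using the select_arm/update API shown above, with a hypothetical simulated click model standing in for real user feedback:

import random

# Hypothetical ground truth: arm 3 has the best click-through rate.
true_ctr = [0.02, 0.05, 0.03, 0.12, 0.04, 0.04, 0.04, 0.04, 0.04, 0.04]

for step in range(1000):
    arm = bandit.select_arm(context={"user_id": "user_123"})
    reward = 1.0 if random.random() < true_ctr[arm] else 0.0
    bandit.update(arm, reward=reward)

Over many steps, Thompson Sampling should concentrate its selections on arm 3 while still occasionally probing the other arms.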

Advanced Contextual Bandit

from recoagent.recommendations.bandits import ContextualBandit

# Create a contextual bandit using the LinUCB algorithm
contextual_bandit = ContextualBandit(
    algorithm="linucb",
    context_dim=50,
    exploration_parameter=0.1
)

# Select an arm with context
context = {
    "user_features": [0.1, 0.5, 0.3],
    "item_features": [0.2, 0.4, 0.6]
}
arm = contextual_bandit.select_arm(context=context)

# Update with the observed reward and the same context
contextual_bandit.update(arm, reward=0.9, context=context)
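LinUCB assumes the expected reward is linear in the context vector and adds a confidence bonus for contexts an arm has rarely seen. A from-scratch sketch of disjoint LinUCB's scoring and update, illustrative only and not recoagent's internals:

import numpy as np

d = 6          # context dimension (user + item features concatenated)
alpha = 0.1    # exploration parameter: larger values explore more
n_arms = 10

A = [np.eye(d) for _ in range(n_arms)]    # per-arm design matrix (ridge term)
b = [np.zeros(d) for _ in range(n_arms)]  # per-arm reward-weighted contexts

def linucb_select(x: np.ndarray) -> int:
    scores = []
    for a in range(n_arms):
        A_inv = np.linalg.inv(A[a])
        theta = A_inv @ b[a]                    # ridge-regression estimate for this arm
        bonus = alpha * np.sqrt(x @ A_inv @ x)  # confidence width for this context
        scores.append(theta @ x + bonus)
    return int(np.argmax(scores))

def linucb_update(arm: int, x: np.ndarray, reward: float) -> None:
    A[arm] += np.outer(x, x)
    b[arm] += reward * x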

API Reference

ThompsonSamplingBandit Methods

select_arm(context: Dict = None) -> int

Selects an arm using Thompson Sampling.

Parameters:

  • context (Dict, optional): Context information, such as user identifiers or features

Returns: The index of the selected arm (int)

update(arm: int, reward: float) -> None

Updates the bandit's reward estimate for the given arm.

Parameters:

  • arm (int): Index of the arm that was played
  • reward (float): Observed reward for that arm (typically bounded, e.g. in [0, 1])
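Rewards are often derived from implicit feedback. A small hedged example, continuing from the bandit created in the Basic Bandit Algorithm section:

user_clicked = True  # e.g. read from an impression/interaction log
bandit.update(arm, reward=1.0 if user_clicked else 0.0)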
