
Integrate MongoDB Atlas Vector Search

Difficulty: ⭐⭐ Intermediate | Time: 1 hour

🎯 The Problem

You want to use MongoDB for both your application data AND vector search, avoiding the need for a separate vector database. You need unified storage but aren't sure how to enable and configure MongoDB's vector search capabilities.

This guide solves: Setting up MongoDB Atlas Vector Search so you can store documents and perform semantic search in a single database.

⚡ TL;DR - Quick Start

from packages.rag.stores import MongoDBAtlasStore

# 1. Connect to MongoDB Atlas
store = MongoDBAtlasStore(
    connection_string="mongodb+srv://user:pass@cluster.mongodb.net",
    database_name="recoagent",
    collection_name="documents",
    index_name="vector_index"
)

# 2. Add documents
store.add_document("doc_1", "Your content", metadata={"type": "guide"})

# 3. Search
results = store.search("your query", k=5)
print(f"✅ MongoDB vector search working! Found {len(results)} docs")

Expected: Documents indexed and searchable in MongoDB Atlas!


Full Guide

This guide walks you through integrating MongoDB Atlas Vector Search with RecoAgent for unified document and vector storage.

Prerequisites

  • MongoDB Atlas cluster with Vector Search enabled
  • Python 3.8+
  • RecoAgent installed
  • MongoDB Atlas credentials

Step 1: Set Up MongoDB Atlas

Create MongoDB Atlas Cluster

  1. Sign up for MongoDB Atlas

  2. Create a New Cluster

    • Click "Build a Database"
    • Choose "M0 Sandbox" (free tier) or higher
    • Select your preferred cloud provider and region
    • Click "Create Cluster"
  3. Enable Vector Search

    • In your cluster, go to "Search" tab
    • Click "Create Search Index"
    • Choose "Vector Search" option
    • Configure your vector search index
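
The index configuration in the last step is a small JSON document. The sketch below builds one in Python so it can be pasted into the Atlas JSON editor. The field path `embedding` and the filter path `metadata.category` are assumptions about how documents are stored, not names confirmed by RecoAgent; the dimension of 3072 mirrors the `EMBEDDING_DIMENSION` value used later in this guide.

```python
import json

ALLOWED_SIMILARITIES = {"cosine", "euclidean", "dotProduct"}

def build_vector_index_definition(path="embedding", num_dimensions=3072,
                                  similarity="cosine", filter_paths=()):
    """Return an Atlas Vector Search index definition as a plain dict.

    `path` is the document field holding the embedding (assumed name).
    """
    if similarity not in ALLOWED_SIMILARITIES:
        raise ValueError(f"unsupported similarity: {similarity}")
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # Optional fields that queries may pre-filter on.
    fields += [{"type": "filter", "path": p} for p in filter_paths]
    return {"fields": fields}

definition = build_vector_index_definition(filter_paths=["metadata.category"])
print(json.dumps(definition, indent=2))
```

The `numDimensions` value must equal the length of the vectors your embedding model produces, otherwise documents will not be indexed as expected.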

Configure Database Access

  1. Create Database User

    • Go to "Database Access" in the left sidebar
    • Click "Add New Database User"
    • Choose "Password" authentication
    • Create a strong password
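    • Assign the "Atlas Admin" role for development; for production, grant only the access the application needs (for example, read and write on the application database)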
    • Click "Add User"
  2. Configure Network Access

    • Go to "Network Access" in the left sidebar
    • Click "Add IP Address"
    • Choose "Allow Access from Anywhere" (0.0.0.0/0) for development
    • For production, add specific IP addresses
    • Click "Confirm"

Get Connection String

  1. Get Connection URI
    • Go to "Database" in the left sidebar
    • Click "Connect" on your cluster
    • Choose "Connect your application"
    • Select the "Python" driver and version "3.6 or later" (this is the driver version, not the Python version)
    • Copy the connection string
    • Replace <password> with your database user password, percent-encoding special characters such as @, :, or /
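
Passwords that contain characters such as `@`, `:`, or `/` need to be percent-encoded before they go into the URI. A minimal sketch using the standard library follows; the helper name `build_atlas_uri` and the sample credentials are hypothetical.

```python
from urllib.parse import quote_plus

def build_atlas_uri(username, password, host):
    """Build a mongodb+srv URI with percent-encoded credentials."""
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"

# Hypothetical credentials, chosen to show the escaping.
uri = build_atlas_uri("app_user", "p@ss:w/rd", "cluster.mongodb.net")
print(uri)
```

Building the URI this way avoids a class of authentication errors that only appear when the password happens to contain a reserved character.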

Step 2: Install Dependencies

# Install MongoDB dependencies
pip install pymongo motor

# Or install all RecoAgent dependencies
pip install -r requirements.txt

Step 3: Configure Environment Variables

Create a .env file or set environment variables:

# MongoDB Atlas Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
MONGODB_DATABASE=recoagent
MONGODB_COLLECTION=documents
MONGODB_VECTOR_SEARCH_INDEX=vector_index

# Vector Store Configuration
VECTOR_STORE_TYPE=mongodb_atlas
EMBEDDING_DIMENSION=3072

# Connection Pool Settings
MONGODB_MAX_POOL_SIZE=100
MONGODB_MIN_POOL_SIZE=10
MONGODB_MAX_IDLE_TIME_MS=30000
MONGODB_CONNECT_TIMEOUT_MS=10000
MONGODB_SERVER_SELECTION_TIMEOUT_MS=10000
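
How RecoAgent's own configuration loader reads these variables is not shown in this guide. As a rough sketch, the function below (a hypothetical helper, `load_mongodb_settings`) maps them onto the option names that PyMongo's `MongoClient` accepts, using the values above as defaults.

```python
import os

def load_mongodb_settings(env=None):
    """Read MongoDB settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return {
        "uri": env["MONGODB_URI"],  # required, no default
        "database": env.get("MONGODB_DATABASE", "recoagent"),
        "collection": env.get("MONGODB_COLLECTION", "documents"),
        "vector_search_index": env.get("MONGODB_VECTOR_SEARCH_INDEX", "vector_index"),
        "embedding_dim": int(env.get("EMBEDDING_DIMENSION", "3072")),
        # Keys below use PyMongo's MongoClient keyword names.
        "client_options": {
            "maxPoolSize": int(env.get("MONGODB_MAX_POOL_SIZE", "100")),
            "minPoolSize": int(env.get("MONGODB_MIN_POOL_SIZE", "10")),
            "maxIdleTimeMS": int(env.get("MONGODB_MAX_IDLE_TIME_MS", "30000")),
            "connectTimeoutMS": int(env.get("MONGODB_CONNECT_TIMEOUT_MS", "10000")),
            "serverSelectionTimeoutMS": int(
                env.get("MONGODB_SERVER_SELECTION_TIMEOUT_MS", "10000")
            ),
        },
    }

# A sample mapping stands in for the real environment here.
sample_env = {
    "MONGODB_URI": "mongodb+srv://username:password@cluster.mongodb.net/",
    "MONGODB_MAX_POOL_SIZE": "200",
}
settings = load_mongodb_settings(sample_env)
print(settings["client_options"])
```

With PyMongo installed, the result could be passed along as `MongoClient(settings["uri"], **settings["client_options"])`.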

Step 4: Basic Integration

from packages.rag.stores import MongoDBAtlasVectorStore, VectorDocument
from packages.rag.mongodb_retrievers import MongoDBVectorRetriever

# Initialize vector store
vector_store = MongoDBAtlasVectorStore(
    uri="mongodb+srv://username:password@cluster.mongodb.net/",
    database="recoagent",
    collection="documents",
    vector_search_index="vector_index"
)

# Initialize retriever
retriever = MongoDBVectorRetriever(vector_store)

# Create sample documents
documents = [
    VectorDocument(
        id="doc1",
        content="Machine learning is a subset of artificial intelligence",
        embedding=[0.1, 0.2, 0.3, ...],  # Your embedding; its length must match EMBEDDING_DIMENSION
        metadata={"category": "AI", "year": 2023}
    )
]

# Add documents
vector_store.add_documents(documents)

# Search documents
results = retriever.retrieve("machine learning", k=5)
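
The retriever's internals are not shown in this guide. At the MongoDB level, a vector query is normally an aggregation pipeline that starts with a `$vectorSearch` stage, and the sketch below builds one as plain data. The field path `embedding`, the projected fields, and the `numCandidates` heuristic are assumptions for illustration.

```python
def build_vector_search_pipeline(query_vector, k=5, index="vector_index",
                                 path="embedding", num_candidates=None,
                                 pre_filter=None):
    """Return an aggregation pipeline for an Atlas $vectorSearch query."""
    stage = {
        "index": index,
        "path": path,  # assumed name of the embedding field
        "queryVector": list(query_vector),
        # Consider more candidates than results; this ratio is a heuristic.
        "numCandidates": num_candidates or max(k * 10, 100),
        "limit": k,
    }
    if pre_filter:
        stage["filter"] = pre_filter
    return [
        {"$vectorSearch": stage},
        {"$project": {
            "content": 1,
            "metadata": 1,
            "score": {"$meta": "vectorSearchScore"},
        }},
    ]

# A 3-element vector keeps the example short; real vectors are much longer.
pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], k=5)
print(pipeline[0]["$vectorSearch"]["numCandidates"])
```

With PyMongo, such a pipeline would be run with `collection.aggregate(pipeline)` against the collection that holds the vector index.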

Hybrid Search Setup

from packages.rag.mongodb_retrievers import MongoDBHybridRetriever, MongoDBHybridConfig

# Configure hybrid search
config = MongoDBHybridConfig(
    text_weight=0.3,
    vector_weight=0.7
)

retriever = MongoDBHybridRetriever(vector_store, config)

# Create text index for hybrid search
retriever.create_text_index(['content', 'title'])

# Perform hybrid search
results = retriever.retrieve("machine learning algorithms", k=10)
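
To make the two weights concrete, here is one common way to combine text and vector scores: normalize each score set to the range 0 to 1, then take a weighted sum. This is a sketch of the general technique only; the fusion method that `MongoDBHybridRetriever` actually uses is not documented here, and the function names and sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}

def fuse_scores(text_scores, vector_scores,
                text_weight=0.3, vector_weight=0.7, k=10):
    """Weighted sum of normalized text and vector scores, best first."""
    text_norm = min_max_normalize(text_scores)
    vector_norm = min_max_normalize(vector_scores)
    combined = {}
    for doc_id in set(text_norm) | set(vector_norm):
        # A document missing from one result set contributes 0 for that side.
        combined[doc_id] = (text_weight * text_norm.get(doc_id, 0.0)
                            + vector_weight * vector_norm.get(doc_id, 0.0))
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)[:k]

text_scores = {"doc1": 2.0, "doc2": 4.0}    # e.g. text relevance scores
vector_scores = {"doc1": 0.9, "doc3": 0.5}  # e.g. similarity scores
ranked = fuse_scores(text_scores, vector_scores)
print(ranked)
```

With a vector weight of 0.7, a document that ranks first semantically outranks one that only matches keywords, which is the behavior the configuration above favors.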

Step 5: Advanced Configuration

Production Configuration

from config.settings import get_config
from packages.rag.stores import get_vector_store

# Get configuration
config = get_config()

# Initialize with production settings
vector_store = get_vector_store(
    "mongodb_atlas",
    uri=config.vector_store.mongodb_uri,
    database=config.vector_store.mongodb_database,
    collection=config.vector_store.mongodb_collection,
    vector_search_index=config.vector_store.mongodb_vector_search_index,
    embedding_dim=config.llm.embedding_dimension,
    max_pool_size=config.vector_store.mongodb_max_pool_size,
    min_pool_size=config.vector_store.mongodb_min_pool_size
)

Async Operations

import asyncio
from packages.rag.mongodb_retrievers import MongoDBVectorRetriever

async def async_search_example():
    # vector_store is the store created in Step 4
    retriever = MongoDBVectorRetriever(vector_store)

    # Async search
    results = await retriever.retrieve_async("artificial intelligence", k=10)

    return results

# Run async example
results = asyncio.run(async_search_example())
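
Async retrieval pays off mainly when several queries run concurrently. The sketch below fans out a batch of queries with `asyncio.gather` and caps concurrency with a semaphore. `FakeRetriever` is a stand-in that only mimics the `retrieve_async(query, k)` call shape used above, so the example runs without a database; `search_many` is a hypothetical helper.

```python
import asyncio

class FakeRetriever:
    """Stand-in exposing the same retrieve_async(query, k) call shape."""

    async def retrieve_async(self, query, k=10):
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"{query}-result-{i}" for i in range(k)]

async def search_many(retriever, queries, k=10, max_concurrency=5):
    """Run several queries concurrently, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def search_one(query):
        async with semaphore:
            return await retriever.retrieve_async(query, k=k)

    # gather returns results in the same order as the input queries.
    results = await asyncio.gather(*(search_one(q) for q in queries))
    return dict(zip(queries, results))

results_by_query = asyncio.run(
    search_many(FakeRetriever(), ["ai", "ml", "rag"], k=2)
)
print(results_by_query["ml"])
```

Keeping `max_concurrency` below the connection pool size helps avoid queries queuing for a connection.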

Step 6: Integration with RecoAgent API

Update API Configuration

The MongoDB integration is already included in the RecoAgent API. Simply set the environment variable:

VECTOR_STORE_TYPE=mongodb_atlas

Test API Integration

# Start the API
python -m uvicorn apps.api.main:app --reload

# Test the search endpoint
curl -X POST "http://localhost:8000/search" \
    -H "Content-Type: application/json" \
    -d '{"query": "machine learning", "k": 5}'

Step 7: Monitoring and Optimization

Performance Monitoring

# Get collection statistics
stats = vector_store.get_stats()
print(f"Total documents: {stats['total_documents']}")
print(f"Storage size: {stats['storage_size']} bytes")

# Monitor query performance
import time

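# query_embedding is the embedding vector for your query text
start_time = time.perf_counter()  # monotonic clock, better suited to timing than time.time()
results = vector_store.search(query_embedding, k=10)
search_time = time.perf_counter() - start_time
print(f"Search completed in {search_time:.3f} seconds")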
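
A single timing is noisy, so it can help to repeat the query and look at percentiles. The helper below is a minimal sketch; `measure_latency` is a hypothetical name, and the lambda stands in for a real call such as `vector_store.search(query_embedding, k=10)`.

```python
import statistics
import time

def measure_latency(search_fn, runs=20):
    """Call search_fn repeatedly and summarize latency in milliseconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    # Nearest-rank style index for the 95th percentile.
    p95_index = min(len(samples) - 1, round(0.95 * (len(samples) - 1)))
    return {
        "runs": runs,
        "mean_ms": statistics.mean(samples),
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }

# Placeholder workload; replace with a real search call.
stats = measure_latency(lambda: sum(range(1000)), runs=20)
print(f"p50={stats['p50_ms']:.3f} ms, p95={stats['p95_ms']:.3f} ms")
```

The first few calls after startup often include connection setup, so consider discarding a warm-up run before comparing numbers.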

Connection Pool Optimization

# Optimize for high concurrency
vector_store = MongoDBAtlasVectorStore(
    uri=uri,
    database=database,
    collection=collection,
    vector_search_index=index_name,
    max_pool_size=200,      # Increase for high concurrency
    min_pool_size=50,       # Keep minimum connections alive
    max_idle_time_ms=60000  # Longer idle timeout
)

Step 8: Troubleshooting

Common Issues

  1. Connection Errors

    Error: ServerSelectionTimeoutError
    Solution: Check that your current IP address is in the Atlas IP access list (Network Access) and that the cluster hostname in the URI is correct
  2. Authentication Failed

    Error: Authentication failed
    Solution: Verify username, password, and database permissions
  3. Vector Search Index Not Found

    Error: Index not found
    Solution: Create the vector search index in MongoDB Atlas and make sure its name matches MONGODB_VECTOR_SEARCH_INDEX
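
Authentication failures are often caused by the URI itself, for example a leftover `<password>` placeholder. The sketch below is a small pre-flight check built on the standard library; `check_mongodb_uri` is a hypothetical helper and covers only a few obvious mistakes.

```python
from urllib.parse import urlsplit

def check_mongodb_uri(uri):
    """Return a list of human-readable problems found in a MongoDB URI."""
    problems = []
    parts = urlsplit(uri)
    if parts.scheme not in ("mongodb", "mongodb+srv"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.username or not parts.password:
        problems.append("username or password missing")
    elif "<" in parts.password or ">" in parts.password:
        problems.append("password still contains a <placeholder>")
    elif "@" in parts.password:
        problems.append("password contains an unencoded '@'")
    return problems

bad = check_mongodb_uri("mongodb+srv://user:<password>@cluster.mongodb.net/")
good = check_mongodb_uri("mongodb+srv://user:secret@cluster.mongodb.net/")
print(bad)
print(good)
```

An empty list does not guarantee that the credentials are valid; it only rules out the formatting mistakes checked here.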

Debug Mode

import logging

# Enable debug logging
logging.getLogger('pymongo').setLevel(logging.DEBUG)
logging.getLogger('motor').setLevel(logging.DEBUG)

# Test with debug logging
vector_store = MongoDBAtlasVectorStore(uri=uri, database=database)

Step 9: Production Deployment

Environment-Specific Configuration

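Development (the local deployment must support Atlas Vector Search, for example a local Atlas deployment created with the Atlas CLI):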

MONGODB_URI=mongodb://localhost:27017
MONGODB_DATABASE=recoagent_dev
MONGODB_MAX_POOL_SIZE=10

Staging:

MONGODB_URI=mongodb+srv://username:password@staging-cluster.mongodb.net/
MONGODB_DATABASE=recoagent_staging
MONGODB_MAX_POOL_SIZE=50

Production:

MONGODB_URI=mongodb+srv://username:password@production-cluster.mongodb.net/
MONGODB_DATABASE=recoagent
MONGODB_MAX_POOL_SIZE=200

Security Best Practices

  1. Use strong, unique passwords and keep them out of source control (load MONGODB_URI from the environment)
  2. Restrict network access to specific IP addresses
  3. Enable encryption in transit and at rest
  4. Apply security updates regularly
  5. Monitor access logs

Step 10: Testing

Unit Tests

# Run MongoDB tests
python -m pytest tests/test_mongodb_vector_search.py -v

Integration Tests

# Run integration tests with real MongoDB
MONGODB_URI=your_uri python -m pytest tests/test_mongodb_vector_search.py::TestMongoDBIntegration -v

Performance Tests

# Run performance benchmarks
python examples/mongodb_performance_benchmark.py

Your MongoDB Atlas Vector Search integration is now ready for production use! 🚀