Integrate MongoDB Atlas Vector Search
Difficulty: ⭐⭐ Intermediate | Time: 1 hour
🎯 The Problem
You want to use MongoDB for both your application data AND vector search, avoiding the need for a separate vector database. You need unified storage but aren't sure how to enable and configure MongoDB's vector search capabilities.
This guide solves: Setting up MongoDB Atlas Vector Search so you can store documents and perform semantic search in a single database.
⚡ TL;DR - Quick Start
from packages.rag.stores import MongoDBAtlasStore
# 1. Connect to MongoDB Atlas
store = MongoDBAtlasStore(
    connection_string="mongodb+srv://user:pass@cluster.mongodb.net",
    database_name="recoagent",
    collection_name="documents",
    index_name="vector_index"
)
# 2. Add documents
store.add_document("doc_1", "Your content", metadata={"type": "guide"})
# 3. Search
results = store.search("your query", k=5)
print(f"✅ MongoDB vector search working! Found {len(results)} docs")
Expected: Documents indexed and searchable in MongoDB Atlas!
Full Guide
This guide walks you through integrating MongoDB Atlas Vector Search with RecoAgent for unified document and vector storage.
Prerequisites
- MongoDB Atlas cluster with Vector Search enabled
- Python 3.8+
- RecoAgent installed
- MongoDB Atlas credentials
Step 1: Set Up MongoDB Atlas
Create MongoDB Atlas Cluster
- Sign up for MongoDB Atlas
  - Go to MongoDB Atlas
  - Create a free account or sign in
- Create a New Cluster
  - Click "Build a Database"
  - Choose "M0 Sandbox" (free tier) or higher
  - Select your preferred cloud provider and region
  - Click "Create Cluster"
- Enable Vector Search
  - In your cluster, go to "Search" tab
  - Click "Create Search Index"
  - Choose "Vector Search" option
  - Configure your vector search index
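The index configuration in the last step is a small JSON document. As a rough sketch, the definition could be built as below, assuming embeddings live in a field named `embedding` (the field name used by `VectorDocument` in Step 4) and use the 3072 dimensions from Step 3. The helper name `build_vector_index_definition` and the `metadata.category` filter path are illustrative assumptions; the `fields`, `type`, `path`, `numDimensions`, and `similarity` keys follow the Atlas Vector Search index format.

```python
import json


def build_vector_index_definition(num_dimensions=3072, path="embedding",
                                  similarity="cosine", filter_paths=()):
    """Return an Atlas Vector Search index definition as a plain dict.

    Hypothetical helper for illustration; it is not part of RecoAgent.
    """
    allowed = {"cosine", "euclidean", "dotProduct"}
    if similarity not in allowed:
        raise ValueError(f"similarity must be one of {sorted(allowed)}")
    if num_dimensions <= 0:
        raise ValueError("num_dimensions must be positive")

    # One "vector" field describes where the embeddings are stored.
    fields = [{
        "type": "vector",
        "path": path,
        "numDimensions": num_dimensions,
        "similarity": similarity,
    }]
    # Optional "filter" fields allow pre-filtering on metadata.
    fields.extend({"type": "filter", "path": p} for p in filter_paths)
    return {"fields": fields}


# The metadata path below is an assumption about how documents are stored.
definition = build_vector_index_definition(filter_paths=["metadata.category"])
print(json.dumps(definition, indent=2))
```

The printed JSON can be pasted into the JSON editor of the Atlas "Create Search Index" dialog. The `numDimensions` value has to match the length of the embeddings you store, otherwise documents will not be indexed.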
Configure Database Access
- Create Database User
  - Go to "Database Access" in the left sidebar
  - Click "Add New Database User"
  - Choose "Password" authentication
  - Create a strong password
  - Assign "Atlas Admin" role for initial setup; for production, a narrower built-in role such as "Read and write to any database" is usually a safer choice
  - Click "Add User"
- Configure Network Access
  - Go to "Network Access" in the left sidebar
  - Click "Add IP Address"
  - Choose "Allow Access from Anywhere" (0.0.0.0/0) for development
  - For production, add specific IP addresses
  - Click "Confirm"
Get Connection String
- Get Connection URI
  - Go to "Database" in the left sidebar
  - Click "Connect" on your cluster
  - Choose "Connect your application"
  - Select "Python" and version "3.6 or later"
  - Copy the connection string
  - Replace <password> with your database user password
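When substituting the password, note that characters such as `@`, `:`, and `/` must be percent-encoded or the URI will not parse. A minimal sketch using only the standard library follows; the helper name `build_mongodb_uri` and the sample credentials are made up for illustration.

```python
from urllib.parse import quote_plus


def build_mongodb_uri(username, password, host):
    """Build a mongodb+srv URI with percent-encoded credentials.

    Hypothetical helper; the host is whatever Atlas shows in the
    "Connect" dialog for your cluster.
    """
    return f"mongodb+srv://{quote_plus(username)}:{quote_plus(password)}@{host}/"


# Sample credentials only; never hard-code real ones.
uri = build_mongodb_uri("app_user", "p@ss:w/rd", "cluster.mongodb.net")
print(uri)  # mongodb+srv://app_user:p%40ss%3Aw%2Frd@cluster.mongodb.net/
```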
Step 2: Install Dependencies
# Install MongoDB dependencies
pip install pymongo motor
# Or install all RecoAgent dependencies
pip install -r requirements.txt
Step 3: Configure Environment Variables
Create a .env file or set environment variables:
# MongoDB Atlas Configuration
MONGODB_URI=mongodb+srv://username:password@cluster.mongodb.net/
MONGODB_DATABASE=recoagent
MONGODB_COLLECTION=documents
MONGODB_VECTOR_SEARCH_INDEX=vector_index
# Vector Store Configuration
VECTOR_STORE_TYPE=mongodb_atlas
EMBEDDING_DIMENSION=3072
# Connection Pool Settings
MONGODB_MAX_POOL_SIZE=100
MONGODB_MIN_POOL_SIZE=10
MONGODB_MAX_IDLE_TIME_MS=30000
MONGODB_CONNECT_TIMEOUT_MS=10000
MONGODB_SERVER_SELECTION_TIMEOUT_MS=10000
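To show how these variables might be consumed, here is a sketch that reads them into a typed settings object with the standard library. `MongoSettings` and `load_mongo_settings` are hypothetical names, not part of RecoAgent, which has its own loader (`config.settings.get_config`, used in Step 5). Note that a `.env` file is not read automatically by Python; a loader such as python-dotenv is one option for that.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MongoSettings:
    """Illustrative container for the variables listed above."""
    uri: str
    database: str
    collection: str
    vector_search_index: str
    embedding_dimension: int
    max_pool_size: int
    min_pool_size: int


def load_mongo_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    uri = env.get("MONGODB_URI")
    if not uri:
        # Fail early instead of silently connecting to the wrong place.
        raise RuntimeError("MONGODB_URI is not set")
    return MongoSettings(
        uri=uri,
        database=env.get("MONGODB_DATABASE", "recoagent"),
        collection=env.get("MONGODB_COLLECTION", "documents"),
        vector_search_index=env.get("MONGODB_VECTOR_SEARCH_INDEX", "vector_index"),
        embedding_dimension=int(env.get("EMBEDDING_DIMENSION", "3072")),
        max_pool_size=int(env.get("MONGODB_MAX_POOL_SIZE", "100")),
        min_pool_size=int(env.get("MONGODB_MIN_POOL_SIZE", "10")),
    )


# Passing a dict keeps the example independent of the real environment.
settings = load_mongo_settings({
    "MONGODB_URI": "mongodb+srv://username:password@cluster.mongodb.net/",
    "MONGODB_MAX_POOL_SIZE": "50",
})
print(settings.database, settings.max_pool_size)
```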
Step 4: Basic Integration
Simple Vector Search
from packages.rag.stores import MongoDBAtlasVectorStore, VectorDocument
from packages.rag.mongodb_retrievers import MongoDBVectorRetriever
# Initialize vector store
vector_store = MongoDBAtlasVectorStore(
    uri="mongodb+srv://username:password@cluster.mongodb.net/",
    database="recoagent",
    collection="documents",
    vector_search_index="vector_index"
)
# Initialize retriever
retriever = MongoDBVectorRetriever(vector_store)
# Create sample documents
documents = [
    VectorDocument(
        id="doc1",
        content="Machine learning is a subset of artificial intelligence",
        embedding=[0.1, 0.2, 0.3, ...],  # Your embedding
        metadata={"category": "AI", "year": 2023}
    )
]
# Add documents
vector_store.add_documents(documents)
# Search documents
results = retriever.retrieve("machine learning", k=5)
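For orientation, a vector query against Atlas is an aggregation pipeline that starts with a `$vectorSearch` stage. The sketch below builds such a pipeline as plain data; whether `MongoDBVectorRetriever` issues exactly this pipeline is an assumption, and the function name, the `embedding` path, and the projected fields are illustrative. The `numCandidates` default here (20 times `k`, capped at 10000) is only a starting point to tune.

```python
def build_vector_search_pipeline(query_vector, k=5, index="vector_index",
                                 path="embedding", num_candidates=None):
    """Return an aggregation pipeline for Atlas $vectorSearch.

    Hypothetical helper; field names are assumptions about the stored
    document layout.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if num_candidates is None:
        # Consider more candidates than results to improve recall.
        num_candidates = min(max(k * 20, 100), 10000)
    return [
        {
            "$vectorSearch": {
                "index": index,
                "path": path,
                "queryVector": list(query_vector),
                "numCandidates": num_candidates,
                "limit": k,
            }
        },
        {
            # Expose the similarity score alongside the stored fields.
            "$project": {
                "content": 1,
                "metadata": 1,
                "score": {"$meta": "vectorSearchScore"},
            }
        },
    ]


pipeline = build_vector_search_pipeline([0.1, 0.2, 0.3], k=5)
print(pipeline[0]["$vectorSearch"]["numCandidates"])
```

With pymongo, a pipeline like this would be passed to `collection.aggregate(pipeline)`. The query vector must have the same number of dimensions as the index.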
Hybrid Search Setup
from packages.rag.mongodb_retrievers import MongoDBHybridRetriever, MongoDBHybridConfig
# Configure hybrid search
config = MongoDBHybridConfig(
    text_weight=0.3,
    vector_weight=0.7
)
retriever = MongoDBHybridRetriever(vector_store, config)
# Create text index for hybrid search
retriever.create_text_index(['content', 'title'])
# Perform hybrid search
results = retriever.retrieve("machine learning algorithms", k=10)
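To make the effect of `text_weight` and `vector_weight` concrete, the following sketch combines two score sets with a weighted sum after min-max normalization. This is one common fusion approach, shown under the assumption that the weights act linearly; how `MongoDBHybridRetriever` actually combines scores is not documented here, and the function names are hypothetical.

```python
def min_max_normalize(scores):
    """Scale a {doc_id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a full match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse_scores(text_scores, vector_scores, text_weight=0.3, vector_weight=0.7):
    """Weighted sum of normalized text and vector scores, best first."""
    text_norm = min_max_normalize(text_scores)
    vector_norm = min_max_normalize(vector_scores)
    doc_ids = set(text_norm) | set(vector_norm)
    fused = {
        doc_id: text_weight * text_norm.get(doc_id, 0.0)
        + vector_weight * vector_norm.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Made-up scores: doc2 is the best vector match, doc1 the best text match.
ranked = fuse_scores({"doc1": 2.0, "doc2": 1.0}, {"doc2": 0.9, "doc3": 0.5})
print(ranked)
```

With the default weights, the top vector match outranks the top text match, which is the behavior a 0.7 vector weight is meant to produce.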
Step 5: Advanced Configuration
Production Configuration
from config.settings import get_config
from packages.rag.stores import get_vector_store
# Get configuration
config = get_config()
# Initialize with production settings
vector_store = get_vector_store(
    "mongodb_atlas",
    uri=config.vector_store.mongodb_uri,
    database=config.vector_store.mongodb_database,
    collection=config.vector_store.mongodb_collection,
    vector_search_index=config.vector_store.mongodb_vector_search_index,
    embedding_dim=config.llm.embedding_dimension,
    max_pool_size=config.vector_store.mongodb_max_pool_size,
    min_pool_size=config.vector_store.mongodb_min_pool_size
)
Async Operations
import asyncio
from packages.rag.mongodb_retrievers import MongoDBVectorRetriever
async def async_search_example():
    retriever = MongoDBVectorRetriever(vector_store)
    # Async search
    results = await retriever.retrieve_async("artificial intelligence", k=10)
    return results
# Run async example
results = asyncio.run(async_search_example())
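The main benefit of the async API is running several searches concurrently. The sketch below shows the pattern with `asyncio.gather` and a semaphore to cap concurrency. `FakeRetriever` is a stand-in so the example runs without a database; it assumes only the `retrieve_async(query, k=...)` signature shown above, and `search_many` is a hypothetical helper.

```python
import asyncio


class FakeRetriever:
    """Stand-in for MongoDBVectorRetriever so the sketch is runnable."""

    async def retrieve_async(self, query, k=10):
        await asyncio.sleep(0.01)  # simulate network latency
        return [f"{query}-result-{i}" for i in range(k)]


async def search_many(retriever, queries, k=10, max_concurrency=5):
    """Run several searches concurrently, at most max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def search_one(query):
        async with semaphore:
            return await retriever.retrieve_async(query, k=k)

    # gather preserves the order of the input queries.
    results = await asyncio.gather(*(search_one(q) for q in queries))
    return dict(zip(queries, results))


results_by_query = asyncio.run(
    search_many(FakeRetriever(), ["artificial intelligence", "vector search"], k=3)
)
print({query: len(hits) for query, hits in results_by_query.items()})
```

Keeping `max_concurrency` below the connection pool size avoids queries queuing for a connection.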
Step 6: Integration with RecoAgent API
Update API Configuration
The MongoDB integration is already included in the RecoAgent API. Simply set the environment variable:
VECTOR_STORE_TYPE=mongodb_atlas
Test API Integration
# Start the API
python -m uvicorn apps.api.main:app --reload
# Test the search endpoint
curl -X POST "http://localhost:8000/search" \
-H "Content-Type: application/json" \
-d '{"query": "machine learning", "k": 5}'
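The same request can be issued from Python. This sketch uses only the standard library and mirrors the curl call above; the `/search` path and payload shape are taken from that call, while the helper name `build_search_request` is an assumption.

```python
import json
import urllib.request


def build_search_request(query, k=5, base_url="http://localhost:8000"):
    """Build a POST request equivalent to the curl example."""
    payload = json.dumps({"query": query, "k": k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/search",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("machine learning", k=5)
print(request.get_method(), request.full_url)

# To send it against a running API (not executed here):
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```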
Step 7: Monitoring and Optimization
Performance Monitoring
# Get collection statistics
stats = vector_store.get_stats()
print(f"Total documents: {stats['total_documents']}")
print(f"Storage size: {stats['storage_size']} bytes")
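# Monitor query performance
# (query_embedding is the embedding vector of your query text)
import time
start_time = time.perf_counter()  # monotonic clock, better suited to timing
results = vector_store.search(query_embedding, k=10)
search_time = time.perf_counter() - start_time
print(f"Search completed in {search_time:.3f} seconds")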
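A single timing is noisy, so it can help to repeat the query and look at the distribution. The following sketch collects mean, median, and an approximate 95th percentile; `measure_latency` is a hypothetical helper, and the lambda stands in for a real call such as `vector_store.search(query_embedding, k=10)`.

```python
import math
import statistics
import time


def measure_latency(search_fn, runs=20):
    """Call search_fn repeatedly and summarize wall-clock latency."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        timings.append(time.perf_counter() - start)
    timings.sort()
    # Nearest-rank 95th percentile on the sorted timings.
    p95_index = max(0, math.ceil(0.95 * len(timings)) - 1)
    return {
        "runs": runs,
        "mean_s": statistics.mean(timings),
        "median_s": statistics.median(timings),
        "p95_s": timings[p95_index],
        "max_s": timings[-1],
    }


# Placeholder workload; replace with the real search call.
summary = measure_latency(lambda: sum(range(1000)), runs=20)
print({key: round(value, 6) for key, value in summary.items()})
```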
Connection Pool Optimization
# Optimize for high concurrency
vector_store = MongoDBAtlasVectorStore(
    uri=uri,
    database=database,
    collection=collection,
    vector_search_index=index_name,
    max_pool_size=200,  # Increase for high concurrency
    min_pool_size=50,  # Keep minimum connections alive
    max_idle_time_ms=60000  # Longer idle timeout
)
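These settings correspond to standard driver options. As a sketch, the mapping to pymongo's `MongoClient` keyword arguments might look like the following; `maxPoolSize`, `minPoolSize`, `maxIdleTimeMS`, `connectTimeoutMS`, and `serverSelectionTimeoutMS` are real pymongo options, but whether `MongoDBAtlasVectorStore` forwards its arguments this way is an assumption, and `to_pymongo_options` is a hypothetical helper.

```python
def to_pymongo_options(max_pool_size=100, min_pool_size=10,
                       max_idle_time_ms=30000, connect_timeout_ms=10000,
                       server_selection_timeout_ms=10000):
    """Translate snake_case pool settings into MongoClient keyword names."""
    if min_pool_size > max_pool_size:
        raise ValueError("min_pool_size cannot exceed max_pool_size")
    return {
        "maxPoolSize": max_pool_size,
        "minPoolSize": min_pool_size,
        "maxIdleTimeMS": max_idle_time_ms,
        "connectTimeoutMS": connect_timeout_ms,
        "serverSelectionTimeoutMS": server_selection_timeout_ms,
    }


options = to_pymongo_options(max_pool_size=200, min_pool_size=50,
                             max_idle_time_ms=60000)
print(options)

# With pymongo installed, the options could be applied as:
# client = pymongo.MongoClient(uri, **options)
```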
Step 8: Troubleshooting
Common Issues
- Connection Errors
  Error: ServerSelectionTimeoutError
  Solution: Check network access and IP whitelist
- Authentication Failed
  Error: Authentication failed
  Solution: Verify username, password, and database permissions
- Vector Search Index Not Found
  Error: Index not found
  Solution: Create vector search index manually in MongoDB Atlas
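Several of these errors trace back to a malformed connection string. The sketch below is a small pre-flight check that catches a few common mistakes before any connection is attempted. It is a heuristic written for this guide, not a RecoAgent or pymongo feature, and `check_connection_string` is a hypothetical name.

```python
def check_connection_string(uri):
    """Return a list of likely problems found in a MongoDB URI."""
    problems = []
    if not uri.startswith(("mongodb://", "mongodb+srv://")):
        problems.append("URI should start with mongodb:// or mongodb+srv://")
    if "<" in uri or ">" in uri:
        problems.append("URI still contains a placeholder such as <password>")

    # Inspect the part between the scheme and the first slash.
    authority = uri.split("://", 1)[-1].split("/", 1)[0]
    at_count = authority.count("@")
    if at_count == 0:
        problems.append("no credentials found before the host name")
    elif at_count > 1:
        problems.append("extra '@' found; percent-encode special characters "
                        "in the username or password")
    return problems


for candidate in (
    "mongodb+srv://username:password@cluster.mongodb.net/",
    "mongodb+srv://username:<password>@cluster.mongodb.net/",
):
    print(candidate, "->", check_connection_string(candidate) or "looks OK")
```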
Debug Mode
import logging
# Enable debug logging
logging.getLogger('pymongo').setLevel(logging.DEBUG)
logging.getLogger('motor').setLevel(logging.DEBUG)
# Test with debug logging
vector_store = MongoDBAtlasVectorStore(uri=uri, database=database)
Step 9: Production Deployment
Environment-Specific Configuration
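Development (vector search needs a deployment that supports $vectorSearch, such as a local Atlas deployment created with the Atlas CLI; a standalone mongod may not provide it):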
MONGODB_URI=mongodb://localhost:27017
MONGODB_DATABASE=recoagent_dev
MONGODB_MAX_POOL_SIZE=10
Staging:
MONGODB_URI=mongodb+srv://username:password@staging-cluster.mongodb.net/
MONGODB_DATABASE=recoagent_staging
MONGODB_MAX_POOL_SIZE=50
Production:
MONGODB_URI=mongodb+srv://username:password@production-cluster.mongodb.net/
MONGODB_DATABASE=recoagent
MONGODB_MAX_POOL_SIZE=200
Security Best Practices
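- Use strong, unique passwords and keep them out of source code (load the connection string from environment variables or a secrets manager)
- Restrict network access to specific IP addresses instead of 0.0.0.0/0
- Keep encryption in transit (TLS) and at rest enabled
- Apply security updates regularly
- Monitor access logs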
Step 10: Testing
Unit Tests
# Run MongoDB tests
python -m pytest tests/test_mongodb_vector_search.py -v
Integration Tests
# Run integration tests with real MongoDB
MONGODB_URI=your_uri python -m pytest tests/test_mongodb_vector_search.py::TestMongoDBIntegration -v
Performance Tests
# Run performance benchmarks
python examples/mongodb_performance_benchmark.py
Support
For additional help:
- Check the MongoDB Atlas Documentation
- Review the RecoAgent MongoDB Guide
- Open an issue in the RecoAgent repository
Your MongoDB Atlas Vector Search integration is now ready for production use! 🚀