🎬 Movie Recommendation System

Ask about movies by genre, actors, plot summaries, or reviews — just like chatting with a friend.

🎯 Why This Project?

Traditional movie search is broken. You know the feeling:

🤔 "I want something like Inception but not sci-fi"
😤 "Show me action movies but make them smart"
🎭 "Find me movies with that actor from that thing"

✨ Features

💬 Conversational Movie Search
Ask natural questions about movies by genre, cast, plot, or reviews.
🎯 Smart Recommendations
Get personalized suggestions based on your interests using state-of-the-art AI.
⚡ Instant Movie Info
Instantly see ratings, summaries, and reviews—no more manual searches.
🤖 Multi-Agent Architecture
Orchestrated agents for retrieval, recommendations, and external data integration.
🔍 Vector-Powered Search
FAISS-based similarity search with OpenAI embeddings (no heavy downloads required).
⚡ Lightning Fast Setup
No CUDA libraries or large ML models - just install and go!

This system gets it. Instead of keyword matching, it understands context, meaning, and relationships between movies. Ask naturally, get perfect results.

🧠 The Secret Sauce

RAG Architecture - Combines retrieval with generation for nuanced responses
Vector Similarity - Finds movies based on meaning, not just keywords
Multi-Agent System - Specialized AI agents work together for complex queries
No Setup Hell - Lightweight, fast, and CUDA-free

🛠️ Technologies & Tools

Technology	Purpose	Version
🐍 Python	Core Language	3.8+
🤖 OpenAI GPT-3.5	Language Generation	Latest
🦜 LangChain	LLM Framework	0.3+
🔍 FAISS	Vector Similarity Search	CPU Version (No CUDA)
🔗 OpenAI Embeddings	Text Embeddings	text-embedding-3-small
🎨 Gradio	Web UI Framework	5.0+
📊 Pandas	Data Processing	Latest
📓 Jupyter	Interactive Development	Latest

🚀 Quick Start

Get up and running in under 2 minutes!

Option 1: Python Application (Recommended)

# 1. Clone the repository
git clone <repository-url>
cd rag-movie-rec

# 2. Create virtual environment (recommended)
python3 -m venv movie-env
source movie-env/bin/activate  # On Windows: movie-env\Scripts\activate

# 3. Install dependencies (lightweight, no CUDA!)
pip install -r requirements.txt

# 4. Set up your OpenAI API key
export OPENAI_API_KEY="your-openai-api-key"

# 5. Build the vector store
python main.py --mode build

# 6. Launch the web UI
python main.py --mode ui

💡 Pro tip: Get your OpenAI API key at platform.openai.com/api-keys

Option 2: Jupyter Notebook (Step-by-step)

# 1. Install dependencies
pip install -r requirements.txt

# 2. Launch Jupyter
jupyter notebook

# 3. Open and run notebooks/MovieFinder_Main.ipynb

🧠 How It Works

Architecture Overview

graph TB
    A[User Query] --> B[Vector Search]
    B --> C[FAISS Index]
    C --> D[Similar Movies]
    D --> E[RAG Pipeline]
    E --> F[GPT-3.5 Turbo]
    F --> G[Personalized Response]
    
    H[Movie Dataset] --> I[Text Chunking]
    I --> J[Sentence Transformers]
    J --> K[Embeddings]
    K --> C

The Process

📂 Data Ingestion
Load and clean IMDb movie dataset with ratings, cast, genres, and descriptions.
✍️ Description Generation
Create natural language descriptions for each movie combining all metadata.
🔄 Text Chunking
Split descriptions into overlapping chunks for better retrieval granularity.
🧬 Embedding Creation
Convert text chunks to high-dimensional vectors using Sentence Transformers.
🗃️ Vector Store Building
Build FAISS index for lightning-fast similarity search across 7,000+ chunks.
🔍 Query Processing
Convert user queries to embeddings and find most similar movie content.
🤖 AI Generation
Use retrieved context with GPT-3.5-turbo to generate personalized responses.
🎨 User Interface
Present results through intuitive Gradio web interface or CLI.

📋 Usage Examples

Web Interface

Launch the web UI and try these queries:

"Recommend documentaries about famous people"
"What are some good action movies from the 2000s?"
"Movies similar to The Lord of the Rings"
"Comedy movies with high IMDb ratings"

Command Line

# Interactive CLI mode
python main.py --mode cli

# Build/rebuild vector store
python main.py --mode build

Python API

from src.movie_recommender.main import MovieRecommenderApp

app = MovieRecommenderApp()
app.setup_agents()

result = app.orchestrator.process_query("Sci-fi movies with robots")
print(result)

🏗️ Project Structure

rag-movie-rec/
├── src/movie_recommender/          # Core application modules
│   ├── __init__.py
│   ├── config.py                   # Configuration management
│   ├── data_processor.py           # Data processing utilities
│   ├── vector_store.py             # FAISS vector operations
│   ├── agents.py                   # Multi-agent orchestration
│   ├── rag_pipeline.py             # RAG implementation
│   ├── ui.py                       # Gradio interface
│   └── main.py                     # Application entry point
├── notebooks/                      # Jupyter development notebooks
│   ├── MovieFinder_Main.ipynb      # Primary development notebook
│   └── MovieFinder_Supplemental.ipynb
├── scripts/                        # Data preparation scripts
│   ├── SetupData.py                # Dataset downloading & merging
│   ├── format_movie.py              # Cast column formatting
│   └── ...
├── tests/                          # Test suite
├── data/                           # Generated data files
├── main.py                         # Application launcher
├── requirements.txt                # Python dependencies
└── README.md                       # This file

🔧 Development

Setting Up Development Environment

# Install development dependencies
pip install -r requirements.txt

# Install in editable mode
pip install -e .

# Run tests
python -m pytest tests/

# Format code
black src/ tests/

# Lint code
flake8 src/ tests/

Data Pipeline

# 1. Download and prepare dataset (requires Kaggle API)
python scripts/SetupData.py

# 2. Format cast columns
python scripts/format_movie.py

# 3. Build vector store
python main.py --mode build

🧪 Testing

# Run all tests
python -m pytest tests/ -v

# Run specific test categories
python -m pytest tests/test_vector_store.py -v
python -m pytest tests/test_agents.py -v

# Run with coverage
python -m pytest tests/ --cov=src/movie_recommender --cov-report=html

📝 Configuration

Environment Variables

# Required
export OPENAI_API_KEY="your-openai-api-key"

# Optional
export OMDB_API_KEY="your-omdb-api-key"

Customization

Edit src/movie_recommender/config.py to customize:

Model parameters (temperature, max tokens)
Chunk sizes and overlap
Vector store settings
UI configuration

📊 Performance

Dataset: 3,653 movies with 7,225 text chunks
Vector Search: Sub-second similarity queries
Memory Usage: ~100MB total installation (no heavy ML libraries)
Embedding Model: 1,536-dimensional vectors (OpenAI text-embedding-3-small)
Setup Time: Under 2 minutes vs 30+ minutes with local models

🤝 Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

OpenAI for GPT-3.5-turbo language model
Meta AI for FAISS vector search library
Hugging Face for Sentence Transformers
LangChain for LLM orchestration framework
Gradio for the beautiful web interface

IMDb for the movie dataset

Ready to roll? Let's build a smarter way to search for movies. 🍿

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
images		images
notebooks		notebooks
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
CLAUDE.md		CLAUDE.md
IMDb_Dataset_Composite_Cleaned.csv		IMDb_Dataset_Composite_Cleaned.csv
MovieFinder.code-workspace		MovieFinder.code-workspace
Movie_Descriptions.csv		Movie_Descriptions.csv
README.md		README.md
demo.py		demo.py
image.png		image.png
main.py		main.py
movie_chunks_metadata.csv		movie_chunks_metadata.csv
movie_vector_store.index		movie_vector_store.index
pytest.ini		pytest.ini
requirements.txt		requirements.txt
run_tests.py		run_tests.py
setup.py		setup.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

🎬 Movie Recommendation System

🎯 Why This Project?

✨ Features

🧠 The Secret Sauce

🛠️ Technologies & Tools

🚀 Quick Start

Option 1: Python Application (Recommended)

Option 2: Jupyter Notebook (Step-by-step)

🧠 How It Works

Architecture Overview

The Process

📋 Usage Examples

Web Interface

Command Line

Python API

🏗️ Project Structure

🔧 Development

Setting Up Development Environment

Data Pipeline

🧪 Testing

📝 Configuration

Environment Variables

Customization

📊 Performance

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

wbott/rag-movie-rec

Folders and files

Latest commit

History

Repository files navigation

🎬 Movie Recommendation System

🎯 Why This Project?

✨ Features

🧠 The Secret Sauce

🛠️ Technologies & Tools

🚀 Quick Start

Option 1: Python Application (Recommended)

Option 2: Jupyter Notebook (Step-by-step)

🧠 How It Works

Architecture Overview

The Process

📋 Usage Examples

Web Interface

Command Line

Python API

🏗️ Project Structure

🔧 Development

Setting Up Development Environment

Data Pipeline

🧪 Testing

📝 Configuration

Environment Variables

Customization

📊 Performance

🤝 Contributing

📄 License

🙏 Acknowledgments

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages