CineMatch - Movie Recommender System

A content-based movie recommendation system that suggests similar movies based on your selection. Built with Python and Streamlit, featuring a modern Netflix-inspired UI and an optimized TF-IDF algorithm.

Features

  • Content-Based Filtering - Analyzes movie metadata (genres, cast, crew, keywords, plot)
  • TF-IDF Algorithm - Optimized vectorization with bigram support for better recommendations
  • Modern UI - Netflix-inspired dark theme with smooth animations
  • Fast Performance - Cached model loading and API responses
  • Secure - API keys stored in secrets, not in code

Demo

🔗 Live App: cinematch-movie-recommend.streamlit.app

Select any movie from the database of 4,800+ films and get instant recommendations!

Run Locally

Prerequisites

Installation

# Clone the repository
git clone https://github.com/SiD-array/movie-recommender.git
cd movie-recommender

# Create virtual environment (recommended)
python -m venv venv
venv\Scripts\activate  # Windows
# source venv/bin/activate  # macOS/Linux

# Install dependencies
pip install -r requirements.txt

# Configure API key
copy .streamlit\secrets.toml.example .streamlit\secrets.toml  # Windows
# cp .streamlit/secrets.toml.example .streamlit/secrets.toml  # macOS/Linux
# Edit secrets.toml and add your TMDB API key

# Run the app
streamlit run app.py

Open http://localhost:8501 in your browser.

Project Structure

movie-recommender/
├── .streamlit/
│   ├── config.toml           # Streamlit theme & server config
│   └── secrets.toml.example  # API key template
├── models/
│   ├── movies_dict.pkl       # Processed movie data (4806 movies)
│   └── similarity.pkl        # TF-IDF similarity matrix (Git LFS)
├── app.py                    # Main Streamlit application
├── build_improved_model.py   # Script to rebuild/improve the model
├── download_models.py        # Script to download model files
├── requirements.txt          # Python dependencies
└── README.md

Algorithm

How It Works

The system uses Content-Based Filtering with TF-IDF Vectorization:

Movie Features → Text Preprocessing → TF-IDF Vectorization → Cosine Similarity → Recommendations
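
The pipeline above can be sketched end to end on a toy dataset. This is a minimal illustration, not the code in app.py: the titles, the tag strings, and the `recommend` helper are hypothetical, and the vectorizer uses scikit-learn defaults rather than the project's tuned settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical stand-in for the preprocessed text of each movie.
titles = ["Inception", "Interstellar", "Toy Story", "Finding Nemo"]
tags = [
    "sciencefiction thriller dream heist christophernolan leonardodicaprio",
    "sciencefiction drama space wormhole christophernolan matthewmcconaughey",
    "animation family toys friendship pixar tomhanks",
    "animation family ocean fish pixar albertbrooks",
]

# TF-IDF vectorization: one sparse row per movie.
vectors = TfidfVectorizer().fit_transform(tags)

# Cosine similarity between every pair of movies (4 x 4 matrix).
similarity = cosine_similarity(vectors)


def recommend(title, top_n=2):
    """Return the top_n titles most similar to the given one."""
    idx = titles.index(title)
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != idx][:top_n]


print(recommend("Inception", top_n=1))
```

Here "Inception" and "Interstellar" share the tokens "sciencefiction" and "christophernolan", so they rank closest to each other, while the two animated films share nothing with them and score zero.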

TF-IDF (Term Frequency-Inverse Document Frequency)

Unlike simple word counting, TF-IDF weighs words by their importance:

| Word Type | Example | Weight |
|---|---|---|
| Rare (discriminative) | "christophernolan", "pixar" | High |
| Common (generic) | "action", "movie", "story" | Low |

Formula:

weight = log(1 + term_frequency) × log(total_documents / documents_containing_term)
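
To make the formula concrete, the sketch below evaluates it directly in plain Python. The function name `tfidf_weight` and the document counts are hypothetical, chosen only to contrast a rare term with a common one. Library implementations such as scikit-learn apply smoothing and a slightly different sublinear scaling, so their exact values will differ from this simplified form.

```python
import math


def tfidf_weight(term_frequency, total_documents, documents_containing_term):
    """Evaluate the simplified TF-IDF weight given in the formula above."""
    tf_part = math.log(1 + term_frequency)
    idf_part = math.log(total_documents / documents_containing_term)
    return tf_part * idf_part


# Hypothetical counts in a corpus of 4806 movies.
rare = tfidf_weight(1, 4806, 8)       # e.g. a director's name in 8 movies
common = tfidf_weight(1, 4806, 1100)  # e.g. a genre word in 1100 movies

print(f"rare term weight:   {rare:.3f}")
print(f"common term weight: {common:.3f}")
```

A term that appears in every document gets a weight of zero, since log(N / N) = 0, and doubling the term frequency less than doubles the weight because of the logarithm.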

Features Used

  • Genres - Action, Comedy, Drama, etc.
  • Keywords - Plot-specific tags
  • Cast - Top actors
  • Crew - Director
  • Overview - Plot summary
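
One plausible way to merge these features into a single text field per movie is sketched below. The field names, the `build_tags` helper, and the cut-off of three actors are assumptions for illustration; the only detail taken from this README is that person names appear as single tokens such as "christophernolan".

```python
def collapse(name):
    """Join a multi-word name into one lowercase token, e.g. 'christophernolan'."""
    return name.replace(" ", "").lower()


def build_tags(movie, top_cast=3):
    """Combine genres, keywords, cast, director, and overview into one string."""
    parts = []
    parts += [g.lower() for g in movie["genres"]]
    parts += [k.lower() for k in movie["keywords"]]
    parts += [collapse(c) for c in movie["cast"][:top_cast]]  # top actors only
    parts.append(collapse(movie["director"]))
    parts += movie["overview"].lower().split()
    return " ".join(parts)


# Hypothetical record; the real dataset's column layout may differ.
movie = {
    "genres": ["Science Fiction", "Action"],
    "keywords": ["dream", "heist"],
    "cast": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Elliot Page", "Tom Hardy"],
    "director": "Christopher Nolan",
    "overview": "A thief steals secrets through dreams.",
}

tags = build_tags(movie)
print(tags)
```

Collapsing names keeps "Christopher Nolan" from being split into two unrelated tokens, so it only matches other movies by the same person.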

Key Optimizations

| Feature | Benefit |
|---|---|
| Bigrams | Captures phrases like "science fiction" as single features |
| Sublinear TF | Diminishing returns for repeated words |
| Document Frequency Limits | Filters out typos and overly common words |
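
The three optimizations map onto parameters of scikit-learn's `TfidfVectorizer`. The sketch below shows their effect on a tiny hypothetical corpus; the threshold values are chosen for the example and are not necessarily the ones used in build_improved_model.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical corpus: "movie" appears everywhere, "adventur" is a one-off typo.
docs = [
    "science fiction space adventure movie",
    "science fiction time travel movie",
    "romantic comedy wedding movie",
    "romantic drama wedding adventur movie",
]

vectorizer = TfidfVectorizer(
    ngram_range=(1, 2),  # unigrams and bigrams
    sublinear_tf=True,   # dampen the effect of repeated words
    min_df=2,            # drop terms seen in fewer than 2 documents
    max_df=0.9,          # drop terms seen in more than 90% of documents
)
vectorizer.fit(docs)

vocab = set(vectorizer.vocabulary_)
print(sorted(vocab))
```

With these settings the bigram "science fiction" survives as its own feature, while "movie" (present in every document) and the typo "adventur" (present in only one) are filtered out.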

Tech Stack

| Component | Technology |
|---|---|
| Frontend | Streamlit |
| ML/NLP | scikit-learn (TF-IDF, Cosine Similarity) |
| Data | Pandas, NumPy |
| API | TMDB API |
| Dataset | TMDB 5000 Movie Dataset |

Rebuilding the Model

To rebuild or customize the recommendation model:

python build_improved_model.py

This script allows you to adjust:

  • max_features - Vocabulary size
  • ngram_range - Unigrams, bigrams, etc.
  • min_df / max_df - Document frequency thresholds
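
As a rough picture of how these parameters feed into a rebuild, the sketch below wraps vectorization and similarity computation in one function and serializes the result with pickle. The `build_similarity` function, its default values, and the sample data are hypothetical; the actual script may be organized differently.

```python
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_similarity(tags, max_features=5000, ngram_range=(1, 2), min_df=1, max_df=1.0):
    """Vectorize the tag strings and return (vectorizer, similarity matrix)."""
    vectorizer = TfidfVectorizer(
        max_features=max_features,  # cap on vocabulary size
        ngram_range=ngram_range,    # (1, 1) = unigrams only, (1, 2) adds bigrams
        min_df=min_df,              # lower document-frequency threshold
        max_df=max_df,              # upper document-frequency threshold
        sublinear_tf=True,
    )
    vectors = vectorizer.fit_transform(tags)
    return vectorizer, cosine_similarity(vectors)


# Hypothetical sample; the real script would load the processed movie data.
sample_tags = [
    "science fiction space",
    "science fiction time",
    "romantic comedy wedding",
]

vectorizer, similarity = build_similarity(sample_tags, max_features=10)

# Serialize in memory; writing these bytes to models/similarity.pkl would
# replace the stored matrix.
payload = pickle.dumps(similarity)
print(len(vectorizer.vocabulary_), similarity.shape)
```

Lowering `max_features` shrinks the vocabulary, which speeds up the build at the cost of dropping less frequent terms. The similarity matrix itself stays N × N, so its size on disk depends only on the number of movies.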

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE for details.

Acknowledgments

Contact

Siddharth Bhople - sid.work0403@gmail.com

Project: github.com/SiD-array/movie-recommender
