A full-stack movie recommendation web application with content-based and cluster-based recommendation algorithms.
This project is a movie recommendation system that suggests movies based on user preferences, movie similarity, and genre preferences. The system uses content-based filtering and clustering techniques to provide personalized recommendations from a dataset of over 25 million movie entries.
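As a rough illustration of the content-based side, the sketch below ranks movies by cosine similarity between multi-hot genre vectors. This README does not spell out which features or similarity measure the application actually uses, so the metric, the function names (`genre_vector`, `cosine`, `recommend`), and the toy catalog are assumptions for illustration only.

```python
import math


def genre_vector(genres, vocabulary):
    """Encode a movie's genres as a multi-hot vector over a fixed vocabulary."""
    return [1.0 if g in genres else 0.0 for g in vocabulary]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recommend(title, catalog, top_n=3):
    """Rank the other movies in `catalog` by genre similarity to `title`."""
    vocabulary = sorted({g for genres in catalog.values() for g in genres})
    target = genre_vector(catalog[title], vocabulary)
    scored = [
        (cosine(target, genre_vector(genres, vocabulary)), other)
        for other, genres in catalog.items()
        if other != title
    ]
    # Highest similarity first; ties are broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other for _, other in scored[:top_n]]


# Toy catalog (illustrative data, not read from the project's dataset files).
catalog = {
    "Toy Story (1995)": {"Adventure", "Animation", "Children", "Comedy"},
    "Jumanji (1995)": {"Adventure", "Children", "Fantasy"},
    "Heat (1995)": {"Action", "Crime", "Thriller"},
    "Antz (1998)": {"Adventure", "Animation", "Children", "Comedy"},
}
print(recommend("Toy Story (1995)", catalog, top_n=2))
```

A cluster-based variant would instead group these vectors with KMeans and recommend movies that share a cluster with the selected title, which avoids comparing the target against every other row.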
- Search Functionality: Search movies by title or partial title match
- Genre-Based Search: Browse movies by specific genres
- Content-Based Recommendations: Get movies similar to ones you like based on genres and features
- Cluster-Based Recommendations: Alternative recommendation method using KMeans clustering
- Caching System: Improved performance with cached results for repeat searches
- Responsive Web Interface: Clean and user-friendly web interface
- Performance Optimizations: Efficient algorithms to handle large datasets
- Activity Logging: Comprehensive logging for monitoring and debugging
- Enhanced Persistent Caching:
  - Added dedicated `cache/` directory for persistent cache files
  - Implemented shelve-based persistent caching for recommendations and genre queries
  - Added JSON fallback for cache entries when shelve operations fail
  - Cache now survives server restarts with improved data persistence
- Fast Lookup Indices: Implemented multi-tier search with specialized indices for titles, titles without years, and word matching
- Batch Processing: Added batch processing for index building to reduce memory usage with large datasets
- Annoy Integration: Added approximate nearest-neighbor search with lazy loading and timeout protection
- Regex Error Handling: Fixed "bad character range" errors in genre regex by properly escaping special characters
- Variable Scope: Resolved "local variable referenced before assignment" issues in recommendation logic
- Cache Directory Management: Added automatic creation of cache directories with proper permissions
- Error Recovery: Improved error handling with detailed logging and graceful fallbacks
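The "bad character range" fix mentioned above can be sketched as follows. The helper names (`genre_pattern`, `unescaped_fails`) and the sample query are hypothetical; the only library behavior relied on is that `re.escape` makes a string match literally.

```python
import re


def genre_pattern(genre):
    """Build a case-insensitive pattern that matches `genre` literally."""
    # re.escape neutralizes characters such as [ ] - ( ) | that would
    # otherwise be parsed as regex syntax.
    return re.compile(re.escape(genre), re.IGNORECASE)


def unescaped_fails(text):
    """Return True if compiling `text` as a raw pattern raises re.error."""
    try:
        re.compile(text)
    except re.error:
        return True
    return False


query = "[Sci-Fi]"  # hypothetical input containing regex metacharacters
# Unescaped, "i-F" inside the brackets is read as a reversed character range.
print(unescaped_fails(query))
# Escaped, the same text is matched as a plain substring.
print(bool(genre_pattern(query).search("Action|[sci-fi]|Thriller")))
```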
- ENABLE_ANNOY: Toggle expensive Annoy-based recommendation features
- ENABLE_CONTENT_BASED: Control content-based recommendation algorithms
  - Useful to disable on systems with limited memory (< 16GB)
  - System will automatically fall back to cluster-based recommendations on memory errors
- FAST_STARTUP: Run in development mode with the `--fast` flag to load only a subset of data
- Multi-Strategy Search: Enhanced title matching with hierarchical fallback strategies
- Smart Caching: Improved in-memory and persistent caching for recommendations and genre searches
- Fallback Mechanisms: Graceful degradation with timeout protection and multiple fallback methods
- CORS Support: Added proper cross-origin resource sharing support
- JSONP Support: Added JSONP response wrapping via a decorator
- Enhanced Error Handling: Comprehensive error recovery with detailed logging
- Python 3.10: Core programming language
- Flask: Web server framework
- Pandas: Data manipulation and analysis
- NumPy: Numerical computations
- scikit-learn: Machine learning algorithms for clustering and recommendations
- Flask-CORS: Cross-Origin Resource Sharing support
- HTML/CSS/JavaScript: Frontend development
- Responsive Design: Works on desktop and mobile devices
The system uses three main datasets:
- `df.csv` (~3GB): Main dataset with movie information and pre-computed features
- `movies.csv` (~3MB): Basic movie information
- `ratings.csv` (~678MB): User ratings data
- Python 3.10 or higher
- Sufficient RAM (16GB+ recommended for optimal performance)
  - For smaller systems, use the `--fast` flag to reduce memory requirements
  - System will automatically fall back to less memory-intensive algorithms if needed
- Git
    git clone https://github.com/Deepanshu-Pahwa/Movie-Reccomendation-Website-With-K-Means.git
    cd Movie-Reccomendation-Website-With-K-Means
    pip install -r requirements.txt
    python backend_df_direct_fixed.py

For faster startup during development (loads a smaller dataset subset):

    python backend_df_direct_fixed.py --fast

The application will be available at http://localhost:3000
Note: On first run, the system will process the dataset and build necessary indices, which may take some time. Cache files will be automatically generated to speed up subsequent startups.
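The build-once, reuse-later behavior described in the note can be sketched with a small helper. This is a minimal sketch assuming the startup caches are pickle files; `load_or_build` and `build_title_index` are hypothetical names, and the real application may structure this differently.

```python
import os
import pickle
import tempfile


def load_or_build(cache_path, build):
    """Return the cached object at `cache_path`, building and saving it on a miss."""
    if os.path.exists(cache_path):
        try:
            with open(cache_path, "rb") as fh:
                return pickle.load(fh)
        except (pickle.UnpicklingError, EOFError, OSError):
            pass  # unreadable or corrupt cache: fall through and rebuild
    result = build()
    with open(cache_path, "wb") as fh:
        pickle.dump(result, fh)
    return result


calls = []


def build_title_index():
    """Stand-in for the expensive first-run processing step."""
    calls.append(1)
    return {"toy story": [0], "jumanji": [1]}


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "indices_cache.pkl")
    first = load_or_build(path, build_title_index)   # builds and writes the cache
    second = load_or_build(path, build_title_index)  # served from disk
    print(first == second, len(calls))
```

Pickle files should only be loaded when they were written by the application itself, since unpickling untrusted data can execute arbitrary code.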
- Home Page: Enter a movie title in the search box or browse by genre
- Search Results: View matching movies and select one for recommendations
- Recommendation Page: Get personalized movie recommendations based on your selection
- Discover: Explore popular movies and trending genres
- `/search`: Search for movies by title or get recommendations
- `/recommend`: Get movie recommendations based on a title (with persistent caching)
- `/movies-by-genre`: Get movies filtered by genre (with persistent caching)
- `/movie-details`: Get detailed information about a movie
- `/api-status`: Check API status
- `/clear-cache`: Clear the in-memory recommendation cache
- `/clear-persistent-caches`: Clear all persistent cache files (shelve and JSON)
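For scripting against these endpoints, request URLs can be assembled with the standard library. The query parameter names used here (`title`, `genre`) and the use of plain GET query strings are assumptions, since this README does not document the request format; check the route handlers in `backend_df_direct_fixed.py` before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3000"  # address given in the setup instructions


def endpoint_url(path, **params):
    """Build a URL for one of the listed endpoints with optional query parameters."""
    query = urlencode(params)
    return f"{BASE_URL}{path}" + (f"?{query}" if query else "")


# `title` and `genre` are assumed parameter names, not confirmed by this README.
print(endpoint_url("/recommend", title="Toy Story (1995)"))
print(endpoint_url("/movies-by-genre", genre="Sci-Fi"))
print(endpoint_url("/api-status"))
```

The resulting strings can be passed to `urllib.request.urlopen` or any HTTP client once the server is running.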
- Multi-level Caching System:
  - Memory caching for fast, frequent queries
  - Persistent disk-based caching (shelve) for recommendations and genre queries
  - JSON backup files for cache data redundancy
  - Automatic cache directory creation and management
- Timeout mechanisms prevent long-running operations
- Vectorized operations and efficient filtering for large dataset handling
- Boolean indexing and sampling techniques for improved performance
- Regex optimization with proper escape handling for special characters
- Memory Management:
  - Graceful fallbacks when memory-intensive operations fail
  - Automatic switch to cluster-based recommendations if content-based filtering exceeds memory limits
  - Memory error handling for large matrix operations (especially with 25M+ rows)
  - Configurable via feature flags to adapt to available system resources
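The memory-management behavior above amounts to trying the expensive path first and degrading on failure. The sketch below shows one way that could look. The flag name `ENABLE_CONTENT_BASED` comes from this README, but how it is defined, the function `recommend_with_fallback`, and both stand-in recommenders are assumptions for illustration.

```python
import logging

logger = logging.getLogger("recommender")

ENABLE_CONTENT_BASED = True  # flag name from this README; how it is set is assumed


def recommend_with_fallback(title, content_based, cluster_based):
    """Try the content-based path first; degrade to the cluster-based one."""
    if ENABLE_CONTENT_BASED:
        try:
            return content_based(title), "content"
        except MemoryError:
            logger.warning("content-based path ran out of memory for %r", title)
    return cluster_based(title), "cluster"


def too_big(title):
    """Stand-in for a similarity computation that exhausts memory."""
    raise MemoryError("similarity matrix too large")


def by_cluster(title):
    """Stand-in for a lookup of movies in the same KMeans cluster."""
    return ["Jumanji (1995)"]


print(recommend_with_fallback("Toy Story (1995)", too_big, by_cluster))
```

Returning the name of the path that produced the result makes it easy to log how often the fallback is taken on a given machine.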
The system implements a sophisticated caching strategy:
- In-memory Caching: Fastest access for recent queries
- Persistent Caching:
  - Uses Python's `shelve` module for dictionary-like persistent storage
  - Maintains separate files for genre and recommendation caches
  - Creates JSON backups for each cache entry as fallback
  - Cache files persist across server restarts
- Cache Management:
  - Clear in-memory cache via `/clear-cache` endpoint
  - Clear persistent caches via `/clear-persistent-caches` endpoint
  - Automatic cache validation and error recovery
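The shelve-plus-JSON arrangement described above can be sketched as a small class. This is a sketch under stated assumptions, not the project's implementation: the class name `PersistentCache`, the hashed JSON file naming, and the broad exception handling are all hypothetical choices.

```python
import hashlib
import json
import os
import shelve
import tempfile


class PersistentCache:
    """Disk cache that prefers shelve and keeps a JSON copy of every entry."""

    def __init__(self, directory, name):
        os.makedirs(directory, exist_ok=True)  # create the cache directory on demand
        self.directory = directory
        self.name = name
        self.shelf_path = os.path.join(directory, name)

    def _json_path(self, key):
        # Hypothetical naming scheme: hash the key so any title is filename-safe.
        digest = hashlib.sha1(key.encode("utf-8")).hexdigest()
        return os.path.join(self.directory, f"{self.name}_{digest}.json")

    def set(self, key, value):
        with open(self._json_path(key), "w", encoding="utf-8") as fh:
            json.dump(value, fh)
        try:
            with shelve.open(self.shelf_path) as shelf:
                shelf[key] = value
        except Exception:
            pass  # shelve write failed; the JSON copy above still holds the entry

    def get(self, key):
        try:
            with shelve.open(self.shelf_path) as shelf:
                if key in shelf:
                    return shelf[key]
        except Exception:
            pass  # shelve unreadable; try the JSON copy instead
        try:
            with open(self._json_path(key), encoding="utf-8") as fh:
                return json.load(fh)
        except (OSError, json.JSONDecodeError):
            return None  # cache miss


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = os.path.join(tmp, "cache")
    cache = PersistentCache(cache_dir, "recommendation_cache")
    cache.set("Toy Story (1995)", ["Antz (1998)"])
    # A new instance simulates a server restart: the entry is read back from disk.
    restarted = PersistentCache(cache_dir, "recommendation_cache")
    hit = restarted.get("Toy Story (1995)")
    miss = restarted.get("Unknown")
    print(hit, miss)
```

Writing the JSON copy first means an entry survives even if the shelve write fails partway, at the cost of storing every value twice.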
├── backend_df_direct_fixed.py      # Main Flask application
├── df.csv                          # Main dataset
├── movies.csv                      # Movie information
├── ratings.csv                     # User ratings
├── requirements.txt                # Python dependencies
├── index.html                      # Home page
├── search-results.html             # Search results page
├── styles.css                      # Main CSS styles
├── cache/                          # Persistent cache directory
│   ├── genre_cache_*.json          # JSON backup for genre cache
│   ├── genre_cache_*.db            # Shelve-based genre cache
│   ├── recommendation_cache_*.json # JSON backup for recommendation cache
│   └── recommendation_cache_*.db   # Shelve-based recommendation cache
├── static/                         # Static assets
│   ├── css/                        # CSS files
│   └── js/                         # JavaScript files
└── activity_log_*.log              # Activity logs
Note: Cache files (`df_cache.pkl` and `indices_cache.pkl`) will be generated automatically when you run the application for the first time. Additionally, persistent cache files for recommendations and genres will be created in the `cache/` directory. These files significantly improve startup time and query performance on subsequent runs and should not be committed to version control.
Cyril Sabu George
- GitHub: @phoneix116
- LinkedIn: www.linkedin.com/in/cyrilsabugeorge
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add some amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
- Kaggle - For the dataset
- scikit-learn - For machine learning algorithms
- Flask - For the web framework
Project Link: https://github.com/phoenix116/Movie-Reccomendation-Website-With-K-Means