MovieMate – TF-IDF & Cosine Similarity Movie Recommender

🎬 MovieMate — Personalized Movie Recommender

Content-Based Recommendation System with TF-IDF & Cosine Similarity

Machine Learning | Streamlit | TMDB API Integration

Project Overview

Built MovieMate, a personalized movie recommendation system that suggests 5 similar movies for any given title using content-based filtering on the TMDB 5000 movie dataset. The system builds a text profile for each movie from its genres, cast, crew, keywords, and overview, then computes cosine similarity between films to find the closest matches.

The project is delivered as an interactive Streamlit web app with live poster fetching via the TMDB API, packaged with a clean modular structure (notebooks for EDA & training, src for production code, artifacts for serialized models) and ready for one-command deployment.

Figure: MovieMate movie recommender demo

Recommendation Pipeline Architecture

  • 📥 Data Ingestion: TMDB 5000
  • 🧹 Feature Engineering: Tags Creation
  • 🔤 Vectorization: TF-IDF
  • 📐 Similarity Matrix: Cosine
  • 📦 Model Artifacts: Pickle Files
  • 🎬 Streamlit App: TMDB API

🏗️ Content-Based Filtering

Approach: Movies are converted into feature vectors using TF-IDF on combined tags (genres + cast + crew + keywords + overview). Cosine similarity between vectors ranks the top 5 most similar films. Because TF-IDF weights are non-negative, scores range from 0 (no shared terms) to 1 (vectors pointing in the same direction).
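For reference, the similarity score can be written out explicitly. The symbols a and b below are notation introduced here for the TF-IDF vectors of two movies; they do not come from the project code.

```latex
\operatorname{sim}(\mathbf{a}, \mathbf{b})
  = \cos\theta
  = \frac{\mathbf{a} \cdot \mathbf{b}}{\lVert \mathbf{a} \rVert \, \lVert \mathbf{b} \rVert}
  = \frac{\sum_i a_i b_i}{\sqrt{\sum_i a_i^2} \, \sqrt{\sum_i b_i^2}}
```

Every term in the numerator is a product of two non-negative weights, so the score cannot drop below 0, and it reaches 1 only when the two vectors point in the same direction.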

Key Features & Capabilities

🎯

Personalized Recommendations

Suggests 5 relevant movies for any selected title from a catalog of about 4,800 films (the TMDB 5000 dataset), ranked by cosine similarity score.

🧮

TF-IDF Vectorization

Term Frequency–Inverse Document Frequency encoding converts movie metadata into meaningful numerical vectors for similarity computation.

📐

Cosine Similarity Engine

Scores each pair of movies by the cosine of the angle between their vectors. The score ignores vector magnitude, so a long tag string does not outscore a short one merely by length, which suits sparse text features.
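The magnitude point can be checked with a small standalone sketch. This is plain Python for illustration only, not the project's code; the function name and the toy vectors are made up here.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for all-zero vectors
    return dot / (norm_a * norm_b)


short_doc = [1.0, 2.0, 0.0]
long_doc = [10.0, 20.0, 0.0]  # same direction as short_doc, 10x the magnitude
unrelated = [0.0, 0.0, 3.0]   # shares no non-zero dimension with short_doc

same_direction = cosine_similarity(short_doc, long_doc)
no_overlap = cosine_similarity(short_doc, unrelated)

print(round(same_direction, 6), no_overlap)
```

Scaling a vector changes its length but not its direction, so the first pair scores 1 (up to floating-point rounding), while the pair with no shared dimensions scores 0.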

🖼️

Live Poster Fetching

Real-time integration with the TMDB API to fetch and display movie posters alongside recommendations for a polished UX.

Streamlit Web Interface

Clean, interactive UI with movie search dropdown, one-click recommendations, and instant visual feedback with posters.

📦

Pre-Computed Artifacts

Similarity matrix and processed movie data serialized as pickle files for instant inference without re-training at runtime.
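A minimal sketch of the save and load round trip, assuming the two artifact file names listed later in this write-up (movie_dict.pkl and similarity.pkl). The helper names and the toy data layout are hypothetical, chosen only to keep the example self-contained.

```python
import pickle
import tempfile
from pathlib import Path


def save_artifacts(artifact_dir, movie_dict, similarity):
    """Write both artifacts once, after training."""
    artifact_dir = Path(artifact_dir)
    artifact_dir.mkdir(parents=True, exist_ok=True)
    with open(artifact_dir / "movie_dict.pkl", "wb") as f:
        pickle.dump(movie_dict, f)
    with open(artifact_dir / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)


def load_artifacts(artifact_dir):
    """Read both artifacts at app start-up; no vectorization happens here."""
    artifact_dir = Path(artifact_dir)
    with open(artifact_dir / "movie_dict.pkl", "rb") as f:
        movie_dict = pickle.load(f)
    with open(artifact_dir / "similarity.pkl", "rb") as f:
        similarity = pickle.load(f)
    return movie_dict, similarity


# Toy stand-ins for the real artifacts (hypothetical layout).
movie_dict = {"movie_id": [101, 102], "title": ["Space Quest", "Star Voyage"]}
similarity = [[1.0, 0.4], [0.4, 1.0]]

with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, movie_dict, similarity)
    loaded_movies, loaded_similarity = load_artifacts(tmp)

print(loaded_movies["title"])
```

Pickle files can execute arbitrary code when loaded, so this pattern is only appropriate for artifacts the project itself produced.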

Technical Implementation

📥 Data Processing & Feature Engineering

  • Dataset: TMDB 5000 Movies & Credits dataset from Kaggle
  • Feature Extraction: Parsed JSON-formatted columns (genres, cast, crew, keywords) into clean lists
  • Tag Construction: Combined overview, genres, top cast, director, and keywords into a unified “tags” feature
  • Text Cleaning: Lowercased, removed spaces in entity names, applied stemming with PorterStemmer
  • Data Merging: Joined movies and credits datasets on title for complete metadata
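The steps above can be sketched for a single row as follows. This is an illustration under assumptions, not the notebook's code: the helper names, the top-3 cast cutoff, and the sample row are all hypothetical. Stemming is left out to keep the sketch dependency-free; with NLTK it could be applied per token through PorterStemmer().stem.

```python
import json


def names(json_text, limit=None):
    """Pull the "name" field out of a JSON-encoded list of objects."""
    result = [item["name"] for item in json.loads(json_text)]
    return result if limit is None else result[:limit]


def directors(crew_json):
    """Keep only crew members whose job is "Director"."""
    return [m["name"] for m in json.loads(crew_json) if m.get("job") == "Director"]


def collapse(entity_names):
    """Lowercase and remove inner spaces so "Jane Doe" becomes one token."""
    return [name.replace(" ", "").lower() for name in entity_names]


def build_tags(row, top_cast=3):
    """Combine overview, genres, top cast, director, and keywords into one string."""
    tokens = (
        row["overview"].lower().split()
        + collapse(names(row["genres"]))
        + collapse(names(row["cast"], limit=top_cast))  # top_cast=3 is an assumption
        + collapse(directors(row["crew"]))
        + collapse(names(row["keywords"]))
    )
    return " ".join(tokens)


# Hypothetical row shaped like the merged dataset's JSON-formatted columns.
row = {
    "overview": "A pilot explores a distant moon",
    "genres": '[{"id": 1, "name": "Science Fiction"}]',
    "keywords": '[{"id": 2, "name": "space travel"}]',
    "cast": '[{"name": "Jane Doe"}, {"name": "John Roe"}, '
            '{"name": "Ann Poe"}, {"name": "Extra Actor"}]',
    "crew": '[{"job": "Editor", "name": "Ed Cutter"}, '
            '{"job": "Director", "name": "Dana Lee"}]',
}

tags = build_tags(row)
print(tags)
```

Collapsing spaces inside names keeps "Jane Doe" from being split into two unrelated tokens, which would otherwise match any other person named Jane.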

🧠 Recommendation Algorithm

  • Vectorization: TF-IDF / CountVectorizer with 5000 max features and English stop-word removal
  • Similarity Computation: Pairwise cosine similarity matrix across all movie vectors
  • Ranking Logic: Sort similarity scores per movie, return top 5 excluding the queried film
  • Algorithm Type: Content-based filtering (no cold-start issues for new movies with metadata)
  • Output: Top-N recommendations with movie titles and poster images
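A compact sketch of the vectorize, compare, and rank steps using scikit-learn. It assumes the TF-IDF variant (TfidfVectorizer) with the settings listed above; the function names, toy titles, and toy tags are invented for illustration and are not taken from src/recommender.py.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_similarity(tags, max_features=5000):
    """Vectorize the tag strings and return the pairwise similarity matrix."""
    vectorizer = TfidfVectorizer(max_features=max_features, stop_words="english")
    vectors = vectorizer.fit_transform(tags)  # sparse matrix, one row per movie
    return cosine_similarity(vectors)         # dense (n_movies, n_movies) array


def recommend(title, titles, similarity, top_n=5):
    """Return the top_n most similar titles, excluding the queried film."""
    idx = titles.index(title)  # raises ValueError for an unknown title
    scores = similarity[idx]
    ranked = sorted(
        (i for i in range(len(titles)) if i != idx),
        key=lambda i: scores[i],
        reverse=True,
    )
    return [titles[i] for i in ranked[:top_n]]


# Toy catalog (hypothetical titles and tags).
titles = ["Space Quest", "Star Voyage", "Kitchen Stories", "Baking Days"]
tags = [
    "space alien spaceship galaxy pilot",
    "space galaxy spaceship crew mission",
    "chef kitchen restaurant cooking",
    "baking kitchen bread cooking",
]

similarity = build_similarity(tags)
picks = recommend("Space Quest", titles, similarity, top_n=2)
print(picks)
```

Because the full matrix is computed once, each recommendation at runtime is only a row lookup and a sort. The matrix grows quadratically with catalog size, which is manageable at the scale of this dataset.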

🎨 Frontend & API Integration

  • Streamlit UI: Dropdown movie selector, recommend button, 5-column poster grid layout
  • TMDB API: Fetches movie posters via movie_id with secure API key management
  • Environment Variables: API keys stored in .env.local (gitignored) with .env.example template
  • Error Handling: Graceful fallback for missing posters and failed API calls
  • Modular Code: Separated recommender logic (src/recommender.py) from UI (app.py)
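The poster lookup with fallback might look roughly like the sketch below. It uses TMDB's v3 movie-details endpoint and image base URL; the environment variable name TMDB_API_KEY, the placeholder image URL, and the function names are assumptions made for this example. The HTTP call is injectable so the fallback paths can be exercised without network access.

```python
import json
import os
import urllib.request

TMDB_MOVIE_URL = "https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"
TMDB_IMAGE_BASE = "https://image.tmdb.org/t/p/w500"
# Hypothetical fallback image; the app's real placeholder is not shown in the source.
PLACEHOLDER_POSTER = "https://example.com/no-poster.png"


def http_get_json(url, timeout=5):
    """Fetch a URL and decode the JSON body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_poster(movie_id, api_key, get_json=http_get_json):
    """Return a poster URL for movie_id, or the placeholder on any failure."""
    url = TMDB_MOVIE_URL.format(movie_id=movie_id, api_key=api_key)
    try:
        data = get_json(url)
    except (OSError, ValueError):
        # OSError covers network errors and timeouts; ValueError covers bad JSON.
        return PLACEHOLDER_POSTER
    poster_path = data.get("poster_path")
    if not poster_path:
        return PLACEHOLDER_POSTER  # the movie exists but has no poster
    return TMDB_IMAGE_BASE + poster_path


# The key is read from the environment; the variable name is an assumption.
API_KEY = os.environ.get("TMDB_API_KEY", "")

# Offline demonstration with a stubbed response instead of a real request.
demo_url = fetch_poster(101, API_KEY, get_json=lambda url: {"poster_path": "/abc.jpg"})
print(demo_url)
```

Returning a placeholder instead of raising keeps the 5-column poster grid intact when a single lookup fails.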

🚀 Project Structure & Deployment

  • Notebooks: EDA and model training in notebooks/movie_recommender_analysis.ipynb
  • Artifacts: Serialized movie_dict.pkl and similarity.pkl for production use
  • Deployment Ready: Procfile + setup.sh for Heroku/Streamlit Cloud deployment
  • Dependencies: Pinned in requirements.txt for reproducible builds
  • Licensing: MIT licensed for open use and contribution

Recommendation System Approaches

🎯 Content-Based ✅

Used in MovieMate. Recommends based on movie attributes — genres, cast, crew, keywords. No user data required, no cold-start for new movies.

👥 Collaborative Filtering

Recommends based on user-item interaction patterns and similarity between users. Powerful but requires interaction history.

🔀 Hybrid Systems

Combines content-based and collaborative approaches. Large platforms such as Netflix, Spotify, and Amazon are widely reported to use hybrid systems for personalization.

Technology Stack

🐍 Python
🎈 Streamlit
🧪 Scikit-learn
🐼 Pandas
🔢 NumPy
📚 NLTK
🎥 TMDB API