🎬 MovieMate — Personalized Movie Recommender

Content-Based Recommendation System with TF-IDF & Cosine Similarity

Machine Learning | Streamlit | TMDB API Integration

Project Overview

Built MovieMate, a personalized movie recommendation system that suggests 5 similar movies for any given title using content-based filtering on the TMDB 5000 movie dataset. The system combines each movie's genres, cast, crew, keywords, and overview into a single text profile, then ranks the other films by cosine similarity to that profile.

The project is delivered as an interactive Streamlit web app with live poster fetching via the TMDB API, packaged with a clean modular structure (notebooks for EDA & training, src for production code, artifacts for serialized models) and ready for one-command deployment.

Recommendation Pipeline Architecture

📥 Data Ingestion (TMDB 5000) → 🧹 Feature Engineering (Tags Creation) → 🔤 Vectorization (TF-IDF) → 📐 Similarity Matrix (Cosine) → 📦 Model Artifacts (Pickle Files) → 🎬 Streamlit App (TMDB API)

🏗️ Content-Based Filtering

Approach: Each movie's combined tags (genres + cast + crew + keywords + overview) are converted into a TF-IDF feature vector. Cosine similarity between vectors ranks the 5 most similar films. Because TF-IDF weights are non-negative, scores range from 0 (no shared terms) to 1 (vectors pointing in the same direction, as with identical tags).
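As a rough illustration of the scoring step, the sketch below computes cosine similarity between two sparse term-weight vectors in plain Python. The dictionary representation, the function name, and the sample weights are assumptions made for this example; the project lists scikit-learn in its stack, which would normally handle this step.

```python
import math


def cosine_similarity(a: dict, b: dict) -> float:
    """Cosine of the angle between two sparse vectors stored as {term: weight}."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector shares nothing with any other vector
    return dot / (norm_a * norm_b)


# Made-up TF-IDF-style weights for three hypothetical movies.
heist_a = {"heist": 0.8, "crime": 0.6}
heist_b = {"heist": 1.6, "crime": 1.2}  # same direction, twice the magnitude
romance = {"love": 0.7, "paris": 0.7}   # no terms in common with heist_a

print(cosine_similarity(heist_a, heist_b))  # close to 1: magnitude is ignored
print(cosine_similarity(heist_a, romance))  # 0: no shared terms
```

The second vector is just the first one scaled by two, which shows why the measure is insensitive to vector length: only the proportions of the term weights matter.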

Key Features & Capabilities

🎯 Personalized Recommendations

Suggests the 5 most similar movies for any selected title from a catalog of roughly 5,000 films, ranked by cosine similarity score.

🧮 TF-IDF Vectorization

Term Frequency–Inverse Document Frequency encoding converts movie metadata into meaningful numerical vectors for similarity computation.

📐 Cosine Similarity Engine

Ranks films by the cosine of the angle between their movie vectors. The measure is insensitive to vector magnitude, which makes it well suited to sparse text features.

🖼️ Live Poster Fetching

Real-time integration with the TMDB API to fetch and display movie posters alongside recommendations for a polished UX.

⚡ Streamlit Web Interface

Clean, interactive UI with movie search dropdown, one-click recommendations, and instant visual feedback with posters.

📦 Pre-Computed Artifacts

Similarity matrix and processed movie data serialized as pickle files for instant inference without re-training at runtime.

Technical Implementation

📥 Data Processing & Feature Engineering

Dataset: TMDB 5000 Movies & Credits dataset from Kaggle
Feature Extraction: Parsed JSON-formatted columns (genres, cast, crew, keywords) into clean lists
Tag Construction: Combined overview, genres, top cast, director, and keywords into a unified “tags” feature
Text Cleaning: Lowercased, removed spaces in entity names, applied stemming with PorterStemmer
Data Merging: Joined movies and credits datasets on title for complete metadata
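The parsing and tag-construction steps above could look roughly like the sketch below. It works on a single hand-made record shaped like a merged row of the dataset rather than on a DataFrame. The helper names, the choice of three cast members for "top cast", and the sample values are assumptions for illustration, and the PorterStemmer step is left out to keep the example dependency-free.

```python
import json


def extract_names(raw: str, limit=None) -> list:
    """Parse a JSON-formatted column and return the 'name' of each entry."""
    names = [item["name"] for item in json.loads(raw)]
    return names if limit is None else names[:limit]


def extract_directors(raw_crew: str) -> list:
    """Keep only the crew members whose job is 'Director'."""
    return [m["name"] for m in json.loads(raw_crew) if m.get("job") == "Director"]


def collapse(names: list) -> list:
    """Remove spaces inside entity names so each name becomes one token."""
    return [name.replace(" ", "") for name in names]


def build_tags(row: dict, top_cast: int = 3) -> str:
    """Combine overview, genres, keywords, top cast, and director into one string."""
    parts = (
        row["overview"].split()
        + collapse(extract_names(row["genres"]))
        + collapse(extract_names(row["keywords"]))
        + collapse(extract_names(row["cast"], limit=top_cast))
        + collapse(extract_directors(row["crew"]))
    )
    return " ".join(parts).lower()


# Hand-made sample record; real rows carry more fields per entry.
sample_row = {
    "overview": "A marine on an alien moon.",
    "genres": '[{"name": "Action"}, {"name": "Science Fiction"}]',
    "keywords": '[{"name": "culture clash"}]',
    "cast": '[{"name": "Sam Worthington"}, {"name": "Zoe Saldana"}, '
            '{"name": "Sigourney Weaver"}, {"name": "Stephen Lang"}]',
    "crew": '[{"job": "Editor", "name": "Stephen Rivkin"}, '
            '{"job": "Director", "name": "James Cameron"}]',
}

print(build_tags(sample_row))
```

Collapsing "Science Fiction" into "sciencefiction" keeps a multi-word entity from being split into unrelated tokens, so a shared first name or a generic word like "science" does not create false matches between movies.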

🧠 Recommendation Algorithm

Vectorization: TF-IDF / CountVectorizer with 5000 max features and English stop-word removal
Similarity Computation: Pairwise cosine similarity matrix across all movie vectors
Ranking Logic: Sort similarity scores per movie, return top 5 excluding the queried film
Algorithm Type: Content-based filtering (no cold-start issues for new movies with metadata)
Output: Top-N recommendations with movie titles and poster images
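The vectorization, similarity, and ranking steps listed above can be sketched with scikit-learn as follows. This is a minimal sketch under assumptions, not the project's actual code: it picks TfidfVectorizer (the list also mentions CountVectorizer), the function names are hypothetical, and the four-movie corpus is invented for the example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def build_similarity(tags: list):
    """Vectorize the tag strings and return the pairwise cosine similarity matrix."""
    vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
    vectors = vectorizer.fit_transform(tags)
    return cosine_similarity(vectors)


def recommend(title: str, titles: list, similarity, top_n: int = 5) -> list:
    """Return the top_n titles most similar to `title`, excluding the title itself."""
    idx = titles.index(title)
    ranked = sorted(enumerate(similarity[idx]), key=lambda pair: pair[1], reverse=True)
    return [titles[i] for i, _ in ranked if i != idx][:top_n]


# Invented toy corpus standing in for the processed tags column.
titles = ["Space War", "Galaxy Battle", "Love Story", "Romance in Paris"]
tags = [
    "space alien battle spaceship",
    "galaxy alien battle spaceship",
    "love romance wedding",
    "romance paris love",
]

similarity = build_similarity(tags)
print(recommend("Space War", titles, similarity, top_n=1))
```

Filtering out the query index explicitly, rather than skipping the first ranked entry, avoids dropping a genuine recommendation when two movies happen to have identical tags and tie at the top score.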

🎨 Frontend & API Integration

Streamlit UI: Dropdown movie selector, recommend button, 5-column poster grid layout
TMDB API: Fetches movie posters via movie_id with secure API key management
Environment Variables: API keys stored in .env.local (gitignored) with .env.example template
Error Handling: Graceful fallback for missing posters and failed API calls
Modular Code: Separated recommender logic (src/recommender.py) from UI (app.py)
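One way the poster lookup with a graceful fallback might be written is sketched below. The environment variable name TMDB_API_KEY, the placeholder image URL, and the injectable get_json parameter are assumptions for illustration, and the sketch uses the standard library's urllib rather than whichever HTTP client the project actually uses.

```python
import json
import os
import urllib.request

IMAGE_BASE = "https://image.tmdb.org/t/p/w500"
PLACEHOLDER = "https://example.com/no-poster.png"  # hypothetical fallback image


def http_get_json(url: str, timeout: float = 5.0) -> dict:
    """Fetch a URL and decode the response body as JSON."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def fetch_poster(movie_id: int, api_key=None, get_json=http_get_json) -> str:
    """Return a poster URL for movie_id, or a placeholder if anything goes wrong."""
    # TMDB_API_KEY is an assumed variable name, e.g. loaded from .env.local.
    api_key = api_key or os.environ.get("TMDB_API_KEY", "")
    url = f"https://api.themoviedb.org/3/movie/{movie_id}?api_key={api_key}"
    try:
        poster_path = get_json(url).get("poster_path")
    except (OSError, ValueError):
        # OSError covers network and HTTP errors; ValueError covers bad JSON.
        return PLACEHOLDER
    # TMDB returns poster_path with a leading slash, or null when no poster exists.
    return f"{IMAGE_BASE}{poster_path}" if poster_path else PLACEHOLDER
```

Passing the fetch function in as a parameter keeps the fallback logic testable without network access: a stub that returns a dictionary or raises an error is enough to exercise every branch.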

🚀 Project Structure & Deployment

Notebooks: EDA and model training in notebooks/movie_recommender_analysis.ipynb
Artifacts: Serialized movie_dict.pkl and similarity.pkl for production use
Deployment Ready: Procfile + setup.sh for Heroku/Streamlit Cloud deployment
Dependencies: Pinned in requirements.txt for reproducible builds
Licensing: MIT licensed for open use and contribution
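The artifact files named above could be written and read back along these lines. The function names and the directory argument are assumptions for this sketch; only the file names movie_dict.pkl and similarity.pkl come from the project description.

```python
import pickle
from pathlib import Path


def save_artifacts(movie_dict, similarity, directory) -> None:
    """Serialize the processed movie data and the similarity matrix."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    with open(directory / "movie_dict.pkl", "wb") as f:
        pickle.dump(movie_dict, f)
    with open(directory / "similarity.pkl", "wb") as f:
        pickle.dump(similarity, f)


def load_artifacts(directory):
    """Load both artifacts back; only unpickle files from a trusted source."""
    directory = Path(directory)
    with open(directory / "movie_dict.pkl", "rb") as f:
        movie_dict = pickle.load(f)
    with open(directory / "similarity.pkl", "rb") as f:
        similarity = pickle.load(f)
    return movie_dict, similarity
```

In this sketch, the training notebook would call save_artifacts once, and the app would call load_artifacts at startup so that recommendations need only a row lookup and a sort. Because unpickling can execute arbitrary code, the files should only ever come from the project's own build.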

Recommendation System Approaches

🎯 Content-Based ✅

Used in MovieMate. Recommends based on movie attributes — genres, cast, crew, keywords. No user data required, no cold-start for new movies.

👥 Collaborative Filtering

Recommends based on user-item interaction patterns and similarity between users. Powerful but requires interaction history.

🔀 Hybrid Systems

Combines content-based and collaborative approaches. Large platforms such as Netflix, Spotify, and Amazon use hybrid recommenders in production.

Technology Stack

🐍 Python

🎈 Streamlit

🧪 Scikit-learn

🐼 Pandas

🔢 NumPy

📚 NLTK

🎥 TMDB API

