A pandas-based data analysis exploring a dataset of movies and TV shows, including cast information, genres, release years, and IMDb ratings.
Business Context: Understanding the streaming content landscape to inform content acquisition or production decisions.
This project demonstrates fundamental data analysis skills including data cleaning and standardization, exploratory data analysis (EDA), data filtering and manipulation with pandas, custom function development for reusable analysis, and IMDb rating categorization and insights.
The analysis uses the movies_and_shows.csv dataset, which contains comprehensive information about actors, characters, roles, titles, content types, release years, genres, and IMDb ratings and votes.
Standardized inconsistent column names, converted mixed-case headers to lowercase with underscores, and replaced special characters for consistency.
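As a rough illustration of this cleaning step, the sketch below normalizes a set of made-up messy headers. The helper name standardize_columns() and the example headers are assumptions for illustration only; the actual column names in movies_and_shows.csv and the project's own cleaning code may differ.

```python
import re

import pandas as pd


def standardize_columns(df):
    """Return a copy of df with lowercase, underscore-separated column names.

    Hypothetical helper, not necessarily the project's own implementation.
    """
    def clean(name):
        name = str(name).strip().lower()
        # Collapse each run of non-alphanumeric characters into one underscore
        name = re.sub(r"[^0-9a-z]+", "_", name)
        return name.strip("_")

    return df.rename(columns=clean)


# Made-up headers that mimic typical inconsistencies (assumption)
raw = pd.DataFrame(columns=["  Title ", "Release Year", "IMDb-Score", "imdb votes"])
cleaned = standardize_columns(raw)
print(list(cleaned.columns))
```

Passing a function to rename() keeps the original frame untouched, which makes the step easy to rerun while exploring.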
Developed reusable functions like get_actors_for_title() to retrieve cast lists and categorize_imdb_score() to classify content quality.
Implemented a tiered rating system: Excellent (≥9.0), Good (7.0-8.9), Average (5.0-6.9), and Low (<5.0) for meaningful insights.
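A minimal sketch of how these tiers could be expressed in code follows. It assumes scores are numeric, treats the thresholds as continuous (so a value such as 8.95 falls under Good), and returns a hypothetical "Unknown" label for missing scores; none of these choices are confirmed by the project itself.

```python
import pandas as pd


def categorize_imdb_score(score):
    """Map a numeric IMDb score to a quality tier.

    Sketch only: the "Unknown" label for missing values is an assumption.
    """
    if pd.isna(score):
        return "Unknown"
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Average"
    return "Low"


# Illustrative scores, not taken from the dataset
scores = pd.Series([9.2, 8.3, 6.1, 4.0, None])
tiers = scores.apply(categorize_imdb_score)
print(tiers.tolist())
```

Checking the thresholds from highest to lowest keeps each condition to a single comparison and avoids gaps between tiers.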
Returns a comma-separated list of all actors for a given movie or show:
get_actors_for_title("Taxi Driver")
# Returns: "Robert De Niro, Jodie Foster, Harvey Keitel, ..."
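One way such a lookup might be written is sketched here. The column names name and title, the placeholder rows, and the optional data parameter are all assumptions; the project's function may rely on a different schema or on a module-level DataFrame.

```python
import pandas as pd

# Tiny stand-in frame; column names and the last row are placeholders (assumption)
sample = pd.DataFrame({
    "name": ["Robert De Niro", "Jodie Foster", "Harvey Keitel", "Example Actor"],
    "title": ["Taxi Driver", "Taxi Driver", "Taxi Driver", "Example Show"],
})


def get_actors_for_title(title, data=sample):
    """Return a comma-separated string of actors credited on the given title."""
    # Case-insensitive match on the title column
    mask = data["title"].str.lower() == title.lower()
    names = data.loc[mask, "name"].dropna().drop_duplicates()
    return ", ".join(names)


print(get_actors_for_title("taxi driver"))
```

A title with no matching rows yields an empty string in this sketch rather than raising an error.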
Categorizes movies/shows into quality tiers based on IMDb scores: