Movies & Shows Analysis | Victor Foster

Project Overview

Business Context: Understanding the streaming content landscape to inform content acquisition and production decisions.

This project demonstrates fundamental data analysis skills including data cleaning and standardization, exploratory data analysis (EDA), data filtering and manipulation with pandas, custom function development for reusable analysis, and IMDb rating categorization and insights.

What This Demonstrates

Learning Challenge

Pandas library fundamentals (filtering, sorting, grouping)
Working with mixed content types and missing data
Understanding streaming industry dynamics
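The pandas fundamentals listed above (filtering, sorting, grouping, and handling missing data) can be sketched on a small in-memory table. The rows and the column names `title`, `type`, `release_year`, and `imdb_score` are hypothetical stand-ins, not the actual schema of the project's dataset.

```python
import pandas as pd

# Hypothetical rows; column names are assumptions, not the real schema.
df = pd.DataFrame({
    "title": ["Film A", "Film B", "Show C", "Show D", "Film E"],
    "type": ["MOVIE", "MOVIE", "SHOW", "SHOW", "MOVIE"],
    "release_year": [1999, 2015, 2018, 2021, 2020],
    "imdb_score": [8.1, None, 7.4, 6.2, 6.5],
})

# Missing data: count gaps per column, then drop rows that have no score.
missing_per_column = df.isna().sum()
rated = df.dropna(subset=["imdb_score"])

# Filtering: keep titles released in 2015 or later.
recent = rated[rated["release_year"] >= 2015]

# Sorting: highest-rated titles first.
ranked = recent.sort_values("imdb_score", ascending=False)

# Grouping: mean score per content type.
mean_by_type = rated.groupby("type")["imdb_score"].mean()

print(missing_per_column)
print(ranked[["title", "imdb_score"]])
print(mean_by_type)
```

Dropping unrated rows is only one option; depending on the question, keeping them and labeling them as unrated may be more appropriate.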

Problem-Solving Process

Tool Mastery: Systematically learned pandas operations through this project
Data Quality: Handled missing data and inconsistent column names before analysis
Content Analysis: Examined genres, release patterns, and content distribution
Insight Generation: Identified trends that could guide content strategy

Professional Outcome

Created analysis that a content team could use for programming decisions
Demonstrated systematic approach to learning a new technical library
Built foundation skills transferable to any data manipulation task

Tools Utilized

VS Code with GitHub Copilot for development
Jupyter Notebook for interactive analysis
Git/GitHub for version control

Dataset

The analysis uses the movies_and_shows.csv dataset, which contains information about actors, characters, roles, titles, content types, release years, genres, and IMDb ratings and votes.

Key Features

Data Cleaning

Standardized inconsistent column names by converting mixed-case headers to lowercase with underscores and replacing special characters.
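A minimal sketch of this kind of header cleanup is shown below. The helper name `clean_column_name` and the sample headers are hypothetical; the notebook's actual rules and the real file's headers may differ.

```python
import re


def clean_column_name(name: str) -> str:
    """Normalize one header: trim, lowercase, and join words with underscores."""
    name = name.strip().lower()
    # Replace any run of characters other than a-z or 0-9 with one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name)
    # Drop underscores left at either end by leading or trailing symbols.
    return name.strip("_")


# Hypothetical raw headers, chosen only to exercise each rule.
raw_headers = ["  Title ", "Release Year", "IMDb-Score", "imdb votes!"]
cleaned = [clean_column_name(h) for h in raw_headers]
print(cleaned)
```

With pandas, the same function could be applied to a DataFrame through `df.rename(columns=clean_column_name)`.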

Custom Functions

Developed reusable functions like get_actors_for_title() to retrieve cast lists and categorize_imdb_score() to classify content quality.

Rating Categorization

Implemented a tiered rating system so titles can be grouped and compared by quality: Excellent (≥9.0), Good (7.0-8.9), Average (5.0-6.9), and Low (<5.0).

Technical Highlights

get_actors_for_title() Function

Returns a comma-separated list of all actors for a given movie or show:

get_actors_for_title("Taxi Driver")
# Returns: "Robert De Niro, Jodie Foster, Harvey Keitel, ..."
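One way such a function could be written is sketched below. Unlike the call above, this version takes the DataFrame as an explicit first argument, and the column names `name` and `title` are assumptions about the cleaned dataset; the sample rows are placeholders.

```python
import pandas as pd


def get_actors_for_title(df: pd.DataFrame, title: str) -> str:
    """Return a comma-separated string of the actors credited on `title`."""
    # Case-insensitive exact match on the (assumed) title column.
    matches = df[df["title"].str.lower() == title.lower()]
    # Drop missing names and duplicates, keeping first-seen order.
    actors = matches["name"].dropna().drop_duplicates()
    return ", ".join(actors)


# Placeholder rows: one row per actor credit, as the dataset description suggests.
credits = pd.DataFrame({
    "name": ["Actor A", "Actor B", "Actor C", None],
    "title": ["Example Film", "Example Film", "Other Show", "Example Film"],
})

cast = get_actors_for_title(credits, "example film")
print(cast)
```

A title with no matching rows yields an empty string in this sketch; raising an error or returning a message would be equally reasonable choices.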

categorize_imdb_score() Function

Categorizes movies/shows into quality tiers based on IMDb scores:

Excellent: IMDb score ≥ 9.0
Good: IMDb score 7.0 - 8.9
Average: IMDb score 5.0 - 6.9
Low: IMDb score < 5.0
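The tiers above could be implemented roughly as follows. This sketch treats the lower bound of each tier as a cutoff, so a score between the listed ranges (such as 8.95) falls into the lower tier. The "Unrated" label for missing scores is an assumption, not something the project specifies.

```python
import math


def categorize_imdb_score(score):
    """Map an IMDb score to a quality tier using the listed thresholds."""
    # Missing scores (None or NaN) get their own label; this label is an assumption.
    if score is None or math.isnan(score):
        return "Unrated"
    if score >= 9.0:
        return "Excellent"
    if score >= 7.0:
        return "Good"
    if score >= 5.0:
        return "Average"
    return "Low"


for sample in [9.3, 8.9, 7.0, 6.9, 5.0, 4.2, None]:
    print(sample, categorize_imdb_score(sample))
```

With pandas, the function could be applied to a score column via `Series.apply`, for example `df["imdb_score"].apply(categorize_imdb_score)`, where the column name is hypothetical.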

Skills Demonstrated

Pandas DataFrame manipulation and filtering
Data cleaning and standardization techniques
Custom function development for code reusability
Exploratory data analysis (EDA)
String manipulation and text processing
Conditional logic and data categorization
Jupyter Notebook documentation

Technologies Used

Python 3.x
Pandas
Jupyter Notebook

