
Instacart Market Basket Analysis

An exploratory data analysis of customer shopping behavior in Instacart's grocery shopping dataset, aimed at actionable insights for inventory management, marketing strategies, and customer retention.

Python, Pandas, Matplotlib, EDA

Project Overview

Business Context: Understanding online grocery shopping patterns to optimize product recommendations and inventory management.

This project analyzes shopping patterns, product preferences, and customer reordering behavior from Instacart's transactional data. The analysis provides actionable insights for inventory management, marketing strategies, and customer retention through comprehensive exploratory data analysis (EDA).

What This Demonstrates

Learning Challenge

  • Unfamiliar with retail analytics and customer behavior patterns
  • Needed to understand market basket analysis concepts
  • First exposure to large-scale transactional datasets

Problem-Solving Process

  1. Data Exploration: Used Jupyter Notebook to systematically explore 5 interconnected datasets (orders, products, aisles, departments, and order line items)
  2. Pattern Recognition: Identified shopping trends, reorder patterns, and product relationships
  3. Tool Utilization: Used GitHub Copilot to explore alternative visualization approaches and pandas operations
  4. Hypothesis Testing: Formed hypotheses about customer behavior and validated them against the data

Professional Outcome

  • Created an interactive notebook that tells a complete story from raw data to actionable insights
  • Delivered findings that a product manager or business analyst could immediately understand and act upon
  • Demonstrated ability to work with real-world, messy data

Tools Utilized

  • VS Code with GitHub Copilot for development
  • Jupyter Notebook for interactive analysis
  • Git/GitHub for version control

Dataset

The analysis uses five interconnected CSV files containing comprehensive grocery shopping data:

  • instacart_orders.csv - Order history with timestamps (~479k orders)
  • products.csv - Product catalog (~50k products)
  • departments.csv - Department categories (21 departments)
  • aisles.csv - Aisle classifications (134 aisles)
  • order_products.csv - Order line items (~30M items)

Note: All CSV files use semicolon (;) as the delimiter instead of comma.
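Because of the semicolon delimiter, pandas' `read_csv` with its default comma separator would cram every row into a single column. The sketch below shows the difference on a tiny made-up sample; the helper name `load_instacart_csv` is hypothetical, and the column names are assumed from the public Instacart schema rather than taken from this project's notebook.

```python
import io

import pandas as pd


def load_instacart_csv(source):
    """Read one semicolon-delimited Instacart CSV (path or file-like) into a DataFrame."""
    return pd.read_csv(source, sep=";")


# Made-up rows standing in for products.csv (assumed column names).
text = (
    "product_id;product_name;aisle_id;department_id\n"
    "1;Banana;24;4\n"
    "2;Whole Milk;84;16\n"
)

wrong = pd.read_csv(io.StringIO(text))          # default sep="," -> one column
right = load_instacart_csv(io.StringIO(text))   # sep=";" -> four columns

print(wrong.shape)  # (2, 1)
print(right.shape)  # (2, 4)
```

With real files, the same helper would be called with a path such as `"products.csv"`.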

Key Questions Answered

When do people shop?

Analysis of temporal patterns including time of day and day of week trends to identify peak shopping periods.
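A minimal sketch of the time-of-day and day-of-week aggregation, assuming the orders table has `order_hour_of_day` (0-23) and `order_dow` (0-6) columns as in the public Instacart schema. The rows are made up; only the counting logic is the point.

```python
import pandas as pd

# Made-up stand-in for instacart_orders.csv (assumed column names).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6],
    "order_dow": [0, 0, 1, 1, 1, 3],
    "order_hour_of_day": [10, 10, 15, 10, 9, 22],
})

# Orders per hour of day, listed 0..23 with empty hours filled as 0.
orders_by_hour = (
    orders["order_hour_of_day"].value_counts().reindex(range(24), fill_value=0)
)

# Orders per day of week, in day-code order.
orders_by_dow = orders["order_dow"].value_counts().sort_index()

peak_hour = int(orders_by_hour.idxmax())
busiest_day = int(orders_by_dow.idxmax())
print(peak_hour, busiest_day)  # 10 1
```

Reindexing to all 24 hours keeps quiet hours visible as zeros, which matters when the counts are later plotted as a bar chart.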

What do people buy?

Identification of most popular products, frequently reordered items, and products commonly added to cart first.
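One way to answer all three product questions is to join order lines to product names and then filter before counting. This is a sketch on made-up rows; the `add_to_cart_order` and `reordered` column names are assumed from the public Instacart schema.

```python
import pandas as pd

# Made-up stand-ins for products.csv and order_products.csv (assumed columns).
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Banana", "Whole Milk", "Sparkling Water"],
})
order_products = pd.DataFrame({
    "order_id":          [10, 10, 11, 11, 12, 12],
    "product_id":        [1, 2, 1, 3, 2, 1],
    "add_to_cart_order": [1, 2, 1, 2, 1, 2],
    "reordered":         [0, 0, 1, 0, 1, 1],
})

# Attach product names to every order line.
lines = order_products.merge(products, on="product_id", how="left")

# Most popular: number of order lines containing each product.
top_products = lines["product_name"].value_counts()

# Most reordered: count only lines flagged as reorders.
top_reordered = lines.loc[lines["reordered"] == 1, "product_name"].value_counts()

# Most often added first: lines where the item was the first placed in the cart.
first_in_cart = lines.loc[lines["add_to_cart_order"] == 1, "product_name"].value_counts()

print(top_products.head(3))
```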

How do people shop?

Investigation of basket sizes, reorder frequency patterns, and the balance between new purchases and reorders.
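Basket size and the new-versus-reorder balance both come from the order line items table. A minimal sketch on made-up rows, assuming an `order_id` column and a 0/1 `reordered` flag as in the public Instacart schema:

```python
import pandas as pd

# Made-up stand-in for order_products.csv (assumed columns).
order_products = pd.DataFrame({
    "order_id":   [10, 10, 10, 11, 12, 12],
    "product_id": [1, 2, 3, 1, 2, 1],
    "reordered":  [0, 0, 1, 1, 1, 0],
})

# Basket size = number of line items in each order.
basket_size = order_products.groupby("order_id").size()
typical_basket = basket_size.median()

# Because `reordered` is 0/1, its mean is the share of items that were reorders.
reorder_share = order_products["reordered"].mean()

print(typical_basket, reorder_share)  # 2.0 0.5
```

The median is used here as the "typical" basket because basket-size distributions tend to be right-skewed by a few very large orders.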

Key Findings

Shopping Patterns

  • Peak Hours: Most orders occur between 9:00 AM and 5:00 PM, with notable peaks at 10:00 AM and 3:00 PM
  • Weekly Trends: Orders concentrate at the beginning of the week (Sunday/Monday)
  • Reorder Frequency: Most customers reorder within 7 days, with common patterns at 7-, 14-, 21-, and 28-day intervals
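The reorder-interval pattern can be checked by counting the days-since-prior-order values. The sketch below assumes a `days_since_prior_order` column that is empty (NaN) for a customer's first order, as in the public Instacart schema; the rows are made up.

```python
import pandas as pd

# Made-up stand-in; NaN marks a customer's first order (no prior order).
orders = pd.DataFrame({
    "user_id": [1, 1, 1, 2, 2, 3, 3],
    "days_since_prior_order": [None, 7.0, 7.0, None, 14.0, None, 30.0],
})

gaps = orders["days_since_prior_order"].dropna().astype(int)

# Distribution of gaps between consecutive orders, in day order.
gap_counts = gaps.value_counts().sort_index()

# Gaps that fall exactly on a weekly rhythm.
weekly = gap_counts[gap_counts.index.isin([7, 14, 21, 28])]

# Share of repeat orders placed within a week of the previous one.
share_within_week = (gaps <= 7).mean()
```

Dropping the NaN rows first keeps first-time orders from being counted as a gap of any length.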

Product Insights

  • Top Categories: Fresh produce and dairy dominate both total orders and reorders
  • Basket Size: Typical orders contain 5-6 items, with most orders having 1-20 items
  • First Additions: Produce, dairy, and beverages are most commonly added to the cart first

Customer Behavior

  • Reorder Rate: Customers show strong loyalty to specific products
  • Order Frequency: Most customers place 1-10 orders in the dataset
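  • Subscription Patterns: A notable spike at the 30-day interval suggests subscription-style monthly ordering, although the days-since-prior-order field in the Instacart data is capped at 30, so this bin also absorbs every longer gap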
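The order-frequency figure amounts to counting orders per customer and then looking at the distribution of those counts. A minimal sketch on made-up rows, assuming `user_id` and `order_id` columns as in the public Instacart schema:

```python
import pandas as pd

# Made-up stand-in for instacart_orders.csv (assumed columns).
orders = pd.DataFrame({
    "order_id": [1, 2, 3, 4, 5, 6, 7],
    "user_id":  [1, 1, 1, 2, 3, 3, 4],
})

# Number of distinct orders each customer placed.
orders_per_customer = orders.groupby("user_id")["order_id"].nunique()

# How many customers placed 1 order, 2 orders, and so on.
distribution = orders_per_customer.value_counts().sort_index()

# Share of customers with between 1 and 10 orders (inclusive).
share_1_to_10 = orders_per_customer.between(1, 10).mean()
```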

Technical Approach

  • Multi-file data integration and relationship mapping
  • Temporal pattern analysis (time-based aggregations)
  • Product categorization and segmentation
  • Customer behavior clustering and pattern recognition
  • Data visualization for insight communication
  • Statistical distribution analysis
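The multi-file integration step can be sketched as a chain of lookups: products pick up their aisle and department names, then order lines pick up the enriched product catalog. This assumes the public Instacart key columns (`product_id`, `aisle_id`, `department_id`); the IDs and rows are made up. The `validate="many_to_one"` argument makes pandas raise an error if a lookup table unexpectedly contains duplicate keys, which would otherwise silently inflate row counts.

```python
import pandas as pd

# Made-up lookup tables (assumed column names and IDs).
departments = pd.DataFrame({"department_id": [4, 16], "department": ["produce", "dairy eggs"]})
aisles = pd.DataFrame({"aisle_id": [24, 84], "aisle": ["fresh fruits", "milk"]})
products = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["Banana", "Whole Milk", "Strawberries"],
    "aisle_id": [24, 84, 24],
    "department_id": [4, 16, 4],
})
order_products = pd.DataFrame({"order_id": [10, 10, 11, 12], "product_id": [1, 2, 3, 1]})

# Enrich the product catalog with aisle and department names.
catalog = (
    products
    .merge(aisles, on="aisle_id", how="left", validate="many_to_one")
    .merge(departments, on="department_id", how="left", validate="many_to_one")
)

# Attach catalog details to each order line, then count lines per department.
lines = order_products.merge(catalog, on="product_id", how="left", validate="many_to_one")
dept_counts = lines["department"].value_counts()

# Sanity check: order lines whose product had no matching department.
unmatched = lines["department"].isna().sum()
```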

Skills Demonstrated

  • Exploratory Data Analysis (EDA) methodology
  • Multi-table data integration and joins
  • Temporal data analysis
  • Customer behavior analytics
  • Data visualization with Matplotlib
  • Pattern recognition and insight extraction
  • Business intelligence and actionable recommendations

Technologies Used

Python 3.x, Pandas, Matplotlib, Jupyter Notebook

Business Impact

This analysis provides actionable insights for:

  • Inventory Management: Optimize stock levels based on peak shopping times and popular products
  • Marketing Strategy: Target promotions during high-traffic periods and focus on frequently reordered items
  • Customer Retention: Leverage reorder patterns to implement subscription models and personalized recommendations
  • Operational Planning: Align staff scheduling with peak demand hours and days