Introduction

Project Overview

This project uses several visualizations to understand shopping behavior, with most of the analysis focused on Massachusetts. Although the dataset covers customers across the U.S., looking closely at a single state reveals clearer patterns in what people buy, how much they spend, and which products and sizes are most popular there. We are also Massachusetts residents, which made studying our own state's shopping behavior especially exciting!

Each graph answers a specific question about customer behavior: how Massachusetts spending compares to other states, which categories Massachusetts shoppers spend the most on, whether incentives such as discounts affect their spending, which sizes are bought most often, and how age and gender relate to shopping frequency. Together, these visualizations suggest where to market, what to stock, and how to price products for Massachusetts shoppers. The goal is to turn the data into simple insights that can guide practical business decisions.
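As a rough illustration of the first two questions, the sketch below averages purchase amounts per state and ranks categories by total spend within Massachusetts. It is not the project's actual code: it assumes the column names Location, Category, and Purchase Amount (USD) from the dataset's attribute list, assumes Location stores full state names such as "Massachusetts", and uses made-up rows. The helpers average_spend_by_state and category_totals are hypothetical names.

```python
from collections import defaultdict

# Hypothetical sample rows; real rows would come from the Kaggle CSV.
# Assumes "Location" stores full U.S. state names such as "Massachusetts".
rows = [
    {"Location": "Massachusetts", "Category": "Clothing", "Purchase Amount (USD)": 60},
    {"Location": "Massachusetts", "Category": "Footwear", "Purchase Amount (USD)": 40},
    {"Location": "Massachusetts", "Category": "Clothing", "Purchase Amount (USD)": 50},
    {"Location": "Texas", "Category": "Accessories", "Purchase Amount (USD)": 30},
    {"Location": "Texas", "Category": "Clothing", "Purchase Amount (USD)": 90},
]


def average_spend_by_state(rows):
    """Return {state: mean purchase amount} across all rows."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        state = row["Location"]
        totals[state] += row["Purchase Amount (USD)"]
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


def category_totals(rows, state):
    """Return (category, total spend) pairs for one state, largest first."""
    totals = defaultdict(int)
    for row in rows:
        if row["Location"] == state:
            totals[row["Category"]] += row["Purchase Amount (USD)"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


print(average_spend_by_state(rows))  # {'Massachusetts': 50.0, 'Texas': 60.0}
print(category_totals(rows, "Massachusetts"))  # [('Clothing', 110), ('Footwear', 40)]
```

Comparing means rather than totals keeps states with many more customers from dominating the comparison; totals are used for the within-state category ranking because there the question is where the money goes.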

About the Data

Source:

We use the “Shopping Behaviours” dataset from Kaggle, which was posted a month before this writing. The dataset describes consumer behavior and shopping habits across different demographics, locations, and product categories.

Size:

It includes 3,900 customer records with 18 attributes describing purchase details, shopping tendencies, and feedback.

Key Attributes:

Customer ID (int), Age (int), Gender (string), Item Purchased (string), Category (string), Purchase Amount (USD) (int), Location (string), Size (string), Color (string), Season (string), Review Rating (float), Subscription Status (string), Shipping Type (string), Discount Applied (string), Promo Code Used (string), Previous Purchases (int), Payment Method (string), Frequency of Purchases (string)
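One way to make this attribute list machine-checkable is a column-to-type mapping that converts the raw strings read from the CSV into typed values. The following is a minimal sketch using only the standard library; SCHEMA and parse_row are hypothetical names, the floating-point Review Rating is represented by Python's float, and the one-row sample CSV is made up for illustration.

```python
import csv
import io

# Expected Python type of each of the 18 attributes; "string" maps to str.
SCHEMA = {
    "Customer ID": int,
    "Age": int,
    "Gender": str,
    "Item Purchased": str,
    "Category": str,
    "Purchase Amount (USD)": int,
    "Location": str,
    "Size": str,
    "Color": str,
    "Season": str,
    "Review Rating": float,
    "Subscription Status": str,
    "Shipping Type": str,
    "Discount Applied": str,
    "Promo Code Used": str,
    "Previous Purchases": int,
    "Payment Method": str,
    "Frequency of Purchases": str,
}


def parse_row(raw):
    """Convert one csv.DictReader row (all strings) into typed values.

    Raises KeyError if a column is missing and ValueError if a numeric
    field cannot be converted.
    """
    return {column: cast(raw[column].strip()) for column, cast in SCHEMA.items()}


# Hypothetical one-row CSV whose header uses the column names in SCHEMA.
sample_csv = ",".join(SCHEMA) + "\n" + ",".join([
    "1", "55", "Male", "Blouse", "Clothing", "53", "Massachusetts", "L",
    "Gray", "Winter", "3.1", "Yes", "Express", "Yes", "Yes", "14",
    "Venmo", "Fortnightly",
])

rows = [parse_row(raw) for raw in csv.DictReader(io.StringIO(sample_csv))]
print(rows[0]["Age"], rows[0]["Review Rating"])  # 55 3.1
```

Failing loudly on a bad number, rather than silently keeping it as text, is what makes the "already in appropriate numerical formats" check meaningful.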

Pre-Processing:

First, we handled missing (null) values. Next, we verified that numerical columns such as Age, Purchase Amount (USD), and Review Rating were already stored in appropriate numeric types. We then confirmed that categorical variables such as Location, Category, and Item Purchased were formatted consistently. Finally, we ensured the data contained no duplicate records.
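The pre-processing steps can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the project's actual pipeline: the report does not say how missing values were handled, so the sketch assumes rows with any empty field are dropped; the sample rows are hypothetical and include only a subset of the 18 columns; and preprocess and distinct_values are hypothetical helper names.

```python
# Numeric columns named in the report and the type each should parse to.
NUMERIC = {"Age": int, "Purchase Amount (USD)": int, "Review Rating": float}


def preprocess(raw_rows):
    """Clean csv.DictReader-style rows (all values are strings).

    1. drop rows with a missing value (an assumption; imputation is an alternative),
    2. strip stray whitespace so categorical labels compare consistently,
    3. convert numeric columns, raising ValueError on malformed numbers,
    4. skip exact duplicate rows.
    """
    cleaned = []
    seen = set()
    for raw in raw_rows:
        # Step 1: treat None or blank strings as missing and drop the row.
        if any(value is None or value.strip() == "" for value in raw.values()):
            continue
        # Step 2: trim whitespace in every field.
        row = {column: value.strip() for column, value in raw.items()}
        # Step 3: convert numeric columns to int/float.
        for column, cast in NUMERIC.items():
            row[column] = cast(row[column])
        # Step 4: a sorted tuple of items is a hashable fingerprint of the row.
        fingerprint = tuple(sorted(row.items()))
        if fingerprint in seen:
            continue
        seen.add(fingerprint)
        cleaned.append(row)
    return cleaned


def distinct_values(rows, column):
    """Sorted distinct labels in a column, so inconsistent spellings stand out."""
    return sorted({row[column] for row in rows})


# Hypothetical raw rows: a duplicate, a missing Age, and stray whitespace.
raw_rows = [
    {"Customer ID": "1", "Age": "55", "Purchase Amount (USD)": "53",
     "Review Rating": "3.1", "Location": "Massachusetts", "Category": "Clothing"},
    {"Customer ID": "1", "Age": "55", "Purchase Amount (USD)": "53",
     "Review Rating": "3.1", "Location": "Massachusetts", "Category": "Clothing"},
    {"Customer ID": "2", "Age": "", "Purchase Amount (USD)": "64",
     "Review Rating": "3.1", "Location": "Texas", "Category": "Clothing"},
    {"Customer ID": "3", "Age": "19", "Purchase Amount (USD)": "73",
     "Review Rating": "3.5", "Location": " Massachusetts ", "Category": "Footwear"},
]

cleaned = preprocess(raw_rows)
print(len(cleaned))  # 2
print(distinct_values(cleaned, "Location"))  # ['Massachusetts']
```

Listing the distinct labels per categorical column is a simple way to confirm consistent formatting: a variant such as "massachusetts" or "MA" would appear as a separate entry.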