What is cross-sectional data?
Common types of cross-sectional data and vizzes
Analyzing cross-sectional data
Project 2 overview and example
Cross-sectional data is collected by observing a study population at a single point in time or for a period of time and aggregating information to a single observation per subject.
We call it cross-sectional data because we observe a slice, snapshot, or cross-section of a group of subjects.
This differs from time series data, which follows the same subjects across many points in time; in cross-sectional data, each subject contributes a single observation.
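A minimal sketch of the aggregation step in the definition above: many transaction records per customer are collapsed into one row per customer. The record layout `(customer_id, item, amount)`, the sample values, and the function name `to_cross_section` are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical transaction records: (customer_id, item, amount in dollars).
transactions = [
    ("c1", "Water", 2.00),
    ("c1", "Water", 1.50),
    ("c1", "Chips", 3.25),
    ("c2", "Cigars", 4.00),
    ("c2", "Cigars", 5.00),
]

def to_cross_section(records):
    """Collapse many transactions into one observation (row) per customer."""
    by_customer = defaultdict(list)
    for customer_id, item, amount in records:
        by_customer[customer_id].append((item, amount))

    rows = {}
    for customer_id, purchases in by_customer.items():
        amounts = [amount for _, amount in purchases]
        items = Counter(item for item, _ in purchases)
        rows[customer_id] = {
            "n_transactions": len(purchases),
            "avg_spending": sum(amounts) / len(amounts),
            "most_frequent_purchase": items.most_common(1)[0][0],
        }
    return rows

for customer_id, row in to_cross_section(transactions).items():
    print(customer_id, row)
```

Each customer now appears exactly once, which is the shape that clustering methods expect.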
Individual-level data
Business or point of interest-level data
Country-level data
Region-level data
Spatial data
Cross-sectional analysis typically involves:
One powerful method to uncover these groups is Cluster Analysis, which we introduce next.
Cluster analysis helps businesses, policymakers, and organizations group similar observations to better understand their characteristics, behaviors, and needs.
For example:
Usually, we repeat this whole process several times, each time from a different random starting point, and choose the group assignment that minimizes the total within-cluster variance.
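The section does not name the algorithm, but repeating the process and keeping the assignment with the lowest total within-cluster variance matches k-means with random restarts. The sketch below assumes that algorithm; the function names, the `n_restarts` parameter, and the toy points are hypothetical. In practice, features measured on different scales (for example dollars spent and visits per month) are usually standardized first so that one feature does not dominate the distances.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (labels, centers, wcss)."""
    centers = rng.sample(points, k)  # k distinct observations as starting centers
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so this run has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean_point(members)
    # Total within-cluster sum of squares: the quantity the restarts compare.
    wcss = sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, wcss

def kmeans(points, k, n_restarts=10, seed=0):
    """Repeat k-means from several random starts and keep the lowest-wcss result."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_restarts):
        result = kmeans_once(points, k, rng)
        if best is None or result[2] < best[2]:
            best = result
    return best

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers, wcss = kmeans(points, k=2)
print(labels, round(wcss, 3))
```

Because `wcss` is returned, running `kmeans` for k = 1, 2, 3, and so on and plotting `wcss` against k gives the curve used by the elbow heuristic.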
After clustering, summarize attributes of the clusters to understand the groups.
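One way to summarize clusters, sketched under the assumption that each customer is already a row of features with a cluster label: average the numeric attributes and take the most common value of the categorical one. This produces a profile table like the convenience-store example in this section. The field names, values, and `summarize_clusters` helper are hypothetical.

```python
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical per-customer rows, one observation per customer.
customers = [
    {"avg_spending": 4.0, "visits_per_month": 20, "top_item": "Chips"},
    {"avg_spending": 5.0, "visits_per_month": 30, "top_item": "Chips"},
    {"avg_spending": 15.0, "visits_per_month": 4, "top_item": "Coffee"},
    {"avg_spending": 17.0, "visits_per_month": 8, "top_item": "Coffee"},
]
# Hypothetical cluster labels, e.g. the output of a clustering step.
labels = [0, 0, 1, 1]

def summarize_clusters(rows, labels):
    """Profile each cluster: size, mean of numeric fields, mode of the categorical field."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    summary = {}
    for label in sorted(groups):
        members = groups[label]
        summary[label] = {
            "size": len(members),
            "avg_spending": mean(r["avg_spending"] for r in members),
            "avg_visits_per_month": mean(r["visits_per_month"] for r in members),
            "most_frequent_purchase": Counter(r["top_item"] for r in members).most_common(1)[0][0],
        }
    return summary

for label, profile in summarize_clusters(customers, labels).items():
    print(label, profile)
```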
Two common methods for selecting the right number of clusters:
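The two methods are not listed in this section. As one illustration, and not necessarily one of them, the average silhouette score compares how close each point is to its own cluster versus the nearest other cluster; values near 1 indicate well-separated clusters. The sketch follows the standard definition and scores points in singleton clusters as 0 (the convention scikit-learn also uses); the toy data and both labelings are hypothetical.

```python
import math

def mean(values):
    """Arithmetic mean of a non-empty iterable."""
    values = list(values)
    return sum(values) / len(values)

def silhouette_score(points, labels):
    """Average silhouette over all points; labels must contain at least 2 clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least 2 clusters")
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # a: mean distance to the other members of the point's own cluster.
        same = [math.dist(p, q) for j, (q, lab) in enumerate(zip(points, labels))
                if lab == own and j != i]
        if not same:
            scores.append(0.0)  # singleton cluster: silhouette defined as 0
            continue
        a = mean(same)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            mean(math.dist(p, q) for q, lab in zip(points, labels) if lab == other)
            for other in clusters if other != own
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return mean(scores)

points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
good = [0, 0, 0, 1, 1, 1]  # matches the two visible groups
bad = [0, 1, 0, 1, 0, 1]   # mixes the groups together
print(round(silhouette_score(points, good), 3), round(silhouette_score(points, bad), 3))
```

To choose the number of clusters this way, cluster the data for several candidate values of k and keep the value with the highest average silhouette.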
Suppose you manage convenience stores and want to segment your customers based on purchasing behavior:
Example clusters could reveal insights like:
Goal: Use sales data to divide your customers into groups so you can better tailor promotions to each customer segment.
After doing the cluster analysis, you identify three clusters:
| Cluster | Avg. spending | Most frequent purchase | Avg. visits per month |
|---|---|---|---|
| 1 | $4.37 | Cigars | 27 |
| 2 | $16.20 | Water | 6 |
| 3 | $4.25 | Carbonated Soft Drinks | 4 |
Insights:
In your analysis, you’ll:
Form groups of 2 (THIS WEEK)
We will provide convenience store data on shoppers, stores, and purchases
Steps of the project include: