What is cross-sectional data?
Common types of cross-sectional data and vizzes
Analyzing cross-sectional data
Project 2 overview and example
Cross-sectional data is collected by observing a study population at a single point in time, or over a period of time with the information aggregated to a single observation per subject.
We call it cross-sectional data because we observe information for a slice, snapshot, or cross-section of a group of subjects.
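This differs from time series data, which follows subjects across many points in time; in cross-sectional data each subject contributes only one observation, whether it reflects a single moment or an aggregated period.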
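To make "one observation per subject" concrete, here is a minimal sketch in Python that collapses repeated purchase records from one observation period into a cross-section with one row per customer. The records, the function name `to_cross_section`, and the choice of summaries (visit count and total spend) are hypothetical, for illustration only.

```python
from collections import defaultdict

# Hypothetical purchase records observed over one month: (customer_id, amount).
purchases = [
    ("c1", 12.50), ("c2", 3.00), ("c1", 7.50),
    ("c3", 20.00), ("c2", 5.00), ("c1", 10.00),
]


def to_cross_section(records):
    """Collapse repeated observations into one row per subject.

    Each row summarizes the whole period with a visit count and total spend;
    these particular summaries are an illustrative assumption.
    """
    rows = defaultdict(lambda: {"visits": 0, "total_spend": 0.0})
    for customer_id, amount in records:
        rows[customer_id]["visits"] += 1
        rows[customer_id]["total_spend"] += amount
    return dict(rows)


cross_section = to_cross_section(purchases)
for customer_id, row in sorted(cross_section.items()):
    print(customer_id, row)
```

The six raw records become three rows, one per customer; the time ordering within the month is deliberately discarded.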
Individual-level data
Business or point of interest-level data
Country-level data
Region-level data
Spatial data
Cross-sectional data analysis typically involves studying:
similarities or differences among subjects
trends and associations in the population
Questions about frequency:
how common is the outcome?
how many people are impacted?
Questions about associations:
Questions about similarity or dissimilarity among units
Use sales data to divide your customers into groups to better target marketing
1. Choose the number of clusters (k) that you want to identify in the data.
2. Randomly initialize the k cluster centroids (points in space) within the data range.
3. Assign each data point to the nearest centroid, based on the Euclidean distance between the point and each centroid.
4. Calculate the mean (centroid) of each cluster based on the data points assigned to it.
5. Update the cluster centroids to be the means of the data points assigned to them.
6. Repeat steps 3-5 until convergence (when the cluster assignments no longer change or a maximum number of iterations is reached).
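Because the result depends on the random starting centroids, we usually repeat this whole process several times and keep the group assignment with the lowest total within-cluster variance (the sum of squared distances from each point to its cluster centroid).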
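The steps above describe the k-means algorithm. Below is a minimal from-scratch sketch in Python using only the standard library, with random restarts that keep the run with the lowest within-cluster sum of squares (WCSS). The function names `kmeans_once` and `kmeans`, the toy points, and the default restart count are illustrative assumptions; library implementations such as scikit-learn's `KMeans` offer restarts through `n_init` but use a smarter initialization by default.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance; it ranks centroids the same way Euclidean distance does."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_once(points, k, rng, max_iter=100):
    """One k-means run from a random start; returns (labels, centroids, wcss)."""
    dims = len(points[0])
    lows = [min(p[d] for p in points) for d in range(dims)]
    highs = [max(p[d] for p in points) for d in range(dims)]
    # Step 2: place k centroids uniformly at random within the data range.
    centroids = [[rng.uniform(lows[d], highs[d]) for d in range(dims)] for _ in range(k)]
    labels = None
    for _ in range(max_iter):
        # Step 3: assign each point to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Steps 4-5: move each centroid to the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Within-cluster sum of squares: the quantity the restarts try to minimize.
    wcss = sum(squared_distance(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, wcss


def kmeans(points, k, n_restarts=10, seed=0):
    """Repeat k-means from several random starts and keep the lowest-WCSS run."""
    rng = random.Random(seed)
    runs = [kmeans_once(points, k, rng) for _ in range(n_restarts)]
    return min(runs, key=lambda run: run[2])


# Hypothetical 2-D data with two well-separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.2, 1.4), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
labels, centroids, wcss = kmeans(points, k=2)
print(labels, round(wcss, 3))
```

Restarts matter because a single unlucky start can leave a centroid stranded with no assigned points; comparing WCSS across runs guards against keeping that poor solution.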
After clustering, summarize attributes of the clusters to understand the groups.
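As a minimal sketch of that summary step, assuming each subject already carries a cluster label from a clustering run, the Python below reports each cluster's size and the mean of each attribute. The customer records, feature names, and `summarize_clusters` helper are hypothetical.

```python
from statistics import mean

# Hypothetical customers whose cluster labels were assigned by an earlier clustering run.
customers = [
    {"cluster": 0, "visits_per_month": 12, "avg_basket": 6.50},
    {"cluster": 0, "visits_per_month": 15, "avg_basket": 5.50},
    {"cluster": 1, "visits_per_month": 2, "avg_basket": 40.00},
    {"cluster": 1, "visits_per_month": 3, "avg_basket": 35.00},
]


def summarize_clusters(rows, features):
    """Return the size and per-feature mean of each cluster."""
    groups = {}
    for row in rows:
        groups.setdefault(row["cluster"], []).append(row)
    summary = {}
    for label, members in sorted(groups.items()):
        summary[label] = {"size": len(members)}
        for feature in features:
            summary[label][feature] = mean(m[feature] for m in members)
    return summary


summary = summarize_clusters(customers, ["visits_per_month", "avg_basket"])
for label, stats in summary.items():
    print(label, stats)
```

Reading the output, cluster 0 resembles frequent small-basket shoppers and cluster 1 occasional stock-up shoppers; descriptive names like these come from interpreting the summary, not from the algorithm itself.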
How might you use cross-sectional data to decide where to open a new grocery store?
Take a minute to write out what information you would need in a dataset to apply cluster analysis.
Form groups of 2 (THIS WEEK)
We will provide convenience store data on shoppers, stores, and purchases
Perform an exploratory data analysis and assemble summary statistics; analyze spatial patterns, correlations, etc.; and construct visualizations to answer your research question.
Conduct market segmentation analysis via clustering
Label clusters based on characteristics
Develop a simple marketing strategy to target each cluster
Research Objective:
Important background details:
Wildfire mitigation: community fuel treatments, grant programs to help offset costs, regulations and enforcement
Some communities already do wildfire risk mitigation. Does one size fit all, or are there certain strategies that work better in certain communities?
Collect data on community characteristics (socioeconomic and demographic variables from the census) and wildfire risk
Use clustering to identify which communities are similar to each other
Name the clusters
Characterize the wildfire risk mitigation approaches that worked within each cluster