Spatial data overview
Spatial joins
Project 3 overview
In unit 1, we learned about time series data.
In unit 2, we learned about cross-sectional data.
In unit 3, we will learn about panel data.
Panel data is a combination of time series and cross-sectional data, where we observe multiple units over multiple time periods.
Before we get to regression, we’ll take a short detour into spatial data.
Why?
Many panel datasets include geographic identifiers (like zip codes or counties).
To work with these, we need to know how to connect (i.e., join) different layers of spatial information (like store locations to zip codes).
This week’s tool: spatial joins.
To use spatial joins, we need to know what kind of spatial data we’re working with.
Spatial data: data about a point, line, or area defined in space.
Vector Data
Point Data (e.g., locations of stores, geotagged tweets)
Line Data (e.g., roads, rivers)
Polygon Data (e.g., boundaries of countries, lakes)
Raster Data
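To make the vector/raster distinction concrete, here is a minimal sketch in plain Python (no GIS library). All coordinates, cell values, and names such as `store_point` and `raster_value` are made up for illustration. Real work would use a spatial library in R or Python rather than raw lists.

```python
# Vector data: geometry stored as (longitude, latitude) coordinates.
store_point = (-96.3, 32.3)                                   # point: one coordinate pair
road_line = [(-96.9, 32.2), (-96.5, 32.4), (-96.1, 32.8)]     # line: ordered points
# Polygon: a closed ring, so the first and last vertex are the same.
zip_polygon = [(-96.8, 32.1), (-96.0, 32.1), (-96.0, 32.9),
               (-96.8, 32.9), (-96.8, 32.1)]

# Raster data: a grid of cell values plus where the grid sits in space.
raster = {
    "origin": (-97.0, 33.0),   # top-left corner of the grid (lon, lat)
    "cell_size": 0.5,          # cell width and height, in degrees
    "values": [                # e.g., temperature per cell (hypothetical)
        [21.5, 22.0, 22.4],
        [21.1, 21.8, 22.1],
    ],
}


def is_closed_ring(ring):
    """A polygon ring needs at least 4 vertices and must end where it starts."""
    return len(ring) >= 4 and ring[0] == ring[-1]


def raster_value(raster, lon, lat):
    """Look up the cell that contains (lon, lat). No bounds checking in this sketch."""
    origin_lon, origin_lat = raster["origin"]
    col = int((lon - origin_lon) / raster["cell_size"])
    row = int((origin_lat - lat) / raster["cell_size"])
    return raster["values"][row][col]
```

For example, `raster_value(raster, *store_point)` reads the grid cell that the store falls in, which is how a point layer picks up a value from a raster layer.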
Private sources (e.g., Google Maps)
Crowdsourced data (e.g., PurpleAir)
Distance and proximity
Geographic Information Systems (GIS)
ESRI ArcGIS
QGIS (free GUI alternative)
R/Python libraries (scripted)
Spatial functions in databases (e.g., PostGIS)
Geocoding and reverse geocoding
Calculating distance between objects
Calculating area of polygons
Spatial joins
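Two of the operations listed above, distance and area, can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not what a GIS package does internally: the distance uses the haversine formula on a sphere with a mean Earth radius of 6371 km, and the area uses the shoelace formula, which is only valid for planar (projected) coordinates. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shoelace_area(ring):
    """Area of a simple polygon in planar coordinates (squared coordinate units).

    Works whether or not the ring repeats its first vertex at the end.
    """
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2
```

One degree of latitude comes out at roughly 111 km with `haversine_km(0, 0, 0, 1)`. Note that `shoelace_area` on raw longitude/latitude values gives "square degrees", which is not a meaningful area; project the coordinates first.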
Geocoding: Converting an address to a set of coordinates
Reverse Geocoding: Converting a set of coordinates to an address
Spatial data references a location
Spatial data enables calculation and manipulation
Spatial information enables joining to other data (e.g., census)
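A spatial join of points to polygons boils down to asking, for each point, which polygon contains it. The sketch below shows that idea in plain Python using a ray-casting test. The store IDs, zip IDs, and coordinates are made up, the function names are hypothetical, and points that fall exactly on a boundary are not handled carefully. A spatial library would do this faster and more robustly.

```python
def point_in_polygon(point, ring):
    """Ray casting: count how many polygon edges a ray to the right of the point crosses."""
    x, y = point
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Map each point ID to the ID of the first polygon containing it (or None)."""
    joined = {}
    for point_id, point in points.items():
        joined[point_id] = None
        for polygon_id, ring in polygons.items():
            if point_in_polygon(point, ring):
                joined[point_id] = polygon_id
                break
    return joined


# Hypothetical layers: store locations (points) and zip code areas (polygons).
stores = {"store_A": (1.0, 1.0), "store_B": (3.5, 1.0), "store_C": (9.0, 9.0)}
zips = {
    "zip_1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "zip_2": [(2, 0), (5, 0), (5, 2), (2, 2)],
}
store_to_zip = spatial_join(stores, zips)
```

Once each store carries a zip ID, it can be joined to any other table keyed by zip, such as census data. `store_C` lies outside both polygons, so it maps to `None`; unmatched points like this are worth checking before merging.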
Read and process spatial data in R
Join spatial data
Prepare data for mapping in Tableau
The goal of this project is for you to apply panel data analysis to answer a real-world question.
This is also a chance to showcase your creativity and analytic skills by putting together everything you have learned over the semester.
For this project, you will select your \(y\) from the convenience store data (shopper_info, store_info, gtin) and your \(x\) from one of two additional datasets (census data or weather data).
Option 1: Convenience store data and Demographic data from the US Census
Option 2: Convenience store data and Weather data from NOAA
Your question for project 3 should be in the form of: What is the association between \(x\) (an explanatory variable) and \(y\) (some outcome)?
Choose one of the two options.
Based on your question, determine the outcome (\(y\)) to evaluate from the convenience store data.
Identify the explanatory variable(s) (e.g., weather, population density) that you want to include. These will serve as your treatment or exposure variable(s), \(x\).
You are responsible for merging or joining your datasets. This means that you must identify the unit of analysis in your datasets and aggregate data to the common unit of analysis.
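The merge step can be sketched as follows, assuming the common unit of analysis is zip code by month. Everything here is hypothetical: the field layout of the transaction rows, the store-to-zip lookup, the weather values, and the function names are stand-ins, not the actual structure of the project datasets.

```python
from collections import defaultdict

# Hypothetical transaction rows: (store_id, date as "YYYY-MM-DD", sales).
transactions = [
    ("s1", "2023-01-05", 10.0),
    ("s1", "2023-01-20", 5.0),
    ("s2", "2023-01-11", 7.5),
    ("s1", "2023-02-02", 4.0),
]
# Hypothetical lookup from store to zip (e.g., the result of a spatial join).
store_zip = {"s1": "zip_1", "s2": "zip_1"}
# Hypothetical x variable, already at the (zip, month) level.
weather = {("zip_1", "2023-01"): 9.5, ("zip_1", "2023-02"): 12.0}


def aggregate_sales(transactions, store_zip):
    """Sum sales up to the (zip, month) level, the assumed common unit of analysis."""
    totals = defaultdict(float)
    for store_id, date, sales in transactions:
        key = (store_zip[store_id], date[:7])  # "YYYY-MM-DD" -> "YYYY-MM"
        totals[key] += sales
    return dict(totals)


def build_panel(y_by_unit, x_by_unit):
    """Inner join on (zip, month): keep only units observed in both datasets."""
    return [
        {"zip": z, "month": m, "y": y, "x": x_by_unit[(z, m)]}
        for (z, m), y in sorted(y_by_unit.items())
        if (z, m) in x_by_unit
    ]


panel = build_panel(aggregate_sales(transactions, store_zip), weather)
```

Each row of `panel` is one unit in one time period, with both \(y\) and \(x\) attached. The inner join silently drops unmatched units, so compare row counts before and after merging.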
Using Census
Using Weather