Spatial data overview
Spatial joins
Project 3 overview
Data about a point or area defined in space
Vector Data
Point Data (e.g., locations of stores, geotagged tweets)
Line Data (e.g., roads, rivers)
Polygon Data (e.g., boundaries of countries, lakes)
Raster Data
Private sources (e.g., Google Maps)
Crowdsourced data (e.g., PurpleAir)
Distance and proximity
Geographic Information Systems (GIS)
ESRI ArcGIS
QGIS (free GUI alternative)
R/Python libraries (scripted)
Spatial functions in databases (e.g., PostGIS)
Geocoding and reverse geocoding
Calculating distance between objects
Calculating area of polygons
Spatial joins
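As an illustration of the distance and area calculations listed above, here is a minimal sketch in plain Python. It is not how any particular GIS package implements these operations. The function names haversine_km and shoelace_area are hypothetical, the distance assumes a spherical Earth with a mean radius of 6371.0088 km, and the area assumes projected (planar) coordinates rather than raw latitude/longitude.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # assumption: spherical Earth, mean radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def shoelace_area(vertices):
    """Planar area of a simple polygon from its (x, y) vertices in order.

    Assumes projected coordinates (e.g., meters), not raw lat/lon.
    """
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    return abs(total) / 2
```

In practice a GIS library would handle projections and ellipsoidal distances for you; the sketch only shows what the calculation amounts to once objects have coordinates.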
Geocoding: Converting an address to a set of coordinates
Reverse Geocoding: Converting a set of coordinates to an address
Spatial data references a location
Spatial data enables calculation and manipulation
Spatial information enables joining to other data (e.g., census)
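To make the idea of a spatial join concrete, the sketch below attaches an area identifier to each point by testing which polygon contains it (a ray-casting point-in-polygon test). Everything here is illustrative: the tract names, store records, and field names are made up, and the functions point_in_polygon and spatial_join are hypothetical helpers, not part of any GIS library. Points that fall exactly on a boundary are not handled specially.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does a horizontal ray going right from (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, areas):
    """Add the id of the first containing area to each point (None if no match)."""
    joined = []
    for point in points:
        match = None
        for area_id, polygon in areas.items():
            if point_in_polygon(point["x"], point["y"], polygon):
                match = area_id
                break
        joined.append({**point, "area_id": match})
    return joined


# Hypothetical data: two square "tracts" and three store locations.
areas = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
stores = [
    {"store_id": 1, "x": 1.0, "y": 1.0},
    {"store_id": 2, "x": 3.0, "y": 0.5},
    {"store_id": 3, "x": 5.0, "y": 5.0},
]
joined = spatial_join(stores, areas)
```

Once each store carries an area identifier, area-level attributes (e.g., census variables) can be merged onto the stores with an ordinary key-based join.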
The goal of this project is for you to apply panel data analysis to answer a real-world question. This is also a chance to showcase your creativity and analytic skills by bringing together everything you have learned over the semester.
Your question for project 3 should be in the form of: What is the association between \(x\) (an explanatory variable) and \(y\) (some outcome)?
For this project, you will select your \(y\) from the convenience store data (shopper_info, store_info, gtin) and your \(x\) from one of two additional datasets (census data or weather data).
Option 1: Convenience store data and Demographic data from the US Census
Option 2: Convenience store data and Weather data from NOAA
You are posing some form of the question: What is the impact of \(x\) (treatment or exposure) on some outcome \(y\)?
Choose one of the two options.
Find an outcome to evaluate. In other words, you need to identify an outcome \(y\) from the convenience store data.
Identify your explanatory variable(s) from the additional dataset you chose (e.g., weather, population density); these will serve as your treatment or exposure variable(s), \(x\). Also note any controls that you may need.
You are responsible for merging or joining your datasets. This means that you must identify the common unit of analysis in each of your datasets and aggregate data if necessary.
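The aggregate-then-join step can be sketched as follows, assuming the common unit of analysis is a store-day. This is only one possible choice of unit. All field names (store_id, date, revenue, max_temp_c) and values are hypothetical and do not reflect the actual columns in the project datasets.

```python
from collections import defaultdict

# Hypothetical transaction-level rows (one row per purchase).
transactions = [
    {"store_id": 1, "date": "2023-07-01", "revenue": 4.50},
    {"store_id": 1, "date": "2023-07-01", "revenue": 2.25},
    {"store_id": 1, "date": "2023-07-02", "revenue": 3.00},
    {"store_id": 2, "date": "2023-07-01", "revenue": 5.00},
]
# Hypothetical daily weather rows, assumed already matched to stores.
weather = [
    {"store_id": 1, "date": "2023-07-01", "max_temp_c": 31.0},
    {"store_id": 1, "date": "2023-07-02", "max_temp_c": 27.5},
    {"store_id": 2, "date": "2023-07-01", "max_temp_c": 29.0},
]

# Step 1: aggregate transactions up to the store-day level (outcome y).
daily_revenue = defaultdict(float)
for row in transactions:
    daily_revenue[(row["store_id"], row["date"])] += row["revenue"]

# Step 2: index the weather data by the same (store_id, date) key (x).
weather_by_key = {(w["store_id"], w["date"]): w["max_temp_c"] for w in weather}

# Step 3: inner join on the shared key to build the panel.
panel = [
    {"store_id": s, "date": d, "revenue": rev, "max_temp_c": weather_by_key[(s, d)]}
    for (s, d), rev in sorted(daily_revenue.items())
    if (s, d) in weather_by_key
]
```

The key point is that both datasets must be brought to the same unit before joining: here the transactions are summed to one row per store per day so that each row lines up with exactly one weather observation.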