Week 13: Spatial Data and Project 3 Introduction

Agenda

Spatial data overview

Spatial joins

Project 3 overview

What is spatial data?

Data about a point or area defined in space

Vector Data

  • Point Data (e.g., locations of stores, geotagged tweets)

  • Line Data (e.g., roads, rivers)

  • Polygon Data (e.g., boundaries of countries, lakes)

Raster Data

  • Grid/raster data is reported in a uniformly sized grid over some area (e.g., satellite images)

Common Sources of Spatial Data

Understanding Spatial Relationships

Concepts of Spatial Relationships

  • Distance and proximity

    • Spherical distance
  • Topology (e.g., adjacency, containment)
  • Accessibility and connectivity
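Spherical distance between two latitude/longitude points is commonly computed with the haversine formula. The following is a minimal sketch, assuming a perfectly spherical Earth with a mean radius of 6371 km; the function name haversine_km is illustrative and not part of any course material.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumption: spherical Earth)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# One degree of longitude along the equator is roughly 111 km
one_degree = haversine_km(0.0, 0.0, 0.0, 1.0)
```

Because the Earth is not a perfect sphere, this is an approximation; GIS tools typically offer ellipsoidal distance calculations when more precision is needed.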

Tools for Analyzing Spatial Relationships

  • Geographic Information Systems (GIS)

    • ESRI ArcGIS

    • QGIS (free GUI alternative)

    • R/Python libraries (scripted)

  • Spatial functions in databases (e.g., PostGIS)

Working with Spatial Data

Data Collection Techniques

  • GPS data collection
  • Remote sensing and aerial photography

Data Processing and Cleaning

  • Geocoding and reverse geocoding

  • Calculating distance between objects

  • Calculating area of polygons

  • Spatial joins
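As an illustration of calculating the area of a polygon, the sketch below uses the shoelace formula. It assumes the vertices are planar (projected) coordinates listed in order around a simple, non-self-intersecting polygon; it is not appropriate for raw latitude/longitude values. The function name polygon_area is hypothetical.

```python
def polygon_area(vertices):
    """Area of a simple polygon from ordered (x, y) vertices in planar units."""
    total = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]  # wrap around to close the polygon
        total += x1 * y2 - x2 * y1
    # The signed sum is positive or negative depending on vertex order
    return abs(total) / 2.0


# A 4-by-3 rectangle, in the same units as the coordinates
rectangle = [(0, 0), (4, 0), (4, 3), (0, 3)]
area = polygon_area(rectangle)
```

The result is in squared coordinate units, so coordinates in meters yield an area in square meters.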

Geocoding and Reverse Geocoding

  • Geocoding: Converting an address to a set of coordinates

  • Reverse Geocoding: Converting a set of coordinates to an address

Spatial Joins

  • Joining data based on shared location
  • Point to point
  • Points/lines to polygons
  • Polygon to polygon (intersecting polygons)
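A points-to-polygons join can be sketched in plain Python with a ray-casting point-in-polygon test. This is a simplified illustration, assuming planar coordinates and simple polygons; the tract and store names are made up, and points that fall exactly on a boundary are not handled specially. In practice, a GIS package or a spatial database would perform this step.

```python
def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' each time a rightward ray from (x, y) crosses an edge."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def spatial_join(points, polygons):
    """Pair each point with the first polygon that contains it, or None if unmatched."""
    joined = []
    for point_id, (x, y) in points.items():
        match = None
        for name, polygon in polygons.items():
            if point_in_polygon(x, y, polygon):
                match = name
                break
        joined.append((point_id, match))
    return joined


# Hypothetical data: two adjacent square "tracts" and three store locations
polygons = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
points = {"store_1": (1.0, 1.0), "store_2": (3.0, 0.5), "store_3": (5.0, 5.0)}
result = spatial_join(points, polygons)
```

Here store_3 lies outside both polygons, so it is kept with no match, which mirrors a left join.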

Conclusion

  • Spatial data references a location

  • Spatial data enables calculations such as distance and area

  • Spatial information enables joining to other data (e.g., census)

Project 3

The Question

The goal of this project is for you to apply panel data analysis to answer a real-world question. This is also a chance to showcase your creativity and analytic skills by putting together everything you have learned over the semester.

Your question for project 3 should be in the form of: What is the association between \(x\) (an explanatory variable) and \(y\) (some outcome)?

The Data

For this project, you will select your \(y\) from the convenience store data (shopper_info, store_info, gtin) and your \(x\) from one of two additional datasets (census data or weather data).

  • Option 1: Convenience store data and Demographic data from the US Census

  • Option 2: Convenience store data and Weather data from NOAA

Bringing It Together

You are posing some form of the question: What is the impact of \(x\) (treatment or exposure) on some outcome \(y\)?

  1. Choose one of the two options.

  2. Find an outcome to evaluate. In other words, you need to identify an outcome \(y\) from the convenience store data.

  3. Identify your treatment or exposure variable(s), \(x\), from the dataset you chose (e.g., weather, population density), along with any controls that you may need.

  • Including control variables in the regression explains a portion of the variation in your data, leaving your variable of interest to explain the remaining variation.
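
One possible way to write such a regression is shown here as an illustration rather than a required specification. The indices and symbols are assumptions: \(i\) indexes the unit (e.g., a store), \(t\) indexes time, \(c_{it}\) is a control variable, and \(\varepsilon_{it}\) is the error term.

\[
y_{it} = \alpha + \beta x_{it} + \gamma c_{it} + \varepsilon_{it}
\]

In this form, \(\beta\) is the association between \(x\) and \(y\) holding the control \(c\) fixed, and \(\gamma\) absorbs the variation in \(y\) that the control explains.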

You are responsible for merging or joining your datasets. This means that you must identify the common unit of analysis in each of your datasets and aggregate data if necessary.
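
The aggregate-then-join step can be sketched as follows. This is a toy example using only the Python standard library: all field names, identifiers, and values are hypothetical and do not reflect the actual columns in the course datasets. It assumes the common unit of analysis is a store-day and that weather is reported by ZIP code and date.

```python
from collections import defaultdict

# Hypothetical transaction-level rows: (store_id, date, sales)
transactions = [
    ("s1", "2023-01-01", 10.0),
    ("s1", "2023-01-01", 5.0),
    ("s1", "2023-01-02", 7.5),
    ("s2", "2023-01-01", 3.0),
]
# Hypothetical store lookup: store_id -> ZIP code
store_zip = {"s1": "30301", "s2": "30302"}
# Hypothetical daily weather keyed by (ZIP code, date): maximum temperature
weather = {
    ("30301", "2023-01-01"): 50.0,
    ("30301", "2023-01-02"): 55.0,
    ("30302", "2023-01-01"): 48.0,
}

# Step 1: aggregate transactions up to the store-day level
daily_sales = defaultdict(float)
for store_id, date, sales in transactions:
    daily_sales[(store_id, date)] += sales

# Step 2: join each store-day to the weather for that store's ZIP code and date
panel = []
for (store_id, date), total in sorted(daily_sales.items()):
    zip_code = store_zip[store_id]
    panel.append({
        "store_id": store_id,
        "date": date,
        "total_sales": total,                          # outcome y
        "max_temp": weather.get((zip_code, date)),     # x; None if no match
    })
```

Each row of panel is one store-day with both the outcome and the explanatory variable attached. With a data-frame library, the same two steps would typically be a group-by followed by a merge.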