Project 2 – Problem Set 1: Cross-Sectional Data Analysis

Problem Set Overview

This problem set is designed to get you started analyzing the data for your project.

Exploratory Data Analysis (EDA)

Explore the Arizona Grocery Store data from lab. The purpose of this step is to gain familiarity with your data and identify any issues (e.g., missing data, outliers). Complete the following steps with your data.

  1. Generate a table of summary statistics. The table should include at least: variable names, mean (average), standard deviation, min, and max.

  2. Perform any necessary processing (e.g., removing outliers). Write a narrative of your processing steps that justifies each decision.

  3. Write a narrative for your webpage describing your EDA and providing an interpretation of your data. Describe your data so that readers understand what is measured. See the Canvas assignment for a codebook describing the data. Here are some prompt questions for your narrative:

    • Which grocery store has the most foot traffic in AZ? Is it the same store that has the most unique visitors?

    • What is the most common store type (in your data) in AZ?

    • Which city has the most stores?

    • Which store do people travel the farthest to visit?
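As a starting point for steps 1 and 2, here is a minimal sketch of a summary-statistics table and a simple outlier check using only the Python standard library. The rows and the column names (`store`, `visits`, `distance_km`) are invented placeholders, not the lab data; substitute the variables listed in the codebook. The 1.5 × IQR rule is one common convention for flagging outliers, not a requirement of this assignment, so justify whatever rule you actually use.

```python
import statistics

# Hypothetical rows standing in for the lab data. The real variable names
# come from the codebook on Canvas.
rows = [
    {"store": "A", "visits": 120, "distance_km": 4.0},
    {"store": "B", "visits": 150, "distance_km": 6.0},
    {"store": "C", "visits": 130, "distance_km": None},  # missing value
    {"store": "D", "visits": 140, "distance_km": 5.0},
    {"store": "E", "visits": 135, "distance_km": 5.5},
    {"store": "F", "visits": 145, "distance_km": 4.5},
    {"store": "G", "visits": 125, "distance_km": 5.0},
    {"store": "H", "visits": 900, "distance_km": 5.2},  # candidate outlier
]


def summarize(rows, columns):
    """Return {column: {n, missing, mean, std, min, max}} for numeric columns."""
    table = {}
    for col in columns:
        values = [r[col] for r in rows if r[col] is not None]
        table[col] = {
            "n": len(values),
            "missing": len(rows) - len(values),
            "mean": statistics.mean(values),
            "std": statistics.stdev(values),  # sample standard deviation
            "min": min(values),
            "max": max(values),
        }
    return table


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


summary = summarize(rows, ["visits", "distance_km"])
for col, stats in summary.items():
    print(col, stats)

flagged = iqr_outliers([r["visits"] for r in rows])
print("flagged visits:", flagged)
```

Reporting the count of missing values next to each variable makes it easier to explain in your narrative which rows you dropped or kept, and why.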

Cluster Analysis

  1. Use K-means clustering to cluster on multiple numeric dimensions in your data. This means clustering on several variables at once in a single model, not running the algorithm separately for each variable. Explain why you chose those dimensions and what the clusters tell you. Try a few different numbers of clusters (k) and choose one that makes sense. Write a narrative on your webpage explaining your cluster analysis and what you learned.
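The sketch below illustrates clustering on two numeric dimensions at once and comparing several values of k. It is a bare-bones version of Lloyd's algorithm written with the standard library so that each step is visible. The data points, the choice of `visits` and `distance_km` as dimensions, and the helper names are all assumptions for illustration. It also uses a deterministic farthest-point initialization instead of the random or k-means++ starts that most libraries use, so its results may differ from those of a library implementation.

```python
import statistics

# Hypothetical (visits, distance_km) pairs; replace with columns from the lab data.
points = [
    (100, 2.0), (110, 2.5), (120, 3.0), (105, 2.2),
    (400, 9.0), (420, 10.0), (390, 9.5), (410, 8.8),
]


def standardize(points):
    """Rescale each column to mean 0 and standard deviation 1."""
    columns = list(zip(*points))
    means = [statistics.mean(c) for c in columns]
    stds = [statistics.pstdev(c) or 1.0 for c in columns]
    return [tuple((v - m) / s for v, m, s in zip(p, means, stds)) for p in points]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points, k):
    """Farthest-point start: deterministic, unlike random or k-means++ starts."""
    centroids = [points[0]]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(farthest)
    return centroids


def kmeans(points, k, iters=100):
    """Lloyd's algorithm. Returns (labels, centroids, inertia)."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(statistics.mean(c) for c in zip(*members))
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return labels, centroids, inertia


scaled = standardize(points)
results = {k: kmeans(scaled, k) for k in range(1, 5)}
for k, (labels, _, inertia) in results.items():
    print(f"k={k} inertia={inertia:.3f} labels={labels}")
```

Standardizing first keeps a large-valued variable such as visit counts from dominating the distance calculation. Comparing inertia (the within-cluster sum of squares) across values of k is one way to judge where adding clusters stops helping, but the final choice should also make sense for the stores in your data.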

How to Submit

Create a new webpage on your Google Site titled "Project 2 Problem Set 1". Submit the link to that webpage in Canvas.