Week 4 Lab: Mini Project

This Lab Contributes to Course Objectives: 2, 3, 4, 5, 8

Learning Objectives R

  • Read data from a URL (web address)

  • Append data to create a single dataset

  • Use data manipulation tools to clean data

Learning Objectives Tableau

  • Combine sheets into a dashboard

  • Add interactivity to your dashboard

  • Format your dashboard

Project Overview

The objective of the project is to integrate the R and Tableau skills that you have learned so far in the course, conduct an analysis, and present your findings. We have covered most of the skills you will need to conduct your analysis. This lab will cover a few tasks that will aid you in developing an entire analytics workflow. By workflow, we mean a sequence of steps beginning with an R script and ending with a set of steps in Tableau to create visuals.

Link to Project 00 (Problem Set 4) Mini Project

R

Getting Started

Download the data:

You now have 3 years worth of data (2017 - 2019). The data are hosted on the course materials webpage here: https://csu-arec-330.github.io/course-materials/.

Import the data:

Each dataset has the same structure, and you will need to combine them into a single dataframe for analysis.

Inspect the data:

Before proceeding, check for any inconsistencies or missing values that need to be addressed. Once cleaned, you will export the data for use in Tableau.

Set up your R script

Create a new R script and include the following:

  • Write a brief description of the script’s purpose.

  • Set your working directory.

  • Load required packages (e.g., readr, dplyr, etc.)

  • Load a new package called tidyr which contains the replace_na() function we will use.

Reading data from a URL

Previously, we downloaded data before reading it into R or Tableau. Understanding how to manage local file paths is important, but R can also read data directly from the web. This is useful for automating updates when datasets change frequently online.

Use the read_csv() function from the readr package to read in the datasets:

super_2017 <- read_csv("https://csu-arec-330.github.io/materials/unit_00/inputs/supermarket_sales_2017.csv")
super_2018 <- read_csv("https://csu-arec-330.github.io/materials/unit_00/inputs/supermarket_sales_2018.csv")
super_2019 <- read_csv("https://csu-arec-330.github.io/materials/unit_00/inputs/supermarket_sales_2019.csv")

This method saves time when working with regularly updated datasets hosted online.

Data processing

Now you have three dataframes (super_2017, super_2018, super_2019) with the same columns, you need to:

  1. Append the datasets: Use bind_rows() to combine all three years into a single dataframe.
super_sales_all <- bind_rows(super_2017, super_2018, super_2019)
  1. Check for missing data: Use the dplyr functions (mutate() and replace_na()) to fill missing values.

Your Task: Find which variables contain NA values and replace them the the appropriate default value. Note the data type—replace strings with a string and numeric values with a numeric value.

# Use mutate_all() to apply replace_na() to all columns. Automatically fills numeric columns with 0 and character columns with "Unknown". 
super_sales_clean <- super_sales_all %>%
  mutate_all(~replace_na(.x, ifelse(is.numeric(.x), 0, "Unknown")))

# Check for missing values after filling
colSums(is.na(super_sales_clean))
Note

A default value is a value that is automatically assigned to a variable when no other value is provided. In the context of replace_na(column_name, "Default Value"), it means that any missing (NA) values in column_name will be replaced with "Default Value" instead of remaining empty or undefined.

Exporting dataframes

Once your data is cleaned, export it in a format that can be read into Tableau. The readr package contains write_*() functions for exporting data, which work similarly to read_*() functions.

write_csv(super_sales_clean,"super_sales_clean.csv")

Important:

  • Specify the dataframe you want to export (e.g., super_sales_clean).

  • Choose an appropriate filename (e.g., "super_sales_clean.csv"). Check that your punctuation matches the example code!

  • If needed, specify a different directory for the output file.

Read the documentation for write_csv() to understand additional options for exporting data efficiently.

Tableau

So far, we have focused on making single graphics. One of Tableau’’’s strengths is the ability to create interactive dashboards that connect multiple visual elements.

Part 1: Create your visualizations

Follow the steps from previous labs to connect to the dataset and create two graphics:

  1. A line graph plotting total sales by date

  2. A bar chart plotting the count of transactions by payment type

Part 2: Build your dashboard

Now, bring these visualizations into a dashboard by following these steps:

  1. Click the ‘New Dashboard’ icon at the bottom of Tableau Desktop.

  2. Drag each of your visualizations (one at a time) to the dashboard window.

  3. Experiment with moving these visualizations around, resizing them, and editing their titles.

Part 3: Add Interactivity

  1. Filter by payment type:
  • Select the bar chart.

  • Click ‘More Options’ (the small down arrow on the toolbar next to the chart).

  • Select ‘Use as filter’.

Question: What happens when you click different payment types?

  1. Filter by city (applied to both visualizations)
  • Open the line graph worksheet.

  • Drag ‘City’ to the Filters card and select all cities.

  • Return to your dashboard.

  • Select the line chart, then click ‘More Options’ \(\rightarrow\) ‘Filter’ \(\rightarrow\) ‘City’

  • Change the city filter and observe the changes.

Question: What does this filter control?

  • Select the City filter pane.

  • Click ‘More Options’ \(\rightarrow\) ‘Apply to Worksheets’ \(\rightarrow\) ‘Selected Worksheets’ (this dropdown appears next to the city filter).

Question: What effect does this have on your dashboard?

Part 4: Customize Your Dashboard

  1. Add text: Try adding a text box. Can you reposition it on top of your figures?

  2. Insert an image: Add an image from the web and adjust its placement.

Take time to explore different formatting options to improve the clarity and appearance of your dashboard.