<- read_csv("https://csu-arec-330.github.io/materials/unit_00/inputs/supermarket_sales_2017.csv")
super_2017 <- read_csv("https://csu-arec-330.github.io/materials/unit_00/inputs/supermarket_sales_2018.csv")
super_2018 <- read_csv("https://csu-arec-330.github.io/materials/unit_00/inputs/supermarket_sales_2019.csv") super_2019
Week 4 Lab: Mini Project
This Lab Contributes to Course Objectives: 2, 3, 4, 5, 8
Learning Objectives R
Read data from a URL (web address)
Append data to create a single dataset
Use data manipulation tools to clean data
Learning Objectives Tableau
Combine sheets into a dashboard
Add interactivity to your dashboard
Format your dashboard
Project Overview
The objective of the project is to integrate the R and Tableau skills that you have learned so far in the course, conduct an analysis, and present your findings. We have covered most of the skills you will need to conduct your analysis. This lab will cover a few tasks that will aid you in developing an entire analytics workflow. By workflow, we mean a sequence of steps beginning with an R script and ending with a set of steps in Tableau to create visuals.
R
Getting Started
Download the data:
You now have 3 years worth of data (2017 - 2019). The data are hosted on the course materials webpage here: https://csu-arec-330.github.io/course-materials/.
Import the data:
Each dataset has the same structure, and you will need to combine them into a single dataframe for analysis.
Inspect the data:
Before proceeding, check for any inconsistencies or missing values that need to be addressed. Once cleaned, you will export the data for use in Tableau.
Set up your R script
Create a new R script and include the following:
Write a brief description of the script’s purpose.
Set your working directory.
Load required packages (e.g.,
readr
,dplyr
, etc.)Load a new package called
tidyr
which contains thereplace_na()
function we will use.
Reading data from a URL
Previously, we downloaded data before reading it into R or Tableau. Understanding how to manage local file paths is important, but R can also read data directly from the web. This is useful for automating updates when datasets change frequently online.
Use the read_csv()
function from the readr
package to read in the datasets:
This method saves time when working with regularly updated datasets hosted online.
Data processing
Now you have three dataframes (super_2017
, super_2018
, super_2019
) with the same columns, you need to:
- Append the datasets: Use
bind_rows()
to combine all three years into a single dataframe.
<- bind_rows(super_2017, super_2018, super_2019) super_sales_all
- Check for missing data: Use the
dplyr
functions (mutate()
andreplace_na()
) to fill missing values.
Your Task: Find which variables contain NA
values and replace them the the appropriate default value. Note the data type—replace strings with a string and numeric values with a numeric value.
# Use mutate_all() to apply replace_na() to all columns. Automatically fills numeric columns with 0 and character columns with "Unknown".
<- super_sales_all %>%
super_sales_clean mutate_all(~replace_na(.x, ifelse(is.numeric(.x), 0, "Unknown")))
# Check for missing values after filling
colSums(is.na(super_sales_clean))
A default value is a value that is automatically assigned to a variable when no other value is provided. In the context of replace_na(column_name, "Default Value")
, it means that any missing (NA
) values in column_name will be replaced with "Default Value"
instead of remaining empty or undefined.
Exporting dataframes
Once your data is cleaned, export it in a format that can be read into Tableau. The readr
package contains write_*()
functions for exporting data, which work similarly to read_*()
functions.
write_csv(super_sales_clean,"super_sales_clean.csv")
Important:
Specify the dataframe you want to export (e.g.,
super_sales_clean
).Choose an appropriate filename (e.g.,
"super_sales_clean.csv"
). Check that your punctuation matches the example code!If needed, specify a different directory for the output file.
Read the documentation for write_csv()
to understand additional options for exporting data efficiently.
Tableau
So far, we have focused on making single graphics. One of Tableau’’’s strengths is the ability to create interactive dashboards that connect multiple visual elements.
Part 1: Create your visualizations
Follow the steps from previous labs to connect to the dataset and create two graphics:
A line graph plotting total sales by date
A bar chart plotting the count of transactions by payment type
Part 2: Build your dashboard
Now, bring these visualizations into a dashboard by following these steps:
Click the ‘New Dashboard’ icon at the bottom of Tableau Desktop.
Drag each of your visualizations (one at a time) to the dashboard window.
Experiment with moving these visualizations around, resizing them, and editing their titles.
Part 3: Add Interactivity
- Filter by payment type:
Select the bar chart.
Click ‘More Options’ (the small down arrow on the toolbar next to the chart).
Select ‘Use as filter’.
Question: What happens when you click different payment types?
- Filter by city (applied to both visualizations)
Open the line graph worksheet.
Drag ‘City’ to the Filters card and select all cities.
Return to your dashboard.
Select the line chart, then click ‘More Options’ \(\rightarrow\) ‘Filter’ \(\rightarrow\) ‘City’
Change the city filter and observe the changes.
Question: What does this filter control?
Select the City filter pane.
Click ‘More Options’ \(\rightarrow\) ‘Apply to Worksheets’ \(\rightarrow\) ‘Selected Worksheets’ (this dropdown appears next to the city filter).
Question: What effect does this have on your dashboard?
Part 4: Customize Your Dashboard
Add text: Try adding a text box. Can you reposition it on top of your figures?
Insert an image: Add an image from the web and adjust its placement.
Take time to explore different formatting options to improve the clarity and appearance of your dashboard.