Problem Set 3: Data Processing with R

A description of the supermarket sales data is available in the Supermarket Sales Data Overview.

Instructions

Write an R script that completes the following tasks. Make sure you follow best practices, including commenting your code.

Note on Log File Output:

When running your R script, you may notice that some lines in your log file appear with “[TRUNCATED]” comments. This happens when the output exceeds R’s default display limits.

To minimize truncation, break long lines of code before they reach the grey vertical margin line in the Source pane so that each line stays within the display limit. Additionally, avoid printing large datasets directly; use glimpse() or head() instead of printing entire data frames.
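
As a minimal sketch of the preview functions mentioned above, the following uses R's built-in mtcars data (not the assignment dataset) and assumes the dplyr package is installed. Each call prints a compact preview rather than every row.

```r
# Illustration only: mtcars is a built-in dataset, not supermarket_sales.csv.
library(dplyr)  # provides glimpse(); assumes dplyr is installed

# glimpse() prints one line per column: name, type, and the first few values.
glimpse(mtcars)

# head() returns only the first rows (6 by default).
preview <- head(mtcars)
print(preview)

# Pass n to control how many rows are shown.
print(head(mtcars, n = 3))
```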

Response Submissions: For questions that require an explanation, post your response as text on your Google site. Be sure to clearly indicate which question each response refers to.

Tasks

  1. Read in the dataset supermarket_sales.csv using read_csv().
  • Question: What packages do you need to have installed before loading in the data using this command?
  • Write your response as text on your Google site, clearly indicating which question you are answering.
  2. Calculate the total value of each sale using the unit_price and quantity columns.
  • Create a new column called subtotal.
  • Verify that the value labeled tax_5_percent is indeed 5% of the subtotal by creating a new variable called tax_verify.
  • Assign the result to a new dataframe.
  3. Filter the dataset to create a dataframe containing only the sales from the product line “Food and beverages”.
  4. Select relevant columns: Create a dataframe that contains only the columns city, product_line, unit_price, quantity, total, and rating for sales in “Food and beverages”.
  5. Sort the dataframe by quantity in descending order.
  6. Calculate median sales by payment type using the appropriate commands.
  7. Suppose you are asked to develop a new performance indicator for the company. You decide that “rating per unit price” or “rup” would be a helpful metric.
  • Compute the rating per unit price for each transaction by creating a new column called rup.
  • Explain: What insights might this performance indicator provide to decision-makers in the company?
  • Write your response as text on your Google site, clearly indicating which question you are answering.
  8. Summarize rup and unit price by product line:
  • Compute the mean(rup) and mean(unit_price) grouped by product line.
  9. Print the summarized dataframe:
  • Use the print() function to display the results.
  • Explain: What conclusions can you draw from your analysis?
  • Write your response as text on your Google site, clearly indicating which question you are answering.
  10. Include the following lines of code to show your R version and confirm that all the necessary packages for this assignment are loaded:
version
print(.packages())
  11. Save your R script as problem_set_3.R.
  12. Generate a log file to show that your script ran successfully in R. Use the sink() command to capture both the code and its output. Run the script and store the log in problem_set_3.log. Below is a hint for structuring this:

sink("problem_set_3.log")  # Start capturing output
source("problem_set_3.R", echo = TRUE)  # Run your script and show commands + output
sink()  # Stop capturing output

Refer to Week 3 Lab Notes if you’re still having trouble.
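
The dplyr verbs that these tasks rely on can be sketched on a tiny made-up table. Everything here is hypothetical: the toy_orders data, its column names, and the 10% fee are invented for illustration and are not taken from the supermarket dataset. The sketch assumes dplyr is installed, and it uses near() to compare decimal values, since an exact == test can fail because of floating-point rounding.

```r
library(dplyr)  # assumes dplyr is installed

# Hypothetical toy data, invented for illustration only.
toy_orders <- tibble(
  category = c("A", "B", "A", "B", "A"),
  price    = c(2.50, 4.00, 1.20, 3.30, 5.00),
  units    = c(4, 1, 10, 3, 2),
  fee      = c(1.00, 0.40, 1.20, 0.99, 1.00)  # claimed to be 10% of price * units
)

# mutate(): add new columns computed from existing ones.
toy_checked <- toy_orders %>%
  mutate(
    amount = price * units,
    fee_ok = near(fee, 0.10 * amount)  # tolerant comparison for decimals
  )

# filter() keeps rows, select() keeps columns, arrange(desc()) sorts high to low.
toy_a <- toy_checked %>%
  filter(category == "A") %>%
  select(category, price, units, amount) %>%
  arrange(desc(units))

# group_by() + summarise(): one row of statistics per group.
toy_summary <- toy_checked %>%
  group_by(category) %>%
  summarise(
    median_amount = median(amount),
    mean_price    = mean(price)
  )

print(toy_a)
print(toy_summary)
```

Storing each intermediate result under its own name, as done here, keeps the original data untouched and makes each step easy to inspect in the log.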

How to Submit

  1. Create a Webpage on Your Google Site:
  • Title the new webpage Problem Set 3.
  • This page should contain an R script log file that responds to all questions of the problem set.
  1. Submit the Link to Your Google Site Page:
  • Copy the URL of your Google Site webpage for Problem Set 3.
  • Submit this link via Canvas.