version
print(.packages())
Problem Set 3: Data Processing with R
Part 1
1. Read in the dataset supermarket_sales.csv.
2. Calculate the total value of the sale using the unit_price and quantity columns. Name the new column subtotal. Then verify that the value labeled tax_5_percent is indeed 5% of the subtotal by creating a new variable called tax_verify. Assign the object to a new dataframe.
3. Create a dataframe containing only the subset of sales from the product line Food and beverages.
4. Create a dataframe containing only the columns city, product_line, unit_price, quantity, total, rating where the product line is Food and beverages.
5. Sort the dataframe by quantity in descending order.
6. Generate a log file from your script. See the section in the lab file for reference.
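The data-manipulation steps in Part 1 can be sketched on a small made-up table. This is only one possible approach, assuming dplyr is the tool of choice (Part 2 names it, Part 1 does not). The toy values, the object names sales, sales_checked, food, and food_sorted, and the tolerance used in tax_verify are all illustrative assumptions, as is the guess that total equals subtotal plus tax; the real data would come from supermarket_sales.csv.

```r
library(dplyr)

# Toy stand-in for supermarket_sales.csv. All values are invented for
# illustration; with the real file, read.csv("supermarket_sales.csv")
# (or a similar reader) would take the place of this block.
sales <- data.frame(
  city         = c("Yangon", "Mandalay", "Yangon", "Naypyitaw"),
  product_line = c("Food and beverages", "Health and beauty",
                   "Food and beverages", "Food and beverages"),
  unit_price   = c(10, 20, 5, 8),
  quantity     = c(2, 1, 4, 3),
  rating       = c(7.5, 6.0, 9.0, 8.0),
  stringsAsFactors = FALSE
)
# Assumption: tax is 5% of price * quantity and total includes the tax.
sales$tax_5_percent <- 0.05 * sales$unit_price * sales$quantity
sales$total         <- 1.05 * sales$unit_price * sales$quantity

# Add subtotal, then flag rows where the recorded tax matches 5% of it.
# A small tolerance avoids false mismatches from floating-point rounding.
sales_checked <- sales %>%
  mutate(
    subtotal   = unit_price * quantity,
    tax_verify = abs(tax_5_percent - 0.05 * subtotal) < 1e-8
  )

# Rows from one product line only (all columns kept).
food <- sales_checked %>%
  filter(product_line == "Food and beverages")

# Same row filter, restricted to the requested columns,
# then sorted by quantity from largest to smallest.
food_sorted <- sales_checked %>%
  filter(product_line == "Food and beverages") %>%
  select(city, product_line, unit_price, quantity, total, rating) %>%
  arrange(desc(quantity))

print(food_sorted)
```

Storing tax_verify as a logical column makes it easy to check the whole dataset at once, for example with all(sales_checked$tax_verify).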
Part 2
1. Use dplyr commands to calculate the median sales by payment type.
2. You are asked to develop a new performance indicator for the company. You wonder if the transaction rating per unit price might provide insights into consumer preferences for different product lines. Calculate the rating per unit price for each transaction and call this new variable rup. Explain what this performance indicator might tell decision-makers at the company.
3. Then calculate the mean rup and unit price by product line across the dataset. Print the contents of this dataframe into the console using the print() function. What conclusions do you draw from your analysis? Explain.
4. Use the sink command to generate a text file that indicates your script ran in R.
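The grouped summaries and the log file in Part 2 might look roughly like the sketch below. It runs on invented data, and several names are assumptions rather than facts about the real dataset: the column name payment, the use of total as the measure of "sales", the summary column names, and the temporary log path log_file. Treat it as an outline of the dplyr and sink() pattern, not as the expected answer.

```r
library(dplyr)

# Invented rows for illustration only. The column names payment and total
# are assumptions about how the real dataset labels these fields.
sales <- data.frame(
  product_line = c("Food and beverages", "Food and beverages",
                   "Health and beauty", "Health and beauty"),
  payment      = c("Cash", "Ewallet", "Cash", "Cash"),
  unit_price   = c(10, 20, 40, 50),
  total        = c(21, 42, 84, 105),
  rating       = c(8, 6, 8, 5),
  stringsAsFactors = FALSE
)

# Median sale value for each payment type.
median_by_payment <- sales %>%
  group_by(payment) %>%
  summarise(median_total = median(total))

# Rating per unit price for every transaction.
sales <- sales %>%
  mutate(rup = rating / unit_price)

# Mean rup and mean unit price for each product line.
rup_by_line <- sales %>%
  group_by(product_line) %>%
  summarise(mean_rup = mean(rup), mean_unit_price = mean(unit_price))

# Divert console output to a text file, print into it, then restore the
# console. A temporary file is used here; a real script would likely
# write to a named file in the project folder instead.
log_file <- tempfile(fileext = ".txt")
sink(log_file)
print(as.data.frame(median_by_payment))
print(as.data.frame(rup_by_line))
print(.packages())
sink()
```

Calling sink() with no arguments at the end matters: without it, later output keeps going to the file instead of the console.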
How to Submit
You should create a new webpage on your Google site titled Problem Set 3. This webpage should include your responses to Part 2 of the problem set.
For Parts 1 and 2, write an R script containing the code used to answer the questions. Generate a log file and submit that to Canvas. Please include the following lines in your R script:
version
print(.packages())
Submit the link to your Google site webpage in Canvas.