Project 2 – Presentation and Report: Bringing it all together
Overview
The goal of this project is for you to apply the tools for cross sectional data analysis that we have covered in the course to answer a real world question. This is also a chance to showcase your creativity and analytical skills.
The Question
Your question should focus on identifying distinct segments or clusters of customers, stores, or products based on the transaction-level data from convenience stores across the U.S. during July 2023. Some potential questions could be:
Customer Segmentation
What are the distinct segments of shoppers based on their purchasing behavior (products bought, spending patterns, store preferences)?
How can we segment shoppers based on their demographic characteristics and shopping habits?
Store Segmentation
What are the distinct clusters of stores based on their product mix, sales performance, or customer base?
How can we group similar stores together based on their location, size, or other characteristics?
Product Segmentation
What are the distinct clusters of products based on their sales performance, customer preferences, or complementary relationships?
How can we group similar products together based on their categories, pricing, or other attributes?
The key is to formulate a question that can be answered by performing cluster analysis on the provided dataset, which includes information about shoppers, transactions, products, and stores.
The Data
You will be using the following dataset for your analysis:
shopper_info.csv: This file contains transaction-level data, including information about the shopper, the store where the purchase was made, the products purchased (identified by GTIN), and the quantities and prices.
gtin.csv: This file provides additional details about the products, which can be linked to the shopper_info.csv file using the GTIN (Global Trade Item Number) variable.
store_info.csv: This file includes information about the stores, such as location, size, and other characteristics, which can be linked to the shopper_info.csv file using the store_id variable.
Your analysis should leverage the provided data files to perform cluster analysis and identify distinct segments or clusters based on your chosen research question. You may need to combine or transform the data as necessary to suit your analysis.
The Analysis
Our expectation is that you will use cross sectional data from any of the sources provided on the course website to answer your proposed question. The only strict requirements are:
- Present summary statistics of your data. We recommend including: a visual analysis including a regression line, a map that highlights geographical differences in your important variables, and the mean, standard deviation, and range of values for important variables.
- Perform a cluster analysis.
Explain your analysis, including justifying your variable choice, explaining why cluster analysis is the appropriate tool, clearly explaining how the analysis should help to answer your research question.
Discuss any assumptions you made in your analysis and identify limitations (e.g., do your data cover the entire population, or is it a sample?)
- Present and interpret your results. Use your analysis to clearly answer your research question and tell the audience what to do with the information provided. Discuss the meaning of your different clusters. Use appropriate visual tools to accomplish this.
The Presentation (80 points)
Overall presentation expectations: (20 points total)
Your presentations should be 6-9 minutes long. (5 points)
Your presentation should be engaging, follow a logical structure, and should tell a cohesive story that answers your stated question. (5 points)
Your visualizations should be well formatted and appropriate for your analysis/presentation (10 points)
Presentation Introduction, Background, and Question: (15 points total)
You should begin your presentation with the following: (5 points)
Who are you? (what business, organization, etc. wants an answer to your question)
Who is your audience? (business executive team, policy makers, consumers?)
You should then:
Give us any relevant background info that is necessary to understanding your question (5 points)
State your question (5 points - the question should be realistic (i.e., something a real world company would like an answer to), clearly stated, and answerable with a cross sectional analysis)
Presentation Data and Analysis: (35 points total)
Data:
Describe where your data is from (5 points)
Provide summary statistics and visualizations (10 points, see step 1 under
Analysis
above)
Analysis:
What model did you use? (5 points)
What assumptions did you make in your analysis? What are limitations to your analysis? (5 points, see step 3 under
Analysis
above)Plot AND describe your analysis. (10 points)
Presentation Discussion/Conclusion: (10 points total)
How did you use your analysis to answer your question? What is the answer to your question?
What are the key take-aways for your intended audience?
The Write-up/Online Submission (40 points)
You and your partner should also produce a written version of this analysis to post on your website. This should include all of the above components that went into your presentation (think of this as the report that you give to your audience to accompany your presentation) in addition to your R code. The written portion of the assignment is due later so that you can incorporate feedback that you receive during the oral presentation in class.
Because much of the presentation grade is tied to the content of your report, grading for the report will be based on:
Is your code correct and well-explained (use comments to explain what you do at each step!)? Produce an R log.txt file using the
sink()
command and include it as an appendix to your write up. (15 points)Is your report readable and useful? We will not assess the quality of your research question, but we will assess whether your report is something you could hand out to your intended audience and have them understand what you did, why you did it, and what they should do with the information. (10 points)
Does your report adequately address the feedback provided after your presentation? We will assess whether and to what extent you incorporated our suggestions. (10 points)
Organization, formatting and structure. Is your report organized, presented in a logical order, with appropriate integration of writing, analysis, and code? (5 points)
Peer Evaluation (15 points)
Each group will be required to provide feedback for three other groups in the class. This feedback process is an essential component of the project, as it encourages constructive criticism and fosters a collaborative learning environment. To facilitate this, each group will complete a Google Form designed specifically for peer evaluations.
Instructions:
Complete the Google Form: You must answer every question on the Google Form. The form is structured to cover various aspects of the presentations you will evaluate.
Fair Scoring: You cannot assign full scores in every category for each group. Be thoughtful and provide a balanced evaluation that reflects the strengths and areas for improvement in each presentation.
Be Respectful: It’s important to be respectful and constructive in your comments.
Impact on Grades: The peer feedback you provide will not be factored into the group’s final grade on the presentation. The purpose of this exercise is to give you an opportunity to critically engage with your peers’ work and to offer them valuable insights from an outsider’s perspective.
A link to the Google Form is also available on Canvas and must be completed by the due date. Please ensure that you dedicate enough time to watch each assigned presentation thoroughly before completing the evaluation form.
Your participation in the peer evaluation process is highly valued and contributes significantly to the learning experience in this course.