Project 2 – Presentation and Report: Bringing it all together
Overview
In this project, you will apply cross-sectional data analysis techniques from class to answer a practical business question. You will use cluster analysis to identify meaningful segments or clusters of customers, stores, or products based on July 2023 transaction-level convenience store data. The clusters you identify in this project will directly support your panel regression analysis in Project 3.
The Question
Select one of the following segments for your cluster analysis project and formulate a clear, focused research question. I have provided examples below:
Customer Segmentation:
- What distinct shopper segments exist based on purchasing patterns, spending habits, or demographics?
Store Segmentation:
- How can stores be grouped based on product assortment, sales performance, customer base, location, or size?
Product Segmentation:
- How can products be grouped based on sales trends, customer preferences, pricing, or complementary purchasing patterns?
Your question should clearly specify who will benefit from the answer (e.g., business executives, marketing teams, or store managers).
The Data
You will use three datasets:
shopper_info.csv: This file contains transaction-level data, including information about the shopper, the store where the purchase was made, the products purchased (identified by GTIN), and the quantities and prices.
gtin.csv: This file provides additional details about the products, which can be linked to the shopper_info.csv file using the GTIN (Global Trade Item Number) variable.
store_info.csv: This file includes information about the stores, such as location, size, and other characteristics, which can be linked to the shopper_info.csv file using the store_id variable.
Combine and transform these datasets as necessary to perform your cluster analysis.
Analysis Requirements
Your analysis must include the following:
- Summary Statistics: Clearly present summary statistics for your data, including visualizations such as maps or charts illustrating geographic or distributional differences. Include means, standard deviations, and ranges. 
- Cluster Analysis: Perform cluster analysis (e.g., K-means clustering) using your chosen variables. Clearly justify the number of clusters and the variables used. 
- Interpretation and Justification: Explain why you selected your variables and why cluster analysis was the most suitable method to answer your question. 
- Assumptions and Limitations: Discuss assumptions inherent in your analysis and highlight any limitations of your dataset or approach. 
- Results Interpretation: Clearly interpret the clusters you’ve identified, highlighting key characteristics and practical recommendations for your audience. Use appropriate visualizations to effectively communicate your findings. 
The Presentation (80 points)
Overall presentation expectations: (20 points total)
Your presentations should be 6-9 minutes long. (5 points)
Your presentation should be engaging, follow a logical structure, and should tell a cohesive story that answers your stated question. (5 points)
Your visualizations should be well formatted and appropriate for your analysis/presentation (10 points)
Presentation Introduction, Background, and Question: (15 points total)
You should begin your presentation with the following: (5 points)
- Who are you? (what business, organization, etc. wants an answer to your question) 
- Who is your audience? (business executive team, policy makers, consumers?) 
You should then:
- Give us any relevant background info that is necessary to understanding your question (5 points) 
- State your question (5 points - the question should be realistic (i.e., something a real world company would like an answer to), clearly stated, and answerable with a cross sectional analysis) 
Presentation Data and Analysis: (35 points total)
Data:
- Describe where your data is from (5 points) 
- Provide summary statistics and visualizations (10 points, see step 1 under - Analysisabove)
Analysis:
- What model did you use? (5 points) 
- What assumptions did you make in your analysis? What are limitations to your analysis? (5 points, see step 3 under - Analysisabove)
- Plot AND describe your analysis. (10 points) 
Presentation Discussion/Conclusion: (10 points total)
- How did you use your analysis to answer your question? What is the answer to your question? 
- What are the key take-aways for your intended audience? 
The Write-up/Online Submission (40 points)
You and your partner should also produce a written version of this analysis to post on your website. This should include all of the above components that went into your presentation (think of this as the report that you give to your audience to accompany your presentation) in addition to your R code. The written portion of the assignment is due later so that you can incorporate feedback that you receive during the oral presentation in class.
Because much of the presentation grade is tied to the content of your report, grading for the report will be based on:
- Is your code correct and well-explained (use comments to explain what you do at each step!)? Produce an R log.txt file using the - sink()command, OR by compiling a report, and include it as an appendix to your write up. (15 points)
- Is your report readable and useful? We will not assess the quality of your research question, but we will assess whether your report is something you could hand out to your intended audience and have them understand what you did, why you did it, and what they should do with the information. (10 points) 
- Does your report adequately address the feedback provided after your presentation? We will assess whether and to what extent you incorporated our suggestions. (10 points) 
- Organization, formatting and structure. Is your report organized, presented in a logical order, with appropriate integration of writing, analysis, and code? (5 points) 
Peer Evaluation (15 points)
Each group will be required to provide feedback for three other groups in the class. This feedback process is an essential component of the project, as it encourages constructive criticism and fosters a collaborative learning environment. To facilitate this, each group will complete a Google Form designed specifically for peer evaluations.
Instructions:
- Complete the Google Form: You must answer every question on the Google Form. The form is structured to cover various aspects of the presentations you will evaluate. 
- Fair Scoring: You cannot assign full scores in every category for each group. Be thoughtful and provide a balanced evaluation that reflects the strengths and areas for improvement in each presentation. 
- Be Respectful: It’s important to be respectful and constructive in your comments. 
- Impact on Grades: The peer feedback you provide will not be factored into the group’s final grade on the presentation. The purpose of this exercise is to give you an opportunity to critically engage with your peers’ work and to offer them valuable insights from an outsider’s perspective. 
A link to the Google Form is also available on Canvas and must be completed by the due date. Please ensure that you dedicate enough time to watch each assigned presentation thoroughly before completing the evaluation form.
Your participation in the peer evaluation process is highly valued and contributes significantly to the learning experience in this course.