Project 2 – Problem Set 2: Cross Sectional Data Analysis
Problem Set Overview
This problem set helps you continue analyzing your data and prepares you for your final Project 2 deliverables by guiding you through exploratory visualizations, cluster analysis, and interpreting results.
Part 1: R
More Exploratory Data Analysis in R
Begin with your R script from Project 2 - Problem Set 1; this problem set builds on it. Note: Review any comments from the instructor and update your code accordingly.
Update your R script to reflect any additional cleaning and processing steps you performed (e.g., handling missing data, dropping observations, transformations, aggregations) since Project 2 - Problem Set 1. Add comments throughout your code so that a reader could reproduce your steps based solely on your descriptions.
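As an illustration of the level of commenting described above, here is a minimal sketch in base R. It uses the built-in airquality data frame purely as a stand-in for your own dataset; the object names (raw, clean, monthly) and the specific cleaning choices are assumptions for the example, not required steps.

```r
# Stand-in data: the built-in airquality data frame, which contains missing values.
# Replace this with your own merged dataset.
raw <- airquality

# Step 1: Count missing values per variable to document what needs handling.
missing_counts <- colSums(is.na(raw))
print(missing_counts)

# Step 2: Drop observations with a missing value in any variable,
# and record how many rows were removed so the reader can verify the step.
clean <- raw[complete.cases(raw), ]
n_dropped <- nrow(raw) - nrow(clean)
message(n_dropped, " observations dropped due to missing values")

# Step 3: Transformation. Take the log of Ozone to reduce its right skew.
clean$log_ozone <- log(clean$Ozone)

# Step 4: Aggregation. Collapse to one row per month using the mean.
monthly <- aggregate(cbind(Ozone, Temp) ~ Month, data = clean, FUN = mean)
print(monthly)
```

Each comment states what was done and why, so someone reading only the comments could repeat the steps.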
Summarize the variables using datasummary(). Present a nicely formatted table and discuss your results.
Generate a ggpairs() plot (i.e., the density/scatter/correlation plot) of your clustering variables.
Write a narrative that interprets the ggpairs() plot. Describe the distribution of each variable, the correlations between variables, and any other notable features you observe, and explain their relevance to your analysis.
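The summary table and pairs plot described above might be produced along these lines. This is a sketch that assumes the modelsummary and GGally packages are installed; the built-in USArrests data and its four columns stand in for your own clustering variables.

```r
# Assumes the modelsummary and GGally packages are already installed.
library(modelsummary)  # provides datasummary()
library(GGally)        # provides ggpairs()

# Stand-in data: replace USArrests with your merged, cleaned dataset
# and list your own clustering variables.
cluster_vars <- USArrests[, c("Murder", "Assault", "UrbanPop", "Rape")]

# Summary table: variables go on the left of ~, statistics on the right.
datasummary(
  Murder + Assault + UrbanPop + Rape ~ Mean + SD + Min + Median + Max,
  data = cluster_vars,
  title = "Summary statistics for clustering variables"
)

# Pairs plot: by default, densities on the diagonal, scatter plots below it,
# and correlation coefficients above it.
pairs_plot <- ggpairs(cluster_vars, title = "Clustering variables")
print(pairs_plot)
```

When interpreting the plot, the diagonal panels speak to each variable's distribution and the upper panels to the pairwise correlations.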
K-Means Cluster Analysis in R
Using your merged and cleaned dataset, perform your final K-means cluster analysis. Follow these steps clearly:
Explain which variables you chose for clustering and why these dimensions help answer your research question. Clearly interpret your resulting clusters. What insights or patterns do the clusters reveal?
Provide evidence supporting your choice of K (number of clusters). Include either a silhouette plot or elbow plot and explain your decision.
Perform the k-means clustering. Then, export a dataset containing all variables used in your cluster analysis along with each observation’s assigned cluster.
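The steps above can be sketched as follows. This is one possible workflow under stated assumptions, not a required implementation: USArrests stands in for your dataset, the variables are standardized before clustering, chosen_k = 4 is a hypothetical value you would replace based on your own plots, and project2_clusters.csv is a hypothetical output file name.

```r
library(cluster)  # silhouette(); a recommended package bundled with most R installs

set.seed(2024)  # k-means starts from random centers, so fix the seed for reproducibility

# Stand-in data: replace with your own clustering variables.
cluster_vars <- USArrests[, c("Murder", "Assault", "UrbanPop", "Rape")]
scaled_vars <- scale(cluster_vars)  # standardize so no single variable dominates the distance

# Elbow plot: total within-cluster sum of squares for K = 1..8.
elbow_k <- 1:8
wss <- sapply(elbow_k, function(k) {
  kmeans(scaled_vars, centers = k, nstart = 25)$tot.withinss
})
plot(elbow_k, wss, type = "b",
     xlab = "Number of clusters (K)",
     ylab = "Total within-cluster sum of squares",
     main = "Elbow plot")

# Silhouette plot: average silhouette width for K = 2..8 (undefined for K = 1).
sil_k <- 2:8
distances <- dist(scaled_vars)
avg_sil <- sapply(sil_k, function(k) {
  km <- kmeans(scaled_vars, centers = k, nstart = 25)
  mean(silhouette(km$cluster, distances)[, "sil_width"])
})
plot(sil_k, avg_sil, type = "b",
     xlab = "Number of clusters (K)",
     ylab = "Average silhouette width",
     main = "Silhouette plot")

# Final clustering. chosen_k is hypothetical: pick it from your own plots.
chosen_k <- 4
final_km <- kmeans(scaled_vars, centers = chosen_k, nstart = 25)

# Combine the original (unscaled) variables with each observation's cluster.
clustered <- data.frame(
  state = rownames(cluster_vars),
  cluster_vars,
  cluster = final_km$cluster
)

# Cluster means in original units, useful when interpreting each cluster.
cluster_profile <- aggregate(. ~ cluster, data = clustered[, -1], FUN = mean)
print(cluster_profile)

# Export for use in Tableau (hypothetical file name).
write.csv(clustered, "project2_clusters.csv", row.names = FALSE)
```

Exporting the unscaled variables keeps the values readable in Tableau, while the clustering itself uses the standardized copies.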
Part 2: Tableau Visualizations
Create one visualization to illustrate findings from your exploratory data analysis. Integrate relevant summary statistics directly into your Tableau visualization using the Tooltip in the Marks Card.
Create one visualization that shows your clusters on a map. How do the clusters appear across the map you created? Do they appear grouped together or interspersed?
Create one visualization, built from scatter plots, that shows how your clusters compare across the variables you used to define them. Use these visuals to support a description of each cluster.
Write a narrative discussing the practical takeaways for your research question based on your Tableau visualizations. Clearly describe what your analysis uncovered.
How to Submit
Create a new page on your Google Site titled Project 2 Problem Set 2. Remember: Add this as a new page to your existing website. Do not create a new website.
This assignment has two components:
- R code
  - Start with your R code from Problem Set 1.
  - Make any changes to your code from Problem Set 1 based on the feedback you received.
  - Include comments throughout your code that describe each step.
  - Add the ggpairs() visualizations.
  - Write your interpretation of the ggpairs() plot.
  - Finalize your k-means cluster analysis and show supporting evidence.
  - Export your dataset.
- Tableau visualizations
  - EDA visualization
  - Two cluster visualizations with accompanying descriptions
  - Clear takeaways related to your research question
Submit your webpage link on Canvas.
Going forward, please use the “Compile Report” process to submit your R code. We will no longer use “sink-source-sink”.
- Under the File menu, select Compile Report…
- The default under “Report output format” is HTML.
- Select Compile
- Along the tool bar, select Open in Browser
- Hover your cursor over white space and right click.
- Select View page source.
- A new screen will appear with HTML code. (Don’t worry, this might look like gibberish, but it’s not!)
- Using your keyboard, press CTRL+A. This will highlight all of the HTML code.
- While the HTML code is highlighted, right click and select Copy.
- Go to your Google website.
- As you did when embedding your Tableau visualization, select Embed under Insert.
- Then, select Embed code.
- Paste your embed code from your clipboard by either right-clicking and selecting Paste or pressing CTRL+V.
- Select Next.
- Resize the image of your R script by expanding the window to fit the entire script.
Following these steps will display your entire code and output without truncation.
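Because the compiled report is built directly from your R script, it can help to know how comments are treated. The sketch below assumes the standard behavior of Compile Report, where lines starting with #' are rendered as formatted text and ordinary # comments stay inside the code. The data and the file name project2_ps2.R are hypothetical.

```r
#' # Project 2 Problem Set 2
#' Lines that begin with #' are rendered as formatted text in the compiled
#' report, so they are a convenient place for narrative interpretation.

# Ordinary comments like this one remain inside the code block of the report.
cluster_vars <- USArrests[, c("Murder", "Assault")]
summary(cluster_vars)

#' Your interpretation of the output above would go here.

# Console equivalent of File > Compile Report... (assumes the rmarkdown
# package is installed; "project2_ps2.R" is a hypothetical file name):
# rmarkdown::render("project2_ps2.R", output_format = "html_document")
```

Writing longer interpretations as #' lines keeps them readable in the embedded report rather than buried in code comments.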