Week 1 Lab: Introduction to R and Tableau

Learning Objectives

  • This lab contributes to Course Objectives 2, 3, & 4

  • Open R/RStudio and ensure it is functioning properly

  • Get oriented with RStudio

  • Conduct basic calculations in R

  • Navigate directories from within R

  • Read in an external data set in R

  • Load a package in R

  • Create your Google Site and a project page

  • Apply for a Tableau academic license

  • Download Tableau Desktop and ensure it is working correctly

  • Register for Tableau Public

Introduction to R

Welcome to the world of R programming! R is a powerful and widely used programming language in the field of data analysis and statistical computing. It is known for its vast collection of libraries and packages that allow users to easily perform complex data manipulation and analysis tasks.

Before we get started, you will need to install R on your computer. You can download the latest version of R for free from the official website https://cran.r-project.org/. Once the installation is complete, you will also need to install RStudio, a popular integrated development environment (IDE) for R. You can download RStudio from the official website https://rstudio.com/.

RStudio provides a user-friendly interface for working with R and offers a variety of features to make working with R easier and more efficient. Open up RStudio and you should see something that looks like this:

The RStudio interface is divided into several panels:

  • The Console panel is where you can enter and run R commands.

  • The Source panel is where you can open and edit R scripts, which are files that contain R code.

  • The Environment and History panels display information about the current environment and command history.

  • The Output panel includes several tabs (Files, Plots, Packages, and Help) that provide access to various resources and tools.

Now that you have both R and RStudio installed and running, let’s dive into some basic concepts of the R programming language.

Open a new R script

Because R is a command-line application, we can write a script (i.e., text file) with a sequence of commands. There are many benefits to scripting:

  1. You can save a record of the commands if you need to reproduce the analysis.
  2. You can share it with others so they can reproduce it.
  3. You can easily change a part of your script to see what happens.

Locating your working directory

The working directory is the folder on your computer where R looks for files to read and where it saves any files you create. It’s like the “home base” for your R session. Knowing your working directory is important because:

  • If you want to read a data file (like a .csv file), R will look for it in the working directory unless you provide a full path.
  • When you save files, such as plots or datasets, R will save them in the working directory by default.

Think of it as the default folder R uses to organize your work during a session.

You can check your current working directory by using the getwd() function:

getwd()

This will display the path of the current working directory in the console.

To set your working directory to a specific folder, use the setwd() function and provide the path to the folder. For example:

setwd("C:/Users/YourName/Documents/MyRProjects")

After running this command, R will use the specified folder as the working directory for the rest of the session.
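As a minimal sketch of the round trip, the lines below check the working directory, change it, and change it back. The folder used here is tempdir(), a scratch folder R creates for each session; it is only a stand-in for your own project folder, and the variable names are illustrative.

```r
# Remember where we started so we can come back later
original_dir <- getwd()

# tempdir() is a stand-in here; replace it with the path to your own folder
project_dir <- tempdir()

# Only switch if the folder actually exists
if (dir.exists(project_dir)) {
  setwd(project_dir)
}

# Confirm where R is looking now
current_dir <- getwd()
print(current_dir)

# Return to the original working directory
setwd(original_dir)
```

Checking dir.exists() first avoids the error that setwd() raises when the path is misspelled or the folder is missing.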

Tips for Managing Your Working Directory

  1. In RStudio, you can also set the working directory interactively by going to Session > Set Working Directory > Choose Directory… and selecting the folder.

  2. You can see the contents of your working directory in the Files panel in RStudio.

  3. For reproducibility, it’s a good idea to include the setwd() command in your R script so that others can run your code without confusion about file locations.

Think of the working directory as a specific “drawer” in your desk where you store tools. When you start working, you know where to find your materials and where to put things back.
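To look inside the “drawer” without leaving the Console, something like the following should work. The file name my_data.csv is hypothetical; substitute a file you expect to find.

```r
# List everything in the current working directory
files_here <- list.files()
print(files_here)

# List only CSV files (pattern is a regular expression)
csv_files <- list.files(pattern = "\\.csv$")
print(csv_files)

# Check for one specific file; "my_data.csv" is a made-up name
file.exists("my_data.csv")
```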

Basics

R is an interpreted language, which means that you can enter commands directly into the console and see the results immediately. You can start the R console by opening RStudio and clicking on the “Console” tab.

To assign a value to a variable in R, you can use the assignment operator <- or = like this:

x <- 5

x = 5 #Does the same thing

Here, we have created a variable called x and assigned it the value of 5. Anything following the hash symbol (#) on the same line is interpreted as a comment and not executed. Once you assign a value to a variable, you can see it in the Environment panel. You can use the print() function to display the value of a variable:

print(x)
[1] 5

You can also perform basic arithmetic operations in R using the standard operators (+, -, *, /):

y <- 3
z <- x + y
print(z)
[1] 8

First, we assign the variable y the value 3. Then, we assign z the value of x (assigned 5 earlier) plus y (just assigned 3), which sum to 8.
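The other operators work the same way. Here is a short sketch using the same values of x and y; the new variable names are only for illustration. It also shows that R follows the usual order of operations unless you add parentheses.

```r
x <- 5
y <- 3

difference <- x - y      # subtraction
product <- x * y         # multiplication

print(difference)        # 2
print(product)           # 15

# Multiplication happens before addition unless parentheses say otherwise
no_parens <- x + y * 2
with_parens <- (x + y) * 2

print(no_parens)         # 11
print(with_parens)       # 16
```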

Comprehension Check

Try assigning the variables a and b different values and dividing them.

R also has a range of data types, including numeric, character, and logical. Variables can hold different data types. A collection of values is called a vector, a one-dimensional array of data. You can create a vector using the c() function, which stands for concatenate. A vector may only contain one type of data.

numbers <- c(1, 2, 3, 4, 5)
characters <- c("a", "b", "c")
logical <- c(TRUE, FALSE, TRUE)
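If you are unsure what type a vector holds, class() will tell you. The sketch below also shows what happens when you try to mix types in one vector: R quietly converts everything to a single type. The name mixed is just for illustration.

```r
numbers <- c(1, 2, 3, 4, 5)
characters <- c("a", "b", "c")
logical <- c(TRUE, FALSE, TRUE)

class(numbers)      # "numeric"
class(characters)   # "character"
class(logical)      # "logical"

# Mixing types forces every element to one type (here, character)
mixed <- c(1, "a", TRUE)
print(mixed)        # "1" "a" "TRUE"
class(mixed)        # "character"
```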

Indexing

A single entry in a vector is called an element. You can access elements of a vector using the square bracket notation:

print(numbers[1])
[1] 1

Comprehension Check

Try accessing a different element of the vector numbers or one of the other vectors.

You can also access a series of elements in the vector. To access a contiguous range of elements, use the colon operator (:):

print(characters[2:3])
[1] "b" "c"

You can even use a vector of index values to call discontinuous subsets of the vector.

print(characters[c(1,3)])
[1] "a" "c"
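A few more indexing patterns, sketched with the numbers vector. The index vector can be stored in its own variable (idx is an illustrative name), and length() gives the position of the last element. Note that R counts positions starting at 1, and asking for a position that does not exist returns NA rather than an error.

```r
numbers <- c(1, 2, 3, 4, 5)

# How many elements are in the vector?
length(numbers)                   # 5

# Store the positions you want, then use them as the index
idx <- c(2, 4)
print(numbers[idx])               # 2 4

# The last element, without hard-coding its position
print(numbers[length(numbers)])   # 5

# A position past the end of the vector gives NA
print(numbers[10])                # NA
```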

Depending on the datatype, there are functions that operate on the vector. For example, you can sum the elements of the numeric vector numbers.

print(sum(numbers))
[1] 15

Comprehension Check

Can you sum the numbers of the vector characters? How about logical?
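sum() is only one of many functions that take a whole vector as input. A brief sketch of a few others from base R, some meant for numeric vectors and some for character vectors:

```r
numbers <- c(1, 2, 3, 4, 5)
characters <- c("a", "b", "c")

# Functions for numeric vectors
mean(numbers)         # 3
max(numbers)          # 5

# Functions that work on character vectors
length(characters)    # 3
toupper(characters)   # "A" "B" "C"
```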

Challenge

Can you figure out how to calculate the product of the vector numbers?

You can also add, subtract, multiply, or divide every element of a vector by a single number:

print(numbers+5)
[1]  6  7  8  9 10

Or apply these operations element by element to vectors of equal length:

print(numbers + numbers*2)
[1]  3  6  9 12 15
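To see why element-by-element arithmetic is useful, consider a small made-up example: three products, each with a price and a quantity sold. The values and variable names below are hypothetical and not part of the lab data.

```r
# Hypothetical prices and quantities for three products
prices <- c(2.50, 4.00, 1.25)
quantities <- c(4, 2, 8)

# Multiply element by element: first price times first quantity, and so on
revenue <- prices * quantities
print(revenue)         # 10 8 10

# Add up the revenue across all three products
total_revenue <- sum(revenue)
print(total_revenue)   # 28
```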

data.frames

Using R like a fancy calculator is intended to help you understand how R works. R is generally used to analyze data sets. Data sets are collections of equal-length vectors in a table or what R calls a data.frame. Here is a simple example:

print(data.frame(a=c(1:3),
                 b=c(4:6),
                 d=c(7:9)))
  a b d
1 1 4 7
2 2 5 8
3 3 6 9

Here, a, b, and d are the column (variable) names, and the 1, 2, 3 on the far left are row numbers.

You can build a data.frame using the vectors above and assign the object a name.

df <- data.frame(numbers=numbers[1:3],
                 characters,
                 logical)

print(df)
  numbers characters logical
1       1          a    TRUE
2       2          b   FALSE
3       3          c    TRUE

Comprehension Check

Why did I subset the vector numbers?
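Once a data.frame exists, a handful of base R functions let you inspect it and pull out pieces. The sketch below rebuilds df with the values typed in directly so that it runs on its own.

```r
df <- data.frame(numbers = c(1, 2, 3),
                 characters = c("a", "b", "c"),
                 logical = c(TRUE, FALSE, TRUE))

nrow(df)     # 3 rows
ncol(df)     # 3 columns
names(df)    # "numbers" "characters" "logical"

# Pull out one column as a vector with $
df$numbers   # 1 2 3

# Pull out one cell with [row, column]
second_char <- df[2, "characters"]
print(second_char)

# Compact summary of every column and its type
str(df)
```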

Reading data

Creating data.frames by hand is tedious. Fortunately, R has some utilities to read data stored in certain file types. Before we explore R’s ability to read data, we must understand how R interacts with the computer’s file system.

Recall

At any point in time, you are in an active directory (similar to Windows Explorer or macOS Finder). You can ask R to tell you the current working directory (where R is looking at the moment).

getwd()
[1] "C:/Users/lachenar/OneDrive - Colostate/Documents/GitProjectsWithR/csu-arec-330.github.io/materials/unit_00/week_01"

The response to this command will depend on the directory structure on your machine.

You can navigate your directory structure using the setwd() command. When you have multiple projects with multiple files on your machine, it is important to understand R’s working directory and how to change it. The path syntax depends on whether you are on Windows or a Mac. First, locate the data file that you downloaded for this exercise. Treat the folder that contains it as the project directory, and tell R to move there by typing the path to the directory in the setwd("directory/subdirectory1") command. The path needs to be in quotes to tell R that you are inputting characters and not variables.
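The sketch below illustrates how paths are typically written on each system. All folder names are hypothetical, so replace them with folders that exist on your machine. The functions path.expand(), file.path(), and dir.exists() are part of base R.

```r
# Hypothetical folder names; replace them with your own
windows_path <- "C:/Users/YourName/Documents/arec330"        # forward slashes work on Windows
windows_alt  <- "C:\\Users\\YourName\\Documents\\arec330"    # backslashes must be doubled
mac_path     <- "~/Documents/arec330"                        # ~ is shorthand for your home folder

# path.expand() shows what ~ stands for on your machine
print(path.expand(mac_path))

# file.path() joins folder names into a single path
week_dir <- file.path("Documents", "arec330", "week_01")
print(week_dir)   # "Documents/arec330/week_01"

# Confirm a folder exists before calling setwd() on it
dir.exists(week_dir)
```

A single backslash inside quotes starts an escape sequence in R, which is why Windows paths copied from File Explorer usually need their backslashes doubled or swapped for forward slashes.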

This video describes a computer’s file system and organization: https://www.youtube.com/watch?v=hUW5MEKDtMM

To read data into R, we will use another function built into R. We will be reading a file type called comma-separated values denoted by the file extension .csv. The commas separate the variables, so R knows where a new variable starts. We will read the file into a data.frame called super_sales. You can download the file here.

super_sales <- read.csv("../inputs/supermarket_sales.csv")
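If you want to see read.csv() work before you have the course file in place, the following self-contained sketch writes a tiny CSV to R’s temporary folder and reads it back. The file name, column names, and values are all made up for illustration and are unrelated to supermarket_sales.csv.

```r
# Write a small, made-up CSV file to R's temporary folder
csv_path <- file.path(tempdir(), "example_sales.csv")
writeLines(c("city,product,total",
             "Denver,Coffee,12.5",
             "Boulder,Tea,8"),
           csv_path)

# Read the file into a data.frame; the first line supplies the column names
example_sales <- read.csv(csv_path)

# Look at the first rows and the column types
head(example_sales)
str(example_sales)
```

Running head() and str() right after reading a file is a quick way to confirm that the columns and types came through as expected.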

Naming Objects

You can give objects almost any name as long as it does not start with a number and does not contain a space. However, choose descriptive names so that your future self and others will find your code easier to read. I prefer what is called snake_case, where words are all lower case and separated by underscores. There are other options, and it is a matter of opinion. I suggest choosing one and sticking with it.
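A few illustrative names, all hypothetical. The invalid ones are commented out because R would stop with an error if they ran. The last lines show that R treats upper and lower case as different, which is one more reason to pick a single style.

```r
# Valid snake_case names
total_sales <- 100
sales_2024 <- 250

# Invalid names (left as comments because they cause errors)
# 2024_sales <- 250    # starts with a number
# total sales <- 100   # contains a space

# R is case-sensitive: these are two different objects
Total_Sales <- 5
print(total_sales)   # 100
print(Total_Sales)   # 5
```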

Packages or Libraries

Finally, R has a vast collection of libraries or packages that provide a wide range of functions for data manipulation and analysis. Since R is open source, anyone can write libraries and share them with the world.[1] You can install and load a package using the “install.packages()” and “library()” functions:

#Run only once 
#install.packages("dplyr")
library(dplyr)
Warning: package 'dplyr' was built under R version 4.4.1

Attaching package: 'dplyr'
The following objects are masked from 'package:stats':

    filter, lag
The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

Here, we have installed and loaded the dplyr package, which provides a wide range of functions for data manipulation. Note that you only need to install packages once, but you need to load them using the library() command each session. If you are working on the server, the packages you need will be installed; you just need to load them.
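One possible way to avoid reinstalling a package every time a script runs is to check first whether it is already installed. The helper is_installed() below is a hypothetical convenience function written for this example; it wraps requireNamespace(), which is part of base R. The install and load lines are left as comments so the sketch runs without an internet connection.

```r
# Hypothetical helper: TRUE if the package is already installed on this machine
is_installed <- function(pkg) {
  requireNamespace(pkg, quietly = TRUE)
}

# stats ships with R, so this should be TRUE
is_installed("stats")

# Install dplyr only when it is missing, then load it
# if (!is_installed("dplyr")) install.packages("dplyr")
# library(dplyr)
```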

Running scripts and saving output

In this class, you’ll often write scripts, which are text files containing a series of R commands (like a recipe for your analysis). Once your script is complete, you may want R to execute all the commands at once. The command to run a script in R is source().

You can run source() directly in the Console, or click the Source button in the upper-right corner of the Source panel in RStudio. For example, suppose you have a script named lab_01_example.R saved in your current working directory. You can run it by typing the following in the Console:

source("lab_01_example.R", echo = TRUE)

echo = TRUE tells R to print each command and its output in the Console as the script runs, making it easier to follow what’s happening.
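To try source() without creating a file by hand, the sketch below writes a four-line script to R’s temporary folder and then runs it. Placing the script in tempdir() is an assumption made so the example is self-contained; your own lab_01_example.R would live in your working directory.

```r
# Create a small script in R's temporary folder
script_path <- file.path(tempdir(), "lab_01_example.R")
writeLines(c("x <- 5",
             "y <- 3",
             "z <- x + y",
             "print(z)"),
           script_path)

# Run every command in the script, echoing each one to the Console
source(script_path, echo = TRUE)
```

After the script runs, the objects it created (x, y, and z) appear in the Environment panel just as if you had typed the commands yourself.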

Creating a log file with sink()

For assignments in this class, you’ll need to demonstrate that your scripts run without errors. A helpful way to do this is to generate a log file that captures both the commands from your script and their output. You can create a log file using source() together with another function called sink().

Here’s how to generate a log file for a script named lab_01_example.R:

  1. Open a new script in RStudio.
  2. Type the following commands (lines) into the new script or directly into the Console:
sink(file = "lab_01_example.log")
source("lab_01_example.R", echo = TRUE)
sink()

How it works:

  • sink(file = "lab_01_example.log"): This starts redirecting all output from R into a file named lab_01_example.log.
  • source("lab_01_example.R", echo = TRUE): Runs the script and ensures both the commands and their results are included in the log file.
  • sink(): Closes the connection to the log file, returning R to normal behavior (printing output to the Console).

Once the script runs, you can open the log file (lab_01_example.log) to review everything that was executed and ensure your code ran without errors.
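One caveat with this pattern: if the script stops with an error, the closing sink() never runs and later output keeps going to the log file instead of the Console. A slightly more defensive sketch wraps source() in tryCatch() so that sink() always runs. This is a suggested variation, not a course requirement, and it uses tempdir() only so the example is self-contained.

```r
# Set up a tiny script and a log file in R's temporary folder
script_path <- file.path(tempdir(), "lab_01_example.R")
log_path <- file.path(tempdir(), "lab_01_example.log")
writeLines(c("x <- 5", "print(x + 3)"), script_path)

# Start the log, run the script, and always close the log afterwards
sink(file = log_path)
tryCatch(
  source(script_path, echo = TRUE),
  finally = sink()
)

# Read the log back to check what was captured
log_lines <- readLines(log_path)
print(log_lines)
```

Keep in mind that sink() as used here captures regular output only. Error messages travel on a separate stream, so an error will show up in the Console rather than in the log.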

Documentation

Documentation is critical to understanding how functions and packages work in R. There is a help tab in the lower right panel in RStudio. You can also place your cursor in the function name in your code and press F1 to bring up the help for that function. R documentation always provides a brief description, information about arguments (or inputs), information about outputs, and some examples of how you would use it. Learning to read the documentation is critical for learning new packages.
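You can also open documentation from the Console. The commands below are part of base R and are shown here with read.csv() as the example function.

```r
# Open the help page for a function (two equivalent ways)
?read.csv
help("read.csv")

# Show a function's arguments and their defaults without opening the help page
args(read.csv)
```

Glancing at args() is a quick way to recall argument names such as header and sep before reading the full help page.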

Here are some additional resources to help you get started with R:

Google Site

Google Sites is an easy-to-use website editing and hosting service. You will create a website as a portfolio for your work in the course. You will complete your assignments by building webpages in your website. We encourage you to develop this website to showcase your work and market yourself to future employers.

Tableau Setup

We will be using two Tableau products in this course (although there are many more - see handout): Tableau Desktop and Tableau Public. Tableau Desktop typically requires a paid subscription, but as students (and teachers) we get it for free.

Visit Tableau for students and apply for your 1-year academic license.

Visit Tableau Desktop and download Tableau Desktop using the 14-day free trial. You can enter your product license key once you receive an email with your academic license.

Visit Tableau Public and create an account. You will use this account to host your data visualizations, which you can then embed on your Google Site. You will be required to use this to upload homework assignments.

If you are ever feeling lost in this class, I recommend checking out some of Tableau’s free eLearning resources. We will use some videos from these trainings throughout the course and I am happy to recommend specific resources to fit your needs.


This Week’s Assignment

Get started on the problem set here: https://csu-arec-330.github.io/materials/unit_00/week_01/ps1.html

Footnotes

  1. If you like this concept, you might explore the Linux operating system. I use Ubuntu.