Week 7 Lab: Forecasting and Prediction Intervals

This Lab Contributes to Course Objectives: 1, 2, 3, 4, 7, 8

Learning Objectives R

  • Understand the statistics of prediction intervals

  • Understand how to conduct a forecast with prediction intervals in R

Learning Objectives Tableau

  • Plot your prediction intervals

  • Use Tableau stories to present

  • Embed Tableau in PowerPoint to present

R: Time Series Forecast and Prediction Interval

Last week:

  • We learned how to build a forecast from the components of the time series decomposition (Trend, Seasonality, and Residual).

  • We made some relatively simple assumptions about how these components would behave in the future.

R has many time series analysis and forecasting tools. Because the libraries containing these tools are open source and written by R users, the structure of the functions and of the objects they create may vary from library to library.

This week:

We are going to start using a new library called fpp3 that you will need to install if it isn’t already installed.

  • fpp3 is a meta-library that loads a series of time series forecasting libraries that make time series analysis and forecasting much easier. The library fpp3 was created as the companion to the textbook Forecasting: Principles and Practice (3rd ed).

The main libraries we will be using are fable and feasts:

  • fable contains many time series models that you might want to use to understand and forecast data.

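  • feasts contains tools for exploring time series data, such as decompositions, summary features, and plots. Together with the other packages that fpp3 loads, it lets you work with time series data in a dplyr-like way. This set of packages makes it very easy to run models and extract the information you want.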

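The goal today is to look at prediction intervals. A prediction interval is related to a confidence interval, but it pertains to your forecast: it gives the range in which a future observation is expected to fall with a stated probability.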
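To make the idea concrete, here is a minimal sketch in base R of how a prediction interval can be computed by hand. It assumes the forecast distribution is normal, and the mean and standard deviation are made-up numbers used only for illustration.

```r
# Hypothetical forecast for one future month (made-up values).
forecast_mean <- 100
forecast_sd   <- 5

# Multiplier for a central 95% interval under a normal distribution.
level <- 95
z <- qnorm(1 - (1 - level / 100) / 2)

# The interval is the forecast mean plus or minus z standard deviations.
lower <- forecast_mean - z * forecast_sd
upper <- forecast_mean + z * forecast_sd

round(c(z = z, lower = lower, upper = upper), 2)
```

For a 95% interval the multiplier is about 1.96. A larger standard deviation or a higher level widens the interval.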

Step 1: Load your data

First, let’s load the time series data into R and set up your script (description, working directory, load packages). We will continue using the carrot price data.

#This lab...

library(tidyquant)
library(fpp3)

carrot <- tq_get(c("WPU01130212"),get = "economic.data",from="2007-08-01")

Step 2: Convert your data into a time series object

Like last week, we need to convert our data into a time series dataframe called a tsibble:

#Prep data
carrot_ts <- carrot %>%
  select(date,price) %>% #select only the date and price variables
  mutate(date=yearmonth(date)) %>% #convert the date into yearmonth format so the models understand the unit of observation
  as_tsibble() #convert the data to a tsibble object - similar to a tibble or data.frame but for time series data

This will create a new time series tibble object called carrot_ts.
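Before modeling, it can help to confirm that the conversion worked. The sketch below uses a small made-up monthly series (toy, not the carrot data) so that it runs without a data download; is_tsibble(), is_regular(), and has_gaps() are functions from the tsibble package.

```r
library(dplyr)
library(tsibble)

# Made-up monthly prices standing in for the real data.
toy <- tibble(
  date  = seq(as.Date("2020-01-01"), by = "month", length.out = 24),
  price = 100 + 1:24
)

toy_ts <- toy %>%
  mutate(date = yearmonth(date)) %>% # monthly unit of observation
  as_tsibble(index = date)           # name the time index explicitly

is_tsibble(toy_ts) # is the object a tsibble?
is_regular(toy_ts) # are observations evenly spaced?
has_gaps(toy_ts)   # are any months missing?

# Dropping one month creates a gap that has_gaps() reports.
toy_gap <- toy_ts %>% filter(row_number() != 5)
has_gaps(toy_gap)
```

Running the same three checks on carrot_ts is a quick way to catch missing months before fitting a model.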

Step 3: Tidy forecast workflow

Our objective is to build a forecast model and extract the key pieces of information, so that we can send them to Tableau.

One of the benefits of the fpp3 package is that it simplifies the model-building process and uses a structure that is compatible with the pipe, %>%.

We will recreate the forecasting model that we built in parts last week:

  1. Recall that we first decomposed the carrot price time series into trend, seasonal, and random components, and built a forecasting model based on each component.

  2. Then we put it all back together. [1]

This process is called a workflow: a set of steps for completing a task. We will translate the workflow we built into the structure of the new package.

3a: Define the model

We have the time series data, carrot_ts, prepared for modeling. Now, we need to define the model.

Last week (week 2 of this unit), we defined individual forecasts for each of the three components. We used the function forecast(), which uses an exponential smoothing method by default. Here we define the exponential smoothing model of each component explicitly.

#Fit the model
fit <- carrot_ts %>%
  model(my_ets=ETS(price ~ trend(method="A") + season(method="A") + error(method="A"))) #fit model

The function model() is used to define and estimate the forecasting model. In this code, we have done the following:

  • Defined an exponential smoothing model, ETS(), based on three components: trend, season, and error.

  • Told the model that we want those components to be additive rather than multiplicative. (The argument method="A" represents additive.)

  • Named the model my_ets.

In short, this is the model that we built in parts in week 2. Inspecting fit shows a model table (a mable) that stores the estimated model; it does not contain any forecasts yet. [2]

Note that model() is a powerful function capable of specifying many models at once. You may want to compare model fits or estimate models on several price series simultaneously. See the documentation for model() for more information.
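As a rough illustration of specifying several models at once, the sketch below fits the additive ETS model and a seasonal naive benchmark, SNAIVE(), in one call to model(). The series toy_ts is simulated, and the model names additive_ets and seasonal_naive are arbitrary labels chosen for this example.

```r
library(fpp3)

# Simulated monthly series with a trend and a seasonal pattern.
set.seed(42)
n <- 96
toy_ts <- tibble(
  date  = yearmonth(seq(as.Date("2015-01-01"), by = "month", length.out = n)),
  price = 100 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 2)
) %>%
  as_tsibble(index = date)

# Each named argument becomes one column of fitted models in the mable.
fits <- toy_ts %>%
  model(
    additive_ets   = ETS(price ~ trend(method = "A") + season(method = "A") + error(method = "A")),
    seasonal_naive = SNAIVE(price)
  )

# In-sample accuracy measures, one row per model.
fit_accuracy <- accuracy(fits)
fit_accuracy
```

Comparing the RMSE column gives a first impression of which model fits the observed data more closely, though in-sample fit alone does not guarantee better forecasts.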

3b: Forecast the model

Now that we have specified the model, we need to actually run the forecast.

We will use a function called forecast() from the fabletools library (related to fable). However, this function is different from the forecast function we used previously. When functions from different libraries share a name, you can prefix the call with the library name and ::, as in fabletools::forecast(), so R does not use the wrong one.

We will forecast our ETS model 5 years ahead. Because the data are stored as monthly observations, the function interprets h="5 years" as 5 years' worth of monthly forecasts; this is equivalent to h=60 because 5 years is 60 months.

#Forecast the model
carrot_forecast <- fit %>%
  fabletools::forecast(h="5 years") #forecast model for 5 years

Inspect carrot_forecast and you will see that it contains a forecast distribution (with its mean and variance) for each month of the 5-year forecast.
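The sketch below, on a simulated series rather than the carrot data, checks that h="5 years" and h=60 give the same number of monthly forecasts and pulls the mean and variance out of the forecast distribution. It uses variance() from the distributional package, which is installed alongside fabletools; the object names are chosen for this example.

```r
library(fpp3)

# Simulated monthly series with a trend and a seasonal pattern.
set.seed(42)
n <- 96
toy_ts <- tibble(
  date  = yearmonth(seq(as.Date("2015-01-01"), by = "month", length.out = n)),
  price = 100 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 2)
) %>%
  as_tsibble(index = date)

fit <- toy_ts %>%
  model(my_ets = ETS(price ~ trend(method = "A") + season(method = "A") + error(method = "A")))

# Two ways of asking for the same horizon on monthly data.
fc_text  <- fit %>% fabletools::forecast(h = "5 years")
fc_count <- fit %>% fabletools::forecast(h = 60)
c(nrow(fc_text), nrow(fc_count))

# The price column holds a distribution; extract its variance and sd.
fc_summary <- fc_text %>%
  as_tibble() %>%
  mutate(variance = distributional::variance(price), sd = sqrt(variance)) %>%
  select(date, .mean, variance, sd)
head(fc_summary, 3)
```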

3c: Plot the forecast

Finally, you want to visualize your forecast.

You can pass carrot_forecast into the function autoplot() which understands how to read forecast output. Note that autoplot() plots only the forecast by default. Adding the observed data carrot_ts tells the function to include the observed data in the plot.

#Plot the forecast
autoplot(carrot_forecast,carrot_ts) #plot the forecast together with the observed data

You can see that the mean and the 80% and 95% prediction intervals are included.

See Hyndman and Athanasopoulos for more information on developing a workflow with these time series tools.
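If you only want one interval on the plot, autoplot() accepts a level argument when it is given a forecast object. The sketch below uses a simulated series and plots just the 95% interval; the 2-year horizon is an arbitrary choice for the example.

```r
library(fpp3)

# Simulated monthly series with a trend and a seasonal pattern.
set.seed(42)
n <- 96
toy_ts <- tibble(
  date  = yearmonth(seq(as.Date("2015-01-01"), by = "month", length.out = n)),
  price = 100 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 2)
) %>%
  as_tsibble(index = date)

fc <- toy_ts %>%
  model(my_ets = ETS(price ~ trend(method = "A") + season(method = "A") + error(method = "A"))) %>%
  fabletools::forecast(h = "2 years")

# level controls which prediction intervals are shaded.
p <- autoplot(fc, toy_ts, level = 95)
p
```

Because the result is a ggplot object, you can add layers such as labs() to adjust titles and axis labels.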

Step 4: Generating prediction intervals

The plot generated by autoplot() shows that the model produces prediction intervals. The function hilo(), available once fpp3 is loaded, extracts them. However, it packs the lower and upper bounds of each interval into a single column. There are two options for separating the lower and upper bounds of the 95% prediction interval:

The first option is to use the mutate() function:

#Extract prediction interval using mutate()
carrot_forecast_out <- carrot_forecast %>%
  hilo(level = 95) %>% #extract 95% prediction interval
  mutate(
    lower = `95%`$lower,
    upper = `95%`$upper
  ) #generate own columns named lower and upper

The second option is to use a function called unpack_hilo() to separate the values into their own variables. You need to specify the name of the variable; in this case "95%".

#Extract prediction interval using unpack_hilo()
carrot_forecast_out <- carrot_forecast %>%
  hilo(level = 95) %>% #extract 95% prediction interval
  unpack_hilo(cols = "95%") #unpack into own columns named 95%_lower and 95%_upper
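To see where the extracted bounds come from, the sketch below compares the upper bound from hilo() with a hand calculation of the mean plus about 1.96 standard deviations. It uses a simulated series and relies on the ETS forecast distribution being normal; the column names upper_by_hand and width are made up for this example.

```r
library(fpp3)

# Simulated monthly series with a trend and a seasonal pattern.
set.seed(42)
n <- 96
toy_ts <- tibble(
  date  = yearmonth(seq(as.Date("2015-01-01"), by = "month", length.out = n)),
  price = 100 + 0.5 * (1:n) + 10 * sin(2 * pi * (1:n) / 12) + rnorm(n, sd = 2)
) %>%
  as_tsibble(index = date)

fc <- toy_ts %>%
  model(my_ets = ETS(price ~ trend(method = "A") + season(method = "A") + error(method = "A"))) %>%
  fabletools::forecast(h = 12)

out <- fc %>%
  hilo(level = 95) %>%
  mutate(lower = `95%`$lower, upper = `95%`$upper) %>%
  as_tibble() %>%
  mutate(
    sd            = sqrt(distributional::variance(price)),
    upper_by_hand = .mean + qnorm(0.975) * sd, # mean + z * sd
    width         = upper - lower              # interval width per month
  )

select(out, date, .mean, lower, upper, upper_by_hand, width)
```

The width column shows how the interval changes as the forecast horizon moves further from the observed data.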

Step 5: Preparing to export

Finally, we want to prepare the data for export to Tableau.

Recall that our original time series data, carrot_ts, has two variables: date and price.

Our tsibble carrot_forecast_out has date and price, but the price variable is a distribution rather than a value. The variable .mean contains the forecast mean, so we can remove price and rename .mean. Remember that you can rename and select variables in the same command.

#Prepare data for export
part2 <- carrot_forecast_out %>%
  select(date,price=.mean,`lower`="95%_lower",`upper`="95%_upper")

to_export <- bind_rows(carrot_ts,part2)

write_csv(to_export,"carrot_forecast.csv")

Note: If you used the mutate() option, the columns are already named lower and upper, so you do not need to rename “95%_lower” and “95%_upper” in this code block; select lower and upper directly instead.
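The sketch below shows the shape of the exported file using two small made-up tables in place of carrot_ts and part2. Because the observed rows have no lower or upper columns, bind_rows() fills them with NA, and these appear as empty cells in the CSV. The file is written to a temporary folder so nothing in your working directory is overwritten.

```r
library(dplyr)
library(readr)

# Made-up observed data: date and price only.
observed <- tibble(
  date  = as.Date(c("2023-11-01", "2023-12-01", "2024-01-01")),
  price = c(101, 103, 102)
)

# Made-up forecast: price is the forecast mean, plus interval bounds.
forecasted <- tibble(
  date  = as.Date(c("2024-02-01", "2024-03-01")),
  price = c(104, 105),
  lower = c(99, 98),
  upper = c(109, 112)
)

# Columns missing from one table are filled with NA.
to_export <- bind_rows(observed, forecasted)
to_export

out_file <- file.path(tempdir(), "toy_forecast.csv")
write_csv(to_export, out_file)
check <- read_csv(out_file, show_col_types = FALSE)
```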

Tableau: Plotting prediction intervals

Using the data we exported from R, we will plot the prediction intervals.

Option 1: Simple line graphs, no shading

  1. Connect to the carrot_forecast dataset you just exported from R.

  2. Create two new calculated fields. One called Observed Price and one called Forecasted Price. You can create both of these using an IF THEN ELSE statement:

  • Observed Price: We want all prices through the end of our observed data (January 1, 2024 in this example; adjust the date if your data ends in a different month)

IF [Date] <= #January 1, 2024# THEN [Price] ELSE NULL END

  • Forecasted Price: We want all forecasted prices (everything after January 1, 2024, starting with February 1, 2024)

IF [Date] > #January 1, 2024# THEN [Price] ELSE NULL END

Note: What error message are you seeing when you try to create the two new calculated fields? What do you need to do to fix the error?

  3. Create a line chart that includes Observed Price, Forecasted Price, Upper, and Lower.

Note: There are two more modifications you need to make before you can plot Upper and Lower. What are they?

  4. Change the colors to something useful. Use the same gray shade for both upper and lower.
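The calculated fields in step 2 hard-code the last observed date. As an alternative sketch, not part of the required workflow, you could label each row in R before exporting. The example assumes, as in the export step, that only forecasted rows have a value for lower; the data and the column name series are made up for illustration.

```r
library(dplyr)

# Made-up export table: observed rows have no interval bounds.
to_export <- tibble(
  date  = as.Date(c("2023-12-01", "2024-01-01", "2024-02-01", "2024-03-01")),
  price = c(103, 102, 104, 105),
  lower = c(NA, NA, 99, 98),
  upper = c(NA, NA, 109, 112)
)

# Rows without an interval are observed; rows with one are forecasts.
labelled <- to_export %>%
  mutate(series = if_else(is.na(lower), "Observed Price", "Forecasted Price"))

labelled
```

A column like series can then be placed on Color in Tableau without editing a date whenever the data are refreshed.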

Option 2: Adding shading

  1. Duplicate your previous worksheet.

  2. In the Marks card, change your chart to an Area Chart.

  3. Unstack your lines by going to Analysis -> Stack Marks -> Off

  4. Rearrange the order of your forecast and prediction intervals in the Measure card: Lower, Forecasted, Observed, Upper.

  5. Change your colors: White for Observed Price and Lower, Gray for Upper and Forecasted Price. Make sure the opacity is set to 100%.

  6. Create a new line graph by dragging the original Price field to the Rows Shelf.

  7. Change this new pane to a line chart.

  8. Change the colors to reflect forecasted and original prices by creating a new calculated field that equals “Forecasted Price” after January 1, 2024, and “Observed Price” on or before that date. Drag this calculated field to Color on the Marks card.

IF [Date] <= #January 1, 2024# THEN "Observed Price" ELSE "Forecasted Price" END

  9. Combine your charts by creating a dual axis. Right click the Y-axis of the Price line graph -> Select Dual Axis

  10. Synchronize your axes. Right click the second Y-axis -> Select Synchronize Axis. You can then hide this axis. Right click -> Click Show Header to de-select it.

  11. Notice that the area under the Price line no longer has your y-axis gridlines. Let’s just get rid of all the gridlines to make this look a bit nicer. Right click on your chart -> select Format. In the format pane, navigate to the lines icon, change Grid Lines to None.

  12. Finally, go back to your last sheet. Notice that the colors between your sheets are linked. Tableau does this to ensure your sheets are cohesive, but sometimes we want different colors for the same variable. The workaround for this is to duplicate each of the variables you want to assign a different color and use those in this sheet instead of the original variables.

Presenting in Tableau

Option 1: Stories

Tableau Desktop includes its own method for presenting your analyses. These are called stories. This tutorial will show you how to put together a simple story.

  1. Create a new story by clicking the New Story tab at the bottom of the window. Change the size to PowerPoint.

  2. Let’s create an Intro slide with text by creating a new Dashboard. Add a background image and a text box.

  3. Add your Intro slide to your story.

  4. Add a new (blank) story point and bring in one of your worksheets. Give this story point a useful title.

  5. See what your story looks like in presentation mode. Note that you can hide the story title to make this look a bit nicer.

  6. Note that if you like the formatting features of PowerPoint for text slides, you can also create these slides in PowerPoint, export them as images, and then add those images to Dashboards.

Option 2: Embed in PowerPoint

You can also embed interactive Tableau dashboards from your Tableau Public account directly into PowerPoint. This will require you to install the Web Viewer Add-in. Follow the add-in's setup instructions to complete the installation.

Now get together with your teammate and spend some time on your project!

Footnotes

  1. I’ve included more information about the model in the lab notes for week 2 of this unit.

  2. If you need the model coefficient estimates, you can extract them using the command tidy(fit).