#This lab...
library(tidyquant)
library(fpp3)
carrot <- tq_get(c("WPU01130212"),get = "economic.data",from="2007-08-01") #load the carrot price series
Week 7 Lab: Forecasting and Prediction Intervals
This Lab Contributes to Course Objectives: 1, 2, 3, 4, 7, 8
Learning Objectives R
Understand the statistics of prediction intervals
Understand how to conduct a forecast with prediction intervals in R
Learning Objectives Tableau
Plot your prediction intervals
Using Tableau stories to present
Embedding Tableau in PowerPoint to present
R: Time Series Forecast and Prediction Interval
Last week:
We learned how to build a forecast from the components of the time series decomposition (Trend, Seasonality, and Residual).
We made some relatively simple assumptions about how these components would behave in the future.
R has many time series analysis and forecasting tools. Because the libraries containing the tools are open source and written by R users, the structure of the functions and the objects they create may vary.
This week:
We are going to start using a new library called fpp3 that you will need to install if it isn’t already installed.
fpp3 is a meta-library that loads a series of time series forecasting libraries that make time series analysis and forecasting much easier. The library fpp3 was created as the companion to the textbook Forecasting: Principles and Practice (3rd ed).
The main libraries we will be using are fable and feasts:
fable contains many time series models that you might want to use to understand and forecast data.
feasts contains a set of dplyr-like tools to work with time series data. This set of packages makes it very easy to run models and extract the information you want.
The goal today is to look at prediction intervals, which are related to confidence intervals but pertain to your forecast.
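To make the idea concrete, here is a minimal sketch of how a prediction interval follows from a forecast distribution. It assumes the forecast for one future month is normally distributed; the mean of 100 and standard deviation of 5 are made-up numbers, and pi_multiplier is a hypothetical helper, not part of any package.

```r
# Hypothetical forecast distribution for one future month: Normal(100, 5)
fc_mean <- 100
fc_sd <- 5

# Multiplier for a central interval with the requested coverage (in percent)
pi_multiplier <- function(level) qnorm(1 - (1 - level / 100) / 2)

# Interval = mean +/- multiplier * standard deviation
pi_80 <- fc_mean + c(-1, 1) * pi_multiplier(80) * fc_sd
pi_95 <- fc_mean + c(-1, 1) * pi_multiplier(95) * fc_sd

print(round(pi_80, 2))
print(round(pi_95, 2))
```

The 95% interval is wider than the 80% interval because it has to cover more of the distribution.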
Step 1: Load your data
First, let’s load the time series data into R and set up your script (description, working directory, load packages). We will continue using the carrot price data.
Step 2: Convert your data into a time series object
Like last week, we need to convert our data into a time series dataframe called a tsibble:
#Prep data
carrot_ts <- carrot %>%
  select(date,price) %>% #select only the date and price variables
  mutate(date=yearmonth(date)) %>% #convert the date into yearmonth format so the models understand the unit of observation
  as_tsibble() #convert the data to a tsibble object - similar to a tibble or data.frame but for time series data
This will create a new time series tibble object called carrot_ts.
Step 3: Tidy forecast workflow
Our objective is to build a forecast model and extract the key pieces of information, so that we can send them to Tableau.
One of the benefits of the fpp3 package is that it simplifies the model-building process in a structure that is compatible with the pipe, %>%.
We will recreate the forecasting model that we built in parts last week:
Recall that we first decomposed the carrot price time series into trend, seasonal, and random components, and built a forecasting model based on each component. Then we put it all back together.
This process is called a workflow: a set of steps for completing a task. We will translate the workflow we built into the structure of the new package.
3a: Define the model
We have the time series data, carrot_ts, prepared for modeling. Now, we need to define the model.
We defined individual forecasts for each of the three components in week 2. We used the function forecast(), which uses an exponential smoothing method by default. We can be more explicit in defining an exponential smoothing model of each component.
#Fit the model
fit <- carrot_ts %>%
  model(my_ets=ETS(price ~ trend(method="A") + season(method="A") + error(method="A"))) #fit model
The function model() is used to define the forecasting model. In this code, we have done the following:
Defined an exponential smoothing model, ETS(), based on three components: trend, season, and error.
Told the model that we want those components to be additive rather than multiplicative. (The argument method="A" represents additive.)
Named the model my_ets.
In short, this defines the model that we built in parts in week 2. Inspecting the data frame fit shows that it stores the fitted model itself; it does not yet contain any forecasts.
Note that model() is a powerful function capable of specifying many models at once. You may want to compare model fits or estimate models on several price series simultaneously. See more information here.
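As a rough illustration of fitting several models in one call, the sketch below fits the additive ETS model alongside a seasonal naive benchmark. It uses a synthetic monthly series (toy_ts, with made-up values) instead of the carrot data so that it runs without downloading anything; the model names my_ets and snaive are arbitrary labels.

```r
library(fpp3)

# Synthetic monthly series standing in for carrot_ts (values are made up)
set.seed(1)
toy_ts <- tsibble(
  date  = yearmonth("2015 Jan") + 0:95,
  price = 100 + 0.3 * (0:95) + 10 * sin(2 * pi * (0:95) / 12) + rnorm(96, sd = 3),
  index = date
)

# Fit two candidate models in one call; each model becomes a column of the result
toy_fit <- toy_ts %>%
  model(
    my_ets = ETS(price ~ error("A") + trend("A") + season("A")),
    snaive = SNAIVE(price)
  )

# In-sample accuracy measures, one row per model
toy_accuracy <- accuracy(toy_fit)
print(toy_accuracy)
```

Comparing the rows of the accuracy table is one simple way to judge which model tracks the observed data more closely.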
3b: Forecast the model
Now that we have specified the model, we need to actually run the forecast.
We will use a function called forecast() from the fabletools library (related to fable). However, this function is different from the forecast function we used previously. When functions from different libraries share the same name, you can prefix the function with its library (for example, fabletools::forecast()) so R does not try to use the wrong one.
We will specify that we want to forecast our ETS model for 5 years. The function understands that you want 5 years' worth of monthly forecasts (remember the data is stored as monthly data); this is equivalent to h=60 because 60 months is 5 years.
#Forecast the model
carrot_forecast <- fit %>%
  fabletools::forecast(h="5 years") #forecast model for 5 years
Inspect carrot_forecast and you will see that it contains mean and variance information for each month of the 5 year forecast.
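If you want to convince yourself that h="5 years" and h=60 describe the same horizon for monthly data, a quick check along these lines may help. It is a sketch on a synthetic series (toy_ts, made-up values), not the carrot data.

```r
library(fpp3)

# Synthetic monthly series standing in for carrot_ts (values are made up)
set.seed(1)
toy_ts <- tsibble(
  date  = yearmonth("2015 Jan") + 0:95,
  price = 100 + 0.3 * (0:95) + 10 * sin(2 * pi * (0:95) / 12) + rnorm(96, sd = 3),
  index = date
)

toy_fit <- toy_ts %>%
  model(my_ets = ETS(price ~ error("A") + trend("A") + season("A")))

# Same horizon written two ways
fc_text  <- toy_fit %>% fabletools::forecast(h = "5 years")
fc_count <- toy_fit %>% fabletools::forecast(h = 60)

# Both forecasts should have one row per month for 60 months
print(nrow(fc_text))
print(nrow(fc_count))
```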
3c: Plot the forecast
Finally, you want to visualize your forecast.
You can pass carrot_forecast into the function autoplot(), which understands how to read forecast output. Note that autoplot() plots only the forecast by default. Adding the observed data carrot_ts tells the function to include the observed data in the plot.
#Plot the forecast
autoplot(carrot_forecast,carrot_ts) #plot the forecast together with the observed data
You can see that the mean, the 80%, and the 95% prediction intervals are included.
See Hyndman and Athanasopoulos for more information on developing a workflow with these time series tools.
Step 4: Generating prediction intervals
The plot generated by autoplot() indicates that the model is estimating prediction intervals. There is a function hilo() from the fable library to extract them. However, this function creates a column with both the lower and upper bounds of the prediction interval packed into a single variable. There are two options to separate the lower and upper bounds of the 95% prediction interval:
The first option is to use the mutate() function:
#Extract prediction interval using mutate()
carrot_forecast_out <- carrot_forecast %>%
  hilo(level = 95) %>% #extract 95% prediction interval
  mutate(
    lower = `95%`$lower,
    upper = `95%`$upper
  ) #generate own columns named lower and upper
The second option is to use a function called unpack_hilo() to separate the values into their own variables. You need to specify the name of the variable; in this case "95%".
#Extract prediction interval using unpack_hilo()
carrot_forecast_out <- carrot_forecast %>%
  hilo(level = 95) %>% #extract 95% prediction interval
  unpack_hilo(cols = "95%") #unpack into own columns named 95%_lower and 95%_upper
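Both options work because the interval column stores a lower and an upper bound together in one value. The sketch below shows this on a single made-up normal distribution using the distributional package (which the forecasting packages build on); the mean of 100 and standard deviation of 5 are assumptions for illustration only.

```r
library(distributional)

# A single hypothetical forecast distribution: Normal(100, 5)
d <- dist_normal(mu = 100, sigma = 5)

# hilo() packs the lower and upper bounds of the 95% interval into one value
interval <- hilo(d, 95)
print(interval)

# The bounds can be pulled out individually with $
lower <- interval$lower
upper <- interval$upper
print(c(lower, upper))
```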
Step 5: Preparing to export
Finally, we want to prepare the data for export to Tableau.
Recall that our original time series data, carrot_ts, has two variables: date and price.
Our tsibble carrot_forecast_out has date and price, but the price variable is a distribution rather than a value. The variable .mean contains the forecast mean, so we can remove price and rename .mean to price. Remember that you can rename and select variables in the same command.
#Prepare data for export
part2 <- carrot_forecast_out %>%
  select(date,price=.mean,`lower`="95%_lower",`upper`="95%_upper") #keep the date, forecast mean, and interval bounds
to_export <- bind_rows(carrot_ts,part2) #stack the forecast below the observed data
write_csv(to_export,"carrot_forecast.csv")
Note: If you used the code with mutate() then you do not need to rename “95%_lower” and “95%_upper” in this code block, because the columns are already named lower and upper.
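It can help to see what bind_rows() does when the two tables do not share all columns. In this small sketch with made-up numbers, the observed rows have no lower or upper columns, so those cells are filled with NA in the stacked result. The table names observed and forecasted are placeholders.

```r
library(dplyr)

# Made-up observed data: date and price only
observed <- tibble(
  date  = as.Date(c("2023-11-01", "2023-12-01")),
  price = c(101, 103)
)

# Made-up forecast data: date, price, and interval bounds
forecasted <- tibble(
  date  = as.Date(c("2024-01-01", "2024-02-01")),
  price = c(104, 105),
  lower = c(98, 97),
  upper = c(110, 113)
)

# Columns missing from one table are filled with NA
stacked <- bind_rows(observed, forecasted)
print(stacked)
```

This is why the exported file has empty lower and upper values for every observed month.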
Tableau: Plotting prediction intervals
Using the data we exported from R, we will plot the prediction intervals.
Option 1: Simple line graphs, no shading
Connect to the carrot_forecast dataset you just exported from R.
Create two new calculated fields. One called Observed Price and one called Forecasted Price. You can create both of these using an IF THEN ELSE statement:
Observed Price: We want all prices through the end of our observed data (January 1, 2024)
IF [Date] <= #January 1, 2024# THEN [Price] ELSE NULL END
Forecasted Price: We want all forecasted prices (after and including February 1, 2024)
IF [Date] > #January 1, 2024# THEN [Price] ELSE NULL END
Note: What error message are you seeing when you try to create the two new calculated fields? What do you need to do to fix the error?
- Create a line chart that includes Observed Price, Forecasted Price, Upper, and Lower.
Note: There are two more modifications you need to do before you can plot Upper and Lower. What modifications do you need to do?
- Change the colors to something useful. Use the same gray shade for both upper and lower.
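If you would like to check the logic of the two calculated fields outside Tableau, the same split can be sketched in R. The cutoff date and the three rows of data here are assumptions for illustration; use the last observed month in your own export.

```r
library(dplyr)

# Assumed last observed month; adjust to match your data
cutoff <- as.Date("2024-01-01")

# Made-up rows standing in for the exported file
toy <- tibble(
  date  = as.Date(c("2023-12-01", "2024-01-01", "2024-02-01")),
  price = c(100, 102, 104)
)

# Same logic as the IF THEN ELSE statements: keep the price on one side, NA on the other
toy <- toy %>%
  mutate(
    observed_price   = if_else(date <= cutoff, price, NA_real_),
    forecasted_price = if_else(date > cutoff, price, NA_real_)
  )
print(toy)
```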
Option 2: Adding shading
Duplicate your previous worksheet.
In the Marks card, change your chart to an Area Chart.
Unstack your lines by going to Analysis -> Stack Marks -> Off
Rearrange the order of your forecast and prediction intervals in the Measure card: Lower, Forecasted, Observed, Upper.
Change your colors: White for Observed Price and Lower, Gray for Upper and Forecasted Price. Make sure the opacity is set to 100%.
Create a new line graph by dragging the original Price field to the Rows Shelf.
Change this new pane to a line chart.
Change the colors to reflect forecasted and original prices by creating a new calculated field that equals “Forecasted Price” after January 1, 2024, and that equals “Observed Price” on or before then. Drag this calculated field to the Colors Card.
IF [Date] <= #January 1, 2024# THEN "Observed Price" ELSE "Forecasted Price" END
Combine your charts by creating a dual axis. Right click the Y-axis of the Price line graph -> Select Dual Axis
Synchronize your axes. Right click the second Y-axis -> Select Synchronize Axis. You can then hide this axis. Right click -> Click Show Header to de-select it.
Notice that the area under the Price line no longer has your y-axis gridlines. Let’s just get rid of all the gridlines to make this look a bit nicer. Right click on your chart -> select Format. In the format pane, navigate to the lines icon, change Grid Lines to None.
Finally, go back to your last sheet. Notice that the colors between your sheets are linked. Tableau does this to ensure your sheets are cohesive, but sometimes we want different colors for the same variable. The workaround for this is to duplicate each of the variables you want to assign a different color and use those in this sheet instead of the original variables.
Presenting in Tableau
Option 1: Stories
Tableau Desktop includes its own method for presenting your analyses. These are called stories. This tutorial will show you how to put together a simple story.
Create a new story by clicking the New Story tab at the bottom of the window. Change the size to PowerPoint.
Let’s create an Intro slide with text by creating a new Dashboard. Add a background image and a text box.
Add your Intro slide to your story.
Add a new (blank) story point and bring in one of your worksheets. Give this story point a useful title.
See what your story looks like in presentation mode. Note that you can hide the story title to make this look a bit nicer.
Note that if you like the formatting features of PowerPoint for text slides, you can also create these slides in PowerPoint, export them as images, and then add those images to Dashboards.
Option 2: Embed in PowerPoint
You can also embed interactive Tableau dashboards from your Tableau Public account directly into PowerPoint. This will require you to install the Web Viewer Add-in. Follow these instructions to set this up: Instructions
Now get together with your teammate and spend some time on your project!