Week 6: Intro to Trends and Forecasting

Agenda

Linear regression/trend analysis

Interpreting trend analysis

Forecasting using trends

Motivation for structural breaks

Project 1 reminder

Intro to Linear Regressions

The objective:
We want to understand the
relationship between two variables

The simple solution:
Assume the relationship is linear
and estimate the “average” trend

Linear regression: The intuition

Imagine you have been tracking carrot prices for several years and you want to make a prediction about what carrot prices will look like in the future.

You plotted your data on carrot prices, and you want to know: Does the price of carrots relate to how many months have passed since the financial crisis?

Linear regression: The intuition

We can use statistical tools to measure the relationship between “Months since financial crisis” and “Carrot price”. This lets us answer questions that cannot always be answered by simply looking at a plot of the data.

The tool is linear regression, or regression analysis.

Equation of a line

Quick review: The equation of a line plotting the relationship between \(y\) and \(x\) is \[ y = mx + b \]

Which variable represents the slope?

Which variable represents the intercept?


Regression analysis

Regression analysis is a tool that examines the average relationship between two (or more) variables.

It helps us understand:

  • How an outcome, the “dependent” variable (carrot price), changes when one or more “independent” variables (months since the financial crisis) change.

We use regression analysis to:

  • Forecast future values of the dependent variable (e.g., what carrot prices might be in the future, given past responses to economic events).

  • Evaluate the strength of each predictor, i.e., which independent variables are most strongly correlated with changes in carrot prices.

Regression analysis, applied

We use linear regression, along with our data, to answer:

  1. Are carrot prices, on average, increasing or decreasing?

  2. How much of the observed variation in carrot prices can be explained by a simple time-trend line?

  3. What is the most reasonable prediction of carrot prices 250 months after the financial crisis?

Exercise 1: Which trend line fits the data best?


It is not easy to eyeball a trend line.

Regression analysis is useful to determine the best-fitting line.

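In simple terms, a “best-fitting” line is the one that is, overall, closest to the data points. The standard method, Ordinary Least Squares (OLS), makes this precise: it picks the line that minimizes the sum of the squared vertical distances between each data point and the line.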
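One way to make “closest to the data” concrete is to add up the squared vertical gaps (residuals) between each point and a candidate line; the line with the smaller total fits better. The sketch below compares two “eyeballed” candidate lines on a small, made-up dataset. The numbers and the function name are hypothetical illustrations, not the course’s carrot data.

```python
def sum_squared_residuals(xs, ys, intercept, slope):
    """Total of squared vertical distances between the points and y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: months since some event and a price (illustration only).
months = [0, 10, 20, 30, 40]
prices = [190, 193, 189, 195, 194]

# Two candidate trend lines someone might draw by eye: (intercept, slope).
flat_line = (192.0, 0.0)    # no trend
rising_line = (190.0, 0.1)  # rises 0.1 per month

for name, (b0, b1) in [("flat", flat_line), ("rising", rising_line)]:
    print(name, sum_squared_residuals(months, prices, b0, b1))
```

Here the rising line has the smaller total (17 versus 27), so it fits these points better. OLS automates this search: instead of comparing a few guesses, it finds the single intercept and slope with the smallest possible sum of squared residuals.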

Regression analysis, applied

We can use linear regression, along with our data, to answer:

  1. Are carrot prices, on average, increasing or decreasing?

    • Increasing (slightly)
  2. How much of the observed variation in carrot prices can be explained by a simple trend line?

  3. What is the most reasonable prediction of carrot prices 250 months after the financial crisis?

The Ordinary Least Squares Regression

Ordinary Least Squares (OLS) is the most common estimation method for linear models.

For a good reason: as long as your model satisfies certain assumptions (the Gauss-Markov assumptions), OLS gives you the BEST possible estimates, in the sense of being the Best Linear Unbiased Estimator (BLUE).

Tableau, Excel, and other programs use OLS to estimate trend lines.

The OLS regression equation

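The following equation represents the mathematical relationship between an independent variable \(X\) and a dependent variable \(Y\), where \(\varepsilon_i\) is the error term:

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]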

The \(\beta_1\) coefficient:

  • Acts as the “weight” that the independent variable has in predicting the dependent variable.

  • Tells you how much the dependent variable is expected to increase (or decrease) when the independent variable increases by one unit.

The \(\beta_0\) coefficient:

  • This is our intercept term.

  • It is the expected value of the dependent variable (the outcome you’re interested in predicting) when all of the independent (predictor) variables are zero.
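For a single predictor, the OLS estimates have a well-known closed form: \(\hat{\beta}_1 = \sum (X_i - \bar{X})(Y_i - \bar{Y}) / \sum (X_i - \bar{X})^2\) and \(\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}\). The Python sketch below computes them directly. It illustrates the formula only; it is not a claim about how Tableau computes its trend lines internally, and the data are made up.

```python
def ols_fit(xs, ys):
    """Return (beta0, beta1) for the simple OLS line y = beta0 + beta1 * x."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Slope: how x and y move together, scaled by the spread of x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x must vary to estimate a slope")
    beta1 = sxy / sxx
    # Intercept: makes the line pass through the point of means (x_bar, y_bar).
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1

# Hypothetical points lying exactly on y = 1 + 2x, so OLS should recover that line.
print(ols_fit([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```

Reading the result with the bullets above: \(\hat{\beta}_1 = 2\) means the predicted \(Y\) rises by 2 for each one-unit increase in \(X\), and \(\hat{\beta}_0 = 1\) is the predicted \(Y\) when \(X = 0\).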

Interpreting Tableau regression output

Q: What is the dependent (“Y”) variable?

A. months_since_crisis

B. 0.0728323

C. Avg. Carrot price

D. 189.513

Q: What is the independent (“X”) variable?

A. months_since_crisis

B. 0.0728323

C. Avg. Carrot price

D. 189.513

Q: What is our estimate of \(\beta_0\)?

A. months_since_crisis

B. 0.0728323

C. Avg. Carrot price

D. 189.513

Q: What is our estimate of \(\beta_1\)?

A. months_since_crisis

B. 0.0728323

C. Avg. Carrot price

D. 189.513

Discussion: Interpreting Tableau output

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

What does \(\beta_0\) represent? How can we interpret it? What does it mean for how the trend line will appear in our visualization?

Discussion: Interpreting Tableau output

\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]

What does \(\beta_1\) represent? How can we interpret it? What does it mean for how the trend line will appear in our visualization?

Interpreting Tableau output: What is \(R^2\)?

  • A statistical measure of how well the model “fits” the data.

  • It tells us how much variation of the \(Y\) variable is explained by the \(X\) variable(s)

  • Higher \(R^2\) = better fit; \(R^2 = 1 \Rightarrow\) all variation is explained.

Why is this important?

  • If you are trying to predict carrot prices based on the number of months since the financial crisis, the \(R^2\) will tell us how well our predictions match up with the actual prices.

  • If the \(R^2\) is 2%, it means that only 2% of the variation in carrot prices is explained by a linear trend in months since the financial crisis; the other 98% is left unexplained.
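\(R^2\) compares the variation left over around the fitted line (the sum of squared residuals, SSR) with the total variation of \(Y\) around its mean (SST): \(R^2 = 1 - \text{SSR}/\text{SST}\), which holds for an OLS line with an intercept. The sketch below computes it; the function name and data are hypothetical, and the line \(Y = 0.6 + 0.8X\) is the OLS fit for these points from the slope and intercept formulas.

```python
def r_squared(xs, ys, beta0, beta1):
    """Share of the variation in ys explained by the line y = beta0 + beta1 * x."""
    y_bar = sum(ys) / len(ys)
    ssr = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)                            # total variation
    return 1 - ssr / sst

# Hypothetical points; their OLS line is y = 0.6 + 0.8x.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(r_squared(xs, ys, 0.6, 0.8))  # about 0.64
```

An \(R^2\) of about 0.64 means the line accounts for roughly 64% of the variation in these made-up \(Y\) values, far more than the 2% in the carrot example.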

Regression analysis, applied

We can use linear regression, along with our data, to answer:

  1. Are carrot prices, on average, increasing or decreasing?

    • Increasing (slightly)
  2. How much of the observed variation in carrot prices can be explained by a simple trend line?

    • Not much!
  3. What is the most reasonable prediction of carrot prices 250 months after the financial crisis?

Interpreting output: What is a p-value?

  • By default, the p-value Tableau reports for a trend line measures the significance of the slope parameter \(\beta_1\).

  • A p-value measures how surprising your data is under the assumption that there is no effect or no relationship.

  • Smaller p-value = stronger evidence against the assumption of no relationship

  • A p-value of 0.05 or less is often considered statistically significant \(\Rightarrow\) the estimate is statistically significantly different from 0.

  • In this example, a p-value of 0.02 means that if there were truly no relationship (\(\beta_1 = 0\)), there would be only a 2% chance of getting a slope estimate at least as far from zero as ours. That is strong evidence against the assumption that there is no relationship.
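Behind the slope’s p-value is a test statistic: the slope estimate divided by its standard error, \(t = \hat{\beta}_1 / \mathrm{SE}(\hat{\beta}_1)\). The sketch below computes that statistic and, as a simplifying assumption, converts it to a two-sided p-value using the standard normal distribution rather than the exact t-distribution with \(n - 2\) degrees of freedom. The two agree closely with many observations, but the normal version understates p-values in small samples. The function name and data are hypothetical.

```python
import math

def slope_p_value_normal_approx(xs, ys):
    """t-statistic and two-sided p-value for H0: beta1 = 0 (normal approximation)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    beta1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residual variance divides by n - 2 because two parameters were estimated.
    ssr = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    se_beta1 = math.sqrt(ssr / (n - 2) / sxx)
    t_stat = beta1 / se_beta1
    # For a standard normal Z, P(|Z| >= |t|) = erfc(|t| / sqrt(2)).
    return t_stat, math.erfc(abs(t_stat) / math.sqrt(2))

# Hypothetical data with a clear upward trend, and symmetric data with no trend.
trend_t, trend_p = slope_p_value_normal_approx([1, 2, 3, 4, 5, 6], [1.1, 1.9, 3.2, 3.9, 5.1, 5.8])
flat_t, flat_p = slope_p_value_normal_approx([1, 2, 3, 4, 5], [2, 0, 1, 0, 2])
print(trend_t, trend_p)
print(flat_t, flat_p)
```

For the trending data the p-value is far below 0.05; for the symmetric data the estimated slope is exactly zero and the p-value is 1, so there is no evidence of a relationship.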

Interpreting: What is a p-value?

Note that clicking “describe trend line” will also give you a p-value for the intercept parameter.

Q: How well does the model fit the data?

A. Extremely well

B. Decent

C. Not well

Q: How confident are you that the estimated slope is different from zero?

A. Very

B. Somewhat

C. Not at all

Using OLS regressions to forecast


How?

  • Continue/extend your trend beyond your time series

  • Tableau and other programs can do this for you automatically

  • You can also do this manually by plugging in values of your \(X\) variable(s)…

Using OLS regressions to forecast

What is the most reasonable prediction of carrot prices 250 months after the financial crisis?

\[ \hat{Y} = 189.513 + 0.0728323 \times 250 \approx 207.721 \]
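The same plug-in calculation can be written as a small function, which makes it easy to extend the trend to several future months at once. The coefficients are the ones from the Tableau output in this example; the function and constant names are hypothetical.

```python
# Coefficients read off the Tableau trend line in this example.
BETA0 = 189.513    # intercept: predicted price at month 0
BETA1 = 0.0728323  # slope: change in predicted price per month

def forecast_price(months_since_crisis):
    """Point forecast from the linear trend: Y-hat = beta0 + beta1 * X."""
    return BETA0 + BETA1 * months_since_crisis

for month in (200, 250, 300):
    print(month, round(forecast_price(month), 3))
```

The month-250 value matches the hand calculation above (207.721). This is only a point forecast: it assumes the past linear trend simply continues.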

Regression analysis, applied

How could we use these data to answer:

  1. Are carrot prices, on average, increasing or decreasing?

    • Increasing (slightly)
  2. How much of the observed variation in carrot prices can be explained by a simple trend line?

    • Not much!
  3. What is the most reasonable prediction of carrot prices 250 months after the financial crisis?

    • BEST estimate: about 207.7

Cautionary note

  • We will teach you enough to be dangerous.

  • We are introducing you to the tip of the analytical iceberg. Learn more before you put this into practice.

  • Incorrect analysis can lead to worse conclusions than no analysis.

  • Be wary of causal conclusions!

CORRELATION \(\neq\) CAUSATION

REMINDER: Project 1

Groups of 2

Choose an ag biz or enre management question to answer with time series data

Collect time series data

Analyze trends

Generate a forecast

Present results in a recorded video