Linear regression/trend analysis
Interpreting trend analysis
Forecasting using trends
Motivation for structural breaks
Project 1 reminder
The objective:
We want to understand the relationship between two variables
The simple solution:
Assume the relationship is linear and estimate the “average” trend
Imagine you have been tracking carrot prices for several years and you want to make a prediction about what carrot prices will look like in the future.
You plotted your data on carrot prices, and you want to know: Does the price of carrots relate to how many months have passed since the financial crisis?
We can use statistical tools to determine the relationship between “Months since financial crisis” and “Carrot price” and to answer questions that cannot always be answered by simply looking at a plot of the data.
The tool is linear regression, or regression analysis.
Quick review: The equation of a line describing the relationship between \(y\) and \(x\) is \[y=mx+b \]
Which variable represents the slope?
Which variable represents the intercept?
Regression analysis is a tool that examines the average relationship between two (or more) variables.
We use regression analysis to:
Forecast future values of the dependent variable (e.g., what carrot prices might be in the future, given past responses to economic events).
Evaluate the strength of each predictor, i.e., which independent variables are most strongly correlated with changes in carrot prices.
We use linear regression, along with our data, to answer:
Are carrot prices, on average, increasing or decreasing?
How much of the observed variation in carrot prices can be explained by a simple time-trend line?
What is the most reasonable prediction of carrot prices 250 months after the financial crisis?
It is not easy to eyeball a trend line.
Regression analysis is useful to determine the best-fitting line.
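In simple terms, a “best-fitting” line is the one that comes closest to the data points overall. Ordinary least squares measures “closest” as the smallest sum of squared vertical distances between the data points and the line.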
We can use linear regression, along with our data, to answer:
Are carrot prices, on average, increasing or decreasing?
How much of the observed variation in carrot prices can be explained by a simple trend line?
What is the most reasonable prediction of carrot prices 250 months after the financial crisis?
Ordinary Least Squares (OLS) is the most common estimation method for linear models.
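For good reason: as long as your model satisfies certain assumptions (the Gauss-Markov assumptions), OLS gives you the BEST linear unbiased estimates, i.e., the most precise (lowest-variance) estimates among all linear unbiased estimators.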
Tableau, Excel, and other programs use OLS to estimate trend lines.
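To make “best-fitting” concrete, here is a minimal OLS sketch for one predictor using the textbook closed-form formulas. The months and prices are hypothetical values invented for illustration, and this is not a description of how Tableau or Excel compute trend lines internally; it only shows the quantity OLS minimizes and that no other line beats it.

```python
# Minimal OLS sketch for a single predictor, using the closed-form formulas:
#   beta1 = sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar) ** 2)
#   beta0 = y_bar - beta1 * x_bar
# The data below are HYPOTHETICAL, made up for illustration only.


def ols_fit(x, y):
    """Return (beta0, beta1) for the least-squares line y = beta0 + beta1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


def sse(x, y, beta0, beta1):
    """Sum of squared vertical distances between the points and a given line."""
    return sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))


# Hypothetical months since the crisis and carrot prices
months = [0, 12, 24, 36, 48, 60]
prices = [190.0, 189.5, 193.0, 191.0, 194.5, 193.5]

b0, b1 = ols_fit(months, prices)
print(f"OLS line: price = {b0:.3f} + {b1:.4f} * months")
# OLS line: price = 189.738 + 0.0726 * months

# An "eyeballed" line (intercept 190, slope 0.1) has a larger sum of squared errors.
print(sse(months, prices, b0, b1) <= sse(months, prices, 190.0, 0.1))  # True
```

Any intercept and slope you try in place of the eyeballed guess will give a sum of squared errors at least as large as the OLS line's; that is what “least squares” means.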
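The following equation represents the mathematical relationship between an independent variable \(X\) and a dependent variable \(Y\):
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
where \(i\) indexes the observations and \(\varepsilon_i\) is the error term (the part of \(Y_i\) that the line does not explain).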
The \(\beta_1\) coefficient:
Acts as the “weight” that the independent variable has in predicting the dependent variable.
Tells you how much the dependent variable is expected to increase (or decrease) when the independent variable increases by one unit.
The \(\beta_0\) coefficient:
This is our intercept term.
It is the expected value of the dependent variable (the outcome you’re interested in predicting) when all of the independent (predictor) variables are zero.
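The two interpretations above can be checked numerically with the coefficients reported for the carrot example (intercept 189.513, slope 0.0728323 per month). This is a small illustrative sketch; the function name `predicted_price` is hypothetical.

```python
# Interpreting the trend-line coefficients from the carrot-price example:
#   beta0 = 189.513 (intercept), beta1 = 0.0728323 (slope, per month)
BETA0 = 189.513
BETA1 = 0.0728323


def predicted_price(months_since_crisis):
    """Trend-line prediction: Y_hat = beta0 + beta1 * X (hypothetical helper)."""
    return BETA0 + BETA1 * months_since_crisis


# beta0: the predicted price when X = 0 (the month of the financial crisis)
print(predicted_price(0))  # 189.513

# beta1: the change in the predicted price when X increases by one month
print(predicted_price(101) - predicted_price(100))  # about 0.0728

# Over 12 months, the trend implies a change of 12 * beta1
print(12 * BETA1)  # about 0.874
```

Because the line is straight, the one-month change is the same (\(\beta_1\)) no matter which month you start from.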
A. months_since_crisis
B. 0.0728323
C. Avg. Carrot price
D. 189.513
\[ Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \]
What does \(\beta_0\) represent? How can we interpret it? What does it mean for how the trend line will appear in our visualization?
What does \(\beta_1\) represent? How can we interpret it? What does it mean for how the trend line will appear in our visualization?
\(R^2\) is a statistical measure of how well the model “fits” the data.
It tells us how much of the variation in the \(Y\) variable is explained by the \(X\) variable(s).
Higher \(R^2\) = better fit; \(R^2\) = 1 \(\Rightarrow\) all variation is explained. Why is this important?
If you are trying to predict carrot prices based on the number of months since the financial crisis, the \(R^2\) will tell us how well our predictions match up with the actual prices.
If the \(R^2\) is 2%, it means that 2% of the variation in carrot prices is explained by a linear trend in the number of months since the financial crisis; the other 98% comes from factors the trend line does not capture.
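A minimal sketch of how \(R^2\) is computed for a simple trend line: \(R^2 = 1 - SS_{res} / SS_{tot}\), where \(SS_{res}\) is the variation left over around the fitted line and \(SS_{tot}\) is the total variation around the mean. The data are hypothetical and chosen only to show the calculation.

```python
# R^2 = 1 - SS_res / SS_tot for a simple OLS trend line.
# The data are HYPOTHETICAL, chosen only to illustrate the calculation.


def r_squared(x, y):
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Fit the OLS line with the closed-form formulas
    beta1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    beta0 = y_bar - beta1 * x_bar
    ss_res = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))  # unexplained
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)  # total variation around the mean
    return 1 - ss_res / ss_tot


months = [0, 12, 24, 36, 48, 60]
prices = [190.0, 189.5, 193.0, 191.0, 194.5, 193.5]
print(round(r_squared(months, prices), 3))  # 0.642
```

Here about 64% of the variation in these made-up prices is explained by the trend; an \(R^2\) of 0.02 would mean the line explains only 2%.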
We can use linear regression, along with our data, to answer:
Are carrot prices, on average, increasing or decreasing?
How much of the observed variation in carrot prices can be explained by a simple trend line?
What is the most reasonable prediction of carrot prices 250 months after the financial crisis?
The p-value Tableau reports by default is a measure of the significance of the slope parameter \(\beta_1\).
A p-value measures how surprising your data is under the assumption that there is no effect or no relationship.
Smaller p-value = stronger evidence against “no relationship”
A p-value of 0.05 or less is often considered statistically significant \(\Rightarrow\) the coefficient is statistically significantly different from 0.
In this example, a p-value of 0.02 means that, if there were truly no relationship (\(\beta_1 = 0\)), there would be only a 2% chance of estimating a slope at least as far from 0 as the one we found. We have strong evidence against the assumption that there is no relationship.
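The sketch below shows where a slope p-value comes from: divide the estimated slope by its standard error to get a t statistic, then ask how likely a value that extreme would be if \(\beta_1 = 0\). Two assumptions to flag: the monthly data are hypothetical (a trend plus a deterministic wiggle standing in for noise), and the p-value uses a large-sample normal approximation, whereas statistical software typically uses the exact t distribution with \(n - 2\) degrees of freedom.

```python
import math
from statistics import NormalDist


def slope_test(x, y):
    """Return (beta1, t statistic, approximate two-sided p-value) for H0: beta1 = 0."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    beta0 = y_bar - beta1 * x_bar
    ss_res = sum((yi - (beta0 + beta1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_beta1 = math.sqrt(ss_res / (n - 2) / sxx)  # standard error of the slope
    t_stat = beta1 / se_beta1
    # ASSUMPTION: large-sample normal approximation to the t distribution
    p_approx = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return beta1, t_stat, p_approx


# HYPOTHETICAL monthly data: upward trend of 0.07 per month plus a deterministic wiggle
months = list(range(240))
prices = [189.5 + 0.07 * m + 3.0 * math.sin(1.7 * m) for m in months]

b1, t, p = slope_test(months, prices)
print(f"beta1 = {b1:.4f}, t = {t:.1f}, approximate p-value = {p:.3g}")
```

With a clear trend and many observations the t statistic is large and the p-value is essentially 0; a weak trend buried in noise would give a small t and a large p-value.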
Note that clicking “describe trend line” will also give you a p-value for the intercept parameter.
A. Extremely well
B. Decent
C. Not well
A. Very
B. Somewhat
C. Not at all
How?
Continue/extend your trend beyond your time series
Tableau and other programs can do this for you automatically
You can also do this manually by plugging in values of your \(X\) variable(s)…
What is the most reasonable prediction of carrot prices 250 months after the financial crisis?
\[ \hat{Y} = 189.513 + 0.0728323 \times 250 = 207.721 \]
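Extending the trend, as Tableau does when you forecast past the end of the series, amounts to plugging several future values of \(X\) into the fitted equation. A minimal sketch using the example’s coefficients; the helper name `forecast` and the chosen months are illustrative.

```python
# Extend the fitted linear trend to future months, using the example's
# coefficients (beta0 = 189.513, beta1 = 0.0728323).
BETA0 = 189.513
BETA1 = 0.0728323


def forecast(months):
    """Return (month, predicted price) pairs from the linear trend (hypothetical helper)."""
    return [(m, BETA0 + BETA1 * m) for m in months]


for month, price in forecast([200, 225, 250]):
    print(month, round(price, 3))
# 200 204.079
# 225 205.9
# 250 207.721
```

A linear trend keeps rising by the same amount every month indefinitely, so forecasts far beyond the observed data deserve extra caution.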
How could we use these data to answer:
Are carrot prices, on average, increasing or decreasing?
How much of the observed variation in carrot prices can be explained by a simple trend line?
What is the most reasonable prediction of carrot prices 250 months after the financial crisis?
We will teach you enough to be dangerous.
We are introducing you to the tip of the analytical iceberg. Learn more before you put this into practice.
Incorrect analysis can lead to worse conclusions than no analysis.
CORRELATION \(\neq\) CAUSATION
Groups of 2
Choose an ag biz or enre management question to answer with time series data
Collect time series data
Analyze trends
Generate a forecast
Present results in a recorded video