Tangible Experience - Regression Line (TLS and OLS)
The GeoGebra applet introduces the concepts of Total Least Squares (TLS), also called Orthogonal Regression; however, its regression-line formula is based on the Ordinary Least Squares (OLS) method.
Minimizing the Shortest Distance
If we want to minimize the sum of the squared shortest (perpendicular) distances from each data point to the regression line, the approach is known as:
Total Least Squares (TLS) or Orthogonal Regression
- In TLS, errors are assumed in both the x- and y-values.
- The line is chosen to minimize the sum of the squared perpendicular distances between the observed data points and the regression line.
- This approach is common in:
  - Physics (e.g., fitting data with measurement errors in both variables).
  - Signal processing.
  - Geometric modeling.
- By contrast, Ordinary Least Squares (OLS) minimizes the sum of the squared vertical distances, $\sum_{i}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the value the line predicts at $x_i$.
- This makes sense when the x-values are considered error-free and y is modeled as a function of x.
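To make the contrast between the two criteria concrete, here is a minimal Python sketch (not the applet's internals) that fits both lines to the same small, hypothetical data set. The OLS slope is $S_{xy}/S_{xx}$; for TLS it uses the closed-form slope of orthogonal regression, obtained by minimizing the summed squared perpendicular distances, and it assumes $S_{xy} \neq 0$. The function names are illustrative only.

```python
import math


def centered_sums(xs, ys):
    """Return the means and the centered sums Sxx, Syy, Sxy."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return x_bar, y_bar, sxx, syy, sxy


def ols_line(xs, ys):
    """OLS: minimize the sum of squared vertical distances."""
    x_bar, y_bar, sxx, _, sxy = centered_sums(xs, ys)
    m = sxy / sxx
    return m, y_bar - m * x_bar  # the OLS line passes through (x_bar, y_bar)


def tls_line(xs, ys):
    """TLS / orthogonal regression: minimize squared perpendicular distances."""
    x_bar, y_bar, sxx, syy, sxy = centered_sums(xs, ys)
    if sxy == 0:
        raise ValueError("closed form below assumes Sxy != 0")
    # Positive root of Sxy*m^2 + (Sxx - Syy)*m - Sxy = 0 (the minimizing root).
    m = (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4 * sxy ** 2)) / (2 * sxy)
    return m, y_bar - m * x_bar  # the TLS line also passes through the centroid


def sum_sq_vertical(xs, ys, m, b):
    """Sum of squared vertical residuals y - (m x + b)."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))


def sum_sq_perpendicular(xs, ys, m, b):
    """Sum of squared perpendicular distances to the line y = m x + b."""
    # Distance from (x, y) to the line is |y - m x - b| / sqrt(1 + m^2).
    return sum((y - m * x - b) ** 2 for x, y in zip(xs, ys)) / (1 + m ** 2)


xs = [1, 2, 3, 4, 5]  # hypothetical data
ys = [2, 4, 5, 4, 5]
m_ols, b_ols = ols_line(xs, ys)
m_tls, b_tls = tls_line(xs, ys)
print(f"OLS: y = {m_ols:.4f}x + {b_ols:.4f}")
print(f"TLS: y = {m_tls:.4f}x + {b_tls:.4f}")
```

Each line wins on its own criterion: the OLS line has the smaller sum of squared vertical distances, while the TLS line has the smaller sum of squared perpendicular distances.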
Optimizing the Regression Line Using OLS: Minimizing Vertical Distances
How to Use This GeoGebra Build for the OLS Method
1.) Given the blue data points and the red line, which serves as an estimate of the best-fit (regression) line, use the sliders to guess the optimized slope and the optimized y-intercept.
(Hint: The regression line is the magenta line.)
2.) As you manually adjust the red line toward the best fit, observe and compare the non-optimized sum of squares (component a) with the least sum of squares (component o). Reflect on how the formula for the regression line is derived.
3.) Study the Algebra pane and find the command that computes or graphs the regression line for a given set of data points.
4.) Study the spreadsheet and try to replicate this build to experience the concepts of the regression line firsthand.
5.) Finally, for the beautiful symphony of algebra and calculus, watch the video below for the formal derivation of the regression line equation.
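Steps 2 and 4 can also be mimicked outside GeoGebra. The sketch below is a plain-Python stand-in for the spreadsheet, assuming columns x, y, x², and xy and the standard OLS formulas. The data points and the "slider" guess are hypothetical, and the variables `a` and `o` only mirror the applet's component names, since the applet's exact definitions are not reproduced here.

```python
def regression_table(points):
    """Build spreadsheet-style rows (x, y, x^2, xy) and return n and the column sums."""
    rows = [(x, y, x * x, x * y) for x, y in points]
    n = len(rows)
    sum_x = sum(r[0] for r in rows)
    sum_y = sum(r[1] for r in rows)
    sum_x2 = sum(r[2] for r in rows)
    sum_xy = sum(r[3] for r in rows)
    return n, sum_x, sum_y, sum_x2, sum_xy


def least_squares_line(points):
    """OLS slope and y-intercept from the column sums (needs at least two distinct x)."""
    n, sx, sy, sx2, sxy = regression_table(points)
    m = (n * sxy - sx * sy) / (n * sx2 - sx ** 2)
    b = (sy - m * sx) / n
    return m, b


def sum_of_squares(points, m, b):
    """Sum of squared vertical residuals for the line y = m x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in points)


points = [(1, 2), (2, 4), (3, 5), (4, 4), (5, 5)]  # hypothetical data
guess_m, guess_b = 0.8, 1.5                        # hypothetical slider values

a = sum_of_squares(points, guess_m, guess_b)       # non-optimized sum of squares
m_opt, b_opt = least_squares_line(points)
o = sum_of_squares(points, m_opt, b_opt)           # least sum of squares

print(f"guess:   y = {guess_m}x + {guess_b}, a = {a:.3f}")
print(f"optimum: y = {m_opt:.3f}x + {b_opt:.3f}, o = {o:.3f}")
```

For these five points the least-squares line is y = 0.6x + 2.2 with a least sum of squares of 2.4; the guessed line gives 2.85, and no other slope and intercept can go below 2.4.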
Simplify the optimized formulas for the slope and the y-intercept (b) until they match the formulas used in the GeoGebra applet above.
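One standard way to carry out this simplification is sketched below, writing the line as $y = mx + b$ (the slope symbol $m$ is an assumption; the text only names the y-intercept $b$). Setting both partial derivatives of the sum of squares to zero gives the normal equations, which simplify to the closed-form slope and intercept. The video's derivation may proceed in a different order.

```latex
S(m, b) = \sum_{i=1}^{n} \left( y_i - (m x_i + b) \right)^2

\frac{\partial S}{\partial b} = -2 \sum_{i=1}^{n} \left( y_i - m x_i - b \right) = 0
\;\Longrightarrow\; b = \bar{y} - m \bar{x}

\frac{\partial S}{\partial m} = -2 \sum_{i=1}^{n} x_i \left( y_i - m x_i - b \right) = 0
\;\Longrightarrow\; \sum x_i y_i = m \sum x_i^2 + b \sum x_i

m = \frac{n \sum x_i y_i - \sum x_i \sum y_i}{n \sum x_i^2 - \left( \sum x_i \right)^2}
  = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
```

Substituting $b = \bar{y} - m\bar{x}$ into the second normal equation and multiplying through by $n$ is the step that produces the first form of $m$.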
Derive the Formulae for Simple Linear Regression
Red line
1. What does the red line in the GeoGebra build represent?
Estimates vs Optimized Values of Slope and Y-intercept
2. When deriving the regression line manually, what should be compared?
Derivation of the Formulas of the Regression Line
In the context of the regression line equation, what is expected after simplifying the optimized formulas for the slope and y-intercept (b)?
Regression line and the data points
How does the regression line relate to the data points in the scatter plot?
Non-optimized sum of squares and the Least sum of squares
What is the significance of comparing the non-optimized sum of squares with the least sum of squares in the context of regression analysis?
y-intercept (b) in the regression line equation
What is the role of the y-intercept (b) in the regression line equation?