Least-Squares Method

General Least-Squares Problem

Suppose $A$ is an $m \times n$ matrix and $\mathbf{b}$ is a column vector in $\mathbb{R}^m$. We consider the equation $A\mathbf{x} = \mathbf{b}$. It is well known that this equation has no solution if $\mathbf{b}$ is not in $\operatorname{Col} A$. But still we would like to find a vector $\hat{\mathbf{x}}$ such that $A\hat{\mathbf{x}}$ is closest to $\mathbf{b}$. In other words, we need to minimize $\|\mathbf{b} - A\mathbf{x}\|$ among all vectors $\mathbf{x}$ in $\mathbb{R}^n$. This is called the general least-squares problem. The term "least-squares" comes from the fact that $\|\mathbf{b} - A\mathbf{x}\|$ is the square root of a sum of squares.

Definition: A vector $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$ if $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

We can visualize the least-squares problem as follows: Let $W = \operatorname{Col} A$, which is a subspace of $\mathbb{R}^m$. Let $\mathbf{b}$ be a vector not in $W$. Thanks to the Best Approximation Theorem, $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$ is the vector in $W$ such that $\|\mathbf{b} - \hat{\mathbf{b}}\| \le \|\mathbf{b} - \mathbf{w}\|$ for any $\mathbf{w}$ in $W$. Since $\hat{\mathbf{b}}$ is in $\operatorname{Col} A$, the equation $A\mathbf{x} = \hat{\mathbf{b}}$ must be consistent. Let $\hat{\mathbf{x}}$ be a solution, i.e. $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. Also, for any $\mathbf{w}$ in $W$, $\mathbf{w} = A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. Then the above inequality can be rewritten as follows: $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$, for any $\mathbf{x}$ in $\mathbb{R}^n$. Therefore, $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$.

Instead of computing $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$ and solving the equation $A\mathbf{x} = \hat{\mathbf{b}}$ for the least-squares solution, we have a better way to compute the least-squares solution directly. Since $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$, $\mathbf{b} - \hat{\mathbf{b}}$ is in $W^\perp = (\operatorname{Col} A)^\perp$. And $(\operatorname{Col} A)^\perp = \operatorname{Nul} A^T$. Hence, we have $A^T(\mathbf{b} - \hat{\mathbf{b}}) = \mathbf{0}$. Let $\hat{\mathbf{x}}$ be a least-squares solution, i.e. $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$; then $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, which gives $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$. That is to say, a least-squares solution $\hat{\mathbf{x}}$ must satisfy the equation $A^TA\mathbf{x} = A^T\mathbf{b}$. It is called the normal equations for $A\mathbf{x} = \mathbf{b}$.

Conversely, given any solution $\hat{\mathbf{x}}$ to the normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$, we have $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, i.e. $\mathbf{b} - A\hat{\mathbf{x}}$ is in $(\operatorname{Col} A)^\perp$, and $\mathbf{b} = A\hat{\mathbf{x}} + (\mathbf{b} - A\hat{\mathbf{x}})$. Hence, this is the orthogonal decomposition of $\mathbf{b}$ such that $A\hat{\mathbf{x}}$ is in $\operatorname{Col} A$ and $\mathbf{b} - A\hat{\mathbf{x}}$ is in $(\operatorname{Col} A)^\perp$. By the uniqueness of the orthogonal decomposition, $A\hat{\mathbf{x}} = \operatorname{proj}_{\operatorname{Col} A}\mathbf{b}$, i.e. $\hat{\mathbf{x}}$ is a least-squares solution. In short, we can solve the normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$ to find all the least-squares solution(s).
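To see the normal equations in action, here is a minimal NumPy sketch (not part of the original activity); the matrix $A$ and vector $\mathbf{b}$ are hypothetical values chosen only for illustration.

```python
import numpy as np

# A hypothetical inconsistent system: A has more rows than columns,
# and b does not lie in Col A, so Ax = b has no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations  A^T A x = A^T b  for a least-squares solution.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check with NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("least-squares solution from normal equations:", x_hat)
print("least-squares solution from np.linalg.lstsq: ", x_lstsq)
print("residual norm ||b - A x_hat||:", np.linalg.norm(b - A @ x_hat))
```

Both approaches should return the same vector whenever $A^TA$ is invertible; the residual norm reported at the end is the minimized distance $\|\mathbf{b} - A\hat{\mathbf{x}}\|$.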

Exercise

Find a least-squares solution of the inconsistent linear system.

An Application - Linear Regression

Suppose we have $n$ pairs of experimental data for the two variable quantities $x$ and $y$: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. And we assume that $y$ is theoretically related to $x$ by a linear equation $y = \beta_0 + \beta_1 x$, where $\beta_0$ and $\beta_1$ are unknown parameters.

Question: How can we find the parameters $\beta_0$ and $\beta_1$ such that the line $y = \beta_0 + \beta_1 x$ best fits the experimental data?

Given the parameters $\beta_0$ and $\beta_1$, we compute the predicted y-value of $x_j$ by the line $y = \beta_0 + \beta_1 x$: $\beta_0 + \beta_1 x_j$ for $j = 1, \ldots, n$. Expressing them in terms of matrices, we have $X\boldsymbol{\beta} = \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{pmatrix}$, where $X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$, $\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$, and $\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$. $X$, $\boldsymbol{\beta}$, and $\mathbf{y}$ are called the design matrix, parameter vector, and observation vector respectively.

Therefore, each entry of the vector $\mathbf{y} - X\boldsymbol{\beta}$ is the difference between an observed y-value and a predicted y-value, which is called a residual. The usual way to find the "best-fitted" line is to minimize $\|\mathbf{y} - X\boldsymbol{\beta}\|^2$, the sum of the squares of the residuals. Such a best-fitted line is called a line of regression of $y$ on $x$. And the parameters $\beta_0$ and $\beta_1$ are called (linear) regression coefficients. Equivalently, it means finding the least-squares solution to the equation $X\boldsymbol{\beta} = \mathbf{y}$! In other words, we can find the regression coefficients by solving the normal equations $X^TX\boldsymbol{\beta} = X^T\mathbf{y}$:

$X^TX = \begin{pmatrix} n & \sum_j x_j \\ \sum_j x_j & \sum_j x_j^2 \end{pmatrix}, \qquad X^T\mathbf{y} = \begin{pmatrix} \sum_j y_j \\ \sum_j x_j y_j \end{pmatrix}.$

Assume $n\sum_j x_j^2 - \left(\sum_j x_j\right)^2 \ne 0$; then the $2 \times 2$ matrix $X^TX$ is invertible and the least-squares solution can be evaluated. The formulas for $\beta_0$ and $\beta_1$ are as follows:

$\beta_1 = \dfrac{n\sum_j x_j y_j - \left(\sum_j x_j\right)\left(\sum_j y_j\right)}{n\sum_j x_j^2 - \left(\sum_j x_j\right)^2}, \qquad \beta_0 = \bar{y} - \beta_1\bar{x},$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_j$ and the $y_j$ respectively.

Remark: $n\sum_j x_j^2 - \left(\sum_j x_j\right)^2 = n^2\operatorname{Var}(x)$, where $\operatorname{Var}(x)$ is the variance of $x$ in the data. That is to say, the $2 \times 2$ matrix $X^TX$ is invertible when not all $x_j$ are equal.
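As a small illustration (again, not part of the original activity), the following NumPy sketch computes the regression coefficients both by solving the normal equations and by the closed-form slope/intercept formulas above; the data points are hypothetical values used only for demonstration.

```python
import numpy as np

# Hypothetical experimental data (x_j, y_j), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X: a column of ones (for beta_0) and the column of x-values.
X = np.column_stack([np.ones_like(x), x])

# Regression coefficients from the normal equations  X^T X beta = X^T y.
beta0, beta1 = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form slope and intercept for comparison.
n = len(x)
slope = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
intercept = y.mean() - slope * x.mean()

print("from normal equations: beta0 =", beta0, " beta1 =", beta1)
print("from closed formulas:  beta0 =", intercept, " beta1 =", slope)
```

The two computations agree as long as not all $x_j$ are equal, which is exactly the invertibility condition on $X^TX$ discussed in the remark above.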

Example of Linear Regression