Least-Squares Method

General Least-Squares Problem

Suppose $A$ is an $m \times n$ matrix and $\mathbf{b}$ is a column vector in $\mathbb{R}^m$. We consider the equation $A\mathbf{x} = \mathbf{b}$. It is well known that this equation has no solution if $\mathbf{b}$ is not in $\operatorname{Col} A$. But still we would like to find a vector $\hat{\mathbf{x}}$ such that $A\hat{\mathbf{x}}$ is closest to $\mathbf{b}$. In other words, we need to minimize $\|\mathbf{b} - A\mathbf{x}\|$ among all vectors $\mathbf{x}$ in $\mathbb{R}^n$. This is called the general least-squares problem. The term "least-squares" comes from the fact that $\|\mathbf{b} - A\mathbf{x}\|$ is the square root of a sum of squares.

Definition: A vector $\hat{\mathbf{x}}$ in $\mathbb{R}^n$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$ if $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$ for all $\mathbf{x}$ in $\mathbb{R}^n$.

We can visualize the least-squares problem as follows: Let $W = \operatorname{Col} A$, which is a subspace of $\mathbb{R}^m$. Let $\mathbf{b}$ be a vector not in $W$. Thanks to the Best Approximation Theorem, $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$ is the vector in $W$ such that $\|\mathbf{b} - \hat{\mathbf{b}}\| \le \|\mathbf{b} - \mathbf{w}\|$ for any $\mathbf{w}$ in $W$. Since $\hat{\mathbf{b}}$ is in $\operatorname{Col} A$, the equation $A\mathbf{x} = \hat{\mathbf{b}}$ must be consistent. Let $\hat{\mathbf{x}}$ be a solution, i.e. $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$. Also, for any $\mathbf{w}$ in $W$, $\mathbf{w} = A\mathbf{x}$ for some $\mathbf{x}$ in $\mathbb{R}^n$. Then the above inequality can be rewritten as follows: $\|\mathbf{b} - A\hat{\mathbf{x}}\| \le \|\mathbf{b} - A\mathbf{x}\|$, for any $\mathbf{x}$ in $\mathbb{R}^n$. Therefore, $\hat{\mathbf{x}}$ is a least-squares solution of $A\mathbf{x} = \mathbf{b}$.

Instead of computing $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$ and solving the equation $A\mathbf{x} = \hat{\mathbf{b}}$ for the least-squares solution, we have a better way to compute the least-squares solution directly. Since $\hat{\mathbf{b}} = \operatorname{proj}_W \mathbf{b}$, $\mathbf{b} - \hat{\mathbf{b}}$ is in $W^\perp = (\operatorname{Col} A)^\perp$. And $(\operatorname{Col} A)^\perp = \operatorname{Nul} A^T$. Hence, we have $A^T(\mathbf{b} - \hat{\mathbf{b}}) = \mathbf{0}$. Let $\hat{\mathbf{x}}$ be a least-squares solution, i.e. $A\hat{\mathbf{x}} = \hat{\mathbf{b}}$; then $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, which gives $A^TA\hat{\mathbf{x}} = A^T\mathbf{b}$. That is to say, a least-squares solution $\hat{\mathbf{x}}$ must satisfy the equation $A^TA\mathbf{x} = A^T\mathbf{b}$. It is called the normal equations for $A\mathbf{x} = \mathbf{b}$.

Conversely, given any solution $\hat{\mathbf{x}}$ to the normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$, we have $A^T(\mathbf{b} - A\hat{\mathbf{x}}) = \mathbf{0}$, i.e. $\mathbf{b} - A\hat{\mathbf{x}}$ is in $(\operatorname{Col} A)^\perp$, and $\mathbf{b} = A\hat{\mathbf{x}} + (\mathbf{b} - A\hat{\mathbf{x}})$. Hence, this is the orthogonal decomposition of $\mathbf{b}$ such that $A\hat{\mathbf{x}}$ is in $\operatorname{Col} A$ and $\mathbf{b} - A\hat{\mathbf{x}}$ is in $(\operatorname{Col} A)^\perp$. By the uniqueness of the orthogonal decomposition, $A\hat{\mathbf{x}} = \operatorname{proj}_{\operatorname{Col} A}\mathbf{b}$, i.e. $\hat{\mathbf{x}}$ is a least-squares solution. In short, we can solve the normal equations $A^TA\mathbf{x} = A^T\mathbf{b}$ to find all the least-squares solution(s).
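To see the normal equations in action, here is a minimal NumPy sketch (not part of the original activity); the matrix $A$ and vector $\mathbf{b}$ are hypothetical values chosen only for illustration.

```python
import numpy as np

# A hypothetical inconsistent system: A has more rows than columns,
# and b does not lie in Col A, so Ax = b has no exact solution.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([6.0, 0.0, 0.0])

# Solve the normal equations  A^T A x = A^T b  for a least-squares solution.
x_hat = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check with NumPy's built-in least-squares solver.
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print("least-squares solution from normal equations:", x_hat)
print("least-squares solution from np.linalg.lstsq: ", x_lstsq)
print("residual norm ||b - A x_hat||:", np.linalg.norm(b - A @ x_hat))
```

Both approaches should return the same vector whenever $A^TA$ is invertible; the residual norm reported at the end is the minimized distance $\|\mathbf{b} - A\hat{\mathbf{x}}\|$.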

Exercise

Find a least-squares solution of the inconsistent linear system.

An Application - Linear Regression

Suppose we have $n$ pairs of experimental data for the two variable quantities $x$ and $y$: $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. And we assume that $y$ is theoretically related to $x$ by a linear equation $y = \beta_0 + \beta_1 x$, where $\beta_0$ and $\beta_1$ are unknown parameters.

Question: How can we find the parameters $\beta_0$ and $\beta_1$ such that the line $y = \beta_0 + \beta_1 x$ best fits the experimental data?

Given the parameters $\beta_0$ and $\beta_1$, we compute the predicted y-value of $x_j$ by the line $y = \beta_0 + \beta_1 x$: $\beta_0 + \beta_1 x_j$ for $j = 1, \ldots, n$. Expressing them in terms of matrices, we have $X\boldsymbol{\beta} = \begin{pmatrix} \beta_0 + \beta_1 x_1 \\ \vdots \\ \beta_0 + \beta_1 x_n \end{pmatrix}$, where $X = \begin{pmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{pmatrix}$, $\boldsymbol{\beta} = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}$, and $\mathbf{y} = \begin{pmatrix} y_1 \\ \vdots \\ y_n \end{pmatrix}$. $X$, $\boldsymbol{\beta}$, and $\mathbf{y}$ are called the design matrix, parameter vector, and observation vector respectively.

Therefore, each entry of the vector $\mathbf{y} - X\boldsymbol{\beta}$ is the difference between an observed y-value and a predicted y-value, which is called a residual. The usual way to find the "best-fitted" line is to minimize $\|\mathbf{y} - X\boldsymbol{\beta}\|^2$, the sum of the squares of the residuals. Such a best-fitted line is called a line of regression of $y$ on $x$. And the parameters $\beta_0$ and $\beta_1$ are called (linear) regression coefficients. Equivalently, it means finding the least-squares solution to the equation $X\boldsymbol{\beta} = \mathbf{y}$! In other words, we can find the regression coefficients by solving the normal equations $X^TX\boldsymbol{\beta} = X^T\mathbf{y}$:

$X^TX = \begin{pmatrix} n & \sum_j x_j \\ \sum_j x_j & \sum_j x_j^2 \end{pmatrix}, \qquad X^T\mathbf{y} = \begin{pmatrix} \sum_j y_j \\ \sum_j x_j y_j \end{pmatrix}.$

Assume $n\sum_j x_j^2 - \left(\sum_j x_j\right)^2 \ne 0$; then the $2 \times 2$ matrix $X^TX$ is invertible and the least-squares solution can be evaluated. The formulas for $\beta_0$ and $\beta_1$ are as follows:

$\beta_1 = \dfrac{n\sum_j x_j y_j - \left(\sum_j x_j\right)\left(\sum_j y_j\right)}{n\sum_j x_j^2 - \left(\sum_j x_j\right)^2}, \qquad \beta_0 = \bar{y} - \beta_1\bar{x},$

where $\bar{x}$ and $\bar{y}$ are the means of the $x_j$ and the $y_j$ respectively.

Remark: $n\sum_j x_j^2 - \left(\sum_j x_j\right)^2 = n^2\operatorname{Var}(x)$, where $\operatorname{Var}(x)$ is the variance of $x$ in the data. That is to say, the $2 \times 2$ matrix $X^TX$ is invertible when not all $x_j$ are equal.
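As a small illustration (again, not part of the original activity), the following NumPy sketch computes the regression coefficients both by solving the normal equations and by the closed-form slope/intercept formulas above; the data points are hypothetical values used only for demonstration.

```python
import numpy as np

# Hypothetical experimental data (x_j, y_j), for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix X: a column of ones (for beta_0) and the column of x-values.
X = np.column_stack([np.ones_like(x), x])

# Regression coefficients from the normal equations  X^T X beta = X^T y.
beta0, beta1 = np.linalg.solve(X.T @ X, X.T @ y)

# Closed-form slope and intercept for comparison.
n = len(x)
slope = (n * np.sum(x * y) - x.sum() * y.sum()) / (n * np.sum(x**2) - x.sum()**2)
intercept = y.mean() - slope * x.mean()

print("from normal equations: beta0 =", beta0, " beta1 =", beta1)
print("from closed formulas:  beta0 =", intercept, " beta1 =", slope)
```

The two computations agree as long as not all $x_j$ are equal, which is exactly the invertibility condition on $X^TX$ discussed in the remark above.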

Example of Linear Regression