Interpreting the Hessian Matrix to Identify a Stable Point

We've so far reduced the problem of identifying a stable point as a minimum, a maximum, or a saddle point to determining whether the term $\frac{1}{2}\left(a\,\Delta x^{2}+2b\,\Delta x\,\Delta y+c\,\Delta y^{2}\right)$ is always positive, always negative, or sometimes positive and sometimes negative. If you know some matrix multiplication, you'll be able to see that the term above is equal to:
\[
\frac{1}{2}\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}\begin{pmatrix}a & b\\ b & c\end{pmatrix}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}
\]
And according to how we defined $a$, $b$, and $c$ before, $\begin{pmatrix}a & b\\ b & c\end{pmatrix}=\begin{pmatrix}f_{xx} & f_{xy}\\ f_{xy} & f_{yy}\end{pmatrix}$, which is exactly what the Hessian matrix is in the two-variable case. But we'll just stick with $a$, $b$, and $c$ for the sake of concision. So, if we ignore the factor $\frac{1}{2}$ temporarily (because it does not change the term's positivity or negativity), we get $\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}\begin{pmatrix}a & b\\ b & c\end{pmatrix}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}$. But what does this actually mean? In the simpler case, where $b$ equals 0, we have $\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}\begin{pmatrix}a & 0\\ 0 & c\end{pmatrix}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}=a\,\Delta x^{2}+c\,\Delta y^{2}$. Imagine the process going on here as this: there are two inputs, one is our diagonal matrix $\begin{pmatrix}a & 0\\ 0 & c\end{pmatrix}$ (a matrix in which the entries outside the main diagonal are all zero), the other is our X-Y coordinate system. Our algorithm then spits out a function $z=a\,x^{2}+c\,y^{2}$, the height $z$ for every point $(x, y)$.
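If you'd like to check that equality with concrete numbers, here is a minimal sketch of my own (the values of $a$, $b$, $c$ and the displacement are arbitrary placeholders, not from the article) showing that the scalar form and the matrix product agree:

```python
# Numeric check: a*dx**2 + 2*b*dx*dy + c*dy**2 equals
# [dx dy] [[a, b], [b, c]] [dx dy]^T for sample values.
import numpy as np

a, b, c = 2.0, 1.5, -0.5          # placeholder second-derivative values
dx, dy = 0.3, -0.7                # a sample displacement from the stable point

H = np.array([[a, b],
              [b, c]])            # the symmetric matrix built from a, b, c
d = np.array([dx, dy])

scalar_form = a * dx**2 + 2 * b * dx * dy + c * dy**2
matrix_form = d @ H @ d           # row vector times H times column vector

print(scalar_form, matrix_form)   # the two numbers agree (up to rounding)
assert np.isclose(scalar_form, matrix_form)
```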
[Interactive GeoGebra applet: the surface $z=a\,x^{2}+c\,y^{2}$.]

Alter a and c and observe the shape of the curved surface from different angles.
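If the applet doesn't load for you, the following small sketch (mine, using matplotlib with arbitrary sample values of $a$ and $c$) draws the same surface so you can rotate it locally:

```python
# Reproduce the applet's surface z = a*x**2 + c*y**2 offline.
import numpy as np
import matplotlib.pyplot as plt

a, c = 1.0, -2.0                               # try (+,+), (-,-) and mixed signs
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = a * X**2 + c * Y**2

fig = plt.figure()
ax = fig.add_subplot(projection="3d")          # 3D axes for the surface
ax.plot_surface(X, Y, Z, cmap="viridis")
ax.set_xlabel("x"); ax.set_ylabel("y"); ax.set_zlabel("z")
plt.show()
```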

I hope you get an intuition of what a diagonal matrix is really doing to the coordinate system. It is not acting on the X-Y plane by stretching the X-axis by a factor of $a$ and the Y-axis by a factor of $c$. Instead, if you think of the initial surface as $z=x^{2}+y^{2}$, the matrix stretches the parabola $z=x^{2}$ on the z-X plane along the vertical direction by a factor of $a$ and the parabola $z=y^{2}$ on the z-Y plane by a factor of $c$, thus giving out $z=a\,x^{2}+c\,y^{2}$. And that's why the curved surface is symmetric both with respect to the z-X plane and with respect to the z-Y plane. Check this by observing the surface from directly above. It's obvious that $a\,x^{2}+c\,y^{2}>0$ is always true (away from the origin) when $a>0$ and $c>0$, and so is $a\,x^{2}+c\,y^{2}<0$ when $a<0$ and $c<0$, while $a\,x^{2}+c\,y^{2}$ can be both positive and negative when $a$ and $c$ have opposite signs. So extending this idea back to $\Delta x$ and $\Delta y$, we can conclude that: if $a>0$ and $c>0$, the stable point is a minimum; if $a<0$ and $c<0$, the stable point is a maximum; if $ac<0$, the stable point is a saddle point.
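The sign rule for this diagonal case is simple enough to restate as a tiny helper; the function name and the sample calls below are hypothetical, just echoing the conclusion above:

```python
# Classify the stable point of z = a*x**2 + c*y**2 (the b = 0 case).
def classify_diagonal(a: float, c: float) -> str:
    if a > 0 and c > 0:
        return "minimum"        # a*x**2 + c*y**2 > 0 away from the origin
    if a < 0 and c < 0:
        return "maximum"        # a*x**2 + c*y**2 < 0 away from the origin
    if a * c < 0:
        return "saddle point"   # positive along one axis, negative along the other
    return "inconclusive"       # a zero coefficient: this second-order test fails

print(classify_diagonal(2, 3))   # minimum
print(classify_diagonal(1, -4))  # saddle point
```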
We are finally getting to the complicated case where $b$ is not zero. There's this matrix $\begin{pmatrix}a & b\\ b & c\end{pmatrix}$ as our input, but we don't really understand what it's doing. Why not change it into a diagonal matrix? That's the fundamental idea behind eigenvalues and eigenvectors. A quick reminder: if an eigenvector of a matrix $A$ is $\vec{v}$, and its corresponding eigenvalue is $\lambda$, we'll have $A\vec{v}=\lambda\vec{v}$, which means that the only thing the matrix does to the eigenvector is stretch it by a factor of $\lambda$. Computationally, we can prove that our symmetric matrix has two real eigenvalues and that the two corresponding eigenvectors are perpendicular to each other. I'll leave the proof below. If it does not interest you, you can skip it and just keep the two conclusions in mind.
[Image: proof that the symmetric matrix has two real eigenvalues and that its eigenvectors are perpendicular.]
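For reference, the usual textbook computation behind those two facts goes like this (my own wording; it may differ in detail from the proof in the image):

```latex
% Real eigenvalues: the characteristic polynomial of the symmetric matrix is
\[
\det\begin{pmatrix} a-\lambda & b \\ b & c-\lambda \end{pmatrix}
  = \lambda^{2} - (a+c)\lambda + (ac - b^{2}) = 0
  \quad\Longrightarrow\quad
  \lambda_{1,2} = \frac{(a+c) \pm \sqrt{(a-c)^{2} + 4b^{2}}}{2},
\]
% and the discriminant (a-c)^2 + 4b^2 is never negative, so both roots are real.
% Perpendicular eigenvectors (when the eigenvalues differ): if H v1 = lambda_1 v1
% and H v2 = lambda_2 v2, then symmetry of H gives
\[
\lambda_{1}\,(\vec{v}_{1}\cdot\vec{v}_{2})
  = (H\vec{v}_{1})\cdot\vec{v}_{2}
  = \vec{v}_{1}\cdot(H\vec{v}_{2})
  = \lambda_{2}\,(\vec{v}_{1}\cdot\vec{v}_{2}),
\]
% so (lambda_1 - lambda_2)(v1 . v2) = 0, forcing v1 . v2 = 0. (If the eigenvalues
% coincide, then b = 0 and a = c, so every vector is an eigenvector and a
% perpendicular pair can simply be chosen.)
```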
Because the two eigenvectors are perpendicular to each other, we can choose them as unit vectors $\vec{v}_1$ and $\vec{v}_2$ so that they look just like the unit vectors $\hat{i}$ and $\hat{j}$ of the X-Y coordinate system rotated by some angle $\theta$. (Let $\vec{v}_2$ be $\vec{v}_1$ rotated 90 degrees counterclockwise, so $\vec{v}_1=\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}$ and $\vec{v}_2=\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}$.)

You can alter θ.
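As a quick numeric sanity check (my own sketch, with an arbitrary choice of $a$, $b$, $c$), the unit eigenvectors returned by a standard eigen-solver can indeed be arranged into exactly such a rotation:

```python
# Check that the orthonormal eigenvectors of the symmetric matrix form a
# rotation matrix [[cos t, -sin t], [sin t, cos t]] for some angle t.
import numpy as np

a, b, c = 2.0, 1.0, -1.0
H = np.array([[a, b],
              [b, c]])

eigenvalues, P = np.linalg.eigh(H)       # columns of P are orthonormal eigenvectors
if np.linalg.det(P) < 0:                 # flip one eigenvector so P is a pure rotation
    P[:, 1] *= -1

theta = np.arctan2(P[1, 0], P[0, 0])     # angle of the first eigenvector
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(P, R))                 # True: P is a rotation by theta
print(np.allclose(P.T @ P, np.eye(2)))   # True: perpendicular unit eigenvectors
```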

Now, we can change the basis of our matrix into the two unit eigenvectors $\vec{v}_1$ and $\vec{v}_2$. As we see above, $\vec{v}_1=\begin{pmatrix}\cos\theta\\ \sin\theta\end{pmatrix}$ and $\vec{v}_2=\begin{pmatrix}-\sin\theta\\ \cos\theta\end{pmatrix}$. Putting them into the columns of a matrix, we get $P=\begin{pmatrix}\cos\theta & -\sin\theta\\ \sin\theta & \cos\theta\end{pmatrix}$, which transforms the original basis vectors $\hat{i}$ and $\hat{j}$ into $\vec{v}_1$ and $\vec{v}_2$, and which is just a rotation matrix of $\theta$ degrees. Applying the change of basis, we know that
\[
\begin{pmatrix}a & b\\ b & c\end{pmatrix}=P\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}.
\]
Why is this true? (If you already understand the change of basis, skip to "We're almost there".)

First, $P^{-1}$ is the inverse matrix of $P$, which means applying it should transform the new basis vectors $\vec{v}_1$ and $\vec{v}_2$ back to the old ones $\hat{i}$ and $\hat{j}$. If you place yourself inside the new coordinate system made up of $\vec{v}_1$ and $\vec{v}_2$, you'll think that $\hat{i}=\begin{pmatrix}\cos\theta\\ -\sin\theta\end{pmatrix}$ and $\hat{j}=\begin{pmatrix}\sin\theta\\ \cos\theta\end{pmatrix}$, which is where you want your basis vectors to go. So you put them into the columns of a matrix and get $P^{-1}=\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}$, which transforms your basis $\vec{v}_1$ and $\vec{v}_2$ back to $\hat{i}$ and $\hat{j}$. Or maybe you'd like to think of it in another way: the inverse of $P$ is just rotating the basis vectors back by $\theta$ degrees, which is a rotation by $-\theta$ degrees, so the inverse matrix is just $\begin{pmatrix}\cos(-\theta) & -\sin(-\theta)\\ \sin(-\theta) & \cos(-\theta)\end{pmatrix}=\begin{pmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{pmatrix}$.

The change of basis amounts to viewing a linear transformation from different perspectives. Under the original coordinate system, we have the matrix $\begin{pmatrix}a & b\\ b & c\end{pmatrix}$, but a new coordinate system which has $\vec{v}_1$ and $\vec{v}_2$ as its basis vectors would think that it was put under the linear transformation $\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}$, its coordinate axes each stretched by the corresponding eigenvalue. We now interpret the right-hand side, $P\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}$. Let's say you want to know the coordinates of some vector $\vec{w}$ in the new coordinate system: they are $P^{-1}\vec{w}$, and from the perspective of your new coordinate system that is the vector being transformed. Next, the new coordinate axes are each just stretched by the corresponding eigenvalue, giving us $\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}\vec{w}$, which is where our observed vector lands, however still in the language of your new coordinate system. The old coordinate system would rather express it as $P\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}\vec{w}$. Multiplying this whole bunch of things together, we get where $\vec{w}$ lands after the linear transformation, $P\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}\vec{w}$, which is exactly the same as $\begin{pmatrix}a & b\\ b & c\end{pmatrix}\vec{w}$. Therefore we've now understood the equation.

We're almost there. Remember that we were trying to figure out whether the term $\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}\begin{pmatrix}a & b\\ b & c\end{pmatrix}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}$ is positive or negative? We now translate it into the coordinate system made up of the eigenvectors, to bring in a diagonal matrix that we are perfectly comfortable with:
\[
\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}P\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}
\]
This looks very symmetric: because $P$ is a rotation, $P^{T}=P^{-1}$, so the row vector on the left, $\begin{pmatrix}\Delta x & \Delta y\end{pmatrix}P$, is just the transpose of $P^{-1}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}$, and the whole term can be written as
\[
\left(P^{-1}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}\right)^{\!T}\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}\left(P^{-1}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}\right).
\]
If we let $\begin{pmatrix}m\\ n\end{pmatrix}=P^{-1}\begin{pmatrix}\Delta x\\ \Delta y\end{pmatrix}$, we shall see that:
\[
\begin{pmatrix}m & n\end{pmatrix}\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}\begin{pmatrix}m\\ n\end{pmatrix}=\lambda_1 m^{2}+\lambda_2 n^{2}.
\]
Is this familiar to you? Under the new coordinate system (in which the coordinate axes are still perpendicular to each other), the function can be rewritten as $z=\lambda_1 m^{2}+\lambda_2 n^{2}$, where we have $\vec{v}_1$ as the $m$-axis and $\vec{v}_2$ as the $n$-axis. The parabola $z=m^{2}$ on the z-m plane is stretched along the vertical direction by a factor of $\lambda_1$, and the parabola $z=n^{2}$ on the z-n plane by a factor of $\lambda_2$. Also, the curved surface is symmetric with respect to both the z-m plane and the z-n plane (observe the surface from directly above).
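Here is a short numeric check (again my own sketch, with arbitrary values) of both steps at once: that $\begin{pmatrix}a & b\\ b & c\end{pmatrix}=P\begin{pmatrix}\lambda_1 & 0\\ 0 & \lambda_2\end{pmatrix}P^{-1}$, and that the quadratic term really equals $\lambda_1 m^{2}+\lambda_2 n^{2}$ once $(m, n)$ are the coordinates of $(\Delta x, \Delta y)$ in the eigenvector basis:

```python
# Verify H = P D P^{-1} and the diagonalized quadratic form numerically.
import numpy as np

a, b, c = 1.0, 2.0, -3.0
H = np.array([[a, b],
              [b, c]])
dx, dy = 0.4, 1.1
d = np.array([dx, dy])

eigenvalues, P = np.linalg.eigh(H)       # orthonormal eigenvectors in the columns of P
D = np.diag(eigenvalues)

# (1) change of basis reconstructs H
print(np.allclose(H, P @ D @ np.linalg.inv(P)))    # True

# (2) the quadratic form in eigen-coordinates
m, n = np.linalg.inv(P) @ d              # coordinates of (dx, dy) in the new basis
quadratic_term = d @ H @ d
diagonal_term = eigenvalues[0] * m**2 + eigenvalues[1] * n**2
print(np.isclose(quadratic_term, diagonal_term))   # True
```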

Alter a, b, and c and see what happens. Keep an eye on the eigenvalues λ₁ and λ₂. (They are listed under the function f(x, y) in the left column. I somehow couldn't put them into the graph. Sorry about that.)

So the conclusion is: if $\lambda_1>0$ and $\lambda_2>0$, then $\lambda_1 m^{2}+\lambda_2 n^{2}>0$ is always true (away from the stable point), thus the stable point is a minimum; if $\lambda_1<0$ and $\lambda_2<0$, then $\lambda_1 m^{2}+\lambda_2 n^{2}<0$ is always true, thus the stable point is a maximum; if $\lambda_1\lambda_2<0$, then $\lambda_1 m^{2}+\lambda_2 n^{2}$ can be both negative and positive, thus the stable point is a saddle point.
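To wrap up, the whole test can be packed into a few lines of code. This is a hedged sketch of my own (the function name and the example numbers are not from the article): feed it the second derivatives at a stable point, and it classifies the point by the signs of the eigenvalues.

```python
# Classify a stable point from the second derivatives at that point.
import numpy as np

def classify_stable_point(f_xx: float, f_xy: float, f_yy: float) -> str:
    H = np.array([[f_xx, f_xy],
                  [f_xy, f_yy]])
    lam1, lam2 = np.linalg.eigvalsh(H)   # real eigenvalues of the symmetric Hessian
    if lam1 > 0 and lam2 > 0:
        return "minimum"
    if lam1 < 0 and lam2 < 0:
        return "maximum"
    if lam1 * lam2 < 0:
        return "saddle point"
    return "inconclusive"                # a zero eigenvalue: this test cannot decide

# Example: f(x, y) = x**2 + 3*x*y + y**2 has a stable point at the origin with
# f_xx = 2, f_xy = 3, f_yy = 2; the eigenvalues are 5 and -1, so it is a saddle.
print(classify_stable_point(2, 3, 2))    # saddle point
```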