EE448/528                                Version 1.0                                John Stensby

Chapter 6

$A\vec{X} = \vec{b}$: The Minimum Norm Solution and the Least-Square-Error Problem

Like the previous chapter, this chapter deals with the linear algebraic equation problem $A\vec{X} = \vec{b}$. However, in this chapter, we impose additional structure in order to obtain specialized, but important, results. First, we consider Problem #1: $\vec{b}$ is in the range of $A$. For $A\vec{X} = \vec{b}$, either a unique solution exists or an infinite number of solutions exist. We want to find the solution with minimum norm. The minimum norm solution always exists, and it is unique. Problem #1 is called the minimum norm problem. Next, we consider Problem #2: $\vec{b}$ is not in the range of $A$, so that $A\vec{X} = \vec{b}$ has no solution. Of all the vectors $\vec{X}$ that minimize $\lVert A\vec{X} - \vec{b} \rVert_2$, we want to find the one with minimum norm. Problem #2 is called the minimum norm, least-square-error problem. Its solution always exists, and it is unique.

It should be obvious that Problems #1 and #2 are special cases of a more general problem. That is, given $A\vec{X} = \vec{b}$ (with no conditions placed on $\vec{b}$), we want to find the $\vec{X}$ that simultaneously minimizes both $\lVert A\vec{X} - \vec{b} \rVert_2$ and $\lVert \vec{X} \rVert_2$. Such an $\vec{X}$ always exists, it is always unique, and it is linearly related to $\vec{b}$. Symbolically, we write

$$\vec{X} = A^{+}\vec{b},  \qquad\qquad (6-1)$$

where $A^{+}$ denotes a linear operator called the pseudo inverse of $A$ (yes, if $A^{-1}$ exists, then $A^{+} = A^{-1}$ and $\vec{X} = A^{-1}\vec{b}$). For Problem #1, $\vec{X} = A^{+}\vec{b}$ will be a solution of $A\vec{X} = \vec{b}$, and it will be the "shortest length" (minimum norm) solution. For Problem #2, $\vec{X} = A^{+}\vec{b}$ simultaneously minimizes both $\lVert A\vec{X} - \vec{b} \rVert_2$ and $\lVert \vec{X} \rVert_2$ even though it does not satisfy $A\vec{X} = \vec{b}$ (which has no solution for Problem #2, where $\vec{b} \notin R(A)$).
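As a concrete numerical illustration (not part of the original notes), the pseudo inverse described above can be computed with NumPy's `np.linalg.pinv`; the small matrix and right-hand side below are made-up examples.

```python
import numpy as np

# An example matrix with no ordinary inverse: A is 2x3, so A X = b is
# underdetermined whenever b is in R(A).
A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([6.0, 2.0])          # lies in R(A), since A has full row rank

A_plus = np.linalg.pinv(A)        # the pseudo inverse A+
X = A_plus @ b                    # minimum norm solution X = A+ b

# X solves A X = b exactly, and no other solution has a smaller 2-norm.
print(np.allclose(A @ X, b))      # True
print(np.linalg.norm(X))
```

When $\vec{b} \notin R(A)$ (Problem #2), the same call returns the minimum norm vector among all least-square-error minimizers, matching the general problem stated above.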

Problem #1: The Minimum Norm Problem

In this section, we consider the case where $\vec{b} \in R(A)$. The system

$$A\vec{X} = \vec{b}  \qquad\qquad (6-2)$$

has a unique solution, or it has an infinite number of solutions as described by (5-7). If (6-2) has a unique solution, then this solution is the minimum norm solution, by default. If (6-2) has an infinite number of solutions, then we must find the solution with the smallest norm. In either case, the minimum norm solution is unique, and it is characterized as being orthogonal to $K(A)$, as shown in what follows.
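A small numerical sketch (my own example, not from the original notes) of the infinite-solution case: any particular solution shifted by a kernel vector still solves the system, but the minimum norm solution is the shortest.

```python
import numpy as np

# A 1x2 system: x + y = 2 has infinitely many solutions (a line in R^2).
A = np.array([[1.0, 1.0]])
b = np.array([2.0])

X_min = np.linalg.pinv(A) @ b     # minimum norm solution: [1, 1]
X_other = np.array([2.0, 0.0])    # another solution of A X = b

print(A @ X_other)                # [2.]  -- also solves the system
print(np.linalg.norm(X_min))      # sqrt(2), about 1.414
print(np.linalg.norm(X_other))    # 2.0 -- strictly larger
```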

Each solution $\vec{X}$ of (6-2) can be uniquely decomposed into

$$\vec{X} = \vec{X}_{K^{\perp}} + \vec{X}_{K},  \qquad\qquad (6-3)$$

where

$$\vec{X}_{K^{\perp}} \in K(A)^{\perp} = R(A^{*}), \qquad \vec{X}_{K} \in K(A).  \qquad\qquad (6-4)$$

However,

$$A\vec{X} = A(\vec{X}_{K^{\perp}} + \vec{X}_{K}) = A\vec{X}_{K^{\perp}} = \vec{b},  \qquad\qquad (6-5)$$

so $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$ is the only part of $\vec{X}$ that is significant in generating $\vec{b}$.
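The decomposition (6-3) through (6-5) can be checked numerically by projecting a solution onto the kernel of $A$. The sketch below (my own construction; the matrix and kernel vector are made-up) obtains a null-space basis from the SVD.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([6.0, 2.0])

# Start from the minimum norm solution, then shift by a kernel vector
# (A @ [-1, -1, 1] = 0) to get a generic solution of A X = b.
X = np.linalg.pinv(A) @ b + np.array([-1.0, -1.0, 1.0])

# Null-space basis of A from the SVD: rows of Vt beyond rank(A).
_, s, Vt = np.linalg.svd(A)
rank = int(np.sum(s > 1e-12))
N = Vt[rank:].T                    # columns form an orthonormal basis of K(A)

X_K = N @ (N.T @ X)                # projection of X onto K(A)
X_Kperp = X - X_K                  # component in K(A)-perp = R(A*)

# Only the K(A)-perp component matters in generating b, as in (6-5).
print(np.allclose(A @ X_Kperp, b))   # True
print(np.allclose(A @ X_K, 0))       # True
```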

Theorem 6.1

In the decomposition (6-3), there is only one vector $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$ that is common to all solutions of $A\vec{X} = \vec{b}$. Equivalently, there is a unique vector $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$ for which $A\vec{X}_{K^{\perp}} = \vec{b}$.

Proof: To show this, assume there are two, and arrive at a contradiction. We assume the existence of $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$ and $\vec{Y}_{K^{\perp}} \in K(A)^{\perp}$ such that $A\vec{X}_{K^{\perp}} = \vec{b}$ and $A\vec{Y}_{K^{\perp}} = \vec{b}$. Then, simple


subtraction leads to the conclusion that $A(\vec{X}_{K^{\perp}} - \vec{Y}_{K^{\perp}}) = \vec{0}$, or the difference $\vec{X}_{K^{\perp}} - \vec{Y}_{K^{\perp}} \in K(A)$. However, as we know, $\vec{X}_{K^{\perp}} - \vec{Y}_{K^{\perp}} \in K(A)^{\perp}$, and $K(A) \cap K(A)^{\perp} = \{\vec{0}\}$, so the difference must vanish. This contradiction leads to the conclusion that there is only one $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$ that is common to all solutions of $A\vec{X} = \vec{b}$ (each solution has a decomposition of the form (6-3) where a common $\vec{X}_{K^{\perp}}$ is used).

Theorem 6.2

The unique vector $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$ is the minimum norm solution of $A\vec{X} = \vec{b}$. That is,

$$\lVert \vec{X}_{K^{\perp}} \rVert_2 < \lVert \vec{X} \rVert_2,  \qquad\qquad (6-6)$$

where $\vec{X}$ is any other (i.e., $\vec{X} \neq \vec{X}_{K^{\perp}}$) solution of $A\vec{X} = \vec{b}$.

Proof:

Let $\vec{X}$, $\vec{X} \neq \vec{X}_{K^{\perp}}$, be any solution of $A\vec{X} = \vec{b}$. As shown by (6-3), we can write the decomposition

$$\vec{X} = \vec{X}_{K^{\perp}} + \vec{X}_{K},  \qquad\qquad (6-7)$$

where $\vec{X}_{K} \in K(A)$, and $\vec{X}_{K} \neq \vec{0}$ since $\vec{X} \neq \vec{X}_{K^{\perp}}$. Clearly, we have

$$\lVert \vec{X} \rVert_2^2 = \lVert \vec{X}_{K^{\perp}} + \vec{X}_{K} \rVert_2^2 = \langle \vec{X}_{K^{\perp}} + \vec{X}_{K},\; \vec{X}_{K^{\perp}} + \vec{X}_{K} \rangle$$
$$= \langle \vec{X}_{K^{\perp}}, \vec{X}_{K^{\perp}} \rangle + \langle \vec{X}_{K}, \vec{X}_{K} \rangle \quad \text{(due to orthogonality of the vectors)}$$
$$= \lVert \vec{X}_{K^{\perp}} \rVert_2^2 + \lVert \vec{X}_{K} \rVert_2^2,  \qquad\qquad (6-8)$$

so that $\lVert \vec{X}_{K^{\perp}} \rVert_2 < \lVert \vec{X} \rVert_2$, as claimed.
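The Pythagorean step in (6-8) is easy to verify numerically. The example below (my own; the matrix and kernel vector are made-up) builds a non-optimum solution from the minimum norm one and checks the norm identity.

```python
import numpy as np

A = np.array([[1.0, 1.0, 0.0]])
b = np.array([4.0])

X_Kperp = np.linalg.pinv(A) @ b        # minimum norm solution, in K(A)-perp
X_K = np.array([1.0, -1.0, 5.0])       # a kernel vector: A @ X_K = 0
X = X_Kperp + X_K                      # another solution of A X = b

# (6-8): ||X||^2 = ||X_Kperp||^2 + ||X_K||^2, by orthogonality.
lhs = np.linalg.norm(X) ** 2
rhs = np.linalg.norm(X_Kperp) ** 2 + np.linalg.norm(X_K) ** 2
print(np.isclose(lhs, rhs))                          # True
print(np.linalg.norm(X_Kperp) < np.linalg.norm(X))   # True
```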

When viewed geometrically, Theorem 6.2 is obvious. As shown by Figure 6-1, the solution set for $A\vec{X} = \vec{b}$ is a linear variety, a simple translation of $K(A)$.

[Figure 6-1 depicts the linear variety containing the solutions of $A\vec{X} = \vec{b}$: an arbitrary, non-optimum solution $\vec{X}$ and the optimum (minimum norm) solution $\vec{X}_{K^{\perp}} \in K(A)^{\perp}$, with $\vec{X} = \vec{X}_{K^{\perp}} + \vec{X}_{K}$, where $\vec{X}_{K} \in K(A)$.]

Figure 6-1: The solution set is a linear variety. The minimum norm solution is orthogonal to K(A).

Figure 6-1 shows an arbitrary solution $\vec{X}$, and it shows the "optimum", minimum norm, solution $\vec{X}_{K^{\perp}}$. Clearly, the solution becomes "optimum" when it is exactly orthogonal to $K(A)$ (i.e., it is in $K(A)^{\perp}$). We have solved Problem #1.

The Pseudo Inverse A+

As shown by Theorems 6.1 and 6.2, when $\vec{b} \in R(A)$, a unique minimum norm solution to $A\vec{X} = \vec{b}$ always exists. And, this solution is in $R(A^{*}) = K(A)^{\perp}$. Hence, we have a mapping from $\vec{b} \in R(A)$ back to $\vec{X} \in R(A^{*})$ (at this point, it might be a good idea to study Fig. 5-1 once again). It is not difficult to show that this mapping is linear, one-to-one, and onto. We denote this mapping by

$$A^{+} : R(A) \rightarrow R(A^{*}),  \qquad\qquad (6-9)$$

and we write $\vec{X} = A^{+}\vec{b}$ for $\vec{b} \in R(A)$ and $\vec{X} \in R(A^{*})$. Finally, our unique minimum norm solution to the $A\vec{X} = \vec{b}$, $\vec{b} \in R(A)$, problem (i.e., "Problem #1") is denoted symbolically by $\vec{X}_{K^{\perp}} = A^{+}\vec{b}$.
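The claim that $A^{+}$ maps $R(A)$ back into $R(A^{*}) = K(A)^{\perp}$ can also be checked numerically: $A^{+}\vec{b}$ should be orthogonal to every kernel vector of $A$. A sketch with a made-up matrix:

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
b = np.array([6.0, 2.0])                   # in R(A), since A has full row rank

X = np.linalg.pinv(A) @ b                  # X = A+ b

# X lies in R(A*) = K(A)-perp: orthogonal to any kernel vector of A.
kernel_vec = np.array([-1.0, -1.0, 1.0])   # A @ kernel_vec = 0
print(np.allclose(A @ kernel_vec, 0))      # True
print(np.isclose(X @ kernel_vec, 0.0))     # True: X is orthogonal to K(A)
```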


As a subspace of $U$, $R(A^{*})$ is a vector space in its own right. Every $\vec{X}$ in vector space $R(A^{*}) \subset U$ gets mapped by matrix $A$ to something in vector space $R(A) \subset V$. By restricting the domain of $A$ to be just $R(A^{*})$ (instead of the whole of $U$), we have a mapping with domain $R(A^{*})$ and co-domain $R(A)$. Symbolically, we denote this mapping by

$$A\big|_{R(A^{*})} : R(A^{*}) \rightarrow R(A),  \qquad\qquad (6-10)$$

and we call it the restriction of $A$ to $R(A^{*})$. The mapping (6-10) is linear, one-to-one, and onto (even though $A : U \rightarrow V$ may not be one-to-one or onto). More importantly, the inverse of mapping (6-10) is the mapping (6-9). As is characteristic of an operator and its inverse, we have

$$\left( A\big|_{R(A^{*})} \right) A^{+}\vec{b} = \vec{b} \quad \text{for all } \vec{b} \in R(A)$$
$$A^{+} \left( A\big|_{R(A^{*})} \right) \vec{X} = \vec{X} \quad \text{for all } \vec{X} \in R(A^{*}),  \qquad\qquad (6-11)$$

as expected. Finally, the relationship between (6-9) and (6-10) is illustrated by Fig. 6-2.

As defined so far, the domain of $A^{+}$ is only $R(A) \subset V$. This is adequate for our discussion of "Problem #1"-type problems, where $\vec{b} \in R(A)$. However, for our discussion of "Problem #2" (and the more general optimization problem, discussed in the paragraph containing (6-1), where $\vec{b} \in V$), we must extend the domain of $A^{+}$ to be all of $V = R(A) \oplus K(A^{*})$. $A^{+}$ is already defined on

[Figure 6-2 depicts $A$ mapping $R(A^{*}) = K(A)^{\perp}$ onto $R(A) = K(A^{*})^{\perp}$, and $A^{+}$ mapping back.]

Figure 6-2: Restricted to $R(A^{*}) = K(A)^{\perp}$, $A$ is one-to-one and onto, and it has the inverse $A^{+}$.
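The two identities in (6-11) can be verified numerically with a made-up full-row-rank matrix: for $\vec{b} \in R(A)$ we get $A A^{+}\vec{b} = \vec{b}$, and for $\vec{X} \in R(A^{*})$ we get $A^{+} A \vec{X} = \vec{X}$.

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0]])
A_plus = np.linalg.pinv(A)

# First identity of (6-11): A (restricted to R(A*)) composed with A+
# is the identity on R(A).  Here R(A) = R^2, since A has full row rank.
b = np.array([6.0, 2.0])
print(np.allclose(A @ (A_plus @ b), b))        # True

# Second identity: A+ composed with A is the identity on R(A*).
# The rows of A span R(A*), so any combination of them lies in R(A*).
X = A.T @ np.array([1.0, -2.0])                # an element of R(A*)
print(np.allclose(A_plus @ (A @ X), X))        # True
```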
