Matrix Calculus Notes - University Of Washington

Transcription

Matrix Calculus Notes
James Burke
2000

Linear Spaces and Operators

$X$ and $Y$ – normed linear spaces with norms $\|\cdot\|_X$ and $\|\cdot\|_Y$.
$L : X \to Y$ is a linear transformation (or operator) if
$L(\alpha x + \beta z) = \alpha L(x) + \beta L(z)$ for all $x, z \in X$ and $\alpha, \beta \in \mathbb{R}$.
$L[X, Y]$ – the normed space of continuous linear operators:
$\|T\| := \sup_{\|x\|_X \le 1} \|T x\|_Y$, $T \in L[X, Y]$.
$X^* := L[X, \mathbb{R}]$ – the topological dual of $X$, with the duality pairing
$\langle \varphi, x \rangle = \varphi(x)$, $(\varphi, x) \in X^* \times X$.
The duality pairing gives rise to adjoints of linear operators:
$T \in L[X, Y]$ defines $T^* \in L[Y^*, X^*]$ by
$\langle y^*, T(x) \rangle = \langle T^*(y^*), x \rangle$ for all $(y^*, x) \in Y^* \times X$.

Matrix Representations

$\{x_j\}_{j=1}^n$ and $\{y_i\}_{i=1}^m$ are bases for $X$ and $Y$.
Given $x = \sum_{j=1}^n a_j x_j \in X$, the linear mapping
$\kappa(x) = (a_1, \dots, a_n)^T$
is a linear isomorphism between $X$ and $\mathbb{R}^n$ and is called the coordinate mapping from $X$ to $\mathbb{R}^n$ associated with the basis $\{x_j\}_{j=1}^n$.
$\eta$ – the coordinate mapping from $Y$ to $\mathbb{R}^m$ with the basis $\{y_i\}_{i=1}^m$.
Given $T \in L[X, Y]$, there exist uniquely defined $\{t_{ij} : i = 1, \dots, m,\; j = 1, \dots, n\} \subset \mathbb{R}$ such that
$T x_j = \sum_{i=1}^m t_{ij} y_i, \quad j = 1, \dots, n.$
Therefore, $\eta(T x) = \mathbb{T} \kappa(x)$, where $\mathbb{T} = (t_{ij}) \in \mathbb{R}^{m \times n}$ is the matrix representation of $T$ in these bases.

The Kronecker Product

Let $A, C \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{s \times t}$. The Kronecker product of $A$ with $B$ is the $ms \times nt$ matrix given by
$A \otimes B := \begin{bmatrix} a_{11}B & a_{12}B & \cdots & a_{1n}B \\ a_{21}B & a_{22}B & \cdots & a_{2n}B \\ \vdots & \vdots & & \vdots \\ a_{m1}B & a_{m2}B & \cdots & a_{mn}B \end{bmatrix}.$

Example:
$A = \begin{bmatrix} 2 & 1 \end{bmatrix}, \quad B = \begin{bmatrix} 0 & 1 \\ 2 & 1 \end{bmatrix}, \quad A \otimes B = \begin{bmatrix} 0 & 2 & 0 & 1 \\ 4 & 2 & 2 & 1 \end{bmatrix}.$
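
A minimal NumPy sketch (not from the notes) checking the block definition against the $2 \times 4$ example above; `numpy.kron` uses the same block-by-block convention.

```python
import numpy as np

# A and B from the example above
A = np.array([[2, 1]])            # 1 x 2
B = np.array([[0, 1],
              [2, 1]])            # 2 x 2

# Build A (x) B directly from the block definition: the (i, j) block is a_ij * B
blocks = [[A[i, j] * B for j in range(A.shape[1])] for i in range(A.shape[0])]
K_blocks = np.block(blocks)

# Compare with NumPy's built-in Kronecker product
K = np.kron(A, B)
print(K)
# [[0 2 0 1]
#  [4 2 2 1]]
assert np.array_equal(K, K_blocks)
assert K.shape == (A.shape[0] * B.shape[0], A.shape[1] * B.shape[1])  # ms x nt
```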

The Matrix Coordinate Map “vec”

Given $A \in \mathbb{R}^{m \times n}$,
$\operatorname{vec}(A) := \begin{bmatrix} A_{\cdot 1} \\ A_{\cdot 2} \\ \vdots \\ A_{\cdot n} \end{bmatrix},$
that is, $\operatorname{vec}(A)$ is the $mn$-vector obtained by stacking the columns of $A$ on top of each other.
Clearly, $\langle A, B \rangle_F = \operatorname{vec}(A)^T \operatorname{vec}(B)$ and $\operatorname{vec} \in L[\mathbb{R}^{m \times n}, \mathbb{R}^{mn}]$.
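
A small sketch of the column-stacking map, assuming NumPy: since NumPy stores arrays row-major, vec corresponds to a column-major (Fortran-order) reshape. The helper name `vec` is ours, not from the notes.

```python
import numpy as np

def vec(A: np.ndarray) -> np.ndarray:
    """Stack the columns of A on top of each other (column-major reshape)."""
    return A.reshape(-1, order="F")

A = np.arange(6).reshape(2, 3)          # [[0, 1, 2], [3, 4, 5]]
print(vec(A))                           # [0 3 1 4 2 5] -- columns stacked

# Frobenius inner product <A, B>_F = tr(A^T B) = vec(A)^T vec(B)
B = np.random.default_rng(0).standard_normal((2, 3))
assert np.isclose(np.trace(A.T @ B), vec(A) @ vec(B))
```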

Properties of the Kronecker Product

1. $A \otimes B \otimes C = (A \otimes B) \otimes C = A \otimes (B \otimes C)$
2. $(A \otimes B)(C \otimes D) = AC \otimes BD$ when $AC$ and $BD$ exist.
3. $(A \otimes B)^T = A^T \otimes B^T$
4. $A^\dagger \otimes B^\dagger = (A \otimes B)^\dagger$, where $M^\dagger$ is the Moore-Penrose pseudoinverse of the matrix $M$.
5. For vectors $a$ and $b$, $\operatorname{vec}(a b^T) = b \otimes a$.
6. If $AXB$ is a well-defined matrix product, then $\operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X)$. In particular, $\operatorname{vec}(AX) = (I \otimes A)\operatorname{vec}(X)$ and $\operatorname{vec}(XB) = (B^T \otimes I)\operatorname{vec}(X)$, where $I$ is interpreted as the identity matrix of the appropriate dimension.
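
A quick numerical spot-check of properties 2, 3, and 5, using random matrices whose shapes are chosen only so that all the products exist (the `vec` helper is the column-major reshape defined above); property 6 is checked after Example 1 below.

```python
import numpy as np

rng = np.random.default_rng(1)
vec = lambda M: M.reshape(-1, order="F")

A = rng.standard_normal((3, 4)); C = rng.standard_normal((4, 2))
B = rng.standard_normal((5, 2)); D = rng.standard_normal((2, 3))

# Property 2: (A (x) B)(C (x) D) = AC (x) BD
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))

# Property 3: (A (x) B)^T = A^T (x) B^T
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))

# Property 5: vec(a b^T) = b (x) a
a = rng.standard_normal(3); b = rng.standard_normal(4)
assert np.allclose(vec(np.outer(a, b)), np.kron(b, a))
```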

Spectrum of the Kronecker Product

Let $A \in \mathbb{R}^{n \times n}$ and $B \in \mathbb{R}^{m \times m}$ have spectra $\{\lambda_i\}_{i=1}^n$ and $\{\mu_j\}_{j=1}^m$, counted with multiplicity. Then the eigenvalues of $A \otimes B$ are
$\lambda_i \mu_j, \quad i = 1, \dots, n, \; j = 1, \dots, m,$
and the eigenvalues of $(I_m \otimes A) + (B \otimes I_n)$ are
$\lambda_i + \mu_j, \quad i = 1, \dots, n, \; j = 1, \dots, m.$
$(I_m \otimes A) + (B \otimes I_n)$ is called the Kronecker sum of $A$ and $B$.
In particular,
$\operatorname{tr}(A \otimes B) = \operatorname{tr}(A)\operatorname{tr}(B) \quad \text{and} \quad \det(A \otimes B) = \det(A)^m \det(B)^n.$
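
A sketch checking the eigenvalue claims numerically (our own check, not from the notes); the sorting is only to compare the two multisets, since eigenvalues come back in no particular order.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

lam = np.linalg.eigvals(A)            # {lambda_i}
mu  = np.linalg.eigvals(B)            # {mu_j}

# Eigenvalues of A (x) B are all products lambda_i * mu_j
prods = np.sort_complex((lam[:, None] * mu[None, :]).ravel())
assert np.allclose(np.sort_complex(np.linalg.eigvals(np.kron(A, B))), prods)

# Eigenvalues of the Kronecker sum (I_m (x) A) + (B (x) I_n) are lambda_i + mu_j
ksum = np.kron(np.eye(m), A) + np.kron(B, np.eye(n))
sums = np.sort_complex((lam[:, None] + mu[None, :]).ravel())
assert np.allclose(np.sort_complex(np.linalg.eigvals(ksum)), sums)

# tr(A (x) B) = tr(A) tr(B),  det(A (x) B) = det(A)^m det(B)^n
assert np.isclose(np.trace(np.kron(A, B)), np.trace(A) * np.trace(B))
assert np.isclose(np.linalg.det(np.kron(A, B)),
                  np.linalg.det(A) ** m * np.linalg.det(B) ** n)
```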

Singular Values of the Kronecker Product

Let $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{s \times t}$ have singular value decompositions $U_A \Sigma_A V_A^T$ and $U_B \Sigma_B V_B^T$. Then, after reordering, the singular value decomposition of $A \otimes B$ is
$(U_A \otimes U_B)(\Sigma_A \otimes \Sigma_B)(V_A^T \otimes V_B^T).$
In particular, the nonzero singular values of $A \otimes B$ are
$\sigma_i(A)\sigma_j(B), \quad i = 1, \dots, \operatorname{rank} A, \; j = 1, \dots, \operatorname{rank} B.$
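
The same kind of check for the singular values; the shapes below are chosen (tall $A$ and $B$) so that both sides produce the same number of singular values, and sorting is again only needed because $\Sigma_A \otimes \Sigma_B$ is not in descending order until reordered.

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((5, 2))

sv_kron = np.linalg.svd(np.kron(A, B), compute_uv=False)          # descending order
sv_prod = np.sort(np.outer(np.linalg.svd(A, compute_uv=False),
                           np.linalg.svd(B, compute_uv=False)).ravel())[::-1]
assert np.allclose(sv_kron, sv_prod)
```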

Matrix Representations for $L[\mathbb{R}^{m \times n}, \mathbb{R}^{s \times t}]$

On $\mathbb{R}^{m \times n}$ the matrices
$\{E_{ij} : i = 1, \dots, m, \; j = 1, \dots, n\},$
where $E_{ij}$ is the matrix having a one in the $ij$ position and zeros elsewhere, form the standard unit coordinate basis for $\mathbb{R}^{m \times n}$. Observe that $\operatorname{vec}$ is the coordinate mapping on $\mathbb{R}^{m \times n}$ associated with this basis, where the coordinates are ordered by columns. We show how to use $\operatorname{vec}$ and $\otimes$ to compute a matrix representation for elements of $L[\mathbb{R}^{m \times n}, \mathbb{R}^{s \times t}]$ with respect to the standard unit coordinate bases on $\mathbb{R}^{m \times n}$ and $\mathbb{R}^{s \times t}$.

Example 1

Let $A \in \mathbb{R}^{s \times m}$ and $B \in \mathbb{R}^{n \times t}$ and define $T \in L[\mathbb{R}^{m \times n}, \mathbb{R}^{s \times t}]$ by $T(X) = AXB$. Then, using the coordinate mapping $\operatorname{vec}$, we get
$\operatorname{vec}(T(X)) = \operatorname{vec}(AXB) = (B^T \otimes A)\operatorname{vec}(X).$
Hence, the matrix representation of $T$ in the coordinate bases is $\mathbb{T} = B^T \otimes A$.
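
A numerical sketch of Example 1 (and of property 6 above): the matrix $B^T \otimes A$ does represent $X \mapsto AXB$ in the vec coordinates. The dimensions and the `vec` helper are our own choices.

```python
import numpy as np

rng = np.random.default_rng(4)
vec = lambda M: M.reshape(-1, order="F")

s, m, n, t = 2, 3, 4, 5
A = rng.standard_normal((s, m))
B = rng.standard_normal((n, t))
X = rng.standard_normal((m, n))

T_mat = np.kron(B.T, A)                       # matrix representation of X -> A X B
assert np.allclose(T_mat @ vec(X), vec(A @ X @ B))
```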

Example 2

Define $T \in L[\mathbb{R}^{n \times n}, \mathbb{R}^{n \times n}]$ by $T(X) = AX + XB$, where $A, B \in \mathbb{R}^{n \times n}$. Then
$\operatorname{vec}(T(X)) = \operatorname{vec}(AX) + \operatorname{vec}(XB) = (I_n \otimes A)\operatorname{vec}(X) + (B^T \otimes I_n)\operatorname{vec}(X) = \big[(I_n \otimes A) + (B^T \otimes I_n)\big]\operatorname{vec}(X).$
That is, the matrix representation of $T$ in the unit coordinate bases is
$\mathbb{T} = (I_n \otimes A) + (B^T \otimes I_n),$
the Kronecker sum of $A$ and $B^T$.
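
And the same check for Example 2, the Sylvester-type operator $X \mapsto AX + XB$ and its Kronecker-sum representation (again a sketch with our own test data).

```python
import numpy as np

rng = np.random.default_rng(5)
vec = lambda M: M.reshape(-1, order="F")

n = 4
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = rng.standard_normal((n, n))

T_mat = np.kron(np.eye(n), A) + np.kron(B.T, np.eye(n))   # (I_n (x) A) + (B^T (x) I_n)
assert np.allclose(T_mat @ vec(X), vec(A @ X + X @ B))
```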

The Derivative of det

The standard way to compute the derivative of the determinant is to use Laplace's formula: for any $i_0, j_0 \in \{1, 2, \dots, n\}$,
$\det(X) = \sum_{i=1}^n x_{i j_0} (-1)^{i + j_0} \det(X(i, j_0)) = \sum_{j=1}^n x_{i_0 j} (-1)^{i_0 + j} \det(X(i_0, j)),$
where $X(i, j) \in \mathbb{R}^{(n-1) \times (n-1)}$ is obtained from $X$ by deleting the $i$th row and $j$th column. This formula immediately tells us that
$\frac{\partial}{\partial x_{ij}} \det(X) = (-1)^{i + j} \det(X(i, j)) \quad \text{for all } i, j \in \{1, 2, \dots, n\}.$
Consequently, the derivative of the determinant can be written in terms of the classical adjoint of $X$:
$\operatorname{adj}(X) := \big( (-1)^{i + j} \det(X(i, j)) \big)^T.$
That is,
$(\det(\cdot))'(X)(D) = \langle \operatorname{adj}(X)^T, D \rangle_F = \operatorname{tr}(\operatorname{adj}(X) D), \quad \text{so} \quad \nabla \det(X) = \operatorname{adj}(X)^T.$
In differential notation, $d \det(X) = \langle \operatorname{adj}(X)^T, dX \rangle_F$, which more explicitly describes how to apply the chain rule.
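
A finite-difference sketch of the formula $\det'(X)(D) = \operatorname{tr}(\operatorname{adj}(X) D)$, with the classical adjoint built straight from the cofactor definition above; the helper `adj`, the test matrices, and the step size are all our own.

```python
import numpy as np

def adj(X: np.ndarray) -> np.ndarray:
    """Classical adjoint: transpose of the cofactor matrix, from the definition."""
    n = X.shape[0]
    C = np.empty_like(X, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(X, i, axis=0), j, axis=1)   # X(i, j)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

rng = np.random.default_rng(6)
X = rng.standard_normal((4, 4))
D = rng.standard_normal((4, 4))

# det'(X)(D) = tr(adj(X) D), checked against a central finite difference
eps = 1e-6
num = (np.linalg.det(X + eps * D) - np.linalg.det(X - eps * D)) / (2 * eps)
assert np.isclose(num, np.trace(adj(X) @ D), rtol=1e-5, atol=1e-8)
```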

The Determinant

The determinant is the unique multilinear form on the columns (or rows) whose value at the identity is 1. Determinants have a much longer history than do matrices themselves. They were derived to solve linear systems long before the invention of matrices. The culmination of this effort is what we now call Cramer's rule. Cramer's rule tells us that
$A \operatorname{adj}(A) = \operatorname{adj}(A) A = \det(A) I_n.$
So, when $\det(A) \neq 0$, $A^{-1}$ exists and we have
$A^{-1} = \frac{1}{\det(A)} \operatorname{adj}(A) = \det(A^{-1}) \operatorname{adj}(A) \quad \text{and} \quad \operatorname{adj}(A) = \det(A) A^{-1}.$
In particular, when $A^{-1}$ exists, we have $\nabla \det(A) = \operatorname{adj}(A)^T = \det(A) A^{-T}.$
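
The identity $A \operatorname{adj}(A) = \det(A) I_n$ is what yields the familiar determinant-ratio solution formula for linear systems; here is a small sketch checking that equation-solving form of Cramer's rule numerically (our own illustration, not a formula stated on the slide).

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

# Cramer's rule for A x = b: x_i = det(A with column i replaced by b) / det(A)
x = np.empty(n)
for i in range(n):
    Ai = A.copy()
    Ai[:, i] = b
    x[i] = np.linalg.det(Ai) / np.linalg.det(A)

assert np.allclose(x, np.linalg.solve(A, b))
```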

The Banach Lemma

The spectral radius of $A \in \mathbb{R}^{n \times n}$ is the maximum modulus of its spectrum,
$\rho(A) := \max \{ |\lambda| : \det(\lambda I - A) = 0 \}.$
Lemma: Given $A \in \mathbb{R}^{n \times n}$, if $\rho(A) < 1$, then $(I - A)^{-1}$ exists and is given by the geometric series
$(I - A)^{-1} = I + A + A^2 + A^3 + \cdots.$
In addition, we have
$\frac{1}{1 + \rho(A)} \le \rho\big((I - A)^{-1}\big) \le \frac{1}{1 - \rho(A)}.$
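
A small sketch of the geometric series: for a random $A$ rescaled so that $\rho(A) < 1$, the partial sums $I + A + \cdots + A^k$ converge to $(I - A)^{-1}$, and the spectral-radius bounds hold. The scaling factor and the number of terms are arbitrary choices of ours.

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 4))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))    # rescale so rho(A) = 0.9 < 1

target = np.linalg.inv(np.eye(4) - A)
S, term = np.eye(4), np.eye(4)
for _ in range(400):                               # partial sums of I + A + A^2 + ...
    term = term @ A
    S += term
assert np.allclose(S, target)

# Spectral-radius bounds from the lemma
rho = np.max(np.abs(np.linalg.eigvals(A)))
rho_inv = np.max(np.abs(np.linalg.eigvals(target)))
assert 1 / (1 + rho) <= rho_inv <= 1 / (1 - rho) + 1e-12
```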

Derivatives of $A^{-1}$: $d(X^{-1}) = -X^{-1}(dX)X^{-1}$

The general linear group of degree $n$ over $\mathbb{R}$, $GL_n(\mathbb{R})$, is the set of real nonsingular $n \times n$ matrices.
Define $\Phi : GL_n(\mathbb{R}) \to GL_n(\mathbb{R})$ by $\Phi(A) := A^{-1}$. Let $A \in GL_n(\mathbb{R})$ and $\Delta A \in \mathbb{R}^{n \times n}$ be such that $\rho(A^{-1} \Delta A) < 1$. Then
$(A + \Delta A)^{-1} = \big(A(I + A^{-1} \Delta A)\big)^{-1} = (I + A^{-1} \Delta A)^{-1} A^{-1}$
$\quad = \big(I - A^{-1} \Delta A + (A^{-1} \Delta A)^2 + o(\|\Delta A\|^2)\big) A^{-1}$ (Banach Lemma)
$\quad = A^{-1} - A^{-1} \Delta A A^{-1} + A^{-1} \Delta A A^{-1} \Delta A A^{-1} + o(\|\Delta A\|^2).$
So $\Phi'(A)(D) = -A^{-1} D A^{-1}$ and $\Phi''(A)(D, D) = 2 A^{-1} D A^{-1} D A^{-1}$, and
$\operatorname{vec}(\Phi'(A)(D)) = \operatorname{vec}(-A^{-1} D A^{-1}) = -(A^{-T} \otimes A^{-1}) \operatorname{vec}(D), \quad \text{so} \quad \nabla \Phi(A) = -(A^{-T} \otimes A^{-1}).$
This procedure shows that $\Phi$ is $C^\infty$ and all these derivatives are easily computed from the Banach Lemma.
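
A finite-difference sketch of $\Phi'(A)(D) = -A^{-1} D A^{-1}$, together with its vec/Kronecker form; the shift $3I$ is only there to keep the test matrix comfortably nonsingular.

```python
import numpy as np

rng = np.random.default_rng(9)
vec = lambda M: M.reshape(-1, order="F")

n = 4
A = rng.standard_normal((n, n)) + 3 * np.eye(n)     # comfortably nonsingular
D = rng.standard_normal((n, n))
Ainv = np.linalg.inv(A)

# Directional derivative of Phi(A) = A^{-1} via central differences
eps = 1e-6
num = (np.linalg.inv(A + eps * D) - np.linalg.inv(A - eps * D)) / (2 * eps)
assert np.allclose(num, -Ainv @ D @ Ainv, atol=1e-6)

# vec form: vec(Phi'(A)(D)) = -(A^{-T} (x) A^{-1}) vec(D)
assert np.allclose(vec(-Ainv @ D @ Ainv), -np.kron(Ainv.T, Ainv) @ vec(D))
```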

Chain Rule Example

$\psi(V) := \begin{cases} \ln \det(X^T V^{-1} X), & V \in S^n_{++}, \\ +\infty, & \text{otherwise}. \end{cases}$

$\psi'(V)(D) = \big\langle (\ln\det(\cdot))'(X^T V^{-1} X),\; (X^T (\cdot)^{-1} X)'(V)(D) \big\rangle$
$\quad = \big\langle (X^T V^{-1} X)^{-1},\; -X^T V^{-1} D V^{-1} X \big\rangle$
$\quad = -\operatorname{tr}\big( (X^T V^{-1} X)^{-1} X^T V^{-1} D V^{-1} X \big)$
$\quad = -\operatorname{tr}\big( V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1} D \big)$
$\quad = \big\langle -V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1},\; D \big\rangle,$
so $\nabla \psi(V) = -V^{-1} X (X^T V^{-1} X)^{-1} X^T V^{-1} \in S^n.$
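
A finite-difference sketch of the gradient formula, with $V$ a random symmetric positive definite matrix and a random symmetric direction $D$; the sizes, seed, and step size are our own test choices.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 6, 3
X = rng.standard_normal((n, p))
M = rng.standard_normal((n, n))
V = M @ M.T + n * np.eye(n)                          # symmetric positive definite
D = rng.standard_normal((n, n)); D = (D + D.T) / 2   # symmetric direction

def psi(V):
    Vinv = np.linalg.inv(V)
    return np.log(np.linalg.det(X.T @ Vinv @ X))

Vinv = np.linalg.inv(V)
grad = -Vinv @ X @ np.linalg.inv(X.T @ Vinv @ X) @ X.T @ Vinv

# psi'(V)(D) = <grad, D> = tr(grad D), checked by central differences
eps = 1e-6
num = (psi(V + eps * D) - psi(V - eps * D)) / (2 * eps)
assert np.isclose(num, np.trace(grad @ D), rtol=1e-5, atol=1e-8)
```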
