The Matrix Cookbook - Mathematics

Transcription

The Matrix Cookbook
http://matrixcookbook.com

Kaare Brandt Petersen
Michael Syskind Pedersen

Version: November 15, 2012

Introduction

What is this? These pages are a collection of facts (identities, approximations, inequalities, relations, ...) about matrices and matters relating to them. It is collected in this form for the convenience of anyone who wants a quick desktop reference.

Disclaimer: The identities, approximations and relations presented here were obviously not invented but collected, borrowed and copied from a large amount of sources. These sources include similar but shorter notes found on the internet and appendices in books - see the references for a full list.

Errors: Very likely there are errors, typos, and mistakes for which we apologize and would be grateful to receive corrections at cookbook@2302.dk.

It's ongoing: The project of keeping a large repository of relations involving matrices is naturally ongoing and the version will be apparent from the date in the header.

Suggestions: Your suggestion for additional content or elaboration of some topics is most welcome at cookbook@2302.dk.

Keywords: Matrix algebra, matrix relations, matrix identities, derivative of determinant, derivative of inverse matrix, differentiate a matrix.

Acknowledgements: We would like to thank the following for contributions and suggestions: Bill Baxter, Brian Templeton, Christian Rishøj, Christian Schröppel, Dan Boley, Douglas L. Theobald, Esben Hoegh-Rasmussen, Evripidis Karseras, Georg Martius, Glynne Casteel, Jan Larsen, Jun Bin Gao, Jürgen Struckmeier, Kamil Dedecius, Karim T. Abou-Moustafa, Korbinian Strimmer, Lars Christiansen, Lars Kai Hansen, Leland Wilkinson, Liguo He, Loic Thibaut, Markus Froeb, Michael Hubatka, Miguel Barão, Ole Winther, Pavel Sakov, Stephan Hattinger, Troels Pedersen, Vasile Sima, Vincent Rabaud, Zhaoshui He. We would also like to thank The Oticon Foundation for funding our PhD studies.

Petersen & Pedersen, The Matrix Cookbook, Version: November 15, 2012

Contents

1 Basics
  1.1 Trace
  1.2 Determinant
  1.3 The Special Case 2x2

2 Derivatives
  2.1 Derivatives of a Determinant
  2.2 Derivatives of an Inverse
  2.3 Derivatives of Eigenvalues
  2.4 Derivatives of Matrices, Vectors and Scalar Forms
  2.5 Derivatives of Traces
  2.6 Derivatives of vector norms
  2.7 Derivatives of matrix norms
  2.8 Derivatives of Structured Matrices

3 Inverses
  3.1 Basic
  3.2 Exact Relations
  3.3 Implication on Inverses
  3.4 Approximations
  3.5 Generalized Inverse
  3.6 Pseudo Inverse

4 Complex Matrices
  4.1 Complex Derivatives
  4.2 Higher order and non-linear derivatives
  4.3 Inverse of complex sum

5 Solutions and Decompositions
  5.1 Solutions to linear equations
  5.2 Eigenvalues and Eigenvectors
  5.3 Singular Value Decomposition
  5.4 Triangular Decomposition
  5.5 LU decomposition
  5.6 LDM decomposition
  5.7 LDL decompositions

6 Statistics and Probability
  6.1 Definition of Moments
  6.2 Expectation of Linear Combinations
  6.3 Weighted Scalar Variable

7 Multivariate Distributions
  7.1 Cauchy
  7.2 Dirichlet
  7.3 Normal
  7.4 Normal-Inverse Gamma
  7.5 Gaussian
  7.6 Multinomial
  7.7 Student's t
  7.8 Wishart
  7.9 Wishart, Inverse

8 Gaussians
  8.1 Basics
  8.2 Moments
  8.3 Miscellaneous
  8.4 Mixture of Gaussians

9 Special Matrices
  9.1 Block matrices
  9.2 Discrete Fourier Transform Matrix, The
  9.3 Hermitian Matrices and skew-Hermitian
  9.4 Idempotent Matrices
  9.5 Orthogonal matrices
  9.6 Positive Definite and Semi-definite Matrices
  9.7 Singleentry Matrix, The
  9.8 Symmetric, Skew-symmetric/Antisymmetric
  9.9 Toeplitz Matrices
  9.10 Transition matrices
  9.11 Units, Permutation and Shift
  9.12 Vandermonde Matrices

10 Functions and Operators
  10.1 Functions and Series
  10.2 Kronecker and Vec Operator
  10.3 Vector Norms
  10.4 Matrix Norms
  10.5 Rank
  10.6 Integral Involving Dirac Delta Functions
  10.7 Miscellaneous

A One-dimensional Results
  A.1 Gaussian
  A.2 One Dimensional Mixture of Gaussians

B Proofs and Details
  B.1 Misc Proofs

Notation and Nomenclature

A          Matrix
A_ij       Matrix indexed for some purpose
A_i        Matrix indexed for some purpose
A^ij       Matrix indexed for some purpose
A^n        Matrix indexed for some purpose or the n.th power of a square matrix
A^{-1}     The inverse matrix of the matrix A
A^+        The pseudo inverse matrix of the matrix A (see Sec. 3.6)
A^{1/2}    The square root of a matrix (if unique), not elementwise
(A)_ij     The (i, j).th entry of the matrix A
A_ij       The (i, j).th entry of the matrix A
[A]_ij     The ij-submatrix, i.e. A with i.th row and j.th column deleted
a          Vector (column-vector)
a_i        Vector indexed for some purpose
a_i        The i.th element of the vector a
a          Scalar
ℜz         Real part of a scalar
ℜz         Real part of a vector
ℜZ         Real part of a matrix
ℑz         Imaginary part of a scalar
ℑz         Imaginary part of a vector
ℑZ         Imaginary part of a matrix
det(A)     Determinant of A
Tr(A)      Trace of the matrix A
diag(A)    Diagonal matrix of the matrix A, i.e. (diag(A))_ij = δ_ij A_ij
eig(A)     Eigenvalues of the matrix A
vec(A)     The vector-version of the matrix A (see Sec. 10.2.2)
sup        Supremum of a set
‖A‖        Matrix norm (subscript if any denotes what norm)
A^T        Transposed matrix
A^{-T}     The inverse of the transposed and vice versa, A^{-T} = (A^{-1})^T = (A^T)^{-1}
A^*        Complex conjugated matrix
A^H        Transposed and complex conjugated matrix (Hermitian)
A ∘ B      Hadamard (elementwise) product
A ⊗ B      Kronecker product
0          The null matrix. Zero in all entries.
I          The identity matrix
J^ij       The single-entry matrix, 1 at (i, j) and zero elsewhere
Σ          A positive definite matrix
Λ          A diagonal matrix

1 Basics

(AB)^{-1} = B^{-1} A^{-1}  (1)
(ABC...)^{-1} = ...C^{-1} B^{-1} A^{-1}  (2)
(A^T)^{-1} = (A^{-1})^T  (3)
(A + B)^T = A^T + B^T  (4)
(AB)^T = B^T A^T  (5)
(ABC...)^T = ...C^T B^T A^T  (6)
(A^H)^{-1} = (A^{-1})^H  (7)
(A + B)^H = A^H + B^H  (8)
(AB)^H = B^H A^H  (9)
(ABC...)^H = ...C^H B^H A^H  (10)

1.1 Trace

Tr(A) = Σ_i A_ii  (11)
Tr(A) = Σ_i λ_i,   λ_i = eig(A)  (12)
Tr(A) = Tr(A^T)  (13)
Tr(AB) = Tr(BA)  (14)
Tr(A + B) = Tr(A) + Tr(B)  (15)
Tr(ABC) = Tr(BCA) = Tr(CAB)  (16)
a^T a = Tr(aa^T)  (17)

1.2 Determinant

Let A be an n×n matrix.

det(A) = Π_i λ_i,   λ_i = eig(A)  (18)
det(cA) = c^n det(A),   if A ∈ R^{n×n}  (19)
det(A^T) = det(A)  (20)
det(AB) = det(A) det(B)  (21)
det(A^{-1}) = 1/det(A)  (22)
det(A^n) = det(A)^n  (23)
det(I + uv^T) = 1 + u^T v  (24)

For n = 2:
det(I + A) = 1 + det(A) + Tr(A)  (25)

For n = 3:
det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)² − (1/2)Tr(A²)  (26)
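The trace and determinant identities above are easy to spot-check numerically. A minimal sketch (added for illustration, not part of the original text; NumPy assumed), checking (14), (16), (21) and (24) on random matrices:

```python
import numpy as np

# Spot-check of trace/determinant identities (14), (16), (21), (24).
rng = np.random.default_rng(0)
A, B, C = (rng.standard_normal((4, 4)) for _ in range(3))
u, v = rng.standard_normal(4), rng.standard_normal(4)

assert np.isclose(np.trace(A @ B), np.trace(B @ A))                  # (14)
assert np.isclose(np.trace(A @ B @ C), np.trace(C @ A @ B))          # (16)
assert np.isclose(np.linalg.det(A @ B),
                  np.linalg.det(A) * np.linalg.det(B))               # (21)
assert np.isclose(np.linalg.det(np.eye(4) + np.outer(u, v)),
                  1 + u @ v)                                         # (24)
print("trace/determinant identities verified")
```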

For n = 4:
det(I + A) = 1 + det(A) + Tr(A) + (1/2)Tr(A)² − (1/2)Tr(A²)
           + (1/6)Tr(A)³ − (1/2)Tr(A)Tr(A²) + (1/3)Tr(A³)  (27)

For small ε, the following approximation holds
det(I + εA) ≅ 1 + εTr(A) + (1/2)ε²Tr(A)² − (1/2)ε²Tr(A²)  (28)

1.3 The Special Case 2x2

Consider the matrix A

A = [ A11  A12 ]
    [ A21  A22 ]

Determinant and trace

det(A) = A11 A22 − A12 A21  (29)
Tr(A) = A11 + A22  (30)

Eigenvalues

λ² − λ·Tr(A) + det(A) = 0

λ1 = (Tr(A) + √(Tr(A)² − 4 det(A))) / 2      λ2 = (Tr(A) − √(Tr(A)² − 4 det(A))) / 2

λ1 + λ2 = Tr(A)      λ1 λ2 = det(A)

Eigenvectors

v1 ∝ [ A12 ; λ1 − A11 ]      v2 ∝ [ A12 ; λ2 − A11 ]

Inverse

A^{-1} = (1/det(A)) [  A22  −A12 ]
                    [ −A21   A11 ]  (31)
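The closed-form 2x2 eigenvalues and the adjugate inverse (31) can be checked against a library solver. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# Check the 2x2 eigenvalue formulas and the inverse (31).
A = np.array([[3.0, 1.0],
              [2.0, 4.0]])
tr, det = np.trace(A), np.linalg.det(A)

# Roots of the characteristic polynomial l^2 - l*Tr(A) + det(A) = 0.
disc = np.sqrt(tr**2 - 4 * det)
lam = np.sort([(tr + disc) / 2, (tr - disc) / 2])
assert np.allclose(lam, np.sort(np.linalg.eigvals(A).real))

# Inverse from the adjugate formula (31).
A_inv = np.array([[A[1, 1], -A[0, 1]],
                  [-A[1, 0], A[0, 0]]]) / det
assert np.allclose(A_inv, np.linalg.inv(A))
print("2x2 formulas verified")
```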

2 Derivatives

This section covers differentiation of a number of expressions with respect to a matrix X. Note that it is always assumed that X has no special structure, i.e. that the elements of X are independent (e.g. not symmetric, Toeplitz, positive definite). See section 2.8 for differentiation of structured matrices. The basic assumptions can be written in a formula as

∂X_kl / ∂X_ij = δ_ik δ_lj  (32)

that is for e.g. vector forms,

[∂x/∂y]_i = ∂x_i/∂y      [∂x/∂y]_i = ∂x/∂y_i      [∂x/∂y]_ij = ∂x_i/∂y_j

The following rules are general and very useful when deriving the differential of an expression ([19]):

∂A = 0   (A is a constant)  (33)
∂(αX) = α ∂X  (34)
∂(X + Y) = ∂X + ∂Y  (35)
∂(Tr(X)) = Tr(∂X)  (36)
∂(XY) = (∂X)Y + X(∂Y)  (37)
∂(X ∘ Y) = (∂X) ∘ Y + X ∘ (∂Y)  (38)
∂(X ⊗ Y) = (∂X) ⊗ Y + X ⊗ (∂Y)  (39)
∂(X^{-1}) = −X^{-1}(∂X)X^{-1}  (40)
∂(det(X)) = Tr(adj(X) ∂X)  (41)
∂(det(X)) = det(X) Tr(X^{-1} ∂X)  (42)
∂(ln(det(X))) = Tr(X^{-1} ∂X)  (43)
∂X^T = (∂X)^T  (44)
∂X^H = (∂X)^H  (45)

2.1 Derivatives of a Determinant

2.1.1 General form

∂det(Y)/∂x = det(Y) Tr[ Y^{-1} ∂Y/∂x ]  (46)

Σ_k ∂det(X)/∂X_ik · X_jk = δ_ij det(X)  (47)

∂²det(Y)/∂x² = det(Y) [ Tr( Y^{-1} ∂(∂Y/∂x)/∂x )
              + Tr( Y^{-1} ∂Y/∂x ) Tr( Y^{-1} ∂Y/∂x )
              − Tr( (Y^{-1} ∂Y/∂x)(Y^{-1} ∂Y/∂x) ) ]  (48)
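The general form (46) can be verified by finite differences. An illustrative sketch (not part of the original text; NumPy assumed), using Y(x) = A + xB:

```python
import numpy as np

# Finite-difference check of (46): d det(Y)/dx = det(Y) Tr(Y^{-1} dY/dx).
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # keep Y well-conditioned
B = rng.standard_normal((3, 3))
Y = lambda x: A + x * B

x0, h = 0.5, 1e-6
numeric = (np.linalg.det(Y(x0 + h)) - np.linalg.det(Y(x0 - h))) / (2 * h)
analytic = np.linalg.det(Y(x0)) * np.trace(np.linalg.solve(Y(x0), B))
assert np.isclose(numeric, analytic, rtol=1e-4)
print("determinant derivative (46) verified")
```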

2.1.2 Linear forms

∂det(X)/∂X = det(X)(X^{-1})^T  (49)

Σ_k ∂det(X)/∂X_ik · X_jk = δ_ij det(X)  (50)

∂det(AXB)/∂X = det(AXB)(X^{-1})^T = det(AXB)(X^T)^{-1}  (51)

2.1.3 Square forms

If X is square and invertible, then

∂det(X^T AX)/∂X = 2 det(X^T AX) X^{-T}  (52)

If X is not square but A is symmetric, then

∂det(X^T AX)/∂X = 2 det(X^T AX) AX(X^T AX)^{-1}  (53)

If X is not square and A is not symmetric, then

∂det(X^T AX)/∂X = det(X^T AX)( AX(X^T AX)^{-1} + A^T X(X^T A^T X)^{-1} )  (54)

2.1.4 Other nonlinear forms

Some special cases are (See [9, 7])

∂ln det(X^T X)/∂X = 2(X^+)^T  (55)
∂ln det(X^T X)/∂X^+ = −2X^T  (56)
∂ln|det(X)|/∂X = (X^{-1})^T = (X^T)^{-1}  (57)
∂det(X^k)/∂X = k det(X^k) X^{-T}  (58)

2.2 Derivatives of an Inverse

From [27] we have the basic identity

∂Y^{-1}/∂x = −Y^{-1} (∂Y/∂x) Y^{-1}  (59)
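The basic identity (59) is also easy to verify by finite differences. An illustrative sketch (not part of the original text; NumPy assumed), again with Y(x) = A + xB:

```python
import numpy as np

# Finite-difference check of (59): d(Y^{-1})/dx = -Y^{-1} (dY/dx) Y^{-1}.
rng = np.random.default_rng(2)
A = rng.standard_normal((3, 3)) + 3 * np.eye(3)
B = rng.standard_normal((3, 3))
Y = lambda x: A + x * B

x0, h = 0.3, 1e-6
numeric = (np.linalg.inv(Y(x0 + h)) - np.linalg.inv(Y(x0 - h))) / (2 * h)
Yi = np.linalg.inv(Y(x0))
analytic = -Yi @ B @ Yi
assert np.allclose(numeric, analytic, atol=1e-4)
print("inverse derivative (59) verified")
```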

from which it follows

∂(X^{-1})_kl / ∂X_ij = −(X^{-1})_ki (X^{-1})_jl  (60)
∂a^T X^{-1} b / ∂X = −X^{-T} a b^T X^{-T}  (61)
∂det(X^{-1}) / ∂X = −det(X^{-1})(X^{-1})^T  (62)
∂Tr(AX^{-1}B) / ∂X = −(X^{-1} BA X^{-1})^T  (63)
∂Tr((X + A)^{-1}) / ∂X = −((X + A)^{-1}(X + A)^{-1})^T  (64)

From [32] we have the following result: Let A be an n×n invertible square matrix, W be the inverse of A, and J(A) an n×n-variate and differentiable function with respect to A; then the partial differentials of J with respect to A and W satisfy

∂J/∂A = −A^{-T} (∂J/∂W) A^{-T}

2.3 Derivatives of Eigenvalues

∂/∂X Σ eig(X) = ∂/∂X Tr(X) = I  (65)
∂/∂X Π eig(X) = ∂/∂X det(X) = det(X) X^{-T}  (66)

If A is real and symmetric, λ_i and v_i are distinct eigenvalues and eigenvectors of A (see (276)) with v_i^T v_i = 1, then [33]

∂λ_i = v_i^T ∂(A) v_i  (67)
∂v_i = (λ_i I − A)^+ ∂(A) v_i  (68)

2.4 Derivatives of Matrices, Vectors and Scalar Forms

2.4.1 First Order

∂x^T a / ∂x = ∂a^T x / ∂x = a  (69)
∂a^T Xb / ∂X = ab^T  (70)
∂a^T X^T b / ∂X = ba^T  (71)
∂a^T Xa / ∂X = ∂a^T X^T a / ∂X = aa^T  (72)
∂X / ∂X_ij = J^ij  (73)
∂(XA)_ij / ∂X_mn = δ_im (A)_nj = (J^mn A)_ij  (74)
∂(X^T A)_ij / ∂X_mn = δ_in (A)_mj = (J^nm A)_ij  (75)
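The first-order forms can be checked entry by entry. An illustrative sketch (not part of the original text; NumPy assumed) verifying (70) via central differences:

```python
import numpy as np

# Finite-difference check of (70): d(a^T X b)/dX = a b^T.
rng = np.random.default_rng(3)
a, b = rng.standard_normal(3), rng.standard_normal(3)
X = rng.standard_normal((3, 3))

h = 1e-6
grad = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X); E[i, j] = h
        grad[i, j] = (a @ (X + E) @ b - a @ (X - E) @ b) / (2 * h)
assert np.allclose(grad, np.outer(a, b), atol=1e-6)
print("first-order derivative (70) verified")
```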

2.4.2 Second Order

∂/∂X_ij Σ_klmn X_kl X_mn = 2 Σ_kl X_kl  (76)
∂b^T X^T Xc / ∂X = X(bc^T + cb^T)  (77)
∂(Bx + b)^T C(Dx + d) / ∂x = B^T C(Dx + d) + D^T C^T (Bx + b)  (78)
∂(X^T BX)_kl / ∂X_ij = δ_lj (X^T B)_ki + δ_kj (BX)_il  (79)
∂(X^T BX) / ∂X_ij = X^T B J^ij + J^ji BX,   (J^ij)_kl = δ_ik δ_jl  (80)

See Sec 9.7 for useful properties of the single-entry matrix J^ij.

∂x^T Bx / ∂x = (B + B^T)x  (81)
∂b^T X^T DXc / ∂X = D^T Xbc^T + DXcb^T  (82)
∂/∂X (Xb + c)^T D(Xb + c) = (D + D^T)(Xb + c)b^T  (83)

Assume W is symmetric, then

∂/∂s (x − As)^T W(x − As) = −2A^T W(x − As)  (84)
∂/∂x (x − s)^T W(x − s) = 2W(x − s)  (85)
∂/∂s (x − s)^T W(x − s) = −2W(x − s)  (86)
∂/∂x (x − As)^T W(x − As) = 2W(x − As)  (87)
∂/∂A (x − As)^T W(x − As) = −2W(x − As)s^T  (88)

As a case with complex values the following holds

∂(a − x^H b)² / ∂x = −2b(a − x^H b)*  (89)

This formula is also known from the LMS algorithm [14].

2.4.3 Higher-order and non-linear

∂(X^n)_kl / ∂X_ij = Σ_{r=0}^{n−1} (X^r J^ij X^{n−1−r})_kl  (90)

For proof of the above, see B.1.3.

∂/∂X a^T X^n b = Σ_{r=0}^{n−1} (X^r)^T a b^T (X^{n−1−r})^T  (91)
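Equation (81) is perhaps the most used of these. An illustrative finite-difference check (not part of the original text; NumPy assumed):

```python
import numpy as np

# Finite-difference check of (81): d(x^T B x)/dx = (B + B^T) x.
rng = np.random.default_rng(4)
B = rng.standard_normal((4, 4))
x = rng.standard_normal(4)

h = 1e-6
grad = np.array([
    ((x + h * e) @ B @ (x + h * e) - (x - h * e) @ B @ (x - h * e)) / (2 * h)
    for e in np.eye(4)
])
assert np.allclose(grad, (B + B.T) @ x, atol=1e-5)
print("quadratic form derivative (81) verified")
```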

∂/∂X a^T (X^n)^T X^n b = Σ_{r=0}^{n−1} [ X^{n−1−r} a b^T (X^n)^T X^r
                        + (X^r)^T X^n a b^T (X^{n−1−r})^T ]  (92)

See B.1.3 for a proof.

Assume s and r are functions of x, i.e. s = s(x), r = r(x), and that A is a constant, then

∂/∂x s^T Ar = (∂s/∂x)^T Ar + (∂r/∂x)^T A^T s  (93)

∂/∂x (Ax)^T (Ax) / ((Bx)^T (Bx)) = ∂/∂x (x^T A^T Ax)/(x^T B^T Bx)  (94)
  = 2 A^T Ax / (x^T B^T Bx) − 2 (x^T A^T Ax) B^T Bx / (x^T B^T Bx)²  (95)

2.4.4 Gradient and Hessian

Using the above we have for the gradient and the Hessian

f = x^T Ax + b^T x  (96)
∇_x f = ∂f/∂x = (A + A^T)x + b  (97)
∂²f / ∂x∂x^T = A + A^T  (98)

2.5 Derivatives of Traces

Assume F(X) to be a differentiable function of each of the elements of X. It then holds that

∂Tr(F(X)) / ∂X = f(X)^T

where f(·) is the scalar derivative of F(·).

2.5.1 First Order

∂Tr(X) / ∂X = I  (99)
∂Tr(XA) / ∂X = A^T  (100)
∂Tr(AXB) / ∂X = A^T B^T  (101)
∂Tr(AX^T B) / ∂X = BA  (102)
∂Tr(X^T A) / ∂X = A  (103)
∂Tr(AX^T) / ∂X = A  (104)
∂Tr(A ⊗ X) / ∂X = Tr(A) I  (105)
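The first-order trace derivatives can be checked the same way. An illustrative sketch (not part of the original text; NumPy assumed) for (101):

```python
import numpy as np

# Finite-difference check of (101): dTr(AXB)/dX = A^T B^T.
rng = np.random.default_rng(5)
A, X, B = (rng.standard_normal((3, 3)) for _ in range(3))

h = 1e-6
grad = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X); E[i, j] = h
        grad[i, j] = (np.trace(A @ (X + E) @ B) -
                      np.trace(A @ (X - E) @ B)) / (2 * h)
assert np.allclose(grad, A.T @ B.T, atol=1e-6)
print("trace derivative (101) verified")
```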

2.5.2 Second Order

∂Tr(X²) / ∂X = 2X^T  (106)
∂Tr(X²B) / ∂X = (XB + BX)^T  (107)
∂Tr(X^T BX) / ∂X = BX + B^T X  (108)
∂Tr(BXX^T) / ∂X = BX + B^T X  (109)
∂Tr(XX^T B) / ∂X = BX + B^T X  (110)
∂Tr(XBX^T) / ∂X = XB^T + XB  (111)
∂Tr(BX^T X) / ∂X = XB^T + XB  (112)
∂Tr(X^T XB) / ∂X = XB^T + XB  (113)
∂Tr(AXBX) / ∂X = A^T X^T B^T + B^T X^T A^T  (114)
∂Tr(X^T X) / ∂X = ∂Tr(XX^T) / ∂X = 2X  (115)
∂Tr(B^T X^T CXB) / ∂X = C^T XBB^T + CXBB^T  (116)
∂Tr(X^T BXC) / ∂X = BXC + B^T XC^T  (117)
∂Tr(AXBX^T C) / ∂X = A^T C^T XB^T + CAXB  (118)
∂Tr[ (AXB + C)(AXB + C)^T ] / ∂X = 2A^T (AXB + C)B^T  (119)
∂Tr(X ⊗ X) / ∂X = ∂Tr(X)Tr(X) / ∂X = 2Tr(X)I  (120)

See [7].

2.5.3 Higher Order

∂Tr(X^k) / ∂X = k(X^{k−1})^T  (121)
∂Tr(AX^k) / ∂X = Σ_{r=0}^{k−1} (X^r AX^{k−r−1})^T  (122)
∂Tr[ B^T X^T CXX^T CXB ] / ∂X = CXX^T CXBB^T + C^T XBB^T X^T C^T X
                              + CXBB^T X^T CX + C^T XX^T C^T XBB^T  (123)
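As with the first-order cases, the higher-order rules can be verified numerically. An illustrative sketch (not part of the original text; NumPy assumed) for (121) with k = 3:

```python
import numpy as np

# Finite-difference check of (121): dTr(X^k)/dX = k (X^{k-1})^T.
rng = np.random.default_rng(6)
X = rng.standard_normal((3, 3))

h, k = 1e-6, 3
f = lambda M: np.trace(np.linalg.matrix_power(M, k))
grad = np.zeros_like(X)
for i in range(3):
    for j in range(3):
        E = np.zeros_like(X); E[i, j] = h
        grad[i, j] = (f(X + E) - f(X - E)) / (2 * h)
assert np.allclose(grad, k * np.linalg.matrix_power(X, k - 1).T, atol=1e-5)
print("higher-order trace derivative (121) verified")
```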

2.5.4 Other

∂Tr(AX^{-1}B) / ∂X = −(X^{-1}BAX^{-1})^T = −X^{-T}A^T B^T X^{-T}  (124)

Assume B and C to be symmetric, then

∂Tr[ (X^T CX)^{-1} A ] / ∂X = −(CX(X^T CX)^{-1})(A + A^T)(X^T CX)^{-1}  (125)

∂Tr[ (X^T CX)^{-1} (X^T BX) ] / ∂X = −2CX(X^T CX)^{-1} X^T BX(X^T CX)^{-1}
                                   + 2BX(X^T CX)^{-1}  (126)

∂Tr[ (A + X^T CX)^{-1} (X^T BX) ] / ∂X = −2CX(A + X^T CX)^{-1} X^T BX(A + X^T CX)^{-1}
                                       + 2BX(A + X^T CX)^{-1}  (127)

See [7].

∂Tr(sin(X)) / ∂X = cos(X)^T  (128)

2.6 Derivatives of vector norms

2.6.1 Two-norm

∂/∂x ‖x − a‖₂ = (x − a) / ‖x − a‖₂  (129)

∂/∂x (x − a)/‖x − a‖₂ = I/‖x − a‖₂ − (x − a)(x − a)^T / ‖x − a‖₂³  (130)

∂‖x‖₂² / ∂x = ∂(x^T x)/∂x = 2x  (131)

2.7 Derivatives of matrix norms

For more on matrix norms, see Sec. 10.4.

2.7.1 Frobenius norm

∂/∂X ‖X‖²_F = ∂/∂X Tr(XX^H) = 2X  (132)

See (248). Note that this is also a special case of the result in equation (119).

2.8 Derivatives of Structured Matrices

Assume that the matrix A has some structure, i.e. symmetric, Toeplitz, etc. In that case the derivatives of the previous section do not apply in general. Instead, consider the following general rule for differentiating a scalar function f(A)

df/dA_ij = Σ_kl (∂f/∂A_kl)(∂A_kl/∂A_ij) = Tr[ (∂f/∂A)^T ∂A/∂A_ij ]  (133)
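The two-norm derivative (129) can be checked directly. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# Finite-difference check of (129): d||x - a||_2 / dx = (x - a)/||x - a||_2.
rng = np.random.default_rng(7)
x, a = rng.standard_normal(4), rng.standard_normal(4)

h = 1e-6
grad = np.array([
    (np.linalg.norm(x + h * e - a) - np.linalg.norm(x - h * e - a)) / (2 * h)
    for e in np.eye(4)
])
assert np.allclose(grad, (x - a) / np.linalg.norm(x - a), atol=1e-6)
print("two-norm derivative (129) verified")
```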

The matrix differentiated with respect to itself is in this document referred to as the structure matrix of A and is defined simply by

∂A / ∂A_ij = S^ij  (134)

If A has no special structure we have simply S^ij = J^ij, that is, the structure matrix is simply the single-entry matrix. Many structures have a representation in single-entry matrices, see Sec. 9.7.6 for more examples of structure matrices.

2.8.1 The Chain Rule

Sometimes the objective is to find the derivative of a matrix which is a function of another matrix. Let U = f(X); the goal is to find the derivative of the function g(U) with respect to X:

∂g(U)/∂X = ∂g(f(X))/∂X  (135)

Then the Chain Rule can be written the following way:

∂g(U)/∂x_ij = Σ_{k=1}^{M} Σ_{l=1}^{N} (∂g(U)/∂u_kl)(∂u_kl/∂x_ij)  (136)

Using matrix notation, this can be written as:

∂g(U)/∂X_ij = Tr[ (∂g(U)/∂U)^T ∂U/∂X_ij ]  (137)

2.8.2 Symmetric

If A is symmetric, then S^ij = J^ij + J^ji − J^ij J^ij and therefore

df/dA = [∂f/∂A] + [∂f/∂A]^T − diag[∂f/∂A]  (138)

That is, e.g., ([5]):

∂Tr(AX)/∂X = A + A^T − (A ∘ I), see (142)  (139)
∂det(X)/∂X = det(X)(2X^{-1} − (X^{-1} ∘ I))  (140)
∂ln det(X)/∂X = 2X^{-1} − (X^{-1} ∘ I)  (141)

2.8.3 Diagonal

If X is diagonal, then ([19]):

∂Tr(AX)/∂X = A ∘ I  (142)
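The symmetric-structure rule (139) can be checked by perturbing only the free parameters of a symmetric X (its upper triangle, mirrored to keep X symmetric). An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# Check of (139): for X symmetric, the gradient of Tr(AX) with respect to
# the free parameters x_ij (i <= j) matches the entries of A + A^T - A∘I.
rng = np.random.default_rng(8)
A = rng.standard_normal((3, 3))
G = A + A.T - A * np.eye(3)          # claimed structured gradient, eq. (139)

h = 1e-6
X = rng.standard_normal((3, 3)); X = X + X.T   # a symmetric point
for i in range(3):
    for j in range(i, 3):
        E = np.zeros((3, 3)); E[i, j] = h
        if i != j:
            E[j, i] = h              # mirror the perturbation off-diagonal
        d = (np.trace(A @ (X + E)) - np.trace(A @ (X - E))) / (2 * h)
        assert np.isclose(d, G[i, j], atol=1e-6)
print("symmetric structured derivative (139) verified")
```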

2.8.4 Toeplitz

Like symmetric and diagonal matrices, Toeplitz matrices have a special structure which should be taken into account when taking the derivative with respect to a matrix with Toeplitz structure.

∂Tr(AT)/∂T = ∂Tr(TA)/∂T =

[ Tr(A)                      Tr([A^T]_{n1})          ···   ···   A_{n1}          ]
[ Tr([A^T]_{1n})             Tr(A)                   Tr([A^T]_{n1})   ···   ...  ]
[ Tr([[A^T]_{1n}]_{2,n−1})   Tr([A^T]_{1n})          Tr(A)     ···   ...         ]
[ ...                        ...                     ...       ...   Tr([A^T]_{n1}) ]
[ A_{1n}                     ···   Tr([[A^T]_{1n}]_{2,n−1})   Tr([A^T]_{1n})   Tr(A) ]

≡ α(A)  (143)

As can be seen, the derivative α(A) also has a Toeplitz structure. Each value in the diagonal is the sum of all the diagonal values in A; the values in the diagonals next to the main diagonal equal the sum of the diagonal next to the main diagonal in A^T. This result is only valid for the unconstrained Toeplitz matrix. If the Toeplitz matrix also is symmetric, the same derivative yields

∂Tr(TA)/∂T = ∂Tr(AT)/∂T = α(A) + α(A)^T − α(A) ∘ I  (144)
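The Toeplitz structure of α(A) in (143) can be seen by parameterizing T by its 2n−1 diagonal values: the derivative of Tr(AT) with respect to the value on diagonal k is the sum of the k-th diagonal of A^T. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# For each Toeplitz basis matrix E_k (ones on diagonal k), the derivative of
# Tr(AT) with respect to that diagonal's value is Tr(A E_k), which equals the
# sum of diagonal k of A^T -- the pattern described under (143).
n = 4
rng = np.random.default_rng(9)
A = rng.standard_normal((n, n))

for k in range(-(n - 1), n):
    E = np.diag(np.ones(n - abs(k)), k)      # Toeplitz basis for diagonal k
    assert np.isclose(np.trace(A @ E), np.trace(A.T, offset=k))
print("Toeplitz trace derivative structure (143) verified")
```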

3 Inverses

3.1 Basic

3.1.1 Definition

The inverse A^{-1} of a matrix A ∈ C^{n×n} is defined such that

AA^{-1} = A^{-1}A = I,  (145)

where I is the n×n identity matrix. If A^{-1} exists, A is said to be nonsingular. Otherwise, A is said to be singular (see e.g. [12]).

3.1.2 Cofactors and Adjoint

The submatrix of a matrix A, denoted by [A]_ij, is a (n−1)×(n−1) matrix obtained by deleting the ith row and the jth column of A. The (i, j) cofactor of a matrix is defined as

cof(A, i, j) = (−1)^{i+j} det([A]_ij)  (146)

The matrix of cofactors can be created from the cofactors

cof(A) = [ cof(A, 1, 1)  ···  cof(A, 1, n) ]
         [     ...       cof(A, i, j)  ... ]
         [ cof(A, n, 1)  ···  cof(A, n, n) ]  (147)

The adjoint matrix is the transpose of the cofactor matrix

adj(A) = (cof(A))^T  (148)

3.1.3 Determinant

The determinant of a matrix A ∈ C^{n×n} is defined as (see [12])

det(A) = Σ_{j=1}^{n} (−1)^{j+1} A_{1j} det([A]_{1j})  (149)
       = Σ_{j=1}^{n} A_{1j} cof(A, 1, j)  (150)

3.1.4 Construction

The inverse matrix can be constructed, using the adjoint matrix, by

A^{-1} = (1/det(A)) · adj(A)  (151)

For the case of 2×2 matrices, see section 1.3.
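The cofactor/adjoint construction (146)-(151) translates directly into code. An illustrative sketch (not part of the original text; NumPy assumed, and helper names are my own):

```python
import numpy as np

def cofactor(A, i, j):
    # Delete row i and column j, then apply eq. (146).
    sub = np.delete(np.delete(A, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(sub)

def inverse_via_adjoint(A):
    # adj(A) = cof(A)^T, eqs. (147), (148), (151).
    n = A.shape[0]
    cof = np.array([[cofactor(A, i, j) for j in range(n)] for i in range(n)])
    return cof.T / np.linalg.det(A)

rng = np.random.default_rng(10)
A = rng.standard_normal((4, 4))
assert np.allclose(inverse_via_adjoint(A), np.linalg.inv(A))
print("adjoint construction of the inverse verified")
```

This is O(n!) via the recursive determinants and is only meant to mirror the definition, not to compute inverses in practice.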

3.1.5 Condition number

The condition number of a matrix c(A) is the ratio between the largest and the smallest singular value of a matrix (see Section 5.3 on singular values),

c(A) = d_max / d_min  (152)

The condition number can be used to measure how singular a matrix is. If the condition number is large, it indicates that the matrix is nearly singular. The condition number can also be estimated from the matrix norms. Here

c(A) = ‖A‖ · ‖A^{-1}‖,  (153)

where ‖·‖ is a norm such as e.g. the 1-norm, the 2-norm, the ∞-norm or the Frobenius norm (see Sec 10.4 for more on matrix norms).

The 2-norm of A equals √(max(eig(A^H A))) [12, p.57]. For a symmetric matrix, this reduces to ‖A‖₂ = max(|eig(A)|) [12, p.394]. If the matrix is symmetric and positive definite, ‖A‖₂ = max(eig(A)). The condition number based on the 2-norm thus reduces to

‖A‖₂ ‖A^{-1}‖₂ = max(eig(A)) max(eig(A^{-1})) = max(eig(A)) / min(eig(A))  (154)

3.2 Exact Relations

3.2.1 Basic

(AB)^{-1} = B^{-1} A^{-1}  (155)

3.2.2 The Woodbury identity

The Woodbury identity comes in many variants. The latter of the two can be found in [12]

(A + CBC^T)^{-1} = A^{-1} − A^{-1}C(B^{-1} + C^T A^{-1}C)^{-1}C^T A^{-1}  (156)
(A + UBV)^{-1} = A^{-1} − A^{-1}U(B^{-1} + VA^{-1}U)^{-1}VA^{-1}  (157)

If P, R are positive definite, then (see [30])

(P^{-1} + B^T R^{-1}B)^{-1}B^T R^{-1} = PB^T(BPB^T + R)^{-1}  (158)

3.2.3 The Kailath Variant

(A + BC)^{-1} = A^{-1} − A^{-1}B(I + CA^{-1}B)^{-1}CA^{-1}  (159)

See [4, page 153].

3.2.4 Sherman-Morrison

(A + bc^T)^{-1} = A^{-1} − (A^{-1}bc^T A^{-1}) / (1 + c^T A^{-1}b)  (160)
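The Woodbury identity (157) and its rank-1 special case, Sherman-Morrison (160), are easy to confirm numerically. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# Numerical check of Woodbury (157) and Sherman-Morrison (160).
rng = np.random.default_rng(11)
n, k = 5, 2
A = rng.standard_normal((n, n)) + n * np.eye(n)   # keep A invertible
U, V = rng.standard_normal((n, k)), rng.standard_normal((k, n))
B = rng.standard_normal((k, k)) + k * np.eye(k)

Ai = np.linalg.inv(A)
woodbury = Ai - Ai @ U @ np.linalg.inv(np.linalg.inv(B) + V @ Ai @ U) @ V @ Ai
assert np.allclose(woodbury, np.linalg.inv(A + U @ B @ V))           # (157)

b, c = rng.standard_normal(n), rng.standard_normal(n)
sm = Ai - np.outer(Ai @ b, c @ Ai) / (1 + c @ Ai @ b)
assert np.allclose(sm, np.linalg.inv(A + np.outer(b, c)))            # (160)
print("Woodbury and Sherman-Morrison verified")
```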

3.2.5 The Searle Set of Identities

The following set of identities can be found in [25, page 151]:

(I + A^{-1})^{-1} = A(A + I)^{-1}  (161)
(A + BB^T)^{-1}B = A^{-1}B(I + B^T A^{-1}B)^{-1}  (162)
(A^{-1} + B^{-1})^{-1} = A(A + B)^{-1}B = B(A + B)^{-1}A  (163)
A − A(A + B)^{-1}A = B − B(A + B)^{-1}B  (164)
A^{-1} + B^{-1} = A^{-1}(A + B)B^{-1}  (165)
(I + AB)^{-1} = I − A(I + BA)^{-1}B  (166)
(I + AB)^{-1}A = A(I + BA)^{-1}  (167)

3.2.6 Rank-1 update of inverse of inner product

Denote A = (X^T X)^{-1} and let X be extended to include a new column vector at the end, X̃ = [X v]. Then [34]

(X̃^T X̃)^{-1} = [ A + (AX^T v v^T XA^T)/k    −(AX^T v)/k ]
               [ −(v^T XA^T)/k               1/k         ]

where k = v^T v − v^T XAX^T v.

3.2.7 Rank-1 update of Moore-Penrose Inverse

The following is a rank-1 update for the Moore-Penrose pseudo-inverse of real valued matrices; a proof can be found in [18]. The matrix G is defined below:

(A + cd^T)^+ = A^+ + G  (168)

Using the notation

β = 1 + d^T A^+ c  (169)
v = A^+ c  (170)
n = (A^+)^T d  (171)
w = (I − AA^+)c  (172)
m = (I − A^+ A)^T d  (173)

the solution is given as six different cases, depending on the entities ‖w‖, ‖m‖, and β. Please note, that for any (column) vector v it holds that v^+ = v^T (v^T v)^{-1} = v^T / ‖v‖². The solution is:

Case 1 of 6: If ‖w‖ ≠ 0 and ‖m‖ ≠ 0. Then

G = −vw^+ − (m^+)^T n^T + β(m^+)^T w^+  (174)
  = −(1/‖w‖²) vw^T − (1/‖m‖²) mn^T + (β/(‖m‖²‖w‖²)) mw^T  (175)

Case 2 of 6: If ‖w‖ = 0 and ‖m‖ ≠ 0 and β = 0. Then

G = −vv^+ A^+ − (m^+)^T n^T  (176)
  = −(1/‖v‖²) vv^T A^+ − (1/‖m‖²) mn^T  (177)

Case 3 of 6: If ‖w‖ = 0 and β ≠ 0. Then

G = (1/β) mv^T A^+ − (β/(‖v‖²‖m‖² + β²)) ((‖v‖²/β) m + v)((‖m‖²/β)(A^+)^T v + n)^T  (178)

Case 4 of 6: If ‖w‖ ≠ 0 and ‖m‖ = 0 and β = 0. Then

G = −A^+ nn^+ − vw^+  (179)
  = −(1/‖n‖²) A^+ nn^T − (1/‖w‖²) vw^T  (180)

Case 5 of 6: If ‖m‖ = 0 and β ≠ 0. Then

G = (1/β) A^+ nw^T − (β/(‖n‖²‖w‖² + β²)) ((‖n‖²/β) v + A^+ n)((‖w‖²/β) w + n)^T  (181)

Case 6 of 6: If ‖w‖ = 0 and ‖m‖ = 0 and β = 0. Then

G = −vv^+ A^+ − A^+ nn^+ + v^+ A^+ n vn^T  (182)
  = −(1/‖v‖²) vv^T A^+ − (1/‖n‖²) A^+ nn^T + (v^T A^+ n / (‖v‖²‖n‖²)) vn^T  (183)

3.3 Implication on Inverses

If   (A + B)^{-1} = A^{-1} + B^{-1}   then   AB^{-1}A = BA^{-1}B  (184)

See [25].

3.3.1 A PosDef identity

Assume P, R to be positive definite and invertible, then

(P^{-1} + B^T R^{-1}B)^{-1}B^T R^{-1} = PB^T(BPB^T + R)^{-1}  (185)

See [30].

3.4 Approximations

The following identity is known as the Neumann series of a matrix, which holds when |λ_i| < 1 for all eigenvalues λ_i

(I − A)^{-1} = Σ_{n=0}^{∞} A^n  (186)

which is equivalent to

(I + A)^{-1} = Σ_{n=0}^{∞} (−1)^n A^n  (187)

When |λ_i| < 1 for all eigenvalues λ_i, it holds that A^n → 0 for n → ∞, and the following approximations hold

(I − A)^{-1} ≅ I + A + A²  (188)
(I + A)^{-1} ≅ I − A + A²  (189)
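The Neumann series (186) converges quickly once the spectral radius is well below 1. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# Check of the Neumann series (186): (I - A)^{-1} = sum_n A^n
# when all eigenvalues of A have magnitude < 1.
rng = np.random.default_rng(12)
A = rng.standard_normal((4, 4))
A *= 0.5 / np.max(np.abs(np.linalg.eigvals(A)))   # scale spectral radius to 0.5

S = np.zeros_like(A)
term = np.eye(4)
for _ in range(60):            # partial sum of the series
    S += term
    term = term @ A
assert np.allclose(S, np.linalg.inv(np.eye(4) - A))
print("Neumann series (186) verified")
```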

The following approximation is from [22] and holds when A is large and symmetric

A − A(I + A)^{-1}A ≅ I − A^{-1}  (190)

If σ² is small compared to Q and M then

(Q + σ²M)^{-1} ≅ Q^{-1} − σ²Q^{-1}MQ^{-1}  (191)

Proof:

(Q + σ²M)^{-1}  (192)
= (QQ^{-1}Q + σ²MQ^{-1}Q)^{-1}  (193)
= ((I + σ²MQ^{-1})Q)^{-1}  (194)
= Q^{-1}(I + σ²MQ^{-1})^{-1}  (195)

This can be rewritten using the Taylor expansion:

Q^{-1}(I + σ²MQ^{-1})^{-1} = Q^{-1}(I − σ²MQ^{-1} + (σ²MQ^{-1})² − ...)  (196)
≅ Q^{-1} − σ²Q^{-1}MQ^{-1}  (197)

3.5 Generalized Inverse

3.5.1 Definition

A generalized inverse matrix of the matrix A is any matrix A^− such that (see [26])

AA^−A = A  (198)

The matrix A^− is not unique.

3.6 Pseudo Inverse

3.6.1 Definition

The pseudo inverse (or Moore-Penrose inverse) of a matrix A is the matrix A^+ that fulfils

I    AA^+A = A
II   A^+AA^+ = A^+
III  AA^+ symmetric
IV   A^+A symmetric

The matrix A^+ is unique and does always exist. Note that in case of complex matrices, the symmetric condition is substituted by a condition of being Hermitian.
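The four Moore-Penrose conditions I-IV can be verified for any matrix, including a rank-deficient one. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# Check the four Moore-Penrose conditions for a rank-deficient 5x4 matrix.
rng = np.random.default_rng(13)
A = rng.standard_normal((5, 2)) @ rng.standard_normal((2, 4))  # rank 2
P = np.linalg.pinv(A)

assert np.allclose(A @ P @ A, A)            # I
assert np.allclose(P @ A @ P, P)            # II
assert np.allclose(A @ P, (A @ P).T)        # III
assert np.allclose(P @ A, (P @ A).T)        # IV
print("Moore-Penrose conditions verified")
```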

3.6.2 Properties

Assume A^+ to be the pseudo-inverse of A, then (see [3] for some of them)

(A^+)^+ = A  (199)
(A^T)^+ = (A^+)^T  (200)
(A^H)^+ = (A^+)^H  (201)
(A^*)^+ = (A^+)^*  (202)
(A^+A)A^H = A^H  (203)
(A^+A)A^T ≠ A^T  (204)
(cA)^+ = (1/c)A^+  (205)
A^+ = (A^T A)^+ A^T  (206)
A^+ = A^T (AA^T)^+  (207)
(A^T A)^+ = A^+ (A^T)^+  (208)
(AA^T)^+ = (A^T)^+ A^+  (209)
A^+ = (A^H A)^+ A^H  (210)
A^+ = A^H (AA^H)^+  (211)
(A^H A)^+ = A^+ (A^H)^+  (212)
(AA^H)^+ = (A^H)^+ A^+  (213)
(AB)^+ = (A^+AB)^+ (ABB^+)^+  (214)
f(A^H A) − f(0)I = A^+ [f(AA^H) − f(0)I] A  (215)
f(AA^H) − f(0)I = A [f(A^H A) − f(0)I] A^+  (216)

where A ∈ C^{n×m}.

Assume A to have full rank, then

(AA^+)(AA^+) = AA^+  (217)
(A^+A)(A^+A) = A^+A  (218)
Tr(AA^+) = rank(AA^+)   (See [26])  (219)
Tr(A^+A) = rank(A^+A)   (See [26])  (220)

For two matrices it holds that

(AB)^+ = (A^+AB)^+ (ABB^+)^+  (221)
(A ⊗ B)^+ = A^+ ⊗ B^+  (222)
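A few of these properties are cheap to confirm numerically. An illustrative sketch (not part of the original text; NumPy assumed), checking (206), (207) and the Kronecker rule (222):

```python
import numpy as np

# Check pseudo-inverse properties (206), (207) and (222).
rng = np.random.default_rng(15)
A = rng.standard_normal((4, 3))
B = rng.standard_normal((2, 5))
pinv = np.linalg.pinv

assert np.allclose(pinv(A), pinv(A.T @ A) @ A.T)                     # (206)
assert np.allclose(pinv(A), A.T @ pinv(A @ A.T))                     # (207)
assert np.allclose(pinv(np.kron(A, B)), np.kron(pinv(A), pinv(B)))   # (222)
print("pseudo-inverse properties verified")
```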

3.6.3 Construction

Assume that A has full rank, then

A  n×n   Square   rank(A) = n   ⇒   A^+ = A^{-1}
A  n×m   Broad    rank(A) = n   ⇒   A^+ = A^T (AA^T)^{-1}
A  n×m   Tall     rank(A) = m   ⇒   A^+ = (A^T A)^{-1} A^T

The so-called "broad version" is also known as right inverse and the "tall version" as the left inverse.

Assume A does not have full rank, i.e. A is n×m and rank(A) = r < min(n, m). The pseudo inverse A^+ can be constructed from the singular value decomposition A = UDV^T, by

A^+ = V_r D_r^{-1} U_r^T  (223)

where U_r, D_r, and V_r are the matrices with the degenerated rows and columns deleted. A different way is this: There do always exist two matrices C n×r and D r×m of rank r, such that A = CD. Using these matrices it holds that

A^+ = D^T (DD^T)^{-1} (C^T C)^{-1} C^T  (224)

See [3].
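The SVD construction (223) can be reproduced directly from a library SVD by keeping only the non-degenerate singular triplets. An illustrative sketch (not part of the original text; NumPy assumed):

```python
import numpy as np

# SVD construction (223) of the pseudo inverse for a rank-deficient matrix.
rng = np.random.default_rng(14)
r = 2
A = rng.standard_normal((5, r)) @ rng.standard_normal((r, 4))   # rank 2

U, d, Vt = np.linalg.svd(A, full_matrices=False)
keep = d > 1e-10 * d[0]                  # drop the degenerate singular values
A_plus = Vt[keep].T @ np.diag(1.0 / d[keep]) @ U[:, keep].T
assert np.allclose(A_plus, np.linalg.pinv(A))
print("SVD construction of the pseudo inverse verified")
```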

4 Complex Matrices

The complex scalar product r = pq can be written as

[ ℜr ]   [ ℜp  −ℑp ] [ ℜq ]
[ ℑr ] = [ ℑp   ℜp ] [ ℑq ]  (225)

4.1 Complex Derivatives

In order to differentiate an expression f(z) with respect to a complex z, the Cauchy-Riemann equations have to be satisfied ([7]):

df(z)/dz = ∂ℜ(f(z))/∂ℜz + i ∂ℑ(f(z))/∂ℜz  (226)

and

df(z)/dz = −i ∂ℜ(f(z))/∂ℑz + ∂ℑ(f(z))/∂ℑz  (227)

or in a more compact form:

∂f(z)/∂ℑz = i ∂f(z)/∂ℜz  (228)

A complex function that satisfies the Cauchy-Riemann equations for points in a region R is said to be analytic in this region R. In general, expressions involving complex conjugate or conjugate transpose do not satisfy the Cauchy-Riemann equations. In order to avoid this problem, a more generalized definition of complex derivative is used ([24], [6]):

• Generalized Complex Derivative:

df(z)/dz = (1/2)(∂f(z)/∂ℜz − i ∂f(z)/∂ℑz)  (229)

• Conjugate Complex Derivative:

df(z)/dz* = (1/2)(∂f(z)/∂ℜz + i ∂f(z)/∂ℑz)  (230)

The Generalized Complex Derivative equals the normal derivative, when f is an analytic function. For a non-analytic function such as f(z) = z*, the derivative equals zero. The Conjugate Complex Derivative equals zero, when f is an analytic function. The Conjugate Complex Derivative has e.g. been used by [21] when deriving a complex gradient.

Notice:

df(z)/dz ≠ ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz  (231)

• Complex Gradient Vector: If f is a real function of a complex vector z, then the complex gradient vector is given by ([14, p. 798])

∇f(z) = 2 df(z)/dz* = ∂f(z)/∂ℜz + i ∂f(z)/∂ℑz  (232)

• Complex Gradient Matrix: If f is a real function of a complex matrix Z, then the complex gradient matrix is given by ([2])

∇f(Z) = 2 df(Z)/dZ* = ∂f(Z)/∂ℜZ + i ∂f(Z)/∂ℑZ  (233)

These expressions can be used for gradient descent algorithms.

4.1.1 The Chain Rule for complex numbers

The chain rule is a little more complicated when the function of a complex u = f(x) is non-analytic. For a non-analytic function, the following chain rule can be applied ([7])

∂g(u)/∂x = (∂g/∂u)(∂u/∂x) + (∂g/∂u*)(∂u*/∂x)  (234)

Notice, if the function is analytic, the second term reduces to zero, and the rule reduces to the normal well-known chain rule. For the matrix derivative of a scalar function g(U), the chain rule can be written the following way:

∂g(U)/∂X_ij = Tr[ (∂g(U)/∂U)^T ∂U/∂X_ij ] + Tr[ (∂g(U)/∂U*)^T ∂U*/∂X_ij ]  (235)

4.1.2 Complex Derivatives of Traces

If the derivatives involve complex numbers, the conjugate transpose is often involved. The most useful way to show a complex derivative is to show the derivative with respect to the real and the imaginary part separately. An easy example is:

∂Tr(X*)/∂ℜX = ∂Tr(X^H)/∂ℜX = I  (236)
∂Tr(X*)/i∂ℑX = ∂Tr(X^H)/i∂ℑX = I  (237)

Since the two results have the same sign, the conjugate complex derivative (230) should be used.

∂Tr(X)/∂ℜX = ∂Tr(X^T)/∂ℜX = I  (238)
∂Tr(X)/i∂ℑX = ∂Tr(X^T)/i∂ℑX = −I  (239)

Here, the two results have different signs, and the generalized complex derivative (229) should be used. Hereby, it can be seen that (100) holds even if X is a complex number.

∂Tr(AX^H)/∂ℜX = A  (240)
∂Tr(AX^H)/i∂ℑX = A  (241)
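The conjugate (Wirtinger) derivative (230) can be computed numerically from the real and imaginary partials. An illustrative sketch (not part of the original text; NumPy assumed, and the helper name is my own) for the non-analytic function f(z) = z z* = |z|², whose conjugate derivative is z:

```python
import numpy as np

# Numerical conjugate complex derivative, eq. (230):
# df/dz* = (1/2)(df/d(Re z) + i df/d(Im z)).
def conj_derivative(f, z, h=1e-6):
    d_re = (f(z + h) - f(z - h)) / (2 * h)            # df/d(Re z)
    d_im = (f(z + 1j * h) - f(z - 1j * h)) / (2 * h)  # df/d(Im z)
    return 0.5 * (d_re + 1j * d_im)

z = 1.3 - 0.7j
f = lambda z: (z * np.conj(z)).real                   # f(z) = |z|^2
assert np.isclose(conj_derivative(f, z), z)
print("conjugate complex derivative (230) verified")
```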

4.2 Higher order and non-linear derivatives
