DirectLiNGAM - Osaka U

Transcription

Updated at Jan 14 2011DirectLiNGAM:A direct estimation method forLiNGAMShohei Shimizu, Takanori Inazumi, Yasuhiro Sogawa, Osaka Univ.Aapo Hyvarinen, Univ. HelsinkiYoshinobu Kawahara, Takashi Washio, Osaka Univ.Patrik O. Hoyer, Univ. HelsinkiKenneth Bollen, Univ. North Carolina

2Abstract Structural equation models (SEMs) are widelyused in many empirical sciences (Bollen, 1989) A non-Gaussian framework has been shown tobe useful for discovering SEMs (Shimizu, et al. 2006) Propose a new non-Gaussian estimation method– No algorithmic parameters– Guaranteed convergence in a fixed number of stepsif the data strictly follows the model

Background

4Linear Non-Gaussian Acyclic Model(LiNGAM model) (Shimizu et al. 2006) A SEM model, identifiable using non-Gaussianity Continuous observed random variables xiDirected acyclic graph (DAG)LinearityDisturbances ei are independent and non-Gaussianxi b xijk ( j ) k ( i )j eiorx Bx e– k(i) denotes an order of xi– B can be permuted to be lower triangular by simultaneous equalrow and column permutations

5Example A three-variable modelx1 e1x2 1.5 x1 e2x3 1.3 x2 e300 x1 e1 x1 0 x 1.5 x e 00 2 2 2 x3 0 1.3 0 x3 e3 1442443B Orders of variables:– k (1) 1, k ( 2) 2, k (3) 3e1– x2 can be influenced by x1, but never by x3 External influences:– x1 is equal to e1 and is exogenous– e2 and e3 are errorsx11.5x2e2-1.3x3e3

6Our goal We know– Data X is generated byx Bx e We do NOT know– Connection strengths: bij– Orders: k(i)– Disturbances: ei What we observe is data X only Goal– Estimate B and k using data X only!

Previous work

Independent Component Analysis8(Comon 1994; Hyvarinen et al., 2001)x As A is an unknown square matrix si are independent and non-Gaussian Identifiable including the rotation (Comon, 1994) Many estimation methods– e.g., FastICA (Hyvarinen,99), Amari (99) and Bach & Jordan (02)

Key idea Observed variables xi are linear combinations ofnon-Gaussian independent disturbances eix Bx e x (I B) 1 e Ae ICA gives-- ICA!W PDA 1 PD(I B)– P: Permutation matrix, D: scaling matrix Permutation indeterminacy in ICA can be solved– Can be shown that the correct permutation is the only one whichhas no zeros in the diagonal (Shimizu et al., UAI2005)9

ICA-LiNGAM algorithm(Shimizu et al., 2006)1. Do ICA (here, FastICA) and get W PD(I-B)2. Find a permutation P1 that gives no zeros on thediagonal. Then we obtain D(I-B).1(P1W )iiPˆ1 min P1i3. Divide each row by its corresponding diagonalelement. Then we get I-B, i.e., B4. Find a simultaneous row and column permutation Q sothat the permuted B is as close as possible to bestrictly lower triangular. Then we get k(i).(Tˆ minQQBQ Qi j)ij10

Potential problems ofICA-LiNGAM algorithm1. ICA is an iterative search method– May stuck in a local optimum if the initialguess or step size is badly chosen2. The permutation algorithms are notscale-invariant– May provide different variable orderings fordifferent scales of variables11

A new method

13DirectLiNGAM algorithm(Shimizu et al., UAI2009; Shimizu et al., 2011) Alternative estimation method without ICA– Estimates an ordering of variables that makes pathcoefficient matrix B to be lower-triangular.x perm O x perm e perm 23 1B permA full DAGx1x3x2Redundant edges Many existing (covariance-based) methods cando further pruning or finding significant pathcoefficients (Zou, 2006; Shimizu et al., 2006; Hyvarinen et al. 2010)

14Basic idea (1/2) :An exogenous variable can be at thetop of a right ordering An exogenous variable x j is a variable with noparents (Bollen, 1989), here x3 .– The corresponding row of B has all zeros. So, an exogenous variable can be at the top ofsuch an ordering that makes B lower-triangularwith zeros on the diagonal.00 x3 00 x 1.500 1 x2 0 1.30 x3 e3 0 x1 e1 0 x2 e2 x3x1x2

Basic idea (2/2):Regress exogenous x3 out15(3 )r Compute the residuals i (i 1, 2) regressing the othervariables xi (i 1, 2) on exogenous x3 :(3 )(3 )randr– The residuals 1form a LiNGAM model.2– The ordering of the residuals is equivalent to that ofcorresponding original variables.00 x3 00 x 1.500 1 x2 0 1.3x3x10 x3 e3 0 x1 e1 0 x2 e2 x20 r1(3) e1 r1(3) 0 ( 3) ( 3) r2 1.3 0 r2 e2 ( 3)1rr2(3) Exogenous r1(3) implies x1 can be at the second top’.

16Outline of DirectLiNGAM Iteratively find exogenous variables until all thevariables are ordered:1. Find an exogenous variable x3 .––Put x3 at the top of the ordering.Regress x3 out.( 3)1 .2. Find an exogenous residual, here r––Put x1 at the second top of the ordering.Regress r1(3) out.3. Put x2 at the third top of the ordering and terminate.The estimated ordering is x3 x1 x2 .Step. 1x3x1Step. 2x2r1(3)Step. 3r2(3)r2(3,1)

Identification of an exogenousvariable (two variable cases)i) x1 ( e1 ) is exogenous.x1 e1x2 b21 x1 e2(b21 0)Regressing x2 on x1 ,(1)2rcov( x2 , x1 ) x2 x1var( x1 ) x2 b21 x1 e2x1 and r2(1) are independent.ii)17x1 is NOT exogenous.x1 b12 x2 e1 (b12 0)x2 e2Regressing x2 on x1 ,r2(1) x2 cov( x2 , x1 )x1var( x1 ) b cov( x2 , x1 ) b12 var( x2 ) 1 12x e1 2var( x1 ) var( x1 ) x1 and r2(1) are NOT independent.

Need to use Darmois-Skitovitch’theorem (Darmois, 1953; Skitovitch, 1953)Darmois-Skitovitch’ theorem:Define two variables x1 and x2 asppj 1j 1x1 a1 j e j , x2 a2 j e j18x1 is NOT exogenous.x1 b12 x2 1 e1 (b12 0)ii)x2 e2Regressing x1 on x2 ,cov( x2 , x1 )x1var( x1 )where e j are independentrandom variables.r2(1) x2 If there exists a non-Gaussian ei b cov( x2 , x1 ) b12 var( x2 ) 1 12xe1 2var( x1 ) var( x1 ) for which a1i a2i 0 ,x1 and x2 are dependent.x1 and r2(1) are NOT independent.

Identification of an exogenousvariable (p variable cases) Lemma 1: x j and its residual ri( j) xi cov( xi , x j )var( x j )19xjare independent for all i j x j is exogenous In practice, we can identify an exogenous variableby finding a variable that is most independent ofits residuals

Independence measures20 Evaluate independence between a variable and aresidual by a nonlinear correlation:{( )} (g tanh )corr x j , g ri( j ) Taking the sum over all the residuals, we get:{( )}{T corr x j , g ri( j ) corr g (x j ), ri( j )}i j Can use more sophisticated measures as well(Bach & Jordan, 2002; Gretton et al., 2005; Kraskov et al., 2004).– Kernel-based independence measure (Bach & Jordan, 2002)often gives more accurate estimates (Sogawa et al., IJCNN10)

Real-world data example (1/2) Status attainment model– General Social Survey (U.S.A.)– Sample size 1380 Non-farm, ages 35-45, white, male, in the labor force, years 1972-2006Domain knowledge(Duncan et al. 1972)DirectLiNGAM21

Real-world data example (2/2)ICA-LiNGAMPC algorithmGES22

23Summary DirectLiNGAM repeats:– Least squares simple linear regression– Evaluation of pairwise independence between eachvariable and its residuals No algorithmic parameters like stepsize, initialguesses, convergence criteria Guaranteed convergence to the right solution ina fixed number of steps (the number ofvariables) if the data strictly follows the model

ICA-LiNGAM algorithm 10 (Shimizu et al., 2006) 1. Do ICA (here, FastICA) and get W PD(I-B) 2. Find a permutation that gives no zeros on the diagonal. Then we obtain D(I-B). 3. Divide each row by its corresponding diagonal element. Then we get I-B, i.e., B 4. Find a simultaneous row and column permutation Q so