Factor Analysis - Harvard University

Transcription

Factor Analysis
Qian-Li Xue
Biostatistics Program
Harvard Catalyst | The Harvard Clinical & Translational Science Center
Short course, October 27, 2016

Well-used latent variable models

Latent variable scale × observed variable scale:
§ Continuous latent, continuous observed: Factor analysis, LISREL
§ Continuous latent, discrete observed: Discrete FA, IRT (item response)
§ Discrete latent, continuous observed: Latent profile, Growth mixture
§ Discrete latent, discrete observed: Latent class analysis, regression

General software: MPlus, Latent Gold, WinBugs (Bayesian), NLMIXED (SAS)

Objectives
§ What is factor analysis?
§ What do we need factor analysis for?
§ What are the modeling assumptions?
§ How to specify, fit, and interpret factor models?
§ What is the difference between exploratory and confirmatory factor analysis?
§ What is model identifiability and how to assess it?

What is factor analysis?
§ Factor analysis is a theory-driven statistical data reduction technique used to explain covariance among observed random variables in terms of fewer unobserved random variables called factors

An Example: General Intelligence (Charles Spearman)
[Path diagram omitted in transcription]

Why Factor Analysis?
1. Testing of theory
§ Explain covariation among multiple observed variables by mapping variables to latent constructs (called “factors”)
2. Understanding the structure underlying a set of measures
§ Gain insight into dimensions
§ Construct validation (e.g., convergent validity)
3. Scale development
§ Exploit redundancy to improve a scale’s validity and reliability

Part I. Exploratory Factor Analysis (EFA)

One Common Factor Model: Model Specification

Y1 = λ1 F + δ1
Y2 = λ2 F + δ2
Y3 = λ3 F + δ3

§ The factor F is not observed; only Y1, Y2, Y3 are observed
§ δi represents variability in Yi NOT explained by F
§ Yi is a linear function of F and δi

One Common Factor Model: Model Assumptions
§ Factorial causation
§ F is independent of δj, i.e. cov(F, δj) = 0
§ δi and δj are independent for i ≠ j, i.e. cov(δi, δj) = 0
§ Conditional independence: given the factor, observed variables are independent of one another, i.e. cov(Yi, Yj | F) = 0

One Common Factor Model: Model Interpretation

Given all variables in standardized form, i.e. var(Yi) = var(F) = 1:
§ Factor loadings: λi = corr(Yi, F)
§ Communality of Yi: hi² = λi² = [corr(Yi, F)]² = % variance of Yi explained by F
§ Uniqueness of Yi: 1 − hi² = residual variance of Yi
§ Degree of factorial determination: Σ λi²/n, where n = # observed variables Y
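This interpretation can be checked by simulation. A minimal sketch (hypothetical loadings, NumPy assumed): with standardized Y and F, each corr(Yi, F) recovers λi.

```python
import numpy as np

# One-common-factor model with assumed (hypothetical) loadings
rng = np.random.default_rng(0)
n = 100_000
lambdas = np.array([0.8, 0.6, 0.5])                  # factor loadings
F = rng.standard_normal(n)                           # latent factor, var(F) = 1
# unique parts delta_i, scaled so that var(Y_i) = 1
delta = rng.standard_normal((3, n)) * np.sqrt(1 - lambdas[:, None] ** 2)
Y = lambdas[:, None] * F + delta                     # Y_i = lambda_i * F + delta_i

# corr(Y_i, F) should recover lambda_i; communality h_i^2 = lambda_i^2
est = np.array([np.corrcoef(Y[i], F)[0, 1] for i in range(3)])
print(np.round(est, 2))
```

With 100,000 draws the estimated correlations land within sampling error of the assumed loadings, and each row of Y has variance near 1, as the standardization requires.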

Two-Common Factor Model (Orthogonal): Model Specification

Y1 = λ11 F1 + λ12 F2 + δ1
Y2 = λ21 F1 + λ22 F2 + δ2
Y3 = λ31 F1 + λ32 F2 + δ3
Y4 = λ41 F1 + λ42 F2 + δ4
Y5 = λ51 F1 + λ52 F2 + δ5
Y6 = λ61 F1 + λ62 F2 + δ6

F1 and F2 are common factors because each is shared by two or more variables!

Matrix Notation

With n variables and m factors:

Y(n×1) = Λ(n×m) F(m×1) + δ(n×1)

[Y1, …, Yn]′ = [λ11 … λ1m; … ; λn1 … λnm] [F1, …, Fm]′ + [δ1, …, δn]′

Factor Pattern Matrix
§ Columns represent derived factors
§ Rows represent input variables
§ Loadings represent the degree to which each of the variables “correlates” with each of the factors
§ Loadings range from -1 to 1
§ Inspection of factor loadings reveals the extent to which each of the variables contributes to the meaning of each of the factors
§ High loadings provide meaning and interpretation of factors (= regression coefficients)

Two-Common Factor Model (Orthogonal): Model Assumptions
§ Factorial causation
§ F1 and F2 are independent of δj, i.e. cov(F1, δj) = cov(F2, δj) = 0
§ δi and δj are independent for i ≠ j, i.e. cov(δi, δj) = 0
§ Conditional independence: given factors F1 and F2, observed variables are independent of one another, i.e. cov(Yi, Yj | F1, F2) = 0 for i ≠ j
§ Orthogonal (= independent): cov(F1, F2) = 0

Two-Common Factor Model (Orthogonal): Model Interpretation

Given all variables in standardized form, i.e. var(Yi) = var(Fj) = 1, AND orthogonal factors, i.e. cov(F1, F2) = 0:
§ Factor loadings: λij = corr(Yi, Fj)
§ Communality of Yi: hi² = λi1² + λi2² = % variance of Yi explained by F1 AND F2
§ Uniqueness of Yi: 1 − hi²
§ Degree of factorial determination: Σ λij²/n, n = # observed variables Y

Two-Common Factor Model: The Oblique Case

Given all variables in standardized form, i.e. var(Yi) = var(Fj) = 1, AND oblique factors (i.e. cov(F1, F2) ≠ 0):
§ The interpretation of factor loadings changes: λij is no longer the correlation between Yi and Fj; it is the direct effect of Fj on Yi
§ The calculation of the communality of Yi (hi²) is more complex

Extracting initial factors
§ Least-squares method (e.g. principal axis factoring with iterated communalities)
§ Maximum likelihood method

Model Fitting: Extracting Initial Factors

Least-squares method (LS) (e.g. principal axis factoring with iterated communalities)
v Goal: minimize the sum of squared differences between the observed and estimated corr. matrices
v Fitting steps:
a) Obtain initial estimates of the communalities (h²), e.g. the squared correlation between a variable and the remaining variables
b) Solve the objective function: det(RLS − ηI) = 0, where RLS is the corr matrix with h² in the main diagonal (also termed the adjusted corr matrix) and η is an eigenvalue
c) Re-estimate h²
d) Repeat b) and c) until no improvement can be made
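Steps a)–d) can be sketched in a few lines. This is a simplified illustration, not the course software: it runs one-factor principal axis factoring on a hypothetical 3×3 correlation matrix built from assumed loadings, so the iteration should recover them.

```python
import numpy as np

# Hypothetical 3x3 correlation matrix implied by one-factor loadings
lam = np.array([0.8, 0.7, 0.6])
R = np.outer(lam, lam)
np.fill_diagonal(R, 1.0)

# a) initial communalities: squared multiple correlations
h2 = 1 - 1 / np.diag(np.linalg.inv(R))
for _ in range(200):
    R_adj = R.copy()
    np.fill_diagonal(R_adj, h2)          # adjusted corr matrix: h^2 on the diagonal
    # b) eigen-solve det(R_adj - eta*I) = 0; keep the largest eigenvalue
    eigval, eigvec = np.linalg.eigh(R_adj)
    loadings = eigvec[:, -1] * np.sqrt(max(eigval[-1], 0.0))
    # c) re-estimate communalities from the loadings
    h2_new = loadings ** 2
    # d) repeat until no improvement can be made
    if np.max(np.abs(h2_new - h2)) < 1e-10:
        break
    h2 = h2_new
print(np.round(np.abs(loadings), 3))
```

The sign of an eigenvector is arbitrary, so the absolute loadings are printed; at convergence they match the loadings that generated R.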

Model Fitting: Extracting Initial Factors

Maximum likelihood method (MLE)
v Goal: maximize the likelihood of producing the observed corr matrix
v Assumption: the distribution of variables (Y and F) is multivariate normal
v Objective function: det(RMLE − ηI) = 0, where RMLE = U⁻¹(R − U²)U⁻¹ = U⁻¹ RLS U⁻¹, and U² = diag(1 − h²)
v Iterative fitting algorithm similar to the LS approach
v Exception: adjusts R by giving greater weight to correlations of variables with smaller unique variance, i.e. 1 − h²
v Advantage: availability of a large-sample χ² significance test for goodness-of-fit (but tends to select more factors for large n!)

Choosing among Different Methods
§ Between MLE and LS
§ LS is preferred with
v few indicators per factor
v equal loadings within factors
v no large cross-loadings
v no factor correlations
v recovering factors with low loadings (overextraction)
§ MLE is preferred with
v multivariate normality
v unequal loadings within factors
§ Both MLE and LS may have convergence problems

Factor Rotation
§ Goal is simple structure
§ Make factors more easily interpretable
§ While keeping the number of factors and the communalities of the Ys fixed!!!
§ Rotation does NOT improve fit!

Factor Rotation

To do this we “rotate” factors:
§ redefine factors such that “loadings” (or pattern matrix coefficients) on the various factors tend to be very high (-1 or 1) or very low (0)
§ intuitively, it makes sharper distinctions in the meanings of the factors

Factor Rotation (Intuitively)

Unrotated solution:
     Factor 1   Factor 2
x1   0.4        0.69
x2   0.4        0.69
x3   0.65       0.32
x4   0.69       -0.4
x5   0.61       -0.35

Rotated solution:
     Factor 1   Factor 2
x1   -0.8       0
x2   -0.8       0
x3   -0.6       0.4
x4   0          0.8
x5   0          0.7

Factor Rotation
§ Uses the “ambiguity” or non-uniqueness of the solution to make interpretation simpler
§ Where does the ambiguity come in?
§ The unrotated solution is based on the idea that each factor tries to maximize the variance explained, conditional on previous factors
§ What if we take that away?
§ Then there is not one “best” solution

Factor Rotation: Orthogonal vs. Oblique Rotation
§ Orthogonal: factors are independent
§ varimax: maximize the variance of squared loadings across variables (sum over factors)
v Goal: the simplicity of interpretation of factors
§ quartimax: maximize the variance of squared loadings across factors (sum over variables)
v Goal: the simplicity of interpretation of variables
§ Intuition: in the previous picture, there is a right angle between the axes
§ Note: “uniquenesses” remain the same!
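Varimax can be implemented in about a dozen lines with the standard SVD-based update. The sketch below (NumPy assumed) rotates the five-variable loading matrix from the intuition example; note that the row sums of squared loadings, i.e. the communalities, are unchanged by rotation.

```python
import numpy as np

def varimax(L, tol=1e-8, max_iter=100):
    """Classic SVD-based varimax rotation of an n x m loading matrix L."""
    n, m = L.shape
    T = np.eye(m)                        # accumulated orthogonal rotation
    d = 0.0
    for _ in range(max_iter):
        Lr = L @ T
        u, s, vt = np.linalg.svd(
            L.T @ (Lr ** 3 - Lr @ np.diag((Lr ** 2).sum(axis=0)) / n)
        )
        T = u @ vt
        d_new = s.sum()
        if d_new < d * (1 + tol):        # criterion stopped improving
            break
        d = d_new
    return L @ T

# Unrotated loadings from the intuition example
L = np.array([[0.40, 0.69], [0.40, 0.69], [0.65, 0.32],
              [0.69, -0.40], [0.61, -0.35]])
Lr = varimax(L)
print(np.round(Lr, 2))
```

Because T is orthogonal, communalities (and hence uniquenesses) are preserved exactly, while the variance of the squared loadings within each factor does not decrease.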

Factor Rotation: Orthogonal vs. Oblique Rotation
§ Oblique: factors are NOT independent. Change in “angle.”
§ oblimin: minimize the covariance of squared loadings between factors
§ promax: simplify an orthogonal rotation by making small loadings even closer to zero
§ Target matrix: choose “simple structure” a priori
§ Intuition: in the previous picture, the angle between the axes is not necessarily a right angle
§ Note: “uniquenesses” remain the same!

Pattern versus Structure Matrix
§ In oblique rotation, one typically presents both a pattern matrix and a structure matrix
§ Also need to report the correlation between the factors
§ The pattern matrix presents the usual factor loadings
§ The structure matrix presents correlations between the variables and the factors
§ For orthogonal factors, pattern matrix = structure matrix
§ The pattern matrix is used to interpret the factors

Factor Rotation: Which to Use?
§ The choice is generally not critical
§ Interpretation with orthogonal (varimax) is “simple” because factors are independent: “loadings” are correlations
§ The configuration may appear more simple in oblique (promax), but the correlation of factors can be difficult to reconcile
§ Theory? Are the conceptual meanings of the factors associated?

Factor Rotation: Unique Solution?
§ The factor analysis solution is NOT unique!
§ More than one solution will yield the same “result.”

Derivation of Factor Scores
§ Each object (e.g. each person) gets a factor score for each factor
§ The factors themselves are variables
§ An “object’s” score is a weighted combination of scores on the input variables:
F̂ = ŴY, where Ŵ is the weight matrix
§ These weights are NOT the factor loadings!
§ Different approaches exist for estimating Ŵ (e.g. the regression method)
§ Factor scores are not unique
§ Using factor scores instead of factor indicators can reduce measurement error, but does NOT remove it
§ Therefore, using factor scores as predictors in conventional regressions leads to inconsistent coefficient estimators!
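As one concrete, hypothetical illustration of the regression method: with standardized indicators and a single factor, the weight matrix is Ŵ = R⁻¹Λ, which is visibly different from the loadings themselves.

```python
import numpy as np

# Hypothetical one-factor loadings and the model-implied correlation matrix
lam = np.array([[0.8], [0.7], [0.6]])
R = lam @ lam.T
np.fill_diagonal(R, 1.0)

# Regression-method weights: W = R^{-1} Lambda (NOT the loadings)
W = np.linalg.solve(R, lam)
print(np.round(W.ravel(), 3))
```

The weights come out noticeably smaller than the loadings because each indicator's contribution is shrunk toward the information it adds beyond the others.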

Factor Analysis with Categorical Observed Variables
§ Factor analysis hinges on the correlation matrix
§ As long as you can get an interpretable correlation matrix, you can perform factor analysis
§ Binary/ordinal items?
§ Pearson correlation: expect attenuation!
§ Tetrachoric correlation (binary)
§ Polychoric correlation (ordinal)

To obtain polychoric correlations in STATA:
polychoric var1 var2 var3 var4 var5
To run principal component analysis:
pcamat r(R), n(328)
To run factor analysis:
factormat r(R), fa(2) ipf n(328)
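The attenuation warning is easy to demonstrate. A small simulation (illustrative numbers, NumPy assumed) dichotomizes two correlated normal variables at their medians and shows the Pearson correlation shrink:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
x = rng.standard_normal(n)
y = 0.6 * x + np.sqrt(1 - 0.6 ** 2) * rng.standard_normal(n)  # latent corr 0.6

r_cont = np.corrcoef(x, y)[0, 1]                              # near 0.6
x_bin, y_bin = (x > 0).astype(float), (y > 0).astype(float)   # median splits
r_bin = np.corrcoef(x_bin, y_bin)[0, 1]                       # attenuated
print(round(r_cont, 2), round(r_bin, 2))
```

For median splits of a bivariate normal, the attenuated Pearson value is (2/π)·arcsin(0.6) ≈ 0.41; a tetrachoric correlation would recover the latent 0.6.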

Criticisms of Factor Analysis
§ Labels of factors can be arbitrary or lack scientific basis
§ Derived factors are often very obvious
§ defense: but we get a quantification
§ “Garbage in, garbage out”
§ really a criticism of the input variables
§ factor analysis reorganizes the input matrix
§ The correlation matrix is often a poor measure of association of the input variables

Major steps in EFA
1. Data collection and preparation
2. Choose the number of factors to extract
3. Extracting initial factors
4. Rotation to a final solution
5. Model diagnosis/refinement
6. Derivation of factor scales to be used in further analysis

Part II. Confirmatory Factor Analysis (CFA)

Exploratory vs. Confirmatory Factor Analysis
§ Exploratory:
§ summarize data
§ describe the correlation structure between variables
§ generate hypotheses
§ Confirmatory:
§ testing correlated measurement errors
§ redundancy test of one-factor vs. multi-factor models
§ measurement invariance test comparing a model across groups
§ orthogonality tests

Confirmatory Factor Analysis (CFA)
§ Takes factor analysis a step further
§ We can “test” or “confirm” or “implement” a “highly constrained a priori structure that meets conditions of model identification”
§ But be careful: a model can never be confirmed!!
§ The CFA model is constructed in advance
§ the number of latent variables (“factors”) is pre-set by the analyst (not usually part of the modeling)
§ whether a latent variable influences an observed variable is specified
§ measurement errors may correlate
§ Difference between CFA and the usual SEM:
§ SEM assumes causally interrelated latent variables
§ CFA assumes interrelated latent variables (i.e. exogenous)

Exploratory Factor Analysis

Two-factor model: x = Λξ + δ

[x1]   [λ11 λ12]        [δ1]
[x2]   [λ21 λ22]        [δ2]
[x3] = [λ31 λ32] [ξ1] + [δ3]
[x4]   [λ41 λ42] [ξ2]   [δ4]
[x5]   [λ51 λ52]        [δ5]
[x6]   [λ61 λ62]        [δ6]

CFA Notation

Two-factor model: x = Λξ + δ

[x1]   [λ11  0 ]        [δ1]
[x2]   [λ21  0 ]        [δ2]
[x3] = [λ31  0 ] [ξ1] + [δ3]
[x4]   [ 0  λ42] [ξ2]   [δ4]
[x5]   [ 0  λ52]        [δ5]
[x6]   [ 0  λ62]        [δ6]

Difference between CFA and EFA

CFA:
x1 = λ11ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = λ42ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
cov(ξ1, ξ2) = ϕ12

EFA:
x1 = λ11ξ1 + λ12ξ2 + δ1
x2 = λ21ξ1 + λ22ξ2 + δ2
x3 = λ31ξ1 + λ32ξ2 + δ3
x4 = λ41ξ1 + λ42ξ2 + δ4
x5 = λ51ξ1 + λ52ξ2 + δ5
x6 = λ61ξ1 + λ62ξ2 + δ6
cov(ξ1, ξ2) = 0

Model Constraints
§ Hallmark of CFA
§ Purposes for setting constraints:
§ test a priori theory
§ ensure identifiability
§ test reliability of measures

Identifiability
§ Let θ be a t × 1 vector containing all unknown and unconstrained parameters in a model. The parameters θ are identified if Σ(θ1) = Σ(θ2) ⇒ θ1 = θ2
§ Estimability ≠ Identifiability!!
§ Identifiability – attribute of the model
§ Estimability – attribute of the data

Model Constraints: Identifiability
§ Latent variables (LVs) need some constraints
§ Because factors are unmeasured, their variances can take different values
§ Recall EFA, where we constrained the factors: F ~ N(0,1)
§ Otherwise, the model is not identifiable
§ Here we have two options:
§ fix the variance of each latent variable (LV) to 1 (or another constant)
§ fix one path between each LV and an indicator
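A quick numeric check of why a constraint is needed (hypothetical numbers): halving all loadings while quadrupling the factor variance leaves the implied covariance matrix unchanged, so the two parameter sets cannot be distinguished from data.

```python
import numpy as np

lam1, phi1 = np.array([0.8, 0.7, 0.6]), 1.0   # one parametrization
lam2, phi2 = lam1 / 2, 4.0                    # rescaled alternative
theta = 1 - lam1 ** 2                         # same unique variances in both

# implied covariance: Sigma = phi * lam lam' + Theta
S1 = phi1 * np.outer(lam1, lam1) + np.diag(theta)
S2 = phi2 * np.outer(lam2, lam2) + np.diag(theta)
print(np.allclose(S1, S2))                    # → True
```

Fixing var(ξ) = 1 (or fixing one loading to 1) pins down this free scale and removes the ambiguity.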

Necessary Constraints

Fix variances: set var(ξ1) = var(ξ2) = 1.
Fix paths: fix one loading per factor to 1.
[Path diagrams omitted in transcription]

Model Parametrization

Fix variances:
x1 = λ11ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = λ42ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
cov(ξ1, ξ2) = ϕ12
var(ξ1) = 1
var(ξ2) = 1

Fix paths:
x1 = ξ1 + δ1
x2 = λ21ξ1 + δ2
x3 = λ31ξ1 + δ3
x4 = ξ2 + δ4
x5 = λ52ξ2 + δ5
x6 = λ62ξ2 + δ6
cov(ξ1, ξ2) = ϕ12
var(ξ1) = ϕ11
var(ξ2) = ϕ22

Identifiability Rules for CFA

(1) Two-indicator rule (sufficient, not necessary)
1) At least two factors
2) At least two indicators per factor
3) Exactly one non-zero element per row of Λ (translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal) (translation: no double-headed arrows between the δ’s)
5) Factors are correlated (Φ has no zero elements)* (translation: there are double-headed arrows between all of the LVs)

* Alternative, less strict criterion: each factor is correlated with at least one other factor.
(see page 247 in Bollen)

Example:

    [λ11  0 ]
    [λ21  0 ]
Λ = [λ31  0 ]
    [ 0  λ42]
    [ 0  λ52]
    [ 0  λ62]

Θ = var(δ) = diag(θ11, θ22, θ33, θ44, θ55, θ66)

Φ = var(ξ) = [ 1   ϕ12]
             [ϕ12   1 ]
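A side calculation (not on the slide) shows this 6-indicator, 2-factor model has positive degrees of freedom: with the factor variances fixed to 1, the free parameters are the six loadings, the six error variances, and ϕ12.

```python
p = 6                         # observed variables x1..x6
moments = p * (p + 1) // 2    # distinct variances and covariances: 21
free = 6 + 6 + 1              # loadings + theta_ii + phi_12
print(moments, free, moments - free)
```

With 21 observed moments and 13 free parameters, the model has 8 degrees of freedom for testing fit.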

Example: Two-Indicator Rule
[Path diagram: factors ξ1, ξ2, ξ3 with indicators x1–x6; omitted in transcription]

Example: Two-Indicator Rule
[Path diagram omitted in transcription]

Example: Two-Indicator Rule
[Path diagram: factors ξ1–ξ4 with indicators x1–x8; omitted in transcription]

Identifiability Rules for CFA

(2) Three-indicator rule (sufficient, not necessary)
1) At least one factor
2) At least three indicators per factor
3) One non-zero element per row of Λ (translation: each x is pointed at by only one LV)
4) Non-correlated errors (Θ is diagonal) (translation: no double-headed arrows between the δ’s)

[Note: no condition about the correlation of factors (no restrictions on Φ).]
