Multivariate Data Analysis, 6th Edition

Multivariate Data Analysis, 6th Edition
An introduction to Multivariate Analysis, Process Analytical Technology and Quality by Design
Kim H. Esbensen and Brad Swarbrick
with contributions from Frank Westad, Pat Whitcombe and Mark Anderson

Contents

Preface

Chapter 1. Introduction to multivariate analysis
1.1 The world is multivariate
1.2 Indirect observations and correlations
1.3 Data must carry useful information
1.4 Variance, covariance and correlation
1.5 Causality vs correlation
1.6 Hidden data structures—correlations again
1.7 Multivariate data analysis vs multivariate statistics
1.8 Main objectives of multivariate data analysis
1.8.1 Data description (exploratory data structure modelling)
1.8.2 Discrimination and classification
1.8.3 Regression and prediction
1.9 Multivariate techniques as geometric projections
1.9.1 Geometry, mathematics, algorithms
1.10 The grand overview in multivariate data analysis
1.11 References

Chapter 2: A review of some fundamental statistics
2.1 Terminology
2.2 Definitions of some important measurements and concepts
2.2.1 The mean
2.2.2 The median
2.2.3 The mode
2.2.4 Variance and standard deviation
2.3 Samples and representative sampling
2.3.1 An example from the pharmaceutical industry
2.4 The normal distribution and its properties
2.4.1 Graphical representations
2.5 Hypothesis testing
2.5.1 Significance, risk and power
2.5.2 Defining an appropriate risk level
2.5.3 A general guideline for applying formal statistical tests
2.5.4 A Test for Equivalence of Variances: The F-test
2.5.5 Tests for equivalence of means
2.6 An introduction to time series and control charts
2.7 Joint confidence intervals and the need for multivariate analysis
2.8 Chapter summary
2.9 References

Chapter 3: Theory of Sampling (TOS)
3.1 Chapter overview
3.2 Heterogeneity
3.2.1 Constitutional heterogeneity (CH)
3.2.2 Distributional heterogeneity (DH)
3.3 Sampling error vs practical sampling
3.4 Total Sampling Error (TSE)—Fundamental Sampling Principle (FSP)
3.5 Sampling Unit Operations (SUO)
3.6 Replication experiment—quantifying sampling errors
3.7 TOS in relation to multivariate data analysis
3.8 Process sampling—variographic analysis
3.8.1 Appendix A. Terms and definitions used in the TOS literature
3.9 References

Chapter 4: Fundamentals of principal component analysis (PCA)
4.1 Representing data as a matrix
4.2 The variable space—plotting objects in p-dimensions
4.2.1 Plotting data in 1-D and 2-D space
4.2.2 The variable space and dimensions
4.2.3 Visualisation in 3-D (or more)
4.3 Plotting objects in variable space
4.4 Example—plotting raw data (beverage)
4.4.1 Purpose
4.4.2 Data set
4.5 The first principal component
4.5.1 Maximum variance directions
4.5.2 The first principal component as a least squares fit
4.6 Extension to higher-order principal components
4.7 Principal component models—scores and loadings
4.7.1 Maximum number of principal components
4.7.2 PC model centre
4.7.3 Introducing loadings—relations between X and PCs
4.7.4 Scores—coordinates in PC space
4.7.5 Object residuals
4.8 Objectives of PCA
4.9 Score plot–object relationships
4.9.1 Interpreting score plots
4.9.2 Choice of score plots
4.10 The loading plot–variable relationships
4.10.1 Correlation loadings
4.10.2 Comparison of scores and loading plots
4.10.3 The 1-dimensional loading plot
4.11 Example: city temperatures in Europe
4.11.1 Introduction
4.11.2 Plotting data and deciding on the validation scheme
4.11.3 PCA results and interpretation
4.12 Principal component models
4.12.1 The PC model
4.12.2 Centring
4.12.3 Step by step calculation of PCs
4.12.4 A preliminary comment on the algorithm: NIPALS
4.12.5 Residuals—the E-matrix
4.12.6 Residual variance
4.12.7 Object residuals
4.12.8 The total squared object residual
4.12.9 Explained/residual variance plots
4.12.10 How many PCs to use?
4.12.11 A note on the number of PCs
4.12.12 A doubtful case—using external evidence
4.12.13 Variable residuals
4.12.14 More about variances—modelling error variance
4.13 Example: interpreting a PCA model (peas)
4.13.1 Purpose
4.13.2 Data set
4.13.3 Tasks
4.13.4 How to do it
4.13.5 Summary
4.14 PCA modelling—the NIPALS algorithm
4.15 Chapter summary
4.16 References

Chapter 5: g of discrete data
5.2.1 Variable weighting and scaling
5.2.2 Logarithm transformation
5.2.3 Averaging
5.3 Preprocessing of spectroscopic data
5.3.1 Spectroscopic transformations
5.3.2 Smoothing
5.3.3 Normalisation
5.3.4 Baseline correction
5.3.5 Derivatives
5.3.6 Correcting multiplicative effects in spectra
5.3.7 Other general preprocessing methods
5.4 Practical aspects of preprocessing
5.4.1 Scatter effects plot
5.4.2 Detailed example: preprocessing gluten–starch mixtures
5.5 Chapter summary
5.6 References

6. Principal Component Analysis (PCA)—in practice
6.1 The PCA overview
6.2 PCA—Step by Step
6.3 Interpretation of PCA models
6.3.1 Interpretation of score plots—look for patterns
6.3.2 Summary—interpretation of score plots
6.3.3 Interpretation of loading plots—look for important variables
6.4 Example: alcohol in water analysis
6.5 PCA—what can go wrong?
6.5.1 Is there any information in the data set?
6.5.2 Too few PCs are used in the model
6.5.3 Too many PCs are used in the model
6.5.4 Outliers which are truly due to erroneous dat
