Applied Statistics Using SPSS, STATISTICA, MATLAB And R



Joaquim P. Marques de Sá
Applied Statistics
Using SPSS, STATISTICA, MATLAB and R
With 195 Figures and a CD

Editors
Prof. Dr. Joaquim P. Marques de Sá
Universidade do Porto
Fac. Engenharia
Rua Dr. Roberto Frias s/n
4200-465 Porto
Portugal
e-mail: jmsa@fe.up.pt

Library of Congress Control Number: 2007926024
ISBN 978-3-540-71971-7 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilm or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965, in its current version, and permission for use must always be obtained from Springer. Violations are liable for prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007

The use of general descriptive names, registered names, trademarks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

Typesetting: by the editors
Production: Integra Software Services Pvt. Ltd., India
Cover design: WMX design, Heidelberg
Printed on acid-free paper
SPIN: 11908944 42/3100/Integra 5 4 3 2 1 0

To Wiesje and Carlos.

Contents

Preface to the Second Edition
Preface to the First Edition
Symbols and Abbreviations

1 Introduction
  1.1 Deterministic Data and Random Data
  1.2 Population, Sample and Statistics
  1.3 Random Variables
  1.4 Probabilities and Distributions
    1.4.1 Discrete Variables
    1.4.2 Continuous Variables
  1.5 Beyond a Reasonable Doubt...
  1.6 Statistical Significance and Other Significances
  1.7 Datasets
  1.8 Software Tools
    1.8.1 SPSS and STATISTICA
    1.8.2 MATLAB and R

2 Presenting and Summarising the Data
  2.1 Preliminaries
    2.1.1 Reading in the Data
    2.1.2 Operating with the Data
  2.2 Presenting the Data
    2.2.1 Counts and Bar Graphs
    2.2.2 Frequencies and Histograms
    2.2.3 Multivariate Tables, Scatter Plots and 3D Plots
    2.2.4 Categorised Plots
  2.3 Summarising the Data
    2.3.1 Measures of Location
    2.3.2 Measures of Spread
    2.3.3 Measures of Shape
    2.3.4 Measures of Association for Continuous Variables
    2.3.5 Measures of Association for Ordinal Variables
    2.3.6 Measures of Association for Nominal Variables
  Exercises

3 Estimating Data Parameters
  3.1 Point Estimation and Interval Estimation
  3.2 Estimating a Mean
  3.3 Estimating a Proportion
  3.4 Estimating a Variance
  3.5 Estimating a Variance Ratio
  3.6 Bootstrap Estimation
  Exercises

4 Parametric Tests of Hypotheses
  4.1 Hypothesis Test Procedure
  4.2 Test Errors and Test Power
  4.3 Inference on One Population
    4.3.1 Testing a Mean
    4.3.2 Testing a Variance
  4.4 Inference on Two Populations
    4.4.1 Testing a Correlation
    4.4.2 Comparing Two Variances
    4.4.3 Comparing Two Means
  4.5 Inference on More than Two Populations
    4.5.1 Introduction to the Analysis of Variance
    4.5.2 One-Way ANOVA
    4.5.3 Two-Way ANOVA
  Exercises

5 Non-Parametric Tests of Hypotheses
  5.1 Inference on One Population
    5.1.1 The Runs Test
    5.1.2 The Binomial Test
    5.1.3 The Chi-Square Goodness of Fit Test
    5.1.4 The Kolmogorov-Smirnov Goodness of Fit Test
    5.1.5 The Lilliefors Test for Normality
    5.1.6 The Shapiro-Wilk Test for Normality
  5.2 Contingency Tables
    5.2.1 The 2×2 Contingency Table
    5.2.2 The r×c Contingency Table
    5.2.3 The Chi-Square Test of Independence
    5.2.4 Measures of Association Revisited
  5.3 Inference on Two Populations
    5.3.1 Tests for Two Independent Samples
    5.3.2 Tests for Two Paired Samples
  5.4 Inference on More Than Two Populations
    5.4.1 The Kruskal-Wallis Test for Independent Samples
    5.4.2 The Friedmann Test for Paired Samples
    5.4.3 The Cochran Q test
  Exercises

6 Statistical Classification
  6.1 Decision Regions and Functions
  6.2 Linear Discriminants
    6.2.1 Minimum Euclidian Distance Discriminant
    6.2.2 Minimum Mahalanobis Distance Discriminant
  6.3 Bayesian Classification
    6.3.1 Bayes Rule for Minimum Risk
    6.3.2 Normal Bayesian Classification
    6.3.3 Dimensionality Ratio and Error Estimation
  6.4 The ROC Curve
  6.5 Feature Selection
  6.6 Classifier Evaluation
  6.7 Tree Classifiers
  Exercises

7 Data Regression
  7.1 Simple Linear Regression
    7.1.1 Simple Linear Regression Model
    7.1.2 Estimating the Regression Function
    7.1.3 Inferences in Regression Analysis
    7.1.4 ANOVA Tests
  7.2 Multiple Regression
    7.2.1 General Linear Regression Model
    7.2.2 General Linear Regression in Matrix Terms
    7.2.3 Multiple Correlation
    7.2.4 Inferences on Regression Parameters
    7.2.5 ANOVA and Extra Sums of Squares
    7.2.6 Polynomial Regression and Other Models
  7.3 Building and Evaluating the Regression Model
    7.3.1 Building the Model
    7.3.2 Evaluating the Model
    7.3.3 Case Study
  7.4 Regression Through the Origin
  7.5 Ridge Regression
  7.6 Logit and Probit Models
  Exercises

8 Data Structure Analysis
  8.1 Principal Components
  8.2 Dimensional Reduction
  8.3 Principal Components of Correlation Matrices
  8.4 Factor Analysis
  Exercises

9 Survival Analysis
  9.1 Survivor Function and Hazard Function
  9.2 Non-Parametric Analysis of Survival Data
    9.2.1 The Life Table Analysis
    9.2.2 The Kaplan-Meier Analysis
    9.2.3 Statistics for Non-Parametric Analysis
  9.3 Comparing Two Groups of Survival Data
  9.4 Models for Survival Data
    9.4.1 The Exponential Model
    9.4.2 The Weibull Model
    9.4.3 The Cox Regression Model
  Exercises

10 Directional Data
  10.1 Representing Directional Data
  10.2 Descriptive Statistics
  10.3 The von Mises Distributions
  10.4 Assessing the Distribution of Directional Data
    10.4.1 Graphical Assessment of Uniformity
    10.4.2 The Rayleigh Test of Uniformity
    10.4.3 The Watson Goodness of Fit Test
    10.4.4 Assessing the von Misesness of Spherical Distributions
  10.5 Tests on von Mises Distributions
    10.5.1 One-Sample Mean Test
    10.5.2 Mean Test for Two Independent Samples
  10.6 Non-Parametric Tests
    10.6.1 The Uniform Scores Test for Circular Data
    10.6.2 The Watson Test for Spherical Data
    10.6.3 Testing Two Paired Samples
  Exercises

Appendix A - Short Survey on Probability Theory
  A.1 Basic Notions
    A.1.1 Events and Frequencies
    A.1.2 Probability Axioms
  A.2 Conditional Probability and Independence
    A.2.1 Conditional Probability and Intersection Rule
    A.2.2 Independent Events
  A.3 Compound Experiments
  A.4 Bayes’ Theorem
  A.5 Random Variables and Distributions
    A.5.1 Definition of Random Variable
    A.5.2 Distribution and Density Functions
    A.5.3 Transformation of a Random Variable
  A.6 Expectation, Variance and Moments
    A.6.1 Definitions and Properties
    A.6.2 Moment-Generating Function
    A.6.3 Chebyshev Theorem
  A.7 The Binomial and Normal Distributions
    A.7.1 The Binomial Distribution
    A.7.2 The Laws of Large Numbers
    A.7.3 The Normal Distribution
  A.8 Multivariate Distributions
    A.8.1 Definitions
    A.8.2 Moments
    A.8.3 Conditional Densities and Independence
    A.8.4 Sums of Random Variables
    A.8.5 Central Limit Theorem

Appendix B - Distributions
  B.1 Discrete Distributions
    B.1.1 Bernoulli Distribution
    B.1.2 Uniform Distribution
    B.1.3 Geometric Distribution
    B.1.4 Hypergeometric Distribution
    B.1.5 Binomial Distribution
    B.1.6 Multinomial Distribution
    B.1.7 Poisson Distribution
  B.2 Continuous Distributions
    B.2.1 Uniform Distribution
    B.2.2 Normal Distribution
    B.2.3 Exponential Distribution
    B.2.4 Weibull Distribution
    B.2.5 Gamma Distribution
    B.2.6 Beta Distribution
    B.2.7 Chi-Square Distribution
    B.2.8 Student’s t Distribution
    B.2.9 F Distribution
    B.2.10 Von Mises Distributions

Appendix C - Point Estimation
  C.1 Definitions
  C.2 Estimation of Mean and Variance

Appendix D - Tables
  D.1 Binomial Distribution
  D.2 Normal Distribution
  D.3 Student’s t Distribution
  D.4 Chi-Square Distribution
  D.5 Critical Values for the F Distribution

Appendix E - Datasets
  E.1 Breast Tissue
  E.2 Car Sale
  E.3 Cells
  E.4 Clays
  E.5 Cork Stoppers
  E.6 CTG
  E.7 Culture
  E.8 Fatigue
  E.9 FHR
  E.10 FHR-Apgar
  E.11 Firms
  E.12 Flow Rate
  E.13 Foetal Weight
  E.14 Forest Fires
  E.15 Freshmen
  E.16 Heart Valve
  E.17 Infarct
  E.18 Joints
  E.19 Metal Firms
  E.20 Meteo
  E.21 Moulds
  E.22 Neonatal
  E.23 Programming
  E.24 Rocks
  E.25 Signal & Noise
  E.26 Soil Pollution
  E.27 Stars
  E.28 Stock Exchange
  E.29 VCG
  E.30 Wave
  E.31 Weather
  E.32 Wines

Appendix F - Tools
  F.1 MATLAB Functions
  F.2 R Functions
  F.3 Tools EXCEL File
  F.4 SCSize Program
