Practical Data Analysis With JMP Third Edition - SAS Support

The correct bibliographic citation for this manual is as follows: Carver, Robert. 2019. Practical Data Analysis with JMP, Third Edition. Cary, NC: SAS Institute Inc.

Practical Data Analysis with JMP, Third Edition

Copyright 2019, SAS Institute Inc., Cary, NC, USA

ISBN 978-1-64295-614-6 (Hardcover)
ISBN 978-1-64295-610-8 (Paperback)
ISBN 978-1-64295-611-5 (Web PDF)
ISBN 978-1-64295-612-2 (EPUB)
ISBN 978-1-64295-613-9 (Kindle)

All Rights Reserved. Produced in the United States of America.

For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.

For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.

The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others’ rights is appreciated.

U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement.

SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414

October 2019

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.

SAS software may be provided with certain third-party software, including but not limited to open-source software, which is licensed under its applicable third-party software license agreement. For license information about third-party software distributed with SAS software, refer to http://support.sas.com/thirdpartylicenses.

Contents

About This Book
About The Author

Chapter 1: Getting Started: Data Analysis with JMP
  Overview
  Goals of Data Analysis: Description and Inference
  Types of Data
  Starting JMP
  A Simple Data Table
  Graph Builder: An Interactive Tool to Explore Data
  Using an Analysis Platform
  Row States
  Exporting and Sharing JMP Reports
  Saving and Reproducing Your Work
  Leaving JMP

Chapter 2: Data Sources and Structures
  Overview
  Populations, Processes, and Samples
  Representativeness and Sampling
  Cross-Sectional and Time Series Sampling
  Study Design: Experimentation, Observation, and Surveying
  Creating a Data Table
  Raw Case Data and Summary Data
  Application

Chapter 3: Describing a Single Variable
  Overview
  The Concept of a Distribution
  Variable Types and Their Distributions
  Distribution of a Categorical Variable
  Using Graph Builder to Explore Categorical Data Visually
  Distribution of a Quantitative Variable
  Using the Distribution Platform for Continuous Data
  Exploring Further with the Graph Builder
  Summary Statistics for a Single Variable
  Application

Chapter 4: Describing Two Variables at a Time
  Overview
  Two-by-Two: Bivariate Data
  Describing Covariation: Two Categorical Variables
  Describing Covariation: One Continuous, One Categorical Variable
  Describing Covariation: Two Continuous Variables
  Application

Chapter 5: Review of Descriptive Statistics
  Overview
  The World Development Indicators
  Questions for Analysis
  Applying an Analytic Framework
  Preparation for Analysis
  Univariate Descriptions
  Explore Relationships with Graph Builder
  Further Analysis with the Multivariate Platform
  Further Analysis with Fit Y by X
  Summing Up: Interpretation and Conclusions
  Visualizing Multiple Relationships

Chapter 6: Elementary Probability and Discrete Distributions
  Overview
  The Role of Probability in Data Analysis
  Elements of Probability Theory
  Contingency Tables and Probability
  Discrete Random Variables: From Events to Numbers
  Three Common Discrete Distributions
  Simulating Random Variation with JMP
  Discrete Distributions as Models of Real Processes
  Application

Chapter 7: The Normal Model
  Overview
  Continuous Data and Probability
  Density Functions
  The Normal Model
  Normal Calculations
  Checking Data for the Suitability of a Normal Model
  Generating Pseudo-Random Normal Data
  Application

Chapter 8: Sampling and Sampling Distributions
  Overview
  Why Sample?
  Methods of Sampling
  Using JMP to Select a Simple Random Sample
  Variability Across Samples: Sampling Distributions
  Application

Chapter 9: Review of Probability and Probabilistic Sampling
  Overview
  Probability Distributions and Density Functions
  The Normal and t Distributions
  The Usefulness of Theoretical Models
  When Samples Surprise Us: Ordinary and Extraordinary Sampling Variability
  Conclusion

Chapter 10: Inference for a Single Categorical Variable
  Overview
  Two Inferential Tasks
  Statistical Inference Is Always Conditional
  Using JMP to Conduct a Significance Test
  Confidence Intervals
  Using JMP to Estimate a Population Proportion
  A Few Words about Error
  Application

Chapter 11: Inference for a Single Continuous Variable
  Overview
  Conditions for Inference
  Using JMP to Conduct a Significance Test
  What If Conditions Are Not Satisfied?
  Using JMP to Estimate a Population Mean
  Matched Pairs: One Variable, Two Measurements
  Application

Chapter 12: Chi-Square Tests
  Overview
  Chi-Square Goodness-of-Fit Test
  Inference for Two Categorical Variables
  Contingency Tables Revisited
  Chi-Square Test of Independence
  Application

Chapter 13: Two-Sample Inference for a Continuous Variable
  Overview
  Conditions for Inference
  Using JMP to Compare Two Means
  Using JMP to Compare Two Variances
  Application

Chapter 14: Analysis of Variance
  Overview
  What Are We Assuming?
  One-Way ANOVA
  What If Conditions Are Not Satisfied?
  Including a Second Factor with Two-Way ANOVA
  Application

Chapter 15: Simple Linear Regression Inference
  Overview
  Fitting a Line to Bivariate Continuous Data
  The Simple Regression Model
  What Are We Assuming?
  Interpreting Regression Results
  Application

Chapter 16: Residuals Analysis and Estimation
  Overview
  Conditions for Least Squares Estimation
  Residuals Analysis
  Estimation
  Application

Chapter 17: Review of Univariate and Bivariate Inference
  Overview
  Research Context
  One Variable at a Time
  Life Expectancy by Income Group
  Life Expectancy by GDP per Capita
  Conclusion

Chapter 18: Multiple Regression
  Overview
  The Multiple Regression Model
  Visualizing Multiple Regression
  Fitting a Model
  A More Complex Model
  Residuals Analysis in the Fit Model Platform
  Using a Regression Tree Approach: The Partition Platform
  Collinearity
  Evaluating Alternative Models
  Application

Chapter 19: Categorical, Curvilinear, and Non-Linear Regression Models
  Overview
  Dichotomous Independent Variables
  Dichotomous Dependent Variable
  Curvilinear and Non-Linear Relationships
  More Non-Linear Functions
  Application

Chapter 20: Basic Forecasting Techniques
  Overview
  Detecting Patterns Over Time
  Smoothing Methods
  Trend Analysis
  Autoregressive Models
  Application

Chapter 21: Elements of Experimental Design
  Overview
  Why Experiment?
  Goals of Experimental Design
  Factors, Blocks, and Randomization
  Multi-Factor Experiments and Factorial Designs
  Blocking
  A Design for Main Effects Only
  Definitive Screening Designs
  Non-Linear Response Surface Designs

The goal of applied statistical analysis is to work with data to calibrate, cope with, and sometimes reduce uncertainty. Business decisions, public policies, scientific research, and news reporting are all shaped by statistical analysis and reasoning. Statistical thinking is an essential part of the boom in "big data analytics"