Descriptive And Inferential Statistics IBM SPSS 24


Descriptive and Inferential Statistics
IBM SPSS 24
Prepared by Dr Baber Khan (Ahmadzai)
DVM (RVMP), M.Sc. Livestock Value Chains and Agriculture Production Chain Management
VHL-University of Applied Science, Netherlands
Monitoring and Evaluation Manager
National Agriculture Education College (NAEC)
Deputy Ministry of Technical and Vocational Education and Training (DMTVET)
baberkhanahmadzai44@gmail.com

Table of Contents
Important definitions
Overview
The SPSS Windows and Files
Data Editor (.sav files)
Output Viewer (.spv files)
Syntax Editor (.sps files)
Issuing Commands
Dialog Boxes
Working with the Data Editor
Working with the Output Viewer
Introduction and Types of Variable
Dependent and Independent Variables
Experimental and Non-Experimental Research
Categorical and Continuous Variables
Ambiguities in classifying a type of variable
Measures of Central Tendency
Types of Measurements
Measures of location
Measures of variation
Mean and standard deviation combined
Types of Kurtosis
SPSS Cheat Sheet
Mean (Arithmetic)
Median
Mode
Skewed Distributions and the Mean and Median
Summary of when to use the mean, median and mode
Measures of Spread
Range
Variation
Absolute Deviation and Mean Absolute Deviation
Variance
Standard Deviation (SD)
What type of data should you use when you calculate a standard deviation?
Examples
List of Important Requirements for Drawing a chart
Creating, opening and saving files
Making frequency tables and cross tabulations
Recoding Variables
Binning Variables
Creating new variables
A new dialog window will appear, see below.
Hypothesis Testing
An example of a lecturer's dilemma
The research hypothesis
Sample to population
The structure of hypothesis testing
Operationally defining (measuring) the study
Variables
The null and alternative hypothesis
Significance levels
One- and two-tailed predictions
Rejecting or failing to reject the null hypothesis
Types of Errors
Type I error
Type II error
Summary of Cheat table for test Selection
Parametric tests
Post Hoc tests
Non Parametric test
Tests for Relationship
The steps for conducting a Fisher's Exact Test in SPSS
Prediction/Estimation tests
Output of Linear Regression Analysis
Estimated model coefficients
Statistical significance of the independent variables
Putting it all together
Other tests for Normal Distribution
Reliability test
Testing for Normality using SPSS Statistics
Non-parametric (or distribution-free) inferential statistical methods
Rokyan consultancy teaches following topics Daily Topics
More explanation about statistical issues by youtube
Descriptive statistics internet links
Inferential statistics internet links
SPSS statistics charts to show relationships between a pair of variables
SPSS statistics commonly used analyze menus
Choosing the correct statistical test
Statistical Tests

Important definitions

Pie chart: A circular graph in which wedge-shaped slices represent proportions of the whole.

Adjusted R-square: An adjustment of R-square that penalizes the addition of extraneous predictors to the model. It is computed as $\bar{R}^2 = 1 - (1 - R^2)\frac{N - 1}{N - k - 1}$, where N is the number of observations and k is the number of predictors.

Alternative hypothesis: The hypothesis the researcher accepts when the test result leads to rejecting the null hypothesis at a pre-specified significance level. The null and alternative hypotheses are mutually exclusive.

Bar chart: A chart made from categorical data in which the heights of the bars represent the frequency (or relative frequency, i.e. percentage) of each value of the variable. Unlike in a histogram, the width of the bars carries no meaning.

Box and whisker plot: A plot that uses the median and the upper and lower quartiles to display the spread of the data graphically. It is particularly useful for showing outliers when they are present in the data.

Central Limit Theorem: The statistical law stating that, regardless of the shape of the distribution of the individual values in the population, the sampling distribution of the mean can be approximated by a normal distribution as the sample size gets larger.

Confidence interval: An interval computed from a sample that is expected to contain the population parameter with a given level of confidence. Commonly used levels are 90%, 95% and 99%. The researcher chooses the level; the width of the resulting interval depends on the sample size and the variability of the data, and a higher level gives a wider interval.

Continuous probability distribution: A probability distribution in which the variable can take any value within its range of possible values.

Correlation coefficient: A numerical measure of the direction (sign) and strength of the linear association between two variables. It ranges from -1.00 (perfect negative correlation) to 1.00 (perfect positive correlation).

Correlation: The strength of the linear association between two variables. Correlation is not causation. A causal relationship exists when the independent variable is the underlying determinant of the dependent variable. Correlation may suggest a causal relationship, but it is not proof that one exists.

Cross-sectional data: Data collected at a single point in time. Comparisons can be made within the data or against a benchmark data point.
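The Adjusted R-square formula above can be checked by hand. The sketch below is a minimal Python illustration (Python is used only for demonstration; the course itself works in SPSS), and the function name `adjusted_r_squared` and the example numbers are hypothetical.

```python
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    """Return adjusted R-square = 1 - (1 - R^2) * (N - 1) / (N - k - 1).

    r_squared: ordinary R-square of the fitted model
    n: number of observations (N)
    k: number of predictors, not counting the intercept
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical model: R-square of 0.80 from 50 observations and 3 predictors
print(round(adjusted_r_squared(0.80, 50, 3), 4))  # prints 0.787
```

Adding predictors (larger k) with the same R-square lowers the adjusted value, which is the "penalty" the definition describes.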
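To make the correlation coefficient definition concrete, here is a minimal sketch of Pearson's r computed from deviations around the means. It assumes Pearson's product-moment coefficient is the coefficient meant; the helper `pearson_r` and the study-hours data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and exam score for five students
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]
print(round(pearson_r(hours, score), 3))
```

The result always lies between -1.00 and 1.00; a value near 1 here says only that the two variables rise together, not that studying causes the higher score.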
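The confidence interval definition can be illustrated with the standard library alone. This sketch assumes a large-sample normal (z) approximation because Python's `statistics` module has no t distribution; for small samples a t-based interval is the appropriate choice and is somewhat wider. The function name and the milk-yield figures are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(data, level=0.95):
    """Approximate confidence interval for a population mean (z-based)."""
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    if not 0 < level < 1:
        raise ValueError("level must be between 0 and 1")
    centre = mean(data)
    se = stdev(data) / math.sqrt(n)            # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% level
    return centre - z * se, centre + z * se


# Hypothetical daily milk yield (litres) of 30 cows
milk_yield = [14.2, 15.1, 13.8, 16.0, 15.5, 14.9, 13.2, 15.8, 16.3, 14.4,
              15.0, 14.7, 13.9, 15.6, 16.1, 14.0, 15.3, 14.8, 15.9, 13.6,
              14.5, 15.2, 16.4, 14.1, 15.7, 14.6, 13.5, 15.4, 14.3, 15.0]

for level in (0.90, 0.95, 0.99):
    low, high = mean_confidence_interval(milk_yield, level)
    print(f"{level:.0%}: {low:.2f} to {high:.2f}")
```

Running it on the same data at 90%, 95% and 99% shows the point made in the definition: the chosen level does not change the data, only how wide the interval must be.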

Degrees of freedom: The number of independent data values available to estimate a population parameter such as the standard deviation. The degrees of freedom equal the number of observations in the sample (N) minus the number of parameters already estimated from the data (K). For the sample standard deviation only the mean has to be estimated first, so the degrees of freedom are N - 1.

Discrete probability distribution: A probability distribution in which the variable can take only certain values in any particular interval (for example, only whole numbers).

Diversification: The effect that reduces portfolio risk when the securities making up the portfolio are not perfectly positively correlated. Over time the returns of different securities tend to offset each other, reducing volatility compared with holding any one security in isolation. A broad market index is fully diversified and carries only non-diversifiable (market) risk.

Expected mean: A measure of central tendency in which every data value is weighted by its probability of occurring and the weighted values are summed. The expected mean is an ex ante calculation (sometimes called a weighted mean, with the probabilities as the weights). It can be computed for a population or from a sample, but typically it is computed from a sample. The expected mean is also referred to as the expected value.

Experimental data: Data about a variable collected while only one selected variable (or group of variables) is allowed to change and all other variables are held constant. Experimental data are typical of the natural ("hard") sciences. Non-experimental data are typical of the social sciences, where it is impossible to "hold everything else constant."

Frame data: Data collected using a pre-specified list (the sampling frame) that establishes how the sample will be assembled from the population. The frame should be chosen so that the resulting sample represents the population.

Frequency table: A grouping of data into mutually exclusive classes showing the number of observations in each class. A relative frequency table is derived from it by computing the percentage of all observations that falls in each class.

Frequency: The number or percentage of occurrences of a particular outcome out of N trials.

Histogram: A graph made from quantitative data in which the range of the data is divided into intervals called bins, and a bar is drawn above each bin whose height represents the frequency or relative frequency of the data in that bin. Unlike in a bar chart, the width of the bars is an important characteristic of the graph.

Joint frequency distribution: A table of paired responses for two variables.

Left-skewed probability distribution: A distribution of data values in which the mean is generally less than the median and the left tail is longer than the right tail.

Linear regression: A statistical method in which a straight line is "fitted" to a scatter of points in order to estimate an intercept and a slope (the regression coefficients). Once estimated, the intercept and slope allow the value of the dependent variable to be predicted from the value of an independent variable. Multiple linear regression uses two or more independent variables to explain a dependent variable. A simple linear regression line has an equation of the form $Y = a + bX$, where X is the explanatory (independent) variable and Y is the dependent variable; b is the slope and a is the intercept (the value of Y when $X = 0$).

Mean: A measure of central tendency computed by summing all data values and dividing by the number of values summed. In this context the mean (average) is an ex post number: it is computed after the fact. If the observations include all the values in a population, the average is called the population mean; if they include only values from a sample, the result is called the sample mean.

Median: A center value that divides the data array into two halves. The med
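The degrees-of-freedom entry above explains why the sample standard deviation divides by N - 1. The minimal Python sketch below (the helper `sample_sd` and the data are illustrative) computes it by hand and compares it with the standard library's `statistics.stdev` (N - 1) and `statistics.pstdev` (population formula, divides by N).

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation using N - 1 degrees of freedom."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    m = sum(values) / n
    ss = sum((v - m) ** 2 for v in values)  # sum of squared deviations
    df = n - 1  # one parameter (the mean) was estimated from the same data
    return math.sqrt(ss / df)


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(round(sample_sd(data), 4))         # prints 2.1381
print(round(statistics.stdev(data), 4))  # prints 2.1381
print(statistics.pstdev(data))           # prints 2.0
```

The "Std. Deviation" that SPSS reports in its descriptive output is the N - 1 (sample) version, so it should agree with `sample_sd` rather than with `pstdev`.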
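The expected mean definition (each value weighted by its probability, then summed) reduces to a single weighted sum. This is a minimal sketch; `expected_value` and the three return scenarios are hypothetical.

```python
import math


def expected_value(values, probabilities):
    """Probability-weighted mean of the possible outcomes."""
    if len(values) != len(probabilities):
        raise ValueError("values and probabilities must have the same length")
    if not math.isclose(sum(probabilities), 1.0):
        raise ValueError("probabilities must sum to 1")
    return sum(v * p for v, p in zip(values, probabilities))


# Hypothetical returns under a bad, normal and good year
returns = [-0.05, 0.04, 0.10]
probs = [0.2, 0.5, 0.3]
print(round(expected_value(returns, probs), 4))  # prints 0.04
```

Because the probabilities are set before the outcome is known, this is the ex ante figure the definition describes, in contrast to the ex post arithmetic mean.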
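A frequency table with relative frequencies, as defined above, is what SPSS produces through Analyze > Descriptive Statistics > Frequencies. The sketch below builds the same kind of table in Python with `collections.Counter`; the function name and the breed data are hypothetical.

```python
from collections import Counter


def frequency_table(observations):
    """Return (category, count, percent) rows sorted by category."""
    counts = Counter(observations)
    total = len(observations)
    return [(category, count, 100 * count / total)
            for category, count in sorted(counts.items())]


# Hypothetical breed of ten cows on a farm
breed = ["Holstein", "Jersey", "Holstein", "Local", "Local",
         "Holstein", "Jersey", "Local", "Holstein", "Local"]

for category, count, percent in frequency_table(breed):
    print(f"{category:<10}{count:>5}{percent:>8.1f}%")
```

The classes are mutually exclusive (each cow has exactly one breed), so the counts add up to N and the percentages add up to 100.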
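For the simple linear regression line $Y = a + bX$ in the entry above, the ordinary least-squares estimates are b = Sxy / Sxx and a = mean(Y) - b * mean(X). The following is a minimal sketch assuming ordinary least squares with one predictor; `fit_line` and the feed/weight-gain data are hypothetical. The intercept and slope should correspond to the unstandardized B values SPSS reports for the constant and the predictor under Analyze > Regression > Linear.

```python
def fit_line(x, y):
    """Ordinary least-squares intercept a and slope b for Y = a + bX."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    b = sxy / sxx           # slope
    a = mean_y - b * mean_x  # intercept: value of Y when X = 0
    return a, b


# Hypothetical data: feed supplement (kg/day) and daily weight gain (kg)
feed = [1.0, 1.5, 2.0, 2.5, 3.0]
gain = [0.30, 0.42, 0.49, 0.61, 0.70]
a, b = fit_line(feed, gain)
print(f"gain = {a:.3f} + {b:.3f} * feed")
print(f"predicted gain at 2.2 kg/day: {a + b * 2.2:.3f}")
```

With two or more predictors (multiple regression) the same idea is solved with matrix algebra, which is what SPSS does internally; this sketch covers only the one-predictor case.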

Pareto chart: A bar