BPS 7e Lecture PowerPoint Slides, Ch. 01 - University of Utah

Basic Practice of Statistics, 7th Edition

In Chapter 1 we cover
- Individuals and variables
- Categorical variables: pie charts and bar graphs
- Quantitative variables: histograms
- Interpreting histograms
- Quantitative variables: stemplots
- Time plots

Statistics
Statistics is a science that involves the extraction of information from numerical data obtained during an experiment or from a sample. It involves the design of the experiment or sampling procedure, the collection and analysis of the data, and making inferences (statements) about the population based upon information in a sample. In short, statistics is the science of data.

Individuals and Variables
A data set contains information on a number of individuals. The first step in dealing with data is to organize your thinking about the data:
- Individuals (outcomes): the objects described by a set of data; may be people, animals, or things
- Variable: any characteristic of an individual; can take different values for different individuals

Example:
Individual: an object described by a set of data
Variable: characteristic of the individual

Figure 1.1, The Basic Practice of Statistics, 2015 W. H. Freeman

Individuals
Airport administrators take a sample of airline baggage and record the number of bags that weigh more than 75 pounds. What is the individual?
a) the number of bags weighing more than 75 pounds
b) the average weight of the bags
c) each piece of baggage
d) the airport administrators

Types of variables
Categorical variable: places individuals into one of several groups or categories
Quantitative variable: takes numerical values for which arithmetic operations make sense (usually recorded in a unit of measurement)

Variables
Is the variable “monthly rainfall in Michigan” a categorical variable or a quantitative variable?
a) categorical
b) quantitative

Individuals and Variables
A survey involving 35 questions was given to a sample of 200 students attending a university with an enrollment of 10,000. How many individuals do the data describe?
a) 35
b) 200
c) 10,000
d) It is impossible to say.

Individuals and Variables
A survey involving 35 questions was given to a sample of 200 students attending a university with an enrollment of 10,000. How many variables do the data contain?
a) 35
b) 200
c) 35 × 200
d) 35 × 10,000

Apply your knowledge

Extra example: Case Study
Weight Gain Spells Heart Risk for Women
“Weight, weight change, and coronary heart disease in women.” W. C. Willett, et al., vol. 273(6), Journal of the American Medical Association, Feb. 8, 1995. (Reported in Science News, Feb. 4, 1995, p. 108)

Extra example: Case Study
Weight Gain Spells Heart Risk for Women
Objective: To recommend a range of body mass index (a function of weight and height) in terms of coronary heart disease (CHD) risk in women.

Extra example: Case Study
- Study started in 1976 with 115,818 women aged 30 to 55 years and without a history of previous CHD.
- Each woman’s weight (body mass) was determined.
- Each woman was asked her weight at age 18.
- The cohort of women was followed for 14 years.
- The number of CHD cases (fatal and nonfatal) was counted (1292 cases).

Case Study
Variables measured
quantitative:
- Age (in 1976)
- Weight in 1976
- Weight at age 18
categorical:
- Incidence of coronary heart disease
- Smoker or nonsmoker
- Family history of heart disease

Distribution (getting the quantitative picture)
- Tells what values a variable takes and how often it takes these values
- Can be a table, graph, or function
- The values of a categorical variable are labels for the categories.
- The distribution of a categorical variable lists the categories and gives either the count or the percent of individuals who fall in each category.
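The idea of listing categories with counts or percents can be sketched in a few lines of Python. This is only an illustration: the class-year labels below are hypothetical and are not data from the slides, and the function name `categorical_distribution` is made up for this example.

```python
from collections import Counter

# Hypothetical labels, one per individual (not data from the slides).
class_years = ["Junior", "Senior", "Senior", "Freshman",
               "Junior", "Senior", "Freshman", "Sophomore"]

def categorical_distribution(values):
    """Return {category: (count, percent)} for a categorical variable."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: (n, round(100 * n / total, 1))
            for category, n in counts.items()}

distribution = categorical_distribution(class_years)
for category, (count, percent) in distribution.items():
    print(f"{category:<10} {count:>3} {percent:>6.1f}%")
```

Each row of the printed table is one category with its count and its percent of all individuals, which is the information a pie chart or bar graph would display.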

Displaying Distributions
Categorical variables
- Pie charts (when categories make a whole)
- Bar graphs
Quantitative variables
- Histograms
- Stemplots (stem-and-leaf plots)

Example: Which Major?
Data Table
Field of Study                 Percent of Students
Arts and humanities            10.6
Biological [...]
Health professions             12.8
Math and computer science       3.7
Physical sciences               2.4
Social sciences                10.1
Other majors and undeclared    14.9
Total                         100.1

Example: Which Major?
Figure 1.2, The Basic Practice of Statistics, 2015 W. H. Freeman

Example: Which Major?
Figure 1.3, The Basic Practice of Statistics, 2015 W. H. Freeman

Pie Charts and Bar Graphs
Field of Study                 Percent of students
Arts and humanities            10.6
Biological [...]
Health professions             12.8
Math and computer science       3.7
Physical sciences               2.4
Social sciences                10.1
Other majors                   14.9
Total                         100.1  ← Rounding error

Class Make-up on First Day
Data Table
[...]
Junior     6    14.0%
Senior     9    20.9%
Total     43   100.1%

Class Make-up on First Day
Pie Chart

Class Make-up on First Day
Bar Graph

Example: U.S. Solid Waste (2000)
Data Table
Material                    Weight (million tons)   Percent of total
Food scraps                  25.9                    11.2 %
Glass                        12.8                     5.5 %
Metals                       18.0                     7.8 %
Paper, paperboard            86.7                    37.4 %
Plastics                     24.7                    10.7 %
Rubber, leather, textiles    15.8                     6.8 %
Wood                         12.7                     5.5 %
Yard trimmings               27.7                    11.9 %
Other                         7.5                     3.2 %
Total                       231.9                   100.0 %
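As a check on how the "percent of total" column comes about, the sketch below recomputes it from the weights in the table, dividing by the published total of 231.9 million tons. The variable names are chosen for this example only. Because each percent is rounded separately, the rounded column does not always sum to exactly 100 in tables like this; here it happens to.

```python
# Weights in million tons, copied from the solid waste table.
weights = {
    "Food scraps": 25.9,
    "Glass": 12.8,
    "Metals": 18.0,
    "Paper, paperboard": 86.7,
    "Plastics": 24.7,
    "Rubber, leather, textiles": 15.8,
    "Wood": 12.7,
    "Yard trimmings": 27.7,
    "Other": 7.5,
}
published_total = 231.9  # total as printed in the table

# Percent of total for each material, rounded to one decimal place.
percents = {material: round(100 * weight / published_total, 1)
            for material, weight in weights.items()}

# Sum of the rounded percents (rounded again to hide floating-point noise).
percent_sum = round(sum(percents.values()), 1)

for material, percent in percents.items():
    print(f"{material:<26} {percent:>5.1f} %")
print(f"{'Sum of rounded percents':<26} {percent_sum:>5.1f} %")
```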

Example: U.S. Solid Waste (2000)
Pie Chart

Bar Graphs only
EXAMPLE: What sources do Americans aged 12–24 years use to keep up-to-date and learn about music?
Source              Percent who have used
AM/FM radio         57
Friends/family      77
Music television    43
Facebook            49
Pandora             70
Apple iTunes        41
Local stores        37
SiriusXM            24
Music blogs         23
iHeartRadio         27
Spotify             37
[Bar graph: percent who have used that source, by source]
Note: for bar graphs, percents don’t necessarily add to 100.

Categorical Variables
Married respondents, in a survey, were asked how they met their spouses. According to the bar chart below, approximately how many respondents met their spouses through a dating service?
a) b) c) d) 2203070

Categorical Variables
Bar graphs are more flexible than pie charts. Both types of graph can display the distribution of a categorical variable, but a bar graph can also compare any set of quantities measured in the same units.
a) true
b) false

Categorical Variables
The order of the categories in a bar chart should only be arranged alphabetically and should never be rearranged by height.
a) true
b) false

Quantitative data
- The distribution of a quantitative variable tells us what values the variable takes on and how often it takes those values.
- Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class.
- Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable.

Histograms
- Appropriate for quantitative variables that take many values and/or large datasets.
- Divide the possible values into classes (equal widths).
- Count how many observations fall into each interval (may change to percents).
- Draw a picture representing the distribution: bar heights are equivalent to the number (percent) of observations in each interval.

Histograms
Example: Freshman Graduation Rate, or FGR, Data for 2010-2011
FGR Data
FGR        Count
55 - 60     1
60 - 65     2
65 - 70     3
70 - 75     6
75 - 80    11
80 - 85    16
85 - 90     9
[Histogram: number of states (vertical axis) by freshman graduation rate (horizontal axis)]

Histograms: Class Intervals
- How many intervals? One rule is to calculate the square root of the sample size, and round up.
- Size of intervals? Divide the range of the data (max − min) by the number of intervals desired, and round to a convenient number.
- Pick intervals so each observation can only fall in exactly one interval (no overlap).

Example: Case Study
Weight Data
Introductory Statistics class, Spring 1997, Virginia Commonwealth University

Weight Data
[Table of the recorded weights]

Weight Data: Frequency Table
Weight Group   Count
100 - 120       9
120 - 140      11
140 - 160       6
160 - 180      13
180 - 200       8
200 - 220       5
220 - 240       0
240 - 260       1
sqrt(53) ≈ 7.3, or 8 intervals; range (260 − 100 = 160) / 8 = 20 class width
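The arithmetic behind the choice of 8 intervals of width 20 can be reproduced as a small calculation. The sketch below applies the square-root rule to the sample size of 53 and the range 100 to 260 used for the weight data. The function name `choose_classes` is made up for this example, and rounding the width to a "convenient number" is left to the reader.

```python
import math

def choose_classes(n, minimum, maximum):
    """Square-root rule: number of classes (rounded up) and raw class width."""
    num_classes = math.ceil(math.sqrt(n))
    width = (maximum - minimum) / num_classes
    return num_classes, width

num_classes, width = choose_classes(53, 100, 260)
print(num_classes, width)  # 8 20.0
```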

Weight Data: Histogram
[Histogram: number of students (vertical axis) by weight (horizontal axis), classes from 100 to 260 in steps of 20]
* Left endpoint is included in the group, right endpoint is not.
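The endpoint rule in the footnote (left endpoint included, right endpoint excluded) is what guarantees that each observation falls in exactly one class. A minimal sketch of counting under that rule follows. The weights are hypothetical values chosen to sit on class boundaries, not the class data, and `class_counts` is a name invented for this example.

```python
def class_counts(values, start, width, num_classes):
    """Count values per class, where class i covers [start + i*width, start + (i+1)*width)."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - start) // width)  # left endpoint in, right endpoint out
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    return counts

# Hypothetical weights, several of them exactly on a class boundary.
sample_weights = [100, 119, 120, 139, 140, 165, 180, 180, 215, 259]
counts = class_counts(sample_weights, start=100, width=20, num_classes=8)

for i, count in enumerate(counts):
    low = 100 + 20 * i
    print(f"{low} - {low + 20}: {count}")
```

A weight of exactly 120 is counted in 120 - 140 and not in 100 - 120, so no observation is counted twice.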

Interpreting Histograms
EXAMINING A HISTOGRAM
- In any graph of data, look for the overall pattern and for striking deviations from that pattern.
- You can describe the overall pattern by its shape, center, and variability. You will sometimes see variability referred to as spread (variation).
- An important kind of deviation is an outlier, an individual that falls outside the overall pattern.

Describing Distributions
- A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other, e.g., bell-shaped or other symmetric shapes.
- A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side.
- It is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side.
[Figure panels: Symmetric, Skewed-left, Skewed-right]

Symmetric Bell-Shaped

Symmetric Mound-Shaped

Symmetric Uniform

Asymmetric Skewed to the Left

Asymmetric Skewed to the Right

Stemplots (Stem-and-Leaf Plots)
STEMPLOT
To make a stemplot:
1. Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, the final digit. Stems may have as many digits as needed, but each leaf contains only a single digit.
2. Write the stems in a vertical column with the smallest at the top, and draw a vertical line at the right of this column. Be sure to include all the stems needed to span the data, even when some stems will have no leaves.
3. Write each leaf in the row to the right of its stem, in increasing order out from the stem.
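The three steps above can be sketched in Python for non-negative whole-number data. This is an illustration under assumptions: the weights below are hypothetical, and the function name `stemplot` is chosen for this example.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stemplot for non-negative integers."""
    leaves_by_stem = defaultdict(list)
    # Step 1: split each observation into stem (all but last digit) and leaf (last digit).
    # Sorting first means the leaves arrive in increasing order (step 3).
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        leaves_by_stem[stem].append(leaf)
    # Step 2: list every stem from smallest to largest, including empty ones.
    rows = []
    for stem in range(min(leaves_by_stem), max(leaves_by_stem) + 1):
        leaves = "".join(str(leaf) for leaf in leaves_by_stem.get(stem, []))
        rows.append(f"{stem:>2} | {leaves}".rstrip())
    return rows

# Hypothetical weights in pounds (not the class data).
weights = [152, 135, 110, 128, 150, 165, 138, 112, 170, 155]
rows = stemplot(weights)
print("\n".join(rows))
```

Stem 14 appears with no leaves, as step 2 requires, so the gap in the data stays visible.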

Stemplots Example: Weight Data – Introductory Statistics Class
[Table of the recorded weights, shown beside their stemplot with stems 10 through 26]
Key: 20|3 means 203 pounds
Stems = 10’s
Leaves = 1’s

Another example: Stemplots

Stemplots (Stem-and-Leaf Plots)
If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems.
Example: If all of the data values were between 150 and 179, then we may choose to use the following stems:
15
15
16
16
17
17
Leaves 0-4 would go on each upper stem (first “15”), and leaves 5-9 would go on each lower stem (second “15”).
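Splitting stems can be sketched the same way: each stem gets two rows, the upper one for leaves 0-4 and the lower one for leaves 5-9. The values below are hypothetical and lie between 150 and 179, and `split_stemplot` is a name invented for this example.

```python
def split_stemplot(values):
    """Return stemplot rows with each stem split into leaves 0-4 and leaves 5-9."""
    rows_by_key = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf <= 4 else 1  # 0 = upper row (0-4), 1 = lower row (5-9)
        rows_by_key.setdefault((stem, half), []).append(leaf)
    first_stem = min(rows_by_key)[0]
    last_stem = max(rows_by_key)[0]
    rows = []
    for stem in range(first_stem, last_stem + 1):
        for half in (0, 1):
            leaves = "".join(str(leaf) for leaf in rows_by_key.get((stem, half), []))
            rows.append(f"{stem} | {leaves}".rstrip())
    return rows

# Hypothetical values between 150 and 179.
values = [150, 153, 156, 158, 161, 164, 164, 167, 172, 175, 179]
rows = split_stemplot(values)
print("\n".join(rows))
```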

Time Plots (1 of 3)
- A time plot shows behavior over time.
- Time is always on the horizontal axis, and the variable being measured is on the vertical axis.
- Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend.
- Look for patterns that repeat at known regular intervals (seasonal variations).

Time Plots (2 of 3)
[Time plot: Class Make-up On First Day; vertical axis: percent of class, horizontal axis: year of fall semester]

Time Plots (3 of 3)

Quantitative Variables (1 of 12)
Look at the following histogram of salaries of baseball players. What shape would you say the data take?
a) bi-modal
b) left-skewed
c) right-skewed
d) symmetric
e) uniform

Quantitative Variables (2 of 12)
Look at the following histogram of percent urban population in 50 U.S. states. What shape would you say the data take?
a) bi-modal
b) left-skewed
c) right-skewed
d) symmetric
e) uniform

Quantitative Variables (3 of 12)
Wine bottles at a small winery are sampled and tested for quality. One measurement is volume: the bottles should be filled to 750 ml, with some variation expected. Based on this histogram, does 750 ml seem to be a reasonable center for the data?
a) Yes, the center is approximately 750 ml.
b) No, the center is too variable.
c) No, the data range from 744 to 755 ml.

Quantitative Variables (4 of 12)
Considering this histogram, about how many bottles of wine have more than 752 ml?
a) b) c) d) 174465

Quantitative Variables (5 of 12)
What three features can best describe the overall pattern of a histogram?
a) shape, center, and skewness
b) center, spread, and outliers
c) center, mean, and symmetry
d) shape, center, and spread

Quantitative Variables (6 of 12)
In the dataset represented by the following stemplot, in which the leaf unit is 1.0, how many times does the number “28” occur?
0 | 9
1 | 246999
2 | 111134567888999
3 | 000112222345666699
4 | 001445
5 | 0014
6 | 7
7 | 3
a) 0
b) 1
c) 3
d) 4

Quantitative Variables (7 of 12)
What is one advantage of histograms over stemplots?
a) Histograms can be created by hand.
b) The dataset can be any size.
c) The actual data can be extracted from histograms.
d) Histograms can be horizontal or vertical.

Quantitative Variables (8 of 12)
What is one advantage of stemplots over histograms?
a) Stemplots can use classes of any size.
b) The dataset can be any size.
c) The actual data can be extracted from stemplots.
d) Stemplots can be horizontal or vertical.

Quantitative Variables (9 of 12)
Which of these plots is a time plot?
[Plot A: Residential Electricity Consumption against Year. Plot B: Residential Electricity Consumption against Net Generation of Electricity.]
a) Plot A
b) Plot B
c) both
d) None of the above.

Quantitative Variables (10 of 12)
What type of trend does the following time plot show?
a) downward
b) upward
c) no trend

Quantitative Variables (11 of 12)
What would be the correct interpretation of the following graph?
a) There is an upward trend in the data.
b) There is a downward trend in the data.
c) The data show seasonal variation.

Quantitative Variables (12 of 12)
An individual value that falls outside the overall pattern is called ______.
a) the center
b) a shape
c) an outlier
d) a spread