Lock5Data: Datasets For 'Statistics: UnLocking The Power Of Data' - RStudio


Package ‘Lock5Data’
July 23, 2021

Title Datasets for "Statistics: UnLocking the Power of Data"
Version 3.0.0
Maintainer Robin Lock <rlock@stlawu.edu>
Description Datasets for the third edition of "Statistics: Unlocking the Power of Data" by Lock^5. Includes versions of datasets from earlier editions.
Depends R (>= 3.5.0)
License GPL-2
Encoding UTF-8
LazyData true
RoxygenNote 7.1.1
NeedsCompilation no
Author Robin Lock [aut, cre]
Repository CRAN
Date/Publication 2021-07-22 22:40:10 UTC

R topics documented:

Lock5Data-package
ACS
ACS2010
AllCountries
AllCountries1e
AllCountries2e
APMultipleChoice
April14Temps
April14Temps1e
April14Temps2e
BaseballHits1e
BaseballHits2014
BaseballHits2019

BaseballTimes
Benford
BikeCommute
BodyFat
BodyTemp50
BootAtlantaCorr
CaffeineTaps
CAOSExam
CarbonDioxide
CarbonDioxide2e
CarDepreciation
Cars2015
Cars2020
Cereal
CityTemps
CityTemps2e
CocaineTreatment
ColaCalcium
CollegeScores
CollegeScores2yr
CollegeScores4yr
CommuteAtlanta
CommuteStLouis
CompassionateRats
CricketChirps
DDS
DecemberFlights
DecemberFlights2e
DietDepression
Digits
DogOwner
DrugResistance
EducationLiteracy
EducationLiteracy2e
ElectionMargin
EmployedACS
EmployedACS2010
ExerciseHours
FacebookFriends
FatMice18
FireAnts
FisherIris
FishGills12
FishGills3
Flight179
Flight433
Flight433 2e
FloridaLakes

FootballBrain
ForestFires
GeneticDiversity
GlobalInternet2010
GlobalInternet2019
GolfRound
GPAbySex
GSWarriors2016
GSWarriors2019
HappyPlanetIndex
HeatCognition
HeightData
HockeyPenalties2011
HockeyPenalties2019
HollywoodMovies
HollywoodMovies2011
HollywoodMovies2013
HomesForSale
HomesForSale2e
HomesForSaleCA
HomesForSaleCA2e
HomesForSaleCanton
HomesForSaleCanton2e
HomesForSaleNY
HomesForSaleNY2e
HomingPigeons
Honeybee
HoneybeeCircuits
HoneybeeWaggle
HotDogs1e
HotDogs2015
HotDogs2019
HouseStarts2015
HouseStarts2018
HumanTears25
HumanTears50
Hurricanes2014
Hurricanes2018
ICUAdmissions
ImmuneTea
InkjetPrinters
LifeExpectancyVehicles
LightatNight
LightatNight4Weeks
LightatNight8Weeks
MalevolentUniformsNFL

MalevolentUniformsNHL
MammalLongevity
ManhattanApartments
ManhattanApartments2011
MarriageAges
MastersGolf
MateChoice
MentalMuscle
MiamiHeat
MindsetMatters
MustangPrice
NBAPlayers2011
NBAPlayers2015
NBAPlayers2019
NBAStandings2011
NBAStandings2016
NBAStandings2019
NFLContracts2015
NFLContracts2019
NFLPreSeason2014
NFLPreseason2019
NFLScores2011
NFLScores2018
NHANES
NutritionStudy
OlympicMarathon2008
OlympicMarathon2012
OlympicMarathon2016
OrganicEffect
OttawaSenators
OttawaSenators2010
OttawaSenators2019
PASeniors
PizzaGirl
PumpkinBeer
QuizPulse10
RandomP50N200
RestaurantTips
RetailSales
RetailSales2011
RockandRoll2012
RockandRoll2015
RockandRoll2019
SalaryGender
SampColleges
SampColleges2yr
SampColleges4yr
SampCountries

SampCountries1e
SampCountries2e
SandP500
SandP5001e
SandP5002e
SandwichAnts
SandwichAnts2
SkateboardPrices
SleepCaffeine
SleepStudy
Smiles
SpeedDating
SplitBill
StatGrades
StockChanges
StorySpoilers
StressedMice
StudentSurvey
SynchronizedMovement
TenCountries
TenCountries1e
TenCountries2e
TextbookCosts
ToenailArsenic
TrafficFlow
USStates
USStates1e
USStates2e
WaterStriders
WaterTaste
Wetsuits
YoungBlood

Lock5Data-package
Lock5 Datasets

Description
Datasets for first, second, and third editions of Statistics: Unlocking the Power of Data by Lock^5

Author(s)
Robin Lock

Maintainer: Robin Lock <rlock@stlawu.edu>

ACS
American Community Survey

Description
Data from a sample of individuals in the American Community Survey

Format
A data frame with 2000 observations on the following 9 variables.

Sex              0=female and 1=male
Age              Age (years)
Married          0=not married and 1=married
Income           Wages and salary for the past 12 months (in $1,000’s)
HoursWk          Hours of work per week
Race             asian, black, other, or white
USCitizen        1=citizen and 0=noncitizen
HealthInsurance  1=have health insurance and 0=no health insurance
Language         1=English spoken at home and 0=other

Details
The American Community Survey, administered by the US Census Bureau, is given every year to a random sample of about 3.5 million households (about 3% of all US households). Data on a random sample of 1% of all US residents are made public (after ensuring anonymity), and we have selected a random sub-sample of n=2000 from the 2017 data for this dataset.

** Updated for 3e (earlier version is ACS2010). **

Source
The full public dataset can be downloaded at ata.html, and the full list of variables is at ata/documentation.html

ACS2010
American Community Survey - 2010

Description
Data from a sample of individuals in the 2010 American Community Survey

Format
A dataset with 1000 observations on the following 9 variables.

                 0=female and 1=male
                 Age (years)
                 0=not married and 1=married
                 Wages and salary for the past 12 months (in $1,000’s)
                 Hours of work per week
                 asian, black, white, or other
                 1=citizen and 0=noncitizen
HealthInsurance  1=have health insurance and 0=no health insurance
Language         1=native English speaker and 0=other

Details
The American Community Survey, administered by the US Census Bureau, is given every year to a random sample of about 3.5 million households (about 3% of all US households). Data on a random sample of 1% of all US residents are made public (after ensuring anonymity), and we have selected a random sub-sample of n=1000 from the 2010 data for this dataset.

** From 2e - dataset has been updated for 3e **

Source
The full public dataset can be downloaded at http://www.census.gov/acs/www/data_documentation/pums_data/, and the full list of variables is at http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMSDataDict10.pdf.

AllCountries
All Countries

Description
Data on the countries of the world

Format
A data frame with 217 observations on the following 26 variables.

Country         Country name
Code            Three-letter code for country
LandArea        Size in 1000 sq. km.
Population      Population in millions
Density         Number of people per square kilometer
GDP             Gross Domestic Product (in $US) per capita
Rural           Percentage of population living in rural areas
CO2             CO2 emissions (metric tons per capita)
PumpPrice       Price for a liter of gasoline ($US)
Military        Percentage of government expenditures directed toward the military
Health          Percentage of government expenditures directed towards healthcare
ArmedForces     Number of active duty military personnel (in 1,000’s)
Internet        Percentage of the population with access to the internet
Cell            Cell phone subscriptions (per 100 people)
HIV             Percentage of the population with HIV
Hunger          Percent of the population considered undernourished
Diabetes        Percent of the population diagnosed with diabetes
BirthRate       Births per 1000 people
DeathRate       Deaths per 1000 people
ElderlyPop      Percentage of the population at least 65 years old
LifeExpectancy  Average life expectancy (years)
FemaleLabor     Percent of females 15 - 64 in the labor force
Unemployment    Percent of labor force unemployed
Energy          Kilotons of oil equivalent
Electricity     Electric power consumption (kWh per capita)
Developed       Categories for kilowatt hours per capita, 1=under 2500, 2=2500 to 5000, 3=over 5000
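The Developed coding described above can be illustrated with a small base-R sketch. The electricity values here are hypothetical (they are not taken from the dataset), and placing a value of exactly 5000 kWh in category 2 is an assumption, since the description does not say which side that boundary falls on.

```r
# Hypothetical kWh-per-capita values, for illustration only
electricity <- c(150, 2499, 2500, 4200, 5000, 13000)

# 1 = under 2500, 2 = 2500 to 5000, 3 = over 5000
# (assumption: exactly 5000 is placed in category 2)
developed <- ifelse(electricity < 2500, 1L,
                    ifelse(electricity <= 5000, 2L, 3L))

print(data.frame(Electricity = electricity, Developed = developed))
```

With the package installed, the same rule could be compared against the Developed column of AllCountries to see how the boundaries were actually handled.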

Details
Data for each variable were collected for 2018 (or most recently available year). Within a variable all country measurements are from the same year, but the year may vary between different variables depending on availability.

** This dataset is updated from earlier versions (now AllCountries1e and AllCountries2e) **

Source
The data were gathered online from https://data.worldbank.org/. Accessed June 2019.

AllCountries1e
AllCountries - 1e

Description
Data on the countries of the world

Format
A dataset with 213 observations on the following 18 variables.

                Name of the country
                Three letter country code
                Size in sq. kilometers
                Population in millions
                Energy usage (kilotons of oil)
                Percentage of population living in rural areas
                Percentage of government expenditures directed toward the military
                Percentage of government expenditures directed towards healthcare
                Percentage of the population with HIV
                Percentage of the population with access to the internet
                Categories for kilowatt hours per capita, 1=under 2500, 2=2500 to 5000, 3=over 5000
                Births per 1000 people
                Percentage of the population at least 65 years old
LifeExpectancy  Average life expectancy (years)
CO2             CO2 emissions (metric tons per capita)
GDP             Gross Domestic Product (per capita)
Cell            Cell phone subscriptions (per 100 people)
Electricity     Electric power consumption (kWh per capita)

Details
Most data from 2008 to avoid many missing values in more recent years.

** From 1e - dataset has been updated for 2e **

Source
Data collected from the World Bank website, worldbank.org.

AllCountries2e
AllCountries - 2e

Description
Data on the countries of the world

Format
A dataset with 215 observations on the following 25 variables.

           Name of the country
           Size in 1000 sq. kilometers
           Population in millions
           Number of people per square kilometer
           Gross Domestic Product (in $US) per capita
           Percentage of population living in rural areas
           CO2 emissions (metric tons per capita)
           Price for a liter of gasoline ($US)
           Percentage of government expenditures directed toward the military
           Percentage of government expenditures directed towards healthcare
           Number of active duty military personnel (in 1,000’s)
           Percentage of the population with access to the internet
           Cell phone subscriptions (per 100 people)
           Percentage of the population with HIV
           Percent of the population considered undernourished
           Percent of the population diagnosed with diabetes
           Births per 1000 people
           Deaths per 1000 people
           Percentage of the population at least 65 years old
           Average life expectancy (years)
           Percent of females 15 - 64 in the labor force
           Percent of labor force unemployed
           Energy usage (kilotons of oil equivalent)
           Electric power consumption (kWh per capita)
Developed  Categories for kilowatt hours per capita, 1=under 2500, 2=2500 to 5000, 3=over 5000

Details
Data for each variable were collected for years between 2012 and 2014. Within a variable all country measurements are from the same year, but the year may vary between different variables depending on availability.

** From 2e - dataset has been updated for 3e **

Source
Data collected from the World Bank website, worldbank.org.

APMultipleChoice
AP Multiple Choice

Description
Correct responses on Advanced Placement multiple choice exams

Format
A dataset with 400 observations on the following variable.

Answer  Correct response: A, B, C, D, or E

Details
Correct responses from multiple choice sections for a sample of released Advanced Placement exams

Source
Sample exams from several disciplines at http://apcentral.collegeboard.com

April14Temps
April 14th Temperatures

Description
Temperatures in Des Moines, IA and San Francisco, CA on April 14th

Format
A data frame with 25 observations on the following 3 variables.

Year          1995 to 2019
DesMoines     Temperature in Des Moines (degrees F)
SanFrancisco  Temperature in San Francisco (degrees F)

Details
Average temperature for the day of April 14th in each of 25 years from 1995-2019

** Data set updated for 3e (earlier versions are now April14Temps1e and April14Temps2e) **

Source
The University of Dayton Average Daily Temperature Archive at citylistUS.htm

April14Temps1e
April 14th Temperatures - 1e

Description
Temperatures in Des Moines, IA and San Francisco, CA on April 14th

Format
A dataset with 16 observations on the following 3 variables.

              Temperature in Des Moines (degrees F)
              Temperature in San Francisco (degrees F)

Details
Average temperature for the day of April 14th in each of 16 years from 1995-2010

** From 1e - dataset has been updated for 2e **

Source
The University of Dayton Average Daily Temperature Archive /citylistUS.htm

April14Temps2e
April 14th Temperatures - 2e

Description
Temperatures in Des Moines, IA and San Francisco, CA on April 14th

Format
A dataset with 21 observations on the following 3 variables.

Year          1995 to 2015
DesMoines     Temperature in Des Moines (degrees F)
SanFrancisco  Temperature in San Francisco (degrees F)

Details
Average temperature for the day of April 14th in each of 21 years from 1995-2015

** From 2e - dataset has been updated for 3e **

Source
The University of Dayton Average Daily Temperature Archive /citylistUS.htm

BaseballHits1e
Baseball Hits

Description
Number of hits, wins, and other stats for MLB teams - 2011

Format
A dataset with 30 observations on the following 14 variables.

            Name of baseball team
            Either American AL or National NL League
            Number of wins for the season
            Number of runs scored
            Number of hits
            Number of doubles
            Number of triples
            Number of home runs
            Number of runs batted in
            Number of stolen bases
            Number of times caught stealing
            Number of walks
            Number of strikeouts
BattingAvg  Team batting average

Details
Data from the 2010 Major League Baseball regular season.

** From 1e - dataset has been updated for 2e **

Source
MLB/2011-standard-batting.shtml

BaseballHits2014
Baseball Hits - 2014

Description
Number of hits, wins, and other stats for MLB teams - 2014

Format
A dataset with 30 observations on the following 14 variables.

            Name of baseball team (3-character code)
            Either AL or NL
            Number of wins for the season
            Number of runs scored
            Number of hits
            Number of doubles
            Number of triples
            Number of home runs
            Number of runs batted in
            Number of stolen bases
            Number of times caught stealing
            Number of walks
            Number of strikeouts
BattingAvg  Team batting average

Details
Data from the 2014 Major League Baseball regular season.

** From 2e - dataset has been updated for 3e **

BaseballHits2019
Baseball Team Statistics (2019)

Description
Number of hits, wins, and other stats for MLB teams in 2019

Format
A data frame with 30 observations on the following 14 variables.

Team            Name of baseball team (3-character code)
League          Either AL or NL
Wins            Number of wins for the season
Runs            Number of runs scored
Hits            Number of hits
Doubles         Number of doubles
Triples         Number of triples
HomeRuns        Number of home runs
RBI             Number of runs batted in
StolenBases     Number of stolen bases
CaughtStealing  Number of times caught stealing
Walks           Number of walks
Strikeouts      Number of strikeouts
BattingAvg      Team batting average

Details
Offensive team statistics for the 2019 Major League Baseball regular season.

** Updated for 3e (earlier versions are now BaseballHits2014 and BaseballHits1e) **

BaseballSalaries2015
MLB Player Salaries in 2015

Description
Opening Day salaries for all Major League Baseball players in 2015

Format
A dataset with 868 observations on the following 4 variables.

Name      Player’s name
Salary    2015 season salary (in millions)
Team      Abbreviated team name
Position  Code for player’s main position

Details
Yearly salary (in millions of dollars) for all players on the rosters of Major League Baseball teams at the start of the 2015 season.

** From 2e - dataset has been updated for 3e **

BaseballSalaries2019
MLB Player Salaries in 2019

Description
Opening Day salaries for all Major League Baseball players in 2019

Format
A data frame with 877 observations on the following 4 variables.

Name    Player’s name
Salary  2019 season salary (in millions)
Team    Abbreviated team name
POS     Code for player’s main position

Details
Yearly salary (in millions of dollars) for all players on the rosters of Major League Baseball teams at the start of the 2019 season.

** Updated for 3e (earlier version for 2015 is at BaseballSalaries2015). **

BaseballTimes
Baseball Game Times

Description
Information for a sample of 30 Major League Baseball games played during the 2011 season

Format
A dataset with 30 observations on the following 9 variables.

      Away team name
      Home team name
      Total runs scored (both teams)
      Margin of victory
      Total number of hits (both teams)
      Total number of errors (both teams)
      Total number of pitchers used (both teams)
      Total number of walks (both teams)
Time  Elapsed time for game (in minutes)

Details
Data from a sample of boxscores for Major League Baseball games played in August 2011.

Source
/2011.shtml

Benford
Benford data

Description
Two examples to test Benford’s Law

Format
A dataset with 9 observations on the following 4 variables.

Digit     Leading digit (1-9)
BenfordP  Expected proportion according to Benford’s law
Address   Frequency as a first digit in an address
Invoices  Frequency as the first digit in invoice amounts

Details
Leading digits from 1188 addresses sampled from a phone book and 7273 amounts from invoices sampled at a company.

Source
Thanks to Prof. Richard Cleary for providing the data
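Benford’s law gives the expected proportion of leading digit d as log10(1 + 1/d), which is presumably what the BenfordP column holds. The following is a minimal base-R sketch of that formula together with a chi-square goodness-of-fit test. The names benford_p, observed, and fit are hypothetical, and the counts are made up for illustration; they are not the Address or Invoices frequencies from the dataset.

```r
# Expected leading-digit proportions under Benford's law: P(d) = log10(1 + 1/d)
digits <- 1:9
benford_p <- log10(1 + 1 / digits)

# Hypothetical first-digit counts (illustrative only, NOT the Benford dataset)
observed <- c(310, 170, 125, 100, 80, 65, 60, 50, 40)

# Chi-square goodness-of-fit test of the counts against Benford's proportions
fit <- chisq.test(observed, p = benford_p)

print(round(benford_p, 3))
print(fit$p.value)
```

With the package installed, replacing observed with a frequency column from the Benford data frame would apply the same test to the real data.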

BikeCommute
Bike Commute

Description
Commute times for two kinds of bicycle

Format
A dataset with 56 observations on the following 9 variables.

         Type of material: Carbon or Steel
         Date of the bike commute
         Length of commute (in miles)
         Total commute time (hours:minutes:seconds)
         Time converted to minutes
         Average speed during the ride (miles per hour)
         Maximum speed (miles per hour)
Seconds  Time converted to seconds
Month    Categories: 1=Jan 2=Feb 3=Mar 4=Apr 5=May 6=June 7=July

Details
Data from a personal experiment to compare commuting time based on a randomized selection between two bicycles made of different materials.

Source
Thanks to Dr. Groves for providing his data.

References
Bicycle weight and commuting time: randomised trial, in British Medical Journal, BMJ 2010;341:c6801.

BodyFat
Body Measurements

Description
Percent fat and other body measurements for a sample of men

Format
A dataset with 100 observations on the following 10 variables.

        Percent body fat
        Age in years
        Weight in pounds
        Height in inches
        Neck circumference in cm.
        Chest circumference in cm.
        Abdomen circumference in cm.
Ankle   Ankle circumference in cm.
Biceps  Extended biceps circumference in cm.
Wrist   Wrist circumference in cm.

Details
This is a subset of a larger sample of men who each had a percent body fat estimated by an underwater weighing technique. Other measurements were taken to see how they might be used to predict the body fat percentage.

Source
These data were contributed by Roger Johnson, then at Carleton University, to the Datasets Archive at the Journal of Statistics Education, v4n1/datasets.johnson.html
The data were originally supplied by Dr. A. Garth Fisher, Human Performance Research Center, Brigham Young University, Provo, Utah 84602.

BodyTemp50
Body Temperatures

Description
Sample of 50 body temperatures

Format
A data frame with 50 observations on the following 3 variables.

BodyTemp  Body temperature in degrees F
Pulse     Pulse rate (beats per minute)
Sex       F=Female, M=Male

Details
Body temperatures and pulse rates for a sample of 50 healthy adults. Note the Sex variable was labeled as Gender in earlier versions of this dataset. We acknowledge that this binary dichotomization is not a complete or inclusive representation of reality.

Source
Shoemaker, "What’s Normal: Temperature, Gender and Heartrate", Journal of Statistics Education, Vol. 4, No. 2 r.html

BootAtlantaCorr
Bootstrap Correlations for Atlanta Commutes

Description
Bootstrap correlations between Time and Distance for 500 commuters in Atlanta

Format
A dataset with 1000 observations on the following variable.

CorrTimeDist  Correlation between Time and Distance for a bootstrap sample of Atlanta commuters

Details
Correlations for bootstrap samples of Time vs. Distance for the data on Atlanta commuters in CommuteAtlanta.

Source
Computer simulation

CaffeineTaps
Caffeine Taps

Description
Finger tap rates with and without caffeine

Format
A dataset with 20 observations on the following 2 variables.

Taps   Number of finger taps in one minute
Group  Treatment with levels Caffeine and NoCaffeine

Details
Results from a double-blind experiment where a sample of male college students were asked to tap their fingers at a rapid rate. The sample was then divided at random into two groups of ten students each. Each student drank the equivalent of about two cups of coffee, which included about 200 mg of caffeine for the students in one group but was decaffeinated coffee for the second group. After a two hour period, each student was tested to measure f
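The random division into two groups of ten described in the CaffeineTaps details can be sketched in base R. This is only an illustration of one way such an assignment could be made, not the procedure the original researchers used; the student IDs, the seed, and the names group and assignment are hypothetical, while the group labels follow the levels of the Group variable.

```r
set.seed(2021)  # arbitrary seed so the illustration is reproducible

# Twenty hypothetical student IDs
students <- 1:20

# Shuffle ten "Caffeine" and ten "NoCaffeine" labels across the students
group <- sample(rep(c("Caffeine", "NoCaffeine"), each = 10))
assignment <- data.frame(Student = students, Group = group)

# Each treatment ends up with exactly ten students
print(table(assignment$Group))
```

Shuffling a fixed set of labels, rather than flipping a coin for each student, guarantees the balanced 10/10 split the description calls for.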
