R For Stata Users - GBV

Transcription

Robert A. Muenchen and Joseph M. Hilbe, R for Stata Users, Springer

Contents

  [...]es Between R and Stata
  1.3 Why Learn R?
  1.4 Is R Accurate?
  1.5 What About [...]
  Programming Conventions
  Typographic Conventions
2 Installing and Updating R
  2.1 Installing Add-on Packages
  2.2 Loading an Add-on Package
  2.3 Updating Your Installation
  2.4 Uninstalling R
  2.5 Choosing Repositories
  2.6 Accessing Data in Packages
3 Running R
  3.1 Running R Interactively on Windows
  3.2 Running R Interactively on Macintosh
  3.3 Running R Interactively on Linux or UNIX
  3.4 Running Programs That Include Other Programs
  3.5 Running R in Batch Mode
  3.6 Graphical User Interfaces
    3.6.1 R Commander
    3.6.2 Rattle for Data Mining
    3.6.3 JGR Java GUI for R

4 Help and Documentation
  4.1 Introduction
  Help Files
  Starting Help
  Help Examples
  Help for Functions That [...]ll Other Functions
  Help for Packages
  Help for Data Sets
  4.8 Books and Manuals
  4.9 E-mail [...]
  [...] the Web
5 Programming Language Basics
  5.1 Introduction
  5.2 Simple [...]
  5.3 Data Structures
    5.3.1 Vectors
    5.3.2 Factors
    5.3.3 Data [...]
  [...] Your Work
  [...] to Document Your Programs
  [...] Functions (Commands)
    5.6.1 Controlling Functions with Arguments
    5.6.2 Controlling Functions with Formulas
    5.6.3 Controlling Functions with an Object's Class
    5.6.4 Controlling Functions with Extractor Functions
  5.7 How Much Output is There?
  5.8 Writing Your Own Functions (Macros)
  5.9 R Program Demonstrating Programming Basics
6 Data Acquisition
  6.1 The R Data Editor
  6.2 Reading Delimited Text Files
    6.2.1 Reading Comma-Delimited Text Files
    6.2.2 Reading Tab-Delimited Text Files
    6.2.3 Missing Values for Character Variables
    6.2.4 Trouble with Tabs
    6.2.5 Skipping Variables in Delimited Files
    6.2.6 Example Programs for Reading Delimited Text Files
  6.3 Reading Text Data Within a Program

    6.3.1 The Easy Approach
    6.3.2 The More General Approach
    6.3.3 Example Programs for Reading Text Data Within a Program
  6.4 Reading Fixed-Width Text Files, One Record per Case
    6.4.1 [...]acro Substitution
    6.4.2 Example Programs for Reading Fixed-Width Text Files, One Record Per Case
  6.5 Reading Fixed-Width Text Files, Two or More Records per Case
    6.5.1 Example Programs to Read Fixed-Width Text Files with Two Records per Case
  6.6 Importing Data from Stata into R
    6.6.1 R Program to Import Data from Stata
  6.7 Writing Data to a Comma-Delimited Text File
    6.7.1 Example Programs for Writing a [...]
  [...] Data from R to Stata
7 Selecting Variables
  7.1 Selecting Variables in Stata
  7.2 Selecting All Variables
  7.3 Selecting Variables Using Index Numbers
  7.4 Selecting Variables Using Column Names
  7.5 Selecting Variables Using Logic
  7.6 Selecting Variables Using String Search
  7.7 Selecting Variables Using Notation
  7.8 Selecting Variables Using Component Names
    7.8.1 The attach Function
    7.8.2 The with Function
    7.8.3 Using Component Names in Formulas
  Selecting Variables with the subset Function
  Selecting Variables Using List Index
  Generating Indexes A to Z from Two Variable Names
  Saving Selected Variables to a New Dataset
  7.13 Example Programs for Variable Selection
    7.13.1 Stata Program to Select Variables
    7.13.2 R Program to Select Variables
8 Selecting Observations
  8.1 Selecting Observations in Stata
  8.2 Selecting All Observations
  8.3 Selecting Observations Using Index Numbers
  8.4 Selecting Observations Using Row Names
  8.5 Selecting Observations Using Logic

  8.6 Selecting Observations Using String Search
  8.7 Selecting Observations Using the subset Function
  8.8 Generating Indexes A to Z from Two Row Names
  8.9 Variable Selection Methods with No Counterpart for Selecting Observations
  8.10 Saving Selected Observations to a New Data Frame
  8.11 Example Programs for Selecting Observations
    8.11.1 Stata Program to Select Observations
    8.11.2 R Program to Select [...]
9 Selecting Variables and Observations
  The subset Function
  Selecting Observations by Logic and Variables by Name
  Using Names to Select Both Observations and Variables
  Using Numeric Index Values to Select Both Observations and Variables
  9.5 Using Logic to Select Both Observations and Variables
  9.6 Saving and Loading Subsets
  9.7 Example Programs for Selecting Variables and Observations
    9.7.1 Stata Program for Selecting Variables and Observations
    9.7.2 R Program for Selecting Variables and Observations
10 Data Management
  10.1 Transforming Variables
    10.1.1 Example Programs for Transforming Variables
  10.2 Functions or Commands? The apply Function Decides
    Applying the mean Function
    Finding N or NVALID
    Example Programs for Applying [...]
  10.3 Conditional Transformations
    10.3.1 Example Programs for Conditional Transformations
  10.4 Multiple Conditional Transformations
    10.4.1 Example Programs for Multiple Conditional Transformations
  10.5 Missing Values
    10.5.1 Substituting Means for Missing Values
    10.5.2 Finding Complete Observations
    10.5.3 When "99" Has Meaning
    10.5.4 Example Programs to Assign Missing Values
  10.6 Renaming Variables (and Observations)
    10.6.1 Renaming Variables - Advanced Examples
    10.6.2 Renaming by Index
    10.6.3 Renaming by Column Name

    Renaming Many Sequentially Numbered [...]
    Renaming Observations
    Example Programs for Renaming Variables
  [...] Variables
    10.7.1 Recoding a Few Variables
    10.7.2 Recoding Many Variables
    10.7.3 Example Programs for Recoding [...]
  10.8 Keeping and Dropping Variables
    10.8.1 Example Programs for [...]
  10.9 Stacking/Appending Data Sets
    10.9.1 Example Programs for Stacking/Appending Data Sets
  10.10 Joining/Merging Data Sets
    10.10.1 Example Programs for Joining/Merging Data Sets
  Creating Collapsed or Aggregated Data Sets
    The aggregate Function
    The tapply Function
    Merging Aggregates with Original Data
    Tabular Aggregation
    The reshape Package
    Comparing Summarization Methods
    Example Programs for [...]
  10.12 By or Split-File Processing
    Example Programs for By or Split-file Processing
  10.13 Removing Duplicate Observations
    10.13.1 Example Programs for Removing Duplicate Observations
  Selecting First or Last Observations per Group
    Example Programs for Selecting Last Observation per Group
  10.15 Reshaping Variables to Observations and Back
    10.15.1 Example Programs for Reshaping Variables to Observations and Back
  10.16 Sorting Data Frames
    Example Programs for Sorting Data Sets
  10.17 Converting Data Structures
    10.17.1 Converting from Logical to Numeric Index and Back
11 Enhancing Your [...]
  11.1 Value Labels or Formats (and Measurement Level)
    11.1.1 Character Factors
    11.1.2 Numeric Factors

    11.1.3 Making Factors of Many Variables
    11.1.4 Converting Factors into Numeric or Character Variables
    11.1.5 Dropping Factor Levels
    11.1.6 Example Programs for Value Labels
  11.2 Variable Labels
    11.2.1 Variable Labels in The Hmisc Package
    11.2.2 Long Variable Names as Labels
    11.2.3 Other Packages That Support Variable Labels
    11.2.4 Example Programs for Variable Labels
  11.3 Output for Word Processing and Web Pages
    11.3.1 The xtable Package
    11.3.2 Other Options for Formatting Output
    11.3.3 Example Programs for Formatting Output
12 Generating Data
  12.1 Generating Numeric Sequences
  12.2 Generating Factors
  12.3 Generating Repetitious Patterns (Not Factors)
  12.4 Generating Integer Measures
  12.5 Generating Continuous Measures
  12.6 Generating a Data Frame
  12.7 Example Programs for Generating Data
    12.7.1 Stata Program for Generating Data
    12.7.2 R Program for Generating Data
13 Managing Your Files and Workspace
  13.1 Loading and Listing Objects
  13.2 Understanding Your Search Path
  13.3 Attaching Data Frames
  13.4 Attaching Files
  13.5 Removing Objects from Your Workspace
  13.6 Minimizing Your Workspace
  13.7 Setting Your Working Directory
  13.8 Saving Your Workspace
    13.8.1 Saving Your Workspace Manually
    13.8.2 Saving Your Workspace Automatically
  13.9 Getting Operating Systems to Show You ".RData"
  13.10 Organizing Projects with Windows Shortcuts
  13.11 Saving Your Programs and Output
  13.12 Saving Your History
  13.13 Large Data Set Considerations
  13.14 Example R Program for Managing Files and [...]

14 Graphics [...]
  14.3 The Grammar of Graphics
  14.4 Other Graphics Packages
  14.5 Graphics Procedures and Graphics Systems
  14.6 Graphics Devices
  14.7 Practice Data: mydata100
15 Traditional Graphics
  15.1 Bar Plots
    15.1.1 Bar Plots of Counts
    15.1.2 Bar Plots for Subgroups of Counts
    15.1.3 Bar Plots of Means
  15.2 Adding Titles, Labels, Colors, and Legends
  15.3 Graphics Parameters and Multiple Plots on a Page
  15.4 Pie Charts
  15.5 Dot Charts
  15.6 Histograms
    15.6.1 Basic Histograms
    15.6.2 Histograms Stacked
    15.6.3 Histograms Overlaid
  15.7 Normal QQ Plots
  15.8 Strip Charts
  15.9 Scatter Plots and Line Plots
    15.9.1 Scatter plots with Jitter
    15.9.2 Scatter plots with Large Data Sets
    15.9.3 Scatter plots with Lines
    15.9.4 Scatter plots with Linear Fit by Group
    15.9.5 Scatter plots by Group or Level (Coplots)
    15.9.6 Scatter plots with Confidence Ellipse
    15.9.7 Scatter plots with Confidence and Prediction Intervals
    15.9.8 Plotting Labels Instead of Points
    15.9.9 Scatter plot Matrices
  15.10 Dual-Axes Plots
  15.11 Box Plots
  15.12 Error Bar Plots
  15.13 Interaction Plots
  15.14 Adding Equations and Symbols to Graphs
  15.15 Summary of Graphics Elements and Parameters
  15.16 Plot Demonstrating Many Modifications
  15.17 Example Program for Traditional Graphics
    15.17.1 Stata Program for Traditional Graphics
    15.17.2 R Program for Traditional Graphics

16 Graphics with ggplot2
  16.1 Introduction
    16.1.1 Overview qplot and ggplot
    16.1.2 Missing Values
    16.1.3 Typographic Conventions
  16.2 Bar Plots
    16.2.1 Pie Charts
    16.2.2 Bar Charts for Groups
  Plots by Group
  16.4 Presummarized Data
  16.5 Dot Charts
  Adding Titles and [...]
  Histograms and Density Plots
    Histograms
    Density Plots
    Histograms with Density Overlaid
    Histograms for Groups, Stacked
    Histograms for Groups, [...]
  16.8 Normal QQ Plots
  16.9 Strip Plots
  16.10 Scatter Plots and Line Plots
    16.10.1 Scatter Plots with Jitter
    16.10.2 Scatter Plots for Large Data Sets
    16.10.3 Hexbin Plots
    16.10.4 Scatter Plots with Fit Lines
    16.10.5 Scatter Plots with Reference Lines
    16.10.6 Scatter Plots with Labels Instead of Points
    16.10.7 Changing Plot Symbols
    16.10.8 Scatter Plot with Linear Fits by Group
    16.10.9 Scatter Plots Faceted for Groups
    16.10.10 Scatter Plot Matrix
  16.11 Box Plots
  16.12 Error Bar [...]
  [...]hmic Axes
  Aspect Ratio
  Multiple Plots on a Page
  Saving ggplot2 Graphs to a File
  An Example Specifying All Defaults
  Summary of Graphic Elements and Parameters
  Example Programs for ggplot2
17 Statistics
  17.1 Scientific Notation
  17.2 Descriptive Statistics
    17.2.1 The Hmisc describe Function
    17.2.2 The summary Function

    17.2.3 The table Function and Its Relatives
    17.2.4 The mean Function and Its Relatives
  17.3 Cross-Tabulation
    17.3.1 The CrossTable Function
    17.3.2 The tables and chisq.test [...]nction
  17.5 Regression
    17.5.1 Plotting Diagnostics
    17.5.2 Comparing Models
    17.5.3 Making Predictions with New Data
  17.6 t-Test: Independent Groups
  17.7 Equality of Variance
  17.8 t-Test: Paired or Repeated Measures
  17.9 Wilcoxon Mann-Whitney Rank Sum Test: Independent Groups
  17.10 Wilcoxon Signed-Rank Test: Paired Groups
  17.11 Analysis of Variance
  17.12 Sums of Squares
  17.13 The Kruskal-Wallis Test
  17.14 Example Programs for Statistical Tests
    17.14.1 Stata Program for Statistical Tests
    17.14.2 R Program for Statistical Tests
18 Conclusion
Glossary of R jargon
Comparison of Stata commands and R functions
Automating Your R Setup
  C.1 Setting Options
  C.2 Creating Objects
  C.3 Loading Packages
  C.4 Running [...]
  C.5 Example [...]
Example Simulation
  D.1 Stata Example Simulation
  D.2 R Example Simulation
References
Index

2.4 Uninstalling R 15 2.5 Choosing Repositories 15 2.6 Accessing Data in Packages 17 3 Running R 19 3.1 Running R Interactively on Windows 19 3.2 Running R Interactively on Macintosh 21 3.3 Running R Interactively on Linux or UNIX 23 3.4 Running Programs That Include Other Programs 25 3.5 Running R in Batch Mode 25 3.6 Graphical User Interfaces 26 3.6.1 .