Advanced R


Advanced R
© 2015 by Taylor & Francis Group, LLC

Chapman & Hall/CRC
The R Series

Series Editors

John M. Chambers
Department of Statistics
Stanford University
Stanford, California, USA

Torsten Hothorn
Division of Biostatistics
University of Zurich
Switzerland

Duncan Temple Lang
Department of Statistics
University of California, Davis
Davis, California, USA

Hadley Wickham
RStudio
Boston, Massachusetts, USA

Aims and Scope

This book series reflects the recent rapid growth in the development and application of R, the programming language and software environment for statistical computing and graphics. R is now widely used in academic research, education, and industry. It is constantly growing, with new versions of the core software released regularly and more than 5,000 packages available. It is difficult for the documentation to keep pace with the expansion of the software, and this vital book series provides a forum for the publication of books covering many aspects of the development and application of R.

The scope of the series is wide, covering three main threads:

• Applications of R to specific disciplines such as biology, epidemiology, genetics, engineering, finance, and the social sciences.
• Using R for the study of topics of statistical methodology, such as linear and mixed modeling, time series, Bayesian methods, and missing data.
• The development of R, including programming, building packages, and graphics.

The books will appeal to programmers and developers of R software, as well as applied statisticians and data analysts in many fields. The books will feature detailed worked examples and R code fully integrated into the text, ensuring their usefulness to researchers, practitioners and students.

Published Titles

Stated Preference Methods Using R, Hideo Aizaki, Tomoaki Nakatani, and Kazuo Sato
Using R for Numerical Analysis in Science and Engineering, Victor A. Bloomfield
Event History Analysis with R, Göran Broström
Computational Actuarial Science with R, Arthur Charpentier
Statistical Computing in C and R, Randall L. Eubank and Ana Kupresanin
Reproducible Research with R and RStudio, Christopher Gandrud
Introduction to Scientific Programming and Simulation Using R, Second Edition, Owen Jones, Robert Maillardet, and Andrew Robinson
Nonparametric Statistical Methods Using R, John Kloke and Joseph McKean
Displaying Time Series, Spatial, and Space-Time Data with R, Oscar Perpiñán Lamigueiro
Programming Graphical User Interfaces with R, Michael F. Lawrence and John Verzani
Analyzing Sensory Data with R, Sébastien Lê and Thierry Worch
Analyzing Baseball Data with R, Max Marchi and Jim Albert
Growth Curve Analysis and Visualization Using R, Daniel Mirman
R Graphics, Second Edition, Paul Murrell
Multiple Factor Analysis by Example Using R, Jérôme Pagès
Customer and Business Analytics: Applied Data Mining for Business Decision Making Using R, Daniel S. Putler and Robert E. Krider
Implementing Reproducible Research, Victoria Stodden, Friedrich Leisch, and Roger D. Peng
Using R for Introductory Statistics, Second Edition, John Verzani
Advanced R, Hadley Wickham
Dynamic Documents with R and knitr, Yihui Xie

Advanced R

Hadley Wickham

CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2015 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Version Date: 20140813
International Standard Book Number-13: 978-1-4665-8697-0 (eBook - PDF)

This book contains information obtained from authentic and highly regarded sources. Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organization that provides licenses and registration for a variety of users. For organizations that have been granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

Visit the Taylor & Francis Web site at
http://www.taylorandfrancis.com
and the CRC Press Web site at
http://www.crcpress.com

To Jeff, who makes me happy, and who made sure I had a life outside this book.


Contents

1 Introduction
  1.1 Who should read this book
  1.2 What you will get out of this book
  1.3 Meta-techniques
  1.4 Recommended reading
  1.5 Getting help
  1.6 Acknowledgments
  1.7 Conventions
  1.8 Colophon

I Foundations

2 Data structures
  2.1 Vectors
    2.1.1 Atomic vectors
      2.1.1.1 Types and tests
      2.1.1.2 Coercion
    2.1.2 Lists
    2.1.3 Exercises
  2.2 Attributes
      2.2.0.1 Names
    2.2.1 Factors
    2.2.2 Exercises
  2.3 Matrices and arrays

    2.3.1 Exercises
  2.4 Data frames
    2.4.1 Creation
    2.4.2 Testing and coercion
    2.4.3 Combining data frames
    2.4.4 Special columns
    2.4.5 Exercises
  2.5 Answers

3 Subsetting
  3.1 Data types
    3.1.1 Atomic vectors
    3.1.2 Lists
    3.1.3 Matrices and arrays
    3.1.4 Data frames
    3.1.5 S3 objects
    3.1.6 S4 objects
    3.1.7 Exercises
  3.2 Subsetting operators
    3.2.1 Simplifying vs. preserving subsetting
    3.2.2 $
    3.2.3 Missing/out of bounds indices
    3.2.4 Exercises
  3.3 Subsetting and assignment
  3.4 Applications
    3.4.1 Lookup tables (character subsetting)
    3.4.2 Matching and merging by hand (integer subsetting)
    3.4.3 Random samples/bootstrap (integer subsetting)
    3.4.4 Ordering (integer subsetting)

    3.4.5 Expanding aggregated counts (integer subsetting)
    3.4.6 Removing columns from data frames (character subsetting)
    3.4.7 Selecting rows based on a condition (logical subsetting)
    3.4.8 Boolean algebra vs. sets (logical & integer subsetting)
    3.4.9 Exercises
  3.5 Answers

4 Vocabulary
  4.1 The basics
  4.2 Common data structures
  4.3 Statistics
  4.4 Working with R
  4.5 I/O

5 Style guide
  5.1 Notation and naming
    5.1.1 File names
    5.1.2 Object names
  5.2 Syntax
    5.2.1 Spacing
    5.2.2 Curly braces
    5.2.3 Line length
    5.2.4 Indentation
    5.2.5 Assignment
  5.3 Organisation
    5.3.1 Commenting guidelines

6 Functions
  6.1 Function components
    6.1.1 Primitive functions
    6.1.2 Exercises
  6.2 Lexical scoping
    6.2.1 Name masking
    6.2.2 Functions vs. variables
    6.2.3 A fresh start
    6.2.4 Dynamic lookup
    6.2.5 Exercises
  6.3 Every operation is a function call
  6.4 Function arguments
    6.4.1 Calling functions
    6.4.2 Calling a function given a list of arguments
    6.4.3 Default and missing arguments
    6.4.4 Lazy evaluation
    6.4.5 ...
    6.4.6 Exercises
  6.5 Special calls
    6.5.1 Infix functions
    6.5.2 Replacement functions
    6.5.3 Exercises
  6.6 Return values
    6.6.1 On exit
    6.6.2 Exercises
  6.7 Quiz answers

7 OO field guide
  7.1 Base types
  7.2 S3
    7.2.1 Recognising objects, generic functions, and methods
    7.2.2 Defining classes and creating objects
    7.2.3 Creating new methods and generics
    7.2.4 Method dispatch
    7.2.5 Exercises
  7.3 S4
    7.3.1 Recognising objects, generic functions, and methods
    7.3.2 Defining classes and creating objects
    7.3.3 Creating new methods and generics
    7.3.4 Method dispatch
    7.3.5 Exercises
  7.4 RC
    7.4.1 Defining classes and creating objects
    7.4.2 Recognising objects and methods
    7.4.3 Method dispatch
    7.4.4 Exercises
  7.5 Picking a system
  7.6 Quiz answers

8 Environments
  8.1 Environment basics
    8.1.1 Exercises
  8.2 Recursing over environments
    8.2.1 Exercises
  8.3 Function environments

    8.3.1 The enclosing environment
    8.3.2 Binding environments
    8.3.3 Execution environments
    8.3.4 Calling environments
    8.3.5 Exercises
  8.4 Binding names to values
    8.4.1 Exercises
  8.5 Explicit environments
    8.5.1 Avoiding copies
    8.5.2 Package state
    8.5.3 As a hashmap
  8.6 Quiz answers

9 Debugging, condition handling, and defensive programming
  9.1 Debugging techniques
  9.2 Debugging tools
    9.2.1 Determining the sequence of calls
    9.2.2 Browsing on error
    9.2.3 Browsing arbitrary code
    9.2.4 The call stack: traceback(), where, and recover()
    9.2.5 Other types of failure
  9.3 Condition handling
    9.3.1 Ignore errors with try
    9.3.2 Handle conditions with tryCatch()
    9.3.3 withCallingHandlers()
    9.3.4 Custom signal classes
    9.3.5 Exercises
  9.4 Defensive programming
    9.4.1 Exercises
  9.5 Quiz answers

II Functional programming

10 Functional programming
  10.1 Motivation
  10.2 Anonymous functions
    10.2.1 Exercises
  10.3 Closures
    10.3.1 Function factories
    10.3.2 Mutable state
    10.3.3 Exercises
  10.4 Lists of functions
    10.4.1 Moving lists of functions to the global environment
    10.4.2 Exercises
  10.5 Case study: numerical integration
    10.5.1 Exercises

11 Functionals
  11.1 My first functional: lapply()
    11.1.1 Looping patterns
    11.1.2 Exercises
  11.2 For loop functionals: friends of lapply()
    11.2.1 Vector output: sapply and vapply
    11.2.2 Multiple inputs: Map (and mapply)
    11.2.3 Rolling computations
    11.2.4 Parallelisation
    11.2.5 Exercises
  11.3 Manipulating matrices and data frames
    11.3.1 Matrix and array operations
    11.3.2 Group apply
    11.3.3 The plyr package

    11.3.4 Exercises
  11.4 Manipulating lists
    11.4.1 Reduce()
    11.4.2 Predicate functionals
    11.4.3 Exercises
  11.5 Mathematical functionals
    11.5.1 Exercises
  11.6 Loops that should be left as is
    11.6.1 Modifying in place
    11.6.2 Recursive relationships
    11.6.3 While loops
  11.7 A family of functions
    11.7.1 Exercises

12 Function operators
  12.1 Behavioural FOs
    12.1.1 Memoisation
    12.1.2 Capturing function invocations
    12.1.3 Laziness
    12.1.4 Exercises
  12.2 Output FOs
    12.2.1 Minor modifications
    12.2.2 Changing what a function does
    12.2.3 Exercises
  12.3 Input FOs
    12.3.1 Prefilling function arguments: partial function application
    12.3.2 Changing input types
    12.3.3 Exercises
  12.4 Combining FOs

    12.4.1 Function composition
    12.4.2 Logical predicates and boolean algebra
    12.4.3 Exercises

III Computing on the language

13 Non-standard evaluation
  13.1 Capturing expressions
    13.1.1 Exercises
  13.2 Non-standard evaluation in subset
    13.2.1 Exercises
  13.3 Scoping issues
    13.3.1 Exercises
  13.4 Calling from another function
    13.4.1 Exercises
  13.5 Substitute
    13.5.1 Adding an escape hatch to substitute
    13.5.2 Capturing unevaluated ...
    13.5.3 Exercises
  13.6 The downsides of non-standard evaluation
    13.6.1 Exercises

14 Expressions
  14.1 Structure of expressions
    14.1.1 Exercises
  14.2 Names
    14.2.1 Exercises
  14.3 Calls
    14.3.1 Modifying a call
    14.3.2 Creating a call from its components

    14.3.3 Exercises
  14.4 Capturing the current call
    14.4.1 Exercises
  14.5 Pairlists
    14.5.1 Exercises
  14.6 Parsing and deparsing
    14.6.1 Exercises
  14.7 Walking the AST with recursive functions
    14.7.1 Finding F and T
    14.7.2 Finding all variables created by assignment
    14.7.3 Modifying the call tree
    14.7.4 Exercises

15 Domain specific languages
  15.1 HTML
    15.1.1 Goal
    15.1.2 Escaping
    15.1.3 Basic tag functions
    15.1.4 Tag functions
    15.1.5 Processing all tags
    15.1.6 Exercises
  15.2 LaTeX
    15.2.1 LaTeX mathematics
    15.2.2 Goal
    15.2.3 to_math
    15.2.4 Known symbols
    15.2.5 Unknown symbols
    15.2.6 Known functions
    15.2.7 Unknown functions
    15.2.8 Exercises

IV Performance

16 Performance
  16.1 Why is R slow?
  16.2 Microbenchmarking
    16.2.1 Exercises
  16.3 Language performance
    16.3.1 Extreme dynamism
    16.3.2 Name lookup with mutable environments
    16.3.3 Lazy evaluation overhead
    16.3.4 Exercises
  16.4 Implementation performance
    16.4.1 Extracting a single value from a data frame
    16.4.2 ifelse(), pmin(), and pmax()
    16.4.3 Exercises
  16.5 Alternative R implementations

17 Optimising code
  17.1 Measuring performance
    17.1.1 Limitations
  17.2 Improving performance
  17.3 Code organisation
  17.4 Has someone already solved the problem?
    17.4.1 Exercises
  17.5 Do as little as possible
    17.5.1 Exercises
  17.6 Vectorise
    17.6.1 Exercises
  17.7 Avoid copies
  17.8 Byte code compilation

  17.9 Case study: t-test
  17.10 Parallelise
  17.11 Other techniques

18 Memory
  18.1 Object size
    18.1.1 Exercises
  18.2 Memory usage and garbage collection
  18.3 Memory profiling with lineprof
    18.3.1 Exercises
  18.4 Modification in place
    18.4.1 Loops
    18.4.2 Exercises

19 High performance functions with Rcpp
  19.1 Getting started with C++
    19.1.1 No inputs, scalar output
    19.1.2 Scalar input, scalar output
    19.1.3 Vector input, scalar output
    19.1.4 Vector input, vector output
    19.1.5 Matrix input, vector output
    19.1.6 Using sourceCpp
    19.1.7 Exercises
  19.2 Attributes and other classes
    19.2.1 Lists and data frames
    19.2.2 Functions
    19.2.3 Other types
  19.3 Missing values
    19.3.1 Scalars
      19.3.1.1 Integers

      19.3.1.2 Doubles
    19.3.2 Strings
    19.3.3 Boolean
    19.3.4 Vectors
    19.3.5 Exercises
  19.4 Rcpp sugar
    19.4.1 Arithmetic and logical operators
    19.4.2 Logical summary functions
    19.4.3 Vector views
    19.4.4 Other useful functions
  19.5 The STL
    19.5.1 Using iterators
    19.5.2 Algorithms
    19.5.3 Data structures
    19.5.4 Vectors
    19.5.5 Sets
    19.5.6 Map
    19.5.7 Exercises
  19.6 Case studies
    19.6.1 Gibbs sampler
    19.6.2 R vectorisation vs. C++ vectorisation
  19.7 Using Rcpp in a package
  19.8 Learning more
  19.9 Acknowledgments

20 R’s C interface
  20.1 Calling C functions from R
  20.2 C data structures
  20.3 Creating and modifying vectors
    20.3.1 Creating vectors and garbage collection

    20.3.2 Missing and non-finite values
    20.3.3 Accessing vector data
    20.3.4 Character vectors and lists
    20.3.5 Modifying inputs
    20.3.6 Coercing scalars
    20.3.7 Long vectors
  20.4 Pairlists
  20.5 Input validation
  20.6 Finding the C source code for a function

Index

1 Introduction

With more than 10 years’ experience programming in R, I’ve had the luxury of being able to spend a lot of time trying to figure out and understand how the language works. This book is my attempt to pass on what I’ve learned so that you can quickly become an effective R programmer. Reading it will help you avoid the mistakes I’ve made and dead ends I’ve gone down, and will teach you useful tools, techniques, and idioms that can help you to attack many types of problems. In the process, I hope to show that, despite its frustrating quirks, R is, at its heart, an elegant and beautiful language, well tailored for data analysis and statistics.

If you are new to R, you might wonder what makes learning such a quirky language worthwhile. To me, some of the best features are:

• It’s free, open source, and available on every major platform. As a result, if you do your analysis in R, anyone can easily replicate it.
• A massive set of packages for statistical modelling, machine learning, visualisation, and importing and manipulating data. Whatever model or graphic you’re trying to do, chances are that someone has already tried to do it. At a minimum, you can learn from their efforts.
• Cutting edge tools. Researchers in statistics and machine learning will often publish an R package to accompany their articles. This means immediate access to the very latest statistical techniques and implementations.
• Deep-seated language support for data analysis. This includes features like missing values, data frames, and subsetting.
• A fantastic community. It is easy to get help from experts on the R-help mailing list, stackoverflow (http://stackoverflow.com/questions/tagged/r), or subject-specific mailing lists.
