R Programming - Tutorialspoint


R Programming

About the Tutorial

R is a programming language and software environment for statistical analysis, graphical representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.

R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems such as Linux, Windows and Mac.

The language was named R partly after the first letter of the first names of its two authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs language S.

Audience

This tutorial is designed for software programmers, statisticians and data miners who want to develop statistical software using R. If you are a beginner trying to understand the R programming language, this tutorial will give you enough understanding of almost all of its concepts to take yourself to higher levels of expertise.

Prerequisites

Before proceeding with this tutorial, you should have a basic understanding of computer programming terminology. A basic understanding of any programming language will help you grasp R programming concepts and move faster along the learning track.

Copyright & Disclaimer

Copyright 2016 by Tutorials Point (I) Pvt. Ltd.

All the content and graphics published in this e-book are the property of Tutorials Point (I) Pvt. Ltd. The user of this e-book is prohibited from reusing, retaining, copying, distributing or republishing any contents or a part of contents of this e-book in any manner without written consent of the publisher.

We strive to update the contents of our website and tutorials as timely and as precisely as possible; however, the contents may contain inaccuracies or errors. Tutorials Point (I) Pvt. Ltd. provides no guarantee regarding the accuracy, timeliness or completeness of our website or its contents, including this tutorial. If you discover any errors on our website or in this tutorial, please notify us at contact@tutorialspoint.com

Table of Contents

About the Tutorial
Audience
Prerequisites
Copyright & Disclaimer
Table of Contents

1. R – OVERVIEW
  Evolution of R
  Features of R
2. R – ENVIRONMENT SETUP
  Try it Option Online
  Local Environment Setup
3. R – BASIC SYNTAX
  R Command Prompt
  R Script File
  Comments
4. R – DATA TYPES
  Vectors
  Lists
  Matrices
  Arrays
  Factors
  Data Frames
5. R – VARIABLES
  Variable Assignment
  Data Type of a Variable
  Finding Variables
  Deleting Variables
6. R – OPERATORS
  Types of Operators
  Arithmetic Operators
  Relational Operators
  Logical Operators
  Assignment Operators
  Miscellaneous Operators
7. R – DECISION MAKING
  R – If Statement
  R – If...Else Statement
  The if...else if...else Statement
  R – Switch Statement
8. R – LOOPS
  R – Repeat Loop
  R – While Loop
  R – For Loop
  Loop Control Statements
  R – Break Statement
  R – Next Statement
9. R – FUNCTION
  Function Definition
  Function Components
  Built-in Function
  User-defined Function
  Calling a Function
  Lazy Evaluation of Function
10. R – STRINGS
  Rules Applied in String Construction
  String Manipulation
11. R – VECTORS
  Vector Creation
  Accessing Vector Elements
  Vector Manipulation
12. R – LISTS
  Creating a List
  Naming List Elements
  Accessing List Elements
  Manipulating List Elements
  Merging Lists
  Converting List to Vector
13. R – MATRICES
  Accessing Elements of a Matrix
  Matrix Computations
14. R – ARRAYS
  Naming Columns and Rows
  Accessing Array Elements
  Manipulating Array Elements
  Calculations Across Array Elements
15. R – FACTORS
  Factors in Data Frame
  Changing the Order of Levels
  Generating Factor Levels
16. R – DATA FRAMES
  Extract Data from Data Frame
  Expand Data Frame
17. R – PACKAGES
18. R – DATA RESHAPING
  Joining Columns and Rows in a Data Frame
  Merging Data Frames
  Melting and Casting
  Melt the Data
  Cast the Molten Data
19. R – CSV FILES
  Getting and Setting the Working Directory
  Input as CSV File
  Reading a CSV File
  Analyzing the CSV File
  Writing into a CSV File
20. R – EXCEL FILE
  Install xlsx Package
  Verify and Load the "xlsx" Package
  Input as xlsx File
  Reading the Excel File
21. R – BINARY FILES
  Writing the Binary File
  Reading the Binary File
22. R – XML FILES
  Input Data
  Reading XML File
  Details of the First Node
  XML to Data Frame
23. R – JSON FILE
  Install rjson Package
  Input Data
  Read the JSON File
  Convert JSON to a Data Frame
24. R – WEB DATA
25. R – DATABASES
  RMySQL Package
  Connecting R to MySql
  Querying the Tables
  Query with Filter Clause
  Updating Rows in the Tables
  Inserting Data into the Tables
  Creating Tables in MySql
  Dropping Tables in MySql
26. R – PIE CHARTS
  Pie Chart Title and Colors
  Slice Percentages and Chart Legend
  3D Pie Chart
27. R – BAR CHARTS
  Bar Chart Labels, Title and Colors
  Group Bar Chart and Stacked Bar Chart
28. R – BOXPLOTS
  Creating the Boxplot
  Boxplot with Notch
29. R – HISTOGRAMS
  Range of X and Y values
30. R – LINE GRAPHS
  Line Chart Title, Color and Labels
  Multiple Lines in a Line Chart
31. R – SCATTERPLOTS
  Creating the Scatterplot
  Scatterplot Matrices
32. R – MEAN, MEDIAN & MODE
  Mean
  Applying Trim Option
  Applying NA Option
  Median
  Mode
33. R – LINEAR REGRESSION
  Steps to Establish a Regression
  lm() Function
  predict() Function
34. R – MULTIPLE REGRESSION
  lm() Function
  Example
35. R – LOGISTIC REGRESSION
  Create Regression Model
36. R – NORMAL DISTRIBUTION
  dnorm()
  pnorm()
  qnorm()
  rnorm()
37. R – BINOMIAL DISTRIBUTION
  dbinom()
  pbinom()
  qbinom()
  rbinom()
38. R – POISSON REGRESSION
39. R – ANALYSIS OF COVARIANCE
40. R – TIME SERIES ANALYSIS
  Different Time Intervals
  Multiple Time Series
41. R – NONLINEAR LEAST SQUARE
42. R – DECISION TREE
  Install R Package
43. R – RANDOM FOREST
  Install R Package
44. R – SURVIVAL ANALYSIS
45. R – CHI SQUARE TEST

1. R – Overview

R is a programming language and software environment for statistical analysis, graphical representation and reporting. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, and is currently developed by the R Development Core Team.

The core of R is an interpreted computer language that allows branching and looping as well as modular programming using functions. For efficiency, R can be integrated with procedures written in C, C++, .NET, Python or FORTRAN.

R is freely available under the GNU General Public License, and pre-compiled binary versions are provided for various operating systems such as Linux, Windows and Mac.

R is free software distributed under a GNU-style copyleft, and it is an official part of the GNU project, called GNU S.

Evolution of R

R was initially written by Ross Ihaka and Robert Gentleman at the Department of Statistics of the University of Auckland in Auckland, New Zealand. R made its first appearance in 1993. A large group of individuals has contributed to R by sending code and bug reports. Since mid-1997, a core group (the "R Core Team") has been able to modify the R source code archive.

Features of R

As stated earlier, R is a programming language and software environment for statistical analysis, graphical representation and reporting. The following are the important features of R:

- R is a well-developed, simple and effective programming language that includes conditionals, loops, user-defined recursive functions, and input and output facilities.
- R has effective data handling and storage facilities.
- R provides a suite of operators for calculations on arrays, lists, vectors and matrices.
- R provides a large, coherent and integrated collection of tools for data analysis.
- R provides graphical facilities for data analysis and display, either directly on the computer or as printed output on paper.
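As a rough sketch of two of the features listed above, the short R script below defines a user-defined recursive function with an if/else conditional, calls it from a for loop, and applies arithmetic operators to whole vectors and to a matrix. The function name `factorial_rec` is hypothetical, chosen only for illustration; base R already provides a `factorial()` function, so this version exists only to show the language constructs.

```r
# Hypothetical user-defined recursive function (illustration only;
# base R already has factorial()). Assumes n is a non-negative whole number.
factorial_rec <- function(n) {
  if (n <= 1) {
    return(1)                         # base case stops the recursion
  } else {
    return(n * factorial_rec(n - 1))  # recursive call on a smaller input
  }
}

# A for loop calling the function; cat() writes the text to the console
for (i in 1:5) {
  cat(sprintf("%d! = %g\n", i, factorial_rec(i)))
}

# Arithmetic operators work element-wise on whole vectors, no loop needed
v <- c(1, 2, 3, 4)
w <- c(10, 20, 30, 40)
print(v + w)    # element-wise addition
print(v * 2)    # the scalar 2 is recycled across every element of v

# Matrices support both element-wise and matrix-algebra operators
m <- matrix(1:4, nrow = 2)  # filled column by column
print(m * m)    # element-wise product
print(m %*% m)  # matrix multiplication
print(t(m))     # transpose
```

Note that `*` on two matrices multiplies matching elements, while `%*%` performs true matrix multiplication; confusing the two is a common source of bugs.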
