Twitter Data Analysis With R - Dicook.github.io


Twitter Data Analysis with R
Yanchang Zhao
RDataMining.com
Making Data Analysis Easier – Workshop Organised by the Monash Business Analytics Team (WOMBAT 2016), Monash University, Melbourne
19 February 2016

Outline
- Introduction
- Tweets Analysis
  - Extracting Tweets
  - Text Cleaning
  - Frequent Words and Word Cloud
  - Word Associations
  - Topic Modelling
  - Sentiment Analysis
- Followers and Retweeting Analysis
  - Follower Analysis
  - Retweeting Analysis
- R Packages
- References and Online Resources

Twitter
- An online social networking service that enables users to send and read short 140-character messages called "tweets" (Wikipedia)
- Over 300 million monthly active users (as of 2015)
- Creating over 500 million tweets per day

RDataMining Twitter Account
- @RDataMining: focuses on R and Data Mining
- 580 tweets/retweets (as of February 2016)
- 2,300 followers

Techniques and Tools
- Techniques
  - Text mining
  - Topic modelling
  - Sentiment analysis
  - Social network analysis
- Tools
  - Twitter API
  - R and its packages

Process
- Extract tweets and followers from the Twitter website with R and the twitteR package
- With the tm package, clean text by removing punctuation, numbers, hyperlinks and stop words, followed by stemming and stem completion
- Build a term-document matrix
- Analyse topics with the topicmodels package
- Analyse sentiment with the sentiment140 package
- Analyse following/followed and retweeting relationships with the igraph package
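To make the "term-document matrix" step concrete, here is a minimal base R sketch. It does not use the tm package; the three toy documents and all variable names are made up for illustration. Rows are terms, columns are documents, and each cell counts how often a term occurs in a document.

```r
# Toy documents (hypothetical, already lower-cased and cleaned)
docs <- c(d1 = "r data mining",
          d2 = "data mining with r",
          d3 = "text mining")

# Split each document into words; strsplit() keeps the document names
tokens <- strsplit(docs, " ", fixed = TRUE)

# Vocabulary: every distinct term across all documents
terms <- sort(unique(unlist(tokens)))

# Count each term per document; one column per document
tdm <- vapply(
  tokens,
  function(tok) as.integer(table(factor(tok, levels = terms))),
  integer(length(terms))
)
rownames(tdm) <- terms

print(tdm)

# Overall term frequencies, the basis for frequent-word lists and word clouds
term_freq <- rowSums(tdm)
print(sort(term_freq, decreasing = TRUE))
```

In the workflow described on the slide, the tm package would build this matrix from the cleaned corpus; the sketch only shows the structure of the result.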


Retrieve Tweets

    ## Option 1: retrieve tweets from Twitter
    library(twitteR)
    library(ROAuth)
    ## Twitter authentication
    setup_twitter_oauth(consumer_key, consumer_secret, access_token,
                        access_secret)
    ## 3200 is the maximum to retrieve
    tweets <- userTimeline("RDataMining", n = 3200)

    ## Option 2: download @RDataMining tweets from RDataMining.com
    url <- "...s-20160212.rds"
    download.file(url, destfile = "./data/RDataMining-Tweets-20160212.rds")
    ## load tweets into R
    tweets <- readRDS("./data/RDataMining-Tweets-20160212.rds")

Twitter Authentication with OAuth: Section 3 of http://geoffjentry.hexdump.org/twitteR.pdf
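Option 2 above works with tweets stored in an .rds file. The round trip can be sketched with base R's saveRDS() and readRDS(); the small list standing in for the tweets and the temporary file path are assumptions made for illustration, not the actual @RDataMining data.

```r
# Hypothetical stand-in for a list of retrieved tweets
tweets_demo <- list(
  list(id = "1", text = "First example tweet"),
  list(id = "2", text = "Second example tweet")
)

# Write the object to a temporary .rds file (path chosen for this sketch only)
rds_path <- tempfile(fileext = ".rds")
saveRDS(tweets_demo, rds_path)

# Read it back; the restored object should match the original
reloaded <- readRDS(rds_path)
print(identical(tweets_demo, reloaded))
print(length(reloaded))
```

Saving the retrieved tweets once and reloading them later avoids repeated calls to the Twitter API and keeps an analysis reproducible when the timeline changes.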

    (n.tweet <- length(tweets))
    ## [1] 448

    # convert tweets to a data frame
    tweets.df <- twListToDF(tweets)
    # tweet #190
    tweets.df[190, c("id", "created", "screenName", "replyToSN",
                     "favoriteCount", "retweetCount", "longitude", "latitude", "text")]
    ##                     id             created  screenName re.
    ## 190 362866933894352898 2013-08-01 09:26:33 RDataMining   .
    ##     favoriteCount retweetCount longitude latitude
    ## 190             9            9        NA       NA
    ##     .
    ## 190 The R Reference Card for Data Mining now provides lin.

    # print tweet #190 and make text fit for slide width
    writeLines(strwrap(tweets.df$text[190], 60))
    ## The R Reference Card for Data Mining now provides links to
    ## packages on CRAN. Packages for MapReduce and Hadoop added.
    ## http://t.co/RrFypol8kw

Text Cleaning

    library(tm)
    # build a corpus, and specify the source to be character vectors
    myCorpus <- Corpus(VectorSource(tweets.df$text))
    # convert to lower case
    myCorpus <- tm_map(myCorpus, content_transformer(tolower))
    # remove URLs
    removeURL <- function(x) gsub("http[^[:space:]]*", "", x)
    myCorpus <- tm_map(myCorpus, content_transformer(removeURL))
    # remove anything other than English letters or space
    removeNumPunct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)
    myCorpus <- tm_map(myCorpus, content_transformer(removeNumPunct))
    # remove stopwords
    myStopwords <- c(setdiff(stopwords('english'), c("r", "big")),
                     "use", "see", "used", "via", "amp")
    myCorpus <- tm_map(myCorpus, removeWords, myStopwords)
    # remove extra whitespace
    myCorpus <- tm_map(myCorpus, stripWhitespace)
    # keep a copy for stem completion later
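The regular-expression steps above can be tried on plain character vectors before they are applied to a corpus. The sketch below uses base R only; the two sample tweets are invented, and the patterns are written with negated character classes (`[^...]`), which is assumed to be the intended form of the slide's expressions.

```r
# Made-up sample tweets (not taken from the @RDataMining timeline)
tweets_text <- c(
  "Text Mining with R: see http://example.com/tm #rstats",
  "Big data & R, 2 new packages via @someone"
)

# Drop "http" followed by any run of non-space characters
remove_url <- function(x) gsub("http[^[:space:]]*", "", x)

# Drop everything that is neither a letter nor whitespace
remove_num_punct <- function(x) gsub("[^[:alpha:][:space:]]*", "", x)

cleaned <- tolower(tweets_text)                 # lower case
cleaned <- remove_url(cleaned)                  # remove URLs first
cleaned <- remove_num_punct(cleaned)            # then digits and punctuation
cleaned <- gsub("[[:space:]]+", " ", cleaned)   # collapse repeated whitespace
cleaned <- trimws(cleaned)                      # trim leading/trailing space

print(cleaned)
```

The order matters: URLs are removed before punctuation, because stripping ":" and "/" first would leave fragments such as "httpexamplecomtm" that the URL pattern no longer recognises as a single link.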
