Getting Started With Text Miner - SAS Support Communities

Transcription

Ask the Expert
SAS Text Miner: Getting Started
Presenter: Twanda Baker, Senior Associate Systems Engineer, SAS Customer Loyalty Team
Q&A: Melodie Rush, Senior Analytical Engineer, SAS Customer Loyalty Team
Copyright SAS Institute Inc. All rights reserved.

Goals
- Increase awareness of and comfort with capabilities in SAS Text Miner
- Share resources for learning more

Today’s Agenda
This presentation will demonstrate the basic steps for getting started using SAS Text Miner, such as how to:
- Parse text data
- Filter text data
- Analyze text data, including topic discovery and cluster analysis
- Use text mining results as input to predictive modeling

SAS Text Miner
Prerequisites for this session
This example assumes basic knowledge of using the interface in SAS Enterprise Miner and how to create Data Sources and Diagrams.
For an introduction or review, visit these resources:
- Six-part video tutorial on YouTube
- “Getting Started with SAS Enterprise Miner” in the Ask the Expert series
- Review the free downloadable step-by-step tutorial.

Is there valuable information “locked away” in your unstructured data?

Unstructured Text: Where is it?
- Call Center Notes
- Associate Comments
- Survey Feedback
- Research & Publications
- Claims & Case Notes
- Live Chat
- Factory/Tech’n Notes
- HR data
- Medical/Health Records
- Contracts & Applications
- Online Forums
- Blogs
- Consumer Reviews
- Online News
- Social Networks

What if you could...
Customers
- Discover new insights from large text data sources: discover current topics about your products from customer opinions
- Extract key patterns from text data to predict the future: find patterns within customer feedback that predict good interest in upsell opportunities
Fraud
- Discover new insights from large text data sources: detect anomalies from usual topics described in text reports, text applications or feedback
- Extract key patterns from text data to predict the future: find patterns in reports that may seem to predict or relate to suspicious behavior
Public Opinion
- Discover new insights from large text data sources: understand previously unknown issues and concerns from citizen discussions on Twitter and forums
- Extract key patterns from text data to predict the future: extract key opinions from citizen feedback to forecast citizen sentiments in the near future

Where is Text Mining used?
Text Mining has numerous applications in any industry
Government
- Detect fraudulent activity.
- Spot emerging trends and public concerns.
Finance
- Retention of current customer base using call center transcriptions or transcribed audio.
- Identification of potentially fraudulent activities.
Insurance
- Identify fraudulent claims.
- Track competitive intelligence.
Brand e Sciences
- Identify the most profitable customers and the underlying reasons for their loyalty.
- Brand management
- Reduce time to detect root cause of product issues.
- Identify trends in market segments.
- Help prevent churn and suggest up-sell/cross-sell opportunities for individual customers.
- Identify adverse events.
- Recommend appropriate research materials.

SAS Text Analytics
TEXT MINING
Import and explore textual data to uncover valuable patterns & themes, and incorporate text into predictive models
CONTEXTUAL ANALYSIS
- Information Retrieval
- Automatic Topic Detection
- Content Categorization
- Entity/Fact Extraction
- Sentiment Analysis
INTEGRATED ANALYTICS
Integrate structured and unstructured data for enhanced:
- Forecasting
- Optimization
- Predictive Modeling
- Network Analysis
Copyright 2012, SAS Institute Inc. All rights reserved.

SAS Text Miner
- SAS Text Miner is an add-on product to SAS Enterprise Miner.
- This adds the capabilities of analyzing unstructured data to the broad set of techniques in Enterprise Miner.
- The interface is Enterprise Miner – when Text Miner is licensed, there is an additional tab of tools.

How does Text Mining work?
Exploring & Discovering Insights
1. Input text messages – e.g. Twitter data, reports, email, news, forum messages
2. Parse & explore Text Data – break down text and explore relationships of key concepts such as persons, places, organizations
3. Discover Topics – cluster documents of similar content and describe them with important key words

How does Text Mining work?
Discover patterns for predictive modeling
1. Input text messages with relevant structured data – e.g. email, call center notes, applications
2. Parse Text Data and Discover Topics – break down text into structured data, group messages of similar content
3. Predictive Modeling with text data – text data input into models may provide reliable info to predict outcome & behavior
Diagram labels: Customer data; Predict activity that is likely fraudulent

What can we discover?
- Discover relationships between concepts described in large corpus of text data – how are persons, places, organizations related?
- Discover topics mentioned in text data – what are main topics mentioned? What are the rare topics?
- Discover patterns related to structured data – e.g. how is feedback related to customer purchase behavior?

SAS Text Miner

SAS TEXT MINER – ANALYTICAL WORKFLOW
[Figure: workflow diagram; labels: Raw Data, Text Mining, Model with Structured and Unstructured Data]

Text Mining Process Flows
Start with a table that contains either:
- Documents saved as a variable (column)
- A column that points to physical text files

Example Input Data
- Variable contains full text
- Variable contains pointer to text file
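The two input layouts above (a column holding the full text, or a column pointing to a text file) can be illustrated with a small Python sketch. This is not SAS code and not how Text Miner reads data; the column names `text` and `path`, the helper `document_text`, and the sample rows are all hypothetical.

```python
from pathlib import Path
import tempfile


def document_text(row, text_col="text", path_col="path"):
    """Return the document text for a row that stores either the
    full text or a pointer (file path) to a physical text file."""
    if row.get(text_col):
        return row[text_col]
    # Otherwise follow the pointer and read the file it names.
    return Path(row[path_col]).read_text(encoding="utf-8")


# Demo: one row with inline text, one row with a pointer to a file.
with tempfile.TemporaryDirectory() as tmp:
    note = Path(tmp) / "note1.txt"
    note.write_text("Patient reported fever after vaccination.", encoding="utf-8")
    rows = [
        {"id": 1, "text": "Customer asked about an upgrade."},
        {"id": 2, "path": str(note)},
    ]
    texts = [document_text(r) for r in rows]

print(texts)
```

Either layout ends up as one text value per document, which is what the later parsing step works on.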

Importing Text
If your data isn’t already in a SAS Table
- The Text Import node enables you to create data sets dynamically from files contained in a directory or from the Web.
- The Text Import node takes an import directory containing text files in potentially proprietary formats such as MS Word and PDF files as input.
- The tool traverses this directory and filters or extracts the text from the files, places a copy of the text in a plain text file, and a snippet (or possibly even all) of the text in a SAS data set.

Text Mining Process Flows
Apply natural language processing algorithms to parse the documents and quantify information about the terms in the corpus.
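The directory traversal described for the Text Import node can be sketched roughly in Python. This is only an analogy under stated assumptions: it handles plain `.txt` files only (the real node also extracts text from formats such as MS Word and PDF), and the function name `import_text`, the row fields, and the 40-character snippet length are hypothetical.

```python
from pathlib import Path
import tempfile


def import_text(import_dir, snippet_chars=40):
    """Walk a directory tree and build one row per text file,
    keeping a pointer to the file plus a leading snippet of its text."""
    rows = []
    for path in sorted(Path(import_dir).rglob("*.txt")):
        text = path.read_text(encoding="utf-8")
        rows.append({
            "name": path.name,
            "uri": str(path),                  # pointer to the full text
            "snippet": text[:snippet_chars],   # possibly all of the text
            "truncated": len(text) > snippet_chars,
        })
    return rows


# Demo on a temporary directory with one nested file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.txt").write_text("Short note.", encoding="utf-8")
    (root / "sub").mkdir()
    (root / "sub" / "b.txt").write_text("x" * 100, encoding="utf-8")
    table = import_text(root)

for row in table:
    print(row["name"], row["truncated"], len(row["snippet"]))
```

The resulting rows mirror the "pointer to text file" layout shown earlier: the table stays small, and the full text remains on disk.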

Text Parsing Node
- Tokenization - break sentences or documents into terms
- Part of speech identification (noun, verb, etc.)
- Stemming - identify the root form of a word (run, runs, running, ran, etc.)
- Synonyms
- Remove low-information words such as a, an, and the (stop list)
- Identify Standard and Custom Entities (names, places, etc.)
- Multiword terms or phrases (“blue screen of death”)
- Import custom entities, facts, and events as defined in SAS Enterprise Content Categorization (ECC)
- Include negation entities from SAS ECC for Sentiment Analysis

Text Mining Process Flows
Perform spell-checking and refine synonym lists. Discover related concepts using Concept Linking. Perform full text search. Subset documents and/or terms for further analysis.
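A few of the parsing steps listed above (tokenization, stop list, stemming, multiword terms) can be sketched in Python to show what "quantifying information about the terms" looks like. This is a toy stand-in, not the Text Parsing node's algorithm: the stop list, the stem lookup table, and the multiword list are tiny hypothetical examples, and part-of-speech tagging and entity detection are left out.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny resources for illustration only.
STOP_LIST = {"a", "an", "the", "of", "and", "to"}
STEMS = {"runs": "run", "running": "run", "ran": "run"}
MULTIWORD = ["blue screen of death"]


def parse(document):
    """Return term counts for one document."""
    text = document.lower()
    # Protect multiword terms so they survive tokenization as one term.
    for phrase in MULTIWORD:
        text = text.replace(phrase, phrase.replace(" ", "_"))
    # Tokenization: split the text into alphabetic terms.
    tokens = re.findall(r"[a-z_]+", text)
    # Drop stop-list words, then map each term to its root form.
    terms = [STEMS.get(t, t) for t in tokens if t not in STOP_LIST]
    return Counter(terms)


counts = parse("The laptop ran fine, then runs into a blue screen of death.")
print(counts)
```

Here "ran" and "runs" are counted together under "run", and the phrase is kept as the single term `blue_screen_of_death`; a real stemmer would rely on linguistic rules or dictionaries rather than a lookup table.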

Text Mining Process Flows
Analyze the documents to create topics and assign each document to one or more topics. In addition to derived topics, users can add their own topic definitions.

Text Mining Process Flows
Analyze the documents to create clusters and assign each document to a single cluster.
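The difference between topic assignment (one or more topics per document) and cluster assignment (exactly one cluster per document) can be shown with a toy Python sketch. It does not reproduce how SAS Text Miner derives topics or clusters; the term lists in `TOPIC_TERMS`, the scoring by simple term overlap, and the cutoff are hypothetical and play the role of user-defined topic definitions.

```python
# Hypothetical user-defined topics: each topic is a set of terms.
TOPIC_TERMS = {
    "fever": {"fever", "temperature", "chills"},
    "injection_site": {"swelling", "redness", "arm"},
}


def topic_scores(terms):
    """Score a document against each topic by counting matching terms."""
    return {
        topic: sum(1 for t in terms if t in vocab)
        for topic, vocab in TOPIC_TERMS.items()
    }


def assign_topics(terms, cutoff=1):
    """Topics: a document belongs to every topic that reaches the cutoff."""
    scores = topic_scores(terms)
    return sorted(topic for topic, s in scores.items() if s >= cutoff)


def assign_cluster(terms):
    """Clusters: a document belongs to the single best-scoring group.
    Ties are broken alphabetically by group name."""
    scores = topic_scores(terms)
    return max(sorted(scores), key=scores.get)


doc = ["fever", "chills", "arm", "swelling", "redness"]
print(assign_topics(doc))
print(assign_cluster(doc))
```

The same document lands in both topics but in only one cluster, which is the distinction the two slides draw.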

Text Mining Process Flows
Clusters can be further explored using the Segment Profile node to identify factors that differentiate data segments from the population.

Text Mining Process Flows: Prediction
Several methods are available to use the unstructured data to create predictions.

SAS Text Miner
Example: Vaccine Adverse Events (VAERS)
- Publicly available dataset from the U.S. Department of Health and Human Services (HHS)
- 25,000 patient records, including physician/patient comments
- Not representative of all recipients of vaccines
- Predict hospitalizations
- This example is covered in detail in “Getting Started with SAS Text edoc/txtminer/index.html

SAS Text Miner
VAERS Example
- Symptom Text – text variable containing the reported symptoms
- Serious – binary flag that is used as a target to predict serious side effects that resulted in hospitalizations, disability or life-threatening event
- 16 vaccine flag variables for 16 most common vaccines
- Age and gender of patient
- Nominal variables to indicate by whom vaccine was administered and how it was funded
- Pediatric flag
- Vaccine count – number of vaccines administered in visit

Summary

SAS TEXT MINER – ANALYTICAL WORKFLOW
[Figure: workflow diagram; labels: Raw Data, Text Mining, Model with Structured and Unstructured Data]

Review
Text Mining Discovery

Review
Text Mining Prediction

Learning More

Potential Next Steps
- Work through the example in “Getting Started with SAS Text Miner” - Both the data and the documentation are available oc/txtminer/index.html
- Contact SAS Technical Support if you get stuck. There is no charge for this – it is included in your SAS software license. http://support.sas.com/techsup/

SAS Text c/txtminer/index.html
Click tabs for other versions

Links
- SAS Text Miner Documentation and Tutorial
- SAS Technical Support http://support.sas.com/techsup/
- SAS Text Miner Technical Forum (Join Today!) ity/support-communities/sas data mining and text mining
- Support Website http://support.sas.com/
- SAS Text Miner Training https://support.sas.com/edu/prodcourses.html?code TM&ctry US
- SAS Customer Loyalty http://support.sas.com/contact/customerloyalty/

Learn SAS with SAS Education
SAS Education will support you in continual learning to grow your career.
- SAS Training Courses: support.sas.com/training
- Get SAS Certified: support.sas.com/certify
- SAS Books: support.sas.com/books
Contact SAS Training Customer Service: (800) 727-0025 or training@sas.com

Questions?
Thank you for your time and attention!
sas.com
