Deciphering Your Customer's Voice With Speech Analytics


Whitepaper

Table of Contents

1. Introduction
2. What is speech analytics?
   2.1 Challenges analyzing speech data
   2.2 Algorithms for transforming speech to structured data
   2.3 Key components of a speech recognition system for analytics
   2.4 The technology comparison
3. Organizational use cases for speech analytics
4. Solution landscape: Vendors, products and their market share
5. Recommended architecture for speech analytics
6. Case study: Predicting NPS in health insurance
7. Conclusion

1. Introduction

For most retail businesses, customer interaction via call centers is a very significant communication channel. Organizations typically receive thousands of customer calls every day. According to an industry report, over 56 million hours of conversations (nearly 420 billion words) are spoken each day in call centers worldwide.

If the audio data thus collected can be aggregated and analyzed, it can yield quality insights into customer expectations, preferences, service issues and product usage. While speech analytics is not a new technology to the market, many business executives are still skeptical about the value it can add. This whitepaper aims to illustrate the basic technologies used in speech analytics, their use cases, and how the return on investment from speech analytics software can be maximized.

2. What is speech analytics?

Speech analytics is a powerful tool for analyzing recorded calls, structuring customer interactions and gaining insight into hidden information. It can be used for audio mining, speech categorization, intelligence extraction, decision making and monitoring agent performance. If applied correctly and used effectively, speech analytics can help improve service quality, reduce operating expenses, boost revenue, and reduce customer attrition. If integrated well with the overall strategy, it can help businesses drive product and process innovation, leading to significant market differentiation. However, there are significant challenges in transforming speech data into a structured form that can be subjected to further analysis.

2.1 Challenges analyzing speech data

- Speakers differ in speaking style and speed, gender, age, dialect and physical attributes (such as the vocal tract). Any speech recognition system has to take all these features into consideration. For example, "service provider" may be recognized as "serve his provide her".
- In addition to speech, humans also communicate via facial expressions, emotions, postures and eye movements, and these are missed by an automatic speech recognition (ASR) system.
- While interacting in a real-time environment, humans encounter a lot of unwanted sound, called noise, and this needs to be filtered from the speech signals.
- Homophones (words that are pronounced the same but differ in meaning, e.g. "two" and "too") and word boundary ambiguities pose a major problem for speech recognition systems.
- The acoustic waves change with the properties of the channels used.
- The continuity of speech leads to problems in identifying word boundaries.
- Grammatically, spoken language is very different from written language.

2.2 Algorithms for transforming speech to structured data

To fully leverage the information encapsulated in customer calls, you need to transform the interaction data in the audio files into more structured data that can be queried by analysts and consumed by sophisticated machine learning algorithms.

The different speech analytics approaches, such as phonetic indexing, speech-to-text (LVCSR) and direct phrase extraction, achieve this by deploying a speech recognition system which comprises some or all of the following components: an acoustic model, grammars, a language model and recognition algorithms. These components are standard in any speech analytics software today, but understanding them is important, because the quality of speech analytics depends on how these components are configured and what algorithms they use.

Figure 1 gives the complete high-level flow of the process for converting unstructured voice data into structured, more useful information: audio and text interactions (for example, "Transfer me to the head office") are passed through phonetic indexing, speech-to-text (LVCSR) or direct phrase extraction to produce structured output, which speech analytics algorithms such as root cause analysis, emotion detection, talk analysis and script adherence turn into valuable insights for continuous improvement.

Figure 1: Flow for transformation of unstructured to structured data in speech analytics
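As a concrete illustration of this flow, the minimal Python sketch below models the transformation from a raw call recording to a structured record that downstream analytics can query. All function and field names (transcribe, StructuredCall, extract_phrases and so on) are hypothetical placeholders, not part of any vendor's API; a real deployment would plug an actual ASR engine and analytics models into these points.

# Illustrative sketch only: turning an unstructured call recording into a
# structured record, mirroring the flow in Figure 1. All names are hypothetical.
from dataclasses import dataclass, field
from typing import List

@dataclass
class StructuredCall:
    call_id: str
    transcript: str                                      # output of speech-to-text (LVCSR)
    phrases: List[str] = field(default_factory=list)     # direct phrase extraction hits
    emotion: str = "neutral"                             # emotion detection result
    root_cause: str = "unknown"                          # root cause analysis label

def transcribe(audio_path: str) -> str:
    """Placeholder for an ASR engine (phonetic indexing or LVCSR)."""
    return "transfer me to the head office"

def extract_phrases(transcript: str, phrase_list: List[str]) -> List[str]:
    """Direct phrase extraction: keep the user-defined phrases found in the call."""
    return [p for p in phrase_list if p in transcript]

def structure_call(call_id: str, audio_path: str) -> StructuredCall:
    transcript = transcribe(audio_path)
    phrases = extract_phrases(transcript, ["head office", "cancel my policy"])
    return StructuredCall(call_id=call_id, transcript=transcript, phrases=phrases)

# The structured record can now be queried by analysts or fed to ML models.
record = structure_call("call-001", "calls/call-001.wav")
print(record.phrases)   # ['head office']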

2.3 Key components of a speech recognition system for analytics

The different components of a speech recognition system are shown in Figure 2.

Acoustic model: an acoustic model represents the relationship between an audio signal and the phonemes or other linguistic units that make up speech. It contains statistical representations of each of the distinct sounds that make up a word. An acoustic model is created using a speech corpus and training algorithms that build these statistical representations, called Hidden Markov Models (HMMs), for each phoneme in a language. Each phoneme has its own HMM, and the HMM is the most common type of acoustic model.

Figure 2: Different components of a speech recognition system
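To make the idea of a per-phoneme HMM concrete, the sketch below trains one small Gaussian HMM per phoneme on acoustic feature vectors (for example MFCCs) and scores a new segment against each model. It assumes the third-party hmmlearn library and pre-extracted features; it is an illustrative sketch of the modelling idea, not a production acoustic model.

# Illustrative sketch: one small HMM per phoneme, assuming MFCC-like feature
# vectors are already available as NumPy arrays of shape (n_frames, n_features).
import numpy as np
from hmmlearn import hmm

def train_phoneme_models(training_data):
    """training_data maps a phoneme label to a list of feature sequences."""
    models = {}
    for phoneme, sequences in training_data.items():
        X = np.vstack(sequences)                     # stack all frames for this phoneme
        lengths = [len(seq) for seq in sequences]    # per-sequence frame counts
        model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=20)
        model.fit(X, lengths)
        models[phoneme] = model
    return models

def classify_segment(models, segment):
    """Return the phoneme whose HMM gives the segment the highest log-likelihood."""
    return max(models, key=lambda p: models[p].score(segment))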

Language-specific acoustic models are used directly in the phonetic indexing approach to speech analytics. The basic recognition unit for this approach is the phoneme, and it is a dictionary-independent approach. It allows users to query phonetic strings and perform orthographic search using a pronunciation model.

Language model: a statistical language model (SLM) consists of a list of words with their probability of occurrence. It is used to restrict the search by limiting the number of possible words that need to be considered at any one point in the search. This results in faster execution and higher accuracy. Tri-grams are the most commonly used language models in ASR. The probabilities of n-grams help determine which n-gram is more probable than other, similar n-grams. For example, P(I saw a van) >> P(eyes awe of an), which lets the correct phrase be recognized. This is calculated using a language model similar to the one sketched below.
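The following minimal sketch shows how a trigram language model of the kind described above could be estimated from a text corpus and used to compare candidate word sequences such as "I saw a van" and "eyes awe of an". The tiny corpus, the add-one smoothing and the helper names are illustrative assumptions, not taken from any particular speech analytics product.

# Illustrative trigram language model with add-one smoothing.
# The tiny corpus and all helper names are assumptions for illustration only.
from collections import Counter
from math import log

def train_trigram_lm(sentences):
    trigrams, bigrams, vocab = Counter(), Counter(), set()
    for sent in sentences:
        words = ["<s>", "<s>"] + sent.lower().split() + ["</s>"]
        vocab.update(words)
        for i in range(2, len(words)):
            trigrams[(words[i - 2], words[i - 1], words[i])] += 1
            bigrams[(words[i - 2], words[i - 1])] += 1
    return trigrams, bigrams, len(vocab)

def log_probability(sentence, trigrams, bigrams, vocab_size):
    """Sum of log P(w_i | w_{i-2}, w_{i-1}) with add-one smoothing."""
    words = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
    logp = 0.0
    for i in range(2, len(words)):
        tri = trigrams[(words[i - 2], words[i - 1], words[i])]
        bi = bigrams[(words[i - 2], words[i - 1])]
        logp += log((tri + 1) / (bi + vocab_size))
    return logp

corpus = ["I saw a van", "I saw a dog", "she saw a van yesterday"]
tri, bi, v = train_trigram_lm(corpus)
# The grammatical phrase scores higher than the acoustically similar alternative.
print(log_probability("I saw a van", tri, bi, v) >
      log_probability("eyes awe of an", tri, bi, v))   # True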

The large-vocabulary continuous speech recognition (LVCSR, also known as speech-to-text or STT) speech analytics approach uses both language-specific acoustic models and language models. The basic unit for this approach is a set of words, generally bi-grams or tri-grams. In order to map words to phonetic forms it also uses a pronunciation model, or dictionary; it is therefore a dictionary-dependent approach.

Recognition algorithms: these perform speech recognition based on a written grammar that describes the possible patterns of words. The recognition grammar is generally given in two files, the 'grammar' file and the 'vocabulary' file. The 'grammar' file defines category-level syntax, whereas the 'vocabulary' file defines the word candidates in each category, together with their pronunciation information.

For illustration, consider the sentence "I'll take one house please". The 'house.grammar' file in BNF would be as shown below. 'S' indicates the sentence start symbol, and the rewrite rules are defined using the ':' symbol.

S: NS_B SENT NS_E
SENT: TAKE_V HOUSE PLEASE
SENT: TAKE_V HOUSE
SENT: HOUSE PLEASE
SENT: HOUSE
HOUSE: NUM HOUSE_N
HOUSE: HOUSE_N

The 'vocabulary' file contains the definition of each word used in the 'grammar' file. A partial 'house.voca' file for the 'house.grammar' file is sketched below.
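Assuming a Julius-style vocabulary format, in which each '%' line names a grammar category and each entry lists a word with an indicative phone sequence, the entries might look like the following (the pronunciations shown are illustrative assumptions rather than a definitive listing):

% NS_B
<s>        silB
% NS_E
</s>       silE
% TAKE_V
take       t ey k
% NUM
one        w ah n
% HOUSE_N
house      hh aw s
% PLEASE
please     p l iy z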

2.4 The technology comparison

There has been a long-standing debate on the merits of these approaches. Several factors are considered when choosing the technology. There is no single best approach; rather, the approach should be chosen based on requirements such as those discussed below.

Targeted listening or calls of interest: if you aim to listen to calls containing specific keywords, then a dictionary-dependent (LVCSR) system may be preferable. Dictionary-dependent approaches can recognize words that are already in the lexicon.

Out-of-vocabulary (OOV) words: if you are likely to encounter new words in your search domain, then a dictionary-independent approach should be preferred. OOV handling is a major issue for dictionary-dependent approaches.

Audio processing: with dictionary-dependent approaches, the audio needs to be reprocessed whenever new words are added to the dictionary. This proves to be a time-consuming and costly process, as experts are also needed to enter words into the dictionary. In dictionary-independent approaches, the audio is processed just once.

Speech transcripts: the dictionary-independent (phonetic) approaches, due to the absence of a language model, cannot be used to generate a meaningful orthographic transcript of speech.

Precision vs. recall: a higher precision rate may be obtained with a dictionary-dependent system on words that are already in the dictionary, but the recall rate suffers because there are always some missing words, which also results in higher error rates. Phonetic approaches have lower precision but higher recall (illustrated below).

Table 1: Comparison of the approaches
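To make the precision and recall trade-off concrete, the short sketch below computes both rates for a keyword-search scenario, comparing the calls a system flagged for a keyword against the calls that truly contain it. The call identifiers and hit lists are made-up example data.

# Illustrative precision/recall computation for keyword search over calls.
def precision_recall(retrieved, relevant):
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

# Calls flagged by the engine for the keyword "cancellation" vs. calls that
# actually mention it (e.g. from a manually reviewed sample).
flagged = ["call-01", "call-02", "call-05"]
actual = ["call-01", "call-02", "call-03", "call-04"]
print(precision_recall(flagged, actual))   # (0.666..., 0.5)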

3. Organizational use cases for speech analytics

Some of the use cases of speech analytics in a call center are:

Tone and sentiment analysis: speech analytics systems can analyze the tone and detect the sentiment of a voice. Tone can also signify age, which can be used to determine the effectiveness of a marketing campaign on a specific age segment.

Talk and silence pattern analysis: talk and silence patterns can be analyzed to measure emotions and levels of satisfaction. A set of user-defined phrases can be used to detect agent actions (see the sketch after this list), which helps in identifying and prioritizing what needs immediate attention.

Agents' performance monitoring: monitoring agents' interactions with customers can easily detect proactive agents, as well as agents who are not successful in satisfying customers. This analysis may be used to train agents and improve their performance.

Call segmentation: certain calls may be difficult for agents to handle. Such calls may be identified, segmented and handled using specific business processes.

Decision making: the insights gained from this information may help in making decisions and implementing new policies for product and service improvement.
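The sketch below illustrates the user-defined-phrase idea from the talk and silence pattern analysis use case: it scans call transcripts for a configurable set of agent-action phrases and flags calls where expected actions are missing. The phrase list and transcripts are invented examples, not from any specific deployment.

# Illustrative sketch: flag calls where expected agent-action phrases are missing.
EXPECTED_AGENT_PHRASES = [
    "thank you for calling",
    "is there anything else",
    "i have updated your account",
]

def missing_phrases(transcript, phrases=EXPECTED_AGENT_PHRASES):
    text = transcript.lower()
    return [p for p in phrases if p not in text]

calls = {
    "call-101": "Thank you for calling. I have updated your account. Is there anything else?",
    "call-102": "Your account is active. Goodbye.",
}
for call_id, transcript in calls.items():
    gaps = missing_phrases(transcript)
    if gaps:
        print(call_id, "needs attention; missing:", gaps)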

4. Solution landscape: Vendors, products and their market share

According to DMG Consulting and ContactBabel [1], only 24% of organizations are currently using a speech analytics solution. However, interest in speech analytics is growing, and the market will continue to expand over the next several years. DMG Consulting estimates this growth at 18% in 2015, and 16% in both 2016 and 2017.

The market shares of some of the leading speech analytics vendors are summarized in Table 2. Some other speech analytics vendors, with their market share (%), are CallMiner (10.7%), Avaya (1.7%), Mattersight (1.7%), Calabrio (0.6%), and Interactive Intelligence (0.5%). The overall market share of the top ten vendors is shown in Figure 3.

Figure 3: Market share (%) of the top ten speech analytics vendors: NICE, Verint, HP/Autonomy, CallMiner, Nexidia, Genesys (UTOPY), Avaya (Aurix), Mattersight, Uptivity, Calabrio and others.

[1] Speech Analytics Product and Market Report Reprint, DMG Consulting LLC (as of May 2014)

5. Recommended architecture for speech analytics

We propose the two-phase architecture shown in Figure 4, which allows data discovery and predictive analytics on voice call data. This architecture aims at classifying the contexts and issues constantly talked about on calls, and at predicting customer behavior.

The R Analytics Engine (RAE) takes phonetics or text as input. The engine analyzes the input and models it based on key factors, which helps in deriving customer-segment-based strategic insights and in predicting KPIs and behavior. Since the insights generated are contextual, they can help product managers, service units and other units derive strategic inputs.

Figure 4: Recommended architecture for speech analytics
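As a rough illustration of the two phases, the sketch below first clusters call transcripts to discover recurring topics (data discovery) and then trains a simple classifier to predict a per-call outcome such as churn risk (predictive analytics). It uses scikit-learn as a stand-in; the actual RAE implementation is not described here, so the component choices and sample data are assumptions.

# Illustrative two-phase sketch: topic discovery, then KPI/behaviour prediction.
# scikit-learn is used as a stand-in for the engine's internal components.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

transcripts = [
    "i want to cancel my policy the premium is too high",
    "please update my address on the account",
    "the premium increase was not explained to me",
    "how do i add my spouse to the plan",
]
churn_labels = [1, 0, 1, 0]   # made-up outcome labels (e.g. detractor / churn risk)

# Phase 1: data discovery - cluster calls into recurring contexts/issues.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), stop_words="english")
X = vectorizer.fit_transform(transcripts)
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Phase 2: predictive analytics - predict a KPI/behaviour from the same features.
model = LogisticRegression().fit(X, churn_labels)
print(topics, model.predict(X))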

6. Case study: Predicting NPS in health insurance

Net Promoter Score (NPS) is a metric frequently used to measure an organization's performance from the customers' perspective, and to measure the loyalty existing between two entities. The entities could be a company and its customers, or an employer and its employees. NPS is calculated from the score a customer gives, on a scale of 0-10, to the question "How likely is it that you would recommend our company to someone?" This scale segregates customers into three categories based on the scores provided: Promoters (9-10), Detractors (0-6) and Passives (7-8). NPS is computed as the difference between the percentage of Promoters and the percentage of Detractors, on the basis of sample surveys conducted periodically by organizations (a short computational sketch is given at the end of this section).

Predicting NPS on the basis of customer call and speech data is one of the most compelling use cases of speech analytics. Rather than waiting for the next survey, speech analytics can be used to predict probable promoters and detractors, which helps organizations estimate their overall NPS on a near real-time basis. If the predicted NPS declines, strategic changes can be made to the business and its services to maintain the desired results, without waiting for the next survey.

Objective: the objective of this case study was to predict promoters, detractors and passives from the call data available from a large US health insurance provider. The available data included the customers' behavioral data from CRM, call characteristics (hold times, FCRs, recommended CSA, etc.) and text transcripts generated using the speech analytics engine.

Challenges: the text transcripts as provided contained a lot of noise. The data had to be preprocessed and cleaned by our R Analytics Engine to remove the noise and convert it into a more structured form amenable to analysis and modelling. We typically found that 2-grams and 3-grams had higher information gain ratios than uni-grams, and were therefore better predictors of the class categories.

Results: the impact of call category, call duration and other related factors on the NPS score was analyzed. It was found that FCRs and hold time had a significant impact on the NPS score. Adding 2-grams and 3-grams extracted from the text transcripts using speech analytics improved the NPS prediction significantly, by 8.6%. We were also able to predict NPS performance at the agent, call center and campaign level. The model results were used to drive a specific training agenda for call center agents.
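The NPS arithmetic described above is straightforward; the short sketch below bins survey (or predicted) scores into promoters, passives and detractors and computes the score, using made-up sample data.

# NPS = % promoters - % detractors, with promoters 9-10, passives 7-8, detractors 0-6.
def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

# Made-up example: 4 promoters, 3 passives, 3 detractors out of 10 responses.
sample_scores = [10, 9, 9, 10, 8, 7, 7, 5, 3, 6]
print(nps(sample_scores))   # 10.0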

7. Conclusion

The information shared and exchanged through spoken interaction data largely remains untapped. Speech analytics provides a solution by collecting this data and delivering insight into these interactions. It can prove to be a revolutionary approach to measuring customers' emotions, context and intent. Customers generally do not bother telling people about a great customer service experience, but the same customers make sure that almost everyone knows about a bad one. Business success therefore depends heavily on the customer experience, and enhancing this experience is critical for the success of any business. The R Analytics Engine (RAE) can help accelerate data discovery and the mining of speech data.

About the authors

Renu Balyan is a Lead Analyst (Data Science) with R Systems International Ltd. Her areas of interest include data analytics, machine learning, machine translation and information extraction. She has published 14 papers in national and international conferences and journals. She is currently pursuing her PhD at IIT Delhi. She has worked as an intern with Dublin City University, Ireland, and as a research fellow and project engineer with the Centre for Development of Advanced Computing, Noida, where she worked on various natural language processing (NLP) projects for nearly 6 years.

Praveen Pathak is the Analytics Practice Head for R Systems International Ltd. With over 14 years of consulting, in-house and offshore analytics delivery experience, Praveen focuses on providing R Systems' clients with best-in-class analytics solutions and services. He has extensive hands-on as well as leadership experience in analytics, information management, predictive modelling, optimization and big data technologies, drawing data-driven insights to help address business challenges. His interests include artificial intelligence, neural networks and high-performance computing.

Computaris
Computaris, an R Systems business, provides specialist BSS technical consultancy, software development and system integration services for the telecommunications industry in Europe, North America and South East Asia. The company offers the highest level of expertise in real-time rating and charging, messaging, provisioning, mediation, subscriber data management, mobile broadband data policy management, and loyalty and churn management. For more information, please visit www.computaris.com.

R Systems
R Systems is a leading OPD and IT services company which caters to Fortune 1000, government and mid-sized organizations worldwide. The company is hailed as an industry leader with some of the world's highest quality standards, including SEI CMMI Level 5, PCMM Level 5, ISO 9001:2008 and ISO 27001:2005 certifications. With a rich legacy spread over two decades, the company generates value that helps organizations transcend to higher levels of efficiency and growth. Quite like the oyster delivering the pearl. For more information, visit www.rsystems.com

Got any questions? Contact us:
Tel: +44 20 7193 9189
Email: marketing@computaris.com

Thank you!
+44 20 7193 9189
www.computaris.com
