Deciphering Voice Of Customer Through Speech Analytics


Discover Customer Insights and Improve CX

THOUGHT LEADERSHIP WHITE PAPER

90% of all customer conversations are happening on the phone. That's 56 million hours of customer phone calls every day!
Source: Gartner

Contents

1. Introduction
2. What is Speech Analytics?
   2.1. Challenges Analyzing Speech Data
   2.2. Algorithms for Transforming Speech to Structured Data
   2.3. Key Components of a Speech Recognition System for Analytics
   2.4. The Technology Comparison
3. Organizational Use Cases for Speech Analytics
4. Solution Landscape: Vendors, Products and their Market Share
5. Recommended Architecture for Speech Analytics
6. Case Study: Predicting NPS in Health Insurance
7. Conclusion

1. Introduction

For most retail businesses, customer interaction via call centers is a very significant communication channel. Organizations typically receive thousands of customer calls every day. According to an industry report, over 56 million hours of conversations (nearly 420 billion words) are spoken every day in call centers worldwide. If the audio data thus collected can be aggregated and analyzed, it can yield quality insights into customer expectations, preferences, service issues & product usage. While speech analytics is not a new technology, many business executives are still skeptical about the value it can add.

This whitepaper aims to illustrate the basic technologies used in speech analytics, their use cases, and how the ROI from speech analytics software can be maximized.

2. What is Speech Analytics?

Speech analytics is a powerful tool for analyzing recorded calls, structuring customer interactions and gaining insight into the hidden information they contain. It can be used for audio mining, speech categorization, intelligence extraction, decision making, and monitoring agent performance.

If applied correctly and used effectively, speech analytics can help improve service quality, reduce operating expenses, boost revenue, and reduce customer attrition. If integrated well with the overall strategy, it can help businesses drive product & process innovation leading to significant market differentiation. However, there are significant challenges in transforming speech data into a structured form which can be subjected to further analysis.

2.1. Challenges Analyzing Speech Data

Speakers differ in speaking style and speed, gender, age, dialect, and physical attributes (such as the vocal tract). Any speech recognition system has to take all these features into consideration.
For example, "service provider" may be recognized as "serve his provide her".

Humans, in addition to speech, also communicate via facial expressions, emotions, postures and eye movements, and these are missed by an automatic speech recognition (ASR) system.

While interacting in a real environment, humans encounter a lot of unwanted sound, called noise, which needs to be filtered from the speech signals.

Homophones (words that are pronounced the same but differ in meaning, e.g. "two" & "too") and word boundary ambiguities pose a major problem to speech recognition systems.

The acoustic waves change with the properties of the channels used.

The continuity of speech leads to problems related to the identification of word boundaries.

Grammatically, spoken language is very different from written language.

To fully leverage the information encapsulated in customer calls, you need to transform the interaction data in the audio files into more structured data which can be queried by analysts & consumed by sophisticated machine learning algorithms.

2.2. Algorithms for Transforming Speech to Structured Data

The different speech analytics approaches, such as phonetic indexing, speech-to-text (LVCSR) and direct phrase extraction, achieve this by deploying some or all of the following components: an acoustic model, grammars, a language model and recognition algorithms. These components are standard in any speech analytics software today, but understanding them is important, as the quality of speech analytics depends on how these components are configured and what algorithms they use. Figure 1 gives the complete high-level flow of the process for converting unstructured voice data into structured and more useful information.

[Figure 1: Flow for transformation of unstructured to structured data in speech analytics — unstructured audio and text interactions (e.g. "Transfer me to the head office") pass through the speech analytics algorithms (phonetic indexing, speech-to-text (LVCSR), direct phrase extraction) to produce structured output, which supports root cause analysis, emotion detection, script adherence and talk analysis, yielding valuable insights for continuous improvement.]

2.3. Key Components of a Speech Recognition System for Analytics

The different components of a speech recognition system are shown in Figure 2.

Acoustic model: an acoustic model represents the relationship between an audio signal and the phonemes or other linguistic units that make up speech.
An acoustic model contains statistical representations of each of the distinct sounds that make up a word. It is created using a speech corpus and training algorithms that create the statistical representations, called Hidden Markov Models (HMMs), for each phoneme in a language. Each phoneme has its own HMM; the HMM is the most common type of acoustic model.

72% of companies believe that speech analytics can lead to improved customer experience, 68% regard it as a cost saving mechanism, and 52% of respondents trust that speech analytics deployment can lead to revenue enhancement.
Source: Opus Research Survey
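As a hedged illustration of how an HMM assigns a likelihood to audio, the sketch below runs the standard forward algorithm over a toy two-state HMM. In a real acoustic model each phoneme has its own HMM and the observations are acoustic feature vectors; the states, symbols and probabilities here are invented purely for illustration and are not from this whitepaper.

```python
# Toy discrete-observation HMM scored with the forward algorithm.
# All states, symbols and probabilities below are invented for
# illustration; real acoustic models observe feature vectors.

states = ["s1", "s2"]
start_p = {"s1": 0.8, "s2": 0.2}
trans_p = {"s1": {"s1": 0.6, "s2": 0.4},
           "s2": {"s1": 0.3, "s2": 0.7}}
emit_p = {"s1": {"lo": 0.7, "hi": 0.3},
          "s2": {"lo": 0.2, "hi": 0.8}}

def forward_likelihood(observations):
    """P(observations | HMM) via the forward algorithm."""
    # Initialise with the first observation.
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    # Propagate the forward probabilities through the remaining observations.
    for obs in observations[1:]:
        alpha = {
            s: sum(alpha[prev] * trans_p[prev][s] for prev in states)
               * emit_p[s][obs]
            for s in states
        }
    return sum(alpha.values())

print(forward_likelihood(["lo", "lo", "hi"]))
```

A recognizer compares such likelihoods across the HMMs of competing phonemes to decide which sound was most plausibly spoken.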

[Figure 2: Different components of a Speech Recognition System — the acoustic waveform is decoded using the HMM-based acoustic model, grammars & language models, and a pronunciation dictionary, e.g.:
HOUSAND [HOUSAND] hh aw s ax n d
HOUSDEN [HOUSDEN] hh aw s d ax n
HOUSE [HOUSE] hh aw s
HOUSE'S [HOUSE'S] hh aw s ix z]

Language-specific acoustic models are used directly in the phonetic indexing approach to speech analytics. The basic recognition unit for this approach is the phoneme, and it is a dictionary independent approach. It allows users to query phonetic strings and perform orthographic search using a pronunciation model.

Language model: a statistical language model (SLM) consists of a list of words with their probabilities of occurrence. It is used to restrict the search by limiting the number of possible words that need to be considered at any one point, which results in faster execution and higher accuracy. Tri-grams are the most commonly used LMs in ASR. The probabilities of n-grams help in determining which n-gram is more probable than other, similar n-grams. For example, P(I saw a van) > P(eyes awe of an) for correct recognition of the phrase. This is calculated using a language model similar to the one shown below.

P(I | <s>) = 0.67        P(eyes | <s>) = 0.25
P(saw | I) = 0.63        P(awe | eyes) = 0.33
P(a | saw) = 0.5         P(of | awe) = 0.45
P(</s> | van) = 0.6      P(</s> | an) = 0.15

The large-vocabulary continuous speech recognition (LVCSR, also known as speech-to-text (STT) or word spotting) speech analytics approach uses both language-specific acoustic models and language models. The basic unit for this approach is a set of words, generally bi-grams or tri-grams.
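The bigram comparison above can be made concrete with a short sketch. The probabilities are taken from the example language model in the text; the two bigrams it does not list, P(van | a) and P(an | of), are assumed values added for illustration.

```python
# Bigram probabilities from the whitepaper's example language model.
# P(van|a) and P(an|of) are NOT given in the text; the values below
# are assumed purely for illustration.
bigram_p = {
    ("<s>", "I"): 0.67,   ("<s>", "eyes"): 0.25,
    ("I", "saw"): 0.63,   ("eyes", "awe"): 0.33,
    ("saw", "a"): 0.5,    ("awe", "of"): 0.45,
    ("a", "van"): 0.4,    ("of", "an"): 0.2,      # assumed
    ("van", "</s>"): 0.6, ("an", "</s>"): 0.15,
}

def sentence_prob(words):
    """P(sentence) as the product of its bigram probabilities."""
    tokens = ["<s>"] + words + ["</s>"]
    p = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        p *= bigram_p.get((prev, cur), 0.0)
    return p

p_correct = sentence_prob("I saw a van".split())
p_wrong = sentence_prob("eyes awe of an".split())
assert p_correct > p_wrong  # the decoder prefers the more probable sequence
```

Because P(I saw a van) comes out higher than P(eyes awe of an), a recognizer guided by this language model picks the correct word sequence even though both sound alike.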
To map words to phonetic forms, it also uses a pronunciation model, or dictionary. This is a dictionary dependent approach.

Recognition algorithms: these perform speech recognition based on a written grammar, which describes the possible patterns of words. The recognition grammar is generally given using two files, the 'grammar' file and the 'vocabulary' file. The 'grammar' file defines the category-level syntax, whereas the 'vocabulary' file defines the word candidates in each category, with their pronunciation information. For illustration, consider the sentence "I'll take one house please". The 'house.grammar' file in BNF would be as shown below. 'S' indicates the sentence start symbol, and the rewrite rules are defined using the ':' symbol.

S     : NS_B HMM SENT NS_E
S     : NS_B SENT NS_E
SENT  : TAKE_V HOUSE PLEASE
SENT  : TAKE_V HOUSE
SENT  : HOUSE PLEASE
SENT  : HOUSE
HOUSE : NUM HOUSE_N
HOUSE : HOUSE_N_1
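Such rewrite rules can be expanded mechanically. The sketch below enumerates every word sequence the grammar accepts, using sample words for each terminal category; it is an illustrative toy, not any vendor's engine, and it omits the silence (NS_B/NS_E) and filler (HMM) categories for readability.

```python
# Expand a BNF-style recognition grammar into all accepted sentences.
# Silence and filler categories are omitted for brevity; the word lists
# are sample entries for each terminal category.

rules = {
    "S":     [["SENT"]],
    "SENT":  [["TAKE_V", "HOUSE", "PLEASE"],
              ["TAKE_V", "HOUSE"],
              ["HOUSE", "PLEASE"],
              ["HOUSE"]],
    "HOUSE": [["NUM", "HOUSE_N"], ["HOUSE_N_1"]],
}

words = {"TAKE_V": ["I'll take"], "PLEASE": ["please"],
         "HOUSE_N": ["house"], "HOUSE_N_1": ["houses"],
         "NUM": ["one", "two"]}

def expand(symbol):
    """Yield every word string derivable from `symbol`."""
    if symbol in words:                      # terminal category
        yield from words[symbol]
        return
    for rhs in rules[symbol]:                # try each rewrite rule
        seqs = [""]
        for part in rhs:                     # combine choices left to right
            seqs = [f"{s} {w}".strip() for s in seqs for w in expand(part)]
        yield from seqs

for sentence in expand("S"):
    print(sentence)
```

The recognizer's search is restricted to exactly these sequences, which is why grammar-based recognition is fast but can only hear what the grammar anticipates.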

The vocabulary file contains the definition of each word category used in the 'grammar' file. The partial 'house.voca' file for the 'house.grammar' file is given below.

% NS_B
<s>         sil
% NS_E
</s>        sil
% HMM
fm          w eh l
% TAKE_V
I'll take   ay l t ey k
% PLEASE
please      p l iy z
% HOUSE_N
house       hh aw s
% HOUSE_N_1
houses      hh aw s i z
% NUM
one         w ah n
two         t uw

2.4. The Technology Comparison

There has been a long-standing debate on the merits of these approaches. Several factors are considered when choosing the technology. No single approach can be called the best; rather, the approach may be chosen based on the requirements discussed below.

Targeted listening or calls of interest: If you aim to listen to calls containing specific keywords, then a dictionary dependent (LVCSR) system may be preferable. The dictionary dependent approaches can recognize words that are already in the lexicon.

Out of Vocabulary (OOV) words: If it is very likely that you will encounter new words in your search domain, then a dictionary independent approach should be preferred. OOV handling is a major issue in the dictionary dependent approaches.

Audio processing: In the dictionary dependent approaches, the audio needs to be reprocessed whenever new words are added to the dictionary. This proves to be a time consuming and costly process, as experts are also needed for entering words in the dictionary. The audio is processed just once in the dictionary independent approaches.

Speech transcripts: The dictionary independent (phonetic) approaches, due to the absence of a language model, cannot be used to generate a meaningful orthographic transcript of speech.

Precision vs. recall: A higher precision rate may be obtained with a dictionary dependent system on words that are already in the dictionary, but the recall rate suffers, as there are always some missing words. This also results in higher error rates. Phonetic approaches have lower precision but higher recall.

Table 1: Comparison of the approaches

Feature    | Dictionary dependent (LVCSR) | Dictionary independent (phonetic)
Speed      | Low                          | High
Recall     | Low                          | High
Precision  | High                         | Low
Error      | High                         | Low
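To make the precision/recall trade-off concrete, the sketch below computes both metrics for a hypothetical keyword-spotting run; the call IDs and hit lists are invented purely for illustration.

```python
# Precision and recall for a hypothetical keyword-spotting run.
# `truth` holds the call IDs that really contain the target phrase;
# the hit sets are invented for illustration.

def precision_recall(truth, detected):
    true_positives = len(truth & detected)
    precision = true_positives / len(detected)   # how many hits were right
    recall = true_positives / len(truth)         # how many targets were found
    return precision, recall

truth = {1, 2, 3, 4, 5, 6, 7, 8}

# Dictionary dependent (LVCSR): few false alarms, but misses
# out-of-vocabulary variants -> high precision, lower recall.
lvcsr_hits = {1, 2, 3, 4}
# Phonetic: flags more candidates, including some false alarms ->
# lower precision, higher recall.
phonetic_hits = {1, 2, 3, 4, 5, 6, 7, 9, 10}

p_lvcsr, r_lvcsr = precision_recall(truth, lvcsr_hits)    # 1.0, 0.5
p_phon, r_phon = precision_recall(truth, phonetic_hits)   # ~0.78, 0.875
```

Which trade-off is acceptable depends on the use case: compliance monitoring usually favors recall, while targeted listening favors precision.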

3. Organizational Use Cases for Speech Analytics

Some of the use cases of speech analytics in a call center could be:

Tone and sentiment analysis: Speech analytics systems can analyze the tone and detect the sentiment of a voice. Tone can also signify age; this can be used to determine the effectiveness of a marketing campaign on a specific age segment.

Talk and silence pattern analysis: The talk and silence patterns can be analyzed to measure emotions and levels of satisfaction. A set of user-defined phrases can be used for detecting agent actions. Thus, it helps in identifying and prioritizing what needs immediate attention.

Agents' performance monitoring: Monitoring agents' interactions with customers can easily detect proactive agents as well as agents not successful in satisfying customers. This analysis may be used for training the agents to improve their performance.

Call segmentation: Certain calls may be difficult for agents to handle. Such calls may be identified, segmented, and handled using specific business processes.

Decision making: The insights gained from this information may help in making decisions and implementing new policies for product & service improvement.

4. Solution Landscape: Vendors, Products and their Market Share

According to DMG Consulting and ContactBabel¹, only 24% of organizations are currently using a speech analytics solution. However, interest in speech analytics is growing & the market will continue to expand over the next several years.
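As a hedged sketch of the talk and silence pattern analysis described above, the snippet below summarizes one call from (start, end, label) segments such as a voice-activity detector might emit; the segment data and the 30-second hold threshold are invented for illustration.

```python
# Talk/silence pattern analysis for one call, from (start, end, label)
# segments as a voice-activity detector might emit them.
# The segment data is invented; times are in seconds.

segments = [
    (0.0, 12.0, "agent"), (12.0, 14.5, "silence"),
    (14.5, 40.0, "customer"), (40.0, 75.0, "silence"),  # a long hold
    (75.0, 90.0, "agent"),
]

def silence_stats(segments):
    """Return the share of silence and the longest silent stretch."""
    total = segments[-1][1] - segments[0][0]
    silences = [end - start for start, end, label in segments
                if label == "silence"]
    return {
        "silence_ratio": sum(silences) / total,
        "longest_silence_s": max(silences, default=0.0),
    }

stats = silence_stats(segments)
# A long silence (here, > 30 s — an assumed threshold) often marks a
# hold and is worth flagging for supervisor review.
needs_review = stats["longest_silence_s"] > 30.0
```

Aggregated over thousands of calls, such simple statistics surface the agents and call types with unusually long holds or dead air.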
DMG Consulting estimates this growth to be 18% in 2015, and 16% in 2016 and 2017. The market shares of some of the leading speech analytics vendors are summarized in Table 2.

Table 2: The product, approach & market share of major vendors in speech analytics

Vendor          | Product                                          | Approach used                                                                    | Market share¹ (%)
NICE            | NICE Interaction Analytics                       | Phonetic indexing and transcription                                              | 30
Verint          | Impact 360 Speech Analytics                      | Audio indexing                                                                   | 24.5
HP Autonomy     | HP IDOL (Intelligent Data Operating Layer)       | Meaning based computing (MBC), which stresses relevance along with accuracy      | 13.1
Nexidia         | Neural Phonetic Speech Analytics                 | Automatic speech recognition (ASR), phonetic indexing & word-level transcription | 8.7
Genesys (UTOPY) | Speech and Text Analytics (formerly SpeechMiner) | Direct phrase detection and transcription                                        | 4.2

Some other speech analytics vendors, with their market share (%), are CallMiner (10.7%), Avaya (1.7%), Mattersight (1.7%), Calabrio (0.6%), and Interactive Intelligence (0.5%). The overall market share of the top ten vendors is shown in Figure 3.

[Figure 3: Top ten vendors' market share]

5. Recommended Architecture for Speech Analytics

We propose the 2-phase architecture shown in Figure 4, which allows data discovery & predictive analytics on voice call data. This architecture aims at classifying the contexts and issues constantly being talked about on calls & predicting customer behavior. Since the insights generated are contextual, they can help product managers, service units & other units derive strategic inputs. The R Analytics Engine (RAE) takes phonetics or text as input. The engine analyzes the input & models it based on the key factors, which proves helpful in deriving customer segment based strategic insights and predicting the KPIs and behavior.

[Figure 4: Recommended Architecture for Speech Analytics — Search & Pre-process: call data and the speech corpus pass through Automatic Speech Recognition (ASR) for phonetic indexing and text transcription, followed by phrase parsing and search; Tag & Model and Predict & Analyze: the R Analytics Engine (RAE) models the tagged data to derive customer segment based strategic insights and predict KPIs and behavior.]

¹ Speech Analytics Product and Market Report Reprint, DMG Consulting LLC (as of May 2014)
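The 2-phase flow in Figure 4 can be sketched as a plain function pipeline. The stage functions below are hypothetical stand-ins invented for this sketch; they are not the actual RAE API and only show how the stages compose.

```python
# A minimal stand-in for the two-phase flow of Figure 4.  None of these
# functions exist in the actual R Analytics Engine; they are hypothetical
# placeholders showing how the stages compose.

def search_and_preprocess(call_ids):
    """Phase 1: ASR -> phonetic index / text transcripts, then cleanup."""
    # Stand-in: pretend every call yields one cleaned transcript string.
    return [f"transcript for call {cid}" for cid in call_ids]

def tag_and_model(transcripts):
    """Phase 2a: parse phrases and tag each call with a context label."""
    # Stand-in: tag every call with a fixed example context.
    return [{"text": t, "context": "billing"} for t in transcripts]

def predict_and_analyze(tagged_calls):
    """Phase 2b: model the tagged calls to predict KPIs and behavior."""
    return {"n_calls": len(tagged_calls),
            "top_context": tagged_calls[0]["context"]}

insights = predict_and_analyze(tag_and_model(search_and_preprocess([101, 102])))
```

The point of the composition is that each phase consumes only the previous phase's output, so the ASR front end and the modeling back end can be swapped independently.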

6. Case Study: Predicting NPS in Health Insurance

Net Promoter Score (NPS) is a metric frequently used to measure an organization's performance from the customers' perspective, and to measure the loyalty existing between two entities. The entity could be a company and its customer, or an employer and its employee. NPS is calculated based on the score provided by a customer, on a scale of 0-10, for the question "How likely is it that you would recommend my company to someone?" This scale segregates customers into three categories based on the scores provided: Promoters (9-10), Detractors (0-6) and Passives (7-8)².

NPS is computed as the difference between the % of Promoters and the % of Detractors, on the basis of sample surveys conducted by organizations periodically. Predicting NPS on the basis of customer call & speech data is one of the most compelling use cases of speech analytics. Rather than waiting for the next survey, speech analytics can be used to predict probable promoters and detractors, which can help organizations estimate their overall NPS score on a real-time basis. If the predicted NPS score is declining, strategic changes can be made in business and service to maintain the desired results without waiting for the next survey results.

Objective: The objective of this case study was to predict promoters, detractors and passives from the call data available from a large health insurance provider in the US. The available data included the customers' behavioral data from CRM, the call characteristics (hold times, FCRs, recommended CSA, etc.) and text transcripts generated using the speech analytics engine.

Challenges: The text transcripts as provided had a lot of noise. The data had to be preprocessed and cleaned by our R Analytics Engine to remove the noise & convert it into a more structured form, amenable to further analysis and modeling. We typically found 2-grams & 3-grams to have higher information gain ratios than uni-grams, and hence to be better predictors of the class categories.

Results: The impact of call category, call duration, and other related factors on the NPS score was analyzed. It was found that the FCRs & the hold time had a significant impact on the NPS score. The addition of 2-grams & 3-grams extracted from the text transcripts using speech analytics improved the NPS prediction significantly, by 8.6%. We were also able to predict the NPS performance at the agent, call center and campaign levels. The model results were used to drive a specific training agenda for the call center agents.

7. Conclusion

The information shared and exchanged through spoken interaction data largely remains untapped. Speech analytics can provide a solution by collecting this data and providing insight into these interactions. Speech analytics can prove to be a revolutionary approach to measuring customers' emotions, context and intent. Customers generally do not bother telling people about a great customer service experience, but the same customer makes sure that almost everyone knows about a bad one. Business success therefore depends heavily on the customer experience, and enhancing this experience is critical for the success of any business. The R Analytics Engine (RAE) can help accelerate data discovery and the mining of speech data.

Currently, the global speech analytics market is estimated to be around 450 million.
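For reference, the NPS arithmetic used in the case study above can be sketched in a few lines; the survey scores below are invented purely for illustration.

```python
# NPS from 0-10 survey scores: Promoters (9-10), Passives (7-8),
# Detractors (0-6); NPS = %Promoters - %Detractors.
# The sample scores are invented for illustration.

def nps(scores):
    promoters = sum(1 for s in scores if s >= 9)
    detractors = sum(1 for s in scores if s <= 6)
    return 100.0 * (promoters - detractors) / len(scores)

scores = [10, 9, 9, 8, 8, 7, 6, 5, 3, 10]
print(nps(scores))  # 4 promoters, 3 detractors out of 10 -> NPS = 10.0
```

A predictive model does the same arithmetic over predicted, rather than surveyed, promoter/detractor labels, which is what allows the score to be tracked between surveys.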

About the Authors

Renu Balyan is a Lead Analyst (Data Science) with R Systems International Ltd. Her areas of interest include data analytics, machine learning, machine translation and information extraction. She has published 14 papers in national and international conferences & journals. She is currently pursuing her PhD at IIT Delhi. She has worked as an intern with Dublin City University, Ireland, and as a research fellow and project engineer with the Centre for Development of Advanced Computing, Noida, where she worked on various projects related to natural language processing (NLP) for nearly 6 years.

Praveen Pathak is the Analytics Practice Head for R Systems International Ltd. With over 14 years of consulting, in-house & offshore analytics delivery experience, Praveen focuses on providing R Systems' clients with best-in-class analytics solutions and services. He has extensive hands-on as well as leadership experience in analytics, information management, predictive modeling, optimization and big data technologies, used to draw data-driven insights and help address business challenges. His interests include artificial intelligence, neural networks & high-performance computing.

About R Systems

R Systems is a global digital transformation leader that provides AI-driven solutions to clients across industries through a broad range of technology & AI/analytics services. We have been empowering organizations for over 27 years, with 16 delivery centers, 25 offices worldwide and a workforce of 2,750 professionals.

For more information, visit analytics.rsystems.com / www.rsystems.com
You can also contact us directly at: +1 844 779 7276 / +1 855 779 7276
analytics@rsystems.com

© 2021 R Systems International Limited. All Rights Reserved.
All content/information present here is the exclusive property of R Systems International Ltd. The content/information contained here is correct at the time of publishing.
No material from here may be copied, modified, reproduced, republished, uploaded, transmitted, posted or distributed in any form without prior written permission from R Systems International Ltd. Unauthorized use of the content/information appearing here may violate copyright, trademark and other applicable laws, and could result in criminal or civil penalties.
