
International Journal of Computer Applications (0975 – 8887)
Volume 74 – No.14, July 2013

XML based Interactive Voice Response System

Sharad Kumar Singh
PT PureTesting Software P Ltd.
Noida, India

ABSTRACT
The paper presents the architecture of a web-based interactive voice response system using VoiceXML. It discusses the architecture of the IVR system and its components, and gives a detailed description of the functionality of the VXML interpreter and its use in IVR systems. It also describes the integration of the VXML interpreter, the CCXML interpreter and the related media and telephony resources. Finally, it presents performance measurement techniques and a technical proposal for increasing the performance of such a system.

General Terms
Interactive Voice Response System, Telephony, Voice XML, Call Control XML

Keywords
IVR, Voice XML, Web based IVR, VXML Interpreter, CCXML, FIA.

1. INTRODUCTION
Interactive voice response (IVR) is a phone technology that allows a computer to interact with humans through voice and DTMF tones input via the phone [1]. It has been an essential element in the customer support equation for more than a decade. The driving idea behind the development of VXML was cost reduction: instead of relying on expensive call center agents, it enables fully automated systems that can be modified quickly.

The advancement in technology from touch-tone to speech-enabled systems, and then to integrating both, has introduced new challenges, including:
• Integrating disparate customer access points and back-end data sources.
• Scaling to ever-larger systems.
• Catering to a larger number of customers.
• Quick development time for new customers and quick maintenance for existing ones.
It was difficult to solve these problems with traditional interactive voice response systems. Hence, modern tone- and speech-enabled solutions were developed to alleviate them. VoiceXML is used to design and develop the backbone of these solutions, which are known as voice portals.

The time required to develop a VoiceXML-based solution is roughly one third of that required for a traditional IVR [2]. For the same reasons, VoiceXML has been widely adopted and accepted as the platform for developing IVRs in the speech industry.

To understand how a VoiceXML IVR application operates, it is useful to compare it with a traditional web application. In a traditional web application, the web browser presents an HTML page obtained from a web server, which populates data from a database server and performs the specified actions based on the user's input through the browser. Similarly, in a VoiceXML IVR, the VXML engine is responsible for performing the tasks that were performed by the web browser and the web server.

2. A VOICE XML DOCUMENT
2.1 Sample VXML document
Following is a sample VXML document:

```xml
<?xml version="1.0"?>
<vxml application="sample.vxml" version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form id="form1">
    <block>
      <prompt>Welcome World</prompt>
    </block>
  </form>
</vxml>
```

Each tag in the VXML document has a meaning and a clearly described function as per the W3C standards. In short, this document plays "Welcome World" as soon as the user dials in to the VoiceXML gateway.

2.2 HTML versus VXML
HTML pages and VXML pages are in some ways alike and in others distinct. Following is a code sample from a simple HTML page and a VXML page.

Table 1. HTML versus VXML document

HTML Page:

```html
<html>
  <body>
    <img src="Coffee.jpg"/>
  </body>
</html>
```

VXML Page:

```xml
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form>
    <block>
      <prompt>Coffee</prompt>
    </block>
  </form>
</vxml>
```

In the above HTML example, a page is set up where visitors can view a picture of a cup of coffee. In the VXML example, a document has been set up where callers hear a prompt saying "Coffee". The principles behind the VoiceXML example become clearer on a closer look.

Fig 1: HTML v/s VXML

Using the HTML versus VXML analogy, it is important to compare how HTML pages are served and how VXML pages are executed when a caller calls. When designing an HTML page, a document like the one above is first created and then uploaded to a web server so that it can be fetched whenever requested. Similarly, in VoiceXML the content must be located on a web server so that the gateway can fetch it whenever required. VoiceXML works on the same general principles, but here the telephone is used as the browser.

The next point of distinction is how documents are fetched and executed. When a user clicks a link to the 'coffee' HTML page, a request is sent to the web server hosting the document, and the page is then loaded in the user's web browser with the accompanying picture of a cup of coffee. In VoiceXML, instead of clicking on a link, the user dials the number pointing to the VXML gateway, which is the equivalent of a hyperlink in HTML. Dialing this number tells the server/interpreter to fetch the document associated with that particular phone number. Assuming that the code is well-formed XML and that the mapping has no typographical errors, the coffee.vxml file is loaded and executed, and the text-to-speech message is played to the caller.

3. ARCHITECTURE OF VXML BASED IVR SYSTEM
A typical VoiceXML-based system contains four main components:
• Telephone Network: a PSTN network or a VoIP packet network.
• VoiceXML Gateway: the core of a VoiceXML-based IVR system. It comprises an engine for interpreting VXML documents, speech synthesis, grammar recognition, audio playback and telephony resources.
• Application Server: typically a web server that hosts the VoiceXML documents.
• TCP/IP Network: a LAN, a WAN or the public Internet.
The VoiceXML gateway connects to the telephone network on one side and to the TCP/IP network and the application server on the other.

4. VOICE XML GATEWAY
The components of the VoiceXML gateway revolve around an engine known as the VXML interpreter. This engine is responsible for mediating, controlling and distributing data and logic flow in and around the other components of the gateway. VoiceXML is used to conceive and develop not only voice-only but also multimodal solutions [3]. In such a case, the VXML interpreter acts as a modality component in the multimodal architecture [4].

4.1 VXML Interpreter

Fig 2: VXML Interpreter (components shown: VXML Interpreter Context; ECMAScript Scopes and Variables; XML Parser Module; Web Server Interface Module)

4.1.1 Design options
The VXML interpreter can be written in any programming language, the most common being C and Java [5]. Although C, unlike Java, is not an object-oriented language, the overall design differs little between the two; the difference lies in the third-party libraries and devices required to communicate with the VXML interpreter. In this article, the design and discussion are based on C.

4.1.2 Components
The VXML interpreter engine comprises the following major components:

4.1.2.1 Web Server Interface Module
The Web Server Interface Module uses the protocols provided within the implementation to fetch documents and applications from a document server. Documents are commonly requested from the document server and downloaded through the HTTP client using HTTP GET, HTTP POST, FTP and HTTPS. The protocol to be used and the document to be downloaded are specified during session initiation [6]; however, they can be specified again while traversing from one document to another.

A DNS lookup service is sometimes also required to resolve the named address. Thread-safe third-party libraries such as libcurl are used to achieve high-performance document fetching in the engine.

4.1.2.2 Form Interpretation Module
The Form Interpretation Module is responsible for implementing the functional logic and for traversing the VXML document based on caller input [7]. The form interpretation algorithm (FIA) drives the interaction between the caller and the VoiceXML input items: it executes the code in a form by looping through the items in the form and processing (or reprocessing) them.

According to the W3C recommendation, the FIA must handle the following [7]:
• Form initialization.
• Prompting, or delivering audio to the caller.
• Grammar activation and deactivation.
• Entering the form with an utterance that matched one of the form's document-scoped grammars while the user was visiting a different form or menu.
• Leaving the form because the user matched another form, menu, or link's document-scoped grammar.
• Processing multiple field fills from one utterance, including the execution of the relevant filled actions.
• Selecting the next form item to visit, and then processing that form item.
• Choosing the correct catch element to handle any events thrown while processing a form item.

The FIA is broken into three phases: select, collect and process [8]. In the select phase, the VXML interpreter selects the form item (input item) that needs to be processed. In the collect phase, it collects the caller's input for the selected item and validates it against the active grammars, which are essentially validation rules for the input item [9]. Finally, in the process phase, it processes the input according to the execution logic defined in the VXML document.

4.1.2.3 XML Parser Module
The XML Parser Module acts as the interface between the engine and the third-party XML parsing and validation library. It is also responsible for handling the tags in the parsed DOM trees, and it can validate dynamically generated XML scripts against a DTD or schema. Its major function is to convert the parsed VXML file into a binary format that lets the Form Interpretation Module cycle through the document quickly.

4.2 Media Resource
The media resource is responsible for speech recognition and for grammar validation of input received from the caller as audio. It matches the input against the active grammars and converts the audio response into a binary format that the VXML interpreter can understand. It is also responsible for text-to-speech conversion and audio playback. Whenever the FIA encounters a prompt element, or a block element containing text, that content must be played to the caller as audio [10]. The VXML interpreter sends the text, or the location of a .wav file, to the media resource, which in turn plays it to the caller.

4.3 Telephony Resource
The telephony resources include DTMF input and call control. DTMF inputs serve as caller responses to the voice gateway. Call control features such as call disconnect, call transfer and conferencing, which are sent from the telephone, are handled by the telephony resource. To achieve these functions, the Call Control interpreter and the VXML interpreter must work in conjunction, as described in Fig 3.

5. CCXML GATEWAY
The CCXML (Call Control XML) gateway is required to handle call control requirements that are beyond the scope of the VoiceXML specification. Although CCXML and VXML can be used in conjunction, they are mutually independent. VoiceXML does support certain call control features, such as transfer, but these are not enough for the complex call control required in IVR systems. Hence, there are two approaches to designing an IVR system with VXML. The first combines SIP as the control language with VoiceXML as the interaction language [11]. The second uses CCXML as the control language. The former is more common in traditional IVR systems, while the latter is used in more recent and advanced IVR systems. CCXML offers finer-grained and more advanced call control features, with the advantage of a short development time to implement and modify the system. The following requirements are addressed by this W3C specification [12]:
• Support for multi-party conferencing, with advanced conference and audio control. A conferencing application involves multiple participants and depends on call control to establish relationships between those participants.
• The ability to give each active call leg its own dedicated VoiceXML interpreter. For example, in VoiceXML, the second leg of a transferred call lacks a VoiceXML interpreter of its own, limiting the scope of possible applications.
• Sophisticated multiple-call handling and control, including the ability to place outgoing calls.
• Handling for a richer class of asynchronous events. Advanced telephony operations involve substantial amounts of signals, status events and message passing. VoiceXML 2.0 does not integrate asynchronous "external" events into its event-processing model.
• External interfaces: VoiceXML lacks the external interfaces required to interact with an outside call queue, or to place calls on behalf of an external document server.

VoiceXML used together with CCXML provides benefits such as lower code complexity and higher extensibility [11]. In such a scenario, the engine and gateway, once developed, do not require repeated and frequent modifications. The only customization required is creating or modifying the VXML/CCXML documents, which are easier to write than traditional code in a high-level language. Hence, the code complexity is reduced. XML-based languages are also far more extensible than a SIP-based architecture.

Fig 3: CCXML – VXML Integration Architecture (partially recoverable labels: ...Control Interface, Media, Dialog Server, Caller)

6. DEVELOPMENT TOOLS
There are numerous tools that can be used to quickly create or modify an error-free VXML document. Following are a few of them:

6.1 Plain text IDE
These are integrated development environments for creating and modifying VXML/CCXML documents. Such IDEs usually offer some or all of the following features:
• Color coding: enables the developer to quickly identify reserved keywords, matching tags, custom strings for prompts and audio playback, etc.
• Inline syntax checking: the IDE acts like a mini interpreter, parsing the code continuously while it is being edited and giving instant feedback when syntax errors are introduced.
• Document traversal: VXML documents may contain transition tags that require a new document to be loaded. These IDEs can load the next document as soon as the user clicks the hyperlink or URL.

6.2 GUI based IDE
Using a GUI-based integrated development environment, the developer can drag and drop elements to create or modify any VXML document. These IDEs serve the same purpose as plain text IDEs but differ in the following:
• Drag and drop: to create or modify a tag in the VXML document, the developer does not need to type the tag and its parameters. Instead, the developer drags the required tag from the available tag tools and it is automatically populated at the desired location.
• Proactive rather than reactive: the IDE does not allow the developer to insert incorrect tags, or correct tags at incorrect locations. Hence, it acts proactively rather than throwing an error after the developer finishes typing.
• Large binary size: GUI-based IDEs are complex to design and have a very large binary or source code size.
• Low availability: since these IDEs require considerable monetary and development effort, they are not easily available, either commercially or in the open-source domain.

7. PERFORMANCE MEASUREMENT AND OPTIMIZATION
Performance is one of the key features of any VXML or CCXML interpreter. It measures how the integrated system performs in terms of responsiveness and stability under a particular workload. The approach suggested here treats the complete gateway as a black box. Artificial load is generated by a separate program known as a workload generator, which calls the APIs exposed by the telephony interface. As soon as a call is received on the telephony interface through the workload generator, the gateway follows the normal execution path to complete the call, which typically involves session initiation, document fetching, parsing, traversal and dialog interpretation. When the call completes, control returns to the workload generator with the status of the call, comprising the time taken, success or failure, etc. A number of calls are generated by the workload generator in order to perform a load test on the system. The throughput of the system varies for many reasons, but the following two relationships are the critical ones to look at:

$$\text{Throughput} \propto \frac{1}{\text{number of calls to the interface}}$$

$$\text{Throughput} \propto \frac{1}{\text{size of the VXML or CCXML document}}$$

The first is more of a business requirement that drives the load testing; it cannot be regulated or improved through technical changes to the system. The second is more relevant for improving the system. It is observed that the performance degradation for large documents is due to two major factors:
• Time taken to download the document.
• Time taken to parse the document.
Caching can be used to reduce the download time. The document should be cached and, before it is downloaded again, checked for any modifications made since the previous download. If the document has not been modified since the last download, it can be served from the cache. Similarly, a caching mechanism can be developed for the parsed binary tree that is created after parsing the document, as described in Section 4.1.2.3 of this article. These two techniques can be used in conjunction to make major improvements in the performance of the VXML engine.

8. PLATFORM CERTIFICATION
To establish a VXML-based IVR system, the platform has to go through a series of validation tests provided by the VoiceXML Forum [13]. These tests are based on the W3C VXML specifications and standards. They form an exhaustive set of tests that act as a benchmark for the platform against the W3C standards. The VoiceXML Forum's Platform Certification Program provides vendor-independent, industry-standard certification that supports all parts of the VoiceXML ecosystem.

9. ACKNOWLEDGMENTS
The author would like to express his sincere thanks to all the people in the VXML Interpreter development team for their help and support, without which it would not have been possible to gain deep insights into VoiceXML.

10. REFERENCES
[1] Wikipedia, description of interactive voice response systems. http://en.wikipedia.org/wiki/Interactive_voice_response
[2] VoiceXML Forum, a global industry organization that works to accelerate the adoption of VoiceXML and adjacent technologies. The reference is taken from the frequently asked questions of the ...ed-questions
[3] Anderson, E. A., Breitenbach, S., Burd, T., Chidambaram, N., Houle, P., Newsome, D., Tang, X., Zhu, X., Early Adopter VoiceXML, Wrox, 2001, p. 24.
[4] W3C, VoiceXML 3.0. http://www.w3.org/TR/voicexml30/
[5] Hocek, A., Cuddihy, D., Definitive VoiceXML, Prentice Hall Professional, January 2003.
[6] The Staff of DreamTech Inc., VoiceXML 2.0 Developer's Guide, McGraw-Hill Companies, 2002.
[7] W3C Recommendation for VoiceXML 2.0. http://www.w3.org/TR/voicexml20/
[8] Larson, A., VoiceXML: Introduction to Developing Speech Applications, Prentice Hall Professional Technical Reference, 2002.
[9] W3C Recommendation for SRGS grammar. http://www.w3.org/TR/speech-grammar/
[10] Edgar, B. C., The VoiceXML Handbook, CMP Books, 2001.
[11] Amyot, D., Simoes, R., "Combining VoiceXML with CCXML: A Comparative Study", Consumer Communications and Networking Conference, 2007.
[12] W3C Recommendation for CCXML 1.0. http://www.w3.org/TR/ccxml/
[13] VoiceXML Forum, a global industry organization that works to accelerate the adoption of VoiceXML and adjacent technologies. The reference is taken from the platform certification section of the ...on

4. VOICE XML GATEWAY The components included in the voice xml gateway revolve around the engine which is known as the VXML interpreter. This engine is responsible for Mediating, Controlling and Distributing data and logic flow in and around the other components of the gateway. VoiceXML is not only used to