Thin Client Development And Wireless Markup Languages Cont.


David Tipper
Associate Professor
Department of Information Science and Telecommunications
University of … edu/ dtipper/2727.html
Slides 12
Infsci 1073 / Telcom 2727

VoiceXML and Voice Portals
• VoiceXML, together with a voice portal, provides speech-enabled access to text, web, and voice-automated information
• Allows the user to navigate through voice web pages

Why VoiceXML?
• Remember that the device is a phone first and a computer/web device second
• Advantages
  – Device independence: works with any digital phone (wired or wireless)
  – Easier, more natural I/O: there are times when voice interaction is more appropriate or easier, e.g., while driving a car, obtaining directions, accessing email over the phone, or entering information/data
  – Low cost

VoiceXML
• Standards based
  – VoiceXML Forum: industry group (Motorola, Lucent, AT&T, etc.) that developed VXML 1.0, released in 2000
• Based on XML
  – The W3C Voice Browser Working Group developed VoiceXML 2.0; VoiceXML 2.1 followed in June 2007
  – Current focus on improved speech and grammar recognition and on text-to-speech translation
  – Multi-modal applications → voice web applications, e.g., call for directions and get a map plus voice directions

VoiceXML Applications
• Predicted boom in VoiceXML applications, especially as a replacement for human operators
• Sample applications
  – Information retrieval: check weather, sports scores, directions (Cingular Voice Dial Service), stock prices, etc.
  – Directory assistance: AT&T uses this
  – E-commerce: catalog ordering, tickets, bill payment, etc.
  – Telephone services: voice mail management, teleconferencing, secure phone calls
  – Unified messaging: browse and listen to email messages over the phone; record voice and have it sent via email, SMS, or voice mail

VoiceXML Architecture
• The user connects to a voice portal that contains a VoiceXML browser
• The VoiceXML browser handles
  – interaction with the user (I/O)
  – fetching information from web servers
  – transforming VoiceXML content for delivery to the user
• The portal contains several technology components that the browser uses to handle communication and process VoiceXML documents
[Figure: Client, WWAN, Voice Portal/Gateway with VoiceXML Browser, Internet, and Server holding VoiceXML documents]

Portal Technology Components
• Automatic Speech Recognition (ASR) resource
  – Converts a speech signal to text or numbers
  – Strives to be speaker independent or speaker adaptive
  – Matches speech against a given set of words or phrases (called a grammar), which is much less computationally intensive than unconstrained speech recognition
• Text to Speech Synthesis (TTS) resource
  – Converts text/numeric input to synthesized speech; older systems sound robotic
  – New systems use waveform …
[Figure: voice portal components: ASR, TTS, Audio Resource, Telephony Resource, TCP/IP Resource]
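To make the browser's "transform VoiceXML content for delivery" role concrete, here is a minimal Python sketch, assuming the document has already been fetched and is well-formed VoiceXML 2.0. It walks each <prompt> and splits its content into text destined for the TTS resource and <audio src> references destined for the audio resource. The helper name plan_output and the ("tts" | "audio", value) step format are hypothetical; the sketch only handles plain text and <audio> children, while a real browser also interprets SSML markup, barge-in, and caching.

```python
import xml.etree.ElementTree as ET

def _local(tag: str) -> str:
    """Strip an XML namespace, e.g. '{ns}prompt' -> 'prompt'."""
    return tag.rsplit("}", 1)[-1]

def plan_output(vxml_text: str) -> list[tuple[str, str]]:
    """Hypothetical helper: turn each <prompt> into ('tts', text) or
    ('audio', src) steps, in document order."""
    root = ET.fromstring(vxml_text)
    steps: list[tuple[str, str]] = []
    for prompt in root.iter():
        if _local(prompt.tag) != "prompt":
            continue
        # Text before the first child element goes to the TTS resource.
        if prompt.text and prompt.text.strip():
            steps.append(("tts", prompt.text.strip()))
        for child in prompt:
            src = child.get("src")
            if _local(child.tag) == "audio" and src:
                # Prerecorded files go to the audio resource.
                steps.append(("audio", src))
            # Text following a child element (its tail) is also spoken.
            if child.tail and child.tail.strip():
                steps.append(("tts", child.tail.strip()))
    return steps

doc = """<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
  <form><block>
    <prompt>Welcome. <audio src="jingle.wav"/> Pitt is it</prompt>
  </block></form>
</vxml>"""
print(plan_output(doc))
```

The file name jingle.wav is only a placeholder; the point is that text and audio references stay interleaved in the order the user should hear them.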

Portal Technology Components (cont.)
• Audio resource
  – Plays prerecorded audio files
  – Records user input for post-processing
• Telephony resource
  – Call processing
  – Dual Tone Multi-Frequency (DTMF) keypad input
  – Call transfer to a third party
• TCP/IP resource
  – Provides communication with web servers

VoiceXML Session
1. The user calls the application phone number
2. The VXML gateway converts the input to an HTTP request to the web server
3. The server responds to the VXML gateway with content
4. The gateway converts the content into an interactive audio session with the user (e.g., "The score of the game is …")
[Figure: (1) the user calls a voice application over the cellular network; (2) the gateway (voice browser with ASR, TTS, audio, telephony, and TCP/IP resources) sends an HTTP request; (3) the web server hosting VoiceXML documents and audio resources responds; (4) interactive audio between the user and the voice application]
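The telephony resource's DTMF input works because each key sends two simultaneous tones: one from a low "row" group (697, 770, 852, 941 Hz) and one from a high "column" group (1209, 1336, 1477, 1633 Hz). The following is a minimal Python sketch, assuming an 8 kHz sample rate, a 205-sample block (a commonly used size at 8 kHz), and a clean, noise-free tone. It uses the Goertzel algorithm to measure the energy at each of the eight DTMF frequencies and picks the strongest row and column. The function names are hypothetical; real detectors also check signal level, twist, and tone duration.

```python
import math

SAMPLE_RATE = 8000                      # assumption: narrowband telephony rate
LOW = [697, 770, 852, 941]              # row frequencies (Hz)
HIGH = [1209, 1336, 1477, 1633]         # column frequencies (Hz)
KEYS = ["123A", "456B", "789C", "*0#D"] # KEYS[row][col]

def tone(key: str, n: int = 205) -> list[float]:
    """Synthesize n samples of the dual tone for one keypad key."""
    row = next(r for r, keys in enumerate(KEYS) if key in keys)
    col = KEYS[row].index(key)
    f1, f2 = LOW[row], HIGH[col]
    return [math.sin(2 * math.pi * f1 * i / SAMPLE_RATE)
            + math.sin(2 * math.pi * f2 * i / SAMPLE_RATE) for i in range(n)]

def goertzel_power(samples: list[float], freq: float) -> float:
    """Signal power at one frequency via the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / SAMPLE_RATE)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2

def detect_key(samples: list[float]) -> str:
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(range(4), key=lambda r: goertzel_power(samples, LOW[r]))
    col = max(range(4), key=lambda c: goertzel_power(samples, HIGH[c]))
    return KEYS[row][col]

print("".join(detect_key(tone(k)) for k in "15213#"))
```

Goertzel is used here instead of a full FFT because only eight frequencies matter, which keeps the per-sample work small.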

VoiceXML Input/Output
• In a typical session, the user and the application take turns speaking and listening, so I/O is crucial
• Methods for user input
  1. Spoken commands: interpreted by ASR; accuracy is improved by specifying a grammar
  2. DTMF (Dual Tone Multi-Frequency) key input: the user enters data on the keypad; accuracy is improved by specifying the expected input
  3. Recorded speech for post-processing: saved in a standard format (e.g., a .wav file)

VoiceXML Input/Output (cont.)
• Methods for output to the user
  1. Text to Speech (TTS): speech synthesized on the fly; can sound machine-like. How the TTS is played can be marked up.
  2. Prerecorded audio files: downloaded from the server and played by the portal; they sound more natural, are easier for the user to understand, and are often recorded by a professional
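Input method 3 above stores recorded speech in a standard format such as .wav. As a minimal Python sketch using only the standard wave and struct modules, the code below packs 16-bit mono PCM samples into an in-memory WAV file and reads the header back. It assumes an 8 kHz sample rate (typical for telephone audio, but an assumption here), and a buffer of silence stands in for captured speech. The helper names are hypothetical.

```python
import io
import struct
import wave

def to_wav_bytes(samples: list[int], rate: int = 8000) -> bytes:
    """Pack signed 16-bit mono PCM samples into a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)     # mono telephone audio
        w.setsampwidth(2)     # 2 bytes = 16-bit samples
        w.setframerate(rate)  # assumed 8 kHz sampling rate
        w.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buf.getvalue()

def describe_wav(data: bytes) -> tuple[int, int, float]:
    """Return (channels, sample rate, duration in seconds)."""
    with wave.open(io.BytesIO(data), "rb") as r:
        return r.getnchannels(), r.getframerate(), r.getnframes() / r.getframerate()

# One second of silence as a stand-in for a user's recorded answer.
clip = to_wav_bytes([0] * 8000)
print(describe_wav(clip))  # (1, 8000, 1.0)
```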

VoiceXML Concepts
• Session
  – Begins when the user connects to the portal and interacts with the browser
  – VoiceXML documents are loaded and unloaded as the session continues
  – The end of the session is controlled by the user, the gateway, or the document
• Application
  – A set of VoiceXML documents that share the same root document

VoiceXML Concepts (cont.)
• Dialogs
  – Conversation with the user; two basic types
     Form: presents information and collects user input; contains fields
     Menu: gives the user options to select from and changes the dialog state based on the input
  – Sub-dialogs are possible, like a function call to commonly used forms/menus
  – The dialog between user and application must be carefully designed; typically the application prompts the user and the user responds in turn
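The turn-taking form dialog described above can be pictured as a loop that visits each unfilled field, prompts, and accepts an answer only if it fits that field's expected input. This Python sketch is loosely modeled on, and much simpler than, VoiceXML's Form Interpretation Algorithm. The function run_form, the (name, prompt, accepted inputs) field format, and the scripted inputs standing in for ASR or DTMF results are all hypothetical.

```python
from typing import Callable, Iterator

# Each field: (name, prompt text, set of accepted inputs)
Field = tuple[str, str, set[str]]

def run_form(fields: list[Field], inputs: Iterator[str],
             say: Callable[[str], None], max_tries: int = 3) -> dict[str, str]:
    """Visit fields in document order; reprompt on out-of-grammar input."""
    filled: dict[str, str] = {}
    for name, prompt, grammar in fields:
        for _ in range(max_tries):
            say(prompt)
            answer = next(inputs, "").strip().lower()
            if answer in grammar:
                filled[name] = answer   # the field now holds a value
                break
            say("I didn't get that.")   # plays the role of a nomatch handler
        else:
            say("Goodbye.")             # give up after max_tries attempts
            return filled
    return filled

transcript: list[str] = []
result = run_form(
    [("service", "Choose news, weather, or sports.", {"news", "weather", "sports"}),
     ("day", "Today or tomorrow?", {"today", "tomorrow"})],
    iter(["pizza", "Weather", "tomorrow"]),
    transcript.append,
)
print(result)  # {'service': 'weather', 'day': 'tomorrow'}
```

Passing say as a callback keeps the loop independent of how output is produced; in a portal it would hand text to TTS rather than append to a list.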

VoiceXML Concepts (cont.)
• Grammars
  – The expected user input, either spoken or DTMF key presses, e.g., "Say or enter your 5-digit zip code"
  – For spoken input, a grammar library is often specified to help interpret the input correctly
  – Specifying a grammar library greatly increases the accuracy of automatic speech recognition
  – Always include error checking and reprompting of the user to handle input mistakes

VoiceXML Documents
• VoiceXML documents define one or more dialogs
• VoiceXML documents can contain
  – Spoken prompts (synthetic speech or recorded)
  – Output of audio files and streams
  – Recognition of spoken words and phrases
  – Recognition of touch-tone key presses
  – Recording of spoken input
  – Control of dialog flow
  – Links to other VoiceXML documents
  – Events: responses to interruptions or incorrect input
  – Telephony control: call transfer to a third party, hang up, etc.
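For the "say or enter your 5-digit zip code" example, one field accepts either speech or DTMF. Constraining the input to exactly five digits is what lets the application reject a bad answer and reprompt. Below is a minimal Python sketch, assuming the ASR resource has already returned words and the DTMF resource has returned key characters (with # as a terminator). The function match_zip and the digit-word table are hypothetical illustrations, not a VoiceXML grammar format.

```python
import re
from typing import Optional

DIGIT_WORDS = {"zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
               "four": "4", "five": "5", "six": "6", "seven": "7",
               "eight": "8", "nine": "9"}

def match_zip(utterance: str) -> Optional[str]:
    """Return a 5-digit zip if the input fits the grammar, else None
    (which a dialog would treat as a nomatch and reprompt)."""
    tokens = utterance.lower().replace("#", " ").split()
    digits = "".join(DIGIT_WORDS.get(t, t) for t in tokens)
    return digits if re.fullmatch(r"\d{5}", digits) else None

print(match_zip("one five two one three"))  # spoken input -> 15213
print(match_zip("15213#"))                  # DTMF keys ended by # -> 15213
print(match_zip("one five two"))            # too short -> None
```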

VoiceXML Concepts (cont.)
The basic concepts are inter-related as follows:
• A session invokes one or more applications
• An application involves one or more documents
• A document can contain zero to many dialogs

Basic VoiceXML Elements
• Follows the XML format: elements start and end with tags
    <element_name attribute_name="attribute_value"> ... </element_name>
• Main elements
  – <form>: dialog for presenting and collecting data
  – <object>: platform-specific object that may gather user input and return a result
  – <grammar>: set of valid expressions that a user can say or type when interacting with an application
  – <block>: a piece of non-interactive executable code
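The cardinalities above (one or more applications per session, one or more documents per application, zero or more dialogs per document) can be captured as a small data model. This Python dataclass sketch is only an illustration: the class names, the url and dialogs fields, and the validation checks are hypothetical, and it records dialogs simply by name.

```python
from dataclasses import dataclass, field

@dataclass
class Document:
    url: str
    dialogs: list[str] = field(default_factory=list)  # zero or more dialogs

@dataclass
class Application:
    root: Document             # root document shared by all its documents
    documents: list[Document]  # one or more documents

    def __post_init__(self) -> None:
        if not self.documents:
            raise ValueError("an application needs at least one document")

@dataclass
class Session:
    applications: list[Application]  # one or more applications

    def __post_init__(self) -> None:
        if not self.applications:
            raise ValueError("a session invokes at least one application")

    def dialog_count(self) -> int:
        """Total dialogs across every document of every application."""
        return sum(len(d.dialogs) for a in self.applications for d in a.documents)

root = Document("root.vxml")
app = Application(root, [root, Document("news.vxml", ["menu", "form"])])
print(Session([app]).dialog_count())  # 2
```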

VoiceXML Output Elements
• <prompt>: outputs computer-generated speech (TTS) or audio files
  – Text for TTS can be marked up to improve quality
     <break>: insert a pause
     <emphasis>: increase volume (provide emphasis)
     <say-as>: specify a particular style, e.g., "Still-ers", or
        <say-as type="phone">014126249421</say-as>
• <audio>: plays a prerecorded file (.wav)
     <audio src="file.wav"/>
  – Common audio files are cached at the portal
• <reprompt>: sends processing back to the original prompt

VoiceXML Example
    <?xml version="1.0"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
      <form>
        <block>
          <prompt>Pitt is it</prompt>
        </block>
      </form>
    </vxml>
• All VoiceXML files (.vxml) begin with the xml and vxml prolog
• This document has a single form, which contains a block that synthesizes "Pitt is it" and plays it to the user
• Since no successor dialog is specified, the conversation ends
[Figure: "Pitt is it" rendered via VoiceXML and via XHTML-MP, WML, cHTML]
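Since the gateway simply fetches whatever the web server returns, a document like the "Pitt is it" example can also be generated by a server-side script instead of being written by hand. Here is a minimal Python sketch using the standard xml.etree.ElementTree module. The function name say_once is hypothetical, and the sketch places the elements in the VoiceXML 2.0 namespace so the output carries the xmlns declaration.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

def say_once(text: str) -> bytes:
    """Build a one-form VoiceXML document that speaks `text` and ends."""
    ET.register_namespace("", VXML_NS)  # serialize as the default namespace
    vxml = ET.Element(f"{{{VXML_NS}}}vxml", version="2.0")
    form = ET.SubElement(vxml, f"{{{VXML_NS}}}form")
    block = ET.SubElement(form, f"{{{VXML_NS}}}block")
    prompt = ET.SubElement(block, f"{{{VXML_NS}}}prompt")
    prompt.text = text  # special characters such as & are escaped automatically
    return ET.tostring(vxml, encoding="UTF-8", xml_declaration=True)

print(say_once("Pitt is it").decode("utf-8"))
```

Building the tree rather than concatenating strings guarantees the result is well-formed, which matters because the voice browser rejects malformed documents.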

Basic VoiceXML Elements (cont.)
• Additional elements
  – <menu>: dialog for selecting among several options
  – <choice>: an alternative in a menu dialog
  – <field>: gathers user input as defined by a specified grammar
  – <filled>: block of executable code that is run after the user input field is filled
  – <record>: records an audio file from the user
  – <if>, <elseif>, <else>: conditional logic
  – <goto>: control flow from form to form within and between documents, like links in HTML
  – <var>: declares variables
  – <transfer>: transfers the phone call to another number
• Scripting can be added with JavaScript

VoiceXML Examples
    <menu>
      <prompt>
        This is the main menu. Please choose a service: news, weather, or sports.
      </prompt>
      <choice next="news.vxml">news</choice>
      <choice next="weather.vxml">weather</choice>
      <choice next="sports.vxml">sports</choice>
    </menu>
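A menu's <choice> elements amount to a lookup table from what the user says to the next document. This Python sketch parses the spoken-choice menu example above with xml.etree.ElementTree (with no namespace, as on the slide) and builds that table. The helper name menu_targets is hypothetical, and the exact string lookup is a simplification: real platforms compile the choices into a recognition grammar.

```python
import xml.etree.ElementTree as ET

MENU = """<menu>
  <prompt>This is the main menu. Please choose a service: news, weather, or sports.</prompt>
  <choice next="news.vxml">news</choice>
  <choice next="weather.vxml">weather</choice>
  <choice next="sports.vxml">sports</choice>
</menu>"""

def menu_targets(menu_xml: str) -> dict[str, str]:
    """Map each choice's spoken phrase to the URI in its next attribute."""
    menu = ET.fromstring(menu_xml)
    return {c.text.strip().lower(): c.get("next") for c in menu.findall("choice")}

targets = menu_targets(MENU)
said = "Weather"  # stand-in for the ASR result
print(targets.get(said.lower(), "nomatch"))  # weather.vxml
```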

VoiceXML Examples (cont.)
    <menu>
      <prompt>
        This is the main menu. For news press 6; for weather press 9; for sports press 7.
      </prompt>
      <choice dtmf="6" next="news.vxml">news</choice>
      <choice dtmf="9" next="weather.vxml">weather</choice>
      <choice dtmf="7" next="sports.vxml">sports</choice>
    </menu>
• Note: real applications need error checking and timeouts to deal with user input errors. Special VoiceXML elements handle this: <noinput>, <nomatch>, etc.

VoiceXML Error Handling
• <noinput>: catches a noinput event when no input arrives within the timeout period
    <noinput>I'm sorry. I didn't hear anything. <reprompt/></noinput>
• <nomatch>: catches a nomatch event when the input doesn't match the specified grammar
    <nomatch>I didn't get that. <reprompt/></nomatch>
• <help>: executed when the user says "help"; can be made universal to the whole document or local to various parts
    <property name="universals" value="all"/>
    <help>
      <block>Now taking you to Customer Services.</block>
      <transfer name="services" bridge="true" connecttimeout="300"
                dest="phone://14088502255"/>
    </help>
• <property> can control platform features, for example how long the application waits for input → time out after 10 seconds
    <property name="timeout" value="10s"/>
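The DTMF menu and the error handlers above fit together as follows: a valid key press selects a choice, no key within the timeout raises noinput, an unknown key raises nomatch, and both lead to a reprompt. This Python sketch models a timeout as None. The function name run_dtmf_menu and the three-attempt limit are hypothetical; in VoiceXML itself the give-up policy is written by the application, for example with the count attribute on catch elements.

```python
from typing import Iterable, Optional

CHOICES = {"6": "news.vxml", "9": "weather.vxml", "7": "sports.vxml"}

def run_dtmf_menu(presses: Iterable[Optional[str]],
                  max_tries: int = 3) -> tuple[Optional[str], list[str]]:
    """Return (next document or None, events raised along the way).
    A press of None models a timeout with no input."""
    events: list[str] = []
    for _, key in zip(range(max_tries), presses):
        if key is None:
            events.append("noinput")  # "I'm sorry. I didn't hear anything."
        elif key not in CHOICES:
            events.append("nomatch")  # "I didn't get that."
        else:
            return CHOICES[key], events
        # Reaching this point corresponds to <reprompt/> and another try.
    return None, events

print(run_dtmf_menu([None, "3", "9"]))  # ('weather.vxml', ['noinput', 'nomatch'])
```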

Grammars
• A grammar specifies the natural-language words or phrases that will be matched
• A grammar can be included in the document, or the document can reference a separate file or a standard dictionary; several formats are available
    <grammar>
      ;GSL2.0
      ... grammar definition text ...
    </grammar>
    <grammar src="filename.gram" type="grammar type"/>
• Most VoiceXML portals specify a grammar type, for example one based on Nuance speech technology
    <grammar type="application/x-nuance-gsl">
      [ news weather sports ]
    </grammar>

VoiceXML Example
    <block>
      <prompt>This is the BeVocal calculator.</prompt>
    </block>
    <field name="op">
      <prompt>Choose add, subtract, multiply, or divide.</prompt>
      <grammar type="application/x-nuance-gsl">
        [add subtract multiply divide]
      </grammar>
      <help>Please say what you want to do. <reprompt/></help>
      <filled>
        <prompt>Okay, let's <value expr="op"/> two numbers.</prompt>
      </filled>
    </field>
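In Nuance GSL, square brackets list alternatives, so [add subtract multiply divide] accepts exactly one of those four words. This Python sketch handles only that flat form (no nesting, no parenthesized sequences, no weights), then does what the calculator dialog is setting up: it applies the recognized operation to two numbers. The function names are hypothetical. The slide's example stops after confirming the operation, so passing the two numbers directly is an assumption made for illustration.

```python
import operator

OPS = {"add": operator.add, "subtract": operator.sub,
       "multiply": operator.mul, "divide": operator.truediv}

def parse_flat_gsl(grammar: str) -> set[str]:
    """Parse a flat GSL alternative list such as '[ news weather sports ]'."""
    body = grammar.strip()
    if not (body.startswith("[") and body.endswith("]")):
        raise ValueError("only a flat [ ... ] alternative list is handled here")
    return set(body[1:-1].split())

def calculate(op_utterance: str, a: float, b: float) -> float:
    """Match the 'op' field against its grammar, then apply the operation."""
    grammar = parse_flat_gsl("[add subtract multiply divide]")
    op = op_utterance.strip().lower()
    if op not in grammar:
        raise ValueError("nomatch")  # a real dialog would reprompt instead
    return OPS[op](a, b)

print(calculate("Multiply", 6, 7))  # 42
```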

VoiceXML Applications
• Pros
  – Easy to develop and implement; no service provider needed
  – Several hosting services available: BeVocal, Tellme, VoiceGenie, etc.
  – Easy to use and cost effective (according to Goldman Sachs, on average $3 per call if human assisted vs. $0.20 per call if automated)
  – Easy to upgrade/modify
• Cons
  – Dialogs must be carefully constructed, or users get frustrated
  – Non-uniform grammars and document types can lead to cross-platform problems

VoiceXML Example
    <?xml version="1.0" encoding="UTF-8"?>
    <vxml version="2.0" xmlns="http://www.w3.org/2001/vxml"
          xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
          xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd">
      <form>
        <property name="bargein" value="true"/>
        <block>
          <prompt>Welcome to Mad Libs. Press the pound key after you say each word.</prompt>
        </block>
        <record name="one" beep="true" maxtime="5s" finalsilence="4000ms"
                dtmfterm="true" type="audio/x-wav">
          <prompt timeout="5s">Say a verb.</prompt>
          <noinput>I didn't hear anything, please try again.</noinput>
        </record>
        <record name="two" beep="true" maxtime="5s" finalsilence="4000ms"
                dtmfterm="true" type="audio/x-wav">
          <prompt timeout="5s">Say a noun.</prompt>
          <noinput>I didn't hear anything, please try again.</noinput>
        </record>
        <!-- … records "three" through "twelve" not shown … -->

VoiceXML Example (cont.)
        <block>
          <prompt>
            To be, or not to <audio expr="one"/> that is the <audio expr="two"/>
            Whether 'tis nobler in the <audio expr="three"/> to suffer
            the slings and <audio expr="four"/> of <audio expr="five"/> fortune,
            Or to take <audio expr="six"/> against a sea of <audio expr="seven"/>,
            And by <audio expr="eight"/> end them. To die, to <audio expr="nine"/>
            No more; and by a <audio expr="nine"/> to say we end
            the <audio expr="ten"/> and the <audio expr="eleven"/> natural shocks
            that flesh is <audio expr="twelve"/>
          </prompt>
        </block>
      </form>
    </vxml>

Markup Language Future
• Multi-modal markup languages have been proposed to combine features
  – For example, the X+V (XHTML + Voice) language, proposed to the W3C by Motorola, Opera Software ASA, and IBM
