Virtual Human Toolkit Tutorial - Institute For Creative Technologies

Transcription

Virtual Human Toolkit Tutorial
Adam Reilly & Wendy Whitcup
October 2018

The work depicted here was sponsored by the U.S. Army. Statements and opinions expressed do not necessarily reflect the position or the policy of the United States Government, and no official endorsement should be inferred.

Virtual Humans Goal

Create compelling characters that can engage users in meaningful and realistic social interactions.

Objectives

To enable digital characters that:
- Are autonomous
- Fully perceive their environment
- Interact naturally, both verbally and nonverbally
- Model their own and others' beliefs, desires and intentions
- Exhibit emotion
- Are efficient, coherent and integrated

Related Work – Capabilities

- Automated Speech Recognition: Sphinx (CMU)
- Perception: Computer Expression Recognition Toolbox (CERT) (Littlewort et al., 2011)
- Task Modeling: DTask (Bickmore et al., 2009)
- Natural Language Generation (Stone, 2003)
- Text-To-Speech: Festival (Taylor et al., 1998)

Related Work – Frameworks

- SAIBA Framework
  - Function Markup Language (FML) (Heylen et al., 2008)
  - Behavior Markup Language (BML) (Kopp et al., 2006)
- BML Realizers
  - LiteBody (Bickmore et al., 2009)
  - Greta (Poggi et al., 2005)
  - Elckerlyc (Van Welbergen et al., 2010)
  - EMBR (Heloir & Kipp, 2009)
  - SmartBody (Shapiro, 2011)

Integration

Research opportunities:
- Which integrated abilities are essential for which types of interaction?
- Which are desired?
- Which minimal and preferred dependencies exist between them?
- How do abilities affect each other?
- How can they best leverage each other to increase effectiveness?

This is a larger effort involving all VH and other ICT groups, as well as the wider research and development community (BML, PML, etc.).

Virtual Human Architecture (diagram; visible component: Renderer)

Twins Architecture – Question/Answering Agent (diagram components: User, NPCEditor, Prerecorded Speech, SmartBody, Gamebryo)

Rapport Architecture – Virtual Listener (diagram components: User, Rule-Based Action Selector, SmartBody, Unity)

SimSensei (diagram components: PocketSphinx/BBN, Prerecorded Speech, SmartBody, Unity)

Gunslinger (two image slides)

Monticello: VR

Monticello: AR

VHToolkit

Virtual Human Toolkit Overview

Offers components that cover:
- Audio-visual sensing
- Automated speech recognition
- Natural language processing
- Nonverbal behavior generation
- Behavior realization
- Text-to-speech
- Rendering

Integrated as part of a modular, flexible architecture that allows mixing and matching with one's own technologies.

https://vhtoolkit.ict.usc.edu

Toolkit – Main Modules

- Audio-Visual Sensing: MultiSense
- Natural Language Processing: NPCEditor
- Nonverbal Behavior Generation: NVBG
- Behavior Realization: SmartBody

Toolkit – Remaining Components

- ASR: AcquireSpeech, with PocketSphinx (also Google, AT&T)
- TTS: Festival, MS SAPI, CereVoice
- Renderer: Unity, Ogre
- ActiveMQ-based messaging (VHMsg)
- Authoring tools
- Debug tools
- Supporting libraries

Toolkit Implementation (framework diagram)

- Messages: vrPerception (PML), vrExpress (FML, BML), vrSpeak (BML), vrSpeech, RemoteSpeechCmd, RemoteSpeechReply, nvb parser, nvb result
- Components: Human User, MultiSense (GAVAM, CLM, FAAST), AcquireSpeech (PocketSphinx, Google ASR, AT&T ASR), NPCEditor, NVBG, NVB Parser, TTSRelay (Festival, MS SAPI), SmartBody DLL, Unity
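The Toolkit's modules exchange plain-text messages (vrSpeak, vrExpress, vrSpeech, etc.) over ActiveMQ via VHMsg. A minimal sketch of composing and parsing such messages, assuming a simple "type followed by body" text convention; the helper names are illustrative and not the actual VHMsg API, which also wraps the ActiveMQ transport:

```python
# Sketch of VHMsg-style plain-text messages: "<type> <body>".
# compose/parse are illustrative helpers; the real VHMsg library
# additionally handles publishing/subscribing over ActiveMQ.

def compose(msg_type: str, *args: str) -> str:
    """Join a message type and its arguments into one text message."""
    return " ".join((msg_type, *args))

def parse(message: str) -> tuple[str, str]:
    """Split a message into its type (first token) and body (the rest)."""
    msg_type, _, body = message.partition(" ")
    return msg_type, body

msg = compose("vrSpeak", "brad", "all", "<bml><speech>Hello</speech></bml>")
kind, body = parse(msg)
# kind == "vrSpeak"; body carries the addressee and BML payload
```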

Main Modules

The Multimodal Challenge: From Signals to User State

- Audio signals: voice pitch
- Visual signals: head pose
- Behaviors/gestures: head nod, pitch accent
- Perceived user state: distress, engagement, frustration

Human communication dynamics span the verbal, auditory and visual channels.

MultiSense: Multimodal Perception Library (Scherer et al., 2012)

- Providers: audio capture, webcam capture, mouse, Kinect
- Transformers: face trackers (GAVAM, CLM/CLM-Z, Okao, Shore), real-time hCRF, EmoVoice
- Consumers: Image Painter, Signal Painter, ActiveMQ – VH Messenger

Perception Markup Language example, sensing layer:

  <person id="subjectA">
    <sensingLayer>
      <headPose>
        <position x="193" y="345" z="223"/>
        <rotation rotX="10" rotY="35" rotZ="15"/>
        <confidence>0.34</confidence>
      </headPose>
    </sensingLayer>
  </person>

Behavior layer:

  <person id="subjectB">
    <behaviorLayer>
      <behavior>
        <type>attention</type>
        <level>high</level>
        <value>0.6</value>
        <confidence>0.46</confidence>
      </behavior>
    </behaviorLayer>
  </person>
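A consumer of these PML messages would parse the XML and extract the tracked values. A short sketch using Python's standard XML parser on the sensing-layer fragment from the slide; the element and attribute names follow the slide text as reconstructed here and may not match the full PML schema:

```python
# Parse the PML-style sensing-layer fragment from the slide (reconstructed;
# names follow the slide text, not necessarily the full PML schema).
import xml.etree.ElementTree as ET

pml = """
<person id="subjectA">
  <sensingLayer>
    <headPose>
      <position x="193" y="345" z="223"/>
      <rotation rotX="10" rotY="35" rotZ="15"/>
      <confidence>0.34</confidence>
    </headPose>
  </sensingLayer>
</person>
"""

root = ET.fromstring(pml)
pose = root.find("./sensingLayer/headPose")
position = {k: float(v) for k, v in pose.find("position").attrib.items()}
confidence = float(pose.findtext("confidence"))
print(root.get("id"), position, confidence)
```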

NPCEditor (Leuski & Traum, 2008)

- Statistical text classifier
- Authoring GUI
- Basic dialogue management
- Groovy scripting

NPCEditor

Given a novel question such as "what do you do here, girls?", the classifier ranks similar questions, relevant answers and relevant words in order to select an appropriate answer.
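The ranking idea can be sketched with a deliberately simplified stand-in: score each stored question against the novel question by lexical similarity and return the answer paired with the best match. NPCEditor actually uses a statistical cross-language relevance model, not plain cosine similarity, and the question-answer pairs below are invented for illustration:

```python
# Toy stand-in for NPCEditor's question ranking: cosine similarity over
# word counts. The QA pairs are invented; the real system uses a
# statistical cross-language relevance model.
from collections import Counter
import math

QA = [
    ("what do you do here", "We teach soldiers to ask good questions."),
    ("where are you from", "I grew up in a small town."),
]

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def answer(novel: str) -> str:
    """Return the answer whose stored question best matches the novel one."""
    bag = Counter(novel.lower().replace(",", " ").replace("?", " ").split())
    best = max(QA, key=lambda qa: cosine(bag, Counter(qa[0].split())))
    return best[1]

print(answer("what do you do here, girls?"))
# -> We teach soldiers to ask good questions.
```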

NVBG (Lee & Marsella, 2006)

Pipeline: surface text from the cognitive module (reasoning, emotion, language) is run through a natural language parser; the resulting parse tree (cached) is matched against function rules and NVB rules, producing a behavior description that is sent to behavior realization / the animation controller.

- 26 speech-related rules
- 29 rules for gaze behaviors
- 20 rules for listener reactions
- Other rules: negotiation stance, idle behaviors

Rules are modifiable and extensible for different projects, animations, characters, etc.

NVBG Rules

Rules mapped to semantic structure:

  Function         Derivation (trigger words)                        Behavior
  Negation         No, not, nothing, cannot, none                    Head shakes on phrase
  Intensification  Really, very, quite, great, absolutely, gorgeous  Head nod and brow frown on word
  Affirmation      Yes, yeah, I do, We have, It's true, OK           Head nods and brow raise on phrase
  Inclusivity      Everything, all, whole, several, plenty, full     Lateral head sweep and brow flash on word

Rules mapped to syntactic structure:

  Structure               Function    Behavior
  NP (noun phrase)        Life-like   Head nod at the start of noun phrase
  First VP (verb phrase)  Life-like   Head nod and beat gesture on start of the first verb phrase
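The semantic rules in the table above amount to keyword-triggered behaviors. A minimal sketch of that rule matching, using the slide's own trigger words; the single-word matching here is a simplification of NVBG's parse-tree-based rules:

```python
# Sketch of NVBG-style lexical rules: map trigger words from the table
# to a nonverbal behavior. Matching on single words is a simplification
# of NVBG's parse-tree-based rule application.

RULES = [
    ({"no", "not", "nothing", "cannot", "none"},
     "head shake on phrase"),
    ({"really", "very", "quite", "great", "absolutely", "gorgeous"},
     "head nod and brow frown on word"),
    ({"yes", "yeah", "ok"},
     "head nods and brow raise on phrase"),
    ({"everything", "all", "whole", "several", "plenty", "full"},
     "lateral head sweep and brow flash on word"),
]

def behaviors(utterance: str) -> list[str]:
    """Return the behaviors triggered by words in the utterance, in rule order."""
    words = set(utterance.lower().replace(",", " ").split())
    return [behavior for triggers, behavior in RULES if words & triggers]

print(behaviors("No, I really cannot do that"))
# -> ['head shake on phrase', 'head nod and brow frown on word']
```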

NVBG

Inputs carry information on cognitive state, the listener's reaction and negotiation stance:

  Input message                                  Function                        Behavior
  Status: speaker & planning speech              Hold conversation turn          Gaze aversion from addressee
  Status: speaker and seek social support        Show desire for social support  Glance at other entity
  Status: listener and speech comprehension low  Show confusion                  Eyebrow lowered, head tilt
  Status: listener and emotion surprise          Show emotional reaction         Eyebrow raised and pulled together, eyelids open
  Negotiation stance: delay                      Show defensive stance           Change posture to crossed arms on chest

SmartBody

Controllers generate motion via:
- Procedural algorithms
- Example data
- Speech

Some controllers are used in a hierarchy, ranging from generalized to specialized movements: posture, gestures, emotion, head movements, gazing, eye movements, reaching/touching, locomotion, steering, lip syncing, breathing.

Control is via BML, e.g. <gaze target="john"/>.
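A module driving SmartBody would emit BML blocks like the slide's gaze example. A small sketch that builds such a block with Python's standard XML library; the wrapping <act> element and the helper name are assumptions for illustration, with only <gaze target="..."/> taken from the slide:

```python
# Sketch of emitting a BML command like the slide's <gaze target="john"/>.
# The wrapping <act> element is an assumption for illustration; only the
# gaze element itself comes from the slide.
import xml.etree.ElementTree as ET

def bml_gaze(target: str) -> str:
    """Build a minimal BML block containing a gaze request."""
    act = ET.Element("act")
    bml = ET.SubElement(act, "bml")
    ET.SubElement(bml, "gaze", target=target)
    return ET.tostring(act, encoding="unicode")

print(bml_gaze("john"))
```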
