PETEEI: A PET With Evolving Emotional Intelligence

Transcription

PETEEI: A PET with Evolving Emotional Intelligence

Magy Seif El-Nasr, Thomas R. Ioerger, John Yen
Texas A&M University, Computer Science Department
College Station, TX 77844-3112
magys@cs.tamu.edu, ioerger@cs.tamu.edu, yen@cs.tamu.edu

ABSTRACT

The emergence of what is now called 'emotional intelligence' has revealed yet another aspect of human intelligence. Emotions have been shown to have a major impact on many of our everyday functions, including decision-making, planning, communication, and behavior. AI researchers have recently acknowledged this major role that emotions play, and thus have begun to incorporate models for simulating emotions into agents. However, the emotional process is not a simple process; it is often linked with many other processes, one of which is learning. It has long been emphasized in psychology that memory and experience help shape the dynamic nature of the emotional process. In this paper, we introduce PETEEI (a PET with Evolving Emotional Intelligence). PETEEI is based on a fuzzy logic model for simulating emotions in agents, with a particular emphasis on incorporating various learning mechanisms so that an agent can adapt its emotions according to its own experience. Additionally, PETEEI is designed to recognize and cope with the various moods and emotional responses of its owner. A user evaluation experiment indicated that simulating the dynamic emotional process through learning provides a significantly more believable agent.

1. INTRODUCTION

Artificial Intelligence (AI) and psychology researchers have long been concerned with defining intelligence and finding a way to simulate it. Many theories of intelligence have been formulated; however, very few included emotions. In his book Frames of the Mind, Howard Gardner describes the concept of Multiple Intelligence [9]. He divided intelligence into six types: linguistic, musical, logical-mathematical, spatial, bodily-kinesthetic, and personal intelligence. Accordingly, a person's intelligence can vary along these dimensions. Gardner's theory is important because, by including personal intelligence, it incorporated the social and emotional capabilities that people possess, which then led to the rise of what is called the 'emotional intelligence' theory [10]. The importance of emotions in the theory of human intelligence has recently been strengthened through neurological evidence presented by Damasio [4].

As a result, many researchers within the agents and AI field have begun to develop computational models of emotions. Simulating emotional intelligence in certain types of computer programs is important. Computational models of emotions are very useful to many applications, including personal assistance applications, training simulations, intelligent interfaces, and entertainment applications.

Recognizing the importance of the emotional process in the simulation of life-like characters [6], a number of computational models of emotions have been proposed within the agents community [33, 23]. An important effort in this area is the OZ project [2, 25] at CMU. The OZ project simulates believable emotional and social agents; each agent initially has preset attitudes towards certain objects in the environment. Furthermore, each agent has some initial goals and a set of strategies that it can follow to achieve each goal. The agent perceives an event in the environment.
This event is then evaluated according to the agent's goals, standards, and attitudes. After an event is evaluated, special rules are used to produce an emotion with a specific intensity. These rules are based on Ortony et al.'s event-appraisal model [21]. The emotions triggered are then mapped, according to their intensity, to a specific behavior. This behavior is expressed in the form of text or animation [25].

Even though Oz's model of emotions has many interesting features, it has some limitations too. An emotion is triggered only when an event affects the agent's standards, attitudes or the success of its goals. The notion of partial success and failure is missing, since typically an event does not cause a goal to fully succeed or fully fail. Additionally, some emotions, such as hope and relief, may never be simulated since they rely on expectations.

In order for an agent to simulate a believable emotional experience, it will have to dynamically change its emotions through its own experience. We have developed a new model for an agent that can simulate this dynamic nature of emotions through learning. In this paper, we will present our model and discuss how different learning mechanisms are used to simulate the dynamic aspect of the emotional process. Additionally, we will report the results of some user evaluation experiments of a system we implemented with this model, called PETEEI.

2. COMPUTATIONAL EMOTION

In introducing our model, we will first describe the different learning mechanisms that were implemented in the model. In light of the learning mechanisms involved, we will describe how they influence or facilitate the emotional process. In this way the reader will better understand the specifics of the model.

2.1 Learning

We simulated four different learning mechanisms: (1) learning about event sequences and possible rewards, (2) learning about the user's actions and moods, (3) learning about which actions are pleasing to the user and which are not, and (4) Pavlovian S-R conditioning. These learning mechanisms will be detailed in the next paragraphs, giving examples of their effects on the emotional process when appropriate.

2.1.1 Learning about Events

The agent needs to know what events to expect, how likely they are to occur, and how bad or good they are. This information is crucial for creating emotions. Consider the following: You are a student in a classroom. Your professor is giving out exam grades. You know that you did really badly, so you are expecting a bad grade. Your professor gives you your exam back, and surprisingly, you got a very good grade. Now, let's look at your emotional state. Before you got the exam, you felt some fear, because your goal of passing the class was threatened by the expectation of a bad grade. After you got the exam back, you realized that you got a good grade, and therefore your fear gradually changes to relief. As you can see, expectation played a very important role here in creating and further revising your emotions with the events that are occurring around you.

The example discussed above was described in a very simplistic way. As a matter of fact, many factors are influencing your emotional states. When an event occurs or when you think an event is about to occur, a desirability measure is calculated relative to that specific event [21]. This desirability measure will typically depend on the event's impact on a set of goals. It is often the case that a given event does not have any impact on any specific goal directly, but some sequence of events may eventually have an impact on some goals. Thus, in order to measure the desirability of a specific event, we will need to identify the link between an event or a sequence of events and the corresponding goals, which has been noted to be a very complex task [24]. The agent can potentially learn this link or sequence of links by using a reinforcement learning algorithm [11, 19]; we will briefly outline one reinforcement algorithm, namely the Q-learning algorithm. The reader is referred to [11, 19] for more details.
It is often the case that an agent does not know the consequences of a given action until a complete sequence of actions ends. The agent, therefore, faces the problem of temporal credit assignment, which is defined as determining which of the actions in its sequence are responsible for producing the eventual rewards. To illustrate the solution that reinforcement learning offers to this problem, we will look at reinforcement learning in more detail. The agent represents the problem space using a table of Q-values in which each entry corresponds to a state-action pair. Optimal Q-values are supposed to represent the expected payoff in the long run if a specific action is taken in a specific state. The table can be initially filled with random values. The agent will begin from a state s. It will take an action, a, which takes it to a new state s'. The agent may obtain a reward, r, for its action [19]. If it receives a reward, it updates the table using the following formula [19]:

Q(s,a) \leftarrow r + \gamma \max_{a'} Q(s',a')

where r is the immediate reward, \gamma is a discount factor, s' is the new state, and a' is an action from the new state s'. Thus, the Q-value of the previous state-action pair depends on the Q-value of the new state-action pair.

This algorithm is guaranteed to converge only for deterministic Markov Decision Processes (MDPs). (A Markov Decision Process refers to an environment in which rewards and transition probabilities for each action in each state depend only on that state, and are independent of the previous actions and states.) However, in our application there is a one-to-one alternation between the agent and the user. It is almost impossible to predict the reward given a state and an action, because it may change according to the user's actions and environment. For example, at a state s0 the agent may do an action x and as a consequence the user gives it a positive reward. At a later time step, the agent is in the same state s0, so it takes action x again, but this time the user rewards it negatively. Therefore, the user introduces a non-deterministic response to the agent's actions. In this case, we treated the reward as a probability distribution over outcomes based on the state-action pair. Figure 1 illustrates this process in more detail. The agent starts off in state s0 and takes an action a0; then the user can take either action a1, which puts the agent in state s1, or action a2, which puts the agent in state s2. The dotted lines show the nondeterminism induced by the user's actions, while the straight black lines show the state-action transitions as represented in the table. We use another learning model, which will be detailed in the next section, to determine the probability of the user's actions.

Figure 1. Nondeterministic Rewards

We used a new formula, stated in [19], to calculate the Q-value for a non-Markovian model. The formula is as follows:

Q_n(s,a) = E[r(s,a)] + \gamma \sum_{s'} p(s'|s,a) \max_{a'} Q(s',a')

where p(s'|s,a) is the probability of reaching a new state s' if action a is taken in state s, and E[r(s,a)] is the expected amount of reward [19].

At any given time, the agent will be faced with different actions to take, with the possibility of different outcomes and different rewards. The formula described above gives the maximum expected reward given that the agent is at a particular state.

Emotions and moods have a great impact on human decisions and event evaluation [4, 10]. The mood is thus used in the model to guide the expectation values of the next state, s', given that the agent is in a state s. Instead of calculating the Q-value of a state by maximizing the reward, the mood is used as an averaging factor over the new states. As noted in [3], when the agent is in a positive mood it will tend to expect positive events to occur, and otherwise it will expect negative events to occur. Thus, we refined the expectation mechanism to capture this phenomenon: if the agent's mood is positive, then the agent will expect desirable events with a degree β more than the undesirable events, and vice versa. We modified the equation above to include the β value: for all the outcomes that match the mood, the value is incremented by a degree β; otherwise the previous equation holds.
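The update rules above can be made concrete with a small tabular sketch. The Python fragment below is a minimal interpretation rather than PETEEI's actual code: the names Q, q_update and expected_value are ours, and the way the mood bias (beta) is applied, by over-weighting mood-congruent next states, is one possible reading of the refinement just described.

from collections import defaultdict

GAMMA = 0.9  # discount factor (gamma in the formulas above)

# Q[(state, action)] -> estimated long-run payoff for that state-action pair
Q = defaultdict(float)

def q_update(state, action, reward, next_state, actions):
    # Deterministic Q-learning update: Q(s,a) <- r + gamma * max_a' Q(s',a')
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = reward + GAMMA * best_next

def expected_value(state, action, expected_reward, transition_probs, actions, mood_bias=0.0):
    # Non-deterministic target: E[r(s,a)] + gamma * sum_s' p(s'|s,a) * max_a' Q(s',a').
    # A positive mood_bias (beta) over-weights desirable next states, a negative one
    # over-weights undesirable ones -- our reading of the mood refinement.
    total = 0.0
    for next_state, prob in transition_probs.items():
        best_next = max(Q[(next_state, a)] for a in actions)
        weight = prob
        if mood_bias > 0 and best_next >= 0:      # good mood, desirable outcome
            weight *= 1.0 + mood_bias
        elif mood_bias < 0 and best_next < 0:     # bad mood, undesirable outcome
            weight *= 1.0 - mood_bias             # mood_bias < 0, so the weight grows
        total += weight * best_next
    return expected_reward + GAMMA * total

# Hypothetical usage: the user rewarded 'beg' in the kitchen, leading to 'kitchen_fed'
actions = ['beg', 'play_with_ball', 'ignore_ball']
q_update('kitchen', 'beg', 1.0, 'kitchen_fed', actions)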

We will take an example to illustrate this idea. Suppose the agent is trying to decide between two actions, illustrated in Figure 2. If it plays with the ball, then there is a 70% chance that it will end up in state s2, which has a max Q-value of 2.5, but there is also a 30% chance of ending up in state s1, which has a max Q-value of -1.5. If it ignores the ball, then there is an equal chance of getting to state s3, which has a max Q-value of -3, or state s4, which has a max Q-value of 5. Thus, if we use the regular probability calculation, the action Ignore(Ball) will have a value of 1 and PlayWith(Ball) will have a value of 1.3, so the agent will play with the ball. However, if we take the mood into account, the calculation will be different, because the agent will no longer expect all outcomes equally. If, for example, the agent is in a good mood, it will expect positive rewards with a degree β more than the negative rewards. If we set β to 50%, then PlayWith(Ball) will have a value of 2.7 and Ignore(Ball) will have a value of 3. By considering the mood, the agent will tend to favor Ignore(Ball) over PlayWith(Ball), which was considered more desirable using the plain probability calculation.

Figure 2. An Example of Reinforcement Learning

2.1.2 Learning about User

Since the agent is interacting with the user, the agent will have to learn about the user's patterns of actions. In our model, we develop a probabilistic model of the user by keeping track of sequences of actions that the user takes, which we call patterns. We focused on patterns of length three, i.e. the user did action a1, then action a2, and finally action a3. This concept can be further illustrated using the pet and owner example. Suppose the owner goes into the kitchen (a1), takes out the pet's food from the cupboard (a2) and feeds the pet (a3). These three consecutive actions led to the pet being fed. Thus, if the owner goes to the kitchen again, the pet would probably expect to be fed to a certain degree. As can be seen from the example, learning sequences of actions can be very useful in predicting other people's actions. Considering the simple behavior of the pet we are trying to model, we felt that conditioning probabilities on the prior two actions was sufficient.

To keep track of the different patterns of the user's actions, we used a three-dimensional table of action counts. For example, if action a0 is followed by action a1, followed by action a2, then the entry pattern[a0, a1, a2] represents the count of the action a0 followed by the action a1, followed by the action a2. Using this table we can calculate the probability of a given action occurring according to the history of actions that have already occurred. For example, suppose we know that actions x and y occurred at times t1 and t2, respectively. If pattern[x, y, z] > 0, then we calculate the probability of z occurring using the following formula:

P(z \mid x, y) = \frac{pattern[x,y,z]}{\sum_i pattern[x,y,i]}

Otherwise, if the number of observations is too scarce, then we use the following formula, conditioned on only one prior action:

P(z \mid y) = \frac{\sum_i pattern[i,y,z]}{\sum_i \sum_k pattern[i,y,k]}

Otherwise, we use the following formula, which is just the prior probability of z:

P(z) = \frac{\sum_j \sum_k pattern[j,k,z]}{\sum_j \sum_k \sum_i pattern[j,k,i]}

Using this probability, the agent can calculate what actions to expect from the user, and it can further calculate a desirability measure using the Q-table discussed in Section 2.1. Using the desirability and expectation measures, the agent will be able to create emotions, as we will discuss later.
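A count-based sketch of the pattern table and the three-level back-off just described is shown below; the function names and the explicit list of candidate user actions are our assumptions, but the returned probabilities mirror the three formulas above.

from collections import defaultdict

# pattern[(a0, a1, a2)] counts how often the user did a0, then a1, then a2
pattern = defaultdict(int)

def observe(a0, a1, a2):
    # record one observed three-action sequence of user actions
    pattern[(a0, a1, a2)] += 1

def prob_next(x, y, z, actions):
    # P(z | x, y), backing off from two prior actions to one, then to the prior of z
    if pattern[(x, y, z)] > 0:
        return pattern[(x, y, z)] / sum(pattern[(x, y, i)] for i in actions)
    num = sum(pattern[(i, y, z)] for i in actions)
    if num > 0:
        return num / sum(pattern[(i, y, k)] for i in actions for k in actions)
    total = sum(pattern.values())
    return sum(pattern[(j, k, z)] for j in actions for k in actions) / total if total else 0.0

# Hypothetical usage: after the owner repeatedly goes to the kitchen, opens the
# cupboard and feeds the pet, feeding becomes highly expected after the first two actions.
for _ in range(3):
    observe('go_to_kitchen', 'open_cupboard', 'feed_pet')
acts = ['go_to_kitchen', 'open_cupboard', 'feed_pet', 'leave']
print(prob_next('go_to_kitchen', 'open_cupboard', 'feed_pet', acts))  # -> 1.0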
2.1.3 Learning about pleasing & displeasing actions

External feedback was used to learn which actions are pleasing or displeasing to the user. In some situations, the user can give the agent an assessment of its actions by making remarks such as 'good boy' or 'shame on you.' Through this learning scheme, the agent identifies the most recent action it performed as the action that was evaluated by the user. Temporal credit assignment is not a problem here; we assumed that standards, or learning about pleasing and displeasing actions, may in fact rely on one action rather than a sequence of actions generalized into one global action. Therefore, we used this learning scheme to build up the agent's values, taking the user's feedback to refer to the most recent action done by the agent.

The agent keeps the user's remarks together with the action and situation that preceded them. It can then evaluate its own actions in terms of a function Quality(x, s) = v, where x is an action that happened in a particular situation s, and v is the value given to the action-situation pair. When given feedback, the agent searches its knowledge base for the action-situation pair. If the pair is not found, a value v0 is assigned to the new action-situation pair. However, if the action-situation pair is found and the feedback matches the Quality that was documented, then the pair is reinforced (incremented); otherwise the agent becomes confused and decrements the pair.

The utility of this learning mechanism is twofold. Firstly, it creates the baseline by which the agent can set its own standards. These standards are then used to determine emotions like pride or shame. In addition, these standards are used to evaluate the user's actions and trigger emotions, including admiration and reproach. The intensities of these emotions are calculated as a function of the value, v, of the learnt rule. Secondly, at times the agent may feel guilty or may want to please the user to get something. This technique is widely used in relationships: whenever a partner feels guilty, he/she takes on a submissive role and will go out of his/her way to do something for the other partner to make up for what he/she did wrong [8].
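The Quality(x, s) bookkeeping can be sketched as follows; the initial value v0, the step size and the names are illustrative assumptions, while the reinforce/decrement behavior follows the description above.

V0 = 1.0    # magnitude assigned to a newly observed action-situation pair
STEP = 1.0  # how much consistent feedback reinforces a stored value

# quality[(action, situation)] -> signed value v; positive means 'pleasing'
quality = {}

def feedback(action, situation, pleased):
    # apply a remark such as 'good boy' (pleased=True) or 'shame on you' (pleased=False)
    # to the agent's most recent action in the given situation
    key = (action, situation)
    sign = 1.0 if pleased else -1.0
    if key not in quality:
        quality[key] = sign * V0                   # new pair: assign v0 with the observed sign
    elif (quality[key] > 0) == pleased:
        quality[key] += sign * STEP                # feedback agrees: reinforce the pair
    else:
        quality[key] -= STEP if quality[key] > 0 else -STEP  # conflict: 'confused', decrement

# Pride/shame or admiration/reproach intensities can then scale with abs(quality[key]).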

2.1.4 Pavlovian Conditioning

Associating an object directly with an emotion forms yet another type of learning. For example, if the agent experienced pain when an object, g, touched it, then an emotion of pain will be associated with the object g. This kind of learning does not depend on the situation per se, but on the object-emotion association. This is especially true in the taste-aversion paradigms of Pavlovian conditioning, where a rat would associate sweetness with poison after some trials in which the rats were injected with poison subsequent to eating sweets [5].

Each of these associations has an accumulator, which is incremented by the repetition and intensity of the object-emotion occurrence. This type of learning provides the agent with another type of expectation, triggered by the object rather than the event. Using the count and the intensity of the emotion triggered, the agent can calculate the expected intensity of the emotion. We used the formula shown below:

I(e, o) = \frac{1}{n_o} \sum_{i \in events(o)} I_i(e)

where events(o) are the events that involve the object o, I_i(e) is the intensity of emotion e in event i, and n_o is the total number of events involving the object o. In essence, the formula averages the intensity of the emotion over the events in which the object o was present.
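The object-emotion averaging above lends itself to a simple incremental form; the per-(object, emotion) count and running intensity sum below is our own data layout, not necessarily PETEEI's.

from collections import defaultdict

# assoc[(object_name, emotion)] -> [number of events n_o, summed intensity]
assoc = defaultdict(lambda: [0, 0.0])

def condition(object_name, emotion, intensity):
    # record one event in which the object co-occurred with the emotion
    entry = assoc[(object_name, emotion)]
    entry[0] += 1
    entry[1] += intensity

def expected_intensity(object_name, emotion):
    # (1 / n_o) * sum_i I_i(e): average intensity over all events involving the object
    count, total = assoc[(object_name, emotion)]
    return total / count if count else 0.0

# Hypothetical usage: after two painful encounters with object g, seeing g alone
# triggers an expected pain intensity of 0.75
condition('g', 'pain', 0.5)
condition('g', 'pain', 1.0)
print(expected_intensity('g', 'pain'))  # -> 0.75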
2.2 Simulation of Emotion

So far we have described how learning occurs in the agent's model. Although we have given some examples of how the different learning mechanisms could potentially produce emotions, we have not yet described the process of emotion simulation in detail. We will now detail the emotional process and link it back to the learning process when applicable.

Figure 3 depicts the emotional process and outlines where the four learning mechanisms are used within it. The four learning mechanisms are shown in the figure as shaded ovals, and the information passed between the learning and emotional models is shown as boxes. As the figure shows, learning about event sequences and the rewards/costs associated with each sequence is used for two purposes. Firstly, it determines the desirability measure of a sequence of events given its rewards and costs. Secondly, it gives the agent the ability to predict the likelihood of certain events occurring given that a sequence of events has already occurred. Learning the user's actions also enables the agent to predict, with a certain probability, the next user action. Thus, suppose the agent is in situation s at time t0, it is expecting n events to occur, and y events have already occurred at time t0. According to the two learning mechanisms mentioned above, the agent will produce a desirability measure for the y events and the n expected events. These events, along with their desirability measures, are passed to an emotion generation process, as shown in Figure 3. Additionally, the expectation measure (or likelihood of occurrence) of the n expected events is also passed to the emotion generation process.

Figure 3. The Emotional/Learning Processes

Emotion is generated according to the expectation of certain events occurring and the desirability of those events. We used rules to simulate the emotion generation process; these rules were taken from [21]. For example, joy was described as the occurrence of a desirable event, and hope was described as the occurrence of an unconfirmed desirable event. Therefore, as can be seen, the emotion of hope will use the expectation and desirability measures. Hope's intensity is then calculated as a function of these measures. On the other hand, joy is triggered using the desirability measure alone, but its intensity is calculated as a function of the desirability and the expectation of the event at the earlier time step.

Additionally, emotions can also be generated from the Pavlovian conditioning, where a single event or an object is linked in memory with a single emotion; thus, if the object or event is presented to the agent, the agent will tend to automatically trigger the emotion associated with that object or event.

2.3 Agent's Behavior

The agent's behavior depends on its emotional and motivational states and on the current situation or the event perceived. We used fuzzy rules to relate sets of emotions to corresponding behaviors. For example, consider the following rule:

IF Anger IS High AND Taken-away(dish) THEN behavior IS Bark-At(user)

The behavior Bark-At(user) depends on what the user did and on the emotional intensity of the agent. If the user did not take the dish away and the agent was angry for some other reason, it would not be inclined to bark at the user (although it might in some situations), because the user might not be the cause of its anger. Thus, it is important to identify both the event and the emotion. It is equally important to identify the cause of the event. In this case, we are assuming that non-environmental events such as Taken-away(dish), Throw(ball), etc. are all caused by the user.
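For illustration only, the rule above could be evaluated with a standard min-based fuzzy AND, as sketched below. PETEEI's actual membership functions, rule base and defuzzification are not given in the text, so the shape of the 'High' membership function and the event handling here are invented.

def anger_is_high(intensity):
    # toy membership function for 'Anger is High' on a 0..1 intensity scale (assumed shape)
    return max(0.0, min(1.0, (intensity - 0.4) / 0.4))

def rule_bark_at_user(anger, dish_taken_by_user):
    # IF Anger IS High AND Taken-away(dish) THEN behavior IS Bark-At(user)
    # the fuzzy AND is taken as min; the triggering event must be attributed to the user
    event_degree = 1.0 if dish_taken_by_user else 0.0
    return min(anger_is_high(anger), event_degree)

# Barking is only supported when the user actually caused the triggering event:
print(rule_bark_at_user(anger=0.9, dish_taken_by_user=True))   # strong activation
print(rule_bark_at_user(anger=0.9, dish_taken_by_user=False))  # 0.0, no barking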

Furthermore, the agent's behavior is not solely determined by its emotional state; it also depends on some predefined social norms that define and limit the expressiveness of its behavior [1, 27]. For example, if the user is in a very bad mood and the user is considered a good friend, the agent will not tend to be very aggressive in its behavior. Hence, if its emotional state is dominated by anger, and the rules described above suggest an aggressive behavior, this anger and aggressive behavior will be suppressed. Moreover, if the user is perceived as happy and the agent needs food or drink, then the agent will go into a submissive role and will act on actions that it has learned to be pleasing to the user, as previously mentioned.

These rules of behavioral or social norms are preset. Typically, these preset behavioral rules will tend to shape the personality of the agent. Therefore, by incorporating personality and self-presentation, the agent would tend to change these preset rules. Nevertheless, we kept this issue as a future enhancement.

3. SIMULATION AND RESULTS

In this section, we will describe how the model presented in the previous section was actually implemented. We chose to model emotions in a pet, because pets are simpler than humans. They do not require sophisticated planning and their goal structure is much simpler than humans'. Furthermore, to evaluate the believability of an agent's behavior, one has to appeal to users' judgments, and thus it is better to model a pet rather than some other creature, because most people have some expectations about what a pet can or cannot do.

PETEEI was designed to have the look and feel of a typical Role Playing Game interface. It has five major scenes: a garden, a bedroom, a kitchen, a wardrobe and a living room. The garden scene is illustrated in Figure 4. The user can interact with the pet through various actions, including (1) introducing objects to the pet's environment, (2) taking objects away from the pet's environment, (3) hitting objects, including the pet, and (4) talking aloud.
Figure 4. Model's Interface

Section 2 emphasized the role of learning in building the dynamic nature of emotional responses. To evaluate this claim, we built three models. In the first model, PETEEI was configured to produce a random set of emotions and random behaviors. This experiment was essential to provide a baseline for the other experiments: users can sometimes be impressed by the graphics and sound, and the fact that the pet in the pictures reacted to the user's actions might prompt some users to answer some questions with positive feedback. A second model was implemented in which PETEEI simulated emotions with no learning. This model used constant expectations; thus, emotions were triggered using the desirability degree, which depends on the preset goals and standards of the agent. Even though this model is sufficient to invoke some emotions, such as anger, sadness or joy, it cannot generate emotions such as fear or hope. Additionally, the agent's emotional state in this model varies only according to the situation and time. By adding the learning model to the agent's emotional process, on the other hand, the emotional state of the agent changes according to situation, time and experience. We thus implemented a third version in which PETEEI simulated emotional responses using the learning mechanisms discussed in Section 2. Models 2 and 3 were used to assess the importance of learning in the believability of the emotional responses.

Twenty-one users were recruited to evaluate the model and the hypothesis involved. The users were chosen from the first-year undergraduate class. This choice was made to decrease the bias inflicted by specialized background knowledge. Users met with the principal investigator for a period of two and a half hours. During this time, they ran the different simulations using ten scripted scenarios and answered a questionnaire.

To explore how the users perceived the learning component in each model, four questions were presented. Question A asked the user to rate PETEEI's learning about the user, question B asked the user to rate PETEEI's learning about the environment, question C asked the user to rate PETEEI's learning about good or bad actions, and finally question D asked the user to rate PETEEI's overall learning ability. The users were asked to answer the questions and rate PETEEI's abilities on a scale from 1 to 10, where 1 is worst and 10 is best. The questions were as follows:

A. A subject X can learn about other people's behavior to know what to expect and from whom. Do you think PETEEI learns about you?
B. A subject could learn more about the environment to plan for his/her actions. Do you think that PETEEI learns about its environment?
C. A subject could learn how to evaluate some actions as good or bad according to what people say or according to his own beliefs. Do you think that PETEEI learns about good and bad actions?
D. Overall, how would you rate PETEEI's learning using the criteria listed above?

Answers were collected for the questions above and an average was calculated for each question, shown in Table 1 as the mean. We also calculated the confidence interval for a 95% confidence level with a sample size of 21, using the following formula:

\bar{X} \pm 1.96 \frac{\sigma}{\sqrt{n}}

where \bar{X} is the mean, \sigma is the standard deviation (the square root of the variance), and n is the size of the sample, which is 21 in this case. The confidence interval is shown in the table.
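For concreteness, the mean and 95% interval can be computed as in the sketch below, assuming a normal approximation with z = 1.96 and sigma taken as the sample standard deviation; the ratings listed are made-up placeholders, not the study's data.

import math

def confidence_interval_95(ratings):
    # mean and 95% confidence interval (normal approximation) for a list of ratings
    n = len(ratings)
    mean = sum(ratings) / n
    variance = sum((x - mean) ** 2 for x in ratings) / (n - 1)  # sample variance
    half_width = 1.96 * math.sqrt(variance) / math.sqrt(n)
    return mean, (mean - half_width, mean + half_width)

# 21 illustrative ratings on the 1-10 scale (not the paper's data)
ratings = [8, 7, 9, 8, 8, 7, 9, 8, 7, 8, 9, 8, 7, 8, 9, 8, 8, 7, 9, 8, 8]
print(confidence_interval_95(ratings))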

As can be seen, the ratings for the models without the learning component, namely models 1 and 2, were between zero and two. There were some differences between these two models, but the differences are small and the confidence intervals overlap greatly, leading us to believe that users did not really perceive learning in these models. In contrast, the learning model (model 3) was observed to significantly increase the learning perceived in all questions.

Table 1. Learning Ratings

The users were also asked to rate the behavior of the pet in terms of how convincing it is. The users were told not to rate it in terms of animation or facial expression, but to concentrate on the behavioral and emotional aspect. We analyzed the users' answers; the results are shown in Table 2.

Table 2. Behavior Ratings

As the table shows, the random model (model 1) did not convey a realistic or convincing behavior to the user, since the rating was on average one. In contrast, the introduction of the non-learning model (model 2) improved this measure by about 3.33 units on average. The learning model (model 3) further improved this measure by 2.6 units, to reach an average of 8.1.

For a complete review of the protocol and the questions asked, the reader is referred to [28]. The results discussed above confirmed our earlier hypothesis that simulating the dynamic emotional process provides a significantly more believable agent.

members. Clark Elliott summarized current efforts in human-like agent simulations, some of which simulated characters with different personalities and different emotions [6]. Therefore, modeling emotions and personality may provide a good starting point for team training applications.

Through many studies in social psychol
