Bayesian Learning Theory Applied To Human Cognition

Transcription

Overview

Robert A. Jacobs^1 and John K. Kruschke^2

Probabilistic models based on Bayes' rule are an increasingly popular approach to understanding human cognition. Bayesian models allow immense representational latitude and complexity. Because they use normative Bayesian mathematics to process those representations, they define optimal performance on a given task. This article focuses on key mechanisms of Bayesian information processing, and provides numerous examples illustrating Bayesian approaches to the study of human cognition. We start by providing an overview of Bayesian modeling and Bayesian networks. We then describe three types of information processing operations (inference, parameter learning, and structure learning) in both Bayesian networks and human cognition. This is followed by a discussion of the important roles of prior knowledge and of active learning. We conclude by outlining some challenges for Bayesian models of human cognition that will need to be addressed by future research. © 2010 John Wiley & Sons, Ltd. WIREs Cogn Sci 2011, 2:8–21.

Correspondence to: robbie@bcs.rochester.edu
1 Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY, USA
2 Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN, USA

DOI: 10.1002/wcs.80

INTRODUCTION

Computational modeling of human cognition has focused on a series of different formalisms over recent decades. In the 1970s, production systems were considered a methodology that would unite the studies of human and machine intelligence. In the 1980s and 1990s, connectionist networks were thought to be a key to understanding how cognitive information processing is performed by biological nervous systems. These formalisms continue to yield valuable insights into the processes of cognition. Since the 1990s, however, probabilistic models based on Bayes' rule have become increasingly popular, perhaps even dominant in the field of cognitive science. Importantly, Bayesian modeling provides a unifying framework that has made important contributions to our understanding of nearly all areas of cognition, including perception, language, motor control, reasoning, learning, memory, and development.

In this article, we describe some advantages offered by Bayesian modeling relative to previous formalisms. In particular, Bayesian models allow immense representational latitude and complexity, and use normative Bayesian mathematics to process those representations. This representational complexity contrasts with the relative simplicity of nodes in a connectionist network, or if-then rules in a production system. In this article, we focus on key mechanisms of Bayesian information processing, and we provide examples illustrating Bayesian approaches to human cognition. The article is organized as follows. We start by providing an overview of Bayesian modeling and Bayesian networks. Next, we describe three types of information processing operations found in both Bayesian networks and human cognition. We then discuss the important role of prior knowledge in Bayesian models. Finally, we describe how Bayesian models naturally address active learning, which is a behavior that other formalisms may not address so transparently.

BAYESIAN MODELING

Are people rational? This is a complex question whose answer depends on many factors, including the task under consideration and the definition of the word 'rational'.
A common observation of cognitive scientists is that we live in an uncertain world, and rational behavior depends on the ability to process information effectively despite ambiguity or uncertainty. Cognitive scientists, therefore, need methods for characterizing information and the uncertainty in that information. Fortunately, such methods are available: probability theory provides a calculus for representing and manipulating uncertain information. An advantage of Bayesian models relative to many other types of models is that they are probabilistic.

Probability theory does not provide just any calculus for representing and manipulating uncertain information, it provides an optimal calculus.1 Consequently, an advantage of Bayesian modeling is that it gives cognitive scientists a tool for defining rationality. Using Bayes' rule, Bayesian models optimally combine information based on prior beliefs with information based on observations or data. Using Bayesian decision theory, Bayesian models can use these combinations to choose actions that maximize task performance. Owing to these optimal properties, Bayesian models perform a task as well as the task can be performed, meaning that the performance of a Bayesian model on a task defines rational behavior for that task.

Of course, the performance of a model depends on how it represents prior beliefs, observations, and task goals. That is, the representational assumptions of a model influence its performance. It is important to keep in mind that any task can be modeled in multiple ways, each using a different set of assumptions. One model may assume the use of certain dimensional perceptual representations which are combined linearly into a probabilistic choice, whereas another model may assume featural representations combined conjunctively into a probabilistic choice. But for any specific probabilistic formalization of a task, a Bayesian model specifies optimal performance given the set of assumptions made by the model.

This fact leads to an additional advantage of Bayesian modeling relative to many other approaches, namely that the assumptions underlying Bayesian models are often explicitly stated as well-defined mathematical expressions and, thus, easy to examine, evaluate, and modify. Indeed, a key reason for using the Bayesian modeling formalism is that, relative to many other computational formalisms, it allows cognitive scientists to more easily study the advantages and disadvantages of different assumptions. (Admittedly, this property is also possessed by other formalisms, especially in the mathematical psychology tradition, that have rigorous mathematical or probabilistic interpretations; see, e.g., Busemeyer and Diederich,2 and many examples in Scarborough and Sternberg.3 But such mathematically explicit models form only a subset of the spectrum of computational models in cognitive science; cf. Refs 4 and 5.) Through this study, scientists can ask questions about the nature or structure of a task (Marr6 referred to the analysis of the structure of a task as a 'computational theory'), such as: What are the variables that need to be taken into account, the problems that need to be solved, and the goals that need to be achieved in order to perform a task? Which of these problems or goals are easy or hard? And which assumptions are useful, necessary, and/or sufficient for performance of the task?

Let us assume that the performance of a person on a task is measured, a Bayesian model is applied to the same task, and the performances of the person and the model are equal.
This provides important information to a cognitive scientist because it provides the scientist with an explanation for the person's behavior: the person is behaving optimally because he or she is using and combining all relevant information about the task in an optimal manner. In addition, this result supports (but does not prove) the hypothesis that the assumptions used by the model about prior beliefs, observations, and task goals may also be used by the person.

Alternatively, suppose that the performance of the model exceeds that of the person. This result also provides useful information. It indicates that the person is not using all relevant information or not combining this information in an optimal way. That is, it suggests that there are cognitive 'bottlenecks' preventing the person from performing better. Further experimentation can attempt to identify these bottlenecks, and training can try to ameliorate or remove these bottlenecks. Possible bottlenecks might include cognitive capacity limitations such as limits on the size of working memory or on the quantity of attentional resources. Bottlenecks might also include computational limitations. Bayesian models often perform complex computations in high-dimensional spaces. It may be that people are incapable of these complex computations and, thus, incapable of performing Bayesian calculations. Later in this article, we will address the possibility that people are not truly Bayesian, but only approximately Bayesian.

If a cognitive scientist hypothesizes that a person's performance on a task is suboptimal because of a particular bottleneck, then the scientist can develop a new model that also contains the posited bottleneck. For instance, if a scientist believes that a person performs suboptimally because of an inability to consider stimuli that occurred far in the past, the scientist can study a model that only uses recent inputs. Identical performances by the new model and the person lend support to the idea that the person, like the model, is only using recent inputs.

Finally, suppose that the performance of the person exceeds that of the model, that is, the person's performance exceeds the optimal performance defined by the model. Once again this is a useful result. It suggests that the person is using information sources or assumptions that are not currently part of the model. For instance, if a model only considers the previous word when predicting the next word in a sentence, then a person who outperforms the model is likely using more information (e.g., several previous words) when he or she makes a prediction. A cognitive scientist may consider a new model with additional inputs. As before, identical performances by the new model and the person lend support to the idea that the person, like the model, is using these additional inputs.

In some sense, the Bayesian approach to the study of human cognition might seem odd. It is based on an analysis of what level of task performance is achievable given a set of assumptions. However, it does not make a strong statement about how a person achieves that task performance, that is, which mental representations and algorithms the person uses while performing a task. Bayesian models are not intended to provide mechanistic or process accounts of cognition.6,7 For cognitive scientists interested in mechanistic accounts, a Bayesian model often suggests possible hypotheses about representations and algorithms that a person might use while performing a task, but these hypotheses need to be evaluated by other means. Similarly, if a person performs suboptimally relative to a Bayesian model, the model may suggest hypotheses as to which underlying mechanisms are at fault, but these hypotheses are only starting points for further investigation. Consequently, Bayesian modeling complements, but does not supplant, the use of experimental and other theoretical methodologies.

For all the reasons outlined here, Bayesian modeling has become increasingly important in the field of cognitive science.6–16 Rather than describing particular Bayesian models in depth, a main focus of this article is on information processing operations (inference, parameter learning, and structure learning) found in Bayesian models and human cognition. Before turning to these operations, however, we first describe Bayesian networks, a formalism that makes these operations particularly easy to understand.

BAYESIAN NETWORKS

When a theorist develops a mental model of a cognitive domain, it is necessary to first identify the variables that must be taken into account. The domain can then be characterized through the joint probability distribution of these variables. Many different information processing operations can be carried out by manipulating this distribution.

Although conceptually appealing, joint distributions are often impractical to work with directly because real-world domains contain many potentially relevant variables, meaning that joint distributions can be high-dimensional. If a domain contains 100 variables, then the joint distribution is 100-dimensional. Hopefully, it is the case that some variables are independent (or conditionally independent given the values of other variables) and, thus, the joint distribution can be factored into a product of a small number of conditional distributions where each conditional distribution is relatively low-dimensional.
If so, then it is more computationally efficient to work with the small set of low-dimensional conditional distributions than with the high-dimensional joint distribution.

Bayesian networks have become popular in artificial intelligence and cognitive science because they graphically express the factorization of a joint distribution.17–19 A network contains nodes, edges, and probability distributions. Each node corresponds to a variable. Each edge corresponds to a relationship between variables. Edges go from 'parent' variables to 'child' variables, thereby indicating that the values of the parent variables directly influence the values of the child variables. Each conditional probability distribution provides the probability of a child variable taking a particular value given the values of its parent variables. The joint distribution of all variables is equal to the product of the conditional distributions. For example, we assume that the joint distribution of variables A, B, C, D, E, F, and G can be factored as follows:

p(A, B, C, D, E, F, G) = p(A) p(B) p(C|A) p(D|A, B) p(E|B) p(F|C) p(G|D, E)   (1)

Then the Bayesian network in Figure 1 represents this joint distribution.

[FIGURE 1. Bayesian network representing the joint probability distribution of variables A, B, C, D, E, F, and G. A node represents the value of the variable it contains. Arrows impinging on a node indicate what other variables the value of the node depends on. The conditional probability beside each node expresses mathematically what the arrows express graphically.]

The parameters of a Bayesian network are the parameters underlying the conditional probability distributions. For example, suppose that the variables of the network in Figure 1 are real-valued, and suppose that each variable is distributed according to a normal distribution whose mean is equal to the weighted sum of its parents' values (plus a bias weight) and whose variance is a fixed constant. [In other words, the conditional distribution of X given the values of its parents is a normal distribution whose mean is sum_{i in pa(X)} w_Xi V_i + w_Xb and whose variance is σ_X^2, where X is a variable, i indexes the variables that are parents of X, V_i is the value of variable i, w_Xi is a weight relating the value of variable i to X, and w_Xb is a bias weight for X. In Figure 1, for instance, A ~ N(w_Ab, σ_A^2), C ~ N(w_CA A + w_Cb, σ_C^2), and D ~ N(w_DA A + w_DB B + w_Db, σ_D^2).] Then the weights would be the network's parameters.
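To make the factorization concrete, here is a minimal Python sketch (our illustration, not from the article) of the linear-Gaussian network just described. It draws a sample by visiting each node after its parents, and evaluates the log of the factored joint density in Eq. (1). The specific bias, weight, and variance values are assumptions chosen only for illustration.

```python
import math
import random

# Structure of the network in Figure 1: each node lists its parents.
PARENTS = {"A": [], "B": [], "C": ["A"], "D": ["A", "B"],
           "E": ["B"], "F": ["C"], "G": ["D", "E"]}

# Illustrative parameters (assumed): per-node bias, one weight per
# parent, and a fixed conditional variance for every node.
BIAS = {n: 0.0 for n in PARENTS}
WEIGHTS = {n: {p: 0.5 for p in ps} for n, ps in PARENTS.items()}
VAR = {n: 1.0 for n in PARENTS}

def sample():
    """Ancestral sampling: visit each node after its parents."""
    values = {}
    for node in ["A", "B", "C", "D", "E", "F", "G"]:  # topological order
        mean = BIAS[node] + sum(WEIGHTS[node][p] * values[p]
                                for p in PARENTS[node])
        values[node] = random.gauss(mean, math.sqrt(VAR[node]))
    return values

def log_joint(values):
    """log p(A,...,G) as the sum of the log conditionals in Eq. (1)."""
    total = 0.0
    for node, parents in PARENTS.items():
        mean = BIAS[node] + sum(WEIGHTS[node][p] * values[p] for p in parents)
        resid = values[node] - mean
        total += -0.5 * (math.log(2 * math.pi * VAR[node])
                         + resid ** 2 / VAR[node])
    return total

v = sample()
print(v, log_joint(v))
```

Note that log_joint never touches a seven-dimensional table; it only evaluates seven low-dimensional conditionals, which is the computational saving the factorization buys.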

INFERENCE

If the values of some variables are observed, then these data can be used to update our beliefs about other variables. This process is referred to as 'inference', and can be carried out using Bayes' rule. For example, suppose that the values of F and G are observed, and we would like to update our beliefs about the unobserved values A, B, C, D, and E. Using Bayes' rule:

p(A, B, C, D, E | F, G) = p(F, G | A, B, C, D, E) p(A, B, C, D, E) / p(F, G)   (2)

where p(A, B, C, D, E) is the prior probability of A, B, C, D, and E; p(A, B, C, D, E | F, G) is the posterior probability of these variables given data F and G; p(F, G | A, B, C, D, E) is the probability of the observed data (called the likelihood function of A, B, C, D, and E); and p(F, G) is a normalization term referred to as the evidence.

Inference often requires marginalization. Suppose we are only interested in updating our beliefs about A and B (i.e., we do not care about the values of C, D, and E). This can be achieved by 'integrating out' the irrelevant variables:

p(A, B | F, G) = ∫∫∫ p(A, B, C, D, E | F, G) dC dD dE   (3)

In some cases, inference can be carried out in a computationally efficient manner using a local message-passing algorithm.18 In other cases, inference is computationally expensive, and approximation techniques, such as Markov chain Monte Carlo sampling, may be needed.20

Is Bayesian inference relevant to human cognition? We think that the answer is yes. To motivate this answer, we first consider a pattern of causal reasoning referred to as 'explaining away'. Explaining away is an instance of the 'logic of exoneration' (e.g., if one suspect confesses to a crime, then unaffiliated suspects are exonerated). In general, increasing the believability of some hypotheses necessarily decreases the believability of others.

[FIGURE 2. Bayesian network characterizing a domain with four binary variables indicating whether it is cloudy, the sprinkler was recently on, it recently rained, and the grass is wet.]

The Bayesian network illustrated in Figure 2 characterizes a domain with four binary variables indicating whether it is cloudy, whether the sprinkler was recently on, whether it recently rained, and whether the grass is wet.18,19 If the weather is cloudy (denoted C = 1), then there is a low probability that the sprinkler was recently on (S = 1) and a high probability that it recently rained (R = 1). If the weather is not cloudy, then there is a moderate probability that the sprinkler was recently on and a low probability that it recently rained.
Finally, if the sprinkler was recently on, rain fell, or both, then there is a high probability that the grass is wet (W = 1).

You walk outside and discover that the grass is wet and that the sprinkler is on. What, if anything, can you conclude about whether it rained recently? An intuitive pattern of reasoning, and one that seems to be exhibited by people, is to conclude that the water from the sprinkler wet the grass and, thus, there is no good evidence suggesting that it rained recently. This pattern is called explaining away because the observation that the sprinkler is on explains the fact that the grass is wet, meaning that there is no reason to hypothesize another cause, such as recent rain, for why the grass is wet. This type of causal reasoning naturally emerges from Bayes' rule. Without going into the mathematical details, a comparison of the probability of recent rain given that the grass is wet, p(R = 1 | W = 1), with the probability of recent rain given that the grass is wet and that the sprinkler is on, p(R = 1 | W = 1, S = 1), would show that the latter value is significantly smaller. Thus, Bayes' rule performs explaining away.

Explaining away illustrates an important advantage of Bayesian statistics, namely, that Bayesian models maintain and update probabilities for all possible values of their variables. Because probability distributions must sum or integrate to one, if some values become more likely, then other values must become less likely. This is true even for variables whose values are not directly specified in a data set. In the example above, the variables S and R are negatively correlated given the value of C. (If it is raining, it is unlikely that a sprinkler will be on. Similarly, if a sprinkler is on, it is unlikely to be raining.) Therefore, if a Bayesian observer discovers that the grass is wet and that the sprinkler is on, the observer can reasonably accept the hypothesis that the water from the sprinkler wet the grass. In doing so, the observer simultaneously rejects the hypothesis that it rained recently and the rain water wet the grass, despite the fact that the observer did not directly obtain any information about whether it rained. Thus, a Bayesian observer can simultaneously maintain and update probabilities for multiple competing hypotheses.
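The explaining-away effect is easy to verify numerically. The short Python sketch below (our illustration; the conditional probability values are assumptions in the spirit of Figure 2, since the article does not list the network's numerical tables) enumerates the joint distribution of the sprinkler network, marginalizing over unobserved variables in the discrete analogue of Eq. (3), and compares p(R = 1 | W = 1) with p(R = 1 | W = 1, S = 1).

```python
from itertools import product

# Assumed conditional probability tables (illustrative values only).
p_c = 0.5                                   # p(C = 1)
p_s_given_c = {1: 0.1, 0: 0.5}              # p(S = 1 | C)
p_r_given_c = {1: 0.8, 0: 0.2}              # p(R = 1 | C)
p_w_given_sr = {(0, 0): 0.0, (0, 1): 0.9,   # p(W = 1 | S, R)
                (1, 0): 0.9, (1, 1): 0.99}

def joint(c, s, r, w):
    """p(C, S, R, W) from the factorization p(C) p(S|C) p(R|C) p(W|S,R)."""
    pc = p_c if c else 1 - p_c
    ps = p_s_given_c[c] if s else 1 - p_s_given_c[c]
    pr = p_r_given_c[c] if r else 1 - p_r_given_c[c]
    pw = p_w_given_sr[(s, r)] if w else 1 - p_w_given_sr[(s, r)]
    return pc * ps * pr * pw

def prob_rain(**observed):
    """p(R = 1 | observed) by brute-force enumeration over the joint."""
    num = den = 0.0
    for c, s, r, w in product([0, 1], repeat=4):
        state = {"C": c, "S": s, "R": r, "W": w}
        if any(state[k] != v for k, v in observed.items()):
            continue
        p = joint(c, s, r, w)
        den += p
        if r == 1:
            num += p
    return num / den

print(prob_rain(W=1))        # p(R = 1 | W = 1): rain is quite probable
print(prob_rain(W=1, S=1))   # p(R = 1 | W = 1, S = 1): much smaller
```

Running the sketch shows the posterior probability of rain dropping once the sprinkler is also observed to be on, which is exactly the explaining-away pattern described above.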
Our scenario with sprinklers, rain, and wet grass provides a simple example of Bayesian inference, but cognitive scientists have studied more complicated examples. Often these examples involve prediction and, thus, inference and prediction are closely related.

Calvert et al.21 found that auditory cortex in normal hearing individuals became activated when these individuals viewed facial movements associated with speech (e.g., lipreading) in the absence of auditory speech sounds. It is as if the individuals used their visual percepts to predict or infer what their auditory percepts would have been if auditory stimuli were present; that is, they computed p(auditory percept | visual percept).

Pirog Revill et al.22 used brain imaging to show that people predicted the semantic properties of a word while lexical competition was in process and before the word was fully recognized. For example, the speech input /can/ is consistent with the words can, candy, and candle and, thus, can, candy, and candle are active lexical competitors when /can/ is heard. These investigators found that a neural region typically associated with visual motion processing became more activated when motion words were heard than when nonmotion words were heard. Importantly, when nonmotion words were heard, the activation of this region was modulated by whether there was a lexical competitor that was a motion word rather than another nonmotion word. It is as if the individuals used the on-going speech input to predict the meaning of the current word as the speech unfolded over time; i.e., they computed p(semantic features | on-going speech percept).

Blakemore et al.23 used inference to explain why individuals cannot tickle themselves. A 'forward model' is a mental model that predicts the sensory consequences of a movement based on a motor command. These investigators hypothesized that when a movement is self-produced, its sensory consequences can be accurately predicted by a forward model; i.e., a person computes p(sensory consequences | motor command), and this prediction is used to attenuate the sensory effects of the movement. In this way, the sensory effects of self-tickling are dulled.
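As a toy illustration of that idea, the sketch below (ours, with made-up gain values; not the authors' model) implements a forward model as a function from a motor command to a predicted sensation and subtracts the prediction from the actual sensation, so self-produced touch is attenuated while externally produced touch is not.

```python
def forward_model(motor_command: float) -> float:
    # Toy forward model: predicted sensation is proportional to the
    # outgoing motor command (the gain of 1.0 is an assumed value).
    return 1.0 * motor_command

def perceived_intensity(actual_sensation: float, motor_command: float) -> float:
    # Attenuate the sensation by the amount the forward model predicted.
    predicted = forward_model(motor_command)
    return max(0.0, actual_sensation - predicted)

# Self-produced touch: the motor command predicts the sensation -> dulled.
print(perceived_intensity(actual_sensation=1.0, motor_command=1.0))  # 0.0
# External touch: no motor command, nothing is predicted -> full strength.
print(perceived_intensity(actual_sensation=1.0, motor_command=0.0))  # 1.0
```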

Although there is now much experimental evidence for perceptual inference, the functional role of inference is not always obvious. For example, why is it important to predict an auditory percept based on a visual percept? In the section below titled Parameter Learning, we review the hypothesis that inference may play an important role in learning.

The examples above provide evidence that people infer the values of unobserved variables based on the values of observed variables. However, they do not show that this inference is quantitatively consistent with the use of Bayes' rule. Evidence suggesting that people's inferences are indeed quantitatively consistent with Bayes' rule comes from the study of sensory integration. The Bayesian network in Figure 3 characterizes a mental model of an observer who both sees and touches the objects in an environment. The nodes labeled 'scene variables' are the observer's internal variables for representing all conceivable three-dimensional scenes. As a matter of notation, let S denote the scene variables. Based on the values of the scene variables, haptic feature variables, denoted F_H, and visual feature variables, denoted F_V, are assigned values. For instance, a scene with a coffee mug gives rise to both haptic features, such as curvature and smoothness, and visual features, such as curvature and color. The haptic features influence the values of the haptic input variables, denoted I_H, when the observer touches the mug. Similarly, the visual features influence the values of the visual input variables, denoted I_V, when the observer views the mug.

[FIGURE 3. Bayesian network characterizing a domain in which an observer both sees and touches the objects in an environment. At the top of the hierarchy, the values of scene variables determine the probabilities of distal haptic and visual features. The distal haptic and visual features in turn determine the probabilities of values of proximal haptic and visual input (sensory) variables.]

The values of the input variables are 'visible' because the observer directly obtains these values as percepts arising through touch and sight. However, the feature and scene variables are not directly observable and, thus, are regarded as hidden or latent. The distribution of the latent variables may be computed by the observer from the values of the visible variables using Bayesian inference. For instance, based on the values of the haptic and visual input variables, the observer may want to infer the properties of the scene. That is, the observer may want to compute p(S | I_H, I_V).

To illustrate how an observer might compute this distribution, we consider a specific instance of sensory integration. We suppose that an observer both sees and grasps a coffee mug, and wants to infer the depth of the mug (i.e., the distance from the front of the mug to its rear). Also we can suppose that the observer's belief about the depth of the mug given its visual input has a normal distribution, denoted N(µ_v, σ_v^2). Similarly, the observer's belief about the depth given its haptic input has a normal distribution, denoted N(µ_h, σ_h^2). Then, given certain mathematical assumptions, it is easy to show (as derived in many publications; e.g., Ref 24) that the belief about the depth, given both inputs, has a normal distribution whose mean is a linear combination of µ_v and µ_h:

µ_{v,h} = [σ_v^(-2) / (σ_v^(-2) + σ_h^(-2))] µ_v + [σ_h^(-2) / (σ_v^(-2) + σ_h^(-2))] µ_h   (4)

and whose variance is given by:

1/σ_{v,h}^2 = 1/σ_v^2 + 1/σ_h^2   (5)
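Equations (4) and (5) amount to a precision-weighted average, which takes only a few lines to compute. The Python sketch below (our illustration; the numerical values are arbitrary assumptions) combines a visual and a haptic depth estimate and shows that the fused estimate is pulled toward the more reliable cue and has lower variance than either cue alone.

```python
def fuse(mu_v, var_v, mu_h, var_h):
    """Optimal cue combination per Eqs. (4) and (5)."""
    w_v = (1 / var_v) / (1 / var_v + 1 / var_h)   # weight on vision
    w_h = (1 / var_h) / (1 / var_v + 1 / var_h)   # weight on haptics
    mu = w_v * mu_v + w_h * mu_h                  # Eq. (4)
    var = 1 / (1 / var_v + 1 / var_h)             # Eq. (5)
    return mu, var

# Equally reliable cues: the fused mean is the plain average
# (cf. Figure 4, top panel).
print(fuse(mu_v=10.0, var_v=1.0, mu_h=12.0, var_h=1.0))   # (11.0, 0.5)

# Vision more reliable: the fused mean sits nearer the visual estimate
# (cf. Figure 4, bottom panel), and the fused variance is below both.
print(fuse(mu_v=10.0, var_v=0.25, mu_h=12.0, var_h=1.0))  # (10.4, 0.2)
```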
At an intuitive level, this form of sensory integration is appealing for several reasons. First, the variance of the depth distribution based on both inputs, σ_{v,h}^2, is always less than the variances based on the individual inputs, σ_v^2 and σ_h^2, meaning that depth estimates based on both inputs are more precise than estimates based on individual inputs. Second, this form of sensory integration uses information to the extent that this information is reliable or precise. This idea is illustrated in Figure 4. In the top panel, the distributions of depth given visual inputs and haptic inputs have equal variances (σ_v^2 = σ_h^2). That is, the inputs are equally precise indicators of depth. The mean of the depth distribution based on both inputs is, therefore, an equally weighted average of the means based on the individual inputs. In the bottom panel, however, the variance of the distribution based on vision is smaller than the variance based on haptics (σ_v^2 < σ_h^2), meaning that vision is a more precise indicator of depth. In this case, the mean of depth based on both inputs is also a weighted average of the means based on the individual inputs, but now the weight assigned to the visual mean is large and the weight assigned to the haptic mean is small.

Do human observers perform sensory integration in a manner consistent with this statistically optimal and intuitively appealing framework? Several studies have now shown that, in many cases, the answer is yes. Ernst and Banks25 recorded subjects' judgments of the height of a block when they saw the block, when they grasped the block, and when they both saw and grasped the block. These investigators found that the framework accurately predicted subjects' multisensory judgments based on their unisensory judgments. This was true when visual signals were corrupted by small amounts of noise, in which case the visual signal was more reliable, and also when visual signals were corrupted by large amounts of noise, in which case the haptic signal was more reliable. Knill and Saunders26 used the framework to predict subjects' judgments of surface slant when surfaces were defined by visual stereo and texture cues based on their slant judgments when surfaces were defined by just the stereo cue or just the texture cue. It was found that the framework provided accurate predictions when surface slants were small, meaning that stereo was a more reliable cue, and when slants were large, meaning that texture was a more reliable cue. Other studies showing that people's multisensory percepts are consistent with the framework include Alais and Burr,27 Battaglia et al.,28 Ghahramani et al.,29 Jacobs,30 Körding and Wolpert,31 Landy et al.,32 Maloney and Landy,33 and Young et al.34

[FIGURE 4. Bayesian model of sensory integration. (Top) A situation in which visual and haptic percepts are equally good indicators of depth. (Bottom) A situation in which the visual percept is a more reliable indicator of depth. Each panel plots the probability of depth given the visual signal, the probability of depth given the haptic signal, and the optimal depth estimate given both signals.]

PARAMETER LEARNING

Parameter learning occurs when one of the conditional distributions inside a Bayesian network is adapted. For example, in the Bayesian network on sensory integration in Figure 3, let us suppose that an observer's probability distribution of visual inputs, given its internal representation of the visual features, is a normal distribution whose mean is equal to a weighted sum of the values of the visual feature variables. If we use Bayes' rule to update the posterior distribution of the weight values based on new data, then this would be an instance of parameter learning.
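As a concrete, simplified illustration of such an update, the sketch below (ours, not the authors') performs conjugate Bayesian learning of a single weight w in an assumed observation model y = w * x + noise, with a normal prior over w and known noise variance. Each observation tightens the posterior distribution over the weight, which is parameter learning in the sense just described.

```python
def update_weight(mu, var, x, y, noise_var):
    """One conjugate Bayesian update of a normal posterior over w,
    for the observation model y = w * x + Normal(0, noise_var)."""
    precision = 1 / var + x * x / noise_var          # posterior precision
    new_var = 1 / precision
    new_mu = new_var * (mu / var + x * y / noise_var)
    return new_mu, new_var

# Assumed setup: prior w ~ N(0, 10), noise variance 0.5.
mu, var = 0.0, 10.0
data = [(1.0, 2.1), (2.0, 3.8), (0.5, 1.1), (1.5, 3.2)]  # (feature, input)
for x, y in data:
    mu, var = update_weight(mu, var, x, y, noise_var=0.5)
    print(f"posterior over w: mean={mu:.3f}, var={var:.3f}")
```

The posterior variance shrinks with every observation, so the model's uncertainty about the weight, and not just its point estimate, is updated by the data.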

Do people perform parameter learning in the same manner as Bayesian networks? Various aspects of human learning are captured by Bayesian models. Perhaps the simplest example is a behavioral