Bayesian Reasoning And Artificial Intelligence


WSEAS TRANSACTIONS on ADVANCES in ENGINEERING EDUCATION, Volume 17, 2020
E-ISSN: 2224-3410
DOI: 10.37394/232010.2020.17.12

Bayesian Reasoning and Artificial Intelligence

MICHAEL GR. VOSKOGLOU
Mathematical Sciences, School of Technological Applications
University of Peloponnesus (ex T. E. I. of Western Greece)
Meg. Alexandrou 1 – 26334 Patras
GREECE

Abstract: - The present article studies the importance of Bayesian Reasoning in everyday life situations and for science as a whole. Examples are also given to illustrate our results.

Key-Words: - Bayes’ Theorem, Bayesian Reasoning, Artificial Intelligence (AI), Scientific Method, Trial and Error.

Received: March 3, 2020. Revised: August 16, 2020. Accepted: August 31, 2020. Published: September 10, 2020.

1. Introduction

Artificial Intelligence (AI) is the branch of Computer Science that focuses on the theory and practice of creating intelligent machines, which work and react like humans. The term AI was coined by John McCarthy in 1956, when he held the first academic conference on the subject at Dartmouth College, USA [1]. However, the journey to understand whether machines can truly think began much earlier, e.g. with Alan Turing’s universal machine in 1936 [2]. AI has roots in mathematics, engineering, technology and science and, as a synthesis of ideas from all those fields, has created a new situation that is only just beginning to generate enormous changes and benefits for human society.

Probability theory is one of the main mathematical tools used in AI applications. Edwin T. Jaynes (1922-1998), Professor of Physics at Washington University in St. Louis, was the first to argue that Probability theory could be considered as a generalization of bivalent logic, reducing to it in the special case where our hypothesis is either absolutely true or absolutely false [3]. Many eminent scientists have been inspired by the ideas of Jaynes, like the expert in Algebraic Geometry David Mumford, who believes that Probability and Statistics are emerging as a better way of building scientific models [4].

Nevertheless, both Probability and Statistics have been developed on the basis of the principles of bivalent logic. As a result, they tackle effectively only the cases of real-world uncertainty that are due to randomness and not those due to imprecision. In cases of imprecision, Zadeh’s Fuzzy Logic (FL) comes to bridge the existing gap [5, 6]. However, as we shall see in the next section, although probabilities have been defined and developed on the basis of the principles of bivalent logic, the Bayesian rule, which calculates the value of conditional probabilities, introduces a kind of multi-valued logic that tackles the uncertainty due to imprecision in a way analogous to fuzzy logic!

The present work focuses on illustrating the importance of Bayesian reasoning to everyday life and science and, by extension, to AI. The rest of the article is organized as follows: Bayes’ rule is presented in the next section and its importance for AI is justified. The third section includes applications of this rule to everyday life situations. In the fourth section, the argument that the whole of science could be considered as a Bayesian process is discussed, and the article closes with the general conclusion presented in section five.

2. The Bayes’ Theorem

Let A and B be two intersecting events. Then it is straightforward to check [7, 8] that the conditional probability for the event A to happen when the event B has already happened is calculated by

$$P(A/B) = \frac{P(A \cap B)}{P(B)} \qquad (1)$$
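Equation (1) can be checked by direct counting on a small finite sample space with equally probable outcomes. The sketch below is only an illustration: the two-dice sample space and the events A (the sum is 8) and B (the first die is even) are assumptions chosen here and do not come from the article. It compares the proportion of A inside B with the ratio P(A ∩ B) : P(B).

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space (an assumption, not from the article):
# all 36 ordered outcomes of two fair dice, each equally probable.
sample_space = list(product(range(1, 7), repeat=2))


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    favourable = sum(1 for outcome in sample_space if event(outcome))
    return Fraction(favourable, len(sample_space))


def is_a(outcome):
    """Event A: the sum of the two dice is 8."""
    return sum(outcome) == 8


def is_b(outcome):
    """Event B: the first die shows an even number."""
    return outcome[0] % 2 == 0


def is_a_and_b(outcome):
    """Event A ∩ B."""
    return is_a(outcome) and is_b(outcome)


# Left-hand side of (1): relative frequency of A inside the reduced space B.
outcomes_in_b = [o for o in sample_space if is_b(o)]
direct = Fraction(sum(1 for o in outcomes_in_b if is_a(o)), len(outcomes_in_b))

# Right-hand side of (1): P(A ∩ B) / P(B).
by_formula = prob(is_a_and_b) / prob(is_b)

print(direct, by_formula)
```

Exact fractions are used instead of floats so that the two sides can be compared for equality without rounding issues.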

In the case of finite sample spaces with equally probable singleton events, for example, the mathematical definition of probability gives that $P(A/B) = N_{A \cap B} : N_B$, where $N_{A \cap B}$ and $N_B$ denote the numbers of appearance of the events $A \cap B$ and $B$ respectively. Therefore, if N is the cardinality of the sample space, then $P(A/B) = (N_{A \cap B} : N) : (N_B : N)$, which proves (1).

In the same way one finds that $P(B/A) = \frac{P(A \cap B)}{P(A)}$, or $P(A \cap B) = P(B/A)P(A)$. Therefore (1) can be written in the form

$$P(A/B) = \frac{P(B/A)P(A)}{P(B)} \qquad (2)$$

Equation (2), which calculates the conditional probability P(A/B) with the help of the inverse in time conditional probability P(B/A), the prior probability P(A) and the probability P(B) of the observed event B, is known as Bayes’ theorem (or rule, or law). In other words, Bayes’ theorem calculates the probability of an event based on prior knowledge of conditions related to that event. However, when applied in practice, Bayes’ theorem may have several interpretations.

In social sciences, for example, it describes how a degree of belief expressed as a probability P(A) is rationally changed according to the availability of related evidence. In that case, the probabilities involved in Bayes’ theorem are frequently referred to as Bayesian probabilities, although, mathematically speaking, Bayesian and conditional probabilities are actually the same thing.

The value of the prior probability P(A) is fixed before the experiment, whereas the value of the posterior probability P(A/B) is derived from the experiment’s data. Usually, however, there exists an uncertainty about the exact value of P(A). In such cases, considering all the possible values of P(A), we obtain different values for the conditional probability P(A/B). Therefore, Bayes’ rule introduces a kind of multi-valued logic tackling the uncertainty that is due to the imprecision of the value of the prior probability. Consequently, one could argue that Bayesian Reasoning constitutes an interface between bivalent and fuzzy logic.

Bayes’ rule first appeared in the work “An Essay towards solving a Problem in the Doctrine of Chances” of the 18th century British mathematician and theologian Thomas Bayes (Figure 1).

Figure 1: Thomas Bayes (1701-1761)

This essay was published by Richard Price in 1763, after Bayes’ death, in the “Philosophical Transactions of the Royal Society of London”. The famous French mathematician Laplace, independently from Bayes, pioneered and popularized Bayesian probabilities. Bayes’ theorem is frequently used together with the theorem of total probability [7] for the solution of more composite problems (e.g. see Example 5 of the next section).

In general, although Bayes’ rule is a simple consequence of the equation calculating the value of a conditional probability, Bayesian reasoning has proved to be very important in everyday life situations [9] and for the whole of science as well [10]. Recent research gives evidence that even the mechanisms by which the human brain works are Bayesian [11]! Consequently, Bayesian reasoning is very useful for Machine Learning, the sector of AI focusing on the design and construction of machines that mimic human behavior. In fact, the smart machines of AI are supplied with Bayesian algorithms in order to be able to recognize the corresponding structures and to make autonomous decisions. The physicist and Nobel prize winner John Mather was one of the first to express his uneasiness about the possibility that Bayesian machines could become so smart as to make humans look useless [12]!

Sir Harold Jeffreys (1891-1989), a British mathematician who introduced the concept of the Bayesian algorithm and played an important role in the revival of the Bayesian view of probability, has successfully characterized the Bayesian rule as the “Pythagorean Theorem of Probability Theory” [13].

3. Applications of Bayesian Reasoning to Everyday Life Situations

Conditional probabilities and Bayesian reasoning have proved very useful for solving problems appearing in everyday life situations. Some representative examples are presented in this section.

Example 1: A market research survey is performed on the population of a town consisting of 45% men and 55% women. Find the probability of the random choice of: i) three men for the first three interviews, and ii) four women for the next four interviews.

Solution: i) Let $A_i$ be the event that a man is chosen for the i-th interview, i = 1, 2, 3. Taking a reference population of 100 inhabitants (45 men and 55 women) and choosing without replacement, we have $P(A_1) = \frac{45}{100}$, $P(A_2/A_1) = \frac{44}{99}$ and $P(A_3/A_1 \cap A_2) = \frac{43}{98}$. Therefore, writing $P(A_1 \cap A_2 \cap A_3) = P[(A_1 \cap A_2) \cap A_3]$ and applying equation (1) twice, one finds that

$$P(A_1 \cap A_2 \cap A_3) = P(A_1 \cap A_2)P(A_3/A_1 \cap A_2) = P(A_1)P(A_2/A_1)P(A_3/A_1 \cap A_2) \approx 0.088,$$

or 8.8%.

ii) Given a finite number n of events, one can show by induction that

$$P(A_1 \cap A_2 \cap \dots \cap A_n) = P(A_1)P(A_2/A_1)P(A_3/A_1 \cap A_2) \cdots P(A_n/A_1 \cap A_2 \cap \dots \cap A_{n-1}) \qquad (3)$$

Let $A_1$, $A_2$ and $A_3$ be the events defined in case (i) and let $A_i$ be the event that a woman is chosen for the i-th interview, i = 4, 5, 6, 7. Then
$P(A_4/A_1 \cap A_2 \cap A_3) = \frac{55}{97} \approx 0.567$,
$P(A_5/A_1 \cap \dots \cap A_4) = \frac{54}{96} = 0.5625$,
$P(A_6/A_1 \cap \dots \cap A_5) = \frac{53}{95} \approx 0.558$, and
$P(A_7/A_1 \cap \dots \cap A_6) = \frac{52}{94} \approx 0.553$.
Therefore, applying equation (3) for n = 7 one finds that $P(A_1 \cap A_2 \cap \dots \cap A_7) \approx 0.0086$, or 0.86%.

Bayesian reasoning is frequently used in medical paradigms, the outcomes of which are not always compatible with common beliefs. The following three examples, timely because of the current COVID-19 pandemic, concern the credibility of diagnostic tests for viruses.

Example 2: The statistical data show that 2% of the inhabitants of a country have been infected by a dangerous virus. Mr. X, who has no symptoms of the corresponding disease, takes a diagnostic test, the statistical accuracy of which is 97%. The test is positive. What is the probability that Mr. X is a carrier of the virus?

Solution: Consider the following events:
A: The subject is a carrier of the virus.
B: The test is positive.
On the basis of the given data, $P(A) = 0.02$ and $P(B/A) = 0.97$. Among 100 inhabitants of the country, 2 on average are carriers and 98 are noncarriers of the virus. Assuming that all those people take the test, and that the 97% accuracy applies to carriers and noncarriers alike, we should have on average $2 \times 97\% = 1.94$ positive tests from the carriers and $98 \times 3\% = 2.94$ positive tests from the noncarriers of the virus, i.e. 4.88 positive tests in total. Therefore, $P(B) = 0.0488$.

Replacing the values of P(A), P(B/A) and P(B) in equation (2) one finds that $P(A/B) \approx 0.398$. Therefore, the probability that Mr. X is a carrier of the virus is only 39.8% and not 97%, as a first, rough estimate might suggest!

This means that Mr. X has to take a second test to find out his real health condition. Further, if the second test is negative, a third test will also be required. At the same time, however, there is an urgent need for other people to take the test. This becomes evident in the next example.

Example 3: Assume that Mr. X has some suspicious symptoms and that 85% of the people presenting such symptoms have been infected by the virus. Mr. X takes the test, which is positive. What is now the probability that Mr. X is a carrier of the virus?

Solution: Let A and B be the events defined in Example 2. Here we have $P(A) = 0.85$ and $P(B/A) = 0.97$. Further, assuming that 100 people having suspicious symptoms take the test, we should have on average $85 \times 97\% = 82.45$ positive tests from the carriers and $15 \times 3\% = 0.45$ from the noncarriers of the virus, i.e. 82.9 positive tests in total. Therefore, $P(B) = 0.829$.

Replacing the values of P(A), P(B/A) and P(B) in equation (2) one finds that $P(A/B) \approx 0.995$. In this case, therefore, the probability that Mr. X is a carrier of the virus is 99.5%, i.e. it exceeds the statistical accuracy of the test!

In general, the solution is highly sensitive to the value of the prior probability P(A).
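The calculations of Examples 2 and 3 can be reproduced with a few lines of Python. This is a minimal sketch, not the author's code: the function name posterior_given_positive and its parameter names are hypothetical, and the 97% accuracy is assumed to hold for carriers and noncarriers alike, as in the solutions of the two examples. The priors in the final loop are illustrative values, of which only 0.02 and 0.85 come from the article.

```python
def posterior_given_positive(prior, sensitivity, false_positive_rate):
    """P(A/B) from equation (2) for B = "the test is positive".

    prior               -- P(A), probability of being a carrier before the test
    sensitivity         -- P(B/A), positive test given a carrier
    false_positive_rate -- positive test given a noncarrier
    """
    # P(B): positives from carriers plus positives from noncarriers.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Example 2: general population, P(A) = 0.02, accuracy 97%.
example_2 = posterior_given_positive(0.02, 0.97, 0.03)

# Example 3: people with suspicious symptoms, P(A) = 0.85, same test.
example_3 = posterior_given_positive(0.85, 0.97, 0.03)

print(f"Example 2: P(A/B) = {example_2:.3f}")
print(f"Example 3: P(A/B) = {example_3:.3f}")

# Illustrative sweep over priors: the same test gives very different answers.
for prior in (0.01, 0.02, 0.10, 0.50, 0.85):
    value = posterior_given_positive(prior, 0.97, 0.03)
    print(f"P(A) = {prior:.2f} -> P(A/B) = {value:.3f}")
```

Computing P(B) inside the function, rather than passing it in, keeps the three inputs independent and avoids supplying a value of P(B) that is inconsistent with the prior.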

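Example 1 applies the multiplication rule (3) to successive draws without replacement. As a hedged illustration, the sketch below assumes a reference population of 100 people (45 men and 55 women), which is what the fractions 45:100, 44:99 and so on in the solution imply. The helper chain_probability and the "M"/"W" encoding are hypothetical names introduced here.

```python
from fractions import Fraction


def chain_probability(draws, men=45, women=55):
    """Probability of a sequence of draws ("M" or "W") without replacement.

    Each factor is the conditional probability of the next draw given all
    previous ones, exactly as in the product of equation (3).
    """
    remaining = {"M": men, "W": women}
    probability = Fraction(1)
    for kind in draws:
        total = remaining["M"] + remaining["W"]
        probability *= Fraction(remaining[kind], total)
        remaining[kind] -= 1  # the chosen person is not interviewed again
    return probability


three_men = chain_probability("MMM")             # case (i)
men_then_women = chain_probability("MMMWWWW")    # case (ii), n = 7

print(float(three_men), float(men_then_women))
```

The two printed values agree with the rounded results 0.088 and 0.0086 of Example 1.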
The greater the value of the prior probability P(A), the higher the credibility of a positive test.

The next example examines what happens if the test is negative.

Example 4: Assume that Mr. X takes a diagnostic test, which is negative. Find the probability that he is a carrier of the virus:
i) under the conditions of Example 2, and
ii) under the conditions of Example 3.

Solution: Consider the following events:
A: The subject is a carrier of the virus.
B: The test is negative.

i) In this case we have $P(A) = 0.02$ and $P(B/A) = 0.03$. Assuming that 100 people take the test, we should have on average $98 \times 97\% = 95.06$ negative tests from the noncarriers and $2 \times 3\% = 0.06$ from the carriers of the virus, i.e. an average of 95.12 negative tests in total. Therefore, $P(B) = 0.9512$.

Replacing the values of P(A), P(B) and P(B/A) in equation (2) one finds that $P(A/B) \approx 0.0006$. Therefore, the probability that Mr. X is a carrier of the virus is only 0.06%.

ii) Here we have $P(A) = 0.85$ and $P(B/A) = 0.03$. Further, assuming that 100 people take the test, we should have on average $15 \times 97\% = 14.55$ negative tests from the noncarriers and $85 \times 3\% = 2.55$ from the carriers of the virus, i.e. an average of 17.1 negative tests in total. Therefore, $P(B) = 0.171$.

Replacing the values of P(A), P(B) and P(B/A) in equation (2) one finds that $P(A/B) \approx 0.1491$. Therefore, the probability that Mr. X is a carrier of the virus is 14.91%.
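The two cases of Example 4 follow the same pattern and can be checked numerically. The sketch below is an assumption-laden illustration: posterior_given_negative is a hypothetical helper, and the 97% accuracy is again taken to apply both to carriers (sensitivity) and to noncarriers (specificity).

```python
def posterior_given_negative(prior, sensitivity=0.97, specificity=0.97):
    """Probability of being a carrier after a negative test, via equation (2)."""
    # P(B/A) with B = "the test is negative": a carrier is missed by the test.
    false_negative_rate = 1.0 - sensitivity
    # P(B): negatives from carriers plus negatives from noncarriers.
    p_negative = false_negative_rate * prior + specificity * (1.0 - prior)
    return false_negative_rate * prior / p_negative


case_i = posterior_given_negative(0.02)    # conditions of Example 2
case_ii = posterior_given_negative(0.85)   # conditions of Example 3

print(f"(i)  P(A/B) = {case_i:.4f}")
print(f"(ii) P(A/B) = {case_ii:.4f}")
```

With a high prior, a single negative result still leaves a noticeable probability of infection, which is the behaviour the example is meant to expose.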
One observes here that the greater the value of the prior probability P(A), the lower the credibility of a negative test.

Remark: The outcomes of the previous three examples support the view of many epidemiologists that, at the initial stage, “blind” diagnostic tests for COVID-19 performed on the general population are not effective and burden the healthcare system of the corresponding country to no purpose.

To check this from another angle, one has to take into account the statistical estimate that the existing diagnostic tests for COVID-19 give 30% incorrectly negative (IN) results and 10% incorrectly positive (IP) results. Assume that 2% of the population of a country has been infected by the coronavirus of COVID-19 and that the government decides to bear the heavy cost of performing one million “blind” tests on the general population. Among those people, 20000 on average should be carriers and 980000 noncarriers of the virus. Therefore, we should have on average $20000 \times 30\% = 6000$ IN results and 14000 correctly positive (CP) results from the carriers, and $980000 \times 10\% = 98000$ IP results from the noncarriers of the virus. This means that 6000 infected people with IN tests will not take the required precautions and will therefore easily transmit the virus to other people.

Further, denote, for simplicity, by CP and IP the numbers of CP and IP results of the tests respectively. Then, the probability P(CP) that a positive test is correct is equal to

$$P(CP) = \frac{CP}{CP + IP} \qquad (4)$$

In our case, $P(CP) = \frac{14000}{14000 + 98000} = 0.125$, i.e. only 12.5%! Therefore, there is an urgent need for the 112000 people in total with positive tests to take a second test in order to check their real health condition, etc.

Equation (4) shows that P(CP) increases either if the number CP increases or if the number IP decreases. The former happens if more people are infected by the virus, whereas the latter will happen if the quality of the diagnostic tests is improved.

When, for example, 20% of the population is infected by the virus, it is straightforward to check that the probability P(CP) will be approximately equal to 63.6%. Consequently, the more people are infected by the virus, the higher the credibility (and therefore the usefulness) of the diagnostic tests in detecting the positive cases.

Our last example concerns a combination of Bayes’ rule and the theorem of total probability.

Example 5: A country consists of three confederate districts, say D1, D2 and D3, where 20%, 25% and 55% of its total population live respectively. In those districts, 60%, 45% and 10% of the population respectively are against the confederation, wanting their district to be an independent country. What is the probability that one of those people, chosen randomly, lives in district D3?
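The mass-screening arithmetic of the Remark preceding Example 5 can be reproduced as follows. This is a minimal sketch under the Remark's own figures (one million tests, 30% IN and 10% IP rates); the function names screening_counts and probability_correct_positive are hypothetical and introduced only for this illustration.

```python
def screening_counts(tests, prevalence, false_negative_rate, false_positive_rate):
    """Expected numbers of CP, IN and IP results when testing a population."""
    carriers = tests * prevalence
    noncarriers = tests - carriers
    incorrect_negative = carriers * false_negative_rate       # IN
    correct_positive = carriers - incorrect_negative          # CP
    incorrect_positive = noncarriers * false_positive_rate    # IP
    return correct_positive, incorrect_negative, incorrect_positive


def probability_correct_positive(cp, ip):
    """Equation (4): P(CP) = CP / (CP + IP)."""
    return cp / (cp + ip)


# Prevalence of 2% (the Remark's main case) and 20% (its second case).
for prevalence in (0.02, 0.20):
    cp, incorrect_neg, ip = screening_counts(1_000_000, prevalence, 0.30, 0.10)
    p_cp = probability_correct_positive(cp, ip)
    print(f"prevalence {prevalence:.0%}: CP = {cp:.0f}, IN = {incorrect_neg:.0f}, "
          f"IP = {ip:.0f}, P(CP) = {p_cp:.3f}")
```

Keeping the counts and the ratio in separate functions makes it easy to vary either the prevalence or the error rates of the test and observe the two effects described after equation (4).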

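Example 5 combines the theorem of total probability with Bayes’ rule, and the computation is short enough to sketch directly. The dictionary names below are hypothetical; the numbers are those given in the statement of the example. With these inputs the sketch yields P(B) = 0.2875 and a probability of about 0.1913 for district D3.

```python
# P(A_i): share of the total population living in each district.
priors = {"D1": 0.20, "D2": 0.25, "D3": 0.55}

# P(B/A_i): share of each district's population that is against the confederation.
likelihoods = {"D1": 0.60, "D2": 0.45, "D3": 0.10}

# Theorem of total probability: P(B) = sum over i of P(B/A_i) P(A_i).
p_b = sum(likelihoods[d] * priors[d] for d in priors)

# Bayes' rule for every district: P(A_i/B) = P(B/A_i) P(A_i) / P(B).
posteriors = {d: likelihoods[d] * priors[d] / p_b for d in priors}

print(f"P(B) = {p_b:.4f}")
for district, value in posteriors.items():
    print(f"P({district}/B) = {value:.4f}")
```

Computing all three posteriors at once provides a simple sanity check: since the districts partition the population, the three values must add up to 1.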
Solution: Consider the events
$A_i$: A person lives in district $D_i$, i = 1, 2, 3, and
B: A person is against the confederation.
On the basis of the given data it turns out that $P(A_1) = 0.2$, $P(A_2) = 0.25$, $P(A_3) = 0.55$ and $P(B/A_1) = 0.6$, $P(B/A_2) = 0.45$, $P(B/A_3) = 0.1$. We want to calculate the probability

$$P(A_3/B) = \frac{P(B/A_3)P(A_3)}{P(B)} \qquad (5)$$

The $A_i$’s are obviously pairwise disjoint events and their union is equal to the sample space X of the inhabitants of the country (mathematically speaking, the $A_i$’s form a partition of X). Therefore, by the theorem of total probability [7] one finds that $P(B) = P(A_1 \cap B) + P(A_2 \cap B) + P(A_3 \cap B)$ and, since $P(A_i \cap B) = P(B/A_i)P(A_i)$,

$$P(B) = P(B/A_1)P(A_1) + P(B/A_2)P(A_2) + P(B/A_3)P(A_3) \qquad (6)$$

Replacing the values of the probabilities involved in equation (6) one finds that $P(B) = 0.12 + 0.1125 + 0.055 = 0.2875$. Therefore, equation (5) gives that $P(A_3/B) = \frac{0.055}{0.2875} \approx 0.1913$, or 19.13%.

4. Bayesian Reasoning in Science

Many scientists and philosophers of science argue nowadays that the whole of science could be considered as a Bayesian process [9-11]. In this section we are going to support and justify this view.

Figure 2: The scientific method

The process of scientific thinking, being a synthesis of inductive and deductive reasoning, is graphically represented in Figure 2, retrieved from [10]. The whole process of explaining a phenomenon starts with the humans’ observations $a_1, a_2, \dots, a_n$ of the real world connected to it, which lead by induction (intuitively) to the development of a theory $T_1$ about this phenomenon. Theory $T_1$ is verified by deductive reasoning and additional deductive inferences $K_1, K_2, \dots, K_s$ are obtained. Next, a new series of observations $b_1, b_2, \dots, b_m$ follows. If some of those observations are not compatible with the laws of theory $T_1$, a new theory $T_2$ is developed to replace/extend $T_1$, and so on. In each case the new theory extends or rejects the previous one, approaching more and more closely the objective truth related to the corresponding phenomenon.

This procedure is known as the scientific method. The term was introduced during the 19th century, when significant terminologies appeared establishing clear boundaries between science and non-science. However, the scientific method has characterized the development of science since at least the 17th century. Aristotle (384-322 BC) is recognized as the inventor of the scientific method due to his refined analysis of the logical implications contained in demonstrative discourse. The first book in the history of human civilization written on the basis of the principles of the scientific method is, according to the existing witnesses, the “Elements” of Euclid (365-300 BC), addressing the axiomatic foundation of Geometry.

The scientific method is highly based on the Trial and Error procedure, a term introduced by C. Lloyd Morgan (1852-1936) [14]. This procedure is characterized by repeated attempts, which are continued until success or until the subject stops trying.

As an example, the geocentric theory (Almagest) of Ptolemy of Alexandria (100-170), being able to predict satisfactorily the movements of the planets and the moon, was considered to be true for centuries. However, it was finally proved to be wrong and was replaced by the heliocentric theory of Copernicus (1473-1543). Copernicus’ theory was supported and enhanced a hundred years later by the observations/studies of Kepler and Galileo, but it faced many obstacles for a long period, especially from the church, before its final justification [15].

Another characteristic example is Einstein’s general relativity theory, developed at the beginning of the 20th century. This theory replaced Newton’s classical gravitational theory, which was believed to be true for more than two centuries. Einstein’s new approach was based on the fact that, according to his special theory of relativity (1905), distance (r) and time (t) change in a different way with respect to a motionless and to a moving observer.

To support his argument Einstein made use of the concept of the 4-dimensional space-time and, after a series of intensive efforts (1908-1915), he finally managed to prove that the geometry of this space is non-Euclidean. This can be physically explained by the distortion created in space-time by the presence of mass or of an equivalent amount of energy, which looks analogous to the distortion created by a bowling ball on the surface of a trampoline. Einstein’s theory was experimentally verified by the irregularity of Mercury’s orbit around the sun and later by the magnitude of the deflection of light, which was measured during the eclipse of the sun on May 29, 1919. In fact, the eclipse let some stars, which normally should be behind the sun, appear beside it in the sky [16].

The previous discussion about the scientific method reveals the importance of inductive reasoning for scientific thinking. In fact, the premises of all scientific theories (with the possible exception of pure mathematics only), expressed by axioms, basic principles, etc., are based on human intuition and inductive reasoning. Therefore, a deductive inference developed on the basis of a scientific theory is true under the CONDITION that the premises of the corresponding theory are true.
In other words, if H denotes the hypothesis imposed by those premises and I denotes the deductive inference, then the conditional probability P(I/H), which can be calculated by Bayes’ rule, expresses the degree of truth of the deductive inference. Consequently, the argument that the WHOLE SCIENCE is characterized by Bayesian reasoning seems to be true.

It must be emphasized that the error of inductive reasoning is transferred to a deductive inference through its premises. Therefore, the scientific error in its final form is actually a deductive and not an inductive error! This means that none of the existing scientific theories could be considered as being absolutely true; each could simply be considered as approaching the truth better than the previous theories that it has replaced.

5. Conclusion

In the present study it was shown that Bayesian Reasoning could be considered as an interface between bivalent and fuzzy logic. Its usefulness in everyday life situations was also illustrated by suitable examples and its importance for the whole of science was studied.

References:
[1] Moor, J., The Dartmouth College Artificial Intelligence conference: The next fifty years, AI Magazine, Vol. 27, 2006, pp. 87–91.
[2] Hodges, A., Alan Turing: The Enigma, Centenary Edition, Princeton University Press: Princeton, NJ, USA, 2012.
[3] Jaynes, E.T., Probability Theory: The Logic of Science, Cambridge University Press, UK, 8th Printing, 2011 (first published, 2003).
[4] Mumford, D., The Dawning of the Age of Stochasticity, in V. Arnold, M. Atiyah, P. Lax & B. Mazur (Eds.), Mathematics: Frontiers and Perspectives, AMS, 197-218, 2000.
[5] Kosko, B., Fuzzy Thinking: The New Science of Fuzzy Logic, Hyperion, New York, 1993.
[6] Shahbazova, S.N., Sugeno, M., Kacprzyk, J. (Eds.), Recent Developments in Fuzzy Logic and Fuzzy Sets (dedicated to L.A. Zadeh), Springer, NY, 2020.
[7] Schuler, J. & Lipschutz, S., Schaum’s Outline of Probability, 2nd Edition, McGraw-Hill, NY, USA, 2010.
[8] Shiryaev, A.N., Probability-1, 3rd Edition, Springer, NY, 2016.
[9] Horgan, J., Bayes’ Theorem: What is the Big Deal?, January [...]
[10] Athanassopoulos, E. and Voskoglou, M.Gr., A Philosophical Treatise on the Connection of Scientific Reasoning with Fuzzy Logic, Mathematics, 8, article 875, 2020.
[11] Bertsch McGrayne, S., The Theory that would not die, Yale University Press, New Haven and London, 2012.
[12] What do you think about machines [...] -detail/26871
[13] Jeffreys, H., Scientific Inference, 3rd Edition, Cambridge University Press, UK, 1973.
[14] Thorpe, W.H., The origins and rise of ethology: The science of the natural behavior of animals, Praeger, London-NY, 1979.
[15] Gingerich, O., The Eye of the Heaven – Ptolemy, Copernicus, Kepler, American Institute of Physics, NY, 1993.
[16] Singh, S., Big Bang - The Origin of the Universe, Harper Perennial Publishers, NY, 2005.

Creative Commons Attribution License 4.0 (Attribution 4.0 International, CC BY 4.0)
This article is published under the terms of the Creative Commons Attribution License 4.0.
