Transcription
Superintelligence
Swiss Study Foundation
Kaspar Etter, kaspar.etter@gbs-schweiz.org
Adrian Hutter, adrian.hutter@gbs-schweiz.org
Solothurn, 12 June 2015
www.superintelligence.ch

Robin Li, Bill Gates, Elon Musk on AI
Reference: The Huge Challenge of Artificial Intelligence, www.youtube.com/watch?v=vHzJ_AJ34uQ

More money for entertainment than for ensuring a good outcome!
Reference: Ex Machina Movie, exmachina-movie.com

Intelligence and Rationality
What are we talking about?

What is Intelligence?
«Innumerable tests are available for measuring intelligence, yet no one is quite certain of what intelligence is, or even just what it is that the available tests are measuring.» – R. L. Gregory
«The capacity to learn or to profit by experience.» – W. F. Dearborn

A Useful Definition
«Intelligence measures an agent’s ability to achieve its goals in a wide range of unknown environments.»
Intelligence = Optimization Power / Comp. Resources
Reference: Universal […]

Required Ingredients
– Agent
– Environment
Learn, predict, rate and plan!
Reference: Marcus Hutter: Universal Artificial Intelligence, www.youtube.com/watch?v=I-vx5zbOOXI

Intelligence is a Big Deal
6 million years ago, 96% common DNA
Reference: Fast-Evolving Human DNA Leads to Bigger-Brained Mice, […]2015/[…]

Technology = (Neutral) Lever
Greed and laziness drive us to
– increase our productivity
– make our lives easier
… which is fine, except: Our technological progress far outperforms our moral progress!
Reference: Dual-Use Technology, en.wikipedia.org/wiki/Dual-use_technology

Artificial Intelligence
[Figure: Fire, Wheel, Electricity, Gunpowder, Printing Press, Human Intelligence, Artificial Intelligence?]
Intelligence is a technology like no other!
Reference: Our Final […]

Rationality
It’s about achieving your goals!
Rationality is the science of winning:
– Epistemic rationality: accurate beliefs
– Instrumental rationality: good strategy
Reference: What do we mean by “rationality”?, lesswrong.com/lw/31/what_do_we_mean_by_[…]

Rationality Continued
– Normative Rationality: probability theory, decision theory, game theory
– Descriptive Rationality: cognitive biases, system 1 and system 2
– Prescriptive Rationality: applied rationality; train: Center for Applied Rationality

Probability Theory
What can we know about the world?

What are Probabilities?
Probabilities can be interpreted as …
– the frequency with which an event occurs if an experiment is repeated a large number of times (Frequentist)
– or as the credence in a statement (Bayesian or subjective probability).
What is the P that the 9th digit of pi is 7?
Reference: Probability Interpretations, en.wikipedia.org/wiki/Probability_interpretations

Relevance of Probabilities
What bets should we take, given our uncertainty about the world we live in?
Living is acting, acting is betting!
Reference: Prediction Market, en.wikipedia.org/wiki/Prediction_market

Properties of Probabilities
– For all propositions p: 0 ≤ P(p) ≤ 1
– If p is certainly true: P(p) = 1
– If p and q are mutually exclusive: P(p ∨ q) = P(p) + P(q)
– Consequences: P(¬p) = 1 – P(p)
– And: P(p ∨ ¬p) = P(p) + P(¬p) = 1
Reference: Probability Axioms, en.wikipedia.org/wiki/Probability_axioms

Conditional Probabilities
P(A | B) is the probability of A given B.
Example: Alice has two children. You learn either
a) that the older child is a girl, or
b) that at least one of them is a girl.
What is the P of two girls in a) and b)?
Reference: Conditional Probability, en.wikipedia.org/wiki/Conditional_probability

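The two-children example can be checked by brute-force counting. The sketch below assumes that girls and boys are equally likely and that the two births are independent, which the slide leaves implicit; the helper name `conditional` is made up for this illustration.

```python
from fractions import Fraction
from itertools import product

# All equally likely (older, younger) combinations.
# Assumption: P(girl) = 1/2 and independent births.
families = list(product("GB", repeat=2))

def conditional(event, given):
    """P(event | given), computed by counting equally likely outcomes."""
    matching = [f for f in families if given(f)]
    return Fraction(sum(1 for f in matching if event(f)), len(matching))

def two_girls(family):
    return family == ("G", "G")

p_a = conditional(two_girls, lambda f: f[0] == "G")  # a) the older child is a girl
p_b = conditional(two_girls, lambda f: "G" in f)     # b) at least one child is a girl
print(p_a, p_b)  # 1/2 1/3
```

Learning which child is a girl leaves two candidate families, while learning only that at least one is a girl leaves three, so the same event gets two different conditional probabilities.
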
Joint Probabilities
P(A, B): probability that A and B occur
P(A, B) is not determined by P(A) & P(B):
– A = “she voted for SP”, B = “… for SVP”: P(A) = 27%, P(B) = 19%, P(A, B) = 0%
– A = “older than 55”, B = “older than 65”: P(A) = 27%, P(B) = 19%, P(A, B) = 19%
Reference: Joint Probability Distribution, en.wikipedia.org/wiki/Joint_probability_distribution

Joint and Conditional Probabilities
P(A, B) = P(A | B) * P(B)
P(B, A) = P(B | A) * P(A)
[Figure: Euler diagram of A and B]
P(A, B) = P(B, A)
P(A | B) * P(B) = P(B | A) * P(A)
Reference: Euler Diagram, en.wikipedia.org/wiki/Euler_diagram

Bayes’ Theorem
– Thomas Bayes
– 1701 – 1761
– English Statistician
Reference: Bayes’ Theorem, en.wikipedia.org/wiki/Bayes%27_theorem

Testing for a Disease
– 1% of people have a certain disease
– 80% of those with the disease test positive
– 9.6% of those without the disease test positive
– Given a positive test, how likely is it that the person has the disease?
– Out of 100 people, 1 is ill, 99 aren’t
– Given a positive test, the probability of being ill is only 7.8%
Reference: Bayes’ Rule, en.wikipedia.org/wiki/Bayes%27_rule

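The 7.8% figure follows directly from Bayes’ theorem. The sketch below plugs in the three numbers from the slide; the function and parameter names are chosen for this illustration only.

```python
def posterior(prior, p_pos_given_ill, p_pos_given_healthy):
    """P(ill | positive test) via Bayes' theorem."""
    # Total probability of a positive test, over ill and healthy people.
    p_pos = prior * p_pos_given_ill + (1 - prior) * p_pos_given_healthy
    return prior * p_pos_given_ill / p_pos

p_ill = posterior(prior=0.01, p_pos_given_ill=0.80, p_pos_given_healthy=0.096)
print(f"{p_ill:.1%}")  # 7.8%
```

Most positive tests come from the large healthy group (99 people at 9.6% each), which is why the posterior stays small despite the 80% detection rate.
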
Bayesian Inference
How to update your beliefs:
P_new(h) = P_old(h) · P_old(e | h) / P_old(e)
The confidence in a hypothesis h increases if the evidence e is more likely to happen given h than without.
Reference: Bayesian Inference, en.wikipedia.org/wiki/Bayesian_inference

Base Rate Fallacy
We tend to replace P(h | e) with P(e | h) instead of updating the prior belief!
P_new(h) = P_old(h) · P_old(e | h) / P_old(e)
Extraordinary claims require extraordinary evidence!
Reference: Base Rate Fallacy, en.wikipedia.org/wiki/Base_rate_fallacy

Broken Science
Null-Hypothesis Testing: P(e | h) instead of P(h | e)!
Reference: Trouble at the Lab, econ.st/19QXzFe

Conservation of Expected Evidence
Absence of evidence is evidence of absence!
Reference: Bayes’ Theorem on Crucial […]y/bayes-theorem/

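Conservation of expected evidence can be verified numerically: averaged over whether the evidence shows up or not, the posterior equals the prior. The sketch below uses exact fractions; the prior of 1/5 and the likelihoods 9/10 and 3/10 are hypothetical numbers, not taken from the slides.

```python
from fractions import Fraction

def update(prior, p_e_given_h, p_e_given_not_h):
    """Return P(e), P(h | e) and P(h | not e) for a binary hypothesis h."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    post_if_e = prior * p_e_given_h / p_e
    post_if_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    return p_e, post_if_e, post_if_not_e

prior = Fraction(1, 5)  # hypothetical prior P(h)
p_e, post_e, post_not_e = update(prior, Fraction(9, 10), Fraction(3, 10))

# Weight each possible posterior by how likely we are to end up there.
expected_posterior = p_e * post_e + (1 - p_e) * post_not_e
print(post_e, post_not_e, expected_posterior)
```

If observing e would raise the confidence in h, then failing to observe e must lower it, which is the sense in which absence of evidence is evidence of absence.
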
Prior Probability
Where do we get the priors from?
– Irrelevant given enough evidence
– Knowledge about the statistics
– Symmetries (even distribution)
But:
– What is the prior that Zeus exists?
– The prior that the universe is finite?
Reference: Prior Probability, en.wikipedia.org/wiki/Prior_probability

Solomonoff Induction
Ray Solomonoff, 1960: Universal Prior
Combines and formalizes ideas by
– Epicurus: “Keep all hypotheses that are consistent with the data.”
– Ockham: “Among all hypotheses consistent with the observations, choose the simplest.” (Ockham’s Razor)
Reference: An Intuitive Explanation of Solomonoff Induction, lesswrong.com/lw/dhg/an_intuitive_explanation_of_[…]

Kolmogorov Complexity
Simplicity: Not how easy something is to understand for humans, but rather the length of the shortest program that is able to reproduce the observations.
The Solomonoff prior is exponentially small in this length & not computable.
(Formal definition involves Turing machines.)
Reference: Kolmogorov Complexity, en.wikipedia.org/wiki/Kolmogorov_complexity

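Kolmogorov complexity itself cannot be computed, but a general-purpose compressor gives a crude upper bound on it. The sketch below uses `zlib` as such a stand-in (an assumption of this illustration, not something the slides propose) to contrast a highly regular string with pseudo-random bytes of the same length.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes after zlib compression: a rough upper bound only."""
    return len(zlib.compress(data, 9))

regular = b"01" * 5000                                    # "print '01' 5000 times"
rng = random.Random(0)                                    # fixed seed for repeatability
noisy = bytes(rng.getrandbits(8) for _ in range(10000))   # no obvious short description

print(compressed_length(regular), compressed_length(noisy))
```

A prior that is exponentially small in description length would therefore give the regular string vastly more weight than the noisy one.
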
Decision Theory (Part 1)
How shall we decide in the face of uncertainty?

Utility Function
A utility function describes an agent’s preferences over different outcomes.
Being nourished is a state of higher utility for animals than being starving.
A utility function doesn’t have to be an explicit or conscious goal of an […]

Marginal Utility
Money has diminishing marginal utility.
[Figure: utility function, Utility plotted against Money, with ticks at 0, 100, 1000, 1100]
Reference: Marginal Utility, en.wikipedia.org/wiki/Marginal_utility

Expected Utility
Under uncertainty (i.e. in reality) we can only maximize expected utilities.
If there are 3 outcomes with utilities u1, u2, u3 and probabilities p1, p2, p3, the expected utility is calculated as p1 * u1 + p2 * u2 + p3 * u3.
Reference: Expected Utility Hypothesis, en.wikipedia.org/wiki/Expected_utility_hypothesis

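The expected-utility formula is a probability-weighted sum and generalizes to any number of outcomes. In the sketch below the lottery and its utilities are hypothetical numbers chosen for illustration.

```python
def expected_utility(outcomes):
    """Expected utility of a list of (probability, utility) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * u for p, u in outcomes)

# Hypothetical lottery with three outcomes: (p1, u1), (p2, u2), (p3, u3).
lottery = [(0.5, 10.0), (0.3, 0.0), (0.2, -5.0)]
eu = expected_utility(lottery)
print(eu)
```

An expected-utility maximizer would compute this value for every available action and pick the action with the highest result.
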
Von Neumann–Morgenstern
Von Neumann and Morgenstern, 1947: Every rational agent (i.e. one satisfying four natural axioms) behaves as if it tries to maximize the expected utility of some utility function.
Axioms: Completeness, Transitivity, Continuity and Independence (VNM).
Reference: Von Neumann–Morgenstern Utility Theorem, en.wikipedia.org/wiki/Von_Neumann–Morgenstern_[…]

Money-Pump Argument
Reference: Decision Theory FAQ, lesswrong.com/lw/gu1/decision_theory_faq/

Loss Aversion
You have been given 1000. Choose:
– Win 1000 with 50% probability (risky)
– Win 500 with certainty (safe)
You have been given 2000. Choose:
– Lose 1000 with 50% probability (risky)
– Lose 500 with certainty (safe)
Reference: TED Talk by Laurie Santos, www.ted.com/talks/laurie_santos.html

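The two scenarios differ only in wording: in terms of final wealth they offer exactly the same options. The sketch below makes this explicit, assuming that the risky option changes nothing in the other 50% of cases (the slide does not spell this out).

```python
from fractions import Fraction

half = Fraction(1, 2)

# Each option maps final wealth -> probability.
# Assumption: the other 50% of the risky option means "no change".
gain_frame = {
    "risky": {1000 + 1000: half, 1000 + 0: half},
    "safe": {1000 + 500: Fraction(1)},
}
loss_frame = {
    "risky": {2000 - 1000: half, 2000 - 0: half},
    "safe": {2000 - 500: Fraction(1)},
}

print(gain_frame == loss_frame)  # True
```

An agent that cares only about final outcomes would choose the same way in both scenarios; preferring "safe" in the first and "risky" in the second is the pattern that the term loss aversion describes.
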
Prospect Theory
Reference: Prospect Theory, en.wikipedia.org/wiki/Prospect_theory

Framing Effect
Outbreak of disease, choose between:
– A: 200 out of 600 people will be saved
– B: ⅓ probability of saving everyone, ⅔ of saving no one
– C: 400 out of 600 people will die
– D: ⅓ probability that nobody will die, ⅔ that all 600 die
Tversky/Kahneman: A: 72%, B: 28%; C: 22%, D: 78%
Don’t trust your (moral) intuitions!
Reference: Framing (Social Sciences), en.wikipedia.org/wiki/Framing_(social_sciences)

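Written as distributions over the number of survivors, programs A and C are identical, as are B and D, and all four have the same expected value. The sketch below is only a bookkeeping check of the numbers on the slide; the dictionary layout is an arbitrary choice for this illustration.

```python
from fractions import Fraction

TOTAL = 600
third = Fraction(1, 3)

# Each program maps number of survivors -> probability.
programs = {
    "A": {200: Fraction(1)},                            # 200 will be saved
    "B": {TOTAL: third, 0: 1 - third},                  # 1/3 everyone, 2/3 no one
    "C": {TOTAL - 400: Fraction(1)},                    # 400 will die
    "D": {TOTAL - 0: third, TOTAL - 600: 1 - third},    # 1/3 nobody dies, 2/3 all die
}

expected_survivors = {
    name: sum(n * p for n, p in dist.items()) for name, dist in programs.items()
}
print(expected_survivors)
```

The reversal of majority preferences between the A/B wording and the C/D wording therefore cannot be explained by the outcomes themselves.
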
Universal Intelligence
Can we define an optimally intelligent agent?

Swiss AI Lab in Lugano
World-leading in pattern recognition with Artificial Neural Networks (ANNs).
Theory of optimally intelligent agents:
– AIXI (Marcus Hutter)
– Gödel Machine (Jürgen Schmidhuber)
Former PhD student co-founded DeepMind (sold to Google for 500m, 2014).
Reference: IDSIA, www.idsia.ch

An Optimal Agent
Active agents in a known environment maximize expected utility.
Passive agents in an unknown environment use Solomonoff induction to get a probability distribution over possible environments.
What about active agents in unknown environments (like reality)?
Reference: Intelligent Agent, en.wikipedia.org/wiki/Intelligent_agent

Universal Intelligence: AIXI
Developed by Marcus Hutter in 2000.
At each step, update your probability distribution over all possible worlds (Bayes/Solomonoff) and choose the action which maximizes expected utility over all remaining steps.
Reference: Universal Algorithmic Intelligence […]

Monte Carlo AIXI
AIXI is the most intelligent agent possible.
Like the Solomonoff prior, AIXI cannot be computed but only approximated.
AIXI is not intended as a proposal for building an AI, but as an upper bound on how intelligent an agent can be.
Reference: A Monte Carlo AIXI […]

Shortcomings of AIXI
Cartesian model: The agent and the environment are modelled as separate Turing machines. In reality, the agent is part of the world in which it lives.
Wireheading: The most intelligent thing to do for a reinforcement learner is to get control over its reward channel.
Reference: Universal Algorithmic […]

Gödel Machine
Named after Kurt Gödel (1906–1978), developed by J. Schmidhuber (2003).
Mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solver.
The GM solves all large enough problems almost as quickly as if it already knew the best (unknown) algorithm for solving them.
Reference: Gödel Machine by Jürgen Schmidhuber, people.idsia.ch/[…]

Decision Theory (Part 2)
What kind of problems can we run into?

Complications
What if the payout structure itself depends on the action you take?
Causal Decision Theory: Take the action that causes the best expected outcome.
Evidential Decision Theory: Choose the action which, conditional on you having chosen it, gives you the best expected outcome.
Reference: Evidential Decision Theory, en.wikipedia.org/wiki/Evidential_decision_theory

Newcomb’s Paradox
Reference: Newcomb’s Paradox, en.wikipedia.org/wiki/Newcomb%27s_paradox

Possible Answers: EDT or CDT?
Causal reasoning: The content of the boxes is fixed and not affected by my decision. Taking both gives me more than taking one.
Evidential reasoning: If I take only one box, the predictor had predicted this and filled it with 1m. If I take both boxes, I will walk away with only 1k.
Reference: Newcomb’s Problem, wiki.lesswrong.com/wiki/Newcomb's_problem

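The two lines of reasoning can be written as two different expected-value calculations. In the sketch below, `accuracy` (how often the predictor is right) and `p_full` (the probability that the 1m box is already filled) are parameters introduced for this illustration; the slide implicitly assumes a perfect predictor.

```python
MILLION, THOUSAND = 1_000_000, 1_000

def edt_values(accuracy):
    """Evidential view: my choice is evidence about what was predicted."""
    one_box = accuracy * MILLION
    two_box = (1 - accuracy) * MILLION + THOUSAND
    return one_box, two_box

def cdt_values(p_full):
    """Causal view: the box content is fixed, whatever I choose now."""
    one_box = p_full * MILLION
    two_box = p_full * MILLION + THOUSAND
    return one_box, two_box

print(edt_values(1))     # perfect predictor: one-boxing looks better
print(cdt_values(0.5))   # any fixed content: two-boxing looks better by 1k
```

Under the evidential calculation one-boxing wins for any reasonably accurate predictor, while under the causal calculation two-boxing is ahead by 1k for every value of `p_full`, which is exactly the disagreement described above.
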
Where CDT (Arguably) Beats EDT
«Solomon is an ancient monarch vaguely reminiscent of the Israelite King. (Every part of this story is Biblically inaccurate.) He is pondering whether to summon Bathsheba, another man’s wife. But Solomon is also fully informed as to the peculiar connection between his choice in this matter and the likelihood of his eventually suffering a successful revolt: “Kings have two basic personality types, charismatic and uncharismatic. A king’s degree of charisma depends on his genetic make-up and early childhood experiences, and cannot be changed in adulthood. Now charismatic kings tend to act justly and uncharismatic kings unjustly. Successful revolts against charismatic kings are rare, whereas successful revolts against uncharismatic kings are frequent. Unjust acts themselves, though, do not cause successful revolts … Solomon does not know whether or not he is charismatic; he does know that it is unjust to send for another man’s wife.”»
Reference: Paradoxes in Probability Theory by William […]

Where Both (Arguably) Fail
– Blackmailing
– Parfit’s Hitchhiker
– Counterfactual Mugging
Possible solution: Updateless DT (“Do what you would have precommitted to.”)
EDT & CDT are not reflectively consistent.
Reference: UDT with Known Search Order by Tsvi […].pdf

Machine Learning
How can algorithms separate the signal from the noise?

Software is eating the world because it is more […]
[Figure: Entertainment, Retail; analog vs. digital]
Reference: Marc […]

Reference: Machine Learning, A Short Introduction

Computers have just learned how to see, read, write and classify.
Reference: Jeremy Howard, go.ted.com/bbZC

Reference: Google Research Blog, November 2014, googleresearch.blogspot.ch/2014/11/[…]

Superhuman Image Recognition
– With convolutional neural networks
– 1.2 million training images, 30 layers
Reference: Inside Microsoft Research, February 2015, blogs.technet.com/b/inside_microsoft_research/[…]

Machine
A machine runs an algorithm, which is
– a procedure for
– solving a specified problem
– in a finite number of steps
– (i.e. eventually producing an output).
Reference: Turing Machine, en.wikipedia.org/wiki/Turing_machine

Learning
– Building predictive models from data
– Algorithms improve with “experience”
– Useful if you don’t know the solution
– Machine Learning: Learn and predict
– Data Mining: Discover new properties
Reference: Machine Learning, en.wikipedia.org/wiki/Machine_learning

Feedback Mechanism
Categorization based on feedback:
– Supervised Learning: Given a set of inputs & outputs, learn a general rule
– Unsupervised Learning: Learn the structure in the input without a training set
– Reinforcement Learning: Achieve a certain goal in a dynamic environment
Reference: Supervised Learning, en.wikipedia.org/wiki/Supervised_learning

Tasks and Models
– Classification: Assign unseen input to discrete categories (supervised)
– Regression: Map the unseen input to continuous values (supervised)
– Clustering: Divide a set of inputs into a number of groups (unsupervised)
– Dimensionality Reduction: Simplify […]
Reference: Statistical Classification, en.wikipedia.org/wiki/Statistical_classification

Applications
– Spam filtering
– Search engines
– Computer vision
– Medical diagnosis
– Stock market prediction
– Optical character recognition
– Recognizing credit card fraud
– Speech recognition (e.g. Siri)
– Autonomous systems (e.g. self-driving cars)
Reference: Six Novel Machine Learning […]06/[…]

Problem: Overfitting

Cross-Validation
– Helps to prevent overfitting (i.e. to find the optimal number of parameters).
– Helps to estimate how well a model will generalize to unseen data.
Partition the data into a “training set” and a “validation set”.
Reference: Cross-Validation (Statistics), en.wikipedia.org/wiki/Cross-validation_(statistics)

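A common way to partition the data is k-fold cross-validation, where every sample serves as validation data exactly once. The sketch below is a minimal pure-Python version; the k-fold scheme, the function names and the trivial "predict the training mean" model are assumptions for illustration, since the slide only mentions a training set and a validation set.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Yield (train, validation) index lists for k folds over n samples."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation

def cross_validated_mse(ys, k=5):
    """Average validation error of a model that predicts the training mean."""
    errors = []
    for train, validation in k_fold_indices(len(ys), k):
        mean = sum(ys[j] for j in train) / len(train)           # "fit" on training set
        fold_error = sum((ys[j] - mean) ** 2 for j in validation) / len(validation)
        errors.append(fold_error)                               # score on unseen data
    return sum(errors) / len(errors)

ys = [float(i % 7) for i in range(50)]  # hypothetical target values
print(cross_validated_mse(ys))
```

To choose between models of different complexity, one would compute this score for each candidate and keep the one with the lowest validation error rather than the lowest training error.
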
Exploration-Exploitation Trade-Off
You have 1000 gambling machines in front of you, which give different payouts with different probability distributions.
You have 1000 trials for all machines.
How many of your trials do you spend finding a good machine, how many to exploit the best machine found so far?
Reference: Reinforcement Learning, en.wikipedia.org/wiki/Reinforcement_learning

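One simple heuristic for this trade-off is the epsilon-greedy strategy: explore a random machine with a small probability, otherwise exploit the machine with the best average payout so far. The sketch below is one possible approach, not a strategy named on the slide; the 10 machines with win probabilities 0.0 to 0.9 and the value `epsilon=0.1` are hypothetical.

```python
import random

def epsilon_greedy(payout_probs, trials, epsilon=0.1, seed=0):
    """Play `trials` rounds; each machine pays 1 with its given probability."""
    rng = random.Random(seed)
    counts = [0] * len(payout_probs)    # pulls per machine
    totals = [0.0] * len(payout_probs)  # accumulated payout per machine
    reward_sum = 0.0
    for _ in range(trials):
        if rng.random() < epsilon or not any(counts):
            arm = rng.randrange(len(payout_probs))  # explore a random machine
        else:
            # Exploit the machine with the best observed average payout.
            arm = max(
                range(len(payout_probs)),
                key=lambda a: totals[a] / counts[a] if counts[a] else 0.0,
            )
        reward = 1.0 if rng.random() < payout_probs[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward
        reward_sum += reward
    return reward_sum, counts

payout_probs = [i / 10 for i in range(10)]  # hypothetical machines
total, pulls = epsilon_greedy(payout_probs, trials=1000)
print(total, pulls)
```

A larger `epsilon` spends more trials on finding good machines, a smaller one commits earlier to the best machine found so far.
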
Clustering: Example
Reference: Health Behaviour Patterns in a German 50 […]

Clustering: k-means Algorithm
How to find k clusters in N data points?
Goal: Minimize the sum of the squared distances of each point to the center of its cluster.
Solving this problem exactly is NP-hard.
Heuristic:
– Choose k of the N points randomly as initial means.
– Assign each point to the nearest mean.
– Calculate the new mean of each cluster.
– Repeat the last two steps until the assignment no longer changes.
Reference: k-means Clustering, en.wikipedia.org/wiki/K-means_clustering

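The heuristic above can be sketched in a few lines of plain Python. Points are tuples of numbers; the iteration cap, the fixed seed and the handling of empty clusters are choices made for this illustration.

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Cluster a list of equal-length tuples into k groups."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # choose k of the N points randomly
    assignment = None
    for _ in range(iterations):
        # Assign each point to the nearest mean (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((x - m) ** 2 for x, m in zip(p, means[c])))
            for p in points
        ]
        if new_assignment == assignment:  # nothing changed: converged
            break
        assignment = new_assignment
        # Recompute each mean; keep the old mean if a cluster became empty.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = tuple(sum(xs) / len(members) for xs in zip(*members))
    return means, assignment

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]  # toy data
means, assignment = k_means(points, k=2)
print(means, assignment)
```

Because the result depends on the random initial means, the procedure is usually run several times and the solution with the smallest sum of squared distances is kept.
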
k-means Clustering Outcomes
Reference: k-means Clustering, en.wikipedia.org/wiki/K-means_clustering

Dimensionality Reduction
Reference: Dimensionality Reduction, en.wikipedia.org/wiki/Dimensionality_reduction

Goal: Reduce Dimensionality
[Figure: Original Data (Entries × Old Features) mapped to Reduced Data (Entries × New Features)]
Retain as much information as possible with as few features as possible!
Reference: Principal Component Analysis, en.wikipedia.org/wiki/Principal_component_analysis

Recommendation Systems
Example: If each song is a dimension, a user’s taste can be viewed as a vector.
Observation: As groups of users rate the same kind of songs high and low, try to distill the tastes as base vectors.
Apply Principal Component Analysis!
Reference: Recommender System, en.wikipedia.org/wiki/Recommender_system

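The first base vector of a Principal Component Analysis is the direction of largest variance, i.e. the leading eigenvector of the covariance matrix. The sketch below finds it by power iteration in plain Python. The tiny rating matrix (rows are users, columns are songs) is hypothetical, and the method assumes that the data is not constant and that the fixed start vector is not orthogonal to the leading eigenvector.

```python
import math

def first_principal_component(rows, iterations=100):
    """Return (column means, unit vector of largest variance)."""
    n, d = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] + [0.0] * (d - 1)  # fixed start vector (assumption, see above)
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize after each multiplication
    return means, v

def project(row, means, component):
    """Coordinate of one row along the component (one number per user)."""
    return sum((x - m) * c for x, m, c in zip(row, means, component))

ratings = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (4.0, 4.0)]  # hypothetical ratings
means, taste = first_principal_component(ratings)
scores = [project(r, means, taste) for r in ratings]
print(taste, scores)
```

Here two song ratings per user are reduced to a single score per user without losing information, because all users lie on one line in rating space.
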
Lossy Compression
– Downside: New base vectors need to be stored to recover the data.
– Solution: Choose appropriate base vectors (in the case of JPEG: DCT)!
Reference: Data Compression, en.wikipedia.org/wiki/Data_compression

Support Vector Machines
State-of-the-art supervised learning models used for classification.
Binary, linear classifier: Find a plane that maximizes the distance to the closest data points from both classes.
Reference: Support Vector Machine, en.wikipedia.org/wiki/Support_vector_machine

SVM Extensions
In practice, there will often be no plane that perfectly separates the two classes.
Soft margin: Allow violations with a penalty.
Reference: Kernel Method, en.wikipedia.org/wiki/Kernel_method

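The soft-margin idea is usually expressed as minimizing 0.5 * ||w||^2 plus a penalty c for every point that violates the margin (the hinge loss). The sketch below trains such a linear classifier with plain subgradient descent; the learning rate, the number of epochs, the value of `c` and the toy data are assumptions for illustration, and practical SVM libraries use more refined solvers.

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def hinge_objective(w, b, data, c=1.0):
    """0.5 * ||w||^2 + c * sum of hinge losses max(0, 1 - y * (w.x + b))."""
    violations = sum(max(0.0, 1.0 - y * (dot(w, x) + b)) for x, y in data)
    return 0.5 * dot(w, w) + c * violations

def train_soft_margin(data, c=1.0, learning_rate=0.1, epochs=100):
    """Full-batch subgradient descent on the soft-margin objective."""
    w, b = [0.0] * len(data[0][0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5 * ||w||^2 term
        for x, y in data:
            if y * (dot(w, x) + b) < 1.0:  # point violates the margin: penalize
                grad_w = [g - c * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= c * y
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def predict(w, b, x):
    return 1 if dot(w, x) + b >= 0 else -1

# Toy data: (point, label) with labels +1 and -1.
data = [((2.0, 2.0), 1), ((3.0, 3.0), 1), ((-2.0, -2.0), -1), ((-3.0, -3.0), -1)]
w, b = train_soft_margin(data)
print(w, b, [predict(w, b, x) for x, _ in data])
```

A larger `c` punishes margin violations more strongly and approaches the hard-margin classifier, while a smaller `c` tolerates more violations in exchange for a wider margin.
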
Reinforcement Learning (CNN)
Reference: Google develops self-learning computer […]

Outlook: Huge responsibility in the near term. For the long term, see www.superintelligence.ch.
[Figure: Train CNN, Deploy CNN]
Reference: Campaign to Stop Killer Robots, www.stopkillerrobots.org