Superintelligence

Transcription

Superintelligence
Swiss Study Foundation
Kaspar Etter, kaspar.etter@gbs-schweiz.org
Adrian Hutter, adrian.hutter@gbs-schweiz.org
Solothurn, 12 June 2015
www.superintelligence.ch

Robin Li, Bill Gates, Elon Musk on AI
The Huge Challenge of Artificial Intelligence
www.youtube.com/watch?v=vHzJ_AJ34uQ

More money for entertainment than for ensuring a good outcome!
Ex Machina Movie
exmachina-movie.com

Intelligence and Rationality
What are we talking about?

What is Intelligence?
«Innumerable tests are available for measuring intelligence, yet no one is quite certain of what intelligence is, or even just what it is that the available tests are measuring.» – R. L. Gregory
«The capacity to learn or to profit by experience.» – W. F. [ ]

A Useful Definition
«Intelligence measures an agent’s ability to achieve its goals in a wide range of unknown environments.»
Intelligence Universal [ ] Power Comp. Resources

Required Ingredients
Agent
Environment
Learn, predict, rate and plan!
Marcus Hutter: Universal Artificial Intelligence
www.youtube.com/watch?v=I-vx5zbOOXI

Intelligence is a Big Deal
6 million years ago, 96% common DNA
Fast-Evolving Human DNA Leads to Bigger-Brained Mice
[ ]2015/[ ]

Technology = (Neutral) Lever
Greed and laziness drive us to
– increase our productivity
– make our lives easier
which is fine, except:
Our technological progress far outperforms our moral progress!
Dual-Use Technology
en.wikipedia.org/wiki/Dual-use_technology

Artificial Intelligence
Fire, Wheel, Electricity, Gunpowder, Printing Press
Human Intelligence, Artificial Intelligence?
Intelligence is a technology like no other!
Our Final [ ]

Rationality
It’s about achieving your goals!
Rationality is the science of winning:
– Epistemic rationality: accurate beliefs
– Instrumental rationality: good strategy
What do we mean by “rationality”?
lesswrong.com/lw/31/what_do_we_mean_by_[ ]

Rationality Continued
– Normative Rationality: probability theory, decision theory, game theory
– Descriptive Rationality: cognitive biases, system 1 and system 2
– Prescriptive Rationality: applied rationality; train: [ ]
Center for Applied [ ]

Probability Theory
What can we know about the world?

What are Probabilities?
Probabilities can be interpreted as ...
– the frequency with which an event occurs if an experiment is repeated a large number of times (Frequentist)
– or as the credence in a statement (Bayesian or subjective probability).
What is the P that the [ ]^9th digit of pi is 7?
Probability Interpretations
en.wikipedia.org/wiki/Probability_interpretations

Relevance of Probabilities
What bets should we take, given our uncertainty about the world we live in?
Living is acting, acting is betting!
Prediction Market
en.wikipedia.org/wiki/Prediction_market

Properties of Probabilities
– For all propositions p: $0 \le P(p) \le 1$
– If p is certainly true: $P(p) = 1$
– If p and q are mutually exclusive: $P(p \lor q) = P(p) + P(q)$
– Consequences: $P(\lnot p) = 1 - P(p)$
– And: $P(p \lor \lnot p) = P(p) + P(\lnot p) = 1$
Probability Axioms
en.wikipedia.org/wiki/Probability_axioms

Conditional Probabilities
$P(A \mid B)$ is the probability of A given B.
Example: Alice has two children. You learn either
a) that the older child is a girl, or
b) that at least one of them is a girl.
What is the P of two girls in a) and b)?
Conditional Probability
en.wikipedia.org/wiki/Conditional_probability
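
The two-children question can be checked by brute-force counting. The sketch below assumes that each child is independently a girl or a boy with equal probability (an assumption the slide does not state), and the helper name `probability` is purely illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumption: "G" and "B" are equally likely and independent for each child.
families = list(product("GB", repeat=2))  # tuples of (older, younger)

def probability(event, given):
    """P(event | given), computed by counting equally likely families."""
    consistent = [f for f in families if given(f)]
    return Fraction(sum(1 for f in consistent if event(f)), len(consistent))

def two_girls(family):
    return family == ("G", "G")

p_a = probability(two_girls, given=lambda f: f[0] == "G")  # a) older child is a girl
p_b = probability(two_girls, given=lambda f: "G" in f)     # b) at least one girl
print(p_a, p_b)  # 1/2 1/3
```

Under these assumptions the two pieces of information lead to different answers, because b) leaves three families consistent with the evidence while a) leaves only two.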

Joint Probabilities
$P(A, B)$: probability that A and B occur
$P(A, B)$ is not determined by $P(A)$ and $P(B)$:
– A = “she voted for SP”, B = “... for SVP”: $P(A) = 27\%$, $P(B) = 19\%$, $P(A, B) = 0\%$
– A = “older than 55”, B = “older than 65”: $P(A) = 27\%$, $P(B) = 19\%$, $P(A, B) = 19\%$
Joint Probability Distribution
en.wikipedia.org/wiki/Joint_probability_distribution

Joint and Conditional Probabilities
$P(A, B) = P(A \mid B) \cdot P(B)$
$P(B, A) = P(B \mid A) \cdot P(A)$
[Figure: Euler diagram of two overlapping sets A and B]
$P(A, B) = P(B, A)$
$P(A \mid B) \cdot P(B) = P(B \mid A) \cdot P(A)$
Euler Diagram
en.wikipedia.org/wiki/Euler_diagram
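
Dividing the last identity on this slide by $P(B)$, assuming $P(B) > 0$, gives the usual statement of Bayes' theorem:

```latex
P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}
```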

Bayes’ Theorem
– Thomas Bayes
– 1701 – 1761
– English Statistician
Bayes’ Theorem
en.wikipedia.org/wiki/Bayes%27_theorem

Testing for a Disease
– 1% of people have a certain disease
– 80% of those with the disease test positive
– 9.6% of those without the disease test positive
– Given a positive test, how likely is “it”?
– Out of 100 people, 1 is ill, 99 aren’t
– Given a positive test, only 7.8% are ill
Bayes’ Rule
en.wikipedia.org/wiki/Bayes%27_rule
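
The 7.8% figure can be reproduced with a few lines of arithmetic. This is a minimal sketch using the numbers from the slide. The function and parameter names are illustrative only.

```python
def posterior(prior, p_pos_given_ill, p_pos_given_healthy):
    """P(ill | positive test) via Bayes' rule."""
    # Total probability of a positive test, over ill and healthy people.
    p_pos = prior * p_pos_given_ill + (1 - prior) * p_pos_given_healthy
    return prior * p_pos_given_ill / p_pos

p = posterior(prior=0.01, p_pos_given_ill=0.80, p_pos_given_healthy=0.096)
print(f"{p:.1%}")  # 7.8%
```

Most positive tests come from the large healthy group, which is why the posterior stays low despite the 80% detection rate.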

Bayesian Inference
How to update your beliefs:
$P_{new}(h) = P_{old}(h) \cdot \frac{P_{old}(e \mid h)}{P_{old}(e)}$
The confidence in a hypothesis h increases if the evidence e is more likely to happen given h than without.
Bayesian Inference
en.wikipedia.org/wiki/Bayesian_inference

Base Rate Fallacy
We tend to replace $P(h \mid e)$ with $P(e \mid h)$ instead of updating the prior belief!
$P_{new}(h) = P_{old}(h) \cdot \frac{P_{old}(e \mid h)}{P_{old}(e)}$
Extraordinary claims require extraordinary evidence!
Base Rate Fallacy
en.wikipedia.org/wiki/Base_rate_fallacy

Broken Science
Null-Hypothesis Testing: $P(e \mid h)$ instead of $P(h \mid e)$!
Trouble at the Lab
econ.st/19QXzFe

Conservation of Expected Evidence
Absence of evidence is evidence of absence!
Bayes’ Theorem on Crucial [ ]
[ ]y/bayes-theorem/
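
A small numerical check may make the slogan concrete. If observing e would raise the probability of h, then not observing e must lower it, and the probability-weighted average of both posteriors equals the prior. The numbers below are invented for illustration and do not come from the slides.

```python
def posteriors(prior, p_e_given_h, p_e_given_not_h):
    """Return P(e), P(h | e) and P(h | not e) for a binary hypothesis h."""
    p_e = prior * p_e_given_h + (1 - prior) * p_e_given_not_h
    p_h_given_e = prior * p_e_given_h / p_e
    p_h_given_not_e = prior * (1 - p_e_given_h) / (1 - p_e)
    return p_e, p_h_given_e, p_h_given_not_e

prior = 0.3  # hypothetical prior for h
p_e, up, down = posteriors(prior, p_e_given_h=0.9, p_e_given_not_h=0.2)

# The expected posterior, averaged over seeing e or not, equals the prior.
expected_posterior = p_e * up + (1 - p_e) * down
print(up > prior > down, round(expected_posterior, 10))
```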

Prior Probability
Where do we get the priors from?
– Irrelevant given enough evidence
– Knowledge about the statistics
– Symmetries (even distribution)
But:
– What is the prior that Zeus exists?
– The prior that the universe is finite?
Prior Probability
en.wikipedia.org/wiki/Prior_probability

Solomonoff Induction
Ray Solomonoff, 1960: Universal Prior
Combines and formalizes ideas by
– Epicurus: “Keep all hypotheses that are consistent with the data.”
– Ockham: “Among all hypotheses consistent with the observations, choose the simplest.” (Ockham’s Razor)
An Intuitive Explanation of Solomonoff Induction
lesswrong.com/lw/dhg/an_intuitive_explanation_of_[ ]

Kolmogorov Complexity
Simplicity: Not how easy something is to understand for humans, but rather the length of the shortest program that is able to reproduce the observations.
The Solomonoff prior is exponentially small in this length and not computable.
(Formal definition involves Turing machines.)
Kolmogorov Complexity
en.wikipedia.org/wiki/Kolmogorov_complexity
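
Kolmogorov complexity itself cannot be computed. The length of a compressed encoding gives a crude, computable upper bound, up to the constant size of the decompressor. The sketch below uses `zlib` as a stand-in for "shortest program". This is an illustrative assumption, not the formal definition and not something the slides propose.

```python
import random
import zlib

def compressed_length(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (a rough complexity proxy)."""
    return len(zlib.compress(data, 9))

regular = b"ab" * 500                                   # 1000 bytes with an obvious pattern
rng = random.Random(0)                                  # fixed seed for reproducibility
noisy = bytes(rng.randrange(256) for _ in range(1000))  # 1000 pseudo-random bytes

print(compressed_length(regular), compressed_length(noisy))
```

The patterned string compresses to a small fraction of its length, while the pseudo-random bytes barely compress at all. A prior that is exponentially small in description length would therefore favour the patterned data by a huge factor.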

Decision Theory (Part 1)
How shall we decide in the face of uncertainty?

Utility Function
A utility function describes an agent’s preferences over different outcomes.
Being nourished is a state of higher utility for animals than being starving.
A utility function doesn’t have to be an explicit or conscious goal of an agent.

Marginal Utility
Money has diminishing marginal utility:
[Figure: utility function plotted against money, with marks at 0, 100, 1000 and 1100]
Marginal Utility
en.wikipedia.org/wiki/Marginal_utility
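
To illustrate diminishing marginal utility with the amounts marked on the slide's axis, the sketch below assumes a square-root utility function. That choice is an assumption made here only because the function is concave. The slides do not specify a functional form.

```python
import math

def utility(money):
    # Assumption: square-root utility, used purely as an example of a concave function.
    return math.sqrt(money)

gain_when_poor = utility(100) - utility(0)      # an extra 100 starting from 0
gain_when_rich = utility(1100) - utility(1000)  # an extra 100 starting from 1000
print(gain_when_poor, round(gain_when_rich, 2))
```

The same 100 units of money add far less utility on top of 1000 than on top of nothing.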

Expected Utility
Under uncertainty (i.e. in reality) we can only maximize expected utilities.
If there are 3 outcomes with utilities $u_1, u_2, u_3$ and probabilities $p_1, p_2, p_3$, the expected utility is calculated as $p_1 u_1 + p_2 u_2 + p_3 u_3$.
Expected Utility Hypothesis
en.wikipedia.org/wiki/Expected_utility_hypothesis
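
The formula extends to any number of outcomes. A minimal sketch follows, with two hypothetical actions whose probabilities and utilities are made up for illustration.

```python
def expected_utility(lottery):
    """lottery: list of (probability, utility) pairs whose probabilities sum to 1."""
    assert abs(sum(p for p, _ in lottery) - 1.0) < 1e-9
    return sum(p * u for p, u in lottery)

# Hypothetical actions, not taken from the slides.
actions = {
    "safe": [(1.0, 5.0)],
    "risky": [(0.5, 12.0), (0.3, 2.0), (0.2, -10.0)],
}

# An expected-utility maximizer picks the action with the highest value.
best = max(actions, key=lambda name: expected_utility(actions[name]))
print(best)  # safe
```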

Von Neumann–Morgenstern
Von Neumann and Morgenstern, 1947: Every rational agent (i.e. one satisfying four natural axioms) behaves as if it tries to maximize the expected utility of some utility function.
Axioms: Completeness, Transitivity, Continuity and Independence (VNM).
Von Neumann–Morgenstern Utility Theorem
en.wikipedia.org/wiki/Von_Neumann–Morgenstern_[ ]

Money-Pump Argument
Decision Theory FAQ
lesswrong.com/lw/gu1/decision_theory_faq/

Loss Aversion
You have been given 1000. Choose:
– Win 1000 with 50% (risky)
– Win 500 with certainty (safe)
You have been given 2000. Choose:
– Lose 1000 with 50% (risky)
– Lose 500 with certainty (safe)
TED Talk by Laurie Santos
www.ted.com/talks/laurie_santos.html
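
The two scenarios describe the same choice over final wealth. Only the wording changes. The sketch below makes that explicit, assuming that "with 50%" means that nothing changes in the other 50% of cases. The helper `final_wealth` is illustrative.

```python
from fractions import Fraction

def final_wealth(endowment, option):
    """Map each final amount to its probability; option is [(change, probability), ...]."""
    distribution = {}
    for change, prob in option:
        amount = endowment + change
        distribution[amount] = distribution.get(amount, 0) + prob
    return distribution

half = Fraction(1, 2)
gain_frame = {
    "risky": final_wealth(1000, [(1000, half), (0, half)]),
    "safe": final_wealth(1000, [(500, 1)]),
}
loss_frame = {
    "risky": final_wealth(2000, [(-1000, half), (0, half)]),
    "safe": final_wealth(2000, [(-500, 1)]),
}
print(gain_frame == loss_frame)  # True
```

A preference for "safe" in the first scenario and "risky" in the second therefore cannot be explained by the outcomes themselves.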

Prospect Theory
Prospect Theory
en.wikipedia.org/wiki/Prospect_theory

Framing Effect
Outbreak of disease, choose between:
– A: 200 out of 600 people will be saved
– B: ⅓ probability of saving everyone, ⅔ no one
– C: 400 out of 600 people will die
– D: ⅓ probability nobody will die, ⅔ 600 die
Tversky/Kahneman: A: 72%, B: 28%; C: 22%, D: 78%
Don’t trust your (moral) intuitions!
Framing (Social Sciences)
en.wikipedia.org/wiki/Framing_(social_sciences)
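
A short calculation shows why the reversal is a framing effect. Expressed as numbers of survivors, A and C are the same certain outcome, B and D are the same lottery, and all four have the same expected value. The dictionary layout below is just one illustrative way to write this down.

```python
from fractions import Fraction

TOTAL = 600
third = Fraction(1, 3)

# Each programme as a list of (probability, number of survivors).
programmes = {
    "A": [(1, 200)],
    "B": [(third, TOTAL), (1 - third, 0)],
    "C": [(1, TOTAL - 400)],
    "D": [(third, TOTAL - 0), (1 - third, TOTAL - 600)],
}

expected_survivors = {
    name: sum(p * survivors for p, survivors in outcomes)
    for name, outcomes in programmes.items()
}
print(expected_survivors)
```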

Universal Intelligence
Can we define an optimally intelligent agent?

Swiss AI Lab in Lugano
World-leading in pattern recognition with Artificial Neural Networks (ANNs).
Theory of optimally intelligent agents:
– AIXI (Marcus Hutter)
– Gödel Machine (Jürgen Schmidhuber)
Former PhD student co-founded DeepMind (sold to Google for 500m, 2014).
IDSIA
www.idsia.ch

An Optimal Agent
Active agents in a known environment maximize expected utility.
Passive agents in an unknown environment use Solomonoff induction to get a probability distribution over possible environments.
What about active agents in unknown environments (like reality)?
Intelligent Agent
en.wikipedia.org/wiki/Intelligent_agent

Universal Intelligence: AIXI
Developed by Marcus Hutter in 2000
At each step, update your probability distribution over all possible worlds (Bayes/Solomonoff) and choose the action which maximizes expected utility over all remaining steps.
Universal Algorithmic [ ]

Monte Carlo AIXI
AIXI is the most intelligent agent possible.
Like the Solomonoff prior, AIXI cannot be computed but only approximated.
AIXI is not intended as a proposal for building an AI, but as an upper bound on how intelligent an agent can be.
A Monte Carlo AIXI [ ]

Shortcomings of AIXI
Cartesian model: The agent and the environment are modelled as separate Turing machines. In reality, the agent is part of the world in which it lives.
Wireheading: The most intelligent thing to do for a reinforcement learner is to get control over its reward channel.
Universal Algorithmic [ ]

Gödel Machine
Named after Kurt Gödel (1906–1978), developed by J. Schmidhuber (2003).
Mathematically rigorous, general, fully self-referential, self-improving, optimally efficient problem solver.
The GM solves all large enough problems almost as quickly as if it already knew the best (unknown) algorithm for solving them.
Gödel Machine by Jürgen Schmidhuber
people.idsia.ch/[ ]

Decision Theory (Part 2)
What kind of problems can we run into?

Complications
What if the payout structure itself depends on the action you take?
Causal Decision Theory: Take the action that causes the best expected outcome.
Evidential Decision Theory: Choose the action which, conditional on you having chosen it, gives you the best expected outcome.
Evidential Decision Theory
en.wikipedia.org/wiki/Evidential_decision_theory

Newcomb’s Paradox
Newcomb’s Paradox
en.wikipedia.org/wiki/Newcomb%27s_paradox

Possible Answers
EDT > CDT?
Causal reasoning: The content of the boxes is fixed and not affected by my decision. Taking both boxes gives me more than taking one.
Evidential reasoning: If I take only one box, the predictor will have predicted this and filled it with 1m. If I take both boxes, I will walk away with only 1k.
Newcomb’s Problem
wiki.lesswrong.com/wiki/Newcomb's_problem
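
The two lines of reasoning can be written as two different expected-value calculations. The sketch below assumes a predictor with a given accuracy (the slide's reasoning corresponds to accuracy 1) and the payoffs of 1m and 1k mentioned above. The function names and the `accuracy` and `p_full` parameters are introduced here for illustration.

```python
MILLION, THOUSAND = 1_000_000, 1_000

def edt_values(accuracy):
    """Expected payoffs when my choice is treated as evidence about the prediction."""
    one_box = accuracy * MILLION                   # box is full if the predictor was right
    two_box = (1 - accuracy) * MILLION + THOUSAND  # box is full only if it was wrong
    return one_box, two_box

def cdt_values(p_full):
    """Expected payoffs when the box content is fixed, full with probability p_full."""
    one_box = p_full * MILLION
    two_box = p_full * MILLION + THOUSAND          # always 1k more, whatever p_full is
    return one_box, two_box

print(edt_values(0.99))  # one-boxing looks better
print(cdt_values(0.5))   # two-boxing looks better
```

Under the evidential calculation one-boxing wins for any sufficiently accurate predictor. Under the causal calculation two-boxing dominates for every fixed box content.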

Where CDT (Arguably) > EDT
«Solomon is an ancient monarch vaguely reminiscent of the Israelite King. (Every part of this story is Biblically inaccurate.) He is pondering whether to summon Bathsheba, another man’s wife. But Solomon is also fully informed as to the peculiar connection between his choice in this matter and the likelihood of his eventually suffering a successful revolt: “Kings have two basic personality types, charismatic and uncharismatic. A king’s degree of charisma depends on his genetic make-up and early childhood experiences, and cannot be changed in adulthood. Now charismatic kings tend to act justly and uncharismatic kings unjustly. Successful revolts against charismatic kings are rare, whereas successful revolts against uncharismatic kings are frequent. Unjust acts themselves, though, do not cause successful revolts ... Solomon does not know whether or not he is charismatic; he does know that it is unjust to send for another man’s wife.”»
Paradoxes in Probability Theory by William [ ]

Where Both (Arguably) Fail
– Blackmailing
– Parfit’s Hitchhiker
– Counterfactual Mugging
Possible solution: Updateless DT (“Do what you would have precommitted to.”)
EDT & CDT are not reflectively consistent.
UDT with Known Search Order by Tsvi [ ]
[ ].pdf

Machine Learning
How can algorithms separate the signal from the noise?

Software is eating the world because it is more [ ]
[Figure labels: [ ]tertainment, Retail, analog, analog, [ ]ital]
Marc [ ]

Reference
Machine Learning
A Short Introduction

Computers have just learned how to see, read, write and classify.
Jeremy Howard
go.ted.com/bbZC

Google Research Blog, November 2014
googleresearch.blogspot.ch/2014/11/[ ]

Superhuman Image Recognition
– With convolutional neural networks
– 1.2 million training images, 30 layers
Inside Microsoft Research, February 2015
blogs.technet.com/b/inside_microsoft_research/[ ]

Machine
A machine runs an algorithm, which is
– a procedure for
– solving a specified problem
– in a finite number of steps
– (i.e. eventually producing an output).
Turing Machine
en.wikipedia.org/wiki/Turing_machine

Learning
– Building predictive models from data
– Algorithms improve with “experience”
– Useful if you don’t know the solution
– Machine Learning: Learn and predict
– Data Mining: Discover new properties
Machine Learning
en.wikipedia.org/wiki/Machine_learning

Feedback Mechanism
Categorization based on feedback:
– Supervised Learning: Given a set of inputs and outputs, learn a general rule
– Unsupervised Learning: Learn the structure in the input without a training set
– Reinforcement Learning: Achieve a certain goal in a dynamic environment
Supervised Learning
en.wikipedia.org/wiki/Supervised_learning

Tasks and Models
– Classification: Assign unseen input to discrete categories (supervised)
– Regression: Map the unseen input to continuous values (supervised)
– Clustering: Divide a set of inputs into a number of groups (unsupervised)
– Dimensionality Reduction: Simplify [ ]
Statistical Classification
en.wikipedia.org/wiki/Statistical_classification

Applications
– Spam filtering
– Search engines
– Computer vision
– Medical diagnosis
– Stock market prediction
– Optical character recognition
– Recognizing credit card fraud
– Speech recognition (e.g. Siri)
– Autonomous systems (e.g. self-driving cars)
Six Novel Machine Learning [ ]
[ ]06/[ ]

Problem: Overfitting

Cross-Validation
– Helps to prevent overfitting (i.e. find the optimal number of parameters).
– Helps to estimate how well a model will generalize to unseen data.
Partition data into a “training set” and a “validation set”.
Cross-Validation (Statistics)
en.wikipedia.org/wiki/Cross-validation_(statistics)
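
One common way to do the partitioning is k-fold cross-validation. The data are split into k folds, and each fold serves once as the validation set while the model is fitted on the rest. The sketch below is a minimal pure-Python version. The two toy models, the synthetic data and all function names are assumptions made for illustration.

```python
from statistics import mean

def k_fold_splits(n, k):
    """Yield (train_indices, validation_indices) for k folds over n samples."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for held_out in folds:
        held = set(held_out)
        yield [i for i in range(n) if i not in held], held_out

def fit_constant(xs, ys):
    """Model with one parameter: always predict the training mean."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Model with two parameters: least-squares straight line."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return lambda x: my + slope * (x - mx)

def cross_validation_error(xs, ys, fit, k=5):
    """Mean squared error on held-out points, averaged over all folds."""
    errors = []
    for train, valid in k_fold_splits(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in valid)
    return mean(errors)

# Synthetic data: a line with a small deterministic wobble standing in for noise.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 + (0.5 if i % 2 else -0.5) for i, x in enumerate(xs)]
print(cross_validation_error(xs, ys, fit_constant), cross_validation_error(xs, ys, fit_line))
```

The model with the lower held-out error is the one expected to generalize better. The same loop can be used to compare models with different numbers of parameters.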

Exploration-Exploitation Trade-Off
You have 1000 gambling machines in front of you, which give different payouts with different probability distributions. You have 1000 trials for all machines.
How many of your trials do you spend finding a good machine, how many to exploit the best machine found so far?
Reinforcement Learning
en.wikipedia.org/wiki/Reinforcement_learning
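
One simple heuristic for this trade-off is the epsilon-greedy strategy. With a small probability epsilon, try a random machine. Otherwise, play the machine with the best average payout so far. The slides do not name a strategy, so this is only one possible sketch. It assumes machines that pay 1 or 0 with fixed probabilities.

```python
import random

def epsilon_greedy(payout_probs, trials, epsilon, seed=0):
    """Play Bernoulli machines; return (total reward, number of pulls per machine)."""
    rng = random.Random(seed)
    n = len(payout_probs)
    pulls, wins, total = [0] * n, [0] * n, 0
    for _ in range(trials):
        if rng.random() < epsilon or not any(pulls):
            arm = rng.randrange(n)  # explore: pick a random machine
        else:
            # exploit: pick the machine with the best observed average payout
            arm = max(range(n), key=lambda a: wins[a] / pulls[a] if pulls[a] else 0.0)
        reward = 1 if rng.random() < payout_probs[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward
        total += reward
    return total, pulls

# Hypothetical example with three machines instead of 1000.
total, pulls = epsilon_greedy([0.1, 0.5, 0.9], trials=1000, epsilon=0.1)
print(total, pulls)
```

A larger epsilon spends more trials on exploration. A smaller one risks locking in on a mediocre machine early.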

Clustering: Example
Health Behaviour Patterns in a German 50 [ ]

Clustering: k-means Algorithm
How to find k clusters in N data points?
Goal: Minimize the sum of the squared distances of each point to the center of its cluster.
Solving this problem exactly is hard (NP-hard).
1. Choose k of the N points randomly as initial means.
2. Assign each point to the nearest mean.
3. Calculate the new mean of each cluster.
Repeat steps 2 and 3 until the assignments no longer change.
k-means Clustering
en.wikipedia.org/wiki/K-means_clustering
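
The iteration described on this slide can be sketched in a few lines of pure Python. The toy data set and the fixed random seed are assumptions for illustration. The procedure finds a local optimum only and can depend on the initial choice of means.

```python
import random

def k_means(points, k, iterations=100, seed=0):
    """Return (means, assignment) after alternating assignment and mean updates."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # choose k of the N points randomly
    assignment = None
    for _ in range(iterations):
        # Assign each point to the nearest mean (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, means[c])))
            for p in points
        ]
        if new_assignment == assignment:
            break  # converged: no point changed its cluster
        assignment = new_assignment
        # Recompute the mean of each non-empty cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                means[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return means, assignment

# Two well-separated groups of points (hypothetical data).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
means, assignment = k_means(points, k=2)
print(means, assignment)
```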

k-means Clustering Outcomes
k-means Clustering
en.wikipedia.org/wiki/K-means_clustering

Dimensionality Reduction
Dimensionality Reduction
en.wikipedia.org/wiki/Dimensionality_reduction

Goal: Reduce Dimensionality
[Figure: original data (entries by old features) mapped to reduced data (entries by new features)]
Retain as much information as possible with as few dimensions as possible!
Principal Component Analysis
en.wikipedia.org/wiki/Principal_component_analysis

Recommendation Systems
Example: If each song is a dimension, a user’s taste can be viewed as a vector.
Observation: As groups of users rate the same kind of songs high and low, try to distill the tastes as base vectors.
Apply Principal Component Analysis!
Recommender System
en.wikipedia.org/wiki/Recommender_system
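
As a rough illustration of distilling a taste vector, the sketch below finds the first principal component of a tiny, made-up rating matrix (users by songs) using power iteration on the covariance matrix. Power iteration is just one simple way to get the leading component, chosen here to avoid external libraries. The data and function name are hypothetical.

```python
def first_principal_component(rows, iterations=200):
    """Leading eigenvector of the sample covariance matrix, via power iteration."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)] for i in range(d)]
    # Start from the first coordinate axis; assumes feature 0 is not constant.
    v = [1.0] + [0.0] * (d - 1)
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v

# Hypothetical ratings: songs 0 and 1 are one genre, songs 2 and 3 another.
ratings = [
    [5, 4, 1, 1],
    [4, 5, 2, 1],
    [1, 2, 5, 4],
    [1, 1, 4, 5],
    [5, 5, 1, 2],
]
taste = first_principal_component(ratings)
print([round(x, 2) for x in taste])
```

The resulting vector has one sign for the first two songs and the opposite sign for the last two, so a single number per user (the projection onto this vector) already captures most of the difference between the two groups of listeners.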

Lossy Compression
– Downside: New base vectors need to be stored to recover the data.
– Solution: Choose appropriate base vectors (in case of JPEG: DCT)!
Data Compression
en.wikipedia.org/wiki/Data_compression

Support Vector Machines
State-of-the-art supervised learning models used for classification.
Binary, linear classifier: Find a plane that maximizes the distance to the closest data points from both classes.
Support Vector Machine
en.wikipedia.org/wiki/Support_vector_machine

SVM Extensions
In practice, there will often be no plane that perfectly separates the two classes.
Soft margin: Allow violations with a penalty.
Kernel Method
en.wikipedia.org/wiki/Kernel_method
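
A soft-margin linear SVM can be trained by minimizing a regularization term plus the hinge loss, which penalizes points that lie inside the margin or on the wrong side. The sketch below uses plain subgradient descent. The hyperparameters `lam`, `lr` and `epochs` and the toy data are assumptions, and practical implementations use more refined solvers.

```python
def train_linear_svm(points, labels, lam=0.01, lr=0.1, epochs=1000):
    """Subgradient descent on lam/2 * |w|^2 + mean hinge loss; labels are +1 or -1."""
    dim, n = len(points[0]), len(points)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # margin violation: the hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(w, b, x):
    """Classify x by the side of the plane w.x + b = 0 it falls on."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical, linearly separable toy data.
points = [(2, 2), (3, 3), (2, 3), (0, 0), (-1, 0), (0, -1)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
print([predict(w, b, x) for x in points])
```

The parameter `lam` controls the trade-off. A larger value prefers a wider margin and tolerates more violations, while a smaller value penalizes violations more strongly.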

Reinforcement Learning (CNN)
Google develops self-learning computer [ ]

Outlook: Huge responsibility in the near term. For the long term, see www.superintelligence.ch.
[Figure labels: Train CNN, Deploy CNN]
Campaign to Stop Killer Robots
www.stopkillerrobots.org
