Probability Theory: The Logic Of Science

Transcription

Probability Theory: The Logic of Science

by

E. T. Jaynes
Wayman Crow Professor of Physics
Washington University
St. Louis, MO 63130, U.S.A.

Dedicated to the Memory of Sir Harold Jeffreys, who saw the truth and preserved it.

Copyright © 1995 by Edwin T. Jaynes.

EDITOR’S FOREWORD

E. T. Jaynes died April 30, 1998. Before his death he asked me to finish and publish his book on probability theory. I struggled with this for some time, because there is no doubt in my mind that Jaynes wanted this book finished. Unfortunately, most of the later Chapters, Jaynes’ intended volume 2 on applications, were either missing or incomplete, and some of the early Chapters also had missing pieces. I could have written these latter Chapters and filled in the missing pieces, but if I did so, the work would no longer belong to Jaynes; rather, it would be a Jaynes-Bretthorst hybrid with no way to tell which material came from which author. In the end, I decided that the missing Chapters would have to stay missing—the work would remain Jaynes’.

There were a number of missing pieces of varying length that Jaynes had marked by inserting the phrase “MUCH MORE COMING.” I could have left these comments in the text, but they were ugly and they made the book look very incomplete. Jaynes intended this book to serve as both a reference and a textbook. Consequently, there are question boxes scattered throughout most Chapters. In the end, I decided to replace the “MUCH MORE COMING” comments by introducing an editor’s question box. If you answer these questions, you will have filled in the missing material. You will be able to identify these questions because I used a shaded box for the editor’s questions, while Jaynes’ question boxes are not shaded.

Jaynes wanted to include a series of computer programs that implemented some of the calculations in this book. I had originally intended to include these programs. But as time went on, it became increasingly obvious that many of the programs were not available, and the ones that were, were written in a particularly obscure form of BASIC (it was the programs that were obscure, not the BASIC). Consequently, I removed references to these programs and, where necessary, inserted a few sentences to direct people to the necessary software tools to implement the calculations.

Finally, while I am the most obvious person who has worked on getting this book into publication, I am not the only person to do so. Some of Jaynes’ closest friends have assisted me in completing this work. These include Tom Grandy, Ray Smith, Tom Loredo, Myron Tribus and John Skilling, and I would like to thank them for their assistance. I would also like to thank Joe Ackerman for allowing me to take the time necessary to get this work published.

G. Larry Bretthorst, Editor
May 2002

PROBABILITY THEORY – THE LOGIC OF SCIENCE
VOLUME I – PRINCIPLES AND ELEMENTARY APPLICATIONS

Chapter 1  Plausible Reasoning
    Deductive and Plausible Reasoning
    Analogies with Physical Theories
    The Thinking Computer
    Introducing the Robot
    Boolean Algebra
    Adequate Sets of Operations
    The Basic Desiderata
    Comments
    Common Language vs. Formal Logic
    Nitpicking

Chapter 2  The Quantitative Rules
    The Product Rule
    The Sum Rule
    Qualitative Properties
    Numerical Values
    Notation and Finite Sets Policy
    Comments
    “Subjective” vs. “Objective”
    Gödel’s Theorem
    Venn Diagrams
    The “Kolmogorov Axioms”

Chapter 3  Elementary Sampling Theory
    Sampling Without Replacement
    Logic Versus Propensity
    Reasoning from Less Precise Information
    Expectations
    Other Forms and Extensions
    Probability as a Mathematical Tool
    The Binomial Distribution
    Sampling With Replacement
    Digression: A Sermon on Reality vs. Models
    Correction for Correlations
    Simplification
    Comments
    A Look Ahead

Chapter 4  Elementary Hypothesis Testing
    Prior Probabilities
    Testing Binary Hypotheses with Binary Data
    Non-Extensibility Beyond the Binary Case
    Multiple Hypothesis Testing
    Continuous Probability Distribution Functions (pdf’s)
    Testing an Infinite Number of Hypotheses
    Simple and Compound (or Composite) Hypotheses
    Comments
    Etymology
    What Have We Accomplished?

Chapter 5  Queer Uses For Probability Theory
    Extrasensory Perception
    Mrs. Stewart’s Telepathic Powers
    Digression on the Normal Approximation
    Back to Mrs. Stewart
    Converging and Diverging Views
    Visual Perception—Evolution into Bayesianity?
    The Discovery of Neptune
    Digression on Alternative Hypotheses
    Back to Newton
    Horse racing and Weather Forecasting
    Discussion
    Paradoxes of Intuition
    Bayesian Jurisprudence

Chapter 6  Elementary Parameter Estimation
    Inversion of the Urn Distributions
    Both N and R Unknown
    Uniform Prior
    Predictive Distributions
    Truncated Uniform Priors
    A Concave Prior
    The Binomial Monkey Prior
    Metamorphosis into Continuous Parameter Estimation
    Estimation with a Binomial Sampling Distribution
    Digression on Optional Stopping
    Compound Estimation Problems
    A Simple Bayesian Estimate: Quantitative Prior Information
    From Posterior Distribution Function to Estimate
    Back to the Problem
    Effects of Qualitative Prior Information
    Choice of a Prior
    On With the Calculation!
    The Jeffreys Prior
    The Point of It All
    Interval Estimation
    Calculation of Variance
    Generalization and Asymptotic Forms
    Rectangular Sampling Distribution
    Small Samples
    Mathematical Trickery
    Comments

Chapter 7  The Central, Gaussian Or Normal Distribution
    The Gravitating Phenomenon
    The Herschel-Maxwell Derivation
    The Gauss Derivation
    Historical Importance of Gauss’ Result
    The Landon Derivation
    Why the Ubiquitous Use of Gaussian Distributions?
    Why the Ubiquitous Success?
    What Estimator Should We Use?
    Error Cancellation
    The Near-Irrelevance of Sampling Frequency Distributions
    The Remarkable Efficiency of Information Transfer
    Other Sampling Distributions
    Nuisance Parameters as Safety Devices
    More General Properties
    Convolution of Gaussians
    The Central Limit Theorem
    Accuracy of Computations
    Galton’s Discovery
    Population Dynamics and Darwinian Evolution
    Evolution of Humming-Birds and Flowers
    Application to Economics
    The Great Inequality of Jupiter and Saturn
    Resolution of Distributions into Gaussians
    Hermite Polynomial Solutions
    Fourier Transform Relations
    There is Hope After All

Chapter 8  Sufficiency, Ancillarity, And All That
    Sufficiency
    Fisher Sufficiency
    Generalized Sufficiency
    Sufficiency Plus Nuisance Parameters
    The Likelihood Principle
    Ancillarity
    Generalized Ancillary Information
    Asymptotic Likelihood: Fisher Information
    Combining Evidence from Different Sources
    Pooling the Data
    Sam’s Broken Thermometer

Chapter 9  Repetitive Experiments: Probability And Frequency
    Physical Experiments
    The Poorly Informed Robot
    Induction
    Are There General Inductive Rules?
    Multiplicity Factors
    Partition Function Algorithms
    Entropy Algorithms
    Another Way of Looking at it
    Entropy Maximization
    Probability and Frequency
    Significance Tests
    Comparison of Psi and Chi-Squared
    The Chi-Squared Test
    Generalization
    Halley’s Mortality Table
    Comments
    Superstitions

Chapter 10  Physics Of “Random Experiments”
    An Interesting Correlation
    Historical Background
    How to Cheat at Coin and Die Tossing
    Bridge Hands
    General Random Experiments
    Induction Revisited
    But What About Quantum Theory?
    Mechanics Under the Clouds
    More On Coins and Symmetry
    Independence of Tosses
    The Arrogance of the Uninformed

PROBABILITY THEORY – THE LOGIC OF SCIENCE
VOLUME II – ADVANCED APPLICATIONS

Chapter 11  Discrete Prior Probabilities: The Entropy Principle
    A New Kind of Prior Information
    Minimum Σ p_i^2
    Entropy: Shannon’s Theorem
    The Wallis Derivation
    An Example
    Generalization: A More Rigorous Proof
    Formal Properties of Maximum-Entropy Distributions
    Conceptual Problems—Frequency Correspondence

Chapter 12  Ignorance Priors And Transformation Groups
    What Are We Trying to Do?
    IGNORANCE PRIORS
    Continuous Distributions
    TRANSFORMATION GROUPS
    Location and Scale Parameters
    A Poisson Rate
    Unknown Probability for Success
    Bertrand’s Problem

Chapter 13  Decision Theory: Historical Background
    Inference vs. Decision
    Daniel Bernoulli’s Suggestion
    The Rationale of Insurance
    Entropy and Utility
    The Honest Weatherman
    Reactions to Daniel Bernoulli and Laplace
    Wald’s Decision Theory
    Parameter Estimation for Minimum Loss
    Reformulation of the Problem
    Effect of Varying Loss Functions
    General Decision Theory
    Comments
    Decision Theory is not Fundamental
    Another Dimension?

Chapter 14  Simple Applications Of Decision Theory
    Definitions and Preliminaries
    Sufficiency and Information
    Loss Functions and Criteria of Optimum Performance
    A Discrete Example
    How Would Our Robot Do It?
    Historical Remarks
    The Widget Problem
    Comments

Chapter 15  Paradoxes Of Probability Theory
    How do Paradoxes Survive and Grow?
    Summing a Series the Easy Way
    Nonconglomerability
    The Tumbling Tetrahedrons
    Solution for a Finite Number of Tosses
    Finite vs. Countable Additivity
    The Borel-Kolmogorov Paradox
    The Marginalization Paradox
    Discussion
    A Useful Result After All?
    How to Mass-Produce Paradoxes

Chapter 16  Orthodox Methods: Historical Background
    The Early Problems
    Sociology of Orthodox Statistics
    Ronald Fisher, Harold Jeffreys, and Jerzy Neyman
    Pre-data and Post-data Considerations
    The Sampling Distribution for an Estimator
    Pro-Causal and Anti-Causal Bias
    What is Real, the Probability or the Phenomenon?

Chapter 17  Principles And Pathology Of Orthodox Statistics
    Information Loss
    Unbiased Estimators
    Pathology of an Unbiased Estimate
    The Fundamental Inequality of the Sampling Variance
    Periodicity: The Weather in Central Park
    A Bayesian Analysis
    The Folly of Randomization
    Fisher: Common Sense at Rothamsted
    Missing Data
    Trend and Seasonality in Time Series
    The General Case

Chapter 18  The A_p Distribution And Rule Of Succession
    Memory Storage for Old Robots
    Relevance
    A Surprising Consequence
    Outer and Inner Robots
    An Application
    Laplace’s Rule of Succession
    Jeffreys’ Objection
    Bass or Carp?
    So where does this leave the rule?
    Generalization
    Confirmation and Weight of Evidence
    Carnap’s Inductive Methods
    Probability and Frequency in Exchangeable Sequences
    Prediction of Frequencies
    One-Dimensional Neutron Multiplication
    The de Finetti Theorem

Chapter 19  Physical Measurements
    Reduction of Equations of Condition
    Reformulation as a Decision Problem
    The Underdetermined Case: K is Singular
    The Overdetermined Case: K Can be Made Nonsingular
    Numerical Evaluation of the Result
    Accuracy of the Estimates
    Comments

Chapter 20  Model Comparison
    Formulation of the Problem
    The Fair Judge and the Cruel Realist
    But Where is the Idea of Simplicity?
    An Example: Linear Response Models
    Comments
    Final Causes

Chapter 21  Outliers And Robustness
    The Experimenter’s Dilemma
    Robustness
    The Two-Model Model
    Exchangeable Selection
    The General Bayesian Solution
    Pure Outliers
    One Receding Datum

Chapter 22  Introduction To Communication Theory
    Origins of the Theory
    The Noiseless Channel
    The Information Source
    Does the English Language have Statistical Properties?
    Optimum Encoding: Letter Frequencies Known
    Better Encoding From Knowledge of Digram Frequencies
    Relation to a Stochastic Model
    The Noisy Channel
    Fixing a Noisy Channel

References

Appendix A  Other Approaches To Probability Theory
    The Kolmogorov System of Probability
    The de Finetti System of Probability
    Comparative Probability
    Holdouts Against Universal Comparability
    Speculations About Lattice Theories

Appendix B  Mathematical Formalities And Style
    Notation and Logical Hierarchy
    Our “Cautious Approach” Policy
    Willy Feller on Measure Theory
    Kronecker vs. Weierstrasz
    What is a Legitimate Mathematical Function?
    Counting Infinite Sets?
    The Hausdorff Sphere Paradox and Mathematical Diseases
    What Am I Supposed to Publish?
    Mathematical Courtesy

Appendix C  Convolutions And Cumulants
    Relation of Cumulants and Moments
    Examples

PREFACE

The following material is addressed to readers who are already familiar with applied mathematics at the advanced undergraduate level or preferably higher; and with some field, such as physics, chemistry, biology, geology, medicine, economics, sociology, engineering, operations research, etc., where inference is needed.† A previous acquaintance with probability and statistics is not necessary; indeed, a certain amount of innocence in this area may be desirable, because there will be less to unlearn.

We are concerned with probability theory and all of its conventional mathematics, but now viewed in a wider context than that of the standard textbooks. Every Chapter after the first has “new” (i.e. not previously published) results that we think will be found interesting and useful. Many of our applications lie outside the scope of conventional probability theory as currently taught. But we think that the results will speak for themselves, and that something like the theory expounded here will become the conventional probability theory of the future.

History: The present form of this work is the result of an evolutionary growth over many years. My interest in probability theory was stimulated first by reading the work of Harold Jeffreys (1939) and realizing that his viewpoint makes all the problems of theoretical physics appear in a very different light. But then in quick succession discovery of the work of R. T. Cox (1946), C. E. Shannon (1948) and G. Pólya (1954) opened up new worlds of thought, whose exploration has occupied my mind for some forty years. In this much larger and perma
