105 Hash And Prob Pub - Department Of Computer Science

Transcription

Hash tables & probabilityBen LangmeadDepartment of Computer SciencePlease sign guestbook (www.langmead-lab.org/teaching-materials)to tell me briefly how you are using the slides. For original Keynotefiles, email me (ben.langmead@gmail.com).

Hash Table"Hashing with chaining" or "chain hashing"KeyValuePointerNullPointer

Hash FunctionUNnAssumeaccessingtable slot is O(1)Assume hash function operates on anyitem from U (integers, strings, etc) and isO(1) time

Hash TableWhat "abstract data types"can we implement with this?mapsetcounter k1, v1 k2, v2 k3, v3 k4, v4 k1 k2 k3 k4 k1, 7 k2, 4 k3, 8 k4, 5

Hash TableI add m items to an n-bucket hash tableWithout probability, what can I say?QuestionDoes any bucket havemore the one item?Assumption Statementm nCommentYesPigeonholeprincipleIs any bucket empty?m nYes"Emptypigeonhole"principleWhat is the averagebucket occupancy?-m/n-Nothing profound here

Hash TableI have added m items to a n-bucket hash table. What"interesting questions" can I ask about the table's state?How many bucketsare empty?How manyitems are in themedian bucket?How many itemsare in the averagebucket?What's the chanceall buckets arenon-empty?How many itemsare in the fullestbucket?What's the chanceno bucket has 1item?m: # itemsn: # buckets

ProbabilitySample space (Ω) is set of all possibleoutcomesΩE.g. Ω { all possible rolls of 2 dice }An event is a subset of ΩA { rolls where 1st die is odd }B { rolls where 2nd die is even }When outcomes are equally likely, can use"naive definition of probability"Pr(A): fraction of outcomes that are in APr(A) A / Ω 18/36 0.5Die 21 2 3 4 5 6Die 1123456

Probability“Naive definition” of probability fails to applywhen outcomes are not equally probableLoaded coin# goals scored insoccer game

Probability function PrPr : 𝒫(Ω) ℝ , where 𝒫(Ω) is "power set"(set of all subsets) of Ω, satisfies conditions:1. For any event E, 0 Pr(E) 12. Pr(Ω) 13. Probabilities of disjoint events E1, E2, . . . add:Pr i 1Ei i 1Pr(Ei)

Probability function PrSample spaceΩoutcomessetPr01/6 2/6 3/6 4/6 5/6probabilityfunction1reals in [0, 1]

Probability function PrΩ"roll one die"setPrProbabilities of disjointevents add;Pr({,}) 1/301/6probabilityfunction2/63/64/65/61

Random variableRandom variables have two "natures"XFunction, mappingoutcomes from Ω tonumbers (in IR)X() 4Y 3.5 XPotential experiment witha distribution (a Pr for its Ω)and numerical result

Random variableWe use capitals e.g. X, Y to denote arandom variableAbbreviate with "r.v."

Random variable & probability functionΩ"roll one die"setXrandomvariablereals (IR)1234Pr5601/6 2/6 3/6 4/6 5/6probabilityfunction1Pr(X 2) Pr( ) 1/6reals in [0, 1]

Random variable & probability functionΩX1234Pr5601/6 2/6 3/6 4/6 5/61Pr(X 2) Pr( ) 1/6

Random variable & probability functionΩX1234Pr5601/6 2/6 3/6 4/6 5/61Pr(X 4) Pr( ) Pr( ) Pr( ) 1/2

Random variable & probability functionΩX1234Y 3.5 XPr5601/6 2/6 3/6 4/6 5/61

Random variable & probability functionΩY-3 -2 -1 0 1 2 3Y 3.5 XPr01/6 2/6 3/6 4/6 5/61

Expected valueExpectation ("expected value") of a discrete r.v. X,called E[X], is given byE[X] x Pr(X x)xwhere summation is over values in range of X.

Linearity of expectationFor discrete r.v.s X1, X2, . . . , XnEnnXi E[Xi] [ ]i 1i 1True whether or not Xis are independent

Expected valueZZ X YPrwhere X is fair die roll &Y is fair coin flipWhen Z is a linear combination of other r.v.s,E[Z] can be easier to get than PrE[Z] E[X] E[Y] is simple (3.5 0.5 4)

Hash TableI have added m items to a n-bucket hash tableBesides this setup, what else do we need todefine a random variable describing the table?1. Sample space Ω2. Probability func. Pr3. Map X fromoutcomes to realsPossible allocation ofitems to bucketsDepend on questionasked, assumptions madeabout hash function

Balls & binsThrow m balls into n bins uniformly and independentlyΩ2,3"2 balls in 3 bins"

Hash TableI have added m items to a n-bucket hash table. What"interesting questions" can I ask about the table's state?How many bucketsare empty?How manyitems are in themedian bucket?How many itemsare in the averagebucket?What's the chanceall buckets arenon-empty?How many itemsare in the fullestbucket?What's the chanceno bucket has 1item?m: # itemsn: # buckets

Balls & binsI throw m balls into n bins uniformly and independently.What can I ask about the bins and their contents?How many bins areempty?How many ballsare in themedian bin?How many ballsare in the averagebin?What's the chanceall bins are nonempty?How many ballsare in the fullestbin?What's the chanceno bin has 1item?m: # itemsn: # buckets

Balls & binsI throw m balls into n bins uniformly andindependently. What can I ask?CategoryQuestionsEmpty/non emptyHow manybuckets areempty?Collisions /no collisionsHow many throwsuntil there is a 0.5chance of a collision?Local(single bin)occupancyGlobaloccupancyWhat's the chanceall buckets arenon-empty?What is thechance no binhas 1 item?ApproachCouponcollectorBirthdayproblemWhat's the occupancyof a given bucket?What is the chancea given bucket has 2 items?Binomial,Geometric,Poisson r.v.sWhat is themedian bucketoccupancy?What is themaximum bucketoccupancy?Often hard

Empty/ non empty How many buckets are empty? Collisions / no collisions How many throws until there is a 0.5 chance of a collision? What's the chance all buckets are non-empty? Coupon collector Birthday problem Local (single bin) occupancy Binomial, Geometric, Poisson r.v.s Global occupancy What is the median bucket occupancy? What is the .