RELATIONAL ALGEBRA II - California Institute Of Technology

Transcription

RELATIONAL ALGEBRA IICS121: Relational DatabasesFall 2018 – Lecture 3

Last Lecture2 Query languages provide support for retrievinginformation from a databaseIntroduced the relational algebraA procedural query language Six fundamental operations: n select, project, set-union, set-difference,Cartesian product, renameSeveral additional operations, built upon thefundamental operationsnset-intersection, natural join, division, assignment

Extended Operations3 Relational algebra operations have been extendedin various waysMore generalized More useful! Three major extensions:Generalized projection Aggregate functions Additional join operations All of these appear in SQL standards

Generalized Projection Operation4 Would like to include computed results into relationse.g. “Retrieve all credit accounts, computing the current‘available credit’ for each account.” Available credit credit limit – current balance Project operation is generalized to include computedresultsCan specify functions on attributes, as well as attributesthemselves Can also assign names to computed values (Renaming attributes is also allowed, even though this is alsoprovided by the r operator)

Generalized Projection5 Written as: P F , F , , F (E)12n Fiare arithmetic expressions E is an expression that produces a relation Can also name values: Fi as name Can use to provide derived attributes Values are always computed from other attributes stored indatabaseAlso useful for updating values in database (more on this later)

Generalized Projection Example6 “Compute available credit for every creditaccount.”Pcred id, (limit – balance) as available credit(credit acct)cred red idavailable 275credit acct

Aggregate Functions7 Very useful to apply a function to a collection ofvalues to generate a single resultMost common aggregate functions:sumavgcountminmax sums the values in the collectioncomputes average of values in the collectioncounts number of elements in the collectionreturns minimum value in the collectionreturns maximum value in the collectionAggregate functions work on multisets, not sets A value can appear in the input multiple times

Aggregate Function Examples8“Find the total amount owedto the credit company.”Gsum(balance)(credit acct)4275cred 50600350025credit acct“Find the maximum available credit of any account.”Gmax(available credit)(P(limit – balance) as available credit(credit acct))11500

Grouping and Aggregation9 Sometimes need to compute aggregates on aper-item basispuzzle nameBack to the puzzle database:altekrusepuzzle list(puzzle name)completed(person name, puzzle name) Examples:How many puzzles haseach person completed? How many people havecompleted each puzzle? soma cubepuzzle boxpuzzle listperson namepuzzle nameAlexAlexBobCarlBobCarlAlexCarlaltekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubecompleted

Grouping and Aggregation (2)10puzzle nameperson namepuzzle namealtekrusesoma cubepuzzle boxpuzzle listAlexAlexBobCarlBobCarlAlexCarlaltekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubecompleted“How many puzzles has eachperson completed?”person nameGcount(puzzle name)(completed) First, input relation completed is grouped by unique values ofperson nameThen, count(puzzle name) is applied separately to each group

Grouping and Aggregation (3)11person nameGcount(puzzle name)(completed)Input relation isgrouped by person nameperson namepuzzle nameAlexAlexAlexaltekrusesoma cubepuzzle boxBobBobpuzzle boxsoma cubeCarlCarlCarlaltekrusepuzzle boxsoma cubeAggregate function isapplied to each groupperson nameAlex3Bob2Carl3

Distinct Values12Sometimes want to compute aggregates over sets ofvalues, instead of multisetsExample: Chage puzzle database to include a completed timesrelation, which records multiple solutions of a puzzleHow many puzzles haseach person completed? Using completed timesrelation this timeperson namepuzzle nameAlexAlexBobCarlBobAlexaltekrusesoma cubepuzzle boxaltekrusepuzzle boxaltekruseseconds35045240285215290completed times

Distinct Values (2)13“How many puzzles has each person completed?” Each puzzle appearsmultiple times now.person namepuzzle nameAlexAlexBobCarlBobAlexaltekrusesoma cubepuzzle boxaltekrusepuzzle boxaltekruseseconds35045240285215290completed times Need to count distinct occurrences of each puzzle’snameperson nameGcount-distinct(puzzle name)(completed times)

Eliminating Duplicates14 Can append -distinct to any aggregate function tospecify elimination of duplicatesUsually used with count: count-distinct Makes no sense with min, max

General Form of Aggregates15 General form:G F (A ), F (A ), , F (A ) (E)G1, G2, , Gn1122mmE evalutes to a relation Leading Gi are attributes of E to group on Each Fj is aggregate function applied to attribute Aj of E First, input relation is divided into groups If no attributes Gi specified, no grouping is performed(it’s just one big group)Then, aggregate functions applied to each group

General Form of Aggregates (2)16 General form: G , G , , G GF (A ), F (A ), , F (A )(E)Tuples in E are grouped such that:12n1122mmAll tuples in a group have same values for attributesG1, G2, , Gn Tuples in different groups have different values forG1, G2, , Gn Thus, the values {g1, g2, , gn} in each groupuniquely identify the group {G1, G2, , Gn} are a superkey for the result relation

General Form of Aggregates (3)17 General form: G , G , , G GF (A ), F (A ), , F (A )(E)Tuples in result have the form:12n1122mm{g1, g2, , gn, a1, a2, , am} gi are values for that particular group aj is result of applying Fj to the multiset of values of Ajin that group Important note: Fj(Aj) attributes are unnamed!Informally we refer to them as Fj(Aj) in results, butthey have no name. Specify a name, same as before: Fj(Aj) as attr name

One More Aggregation Example18puzzle nameperson namepuzzle namealtekrusesoma cubepuzzle boxpuzzle listAlexAlexBobCarlBobCarlAlexCarlaltekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubecompleted“How many people havecompleted each puzzle?”puzzle nameGcount(person name)(completed) What if nobody has tried a particular puzzle? Won’t appear in completed relation

One More Aggregation Example19puzzle nameperson namepuzzle namealtekrusesoma cubepuzzle boxclutch boxAlexAlexBobCarlBobCarlAlexCarlaltekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubecompletedpuzzle list New puzzle added topuzzle list relationWould like to see { “clutch box”, 0 } in result “clutch box” won’t appear in result! Joining the two tables doesn’t help either Natural join won’t produce any rows with “clutch box”

Outer Joins20 Natural join requires that both left and right tableshave a matching tupler Outer join is an extension of join operation s PR È S(sr.A1 s.A1 Ù r.A2 s.A2 Ù Ù r.An s.An(r s))Designed to handle missing informationMissing information is represented by null values inthe result null unknown or unspecified value

Forms of Outer Join21 Left outer join: rsIf a tuple tr Î r doesn’t match any tuple in s,result contains { tr, null, , null } If a tuple ts Î s doesn’t match any tuple in r, it’sexcluded Right outer join: rsIf a tuple tr Î r doesn’t match any tuple in s, it’sexcluded If a tuple ts Î s doesn’t match any tuple in r,result contains { null, , null, ts }

Forms of Outer Join (2)22 Full outer join: r sIncludes tuples from r that don’t match s,as well as tuples from s that don’t match rSummary:rr srattr1attr2abcr1r2r3s s4

Effects of null Values23 Introducing null values affects everything! null means “unknown” or “nonexistent”Must specify effect on results when null is presentThese choices are somewhat arbitrary (Read your database user’s manual! J) Arithmetic operations ( , –, *, /) involving null alwaysevaluate to null (e.g. 5 null null)Comparison operations involving null evaluate tounknownunknown is a third truth-value Note: Yes, even null null evaluates to unknown.

Boolean Operators and unknown24 andtrue Ù unknown unknownfalse Ù unknown falseunknown Ù unknown unknown ortrue Ú unknown truefalse Ú unknown unknownunknown Ú unknown unknown not unknown unknown

Relational Operations25 For each relational operation, need to specifybehavior with respect to null and unknownSelect: sP(E) If P evaluates to unknown for a tuple, that tuple is excludedfrom result (i.e. definition of s doesn't change)Natural join: rsIncludes a Cartesian product, then a select If a common attribute has a null value, tuples are excludedfrom join result Why? nnull (anything) evaluates to unknown

Project and Set-Operations26 Project: P(E)Project operation must eliminate duplicates null value is treated like any other value Duplicate tuples containing null values are also eliminated Union, Intersection, and Differencenull values are treated like any other value Set union, intersection, difference computed as expected These choices are somewhat arbitrarynull means “value is unknown or missing” but in these cases, two null values are considered equal. Technically, two null values aren’t the same. (oh well)

Grouping and Aggregation27 In grouping phase:null is treated like any other value If two tuples have same values (including null) on thegrouping attributes, they end up in same group In aggregation phase: null values are removed from the input multiset before theaggregate function is applied!n Slightly different from arithmetic behavior; it keeps one null valuefrom wiping out an aggregate computation.If the aggregate function gets an empty multiset for input,the result is null n except for count! In that case, count returns 0.

Generalized Projection, Outer Joins28 Generalized Projection operation:A combination of simple projection and arithmeticoperations Easy to figure out from previous rules Outer joins: Behave just like natural join operation, except forpadding missing values with null

Back to Our Puzzle!29“How many peoplehave completedeach puzzle?”puzzle namealtekrusesoma cubepuzzle boxclutch boxperson namepuzzle nameAlexAlexBobCarlBobCarlAlexCarlaltekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubecompletedpuzzle nameperson namealtekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubeclutch boxAlexAlexBobCarlBobCarlAlexCarlnullpuzzle list Use an outer join to include allpuzzles, not just solved onespuzzle listcompleted

Counting the Solutions30 Now, use grouping and aggregationGroup on puzzle name Count up the people!puzzle nameGcount(person name)(puzzle list completed)puzzle nameperson namepuzzle nameperson namepuzzle namealtekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubeclutch kruseAlexCarlsoma cubesoma cubesoma cubeAlexBobCarlaltekrusesoma cubepuzzle boxclutch boxpuzzle boxpuzzle boxpuzzle boxBobCarlAlexclutch boxnull2330

Database Modification31 Often need to modify data in a databaseCan use assignment operator for thisOperations:r rÈE r r–E r P(r) Insert new tuples into a relationDelete tuples from a relationUpdate tuples already in the relationRemember: r is a relation-variableAssignment operator assigns a new relation-value to r Hence, RHS expression may need to include existingversion of r, to avoid losing unchanged tuples

Inserting New Tuples32 Inserting tuples simply involves a union:r rÈE E has to have correct arity Can specify actual tuples to insert:completed completed È{ (“Bob”, “altekruse”), (“Carl”, “clutch box”) } Adds two new tuples to completed relation Can specify constant relations as a set of valuesEach tuple is enclosed with parentheses Entire set of tuples enclosed with curly-braces constantrelation

Inserting New Tuples (2)33 Can also insert tuples generated from an expressionExample:“Dave is joining the puzzle club. He has done everypuzzle that Bob has done.” Find out puzzles that Bob has completed, then constructnew tuples to add to completed

Inserting New Tuples (3)34 How to construct new tuples with name “Dave” andeach of Bob’s puzzles? Could use a Cartesian product:{ (“Dave”) } Ppuzzle name(sperson name “Bob”(completed)) Or, use generalized projection with a constant:P“Dave” as person name, puzzle name(sperson name “Bob”(completed)) Add new tuples to completed relation:completed completed ÈP“Dave” as person name, puzzle name(sperson name “Bob”(completed))

Deleting Tuples35 Deleting tuples uses the – operation:r r–E puzzle nameExample:altekrusesoma cubepuzzle boxGet rid of the “soma cube” puzzle.Problem:completed relation referencesthe puzzle list relationn To respect referential integrityconstraints, should delete fromcompleted first.npuzzle listperson namepuzzle nameAlexAlexBobCarlBobCarlAlexCarlaltekrusesoma cubepuzzle boxaltekrusesoma cubepuzzle boxpuzzle boxsoma cubecompleted

Deleting Tuples (2)36 completed references puzzle listpuzzle name is a key completed shouldn’t have any values for puzzle name thatdon’t appear in puzzle list Delete tuples from completed first. Then delete tuples from puzzle list. completed completed – spuzzle name “soma cube”(completed)puzzle list puzzle list – spuzzle name “soma cube”(puzzle list)Of course, could also write:completed spuzzle name “soma cube”(completed)

Deleting Tuples (3)37 In the relational model, we have to think aboutforeign key constraints ourselves Relational database systems take care of thesethings for us, automatically. Will explore the various capabilities and options in afew weeks

Updating Tuples38 General form uses generalized projection:r P F , F , , F (r)12n Updates all tuples in r Example:acct idbranch namebalanceA-301A-307A-318A-319A-322New YorkSeattleLos AngelesNew YorkLos Angeles35027555080275“Add 5% interest to all bank account balances.”accountaccount Pacct id, branch name, balance*1.05(account)Note: Must include unchanged attributes too Otherwise you will change the schema of account

Updating Some Tuples39 Updating only some tuples is more verboseRelation-variable is set to the entire result of the evaluation Must include both updated tuples, and non-updated tuples,in result Example:“Add 5% interest to accounts with a balance less than 10,000.”account Pacct id, branch name, balance*1.05(sbalance 10000(account)) Èsbalance 10000(account)

Updating Some Tuples (2)40Another example:“Add 5% interest to accounts with a balance less than 10,000, and 6% interest to accounts with a balanceof 10,000 or more.”account Pacct id,branch name,balance*1.05(sbalance 10000(account)) ÈPacct id,branch name,balance*1.06(sbalance 10000(account)) Don’t forget to include any non-updated tuples inyour update operations!

Relational Algebra Summary41 Very expressive query language for retrievinginformation from a relational databaseSimple selection, projection Computing correlations between relations using joins Grouping and aggregation operations Can also specify changes to the contents of arelation-variable Inserts, deletes, updatesThe relational algebra is a procedural querylanguage State a sequence of operations for computing a result

Relational Algebra Summary (2)42 Benefit of relational algebra is that it can be formallyspecified and reasoned aboutDrawback is that it is very verbose!Database systems usually provide much simpler querylanguages Most popular by far is SQL, the Structured Query LanguageHowever, many databases use relational algebra-likeoperations internally! Great for representing execution plans, due to itsprocedural nature

Next Time43 Transition from relational algebra to SQLStart working with “real” databases J

Aggregate Functions Very useful to apply a function to a collection of values to generate a single result Most common aggregate functions: sum sums the values in the collection avg computes average of values in the collection count counts number of elements in the collection min returns minimum value in the collection max returns maximum value in the collection