Lecture 3 SQL -2 - Courses.cs.duke.edu

Transcription

9/3/17Announcements HW1 reminder:CompSci 516– Due on 09/21 (Thurs), 11:55 pm, no late daysData Intensive Computing Systems Project proposal reminder:– Due on 09/20 (Wed), 11:55 pm– Team and initial idea due on 09/14 (Wed), 11:55pm by emailLecture 3SQL - 2 Your piazza, sakai, gradiance accounts shouldbe active– Occasional Pop up quizzes will start from next week– Bring a laptop in class– Part of class participationInstructor: Sudeepa RoyDuke CS, Fall 2017CompSci 516: Database Systems1Duke CS, Fall 2017Recap: Lecture 2 More SQL– semantic– joins– differences with relational model andtransformation– group bys and aggregates– nested queries SQL– Creating/modifying relations– Specifying integrity constraints– Key/candidate key, superkey, primary key, foreignkeyCompSci 516: Database Systems3Duke CS, Fall 2017Basic SQL QuerySELECTFROMWHERE4[DISTINCT] target-list relation-list qualification Semantics of an SQL query defined in terms of the followingconceptual evaluation strategy:–– possibly with a “range variable” after each name–––– (Attr op const) or (Attr1 op Attr2)– where op is one of , , , , combined using AND, OR and NOTDISTINCT is an optional keyword indicating that the answer should notcontain duplicates– Default is that duplicates are not eliminated!CompSci 516: Database SystemsCompSci 516: Database SystemsSELECTFROMWHERE relation-list qualification target-list A list of attributes of relations in relation-list qualification ComparisonsDuke CS, Fall 2017Acknowledgement:The following slides have been created adapting theinstructor material of the [RG] book provided by the authorsDr. Ramakrishnan and Dr. Gehrke.Conceptual Evaluation Strategy[DISTINCT] target-list relation-list A list of relation names 2Today’s topic XML overviewDuke CS, Fall 2017CompSci 516: Database Systems5Compute the cross-product of relation-list Discard resulting tuples if they fail qualifications Delete attributes that are not in target-list If DISTINCT is specified, eliminate duplicate rows This strategy is probably the least efficient way to compute aquery!– An optimizer will find more efficient strategies to compute thesame answersDuke CS, Fall 2017CompSci 516: Database Systems61

9/3/17Example of Conceptual EvaluationExample of Conceptual EvaluationSailorSELECT S.snameFROM Sailors S, Reserves RWHERE S.sid R.sid AND R.bid 103Step 1: Form cross product of Sailor and Reservessidsname ratingage22dustin 74531lubber 858rusty10SailorSELECT S.snameFROM Sailors S, Reserves RWHERE S.sid R.sid AND R.bid 103sidsname ratingage22dustin 7455531lubber 8553558rusty35Step 2: Discard tuples that do not satisfy qualification Reserves10Reservessidsnamerating agesidbiddaysid bid daysidsnamerating agesidbiddaysid bid day22dustin7452210110/10/9622101 10/10/9622dustin7452210110/10/9622101 10/10/9622dustin7455810311/12/9658103 11/12/9622dustin7455810311/12/9658103 Duke CS, Fall 2016CompSci 516: Data Intensive Computing SystemsDuke CS, Fall 2016A Note on “Range Variables”Example of Conceptual EvaluationSailorSELECT S.snameFROM Sailors S, Reserves RWHERE S.sid R.sid AND R.bid 103sidsname ratingage22dustin 74531lubber 85558rusty35Step 3: Select the specified attribute(s)10snamerating agesidbiddaysid bid day22dustin7452210110/10/9622101 10/10/9622dustin7455810311/12/9658103 96Duke CS, Fall 2016 Really needed only if the same relation appears twicein the FROM clause– sometimes used as a short-name The previous query can also be written as:Reservessid58SELECT S.snameFROM Sailors S, Reserves RWHERE S.sid R.sid AND bid 103ORCompSci 516: Data Intensive Computing SystemsSELECT ?FROM Sailors S, Reserves RWHERE S.sid R.sidDuke CS, Fall 2017CompSci 516: Database Systems10Find sailor ids who’ve reservedat least one boatSailorsidsname ratingage22dustin 74531lubber 85558rusty3510SELECT ?FROM Sailors S, Reserves RWHERE S.sid R.sid Would adding DISTINCT to thisquery make a difference?Reservessid bid dayCompSci 516: Database SystemsIt is good style,however, to userange variablesalways!SELECT snameFROM Sailors, ReservesWHERE Sailors.sid Reserves.sidAND bid 103Find sailor ids who’ve reservedat least one boatSailorDuke CS, Fall 2017CompSci 516: Data Intensive Computing Systemssidsname ratingage22dustin 74531lubber 85558rusty3510Reservessid bid day22101 10/10/9622101 10/10/9658103 11/12/9658103 11/12/9611Duke CS, Fall 2017CompSci 516: Database Systems122

9/3/17Joins ht/Full) Outer-JoinDuke CS, Fall 2017Condition/Theta Joinsidsname ratingage22dustin 74531lubber 85558rusty3510SELECT *FROM Sailors S, Reserves RWHERE S.sid R.sid and age 40agedustin 74531lubber 85558rusty3510sid bid daysidsnamerating agesidbiddaysid bid day22101 10/10/9622dustin7452210110/10/9622101 10/10/9658103 11/12/9622dustin7455810311/12/9658103 2/9613Duke CS, Fall 2017CompSci 516: Database SystemsEqui JoinA special case of theta joinJoin condition only has equality predicate sname rating22Form cross product, discard rows that do not satisfy the conditionCompSci 516: Database SystemsSELECT *FROM Sailors S, Reserves RWHERE S.sid R.sid and age 45sid14Natural Joinsidsname ratingage22dustin 74531lubber 85558rusty3510SELECT *FROM Sailors S NATURAL JOIN Reserves RA special case of equi joinEquality condition on ALL common predicates (sid)Duplicate columns are eliminatedsidsname ratingage22dustin 74531lubber 85558rusty3510sidsnamerating agesidbiddaysid bid daysidsnamerating agebiddaysid bid day22dustin7452210110/10/9622101 10/10/9622dustin74510110/10/9622101 10/10/9622dustin7455810311/12/9658103 11/12/9622dustin74510311/12/9658103 sty10355810311/12/9658rusty103510311/12/96Duke CS, Fall 2017CompSci 516: Database Systems15Outer JoinSELECT S.sid, R. bidFROM Sailors S LEFT OUTER JOIN Reserves RON S.sid R.sidbid2210131null58103Duke CS, Fall 2017CompSci 516: Database Systems16Expressions and Stringssidsname ratingage22dustin 74531lubber 85558rusty35Preserves all tuples from the left table whether or not there is a matchif no match, fill attributes from right with nullsid bidSimilarly RIGHT/FULL outer joinsidDuke CS, Fall 201710day22101 10/10/9658103 11/12/96SELECT S.age, age1 S.age-5,FROM Sailors SWHERE S.sname LIKE ‘B %B’2*S.age AS age2 Illustrates use of arithmetic expressions and string pattern matching Find triples (of ages of sailors and two fields defined by expressions)for sailors– whose names begin and end with B and contain at least three characters LIKE is used for string matching. ’ stands for any one characterand %’ stands for 0 or more arbitrary characters– You will need these oftenCompSci 516: Database Systems17Duke CS, Fall 2017CompSci 516: Database Systems183

9/3/17Find sid’s of sailors who’ve reserved a red or aSailors (sid, sname, rating, age)green boatReserves(sid, bid, day)Boats(bid, bname, color) Assume a Boats relation UNION: Can be used tocompute the union of anytwo union-compatible sets oftuples– can themselves be the result ofSQL queries If we replace OR by AND in thefirst version, what do we get? Also available: EXCEPT (Whatdo we get if we replace UNIONby EXCEPT?)Find sid’s of sailors who’ve reserveda red and a green boatSELECT S.sidFROM Sailors S, Boats B, Reserves RWHERE .SELECT S.sidFROM Sailors S, Boats B, Reserves RWHERE UNIONSELECT S.sidFROM Sailors S, Boats B, Reserves RWHERE .Duke CS, Fall 2016Duke CS, Fall 2016CompSci 516: Data Intensive Computing SystemsCompSci 516: Data Intensive Computing SystemsFind sid’s of sailors who’ve reserveda red and a green boat INTERSECT: Can be used tocompute the intersection ofany two union-compatiblesets of tuples.– Included in the SQL/92standard, but some systemsdon’t support itDuke CS, Fall 2016Sailors (sid, sname, rating, age)Reserves(sid, bid, day)Boats(bid, bname, color)Sailors (sid, sname, rating, age)Reserves(sid, bid, day)Boats(bid, bname, color)SELECT S.snameFROM Sailors SWHERE S.sid IN (SELECT R.sidFROM Reserves RWHERE R.bid 103)SELECT S.sidFROM .SELECT S.sidFROM INTERSECTSELECT S.sidFROM .Nested QueriesFind names of sailors who’ve reserved boat #103:Sailors (sid, sname, rating, age)Reserves(sid, bid, day)Boats(bid, bname, color) A very powerful feature of SQL:Key field!– a WHERE/FROM/HAVING clause can itself contain an SQL query To find sailors who’ve not reserved #103, use NOT IN. To understand semantics of nested queries, think of anested loops evaluation– For each Sailors tuple, check the qualification by computing thesubqueryCompSci 516: Data Intensive Computing SystemsDuke CS, Fall 2017CompSci 516: Database Systems22Nested Queries with CorrelationNested Queries with CorrelationFind names of sailors who’ve reserved boat #103:Find names of sailors who’ve reserved boat #103:SELECT S.snameFROM Sailors SWHERE EXISTS (SELECT *FROM Reserves RWHERE R.bid 103 AND S.sid R.sid)SELECT S.snameFROM Sailors SWHERE UNIQUE (SELECT R.bidFROM Reserves RWHERE R.bid 103 AND S.sid R.sid) EXISTS isanother set comparison operator, like IN Illustrates why, in general, subquery must be recomputed for each Sailors tupleDuke CS, Fall 2017CompSci 516: Database Systems If UNIQUE is used, and * is replaced by R.bid, findssailors with at most one reservation for boat #103– UNIQUE23Duke CS, Fall 2017checks for duplicate tuplesCompSci 516: Database Systems244

9/3/17Aggregate OperatorsMore on Set-Comparison Operators We’ve already seen IN, EXISTS and UNIQUE Can also use NOT IN, NOT EXISTS and NOT UNIQUE. Also available: op ANY, op ALL, op INSELECT COUNT (*)FROM Sailors S– where op : , , , , Find sailors whose rating is greater than that of somesailor called Horatio– similarly ALLSELECT *FROM Sailors SWHERE S.rating ANY (SELECT S2.ratingFROM Sailors S2WHERE S2.sname ‘Horatio’)Duke CS, Fall 2017CompSci 516: Database SystemsSELECT AVG (S.age)FROM Sailors SWHERE S.rating 1025Duke CS, Fall 2016 Consider: Find the age of the youngest sailor for eachrating levelIn general, we don’t know how many rating levels exist, andwhat the rating values for these levels are!Suppose we know that rating values go from 1 to 10; we canwrite 10 queries that look like this (need to replace i by num):Duke CS, Fall 2017SELECT MIN (S.age)FROM Sailors SWHERE S.rating iCompSci 516: Database SystemsSELECT[DISTINCT] P BY grouping-listHAVING group-qualificationConceptual EvaluationThe cross-product of relation-list is computedTuples that fail qualification are discarded Unnecessary’ fields are deletedThe remaining tuples are partitioned into groups by the value ofattributes in grouping-list The group-qualification is then applied to eliminate some groups Expressions in group-qualification must have a single value pergroup – The target-list contains– (i) attribute names– (ii) terms with aggregate operations (e.g., MIN (S.age)) The attribute list (i) must be a subset of grouping-list––27First go over the examples in the following slidesThen come back to this slide and study yourself–In effect, an attribute in group-qualification that is not an argument of anaggregate op also appears in grouping-listlike “ GROUP BY bid, sid HAVING bid 3”Intuitively, each answer tuple corresponds to a group, and these attributesmust have a single value per groupHere a group is a set of tuples that have the same value for all attributes ingrouping-listDuke CS, Fall 2017CompSci 516: Database Systems29CompSci 516: Database Systems28Find age of the youngest sailor with age 18, for each rating with atleast 2 such sailors.SELECT S.rating, MIN (S.age) AS minageFROM Sailors SWHERE S.age 18GROUP BY S.ratingHAVING COUNT (*) 1Answer relation:One answer tuple is generated per qualifying groupDuke CS, Fall 2017CompSci 516: Data Intensive Computing SystemsQueries With GROUP BY and HAVING– Sometimes, we want to apply them to each of several groupsof tuplesFor i 1, 2, . , 10:SELECT AVG ( DISTINCT S.age)FROM Sailors SWHERE S.rating 10First go over the examples in the following slidesThen come back to this slide and study yourself So far, we’ve applied aggregate operators to all(qualifying) tuples–single columnSELECT S.snameFROM Sailors SWHERE S.rating (SELECT MAX(S2.rating)FROM Sailors S2)SELECT COUNT (DISTINCT S.rating)FROM Sailors SWHERE S.sname ‘Bob’Motivation for Grouping–COUNT (*)COUNT ( [DISTINCT] A)SUM ( [DISTINCT] A)AVG ( [DISTINCT] A)MAX (A)MIN (A)Check yourself:What do these queries compute?Duke CS, Fall 2017rating378minage25.535.025.5Sailors instance:sid2229313258647174859596CompSci 516: Database Systemssname rating agedustin7 45.0brutus1 33.0lubber8 55.5andy8 25.5rusty10 35.0horatio7 35.0zorba10 16.0horatio9 35.0art3 25.5bob3 63.5frodo3 25.5305

9/3/17Find age of the youngest sailor with age 18, for each rating with atSELECT S.rating, MINleast 2 such sailors.(S.age) AS minageFROM Sailors SWHERE S.age 18GROUP BY S.ratingHAVING COUNT (*) 1Step 1: Form the cross product: FROM clause(some attributes are omitted for 5.035.016.035.025.563.525.5Duke CS, Fall 2017CompSci 516: Database Systems31Find age of the youngest sailor with age 18, for each rating with atSELECT S.rating, MINleast 2 such sailors.Step 3: Apply GROUP BY according to the listed .033.055.525.535.035.016.035.025.563.525.5Duke CS, Fall .525.535.035.0(S.age) AS minageFROM Sailors SWHERE S.age 18GROUP BY S.ratingHAVING COUNT (*) 1CompSci 516: Database Systems33Find age of the youngest sailor with age 18, for each rating with atSELECT S.rating, MINleast 2 such 35.016.035.025.563.525.5Apply the aggregate operatorAt the end, one tuple per 016.035.025.563.525.5Duke CS, Fall .525.545.035.055.525.535.035.0CompSci 516: Database Systems(S.age) AS minageFROM Sailors SWHERE S.age 18GROUP BY S.ratingHAVING COUNT (*) mpSci 516: Database Systems32Find age of the youngest sailor with age 18, for each rating with atSELECT S.rating, MINleast 2 such sailors.Step 4: Apply HAVING clauseThe group-qualification is applied to eliminate some .016.035.025.563.525.5Duke CS, Fall .525.545.035.055.525.535.035.0(S.age) AS minageFROM Sailors SWHERE S.age 18GROUP BY S.ratingHAVING COUNT (*) 1CompSci 516: Database Systems34Null Values Field values in a tuple are sometimes– unknown, e.g., a rating has not been assigned, or– inapplicable, e.g., no spouse’s name–rating378rating7188107109333Duke CS, Fall 2017Find age of the youngest sailor with age 18, for each rating with atSELECT S.rating, MINleast 2 such sailors.Step 5: Apply SELECT clause(S.age) AS minageFROM Sailors SWHERE S.age 18GROUP BY S.ratingHAVING COUNT (*) 1Step 2: Apply WHERE clauseSQL provides a special value null for such situations.minage25.535.025.535Duke CS, Fall 2017CompSci 516: Database Systems366

9/3/17Standard Boolean 2-valued logic True 1, False 0Suppose X 5–––– 2-valued logic does not work for nulls –Intuitively,–––––Suppose rating null, X 5Is rating 8 true or false?What about AND, OR and NOT connectives? (X 100) AND (X 1) is (X 100) OR (X 1) is .(X 100) AND (X 1) is NOT(X 5) is What if we have such a condition in theWHERE clause? T 1, F 0For V1, V2 {1, 0}V1 V2 MIN (V1, V2)V1 V2 MAX(V1, V2) (V1) 1 – V1Duke CS, Fall 2017CompSci 516: Database Systems37(rating 8) AND (X 5)?Duke CS, Fall 2017CompSci 516: Database Systems38New issues for Null3-Valued Logic For Null The presence of null complicates many issues. E.g.: –TRUE ( 1), FALSE ( 0), UNKNOWN ( 0.5)––unknown is treated as 0.5–– Now you can apply rules from 2-valued logic!–––– For V1, V2 {1, 0, 0.5}V1 V2 MIN (V1, V2)V1 V2 MAX(V1, V2) (V1) 1 – V1 –––––––NOT UNKNOWN UNKNOWNUNKNOWN OR TRUE TRUEUNKNOWN AND TRUE UNKNOWNUNKNOWN AND FALSE FALSEUNKNOWN OR FALSE UNKNOWNDuke CS, Fall 2017Meaning of constructs must be defined carefully–Therefore,CompSci 516: Database Systemse.g., WHERE clause eliminates rows that don’t evaluate to trueSo not only FALSE, but UNKNOWNs are eliminated toovery important to remember! But NULL allows new operators (e.g. outer joins) Arithmetic with NULL Can force ”no nulls” while creating a table–––39Special operators needed to check if value IS/IS NOT NULLBe careful!“WHERE X NULL” does not work!Need to write “WHERE X IS NULL”all of , -, *, / return null if any argument is nullsname char(20) NOT NULLprimary key is always not nullDuke CS, Fall 2017Aggregates with NULLsidsname ratingage22dustin 74531lubber 85558rusty351040Aggregates with NULL What do you get for SELECT count(*) from R1? SELECT count(rating) from R1?sidsname ratingage22dustin 74531lubber 85558rusty35R1Duke CS, Fall 2017CompSci 516: Database Systems10 What do you get for SELECT count(*) from R1? SELECT count(rating) from R1?R1CompSci 516: Database Systems41Duke CS, Fall 2017CompSci 516: Database Systems427

9/3/17Aggregates with NULLsidsname ratingage22dustin 74531lubber 85558rusty3510Aggregates with NULL What do you get for SELECT count(*) from R1? SELECT count(rating) from R1?sidsname ratingage22dustin 74531lubber 85558rusty35R1R1sidsname ratingage22dustin 74531lubber null5558rusty3510 What do you get for SELECT count(*) from R2? SELECT count(rating) from R2?sidsname ratingage22dustin 74531lubber null5558rusty35R2CompSci 516: Database Systems43– Discards null values first– Then applies the aggregate– Except count(*) There are also ASSERTIONS tospecify constraints that spanacross multiple tables If only applied to null values, the result is nullsname ratingage22dustin 74531lubber null5558rusty35R2 SELECT sum(rating) from R2? Ans:sidsname ratingage22dustin null4531lubber null5558rusty35nullR3 SELECT sum(rating) from R3? Ans:CompSci 516: Database Systems45 A view is just a relation, but we store a definition, rather thana set of tuplesCREATE VIEW YoungActiveStudents (name, grade)AS SELECT S.name, E.gradeFROM Students S, Enrolled EWHERE S.sid E.sid and S.age 21CREATE TABLE Sailors( sid INTEGER,sname CHAR(10),rating INTEGER,age REAL,PRIMARY KEY (sid),CHECK ( rating 1AND rating 10 )CREATE TABLE Reserves( sname CHAR(10),bid INTEGER, There are TRIGGERS too :day DATE,procedure that startsPRIMARY KEY (bid,day),automatically if specified changesCONSTRAINT noInterlakeResoccur to the DBMSCHECK ( Interlake’ ( SELECT B.bname– see additional slides at the endFROM Boats BDuke CS, Fall 2016CompSci 516: Data Intensive Computing Systems WHERE B.bid bid)))SELECT INTO. FROM. WHERESELECT S.name, E.gradeINTO YoungActiveStudentsFROM Students S, Enrolled EWHERE S.sid E.sid and S.age 21 Views can be dropped using the DROP VIEW command Views and Security: Views can be used to present necessary information(or a summary), while hiding details in underlying relation(s) the above view hides courses “cid” from ECompSci 516: Database Systems44Can create a new table from a queryon other tables tooViewsDuke CS, Fall 2017CompS

Duke CS, Fall 2017 CompSci 516: Database Systems 3 Today’s topic More SQL –semantic –joins –group bysand aggregates –nested queries Duke CS, Fall 2017 CompSci 516: Database Systems 4 Acknowledgement: The following slides have been created adapting the instructor material of the [RG] book provide