Introduction To Database Concepts - Uppsala University

Transcription

Introduction to Database ConceptsPart I: Introduction to DatabasesQPurpose of Database SystemsQView of DataQData ModelsQData Definition LanguageQData Manipulation LanguageKostis SagonasIntroduction to DatabasesDatabase Management System (DBMS)QQDBMS contains information about a particular enterpriseDBMS provides an environment that is both convenient andefficient to use.QDatabase Applications:QPurpose of Database SystemCollection of interrelated dataSet of programs to access the dataQQQIn the early days, database applications were built on top offile systemsQDrawbacks of using file systems to store data: Data redundancy and inconsistency Difficulty in accessing data Data isolation — multiple files and formats Integrity problems Banking: all transactions Airlines: reservations, schedules Universities: registration, grades Sales: customers, products, purchases Manufacturing: production, inventory, orders, supply chain Human resources: employee records, salaries, tax deductions Databases touch all aspects of our livesIntroduction to Databases3Introduction to DatabasesPurpose of Database Systems (Cont.)QDrawbacks of using file systems (cont.) Atomicity of updates Failures may leave database in an inconsistent state with partialupdates carried out E.g. transfer of funds from one account to another should eithercomplete or not happen at all Concurrent accessed needed for performance Uncontrolled concurrent accesses can lead to inconsistencies Multiple file formats, duplication of information in different filesNeed to write a new program to carry out each new task Integrity constraints (e.g. account balance 0) become partof program code Hard to add new constraints or change existing ones4Levels of AbstractionQPhysical level describes how a record (e.g., customer) is stored.QLogical level: describes data stored in database, and therelationships among the data.type customer recordname : string;street : string;city : integer;end;Concurrent access by multiple users– E.g. two people reading a balance and updating it at the sametimeQ2QView level: application programs hide details of data types.Views can also hide information (e.g., salary) for securitypurposes.Security problemsDatabase systems offer solutions to all the above problemsIntroduction to Databases5Introduction to Databases61

Instances and SchemasQSimilar to types and variables in programming languagesQSchema – the logical structure of the database QQAnalogous to type information of a variable in a program Physical schema: database design at the physical level Logical schema: database design at the logical level Instance – the actual content of the database at a particular point in timeAnalogous to the value of a variable Applications depend on the logical schema In general, the interfaces between the various levels and components shouldbe well defined so that changes in some parts do not seriously influence others.Entity-Relationship modelQRelational modelQOther models: 7object-oriented modelsemi-structured data modelsOlder models: network model and hierarchical modelIntroduction to DatabasesEntity-Relationship Model8Entity Relationship Model (Cont.)QE-R model of real world Entities (objects) Relationships between entitiesExample of schema in the entity-relationship model Q9E.g. customers, accounts, bank branch E.g. Account A-101 is held by customer Johnson Relationship set depositor associates customers with accountsWidely used for database design Introduction to Databasesdatadata relationshipsdata semanticsdata constraintsQPhysical Data Independence – the ability to modify the physical schemawithout changing the logical schemaIntroduction to DatabasesA collection of tools for describing e.g., the database consists of information about a set of customers andaccounts and the relationship between them) QData ModelsDatabase design in E-R model usually converted to design in therelational model (coming up next) which is used for storage andprocessingIntroduction to DatabasesRelational Model10A Sample Relational DatabaseAttributesQExample of tabular data in the relational 28-3746SmithIntroduction to maPalo AltoA-101NorthRyeA-215AlmaPalo ion to Databases122

Data Manipulation Language (DML)Data Definition Language (DDL)QSpecification notation for defining the database schema E.g.create table account (account-number char(10),integer)balanceQDDL compiler generates a set of tables stored in a datadictionaryQData dictionary contains metadata (i.e., data about data) database schemaQLanguage for accessing and manipulating the data organized bythe appropriate data modelQTwo classes of languages QDML also known as query language Procedural – user specifies what data is required and how to getthose data Nonprocedural – user specifies what data is required withoutspecifying how to get those dataSQL is the most widely used query languageData storage and definition language language in which the storage structure and access methodsused by the database system are specified Usually an extension of the data definition languageIntroduction to Databases13Introduction to Databases14SQLQQSQL: widely used non-procedural language E.g. find the name of the customer with customer-id 192-83-7465select customer.customer-namefrom customerwhere customer.customer-id ‘192-83-7465’ E.g. find the balances of all accounts held by the customer withcustomer-id 192-83-7465select account.balancefrom depositor, accountwhere depositor.customer-id ‘192-83-7465’ anddepositor.account-number account.account-numberPart II: The Relational ModelApplication programs generally access databases through Language extensions that allow embedded SQL Application program interfaces (e.g. ODBC/JDBC) which allow SQLqueries to be sent to a databaseIntroduction to Databases15The Relational ModelQStructure of Relational DatabasesQRelational AlgebraQTuple Relational CalculusQDomain Relational CalculusQExtended Relational-Algebra-OperationsQModification of the DatabaseQViewsIntroduction to Databases17Example of a RelationIntroduction to Databases183

Basic StructureQQFormally, given sets D1, D2, . Dn a relation r is a subset ofD1 x D2 x x DnThus a relation is a set of n-tuples (a1, a2, , an) whereai D iExample: ifAttribute TypesQEach attribute of a relation has a nameQThe set of allowed values for each attribute is called the domainof the attributeQAttribute values are (normally) required to be atomic, that is,indivisiblecustomer-name {Jones, Smith, Curry, Lindsay}customer-street {Main, North, Park}customer-city {Harrison, Rye, Pittsfield}Then r { (Jones, Main, Harrison),(Smith, North, Rye),(Curry, North, Rye),(Lindsay, Park, Pittsfield)}is a relation over customer-name x customer-street x customer-cityIntroduction to Databases19 E.g. multivalued attribute values are not atomic E.g. composite attribute values are not atomicIntroduction to DatabasesRelation SchemaQA1, A2, , An are attributesQR (A1, A2, , An ) is a relation schema20Relation InstanceQThe current values (relation instance) of a relation arespecified by a tableQAn element t of r is a tuple, represented by a row in a tableE.g. Customer-schema (customer-name, customer-street, customer-city)Qr(R) is a relation on the relation schema RE.g.attributescustomer (Customer-schema)customer-name stomerIntroduction to Databases21Introduction to DatabasesDatabaseRelations are UnorderedQOrder of tuples is irrelevant (tuples may be stored in an arbitrary order)QE.g. account relation with unordered tuplesQA database consists of multiple relationsQInformation about an enterprise is broken up into parts, with eachrelation storing one part of the informationQQIntroduction to Databases2322E.g.: account : stores information about accountsdepositor : stores information about which customerowns which accountcustomer : stores information about customersStoring all information as a single relation such asbank(account-number, balance, customer-name, .)results in repetition of information (e.g. two customers own an account) the need for null values (e.g. represent a customer without anaccount)Normalization theory deals with how to design relationalschemasIntroduction to Databases244

The customer RelationIntroduction to Databases25The depositor RelationIntroduction to DatabasesE-R Diagram for the Banking EnterpriseKeysQIntroduction to Databases27Determining Keys from E-R SetsQStrong entity set. The primary key of the entity set becomesthe primary key of the relation.QWeak entity set. The primary key of the relation consists of theunion of the primary key of the strong entity set and thediscriminator of the weak entity set.QRelationship set. The union of the primary keys of the relatedentity sets becomes a super key of the relation. For binary many-to-one relationship sets, the primary key of the“many” entity set becomes the relation’s primary key. For one-to-one relationship sets, the relation’s primary key can bethat of either entity set. For many-to-many relationship sets, the union of the primary keysbecomes the relation’s primary keyIntroduction to Databases2926Let K RQK is a superkey of R if values for K are sufficient to identify aunique tuple of each possible relation r(R) by “possible r” wemean a relation r that could exist in the enterprise we aremodeling.Example: {customer-name, customer-street} and{customer-name}are both superkeys of Customer, if no two customers canpossibly have the same name.QK is a candidate key if K is minimalExample: {customer-name} is a candidate key for Customer,since it is a superkey {assuming no two customers can possiblyhave the same name), and no subset of it is a superkey.Introduction to Databases28Schema Diagram for the Banking EnterpriseIntroduction to Databases305

Query LanguagesRelational AlgebraQLanguage in which user requests information from the database.QProcedural languageQCategories of languagesQSix basic operators procedural select non-procedural project union“Pure” languages:Q Relational Algebra set difference Tuple Relational Calculus Cartesian product Domain Relational Calculus renamePure languages form underlying basis of query languages thatpeople use.QIntroduction to Databases31QThe operators take two or more relations as inputs and give anew relation as a result.Introduction to Databases32Select Operation – Example Relation rABCDαα17αβ57ββ123ββ23 10ABαβ σA B D 5 (r)CDα17β23 10Introduction to Databases33Select OperationNotation:Qp is called the selection predicateQDefined as:σp(r) {t t r and p(t)}Where p is a formula in propositional calculus consistingof terms connected by : (and), (or), (not)Each term is one of: attribute op attribute or constant Qwhere op is one of: , , , . . Example of selection:σ branch-name “Perryridge”(account)Introduction to DatabasesProject Operation – ExampleQRelation r: A,C (r)Introduction to DatabasesABCα101α201β301β402ACα1α1β1β2 34Project OperationQQσ p(r)QNotation: A1, A2, , Ak (r)where A1, A2 are attribute names and r is a relation name.ACα1β1β235QThe result is defined as the relation of k columns obtained byerasing the columns that are not listedQDuplicate rows removed from result, since relations are setsQE.g. To eliminate the branch-name attribute of account account-number, balance (account)Introduction to Databases366

Union Operation – ExampleQRelations r, s:ABABα1α2α2β3β1Union OperationQNotation: r sQDefined as:r s {t t r or t s}QsFor r s to be valid:1. r, s must have the same arity (same number of attributes)rr s:ABα1α2β1βIntroduction to Databases2. The attribute domains must be compatible (e.g., 2nd columnof r deals with the same type of values as does the 2ndcolumn of s)QE.g., to find all customers with either an account or a loan customer-name (depositor) customer-name (borrower)337Introduction to DatabasesSet Difference Operation – ExampleQRelations r, s:ABABα1α2α2β3β1Set Difference OperationQNotation r – sQDefined as:QSet differences must be taken between compatible relations.r – s {t t r and t s}srr – s:ABα1β1Introduction to Databases39 r and s must have the same arity attribute domains of r and s must be compatibleIntroduction to DatabasesABα1β2CαββγrD10102010sr x s:Introduction to duct OperationCartesian-Product Operation-ExampleRelations r, s:38QNotation r x sQDefined as:QAssume that attributes of r(R) and s(S) are disjoint. (That is,R S ).QIf attributes of r(R) and s(S) are not disjoint, then renaming mustbe used.r x s {t q t r and q s}Introduction to Databases427

Composition of OperationsQCan build expressions using multiple operationsQExample: σA C(r x s)QrxsQσA C(r x s)Introduction to 2020aab43Rename OperationQAllows us to name, and therefore to refer to, the results ofrelational-algebra expressions.QAllows us to refer to a relation by more than one name.Example:ρ x (E)returns the expression E under the name XIf a relational-algebra expression E has arity n, thenρx (A1, A2, , An) (E)returns the result of expression E under the name X, and with theattributes renamed to A1, A2, ., An.Introduction to DatabasesBanking Example44Example Queriesbranch (branch-name, branch-city, assets)QFind all loans of over 1200σamount 1200 (loan)customer (customer-name, customer-street, customer-only)account (account-number, branch-name, balance)Qloan (loan-number, branch-name, amount)Find the loan number for each loan of an amount greater than 1200 loan-number (σamount 1200 (loan))depositor (customer-name, account-number)borrower

Security problems Q Database systems offer solutions to all the above problems Introduction to Databases 6 Levels of Abstraction Q Physical level describes how a record (e.g., customer) is stored. Q Logical level: describes data stored in database, and the relationships among the data. type customer record name : string; street : string; c i ty : n eg r; end; Q View level: application .