PWSP CLASS DATABASE MANAGEMENT INSTRUCTOR:

Transcription

Introduction to DatabasesPWSP CLASSDATABASE MANAGEMENTINSTRUCTOR: SANJAY GOELFEBRUARY 26-27, 20083/4/2007Sanjay Goel, School of Business, University at Albany1

Databases Learning ObjectivesLearn need for relational databasesCreate Entity-Relationship DiagramsLearn the process of database normalizationLearn the constructs of SQLDesign and Implement relational databases usingMicrosoft Access3/4/2007Sanjay Goel, School of Business, University at Albany2

DatabasesData and Information Data is an individual fact or multiple facts, or avalue, or a set of values, but is not significant toa business in and of itself.– What does this mean: 7/13/2002Giving data context, or meaning, turns it intoinformation.– Date of change of bankruptcy law3/4/2007Sanjay Goel, School of Business, University at Albany7/13/20023

Databases What is a databaseA database is simply a bunch of information (data) stored on acomputer.– If you run a matrimonial agency you probably have a spreadsheetwith all your clients’ names and addresses on your computer––– This could be a list of all your clients, a list of the products you sell, theresults of a chess tournament or everyone in your family tree.The top of the spreadsheet are typically column headings: Name,Address, Telephone Number, Email Address, etc.The spreadsheet might have customer ID numbers and other dataThis is your client table in the databaseIn your filing cabinet, you might have a separate folder for everyproject you’ve worked on. This folder contains the projectnumber, the name (or ID number) of the client, a description ofthe project, the current status, the budget and a completion date.–3/4/2007This is your projects table in the database.Sanjay Goel, School of Business, University at Albany4

DatabasesNeed A database is required to keep track of things––––– The climatic patterns over the last thousand yearsData from celestial microscopes of the night skyNumber of kids born with a mole on their fingerNumber of fishes which spawn in AlaskaNumber of people who also buy crackers along when theybuy milk.As we will see, unlike a list or spreadsheet, a databasecan store complex information more efficiently than asimple list3/4/2007Sanjay Goel, School of Business, University at Albany5

DatabasesRole Database is required to:–– Organize data.Retrieve information.Remember––3/4/2007you store data in a databaseyou retrieve information from the database.Sanjay Goel, School of Business, University at Albany6

DatabasesProperties Persistence–– Sharing–– Data can be stored as long as required (i.e. magnetic disksrather than computer memory)Retrieve information.Can be used by multiple users simultaneouslyUnless two people are trying to change the same data at thesame time they should be able to operate independentlyInterrelated–3/4/2007Link information about different elements to provide acomplete pictureSanjay Goel, School of Business, University at Albany7

DATABASECOMPONENTS3/4/2007Sanjay Goel, School of Business, University at Albany8

Database ComponentsDatabase System The four components of a database system are:––––UsersDatabase ApplicationDatabase Management System anagementSystem(DBMS)Sanjay Goel, School of Business, University at AlbanyDatabase9

Database ComponentsUser A user of a database system will– Use a database application to track things– Use forms to enter, read, delete and query data– Produce reports3/4/2007Sanjay Goel, School of Business, University at Albany10

Database ComponentsDatabase A database is a self-describing collection ofrelated records– The database itself contains the definition of itsstructure– Metadata is data describing the structure of thedatabase data Tables within a relational database are related toeach other3/4/2007Sanjay Goel, School of Business, University at Albany11

Database Components Database Management System (DBMS)A database management system (DBMS) serves as anintermediary between database applications and thedatabaseThe DBMS manages and controls database activitiesThe DBMS creates, processes and administers thedatabases it controlsDatabase management system (DBMS) has two goals.––3/4/2007Add, delete and update data in the database.Provide various ways to view data in a database.Sanjay Goel, School of Business, University at Albany12

Database Components DBMS FunctionsCreate databasesCreate tablesCreate supporting structuresRead database dataModify database data (insert, update, delete)Maintain database structuresEnforce rulesControl concurrencyProvide securityPerform backup and recovery3/4/2007Sanjay Goel, School of Business, University at Albany13

Database ComponentsApplications A database application is a set of one or morecomputer programs that serves as anintermediary between the user and the DBMSFunctions of Database Applications include–––––3/4/2007Create and process formsProcess user queriesCreate and process reportsExecute application logicControl database applicationsSanjay Goel, School of Business, University at Albany14

Database ComponentsDesktop Database Systems Desktop database systems typically supportsingle users– Have one application– Have only a few tables– Have a few (DBMS)Databasee.g. MS Access3/4/2007Sanjay Goel, School of Business, University at Albany15

Database ComponentsOrganizational Database Systems Typically multi-user systems–––Include more than one application and several databasesInvolve multiple computersAre complex in design (large # of tables and tSystem(DBMS)DatabaseApplicatione.g. Oracle,SQL ServerDatabaseApplication3/4/2007VB Sanjay Goel, School of Business, University at AlbanyDatabase16

RELATIONALDATABASE3/4/2007Sanjay Goel, School of Business, University at Albany17

DatabasesExample (Collection of reAlpha Press999-999-9999 12.000-91-335678-7Fairie Queene7Spencer777-777-77771Big House123-456-7890 lpha Press999-999-9999 20.000-99-999999-9Emma1Austen111-111-11111Big House123-456-7890 20.000-55-123456-9Main Street9Smith123-222-22223Small House714-000-0000 22.950-55-123456-9Main Street10Jones123-333-33333Small House714-000-0000 22.950-103-45678-9Iliad3Homer333-333-33331Big House123-456-7890 25.000-12-333433-3On Liberty8Mill888-888-88881Big House123-456-7890 25.001-22-233700-0Visual Basic4Roman444-444-44441Big House123-456-7890 25.001-1111-1111-1C 4Roman444-444-44441Big House123-456-7890 29.950-123-45678-0Ulysses6Joyce666-666-66662Alpha Press999-999-9999 l House714-000-0000 l House714-000-0000 l House714-000-0000 34.000-11-345678-9Moby Dick2Melville222-222-22223Small House714-000-0000 49.000-12-345678-6Jane Eyre1Austen111-111-11113Small House714-000-0000 49.000-99-777777-7King Lear5Shakespeare555-555-55552Alpha Press999-999-9999 49.003/4/2007Sanjay Goel, School of Business, University at Albany18

DatabasesWhy Use a Database? Most databases worth maintaining are quitecomplex.– Library of congress contains 16 million records– Social Security Database– Department of Motor Vehicles Database Why can’t we use a flat file like we had in word?– Redundancy Redundancy is unnecessary repetition of data– Wasted Storage– Database Anomalies3/4/2007Sanjay Goel, School of Business, University at Albany19

DatabasesRedundancy – Book Database Multiple values in the column of a database– e.g. some books are authored by multiple authors.There are three choices–Accommodate multiple authors in multiple rows (one foreach author) –Have multiple columns for the authors in each row. –You have to determine the max number of authors a prioriA lot of the fields will go unusedAdd all the author names in one column 3/4/2007Complete information about a book is repeated as many times asthere are authors. (causing large redundancy)Searching and sorting become very hard.Sanjay Goel, School of Business, University at Albany20

DatabasesRedundancy – Library of Congress Library of Congress Example–––––3/4/200710,000 publishers16 million recordsEach address on average 50 characters longAssuming each character takes 2 bytes, the difference instorage is:(16,000,000 – 1000) * 50 * 2 bytes 1.6 gbytesDuplication of address alone requires 1.6 gigabytes ofstorageSanjay Goel, School of Business, University at Albany21

DatabasesRedundancy – Project Database In a list, each row is intended to stand on its own. So,the same information may be entered several times– E.g. Consider a list of Projects that include the ProjectManager’s Name, ID, and Phone. If the same person ismanaging 5 projects his information would be repeated 10timesIn a list, each row may contain information on morethan one theme. As a result, needed information mayappear in the lists only if information on other themesis also present–3/4/2007E.g. A list of Projects may include Project Managerinformation (Name, ID, and Phone Extension) and Projectinformation (Name, ID, StartDate, Budget) in the same row.Sanjay Goel, School of Business, University at Albany22

DatabasesAnomalies A table anomaly is a structure for which a normaldatabase operation cannot be executed withoutinformation loss or full search of the data table Three types of anjay Goel, School of Business, University at Albany23

DatabasesInsertion Anomalies Insertion anomaly occurs when extra data beyond thedesired data must be added to the database If we need to add a new publisher to the database, butwe do not have any book by that publisher.––3/4/2007we will need to add a new line and put NULL values in allbut publisher related columns.Also ISBN column which is supposed to be unique will havenumerous null values.Sanjay Goel, School of Business, University at Albany24

DatabasesUpdate Anomalies An update anomaly occurs when it is necessary toupdate multiple rows to modify a single fact.–––3/4/2007If the phone number of a publisher changes then all theentries of the publisher need to be changed.For instance changing the phone number of Big Housepublisher requires changing the phone number 6 times.Why is this bad?Sanjay Goel, School of Business, University at Albany25

DatabasesDeletion Anomalies A deletion anomaly occurs whenever deleting a rowinadvertently causes other data to be deleted.––3/4/2007If we lose a book and delete a row containing the book, welose the information of the publisher if it is the only book bythat publisher.If we remove books Macbeth, Hamlet, Ulysses, and KingLear we lose all information about Alpha Press andShakespeare.Sanjay Goel, School of Business, University at Albany26

DatabasesRelational Design To create a relational database–– Each smaller table has–– Break table into a collection of smaller tables.Define relationships among the tablea heading which contains the table definitiona body which contains the contentThe relationships are created by having commoncolumns among tables––3/4/2007Matching values in the rows demonstrate relationshipsThese relationships are used to join tables while designingqueriesSanjay Goel, School of Business, University at Albany27

DatabasesRelational Design3/4/2007Sanjay Goel, School of Business, University at Albany28

DatabasesRelational Design Increased complexity– Relational Integrity–– Instead of simply sorting on columns in a table we need togather information from multiple tables.Relational integrity should be maintained while changingdata.For instance if we delete publishers we can not let the booksby that publisher reside in the books database as danglingreferences.Inadvertent data loss––3/4/2007During the design care must be taken to not lose any dataFor instance without the books author table we will notknow how to relate the books and authors.Sanjay Goel, School of Business, University at Albany29

Databases EntitiesA database stores information about things encountered in reallife i.e. person, places, things, or events.An entity is something of importance to a user that needs to berepresented in a database–– An entity represents one theme or topicIn an entity-relationship model thus entities are restricted to things thatcan be represented by a single tableIn the book database example Books, Authors & Publishers arethe entities?All possible entities for a given entity type constitute the entityclass.The subset of the entities from the entity class contained in adatabase is called an entity set3/4/2007Sanjay Goel, School of Business, University at Albany30

DatabasesEntities Cont’d An entity class stays constant whereas an entity set canchange. For the given book database–––3/4/2007Book is an entitySet of all possible books in the world is the entity classThe subset of the 14 books in the book table is the entity setSanjay Goel, School of Business, University at Albany31

DatabasesRelation A relation is a two-dimensional table that has specificcharacteristics The table dimensions, like a matrix, consist of rows andcolumns Characteristics of a Relation––––––––3/4/2007Rows contain data about an entityColumns contain data about attributes of the entityCells of the table hold a single valueAll entries in a column are of the same kindEach column has a unique nameThe order of the columns is unimportantThe order of the rows is unimportantNo two rows may be identicalSanjay Goel, School of Business, University at Albany32

DatabasesAttributes Properties of the entities that describe their behavior arecalled the attributes.– Attribute values are the actual entries in each cell of adatabase table.The attributes have three main purposes–––3/4/2007Represent the real data in the database.e.g. in the book table the title and price are the describingattributesUniquely identify entities within an entity class.e.g. ISBN in the book table, PubID in the publisher table,and AuID in the author table provide unique identity to abook, publisher or an author.Define relationship of one entity with another entitySanjay Goel, School of Business, University at Albany33

DatabasesAttributes A set of multiple attributes can describe an entityuniquely so it is not absolutely essential to have anidentifying attribute however it is useful to have thesefor the purpose of efficiency.– e.g. for all the U.S. residents Name, race, color, height areattributes that describe the data while Social SecurityNumber is the identifying entity.What are the attributes that we need for our threeentities?–––3/4/2007Books: Title, Price, ISBNAuthors: AuID, AuPhone, AuNamePublishers: PubName, PubPhone, PubIDSanjay Goel, School of Business, University at Albany34

DatabasesAttributes (Observations) From the books attributes there is no way to identify thepublisher and author.– We need to distinguish between the unique attribute for an entityset vs. unique attribute for an entity class.–– We need to add more attributes to describe the relationships.e.g. Book database: Title is a unique attribute for the current set ofbooks, however, there are many books in the world with the same titles.ISBN on the other hand is a unique identifier for the book.e.g. Adult males living in the U.S.: A lot of them have the same name,(probably the same age), however, SSN is a unique identifier.Even though Publisher can probably be uniquely identified bythe Publisher Name & Phone Number, we have added PubID tomake identification more efficient.The attributes and the unique identifier selection is contextdependent and is the job of the database designer.3/4/2007Sanjay Goel, School of Business, University at Albany35

DatabasesKeys & Superkeys A key is one (or more) columns of a relation that is (are)used to identify a row A set of attributes from the set of all the attributes for agiven entity is called the super key for the entity class.––– {ISBN} is the superkey for the Book entity{PubID} or {PubName, PubPhone} are the superkeys forthe Publishers entity class.There can be multiple superkeys for a given entity.Superkeys should be evaluated on the basis of entityclass not the entity set–3/4/2007Unique identifier for books in a database of 14 books maynot work when additional books are added to the tableSanjay Goel, School of Business, University at Albany36

DatabasesCandidate Keys and Primary Key A superkey is called a candidate key if no proper subsetof the superkey is also a key.––– i.e. a candidate key is a minimal superkey.e.g. Both {ISBN} and {ISBN, Title} are superkeys for thebooks table since they both uniquely identify the Book.However it is not necessary to include the Title in thesuperkey.A table may have multiple candidate keysA primary key is a candidate key chosen to be the mainkey for the relationIf you know the value of the primary key, you will beable to uniquely identify a single row3/4/2007Sanjay Goel, School of Business, University at Albany37

Databases Composite KeysA composite key is a key that contains two or moreattributesFor a key to be unique, often it must become acomposite keyTo identify a family member, you need to know aFamilyID, a FirstName, and a Suffix (e.g., Jr.)The composite key is:(FamilyID, FirstName, Suffix)One needs to know the value of all three columns touniquely identify an individual3/4/2007Sanjay Goel, School of Business, University at Albany38

DatabasesRelationships A table may be related to other tablesFor example–– An Employee works in a DepartmentA Manager controls a ProjectRelationships form associationsbetween multiple entities–e.g. Book is written by an authorNumber of entities in a relationshipis called the degree of a relationship––3/4/2007SupplierBinary relationship involves two entitiesTernary relationship involves threeentitiesSanjay Goel, School of Business, University at AlbanyQuotationMotherFatherChild39

DatabasesForeign Keys To preserve relationships, foreign keys are created A foreign key is a primary key from one table placedinto another table The key is called a foreign key in the table that receivedthe key3/4/2007Sanjay Goel, School of Business, University at Albany40

DatabasesBinary Relationships There are three kinds of binary relationships–––3/4/2007One-to-one (1:1): A single entitiy instance of one type isrelated to a single entity instance of another typeOne-to-Many (1:N): A single entity instance of one typerelates to many entity instances of another typeMany-to-Many (N:M): A single entity instance of one typerelates to many entity instances of another type & vice kSanjay Goel, School of Business, University at Albany41

DatabasesBinary Relationships One-to-one relationships are rare since they can besubstituted by adding one or more extra attributes inone of the tables to model the attributes of the other. Astrong justification is required for having such arelationship––3/4/2007e.g. Passwords are kept in a separate table for reasons ofsecurity.e.g. If one of the fields contains a large data set it ismaintained in a separate table for efficiencySanjay Goel, School of Business, University at Albany42

DatabasesBinary Relationships Cardinality specifies (maximum) number of instances ofan entity that relate to one instance of another entity– e.g. Basketball team and starting players have cardinality of 5Ordinality describes the minimum number of instancesof an entity that relate to one instance of another entity––3/4/2007if the minimum number is zero the relationship is optionalif the minimum number is greater than zero the relationshipis mandatorySanjay Goel, School of Business, University at Albany43 page

Database e.g. MS Access 3/4/2007 Sanjay Goel, School of Business, University at Albany 15. 3/4/2007 Sanjay Goel, School of Business, University at Albany 16 Typically multi-user systems – Include more than one appli