Database Management Systems (CS644) - Weebly

Transcription

Chapter TwoDBMS ArchitectureHow should DBS be organized?1

Agenda (Chapter two) Reality, Data & Metadata The Three Levels of DBS Architecture External View, Conceptual View, Internal View Schemas, Mappings, and Instances The Database Management System (DBMS) Functions of DBMS Requirements of DBMS with Respect to DatabaseLanguages (DDL and DML) Roles in Database Environment2

Reality, metadata and data value Reality is what exists in the real world. This reality might be in a physical or logical form. When developing a database or information system,representation of the reality in the computer system is the firststep. The representation would be by considering the properties thereality exhibits. When these properties assume a specific value it would representthe real world object for which data is required to be captured for. Thus any data model will be having a representation mechanismfor the building blocks of data representation.3

Cont Real world can be represented using its building blocks. The building blocks vary based on the kind of reality we wantto represent. When it comes to data representation, the building blocksare: Entities Attributes Relationship4

ENTITIES Real world physical or logical object for which data needto be captured forPersons, places, things etc. which the organization has todeal with.The name given to an entity should always be a singularnoun descriptive of each item to be stored in it. 5E.g.: student NOT students

Cont Existence Dependency: the dependence of an entity on theexistence of one or more entities. Weak entity : an entity that can not exist without the entitywith which it has a relationship – it is indicated by a doublerectangle or dotted line (various modeling approaches) Entity will be realized as relations or tables later Relations can also describe relationships (will be discussed later) Every relation has a schema, which describes the columns,or fields the relation itself corresponds to our familiar notionof a table A relation is a collection of tuples, each of which containsvalues for a fixed number of attributes6

ATTRIBUTES Properties used to describe each Entity or real world object. These attributes will assume a value to represent a singleoccurrence of that real world object class The items of information which characterize and describethese entities. Attributes are pieces of information ABOUT entities. The analysis must of course identify those which are actuallyrelevant to the proposed application. Attributes will give rise to recorded items of data in thedatabase7

Cont At this level we need to know such things as: Attribute name (be explanatory words or phrases) The domain from which attribute values are taken (ADOMAIN is a set of values from which attribute values may betaken.) Each attribute has values taken from a domain. For example, the domain of Name is string and that for salaryis real8

Cont Whether the attribute is part of the entity identifier(attributes which just describe an entity and those which helpto identify it uniquely) Whether it is permanent or time-varying (whichattributes may change their values over time) Whether it is required or optional for the entity (whosevalues will sometimes be unknown or irrelevant)9

Types of Attributes Simple (atomic) Vs Composite attributes Simple : contains a single value (not divided intosub parts) E.g. Age, gender Composite: Divided into sub parts (composed ofother attributes) E.g. Name, address10

Cont Single-valued Vs multi-valued attributes Single-valued : have only single value (the valuemay change but has only one value at one time) Multi-Valued: have more than one value 11E.g. Name, Sex, Id. No. color of eyesE.g. hobby Address, dependent-namePerson may have several college degrees

Cont Stored vs. Derived Attribute Stored : not possible to derive or compute Derived: The value may be derived (computed) from the valuesof other attributes. 12E.g. Name, AddressE.g. Age (current year – year of birth)Length of employment (current date- start date)Profit (earning-cost)G.P.A (grade point/credit hours)

Cont Null Values (Null “Ok” attributes) 13NULL applies to attributes which are not applicable orwhich do not have values.You may enter the value NA (meaning not applicable)Value of a key attribute can not be null.Default value - assumed value if no explicit value

Entity Vs Attributes When designing the conceptual specification of the database, one 14should pay attention to the distinction between an Entity and anAttribute.Consider designing a database of employees for an organization:Should address be an attribute of Employees or an entity(connected to Employees by a relationship)?If we have several addresses per employee, address must be anentity (attributes cannot be set-valued/multi valued)If the structure (city, Woreda, Kebele, etc) is important, e.g. wantto retrieve employees in a given city, address must be modeled asan entity (attribute values are atomic)

RELATIONSHIPS Associations between entities which exist and must be taken intoaccount when processing information. In any business processing one object may be associated withanother object due to some event. Such kind of association is what we call a RELATIONSHIP betweenentity objects A relationship should be named by a word or phrase whichexplains its function For each RELATIONSHIP, one can talk about the Number ofEntities and the Number of Tuples participating in the association. These two concepts are called DEGREE and CARDINALITY of arelationship respectively15

Cont Degree of a Relationship An important point about a relationship is how many entitiesparticipate in it. The number of entities participating in a relationship is calledthe DEGREE of the relationship.16

Cont Among the Degrees of relationship, the following are thebasic: UNARY/RECURSIVE RELATIONSHIP: Tuples/records of a Singleentity are related withy each other. BINARY RELATIONSHIPS: Tuples/records of two entities are associated ina relationship TERNARY RELATIONSHIP: Tuples/records of three different entities areassociated And a generalized one: 17N-NARY RELATIONSHIP: Tuples from arbitrary number of entity sets areparticipating in a relationship.

Cont Cardinality of a Relationship Another important concept about relationship is the number ofinstances/tuples that can be associated with a single instancefrom one entity in a single relationship. The number of instances participating or associated with asingle instance from an entity in a relationship is called theCARDINALITY of the relationship.18

Cont The major cardinalities of a relationship are: ONE-TO-ONE: one tuple is associated with only one othertuple. E.g. Building – Location as a single building will be locatedin a single location and as a single location will onlyaccommodate a single Building. ONE-TO-MANY, one tuple can be associated with many othertuples, but not the reverse. E.g. Department-Student as one department can havemultiple students.19

Cont MANY-TO-ONE, many tuples are associated with one tuplebut not the reverse. E.g. Employee – Department: as many employees belong to asingle department. MANY-TO-MANY: one tuple is associated with many othertuples and from the other side, with a different role nameone tuple will be associated with many tuples E.g. Student – Course as a student can take many courses anda single course can be attended by many students.What do govern the cardinality and degree of a givenrelationship?20

Three levels/views of DBS21

Database Architecture DBMSs do not all confirm to the same architecture. The three-level architecture forms the basis of moderndatabase architectures. Agreement with the ANSI/SPARC (American NationalStandards Institute- Standards Planning and RequirementCommittee) studySystems.22group on Database Management

Cont The architecture for DBMSs is divided intothree general levels:1. External-user2. Conceptual-logical3. Internal-physical23

3 Level ANSI/SPARC Architecture Proposed to support DBMS characteristics of: Program-data independence. Data definition is separated from the application Data Sharing Support of multiple views of the data24

Cont Three levels of abstractions in describing data For better understanding of DB functionalities Made databases more independent ofapplication Became a standard for the organisation ofDBMS25

The Three Levels of DBS Architecture:ANSI-SPARC DB Architecture26

Cont External Level: Users' view of the database. concerned with the way individual users see the data Describes that part of database that is relevant to a particularuser. Different users have their own customized view of the databaseindependent of other users.27

Cont Conceptual Level: Community view of the database. Describes what data is stored in database and relationshipsamong the data can be regarded as a community user view - a formaldescription of data of interest to the organisation, independentof any storage considerations.28

Cont Internal Level: Physical representation of the database on the computer. Concerned with the way in which the data is actually stored Storage spacing allocation for data Record description for storage Describes how the data is stored in the database.29

Schema for the level External (Sub) Schemao defines the external view of data as seen by a user orprogram Conceptual Schemao defines the logical view of data as seen by all users andprograms Physical (Internal) Schemao defines the physical view of data as seen by a DBMS30

Purpose 3-Level DB architecture All users should be able to access same data. Since the database is having a shared data feature whereall the data is stored in one location and all users willhave their own customized way of interacting with thedata. A user's view is unaffected or immune to changes madein other views. Since the requirement of one user is independent of theother, a change made in one user’s view should not affectother users.31

Cont Users should not need to know physical database storagedetails. As there are naïve users of the system, hardware level or physical details should be a black-box for suchusers. DBA should be able to change database storagestructures without affecting the users' views. A change in file organization, access method should notaffect the structure of the data which in turn will have noeffect on the users.32

Cont Internal structure of database should be unaffected by changes tophysical aspects of storage. DBA should be able to change conceptual structure of databasewithout affecting all users. In any database system, the DBA will have the privilege to changethe structure of the database, like adding tables, adding and deleting an attribute, changing the specification of the objects in the database.33

ANSI-SPARC Architecture andDatabase Design Phases34

Example35

Three-schema architecture Different peoplehave differentviews of thedatabase theseare the externalschema The internalschema is theunderlyingdesign andimplementation36

Developing the three-tiered architecture37

Example: University Database Conceptual schema: Students(sid: string, name: string, login: string, age: integer, gpa:real) Courses(cid: string, cname:string, credits:integer) Enrolled(sid:string, cid:string, grade:string) External Schema (View): Course info(cid:string,enrollment:integer) Physical schema: Relations stored as unordered files. Index on first column of Students.View 1View 2Conceptual SchemaPhysical SchemaDB38View 3

External LayerViewViewViewConceptual LayerBase TablesPhysical Layer39Stored Tables .

Physical View The DBMS must know– exact physical location– precise physical structuredatabaseEmployee recordA.B.C. De Silva 222, Galle Road, ColomboName (20 characters) Address (40 characters)650370690V Senior LecturerNID (10 char) Designation (15 char)40

Logical View The user/application mustknow– existence– logical referenceNID 650370690VdatabaseEmployeeNameA.B.C. De SilvaDesignation Senior LecturerAddress41222, Galle Road, Colombo

Conceptual LayerTableTable The conceptual model is a logical representation ofthe entire contents of the database. The conceptual model is made up of base tables. Base tables are “real” in that they contain physicalrecords.42

External View The user/application see– authorised data– own formatdatabaseLecturerName A.B.C. De SilvaDepartmentDept. of Computer ScienceDesignation Senior Lecturer43Age 35

External View cont. External Views Allows to– hide unauthorised datae.g. salary, date of birth– provide user viewe.g. view employee name, designation,department data taken from employee anddepartment files– derive new attributese.g. age derived from date of birth44

External View cont. External Views Allows to– change unit of measuremente.g. show age in years or months– define security levelse.g. update access to employee fileread-only to department file45

ViewExternal LayerView The external model represents how data ispresented to users. It is made up of view tables. View tablesare "virtual"-- they do not exist in physicalstorage, but appear to a user as if they did46

Data Independence This is a prime advantage of a database In conventional systems applications are data-dependent For example, if a file is stored in indexed sequential formthen an application must know that the index exists the file sequence (as defined by the index), The internal structure of the application will be built aroundthis knowledge. If, for example, the file was to be replaced by another filewith different indexing structure major modifications wouldhave to be made to the application.47

Cont. Such an application is data-dependent it is undesirable to allow applications to be data-dependent the DBA must have the freedom to change storage structure oraccess strategy in response to changing requirements withouthaving to modify existing applications. Data independence can be defined as‘The immunity of applications to change in storagestructure and access strategy’. Data independence isthe ability to change the schema atone level of the database system without changing the schema atthe next higher level. There are two types Logical data independence Physical data independence48

Data Independence Applications insulated from howdata is structured and stored. Logical data independence:Protection from changes in logicalstructure of data. Physical data independence:Protection from changes in physicalstructure of data. Question:Why are these particularlyimportant for DBMS?49View 1 View 2View 3Conceptual SchemaPhysical SchemaDB

Cont Logical Data Independence: Refers to immunity of external schemas to changes in conceptualschema. Conceptual schema changes e.g. addition/removal of entities shouldnot require changes to external schema or rewrites of applicationprograms. The capacity to change the conceptual schema without having tochange the external schemas and their application programs50

Cont Physical Data Independence The ability to modify the physical schema without changing thelogical schema Applications depend on the logical schema In general, the interfaces between the various levels andcomponents should be well defined so that changes in someparts do not seriously influence others. The capacity to change the internal schema without havingto change the conceptual schema Refers to immunity of conceptual schema to changes in theinternal schema Internal schema changes e.g. using different fileorganizations, storage structures/devices should notrequire change to conceptual or external schemas.51

Schemas, Mappings, and Instances52

Schema Vs data Information in the database is subdivided into two concepts: Schema Data Schema Schema is the concept of how information relates to other pieces ofinformation, and how information should be grouped. Schema describes how data is to be structured - defined at set-up time,rarely changes (part of the "metadata") Comparable to types and variables in programming languages External- different views of data Conceptual- description entities, attributes and r/n ships Internal schema- description of representing fields53

Cont Data Data is actual "instance" of database, may change rapidly Data is the concept of the actual information users want to storein the database. You can only store data in structures which the schemaprovides, so we must ensure that the schema is correct.54

Schemas versus Instances Database Schema: The description of a database. Includes descriptions of the database structure and the constraints thatshould hold on the database. Schema Diagram: A diagrammatic display of (some aspects of) adatabase schema. Schema Construct: A component of the schema or an object withinthe schema, e.g., STUDENT, COURSE. Database Instance: The actual data stored in a database at a particular moment in time. Also called database state (or occurrence).55

Database Schema Vs. Database State Database State: Refers to the content of a database at a moment in time. Initial Database State: Refers to the database when it is loaded Valid State: A state that satisfies the structure and constraints of the database. The database schema changes very rarely. The database state changes every time the database is updated (insertingnew data, changing an existing data or deleting an existing data). Schema is also called intension, whereas state is called extension.56

Mappings Mapping is a process of converting one level to another level. Mappings among schema levels are needed to transformrequests and data. Programs refer to an external schema, and are mapped by theDBMS to the internal schema for execution There are two levels of mapping The conceptual/internal mapping: defines conceptual and internal view correspondence specifies mapping from conceptual records to their storedcounterparts An external/conceptual mapping: defines a particular external and conceptual view correspondence57

Cont A change to the storage structure definition means that theconceptual/internal mapping must be changed accordingly,so that the conceptual schema may remain invariant,achieving physical data independence. A change to the conceptual definition means that theconceptual/external mapping must be changed accordingly,so that the external schema may remain invariant, achievinglogical data independence.58

The Database Management System(DBMS)-Functions and Languages59

The DBMS-Functions and Languages Database Management System (DBMS) is a Software packageused for providing a systematic method for creating, updating, storing,retrieving data in a database. the service of controlling data access, enforcing dataintegrity, managing concurrency control, and recovery. EFFICIENT, CONVENIENT and SAFE MULTI-USER (manypeople/programs accessing same database, or even same data,simultaneously)storage of and access to MASSIVE amounts ofPERSISTENT (data outlives programs that operate on it) data.60

Cont A full scale DBMS should at least have the following servicesto provide to the user 61Data storage, retrieval and update in the databaseA user accessible catalogue (metadata)Transaction support service: ALL or NONE transaction, whichminimize data inconsistency.Concurrency Control Services: access and update on the database bydifferent users simultaneously should be implemented correctly.Recovery Services: a mechanism for recovering the database after a failuremust be available.

Cont 62Authorization Services (Security): must support theimplementation of access and authorization service to databaseadministrator and users.Support for Data Communication: should provide the facilityto integrate with data transfer software or data communicationmanagers.Integrity Services: rules about data and the change that tookplace on the data, correctness and consistency of stored data,and quality of data based on business constraints.

Cont Services to promote data independency between the dataand the applicationUtility services: sets of utility service facilities like 63Importing dataStatistical analysis supportIndex reorganizationGarbage collection

Cont Typical DBMS Functionality (simply) Define a particular database; Construct or Load the initial database Manipulate the database; Process and Share for a set of concurrent users and applicationprograms64

Database languages65

Database languages A DBMS is software package used to develop, manage, andmaintain databases. Each DBMS should have facilities to define the database,manipulate the content of the database and control thedatabase. These facilities will help the designer, the user as well as thedatabase administrator to discharge their responsibility in designing, using and managing the database.66

Cont The two major categories of languages are Data Definition Language (DDL) Data Manipulation Language (DML) But there is also a third category Data Control Language (DCL)67

Cont Data Definition Language (DDL): Language used to define each data element required by theorganization. Commands for setting up schema or the intension of database These commands are used to setup a database, create,drop and alter table with the facility of handling constraints68

Cont Allows DBA or user to describe and name entities, attributes andrelationships required for the application. Specification notation for defining the database schemaExample:create table Student(StudentID eger)69

Cont Some more command examples: CREATE - to create objects in the database ALTER - alters the structure of the database DROP - delete objects from the database TRUNCATE - remove all records from a table, including allspaces allocated for the records are removed COMMENT - add comments to the data dictionary RENAME - rename an object70

Cont Data Manipulation Language (DML): Is a core command used by end-users and programmers tostore, retrieve, and access the data in the database e.g. SQL Since the required data or Query by the user will be extractedusing this type of language, it is also called "Query Language“ Provides basic data manipulation operations on data held in thedatabase. Language for accessing and manipulating the data organized bythe appropriate data model71

Cont DML can be in two forms Procedural DML: user specifies what data is required and how to get thedata. Non-Procedural DML: user specifies what data is required but not how it is tobe retrieved SQL is the most widely used non-procedural language query language72

Cont E.g. find the name of the passenger with passport numberEP0453623select Passenger.Name from passengerwhere passenger.passposrtNo ‘EP0453623’73

Cont Some more command examples: SELECT - retrieve data from the a database INSERT - insert data into a table UPDATE - updates existing data within a table DELETE - deletes all records from a table, the space for the recordsremain CALL - call a PL/SQL or Java subprogram LOCK TABLE - control concurrency74

Cont Still one can consider additional DB languagescalled: Data Control Language (DCL) Database is a shared resource that demands control of dataaccess and usage. The database administrator should have the facility to controlthe overall operation of the system.75

Cont Data Control Languages are commands that will help theDatabase Administrator to control the database. The commands include commands like granting privilegesto access the database or particular object within the databaseand to store or remove database transactions Some examples of Data Control Language (DCL)statements. GRANT - gives user's access privileges to database REVOKE - withdraw access privileges given with the GRANTcommand76

Components of a DB Environment The DBMS is software package that helps to design, manage,and use data using the database approach. Taking a DBMS as a system, one can describe it with respectto it environment or other systems interacting with theDBMS. The DBMS environment has five components. To design and use a database, there will be the interaction orintegration of Hardware, Software, Data, Procedure andPeople.77

Cont Hardware: are components that one can touch and feel. These components are comprised of various typesofpersonal computers,mainframe or any server computers to be used in multiuser system, network infrastructure, and other peripherals required in the system. 78

Cont Software: Collection of commands and programs used tomanipulate the hardware to perform a function. These include components like the DBMS software,application programs, operating systems, networksoftware, language software and other relevantsoftware79

Cont Data: Since the goal of any database system is to have better control ofthe data and making data useful, Data is the most importantcomponent to the user of the database. There are two categories of data in any database system: that isOperational and Metadata. Operational data is the data actually stored in the system to beused by the user. Metadata is the data that is used to store information about thedatabase itself. The structure of the data in the database is called the schema,which is composed of the Entities, Properties of entities, and relationshipbetween entities.80

Cont Procedure: The rules and regulations on how to design and use a database. It includes procedures like how to log on to the DBMS, how to use facilities, how to start and stop transaction, how to make backup, how to treat hardware and software failure, how to change the structure of the database.81

Cont People: thosepeople in the organization that areresponsible or play a role in designing,implementing, managing, administering and usingthe resources in the database. This component includes group of people with highlevel of knowledge about the database and thedesign technology to other with no knowledge ofthe system except using the data in the database.82

Development cycle andRoles in Database Envr’t83

Roles and Development cycle in DBEnvironment Roles in database environment will be at different levels ofdatabase design and development As it is one component in most information systemdevelopment tasks, there are several steps in designing adatabase system. Here more emphasis is given to the design phases of thesystem development life cycle.84

Steps in DB Design Accordingly the major steps in database design are;1. Planning: that is identifying information gap in anorganization and propose a database solution tosolve the problem.2. Analysis: that concentrates more on fact findingabout the problem or the opportunity. Feasibilityanalysis, requirement determination andstructuring, and selection of best design methodare also performed at this phase85

Cont 3.Design: in database designing more emphasis is given to thisphase. The phase is further divided into three sub-phases. Conceptual Logical PhysicalConceptual DesignLogical Design86Physical Design

Cont Conceptual Design concise description of the data, data type, relationship betweendata and constraints on the data. There is no implementation or physical detail consideration. Used to elicit and structure all information requirements Conceptual design is the process of constructing a model of theinformation used in an enterprise, independent of anyphysical considerations. It is the source of information for the logical design phase. Mostly uses an Entity Relationship Model to describe the data atthis level. After the completion of Conceptual Design one has to go forrefinement of the schema, which is verification of Entities,Attributes, and Relationships87

Cont Logical Design: a higher level conceptual abstraction with selected specific data model toimplement the data structure. It is particular DBMS independent and with no other physicalconsiderations. Logical design is the process of constructing a model of the informationused in an enterprise based on a specific data model (e.g. relational,hierarchical or network or object), but independent of a particularDBMS and other physical considerations. It includes Normalization process Collection of Rules to be maintained Discover new entities in the process Revise attributes based on the rules and the discovered Entities88

Cont Physical Design: physical implementation of the upper level design of thedatabase with respect to internal storage and file structure ofthe database for the selected DBMS. To develop all technology and organizational specification. Physical design is the process of producing a description ofthe implementation of the database on secondary storage. -defines specific storage or access methods used by database It Describes the storage structures and access methods usedto achieve efficient access to the data. Tailored to a specific DBMS system -- Characteristics arefunction of DBMS and operating systems Includes estimate of storage space89

Cont Implementation: the testing and deployment of thedesigned database for use.5. Operation and Support: administering andmaintaining the operation of the database system andproviding support to users.4.90

People in DB environment As people are one of the components in DBMS environment,there are group of roles played by different stakeholders at thedifferent levels of the designing and operation of a databasesystem DB Administrator DB designer Application programmer and system analyst User91 page

Reality, metadata and data value Reality is what exists in the real world. This reality might be in a physical or logical form. When developing a database or information system, representation of the reality in the computer system is the first step. The representation would be by considering the properties the reality exhibits. When these properties assume a specific value it would represent