The Three Phases Of Database Design - CS 3200

Transcription

The Three Phases of Database DesignAdapted from Chapter 16, 17, 18 (Connolly & Begg)CS3200

Step 1: Conceptual database designEntity relationship modeling (ER) creates a diagrammaticrepresentation of a conceptual modelConceptual database design: build a conceptual representationof the data domain.Process must identify the necessary entities, relationships andattributesCS3200

Conceptual database design1.2.3.4.5.6.Identify the entity typesIdentify the relationship typesIdentify the attributesIdentify the attribute domainsIdentify the candidate keys and primary keyApply generalization (is-a), aggregation (has-a),composition (part-of)7. Check model for dependency8. Validate conceptual model against usertransactions9. Review model with userCoveredNotCoveredCS3200

Identify entity types1. Identify the nouns in the user requirement specification2. Entities should be major objects NOT properties of otherobjects.3. Objects that have existence in their own right4. Look for entity types that may be synonyms of each othera. Document the synonyms5. All entity names should be well descriptiveCS3200

Identify relationship types1.2.3.4.5.Identify the verbs in the user requirement specificationClassify relationships as complex, binary or recursive.Determine the multiplicity of each relationshipCheck for fan and chasm trapsDocument and assign meaningful names to the relationshipsCS3200

Identify entity and relationship attributes1. Identify the properties or the qualities of the entity types2. Classify each attribute as:a. Simple versus composite attributeb. Single versus multi-valued attributec. Derived attribute (ensure attribute can be derived fromgiven attributes)CS3200

Determine candidate, primary keys1. Identify the candidate keys2. Choose the primary key from the candidate keys that are:a. Candidate key with the minimal set of attributesb. Candidate key that is least likely to be updatedc. Candidate key with the fewest number of bytesd. Candidate key with the lowest maximum valuee. Candidate key that is easiest to manipulate for a user.3. All other candidate keys are designated as alternate keys4. Be willing to add new attributes that provide uniqueness if thecurrent candidate keys are composite5. Make sure that keys are properly identified for weak entityCS3200

EER to represent hierarchical relationships1. Generalization (IS-A) allows us to represent super andsubclasses for an entity type.a. Participation - all members of the superclass must fall intoa subclass {Mandatory Optional}b. Disjoint - subclasses do not share members {And Or}2. Composition (Part-of) allows us to represent an entity type thatcomposes another entity type (strong ownership).3. Aggregation (Has-a) allows us to represent an entity type thathas a collection of another entity typeCS3200

Check model for redundancy1. Review 1-1 relationships to ensure the entity types are reallydifferent entity types and not synonyms.2. Remove redundant relationships: relationships that provide thesame information as another relationship.a. Multiple paths between entity types are a potential sourcefor redundancy3. Consider time and its effect on each relationshipa. Some relationships may seem redundant but really arenecessary due to changes in relationships due to timeCS3200

Validate conceptual model with transactions1. The conceptual data model must provide a response for alluser defined transactions.2. If the model cannot provide an answer, the conceptual modelis not complete.3. Two methods that use two different representations of datamodel:a. Textual description of the user transactionb. Transaction pathway through the conceptual model toretrieve response for the transactionCS3200

Review conceptual model with user1. Must get sign-off from the user that the model capture allnecessary data.2. Implies user has verified all transactions can be answeredCS3200

Summary (Conceptual Design)Creating a conceptual design is an iterative process, whereyour goal is to produce an unambiguous representation of thedata domain and its processes.The goal is communication between a technical team and anontechnical teal to ensure the proposed technical solutionfulfills the needs of the enterprise.CS3200

Step 2: Logical database designOnce the conceptual model is created, we need to target aspecific data model.We will be targeting the relational data model.Logical database design: translate the conceptual model to thelogical structure of the database. For the relational model wemust design the relations for the schema.CS3200

Steps to the logical database design1.2.3.4.5.Derive relations for logical data modelValidate relations using normalizationValidate relations against user transactionsCheck integrity constraintsReview logical data model with userStep 2 will be described in a separate lecture.CS3200

Derive relations for .1.2.3.4.5.6.7.8.9.Strong entity typesWeak entity types1 to many (1:*) binary relationship types1 to 1 (1:1) binary relationship types1 to 1 (1:1) recursive relationship typesSuperclass /subclass relationship typesMany to Many (*:*) binary relationship typesComplex relationship typesMulti-values attributesCS3200

Maps to a relationStrong entityCreate a relation that contains all simple attributesWeak entityCreate a relation that contains all simple attributes - primary key must takeinto account the owner entity’s key*:* binary relationshipCreate a relation for the relationship, including all relationship attributes.Each entity in the relation is a foreign key in the relationship’s relation.1:1 binary relationshipMandatory participationOptional participationBoth entities optionalCombine entities into 1 relationDefine a foreign key for relation associated with mandatory participationYour choice for representation (either can have FK)Multi-valued attributesDefine a relation for the multi-valued attribute and create a foreign key to therelation representing the containing entityComplex relationshipCreate a relation for the relationship, including all relationship attributes.Each entity in the relation is a foreign key in the relationship’s relation.CS3200

Maps to a foreign keyEntity/RelationshipMapping to logical design1:* binary relationshipDefine a foreign key on the “many” side. It points to a candidate key onthe “1” side”. All relationship attributes are stored in the “many”relationship. No relation necessary for relationship.*:* binary relationshipCreate a relation for the relationship, including all relationship attributes.Each entity in the relation is a foreign key in the relationship’s relation.1:1 binary relationshipOptional participationBoth entities optionalMulti-valued attributesDefine a foreign key for relation associated with mandatory participationYour choice for representation (either can have FK)Define a relation for the multi-valued attribute and create a foreign key tothe relation representing the containing entityCS3200

Superclass/Subclass conversionParticipationDisjoint constraintMapping to logical designMandatoryNondisjoint (AND)Single relation with 1 or more attributes acting as adiscriminator for the subclassesMandatoryDisjoint (OR)Many relations one for each subclass/superclasscombinationOptionalNondisjoint (AND)Two relations, 1 relation for the superclass and 1relation for all of the subclasses, subclass needs adiscriminating attribute to differentiate type of subclassOptionalDisjoint (OR)Many relations, one relation for the superclass, onerelation for each subclassCS3200

Classwork: create relations for UMLCS3200

Normalization is coveredIn a separate presentation.CS3200

Check integrity constraintsTypes of integrity constraints1.Identifying attributes that are requireda. For each column decide if it needs to have a value2.Attribute domain constraintsa. List or describe the legal values for each attribute (NULL allowed?)3.Multiplicitya. Ensure the relationship constraints are properly represented4.Entity integritya. Primary key attributes cannot hold a NULL value5.Referential integritya. Foreign key created in the child tuple linking to existing parent tuple6.General constraintsCS3200

Referential integrity defines DB behaviorDefine the desired database behavior to ensure that a child relation NEVERreferences a parent relation instance that does not exist.Review changes to the child relation.1. CREATE a new record in the child relationa.b.2.UPDATE a foreign key attribute in the child relationa.3.If all foreign key attributes are NULL (no check by DB).If not NULL ensure parent tuple existsSame as aboveDELETE a record from the child relationa. Operation cannot violate referential integrity (no check by DB).CS3200

Referential integrity defines DB behaviorReview changes to the parent relation.1. UPDATE a primary key attribute in the relationa.b.c.d.2.DELETE a record from the parent relationa.3.Identify the child tuples in the other table referencing this instanceMay choose to not allow update (ON UPDATE RESTRICT)May choose to allow UPDATE to parent relation to propagate tochild (ON UPDATE CASCADE)May choose to remove the link between the 2 entities (ON UPDATESET NULL or ON UPDATE SET DEFAULT)Same as above except DELETE as oppose to UPDATECREATE a record in the parent relationa.No check to be doneCS3200

Classwork: convert to a logical db designCS3200

Summary (Logical Design)The logical design must represent all entities and relationshipswithin the conceptual design.For the relational model, we have relations and foreign keys torepresent the logical design. All entities, attributes andrelationships must be represented by these relations andforeign keys.CS3200

Step 3: Physical database designOnce the logical model is created, we need to choose thetargeted database management system and determine the bestmethod for physically implementing the logical model.Describes the base relations, file organizations, and indexesand any associated integrity constraints and security measures.Each vendor provides different data storage mechanisms,constraint representation, etc.CS3200

Steps to the physical database design1. Define base relations for the chosen DBMS2. Design representation of derived data3. Design general constraints for the tablea. Not enforced in MySQL but may be provided4. Design file organizations and indexesa. MySQL uses the InnoDB structure a version of B treesb. Indexes: storage mechanism used to speed up data retrievalc. We will study indexes and B trees later in the semesterd. Estimate disk space requirements5. Design user views6. Design security mechanismCS3200

Design base relations1. Specify unique name for each relation2. Specify list of simple attributes and domains, default values,NULLs permitted3. Specify primary key and foreign keys4. Specify referential integrity constraintsCS3200

Design representation of derived data1. Ensure data needed to derive field is present2. If data is difficult to derive, some designers may choose to storethe derived field as a simple field and define rules for it to beupdatedCS3200

Summary (Physical Design)The database schema is created, including support for thederived fields.The file organizations, and indexes and any associated integrityconstraints and security measures are chosen.User views and security constraints are chosen.CS3200

Step 2: Logical database design Once the conceptual model is created, we need to target a specific data model. We will be targeting the relational data model. Logical database design: translate the conceptual model to the logical structure of the database. For the relational model we m