Chapter 6 Database Tables & Normalization

Transcription

Chapter 6Database Tables & Normalization Objectives: to learn Normalization:––––– What normalization is and what role it plays in thedatabase design process– About the normal forms 1NF, 2NF, 3NF, BCNF,and 4NF– How normal forms can be transformed from lowernormal forms to higher normal forms– That normalization and ER modeling are usedconcurrently to produce a good database design– That some situations require denormalization togenerate information efficientlyCS275 Fall 2010A process for assigning attributes to entitiesReduces data redundanciesHelps eliminate data anomaliesProduces controlled redundancies to link tables Normal Forms are a series of stages done inNormalization––––11NF - First normal form,2NF - Second normal form,3NF - Third normal form,4NF - Fourth normal form2CS275 Fall 2010Database Tables & NormalizationThe Need for Normalization Example: Company which manages buildingprojects. The business rules are: Normal Forms (cont’)– 2NF is better than 1NF; 3NF is better than 2NF– For most business database design purposes, 3NFis as high as needed in normalization– Charges its clients by billing hours spent on eachcontract– Hourly billing rate is dependent on employee’sposition Denormalization produces a lower normal formfrom a higher normal form.– Highest level of normalization is not always mostdesirable– Increased performance but greater dataredundancy Periodically, report is generated that containsinformation such as displayed in Table 6.13CS275 Fall 2010CS275 Fall 201041

The Need for NormalizationThe Need for Normalization Data often comes from tabular reports Desired Output - Classic control-break report. Acommon type of report from a database.CS275 Fall 201056CS275 Fall 2010Creating Entities from Tabular DataThe Normalization Process Structure of data set in Figure 6.1 does not handledata very well Relational database environment is suited to helpdesigner avoid data integrity problems– Each table represents a single subject– No data item will be unnecessarily stored in morethan one table– All nonprime attributes in a table are dependenton the primary key– Each table is void of insertion, update, deletionanomalies– Primary key - Project # contains nulls– Table displays data redundancies Report may yield different results depending onwhat data anomaly has occurred– Update - Modifying JOB CLASS– Insertion - New employee must be assignedproject– Deletion - If employee deleted, other vital data lostCS275 Fall 2010 Normalizing table structure will reduce dataredundancies7CS275 Fall 201082

The Normalization ProcessConversion to First Normal Form Objective of normalization is to ensure that alltables are in at least 3NF Normalization works one Entity at a time It progressively breaks table into new set ofrelations based on identified dependencies Normalization from 1NF to 2NF is three-stepprocedure. Step 1: Eliminate the Repeating Groups– Eliminate nulls: each repeating group attributecontains an appropriate data value Step 2: Identify the Primary Key– Must uniquely identify attribute values– New key can be composed of multiple attributes Step 3: Identify All Dependencies– Dependencies are depicted with a diagram9CS275 Fall 2010Step 1: Conversion to 1NFCS275 Fall 201010Step 1 - Eliminate the Repeating Groups Step 1: Eliminate the Repeating Groups– A Repeating group is group of multiple entries ofsame type existing for any single key attributeoccurrence– Present data in tabular format, where each cell hassingle value and there are no repeating groups– Eliminate repeating groups, eliminate nulls bymaking sure that each repeating group attributecontains an appropriate data value Repeatinggroups must be eliminated11CS275 Fall 2010CS275 Fall 2010123

Step 3- Conversion to 1NFStep 2 - Conversion to 1NF Step 2 - Identify the Primary Key Step 3 - Identify All Dependencies– Review (from Chapter 3) Determination and attributedependence.– Depicts all dependencies found within given tablestructure– Helpful in getting bird’s-eye view of allrelationships among table’s attributes1. Draw desirable dependencies based on PKey2. Draw less desirable dependencies– Partial» based on part of composite primary key– All attribute values in the occurrence are ‘determined’by the Primary Key. The Primary Key Must uniquelyidentify the attribute(s).– Resulting Composite Key : PROJ NUM and EMP NUM– Transitive» one nonprime attribute depends on another nonprimeattribute13CS275 Fall 201014CS275 Fall 2010Step 3 - Dependency Diagram (1NF)Resulting First Normal Form The connections above the entity show attributesdependent on the currently chosen Primary Key,the combination of PROJ NUM and EMP NUM. The arrows below the dependency diagramindicate less desirable partial and transitivedependencies First normal form describes tabular format:– All key attributes are defined– No repeating groups in the table– All attributes are dependent on primary key All relational tables satisfy 1NF requirements Some tables contain other dependencies and shouldbe used with caution– Partial dependencies - an attribute dependent ononly part of the primary key– Transitive dependencies – an attribute dependenton another attribute that is not part of the primarykey.15CS275 Fall 2010CS275 Fall 2010164

Conversion to Second Normal FormCompleted Conversion to 2NF Step 1: Eliminate Partial Dependencies Each Key component establishes a new table Table is in second normal form (2NF) when:– Start with 1NF format and convert by: Write each part of the composite key on it’s own line. Write the original (composite) key on last line– It is in 1NF and– It includes no partial dependencies:– Each component will become key in new table Step 2: Assign Dependent Attributes No attribute is dependent on only portion ofprimary key– From the original 1NF determine which attributes aredependent on which key attributes Step 3: Name the tables to reflect its contents & function– Note: it is still possible to exhibit transitivedependency Attributes may be functionally dependent onnonkey attributesPROJECT (PROJ NUM, PROJ NAME)EMPLOYEE (EMP NUM, EMP NAME, JOB CLASS, CHG HOUR)ASSIGN (PROJ NUM, EMP NUM, HOURS)17CS275 Fall 201018CS275 Fall 2010Completed Conversion to 2NFConversion to Third Normal Form Step 1: Eliminate Transitive Dependencies––Write its determinant as PK for new table.And Leave it in the Original Table Step 2: Reassign Corresponding DependentAttributes– Identify attributes dependent on each determinantidentified in Step 1, and list on new table. Step 3: Name the new table(s) to reflect itscontents and functionPROJECT (PROJ NUM, PROJ NAME)EMPLOYEE (EMP NUM, EMP NAME, JOB CLASS)ASSIGN (PROJ NUM, EMP NUM, HOURS)JOB(JOB CLASS, CHG HOUR)CS275 Fall 201019CS275 Fall 2010205

Resulting Third Normal FormImproving the Design A table is in third normal form (3NF) when bothof the following are true: Table structures should be cleaned up toeliminate initial partial and transitivedependencies Normalization cannot, by itself, be relied on tomake good designs It reduces data redundancy and builds controlledredundancy. The higher the NF,– It is in 2NF– It contains no transitive dependencies– the more entities one has,– the more flexible the database will be,– the more joins (and less efficiency) you have.21CS275 Fall 201022CS275 Fall 2010Improving the DesignSurrogate Key Considerations Additional issues to address and possibly change,in order to produce a good normalized set oftables: When primary key is considered to be unsuitable,designers use surrogate keys System-assigned primary keys may not preventconfusing entries, but do prevent violation ofentity integrity. Example: data entries in Table 6.4 areinappropriate because they duplicate existingrecords––––––Evaluate PK AssignmentsEvaluate Naming ConventionsRefine Attribute AtomicityIdentify New AttributesIdentify New RelationshipsRefine Primary Keys as Required for DataGranularity– Maintain Historical Accuracy– Evaluate Using Derived AttributesCS275 Fall 201023CS275 Fall 2010246

Improving the DesignHigher-Level Normal Forms Identifying new attributes Tables in 3NF perform suitably in businesstransactional databases Higher-order normal forms are useful onoccasion Two special cases of 3NF:– Boyce-Codd normal form (BCNF)– Fourth normal form (4NF)CS275 Fall 201025The Boyce-Codd Normal Form (BCNF)26The Boyce-Codd Normal Form (BCNF) Most designers consider the BCNF as a specialcase of 3NF Table is in 3NF when it is in 2NF and there are notransitive dependencies Table can be in 3NF and fail to meet BCNF Every determinant in table is a candidate key– Has same characteristics as primary key, but forsome reason, not chosen to be primary key When table contains only one candidate key, the3NF and the BCNF are equivalent BCNF can be violated only when table containsmore than one candidate key– No partial dependencies, nor does it containtransitive dependencies– A nonkey attribute is the determinant of a keyattribute– example:Section(coursename, sectionno, courseno, time, daysCS275 Fall 2010CS275 Fall 201027CS275 Fall 2010287

The Boyce-Codd Normal Form (BCNF)The Boyce-Codd Normal Form (BCNF) Occurs most often when the wrong attribute waschosen as part of the composite Primary Key. Return to 2NF and correct by: When part of the key is dependent on anothernon-key attribute, ie. another candidate key.– Create a new composite key with C, not B.– Create a new table eliminating the new partialdependency.CS275 Fall 201029The Boyce-Codd Normal Form (BCNF)CS275 Fall 2010The Boyce-Codd Normal Form (BCNF) Resulting BCNF with two entities Non-Boyce-Codd Normal Form– Can only exists with composite Primary Key –– Example Enroll entity:– Enroll, with composite PK Stu ID & Class code.– Class with Class code as it’s PK.Enroll(Stu ID, Staff ID, Class Code, Enroll Grade)31CS275 Fall 201030CS275 Fall 2010328

Fourth Normal Form (4NF)Fourth Normal Form (4NF) Table is in fourth normal form (4NF) when bothof the following are true: Two Examples of multi-valued dependencies StudentID,StName,Phones(Home,Work,Cell,Fax)– It is in 3NF– No multiple sets of multivalued dependencies StudentID,Addresses(permanent, mailing, current) 4NF is largely academic if tables conform tofollowing two rules: Convert multi-valued phones using twoadditional tables in 3NF– All attributes dependent on primary key,independent of each other– No row contains two or more multivalued factsabout an entity Student(StudentID, StName, .) StuPhones(StudentID, PhoneType, Phone#) Phones(PhoneType, Description)33CS275 Fall 201034CS275 Fall 2010Fourth Normal Form (4NF)Denormalization Example: Tracking employee’s volunteer service Creation of normalized relations is importantdatabase design goal Processing requirements should also be a goal If tables are decomposed to conform tonormalization requirements:– Number of database tables expands– Causing additional processing– Loss of system speedCS275 Fall 201035CS275 Fall 2010369

DenormalizationNormalization and Database Design Conflicts are often resolved throughcompromises that may include denormalization Defects of unnormalized tables: Normalization should be part of the designprocess Make sure that proposed entities meet requirednormal form before table structures are created Many real-world databases have been improperlydesigned or burdened with anomalies You may be asked to redesign and modifyexisting databases– Data updates are less efficient because tables arelarger– Indexing is more cumbersome– No simple strategies for creating virtual tablesknown as views Use denormalization cautiously– Understand why—under some circumstances—unnormalized tables are a better choice37CS275 Fall 2010Data-Modeling Checklist38Normalization and Database Design Data modelingtranslates specificreal-worldenvironment into adata model Data-modelingchecklist helpsensure that datamodeling tasks aresuccessfullyperformedCS275 Fall 2010CS275 Fall 2010 ER diagram– Identify relevant entities, their attributes, andtheir relationships– Identify additional entities and attributes Normalization procedures– Focus on characteristics of specific entities– Micro view of entities within ER diagram Difficult to separate normalization process fromER modeling process39CS275 Fall 20104010

SummarySummary Normalization is a technique used to minimizedata redundancies Normalization is an important part of the designprocess Whereas ERD’s provide a macro view,normalization provides micro view of entities First three normal forms (1NF, 2NF, and 3NF) aremost commonly encountered Table is in 1NF when:– Focuses on characteristics of specific entities– May yield additional entities Table is in 2NF when it is in 1NF and contains nopartial dependencies Table is in 3NF when it is in 2NF and contains notransitive dependencies– All key attributes are defined– All remaining attributes are dependent on primarykey Difficult to separate normalization from E-Rdiagramming – do both techniques concurrently.41CS275 Fall 2010CS275 Fall 201042CS275 Fall 201044Summary Table that is not in 3NF may be split into new tablesuntil all of the tables meet 3NF requirements Table in 3NF may contain multivalueddependencies– Numerous null values or redundant data Convert 3NF table to 4NF by:– Splitting table to remove multivalued dependencies Tables are sometimes denormalized to yield lessI/O, which increases processing speedCS275 Fall 20104311

Improving the DesignImproving the Design Contracting Company Example Contracting Company Example45CS275 Fall 2010CS275 Fall 201046Improving the Design Contracting Company ExampleCS275 Fall 20104712

database design process – About the normal forms 1NF, 2NF, 3NF, BCNF, and 4NF – How normal forms can be transformed from lower normal forms to higher normal forms – That normalization and ER modeling are used concurrently to produce a good database design – That some situations r