Fourth normal form (4NF)
Violations of this normal form are much less frequent than of the others. It occurs when there are repetitions of groups of data. Before you can learn how to put a table into 4NF, you must first understand multi-valued determinancy. Consider the table in this figure.
|Table that is 3NF but not in 4NF|
Verify that this table is in 3NF: The primary identifier is [Car, Color, Engine]. None of the attributes are dependent on any of the other attributes — thus, there is no partial key dependency or transitive dependency.
However, this table seems to have some sort of repetition. Each color for each car is paired with each engine size for each car (and vice versa of course). And this continues with every other car in the rest of the table. This causes redundancy to rear its ugly head. You can delete a row (e.g., row 1) in the table and still know that the Mustang comes in red. The problem here is not that the pairing of all the colors and all the engine sizes happens sometimes (or even frequently) — the problem is that it happens all the time for this table. For every car all of its possible colors will be paired with all of its possible engines. In the real world the only way to know this is to ask the person who enters data or who maintains the data in the table. You cannot tell just by looking at a small subset of the data — you have to know how the attributes are populated (i.e., filled in). This type of redundancy (that is, the type discussed here and demonstrated in the above table) is caused by multi-valued determinancy. The following defines this term and provides an algorithmic (i.e., procedural, mechanical, step-by-step) method of detecting this condition.
Suppose there is a relation R with a set of attributes S that contains a multivalued dependency (MVD) X —» Y. The remaining attributes of the relation R are called Z. The relation X —» Y is an MVD if, whenever R contains a pair of rows that contain duplicate values of X, then R also contains the pair of rows obtained by interchanging the Y values in the original pair. Note that X and Y may contain either single or composite attributes. (Adapted from Database Modeling & Design by Teorey, 2nd edition, p. 110.)
Let's apply this definition to the above table. In this first series of steps (in the list below), we're not going to try to solve the problem of multi-valued determinancy, we're simply going to try to understand this definition. It's a bear.
- R is the car relation.
- S is the set of attributes [car, color, engine].
- X is [car].
- Y is [color] (or you could have chosen [engine]).
- Find any pair of rows that have the same value of [car] (i.e., X). To start, look at rows 1 and 6.
- Exchange the values of [color] in these two rows. Now you have [mustang, blue, F3.2L] and [mustang, red, F4.5L].
- Verify that two rows exist with these values. They do (rows 5 and 2).
- Continue this for all other pairs of rows for which the value of [car] is the same.
Since [car] —» [color] in the car relation passes this test (as does [car] —» [engine]), we have verified that this relation does contain an MVD.
Further, it is always the case that if X —» Y, then X —» Z. Because MVDs always go in pairs this way, you will sometimes see them represented as X —» Y | Z.
Apply this observation to the car relation to verify that [car] —» [engine] is, indeed, an MVD.
The MVDs in this case can be fixed by breaking up the original table into two tables (as shown in the following two tables).
|Tables that are in 4NF|
|Car color table|
|Car engine table|
With this table structure you can still determine, for example, that a red Mustang can have either an F3.2L or F4.5L engine. However, when you delete a one piece of information (i.e., one row from one of the tables), you actually do lose that information (as you should) as opposed to when a row is eliminated from the original relation.
Disclaimer: These rules do not ensure that you will get rid of all 4NF violations — just some of the common types. For a more complete discussion of this difficult problem, see either Database Modeling & Design by Teorey (2<SUP>nd</SUP> edition) or An Introduction to Database Systems by Date.
|Rules to get to 4NF|
|In the 3NF relation, look at all primary identifiers with at least two attributes. For each one of these identifiers:|
|1||Determine if there is a multi-valued determinancy of the form F —» M, where all the attributes in F and all the attributes in M are part of the primary identifier.|
|2||Find any attributes in the relation that are multi-determined by the combination of F & M. Call this attribute (or set of attributes) K.|
|3||Move F, M, K to another relation. The representation is [F, M] —» K.|
|4||Go through the process of separating multi-dependencies for this newly-created relation.|
|5||Drop K from the original relation.|
|6||Determine if there are any other attributes in the relation that are determined by the combination of F & M. Call this attribute (or set of attributes) D.|
|7||Create another relation [F, M] —> D.|
|8||Delete M, D from the original relation.|
Consider the table shown in this table.
|A simple table with a violation of 4NF|
What normal form is this in? How do you know this? If it is not already, put it in 3NF. Then put it in 4NF.
Consider a slightly more complicated car example shown in this table.
|Table that is in 3NF but not in 4NF|
The difference here is that the car and color together determine what color of stripes are available for the car and what color the roof can be. The manufacturer is also included. Put this into 4NF.
Continue with conditional dependencies.