This is not strictly a part of the normalization process but can be considered an extension of it that helps improve storage efficiency for the database. Consider a table that stores information about companies from around the world. The table has four fields: co-id, name, ceo, fed-tax-id — and is in 3NF in the following form:
co-id —> [name, ceo, fed-tax-id]
The problem comes about when the company in question is not from the United States. These companies do not have a federal tax identifier. Thus, for a whole range of companies, the fed-tax-id field will be empty — and will always be empty. The link from co-id to fed-tax-id is called a conditional dependency. The solution involves creating a new table so that the original relation is now
co-id —> [name, ceo]
co-id - -> fed-tax-id
The first relation will not have many empty attributes — certainly, you will expect the company will always have a name and it should have a CEO almost all of the time. And, now, only those companies that have federal tax ids will be included in the second relation. These two relations give storage efficiency more weight while the original provides better retrieval performance at the expense of storage efficiency.
Rules to remove conditional dependencies
- In the 3NF relation, find a set of attributes that are conditionally dependent on the primary identifier. Each attribute in the set should have the same underlying condition; that is, each should either be empty or have a value under the same conditions. These relations have the form PI - -> C.
- For each one of these conditional dependencies:
- Create another relation containing attributes PI and C.
- Define PI as the primary identifier of this new relation.
- Remove attributes C from the original 3NF relation.
Suppose that you were given this table.
|A conditional dependency|
The problem here is that many students are not US students so that they do not have Social Security numbers. Change this table so that the conditional dependency problem is alleviated.
Read on about using graphs to represent relationships among data.