Modern relational database management systems (DBMSs) allow you to have only one value per attribute for each entity occurrence. There are a couple ways to violate the single value restriction. The first involves assigning several values to the same attribute:
Emp ID | Children | Gender |
---|---|---|
002 | Nan, Pat, Lee | F, M, M |
006 | Todd | M |
This should be redefined as a relation with attributes Emp ID, Child, and Gender — but with only one value per attribute. The resulting table should have four rows.
Similarly, each attribute should be separated from the other attributes. Thus, the following is not allowed (or, at least, not encouraged):
Emp | Parents |
---|---|
002 | Don, Sharon |
006 | Scott, Nancy |
The Parents attribute should be broken up into two attributes, Father and Mother.
Another common (but incorrect) way of representing the employee and child information above is as follows:
Emp ID | Ch1 | G1 | Ch2 | G2 | Ch3 | G3 | Ch4 | G4 |
---|---|---|---|---|---|---|---|---|
002 | Nan | F | Pat | M | Lee | M | ||
006 | Todd | M |
The above series of attributes ChX and GenderX is known as a repeating group. This is a bad way to design a relation (and, hence, a data base table). For now, consider this: What if someone has six children? Well, you could find out what the greatest number of children anyone has ever had and define that many ChildX and GenderX attributes. That would be a little inefficient. Suppose that this number is 30. What if you wanted to find all children with the name of Todd? You would have to look in 30 different fields. This is also very inefficient. It is quite simple to fix this problem by simply creating a relation with Emp ID, Child, and Gender (and four rows). What would that table, and its data, look like? Write down the first two rows of that table.
Derived data are values that can be computed from values in other attributes.
A relation should not contain any derived data. For example, an attribute should not hold the multiplicative product of two other attributes or the grand total of some other attribute. Suppose that a table lists the registrants for a seminar. Nowhere in the data base should there be an attribute that holds the number of registrants for a particular seminar. This value can be calculated by counting the number of registrants (records) for that seminar. Conversely, the maximum number of allowed registrations would be stored because there is no way of calculating this value from other values.
A table is in 1NF if it adheres to the single value restriction and disallows repeating groups and derived data.
Before defining the rules for putting a relation into 1NF, I first define some terms and demonstrate their usage.
Item A is said to functionally determine item B if for any specific item A there is one and only one item B.
If A functionally determines (or, simply, “determines”) B, then A is called the determinant of B.
Thus, if you know A, then you must know the value of B. Conversely, if you know B, you don't know anything about A. The following six lines are equivalent:
- person —> birth date
- person functionally determines birth date
- There is a functional dependency from person to birth date
- If I know the person, then I know his or her birth date.
- There is one and only one birth date for any one person.
- A person can only have one birth date.
This is what functional dependency means. If you know A, then you must know B. In this example person is called the determinant of birth date. Note that you don't know anything about the person if you know a particular birth date; many people can have the same birth date.
Consider the Inventory table shown below.
Inventory table | ||||
---|---|---|---|---|
Part # | Part name | Supplier | Address | Bin |
3 | hammer | Kay Sera | Ann Arbor, MI 48109 | D |
4 | screwdriver | Dicky Gear | Memphis, TN 38138 | E |
5 | saw | Kay Sera | Ann Arbor, MI 48109 | F |
6 | plane | Dicky Gear | Memphis, TN 38138 | E |
4 | screwdriver | Kay Sera | Ann Arbor, MI 48109 | E |
The data shown in this table suggests (but does not conclusively show) that Kay Sera is always located in Ann Arbor. It could be that there is another Kay Sera outlet in Ypsilanti. Well, then, how would you know if this were true? You would ask! Database analysts and designers must do that. The data shown from this table suggests that this fact is true but do not prove it; you do not know that she does not have other locations unless someone tells you. The value of the analyst or the designer lies in knowing that the question must be asked.
Related to the concept of determinancy are those of superkey, candidate key (aka candidate identifier), and primary key (aka primary identifier). (You'll have to read the following definitions and example a couple times to get the idea.)
Suppose that there is some relation R and the set of its attributes A.
A superkey S is any set of attributes (that is some subset of A) that functionally determine all the attributes in A.
A candidate key C (or candidate identifier) is a subset of a superkey that is still a superkey but that cannot be further reduced to another superkey.
A primary key P (or primary identifier) is some arbitrarily chosen candidate key.
Lets go through the process of identifying all of these in an example we're already familiar with.
Though there may be other possible primary identifiers for a relation (that is, there may be many candidate identifiers), the primary identifier is the chosen one. There may be no special reason for choosing one identifier as the primary identifier; there may be good reasons — it does not matter.
Notice that it is always the case that primary identifiers will be an element of the set of candidate identifiers which, in turn, is a subset (not necessarily proper) of the set of superkeys.
Rules to get to 1NF | |
---|---|
1 | Break up all attributes so that they are single-valued. |
2 | Remove all derived data attributes. |
3 | Define a primary identifier for the relation. |
4 | Remove all repeating groups. Put them in another relation. Define a primary identifier for the new relation. |
Note: As you're reading through this, you will come across lots of problems and their associated answers. You should absolutely read the problems and then try to answer them before you read the answer that I provide. This is the absolute best way to learn and remember the material. Don't try to take shortcuts while you're reading this material. It's too difficult and too full of nuances.
Problem 2.1
Suppose you have a relation R with attributes [A, B, C] and you know that A —> B and that B —> C. What are the superkeys of this relation? Candidate identifiers? A primary identifier? Any other primary identifiers?
Note: I am generally not all that interested in superkeys for their own sake. I am interested in candidate keys. Candidate keys are all candidates running for the position of primary key, so each candidate key is a possible choice for the very special role of primary key. I usually talk about superkeys in the context of “that set of attributes is a superkey but not a candidate key” — that is, that set of attributes has an extra field or two in it so it cannot be a candidate key.
Problem 2.2
Suppose that a company has an Employee relation in which it keeps information such as employee id, name, address, and social security number (SSN). The primary identifier for this relation is employee id since the following is true: “If I know the employee's id, then I know the employee's name, address, and SSN.” Then I have the following:
id —> [name, address, SSN]
It is also true that each employee can only have one SSN and that any SSN can only be assigned to one person. Thus, I also have the following:
SSN —> id
Suppose you are considering the relation id, name, address, ssn. What are its determinants? What is (are) the candidate identifier(s)? Name one primary identifier. Could you have chosen any others?
Problem 2.3
Define new relations and their attributes. Also identify the attribute(s) that make up the primary identifier. Create the associated tables and fill in all the data for each. How many rows does each table have?
ID | Price | Component Parts |
---|---|---|
Press | 5 | 134, 145, 165 |
Lathe | 8 | 134, 139, 199 |
Mill | 10 | 135, 137 |
This table can be interpreted as: a product has a given price and is made up of several parts.
Now, if I were you I would go back and start reading from the beginning of this page again — with the sleeping and everything you probably missed a slight nuance here and there. And you gotta get this stuff right or you're going to be in trouble — because mastering the rest of all of this depends on your complete understanding of this section.
Continue with 2nd normal form.