In this previous section I used an example to present an overview of how and why ER modeling is performed. In this section I present more detail on some of the basic concepts.
Relationships define which entity types are directly associated with which other entity types. In the example in an earlier section, we saw that divisions are directly associated with departments and departments are directly associated with employees. No direct association between division and employee was given. This does not mean that there is no relationship between division and employee. In fact, the ER diagram tells us that there is a relationship between the two: > Given any one division, there can be many employees managing departments within that division.
Certainly, this is not earth shattering news. But it is in the ER diagram. The above fact is not represented as a separate relationship between division and employee because it can be inferred from existing relationships. An ER diagram should contain the minimum number of relationships necessary to reflect the situation.
Once a relationship between entity types has been established, the analyst should determine its cardinality.
A relationship's cardinality defines the maximum number of entities of one type that can be associated with an entity of another type.
For relationships between two entity types, there are three basic cardinalities. Each of the following descriptions are given in terms of a relationship between entity type X and entity type Y.
- — one-to-one — One entity of type X can be associated with, at most, one entity of type Y. One entity of type Y can be associated with, at most, one entity of type X. An example: the relationship between car and steering wheel. A car has only one steering wheel and a steering wheel can only be installed in one car.
- — one-to-many — One entity of type X can be associated with many entities of type Y. One entity of type Y can be associated with, at most, one entity of type X. An example: the relationship between building and rooms. A building can have many rooms but a room can be in, at most, one building.
- — many-to-many — One entity of type X can be associated with many entities of type Y. One entity of type Y can be associated with many entities of type X. An example: the relationship between a car and its options (such as air conditioning, ABS brakes). A car can have many options and an option can be installed on many cars.
Determine the cardinality of the relationships between the following four pairs of entity types. For each relationship you have to answer two questions:
- For each entity of type Y there can be, at most, _ entity or entities of type X.
- For each entity of type X there can be, at most, _ entity or entities of type Y.
Answering these two questions gives you the answer to the following questions. For example, if you answered M to the first question and 1 for the second question, then this relationship between entity types X and Y is of cardinality M:1. This is read as “for every X there can be only one Y and for every Y there can be many X.” Realize that you will have to make assumptions about the situations below to clarify some of these relationships.
- Patient under care of primary care physician
- Physician performs operation
- Doctors have speciality in disease
- Needle injected into patient
In the previous section we were concerned with the maximum number of entities of one type that can be associated with an entity of another type. In this section we examine the minimum number of entities in a relationship.
A relationship's existence defines what we know about the existence of any entity on the other side of a relationship from a given entity. Existence is given as optional, mandatory, or unknown.
This is best clarified with an example. Consider again the example discussed in Section 2. Specifically, focus on the manage relationship between department and employee. We know the cardinality is 1:1. This tells us that at most one department is managed by an employee and an employee can manage, at most, one department. (Be sure you understand the distinction between these two phrases.) The existence of this relationship tells us the fewest number of departments that can be managed by an employee and the fewest number of employees that can manage a department. Only one of the following can be true:
- A department need not have any manager.
- A department must have at least one manager.
- It is unknown whether or not a department has to have a manager.
Similarly, only one of the following may be true:
- An employee need not manage any department.
- An employee must manage at least one department.
- It is unknown whether or not an employee must manage a department.
For each set of three above, which ones would you choose?
Going back to the definition of existence, we can also look at this situation in this way:
- Given any (randomly chosen) department, there must be an employee on the other side of the manage relationship. Thus, the relationship is mandatory in this direction. This is indicated by a dash on the line closer to the middle of the relationship line (you'll see this below).
- Given any (randomly chosen) employee, there need not be any department on the other side of the manage relationship. Thus, the relationship is optional in this direction. This is indicated by a circle on the line (you'll also see this below).
I assume that the contains relationship is mandatory in both directions. Given this information, the ER diagram is modified in the following manner:
This diagram is beginning to look a little complicated but remember the following pieces of information and it gets a little easier:
- You “read” information about a relationship in two passes: going left-to-right and going right-to-left (it doesn't matter which pass is read first).
- When “reading” information about a relationship off an ER diagram, when going left-to-right, cover up the left side of the line; when going right-to-left, cover up the right side of the line.
- The marks closest to the middle tell you the minimum number in a relationship. A dash on the line looks like a 1; it tells you the minimum number in the relationship is one so the existence is mandatory. A circle on the line looks line a 0; it tells you the minimum number in the relationship is zero so the existence is optional. If there are not any marks on the line, then the existence is unknown.
- The marks next to the rectangle tell you the maximum number in a relationship. A crow's foot tells you the cardinality is many while a dash next to the box on the relationship line tells you it is one.
Let's practice this. Look at the relationship between the department and employee entity types.
- Go from left-to-right.
- Cover up the left side of the line between department and employee.
- Now all you see are the department entity type, the employee entity type, and the right side of the line with two dashes on it and the words “managed by”.
- This tells us that a department is managed by at least (the first dash) one employee and at most (the second dash) one employee. Rephrasing: A department is managed by one and only one employee.
- Go from right-to-left.
- Cover up the right side of the line between department and employee.
- Now all you see are the department entity type, the employee entity type, and the left side of the line with a circle and dash on it.
- This tells us that an employee does not have to manage any departments (the circle) and may manage at most (the dash) one department. Rephrasing: An employee may manage no more than one department.
For each of the relationships listed in Problem 5:
- Define the existence in both directions.
- Draw the ER diagram for the relationship.
- Write out two sentences that represent what the ER diagram says.
Entity types are things for which it is important that your company capture data. If it is not important, it should not be in the database. In an accounting database you would expect to find entity types for expenses, assets, liabilities, expenditures, deposits, etc. You would not expect to find entity types for color of check, quality of dollar bills received, etc. The database is supposed to reflect reality — but only the part of reality that is important to the company.
Entity types are entities that share a common definition. This allows us to make generalizations about that type. This is a powerful capability; however, sometimes we want to make a generalization only about a certain subset of those entities and another generalization about the rest of the entities. Consider a simple example. Suppose you have an accounting database which keeps track of accounts receivable and accounts payable. Of course the database keeps track of the companies to which you owe money and the companies that owe you money. For all these companies, you keep track of their mailing address and a contact person. For the companies that owe you money you keep track of how much they owe you. For the companies that you owe money you keep track of how much you owe them. What to do? Should we have three entity types: one for the whole set and one for each subset? That would be a mess. That is why the concept of entity subtypes was created.
An entity subtype is a collection of entities of the same type to which a narrower definition and additional attributes and/or relationships apply.
In this company example, what is the entity type? What are the subtypes?
There are many situations in which subtypes can be created but should not be. Only create subtypes
- if the subtype is involved in relationships that the other subtypes are not or
- if the subtype needs to have additional facts stored with it.
If one of these two requirements is not met, then do not create the subtype.
What is the relationship among the following? List a few facts common to all items for each question. List a few facts about each subtype that is not common to the other subtype. Also draw the ER diagram for each. </P>
- book, publication, magazine
- individual, employee, contact person
- laptop, computer, desktop
On the other hand, there are some situations that are not so clear cut. Consider the following figure.
Many students would first suggest the diagram on the right — divide customers into investors and attendees and show that investors buy stocks and attendees register for seminars. I suggest that the figure on the left is better. Here's my thought process:
What is it that makes an investor an investor? She buys stocks. And what is it that makes an attendee an attendee? She registers for seminars. Is there anything about an investor that keeps her from being an attendee? No. Vice versa? No. Do you want to prevent investors from being classified as attendees or vice versa? No and no. So, define relationships as shown in the figure above for the customer entity. Investors can be listed by choosing only those customers that are in the left relationship. Attendees can be listed by choosing only those customers that are in the right relationship.
Thus, if a relationship defines the members of a proposed subtype, then use the relationship instead of the subtype.
For some people this can one of the more difficult concepts to understand, so read carefully. What we are trying to discern here is the difference between a type of a thing and an actual thing. This is a pretty easy concept when comparing people and Joe. People is the type and Joe is the instance. However, modellers generally don't make the type/instance distinction between an entity type and an entity — they generally make it between two entities. For example, think about “CIS320” and “section 2 of CIS320 in Winter 1962.” The second is an instance of the first. The section is an actual class that meets at an actual time with an actual teacher and actual students. CIS320 is a type of thing that is an idea that only becomes real when you come into contact with one of its instances (e.g., section 2 of CIS320 in Winter 1962).
Realize that this is a different distinction than that between entity types and entities. In this example, CIS320 is one specific instance of the entity type Course and section 2 of CIS320 in Winter 1962 is one specific instance of the entity type Section. Thus, both are entities and neither one is an entity type.
Analogously to the CIS320 story above, fill out the story for a Boeing 747 relative to the type/instance distinction and the distinction between entity types and entities.
To this point we have focused on entity types and relationships among them. We have mentioned, in passing, “facts” about entity types and “attributes” of entity types. In this section I hope to make these ideas a little more clear.
Attributes are the characteristics of an entity type that we are interested in.
An attribute is a descriptor whose values are associated with individual entities of a specific type.
The attribute value for any single entity can have only one value at a given time. This value can change over time. An attribute of an employee might be salary. At any one time if you asked for the salary level of a certain employee, then you should get one answer. And if someone else asked the same question about that employee at the exact same time, they would expect to get the same answer. Of course, if you asked this question at a later time you might expect to get a different answer.
Think back to the example in Section 2. Few attributes are mentioned in the description but a few can be inferred. The department entity type has a name attribute, as do the division and employee entity types. Possible attributes for the employee entity type that aren't mentioned include date of hire, home mailing address, work phone, and work address.
Every entity type has an identifier. This identifier uniquely identifies a single (at least one, and no more than one) entity. If you know the value of the identifier, then you know exactly which entity you are dealing with. Further, the identifier's value will never change over time. Thus, if you know the identifier now, then you can be confident that at any time in the future the identifier for that entity will not have changed.
Social security number is a possible identifier for a person. What is a possible problem with using it as an identifier? Why shouldn't we use people's names as an identifier?
For each of the following entity types, come up with a possible identifier:
- UM student
- UM building
Suppose that we have a database that has the table car, and this table has an identifier vin (vehicle id number) and another field called model (e.g., “Ford Mustang GT”, “Ford Mustang SHO”, or “Accord Integra GS-R”). Suppose that we want to add information about the type of the model (e.g., “sporty”, “sedan”, “SUV”) to this database. How would you do it? You have two choices — either add model field to the car table or create another table to hold this information.
Continue with the next section on advanced topics in ER modeling.