원본 출처 : http://www.inconcept.com/JCM/December1999/halpin.html<br><hr noshade width=100%><br>
<HTML><BODY>
<p align="center"><strong><font face="Arial Black">Entity
Relationship Modeling from an ORM Perspective: Part 1</font> -- <em><font face="Arial">By Dr. Terry
Halpin</font></em></strong>
<font face="Arial Black">Introduction</font>
<p><font face="Arial">Entity Relationship modeling (ER) views the
application domain in terms of entities that have attributes and participate
in relationships. For example, the fact that an employee was born on a date
is modeled by assigning a birthdate attribute to the Employee entity type,
whereas the fact that an employee works for a department is modeled as a
relationship between them. This view of the world is quite intuitive, and in
spite of the recent rise of UML for modeling object-oriented applications,
ER is still the most popular data modeling approach for database
applications.</font></p>
<p><font face="Arial">The ER approach was originally proposed by Peter Chen
in 1976, in the very first issue of an influential ACM journal [2]. As shown
in Figure 1, Chen's original notation used rectangles for entity types,
diamonds for relationships, and ellipses for attributes. The double ellipse
indicates unique identifier attributes, and the "n and "1"
indicate the relationship is many to one (each employee works for at most
one department, but many employees may work for the same department).</font></p>
<p><img border="0" src="graphics/halpin1.gif" width="318" height="105"></p>
<p><font size="2"><b>Figure 1</b> <i>An early ER notation used by Chen</i></font></p>
<p><font face="Arial">The direction in which relationship names are to be
read is formally undecided, unless we add some additional marks (e.g.
arrows) or rules (e.g. always read from left to right and from top to
bottom). For example, does the employee work for the department, or does the
department work for the employee? Although we can use our background
knowledge to informally disambiguate this example, it is quite common
nowadays to see ER models with relationships whose intended direction can
only be guessed at by anybody other than the model's creator. For example,
consider the impact of misreading the intended direction for the following:
Person killed Animal; Person is loved by Person. This problem is exacerbated
if the verb phrase used to name the relationship is shortened to one word
(e.g. "work", "love"), unfortunately still a fairly
common practice.</font></p>
<p><font face="Arial">Chen's notation evolved over time. His current
ER-Designer tool uses hexagons instead of diamonds, and supports n-ary
relationships. Outside academia however, Chen's notation seems to be rarely
used nowadays, so I'll say no more about it here. One of the problems with
the ER approach is that there are so many versions of it, with no single
standard. In industrial practice, the most popular versions of ER are the
Oracle and Information Engineering (IE) notations. Another popular data
modeling notation is IDEF1X, but since this is a hybrid of ER and relational
notation, I don't count it as a true ER representative. As discussed in
earlier articles [3], UML class diagrams can be regarded as an extended
version of ER. The rest of this article focuses on basic aspects of the
Oracle notation for ER. Later articles will examine IE and IDEF1X.</font></p>
<p><font face="Arial Black">Oracle ER: the basics</font></p>
<p><font face="Arial">The Oracle ER notation has been used for many years in
CASE tools from Oracle Corporation, and the discussion here is based on the
classic treatment by Richard Barker [1]. Oracle's Object Designer tool now
supports UML as an alternative to its traditional ER notation. For database
applications, many modelers still prefer Oracle's ER notation in preference
to UML, and it will be interesting to see whether this changes over time.
Dave Hay, an experienced modeler and ardent fan of the Oracle ER notation,
argues that "there is no such thing as 'object-oriented analysis'"
[5], only object-oriented design, and that "UML is ?not suitable for
analyzing business requirements in cooperation with business
people"[6].</font></p>
<p><font face="Arial">While I agree with Dave Hay that UML class diagrams
are less than ideal for data modeling, I feel that his preferred ER notation
shares some of UML's weaknesses in being attribute-based. As I've discussed
before in a UML context [3, 4], using attributes in a base conceptual model
adds complexity and instability, while making it harder to validate models
with domain experts using verbalization and sample populations. Attributes
are great for logical design, since they allow compact diagrams that
directly represent the data structures (e.g. relations or object-relations)
used for the actual design. However when I'm performing conceptual analysis,
I just want to know what the facts and rules are about the business, and I
want to communicate this information in sentences, so that the model can be
understood by the domain experts. I sure don't want to bother about how
facts are grouped into multi-fact structures. Whether some fact will end up
in the design as an attribute is not a conceptual issue to me. As Ron Ross
says, "Sponsors of business rule projects must sign off on the
sentences-not on graphical data models. Most methodologies and CASE tools
have this more or less backwards" [6, p.15]. The ORM reporting
facilities in Visio Enterprise allow the domain expert to inspect ORM models
fully verbalized into sentences with examples, making validation much easier
and safer.</font></p>
<p><font face="Arial">Now that I've stated my bias up front, let's examine
the Oracle ER notation itself. The basic conventions are illustrated in
Figure 2. Entity types are shown as soft rectangles (rounded corners) with
their name in capitals. Attributes are written below the entity type name.
Some constraint information may appear before an attribute name. A
"#" indicates that the attribute is the primary identifier of the
entity type, or at least a component of its primary identification scheme. A
"*" or heavy dot "?quot; indicates the attribute is
mandatory (i.e. each instance in the database population of the entity type
must have a non-null value recorded for this attribute). A "?quot;
indicates the attribute is optional. Some modelers also use a period
"." to indicate the attribute is not part of the identifier.</font></p>
<p><img border="0" src="graphics/halpin2.gif" width="280" height="85"></p>
<p><font size="2"><b>Figure 2</b> <i>The basic ER notation used by Oracle</i></font></p>
<p><font face="Arial">Relationships are restricted to binaries (no unaries,
ternaries or longer relationships), and are shown as lines with a
relationship name at the end from which that relationship name is to be
read. This name placement overcomes the ambiguous direction problem
mentioned earlier. Both forward and inverse readings may be displayed for a
binary relationship, one on either side of the line. This makes the notation
superior to UML for verbalizing relationships.</font></p>
<p><font face="Arial">From an ORM perspective, each end (or half) of a
relationship line corresponds to a role. Like ORM, Oracle ER treats role
optionality and cardinality as distinct, orthogonal concepts, instead of
lumping them together into a single concept (e.g. multiplicity in UML). A
solid line-half denotes a mandatory role, and a dotted line-half indicates
an optional role. For cardinality, a fork or crow's foot intuitively
indicates "many", by its many "toes". The absence of a
crow's foot intuitively indicates "one". The crow's foot notation
was invented by Gordon Everest, who originally used the term "inverted
arrow" but now just calls it a "fork". The basic
correspondence with the ORM notation for uniqueness and mandatory role
constraints is shown in Figure 3.</font></p>
<p><img border="0" src="graphics/halpin3.gif" width="290" height="154"></p>
<p><font size="2"><b>Figure 3</b> <i>The ER diagram (a) is equivalent to the
ORM diagram (b)</i></font></p>
<p><font face="Arial">If each of the two roles in a binary association may
be assigned one of optional/mandatory and one of many/one, there are sixteen
patterns. The equivalent Oracle ER and ORM diagrams for the first eight of
these cases are shown in Figure 4.</font></p>
<p><img border="0" src="graphics/halpin4.gif" width="455" height="470"></p>
<p><font size="2"><b>Figure 4</b> <i>Some equivalent cases</i></font></p>
<p><font face="Arial">The other eight cases are shown in Figure 5. Although
all eight are legal in ORM, the last case where both roles of a many:many
relationship are mandatory is considered illegal in Oracle.</font></p>
<p><img border="0" src="graphics/halpin5.gif" width="455" height="478"></p>
<p><font size="2"><b>Figure 5</b> <i>Other equivalent cases</i></font></p>
<p><font face="Arial">Ring associations that are illegal in Oracle are shown
in Figure 6(a). Although rare, they sometimes occur in reality, so should be
allowed at the conceptual level, as permitted in ORM. As an exercise, you
may wish to invent satisfying populations for the ORM associations in Figure
6 (b).</font></p>
<p><img border="0" src="graphics/halpin6.gif" width="454" height="189"></p>
<p><font size="2"><b>Figure 6</b> <i>Illegal ring associations in Oracle (a)
that are rare but allowed in ORM (b)</i></font></p>
<p><font face="Arial">In Oracle ER, a bar "|" across one end of a
relationship indicates that the relationship is a component of the primary
identifier for the entity type at that end. In Figure 7 for example,
Employee and Building have simple identifiers, but Room has a composite
reference scheme, being identified partly by its room number and partly by
the building in which it is included.</font></p>
<p><img border="0" src="graphics/halpin7.gif" width="377" height="85"></p>
<p><font size="2"><b>Figure 7</b> <i>Room is identified by combining its
room nr and its relationship to Building</i></font></p>
<p><font face="Arial">The use of identification bars provides some of the
functionality afforded by external uniqueness constraints in ORM. For
example, the schemas in Figure 8 are equivalent. The other attributes of
Room and Building in ORM would be modeled in ORM as relationships. ORM's
external uniqueness notation seems to me to convey more intuitively the idea
that each RoomNr, Building combination is unique (i.e. refers to at most one
room). But maybe I'm biased. At any rate, this constraint (as well as any
other graphic constraint) can be automatically verbalized in natural
language.</font></p>
<p><img border="0" src="graphics/halpin8.gif" width="463" height="127"></p>
<p><font size="2"><b>Figure 8</b> <i>Composite identification in Oracle ER
(a) and ORM (b)</i></font></p>
<p><font face="Arial">Some people misread the bar notation for composite
identification as a "1", since this is what the symbol means in
many other ER notations. But this isn't a problem if you don't have to work
with multiple versions of ER. The main problem with the bar notation is that
it doesn't generalize, since it can't be used for declaring composite
identification schemes involving only attributes (no relationships). Because
ORM always uses relationships instead of attributes, it doesn't have this
problem. As an example, suppose we wanted to model the information shown in
Table 1, as well as other facts about rooms.</font></p>
<p><font size="2"><span style="mso-bidi-font-family: Arial"><a name="_Ref469051116"><b><span style="mso-bidi-font-family:
Arial">Table <span style="mso-field-code: SEQ Table \* ARABIC \s 1; mso-bidi-font-family: Arial">1</span></span></b></a><span style="mso-tab-count: 2; mso-bidi-font-family: Arial">
</span><span style="mso-bidi-font-family:Arial;font-weight:normal"><i>A
simple data use case for room scheduling</i></span></span></font></p>
<table border="1" width="59%" cellspacing="0" cellpadding="0">
<tr>
<td width="23%" bgcolor="#C0C0C0">Room</td>
<td width="28%" bgcolor="#C0C0C0">Time</td>
<td width="23%" bgcolor="#C0C0C0">ActivityCode</td>
<td width="59%" bgcolor="#C0C0C0">ActivityName</td>
</tr>
<tr>
<td width="23%">20</td>
<td width="28%">Mon 9 am</td>
<td width="23%">VMC</td>
<td width="59%">VisioModeler class</td>
</tr>
<tr>
<td width="23%">20</td>
<td width="28%">Tue 2 pm</td>
<td width="23%">VMC</td>
<td width="59%">VisioModeler class</td>
</tr>
<tr>
<td width="23%">33</td>
<td width="28%">Mon 9 am</td>
<td width="23%">AQD</td>
<td width="59%">ActiveQuery demo</td>
</tr>
<tr>
<td width="23%">33</td>
<td width="28%">Fri 5 pm</td>
<td width="23%">SP</td>
<td width="59%">Staff party</td>
</tr>
<tr>
<td width="23%">...</td>
<td width="28%">...</td>
<td width="23%">...</td>
<td width="59%">...</td>
</tr>
</table>
<p><font face="Arial">The table suggests that rooms can be simply identified
by room numbers, so let's accept that. One way of modeling the situation in
Oracle ER is shown in Figure 9. Here the bar notation is used to show that
RoomTimeSlot is identified by combining its time and room number.</font></p>
<p><img border="0" src="graphics/halpin9.gif" width="307" height="178"></p>
<p><font size="2"><b>Figure 9</b> <i>An ER diagram for room scheduling</i></font></p>
<p><font face="Arial">The use of attributes in this model makes it hard to
verbalize and populate the schema for validation purposes. Moreover, there
is at least one constraint missing. Compare this with the populated ORM
model for the same situation. Here the facts are naturally verbalized as a
ternary (Room at Time is used Activity) and a binary (Activity has
ActivityName). The associated fact tables include the original facts, as
well as counter-facts (italicized) to test the constraints. The first
counter-row (20, Mon 9 am, AQD) tests the uniqueness constraint that a room
at a time is used for at most one activity. The second counter-row tests the
uniqueness constraint that at most one room can be used for a given activity
at a given time. This constraint may well be wrong, but at least we can
express it and test it in ORM. With the ER model there is no way of even
specifying the constraint, much less testing it. The counter rows (SP, Sales
phonecalls) and (PTY, Staff party) are designed to check the uniqueness
constraints that each Activity has at most one Activity name and vice versa.
If these are rejected, the association really is 1:1, as its basic
population suggests. Since the ER notation being discussed doesn't include a
way of indicating that attributes other than the primary identifier are
unique, it isn't very helpful here. As a small point, the Y2K row has been
added to the original population to indicate that it is possible for some
listed activities to be unscheduled.</font></p>
<p><img border="0" src="graphics/halpin10.gif" width="413" height="213"></p>
<p><font size="2"><b>Figure 10</b> <i>An ORM diagram for room scheduling,
with sample and counter data</i></font></p>
<p><font face="Arial">In case it looks like I'm just bashing attribute-based
approaches like ER in this article, let me say again that I find
attribute-based models useful for compact overviews and for getting closer
to the implementation model. However I generate these by mapping from ORM,
which I use exclusively for conceptual analysis. This makes it easier to get
the model right in the first place, and to modify it as the underlying
domain evolves. Unlike ER (and UML for that matter), ORM was built from a
linguistic basis, and its graphic notation was carefully chosen to exploit
the potential of sample populations. To reap the benefits of verbalization
and population for communication with and validation by domain experts, it's
better to use a language that was designed with this in mind. An added
benefit of ORM is that its graphic notation can capture many more business
rules than popular ER notations.</font></p>
<p><font face="Arial Black">Next issues</font></p>
<p><font face="Arial">Later articles in this series will consider more
advanced aspects of the Oracle ER notation, including exclusion constraints,
frequency constraints, subtyping and non-transferable relationships, and
then examine the Information Engineering notation for ER, before concluding
with a discussion of IDEF1X.</font></p>
<p><font face="Arial Black">References</font></p>
<ol>
<li><font face="Arial">Barker, R. 1990, CASE*Method: Tasks and
Deliverables, Addison-Wesley, Wokingham, England.</font></li>
<li><font face="Arial">Chen, P.P. 1976, 'The entity-relationship
model-towards a unified view of data', ACM Transactions on Database
Systems, vol. 1, no. 1, pp. 9-36.</font></li>
<li><font face="Arial">Halpin, T.A. 1998-9, 'UML data models from an ORM
perspective: Parts 1-10', Journal of Conceptual Modeling, InConcept,
Minneapolis USA.</font></li>
<li><font face="Arial">Halpin, T.A. & Bloesch, A.C. 1999, 'Data
modeling in UML and ORM: a comparison', Journal of Database Management,
vol. 10, no. 4, Idea group Publishing Company, Hershey, USA, pp. 4-13.</font></li>
<li><font face="Arial">Hay, D.C. 1999, 'There is no object-oriented
analysis', DataToKnowledge Newsletter, vol. 27, no. 1, Business Rule
Solutions, Inc., Houston TX, USA.</font></li>
<li><font face="Arial">Hay, D.C. 1999, 'Object orientation and information
engineering: UML', The Data Administration Newsletter, no. 9, (June
1999), ed. R.S. Reiner, available online at <a href="http://www.tdan.com">www.tdan.com</a>.</font></li>
<li><font face="Arial">Ross, R.G. 1998, Business Rule Concepts, Business
Rule Solutions, Inc., Houston TX, USA.</font></li>
</ol>
</td>
</tr>
<tr>
<td width="100%" valign="top" style="text-align: justify">
<p><em>Dr Terry Halpin, BSc, DipEd, BA, MLitStud, PhD, is a Program Manager in
Database Modeling for the Enterprise Frameworks and Tools Unit, Microsoft Corporation, USA., Seattle WA, USA.
During a lengthy career as an academic in computer science, he also worked in industry on
database modeling technology and as a data modeling consultant. His recent positions
include head of database research at Asymetrix Corporation, and research director of
InfoModelers Inc., which was acquired by Visio Corporation. For several years, his
research has focused on conceptual modeling and conceptual query technology for
information systems, using a business rules approach. Dr Halpin has presented papers and
tutorials at many international conferences. His doctoral thesis provided the first full
formalization of Object-Role Modeling (ORM/NIAM), and his publications include over
ninety technical papers, as well as four books, including Information Modeling
and Relational Databases (Morgan Kaufmann, 2001).</em></p>
<p><em>Contact Information:</em></p>
<p><em>Dr Terry Halpin
<br>
Program Manager, Database Modeling <br>
Enterprise Framework & Tools Unit, Microsoft Corporation
<br>
One Microsoft Way<br>
Redmond WA 98052-6399 (USA)<br>
<a href="mailto:terryha@microsoft.com">terryha@microsoft.com</a><br>
(425) 705 9190<br>
fax: (425) 936 7329<br>
<a href="http://www.orm.net">http://www.orm.net</a></em></p>
<p>
</BODY>
</HTML>
|