database.sarang.net
UserID
Passwd
Database
ㆍDBMS
MySQL
PostgreSQL
Firebird
Oracle
Informix
Sybase
MS-SQL
DB2
Cache
CUBRID
LDAP
ALTIBASE
Tibero
DB 문서들
스터디
Community
공지사항
자유게시판
구인|구직
DSN 갤러리
도움주신분들
Admin
운영게시판
최근게시물
DBMS Columns 576 게시물 읽기
 News | Q&A | Columns | Tutorials | Devel | Files | Links
No. 576
Datamodeling and Relational Database Design (1)
작성자
정재익(advance)
작성일
2002-09-29 09:17
조회수
5,679

Datamodeling and Relational Database Design

 

원본출처 : http://www.utexas.edu/cc/database/datamodeling/index.html

 

Data Modeling

 

Windows Enterprise Support Database Services provides the following documentation about relational database design, the relational database model, and relational database software.

 

Introduction to Data Modeling

 

A brief overview of developing a conceptual data model as the first step in creating a relational database.

 

Overview of the Relational Model

 

Discusses data structures, relational operators, and normalization.

 

Introduction to Data Modeling

 

This document is an informal introduction to data modeling using the Entity-Relationship (ER) approach. It is intended for someone who is familiar with relational databases but who has no experience in data modeling. The basic techniques described are applicable to the development of microcomputer based relational database applications as well as those who use relational database servers such as MS SQL Server or Oracle.

 

The document is a practical guide, not an academic paper on either relational database design or data modeling. Readers interested in a rigorous treatment of these topics should consult the bibliography.

 

Topics

 

Overview

The Entity-Relationship Model

Data Modeling As Part of Database Design

Identifying Data Objects

Developing the Basic Schema

Refining the Entity-Relationsip Diagram

Primary and Foreign Keys

Adding Attributes to the Model

Generalization Hierarchies

Adding Integrity Rules

Bibliography

 

Overview

 

What is A Data Model

 

A data model is a conceptual representation of the data structures that are required by a database. The data structures include the data objects, the associations between data objects, and the rules which govern operations on the objects. As the name implies, the data model focuses on what data is required and how it should be organized rather than what operations will be performed on the data. To use a common analogy, the data model is equivalent to an architect's building plans.

 

A data model is independent of hardware or software constraints. Rather than try to represent the data as a database would see it, the data model focuses on representing the data as the user sees it in the "real world". It serves as a bridge between the concepts that make up real-world events and processes and the physical representation of those concepts in a database.

 

Methodology

 

There are two major methodologies used to create a data model: the Entity-Relationship (ER) approach and the Object Model. This document uses the Entity-Relationship approach.

 

Data Modeling In the Context of Database Design

Database design is defined as: "design the logical and physical structure of one or more databases to accommodate the information needs of the users in an organization for a defined set of applications". The design process roughly follows five steps:

 

1. planning and analysis

2. conceptual design

3. logical design

4. physical design

5. implementation

 

The data model is one part of the conceptual design process. The other, typically is the functional model. The data model focuses on what data should be stored in the database while the functional model deals with how the data is processed. To put this in the context of the relational database, the data model is used to design the relational tables. The functional model is used to design the queries which will access and perform operations on those tables.

 

Components of A Data Model

 

The data model gets its inputs from the planning and analysis stage. Here the modeler, along with analysts, collects information about the requirements of the database by reviewing existing documentation and interviewing end-users.

 

The data model has two outputs. The first is an entity-relationship diagram which represents the data strucures in a pictorial form. Because the diagram is easily learned, it is valuable tool to communicate the model to the end-user. The second component is a data document. This a document that describes in detail the data objects, relationships, and rules required by the database. The dictionary provides the detail required by the database developer to construct the physical database.

 

Why is Data Modeling Important?

 

Data modeling is probably the most labor intensive and time consuming part of the development process. Why bother especially if you are pressed for time? A common response by practitioners who write on the subject is that you should no more build a database without a model than you should build a house without blueprints.

 

The goal of the data model is to make sure that the all data objects required by the database are completely and accurately represented. Because the data model uses easily understood notations and natural language , it can be reviewed and verified as correct by the end-users.

 

The data model is also detailed enough to be used by the database developers to use as a "blueprint" for building the physical database. The information contained in the data model will be used to define the relational tables, primary and foreign keys, stored procedures, and triggers. A poorly designed database will require more time in the long-term. Without careful planning you may create a database that omits data required to create critical reports, produces results that are incorrect or inconsistent, and is unable to accommodate changes in the user's requirements.

 

Summary

 

A data model is a plan for building a database. To be effective, it must be simple enough to communicate to the end user the data structure required by the database yet detailed enough for the database design to use to create the physical structure.

 

The Entity-Relation Model (ER) is the most common method used to build data models for relational databases. The next section provides a brief introduction to the concepts used by the ER Model.

 

E-R Model

 

The Entity-Relationship Model

 

The Entity-Relationship (ER) model was originally proposed by Peter in 1976 [Chen76] as a way to unify the network and relational database views. Simply stated the ER model is a conceptual data model that views the real world as entities and relationships. A basic component of the model is the Entity-Relationship diagram which is used to visually represents data objects. Since Chen wrote his paper the model has been extended and today it is commonly used for database design For the database designer, the utility of the ER model is:

 

it maps well to the relational model. The constructs used in the ER model can easily be transformed into relational tables.

it is simple and easy to understand with a minimum of training. Therefore, the model can be used by the database designer to communicate the design to the end user.

In addition, the model can be used as a design plan by the database developer to implement a data model in a specific database management software.

 

Basic Constructs of E-R Modeling

 

The ER model views the real world as a construct of entities and association between entities.

 

Entities

 

Entities are the principal data object about which information is to be collected. Entities are usually recognizable concepts, either concrete or abstract, such as person, places, things, or events which have relevance to the database. Some specific examples of entities are EMPLOYEES, PROJECTS, INVOICES. An entity is analogous to a table in the relational model.

 

Entities are classified as independent or dependent (in some methodologies, the terms used are strong and weak, respectively). An independent entity is one that does not rely on another for identification. A dependent entity is one that relies on another for identification.

 

An entity occurrence (also called an instance) is an individual occurrence of an entity. An occurrence is analogous to a row in the relational table.

 

Special Entity Types

 

Associative entities (also known as intersection entities) are entities used to associate two or more entities in order to reconcile a many-to-many relationship.

 

Subtypes entities are used in generalization hierarchies to represent a subset of instances of their parent entity, called the supertype, but which have attributes or relationships that apply only to the subset.

 

Associative entities and generalization hierarchies are discussed in more detail below.

 

Relationships

 

A Relationship represents an association between two or more entities. An example of a relationship would be:

        employees are assigned to projects

        projects have subtasks 

        departments manage one or more projects 

Relationships are classified in terms of degree, connectivity, cardinality, and existence. These concepts will be discussed below.

 

Attributes

 

Attributes describe the entity of which they are associated. A particular instance of an attribute is a value. For example, "Jane R. Hathaway" is one value of the attribute Name. The domainof an attribute is the collection of all possible values an attribute can have. The domain of Name is a character string.

 

Attributes can be classified as identifiers or descriptors. Identifiers, more commonly called keys, uniquely identify an instance of an entity. A descriptor describes a non-unique characteristic of an entity instance.

 

Classifying Relationships

 

Relationships are classified by their degree, connectivity, cardinality, direction, type, and existence. Not all modeling methodologies use all these classifications.

 

Degree of a Relationship

 

The degree of a relationship is the number of entities associated with the relationship. The n-ary relationship is the general form for degree n. Special cases are the binary, and ternary ,where the degree is 2, and 3, respectively.

 

Binary relationships, the association between two entities is the most common type in the real world. A recursive binary relationship occurs when an entity is related to itself. An example might be "some employees are married to other employees".

 

A ternary relationship involves three entities and is used when a binary relationship is inadequate. Many modeling approaches recognize only binary relationships. Ternary or n-ary relationships are decomposed into two or more binary relationships.

 

Connectivity and Cardinality

 

The connectivity of a relationship describes the mapping of associated entity instances in the relationship. The values of connectivity are "one" or "many". The cardinality of a relationship is the actual number of related occurences for each of the two entities. The basic types of connectivity for relations are: one-to-one, one-to-many, and many-to-many.

 

A one-to-one (1:1) relationship is when at most one instance of a entity A is associated with one instance of entity B. For example, "employees in the company are each assigned their own office. For each employee there exists a unique office and for each office there exists a unique employee.

 

A one-to-many (1:N) relationships is when for one instance of entity A, there are zero, one, or many instances of entity B, but for one instance of entity B, there is only one instance of entity A. An example of a 1:N relationships is

      a department has many employees 

      each employee is assigned to one department 

A many-to-many (M:N) relationship, sometimes called non-specific, is when for one instance of entity A, there are zero, one, or many instances of entity B and for one instance of entity B there are zero, one, or many instances of entity A. An example is:

       employees can be assigned to no more than two projects at the same time; 

       projects must have assigned at least three employees 

A single employee can be assigned to many projects; conversely, a single project can have assigned to it many employee. Here the cardinality for the relationship between employees and projects is two and the cardinality between project and employee is three. Many-to-many relationships cannot be directly translated to relational tables but instead must be transformed into two or more one-to-many relationships using associative entities.

 

Direction

 

The direction of a relationship indicates the originating entity of a binary relationship. The entity from which a relationship originates is the parent entity; the entity where the relationship terminates is the child entity.

 

The direction of a relationship is determined by its connectivity. In a one-to-one relationship the direction is from the independent entity to a dependent entity. If both entities are independent, the direction is arbitrary. With one-to-many relationships, the entity occurring once is the parent. The direction of many-to-many relationships is arbitrary.

 

Type

 

An identifying relationship is one in which one of the child entities is also a dependent entity. A non-identifying relationship is one in which both entities are independent.

 

Existence

 

Existence denotes whether the existence of an entity instance is dependent upon the existence of another, related, entity instance. The existence of an entity in a relationship is defined as either mandatory or optional. If an instance of an entity must always occur for an entity to be included in a relationship, then it is mandatory. An example of mandatory existence is the statement "every project must be managed by a single department". If the instance of the entity is not required, it is optional. An example of optional existence is the statement, "employees may be assigned to work on projects".

 

Generalization Hierarchies

 

A generalization hierarchy is a form of abstraction that specifies that two or more entities that share common attributes can be generalized into a higher level entity type called a supertype or generic entity. The lower-level of entities become the subtype, or categories, to the supertype. Subtypes are dependent entities.

 

Generalization occurs when two or more entities represent categories of the same real-world object. For example, Wages_Employees and Classified_Employees represent categories of the same entity, Employees. In this example, Employees would be the supertype; Wages_Employees and Classified_Employees would be the subtypes.

 

Subtypes can be either mutually exclusive (disjoint) or overlapping (inclusive). A mutually exclusive category is when an entity instance can be in only one category. The above example is a mutually exclusive category. An employee can either be wages or classified but not both. An overlapping category is when an entity instance may be in two or more subtypes. An example would be a person who works for a university could also be a student at that same university. The completeness constraint requires that all instances of the subtype be represented in the supertype.

 

Generalization hierarchies can be nested. That is, a subtype of one hierarchy can be a supertype of another. The level of nesting is limited only by the constraint of simplicity. Subtype entities may be the parent entity in a relationship but not the child.

 

ER Notation

 

There is no standard for representing data objects in ER diagrams. Each modeling methodology uses its own notation. The original notation used by Chen is widely used in academics texts and journals but rarely seen in either CASE tools or publications by non-academics. Today, there are a number of notations used, among the more common are Bachman, crow's foot, and IDEFIX.

 

All notational styles represent entities as rectangular boxes and relationships as lines connecting boxes. Each style uses a special set of symbols to represent the cardinality of a connection. The notation used in this document is from Martin. The symbols used for the basic ER constructs are:

 

entities are represented by labeled rectangles. The label is the name of the entity. Entity names should be singular nouns.

relationships are represented by a solid line connecting two entities. The name of the relationship is written above the line. Relationship names should be verbs.

attributes, when included, are listed inside the entity rectangle. Attributes which are identifiers are underlined. Attribute names should be singular nouns.

cardinality of many is represented by a line ending in a crow's foot. If the crow's foot is omitted, the cardinality is one.

existence is represented by placing a circle or a perpendicular bar on the line. Mandatory existence is shown by the bar (looks like a 1) next to the entity for an instance is required. Optional existence is shown by placing a circle next to the entity that is optional.

Examples of these symbols are shown in Figure 1 below:

 

Figure 1: ER Notation

 

 

Summary

 

The Entity-Relationship Model is a conceptual data model that views the real world as consisting of entities and relationships. The model visually represents these concepts by the Entity-Relationship diagram. The basic constructs of the ER model are entities, relationships, and attributes. Entities are concepts, real or abstract, about which information is collected. Relationships are associations between the entities. Attributes are properties which describe the entities. Next, we will look at the role of data modeling in the overall database design process and a method for building the data model. To proceed, see Data Modeling As Part of Database Design.

 

Database Design

 

Data Modeling As Part of Database Design

 

The data model is one part of the conceptual design process. The other is the function model. The data model focuses on what data should be stored in the database while the function model deals with how the data is processed. To put this in the context of the relational database, the data model is used to design the relational tables. The functional model is used to design the queries that will access and perform operations on those tables.

 

Data modeling is preceeded by planning and analysis. The effort devoted to this stage is proportional to the scope of the database. The planning and analysis of a database intended to serve the needs of an enterprise will require more effort than one intended to serve a small workgroup.

 

The information needed to build a data model is gathered during the requirments analysis. Although not formally considered part of the data modeling stage by some methodologies, in reality the requirements analysis and the ER diagramming part of the data model are done at the same time.

 

Requirements Analysis

 

The goals of the requirements analysis are:

 

. to determine the data requirements of the database in terms of primitive objects

. to classify and describe the information about these objects

. to identify and classify the relationships among the objects

. to determine the types of transactions that will be executed on the database and the interactions between the data and the transactions

. to identify rules governing the integrity of the data

 

The modeler, or modelers, works with the end users of an organization to determine the data requirements of the database. Information needed for the requirements analysis can be gathered in several ways:

 

. review of existing documents - such documents include existing forms and reports, written guidelines, job descriptions, personal narratives, and memoranda. Paper documentation is a good way to become familiar with the organization or activity you need to model.

. interviews with end users - these can be a combination of individual or group meetings. Try to keep group sessions to under five or six people. If possible, try to have everyone with the same function in one meeting. Use a blackboard, flip charts, or overhead transparencies to record information gathered from the interviews.

. review of existing automated systems - if the organization already has an automated system, review the system design specifications and documentation

 

The requirements analysis is usually done at the same time as the data modeling. As information is collected, data objects are identified and classified as either entities, attributes, or relationship; assigned names; and, defined using terms familiar to the end-users. The objects are then modeled and analysed using an ER diagram. The diagram can be reviewed by the modeler and the end-users to determine its completeness and accuracy. If the model is not correct, it is modified, which sometimes requires additional information to be collected. The review and edit cycle continues until the model is certified as correct.

 

Three points to keep in mind during the requirements analysis are:

 

1. Talk to the end users about their data in "real-world" terms. Users do not think in terms of entities, attributes, and relationships but about the actual people, things, and activities they deal with daily.

2. Take the time to learn the basics about the organization and its activities that you want to model. Having an understanding about the processes will make it easier to build the model.

3. End-users typically think about and view data in different ways according to their function within an organization. Therefore, it is important to interview the largest number of people that time permits.

 

Steps In Building the Data Model

 

While ER model lists and defines the constructs required to build a data model, there is no standard process for doing so. Some methodologies, such as IDEFIX, specify a bottom-up development process were the model is built in stages. Typically, the entities and relationships are modeled first, followed by key attributes, and then the model is finished by adding non-key attributes. Other experts argue that in practice, using a phased approach is impractical because it requires too many meetings with the end-users. The sequence used for this document are:

 

1. Identification of data objects and relationships

2. Drafting the initial ER diagram with entities and relationships

3. Refining the ER diagram

4. Add key attributes to the diagram

5. Adding non-key attributes

6. Diagramming Generalization Hierarchies

7. Validating the model through normalization

8. Adding business and integrity rules to the Model

 

In practice, model building is not a strict linear process. As noted above, the requirements analysis and the draft of the initial ER diagram often occur simultaneously. Refining and validating the diagram may uncover problems or missing information which require more information gathering and analysis

 

Summary

 

Data modeling must be preceded by planning and analysis. Planning defines the goals of the database , explains why the goals are important, and sets out the path by which the goals will be reached. Analysis involves determining the requirements of the database. This is typically done by examining existing documentation and interviewing users.

 

An effective data model completely and accurately represents the data requirements of the end users. It is simple enough to be understood by the end user yet detailed enough to be used by a database designer to build the database. The model eliminates redundant data, it is independent of any hardware and software constraints, and can be adapted to changing requirements with a minimum of effort.

 

Data modeling is a bottom up process. A basic model, representing entities and relationships, is developed first. Then detail is added to the model by including information about attributes and business rules.

[Top]
No.
제목
작성자
작성일
조회
579Datamodeling and Relational Database Design (4)
정재익
2002-09-29
4705
578Datamodeling and Relational Database Design (3)
정재익
2002-09-29
5567
577Datamodeling and Relational Database Design (2)
정재익
2002-09-29
4594
576Datamodeling and Relational Database Design (1)
정재익
2002-09-29
5679
570정보처리기술의 페러다임 - Data WareHousing 과 OLAP (5)
정재익
2002-09-22
5716
569정보처리기술의 페러다임 - Data WareHousing 과 OLAP (4)
정재익
2002-09-22
5735
568정보처리기술의 페러다임 - Data WareHousing 과 OLAP (3)
정재익
2002-09-22
5993
Valid XHTML 1.0!
All about the DATABASE... Copyleft 1999-2023 DSN, All rights reserved.
작업시간: 0.049초, 이곳 서비스는
	PostgreSQL v16.1로 자료를 관리합니다