Original source: http://www.cidoc.icom.org/model/relational.model/Guide.txt
CIDOC Relational Data Model
A Guide
by Patricia Ann Reed
April 1995
Copyright (C) 1994-1995, International Documentation Committee of
the International Council of Museums (CIDOC)
The CIDOC Data Model may be reproduced and shared without
restriction as long as this copyright notice is retained, except
that it may not be licensed or sold for profit as a portion of any
software product, and it may not be included in or distributed
with commercial products or otherwise distributed by commercial
concerns to their clients or customers without the written
permission of the Chair of CIDOC's Working Group for the
Development and Distribution of the CIDOC Data Model.
This model was developed by volunteer contributors as a public
service, and is furnished without warranty of any kind. Neither
the International Council of Museums, nor its International
Documentation Committee, nor the individual authors, nor any other
institution or individual that has contributed to its development
and documentation warrant this model in any way.
__________________________________________________________________
Table of Contents
Introduction
I. Purpose of a Relational Data Model
II. Logical Data Model - What It Is, What It Isn't
A. Metadata
B. Principles for Creating Metadata
C. Data Model and Database Schema
D. Logical Data Groups (LDGs) and Logical Data Elements
(LDEs)
III. Standards for Defining and Naming Logical Data
A. Defining Logical Data
B. Naming Logical Data
C. Adapting Standards to Local Environments
IV. Data Dictionary Reports
__________________________________________________________________
INTRODUCTION
The CIDOC Data Model Working Group is creating a relational data
model as a prerequisite to recommending a relational data
structure for the interchange of museum information worldwide.
Advances in database technology and processing offer opportunities
for using information flexibly and efficiently when data is
organized and stored in relational structures.
This guide is for those who wish a better understanding of
relational data modeling - its purpose, its nature, and the
standards used in creating the CIDOC model. The examples used are
found in the CIDOC model reports.
A relational data model defines what the data is rather than how
it is used, because data is used in multiple applications to serve
multiple functions. For example, data is collected about Object,
not Object-on-loan or Object-being-photographed or Object-
acquired-from-donor. Loan, photograph, and acquire are functional
contexts - the settings in which Object information is used. In
relational technology, each automated function uses the same
Object data.
This is a sea change in thinking for many museum professionals
responsible for the management of their collections. If data was
automated in the past, it was stored in flat file structures where
duplicating the data was the only way to automate multiple
functions or activities. Today's technologies, supported by a
well-defined relational data model, offer better solutions.
I. Purpose of a Relational Data Model
Data is the raw material from which information is produced, and
it can be stored on disk, on tape, or in a file drawer (or in a
brain!). Information is data processed and presented in meaningful
form and context.
Data is collected, modeled, and documented to serve functions. In
other words, data must support what is done and provide the
information needed to perform daily tasks and plan for the future.
Data separated into its smallest discrete parts and defined
precisely can be organized in a structure which achieves the
following objectives:
* Eliminate logical data redundancy, thereby reducing
physical data redundancy.
* Ensure consistency of logical data names and definitions
within and across systems and disciplines.
* Enable multiple use of physical databases.
* Enable greater flexibility of data usage.
* Enhance the capability to deliver decision support
information.
* Provide data structures which enable data interchange
across systems and disciplines.
It is the last objective which is the goal of the CIDOC Data Model
Working Group.
II. Logical Data Model - What It Is, What It Isn't
At the highest level of abstraction, there are five big entities
which can be defined and documented:
People Places Things Events Concepts
These five entities and the relationships among them can document
anything in the entire spectrum of human (or inhuman) experience.
This highest-level model is sometimes called a Conceptual Data
Model. It contains major entities, broadly defined and without
attributes or details.
The task of a Logical Data Model is to particularize the
Conceptual Data Model entities and relate them to each other,
creating a data structure which supports the intellectual and
physical worlds in which work is done.
A logical data model does not contain real data. Rather, it
contains the infrastructure into which real data fits. This
section describes the infrastructure and distinguishes it from the
physical database structure.
A. Metadata
Data in a relational data model is called metadata, i.e., data
about data.
Metadata provides
* a commonly understood body of data which can be used in
multiple applications and
* common data structures which users from diverse process
areas can populate with unique data values.
B. Principles for Creating Metadata
When defining metadata, the following principles apply:
* Logical data is defined in the abstract and without
redundancy.
* Logical data is defined independent of, and outside the
context of, functions, processes, and automated
applications.
* Logical data is defined by users from diverse functional
areas who need the same logical data.
* Logical data element names are consistent and meaningful;
they are created according to naming standards. (See
Section III, Standards for Defining and Naming Logical Data.)
* Composite data is broken down into its smallest meaningful
parts, each of which is defined separately.
C. Data Model and Database Schema
The logical data model contains the characteristics of real data,
whereas a physical database contains real data. The following
comparative table characterizes the differences between metadata
in a relational data model and data descriptions (also called data
schema or record layouts) for the contents of a physical database.
* Relational Data Model:
Logical, abstract in nature.
Contains metadata, i.e., data about data.
Contains information about the attributes of data
entities and the logical relationships among them.
Stable, reusable product; logical data definitions seldom
change; relationships among data entities seldom change.
Logical data is defined and documented independent of,
and outside the context of, functions, processes, and
automated applications.
Logical data is defined without redundancy.
Composite data is broken down and logically defined at
the level of the smallest meaningful part.
* Physical Database:
Physical in nature.
Contains real data.
Contains a body of data facts which are instances, or
occurrences, of logical data entities.
Technologies change; over time, changes in hardware and
software force migrations to new information systems
implementations.
Physical data is stored and used in the context of one or
more automated or manual processes to satisfy a
functional need.
D. Logical Data Groups (LDGs) and Logical Data Elements (LDEs)
The logical data model contains information about two levels of
data: Logical Data Group (LDG) and Logical Data Element (LDE). In
this discussion, the terms "LDG" and "Element" are used. LDGs are
groups of Elements. Elements are the discrete pieces of data which
describe and define entities.
1. LDGs
LDGs are logical groups of data which define and describe
entities. They can be equated roughly to a physical data record,
database schema, or relational table.
In the CIDOC model, LDGs are designated as primary, repetition,
recursion, type, or intersection in the "LDG TYPE" category.
A primary entity is something which is important to an
organization's work, in this case museum work. There are two
questions to ask in determining whether an entity is primary: "Can
it stand alone, or is it merely an attribute?" and "If it can
stand alone, do we want to define its attributes and document it
as a separate entity?"
Some primary entities originally were thought to be attributes of
another entity. These former attributes became primary entities
because they were not intrinsic to the entity itself, and because
users wanted to keep detailed information about them. An example
is STYLE, which originally was considered an attribute of OBJECT.
However, STYLE is not dependent on OBJECT for its existence - it
can stand alone, has attributes of its own, and users want to
describe it in more detail. New technologies make possible this
discrete separation of entities.
Primary entities in the current CIDOC model are ALPHABET, AWARD,
CALENDAR, CLASSIFICATION, COLOR, CONCEPT, EVENT, LANGUAGE,
MATERIAL, METHOD, OBJECT, OCCUPATION, OPUS, PEOPLE-GROUP,
PEOPLE-PERSON, PLACE, ROLE, STYLE, and TIME-SPAN.
A repetition entity is created when an attribute can occur more
than one time for any given occurrence of an entity. An example is
OBJECT MARK LDG. MARK is an attribute of OBJECT. Because more than
one mark may appear on any given OBJECT, MARK is removed from the
OBJECT LDG and becomes a repetition entity. OBJECT MARK LDG has
its own repetition entity called OBJECT MARK TRANSCRIPTION LDG
because there can be more than one TRANSCRIPTION for any given
MARK. OBJECT MARK TRANSCRIPTION LDG has its own repetition entity
called OBJECT MARK TRSCRPTN TRANSLN LDG because there can be more
than one TRANSLATION of any given TRANSCRIPTION.
A recursion entity is an entity which is related to itself. It is
indicated by the term "RELATED" in the LDG name. PEOPLE RELATED
LDG is an example of a recursion entity, where two instances of
PEOPLE LDG are associated. In PEOPLE RELATED LDG, there are two
occurrences of the Elements PEOPLE OCC IDN and ROLE OCC IDN which
represent either two persons, two groups of persons, or a person
and a group of persons; an Element called PEOPLE PEOPLE
RELATIONSHIP NAM which documents the nature of the association
between the two PEOPLE; and Elements documenting the time during
which the relationship occurred.
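As a rough illustration of the recursion entity described above,
the following Python sketch models one row of PEOPLE RELATED LDG.
The field names are assumptions derived from the Element names in
the text; the two time fields, the sample people, and the sample
roles are hypothetical and are not taken from the CIDOC reports.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PeopleRelated:
    """One row of a hypothetical PEOPLE RELATED LDG."""
    people_occ_idn_1: int            # first PEOPLE occurrence
    role_occ_idn_1: int              # role of the first occurrence
    people_occ_idn_2: int            # second PEOPLE occurrence
    role_occ_idn_2: int              # role of the second occurrence
    relationship_nam: str            # PEOPLE PEOPLE RELATIONSHIP NAM
    begin_tme: Optional[str] = None  # hypothetical time Element
    end_tme: Optional[str] = None    # hypothetical time Element


# Invented sample data: both sides of the link come from the same
# PEOPLE entity, which is what makes the entity recursive.
people = {1: "Person A", 2: "Group B"}
roles = {7: "member", 8: "organization"}

link = PeopleRelated(1, 7, 2, 8, "membership")


def describe(row):
    """Resolve the identifiers of a row into readable names."""
    return (
        people[row.people_occ_idn_1],
        roles[row.role_occ_idn_1],
        row.relationship_nam,
        people[row.people_occ_idn_2],
        roles[row.role_occ_idn_2],
    )
```

Here a person and a group are linked, but the same structure would
hold two persons or two groups without any change.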
An intersection entity is created by linking together two or more
primary, repetition, or type entities. Intersection entities are
indicated in the CIDOC model by an ampersand (&). An example is
OBJECT & EVENT LDG, where an OBJECT is associated with an EVENT.
The intersection entity contains Elements which document the
association of the OBJECT and the EVENT, i.e., the relationship
between them and the time during which the relationship occurred.
A type entity is a subset of a primary entity. It has special
attributes which set it apart from the larger entity.
2. Elements
Although "Element" and "attribute" sometimes are used
interchangeably, in the context of this document there is a
difference: "Element" is a data fact logically defined and
contained within an LDG. "Attribute" is an intrinsic
characteristic of an entity.
Elements define the attributes of entities, answering the question
"What is it?" They can be equated roughly to the data fields in a
flat file or the columns in a relational table.
Elements comprise the contents of LDGs. An Element is dependent on
an entity - it cannot exist apart from it. In the CIDOC Model, for
example, "OBJECT LDG" contains the Elements "OBJECT OCC IDN",
"OBJECT CNT", and "OBJECT MEDIUM SUPPORT DISPLAY," which describe
OBJECT and cannot exist apart from OBJECT.
Elements defining many of the attributes of entities are
documented in repetition LDGs. For example, MARK is an attribute
of OBJECT, although no Elements describing MARK appear in the
OBJECT LDG. The Elements describing MARK appear in the repetition
entity OBJECT MARK LDG because there can be more than one MARK for
any given OBJECT.
III. Standards for Defining and Naming Logical Data
Using standards to define and name LDGs and Elements assures
consistency and reliability in metadata retrieval and usage. These
standards are for logical, not physical, data. Standards do not
preclude the use of traditional, familiar data names in data entry
screens, forms, reports, and the like.
A. Defining Logical Data
*** Standard: Logical data is defined without reference to and
outside the context of process, function, or physical information
system.
Relational:
OBJECT & EVENT LDG
OBJECT & EVENT LDG
OBJECT & EVENT LDG
Non-Relational:
OBJECT LOANED
OBJECT ACQUIRED
OBJECT CATALOGUED
In the non-relational example above, the words LOANED, ACQUIRED,
and CATALOGUED describe the context in which an OBJECT was used,
and they do not describe intrinsically the OBJECT itself. They are
EVENTs in which an OBJECT participated.
In the relational example, the OBJECT is stored once in an
information system, each EVENT is stored once, and OBJECTs and
EVENTs are linked together when appropriate.
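The relational arrangement described above can be sketched with
Python's built-in sqlite3 module. This is only an illustration
under stated assumptions: the table and column names are
underscored adaptations of the LDG names, the NAM columns are
hypothetical, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object_ldg (object_occ_idn INTEGER PRIMARY KEY, object_nam TEXT);
CREATE TABLE event_ldg  (event_occ_idn  INTEGER PRIMARY KEY, event_nam  TEXT);
-- Intersection entity "OBJECT & EVENT LDG": one row per link.
CREATE TABLE object_event_ldg (
    object_occ_idn INTEGER NOT NULL REFERENCES object_ldg(object_occ_idn),
    event_occ_idn  INTEGER NOT NULL REFERENCES event_ldg(event_occ_idn),
    PRIMARY KEY (object_occ_idn, event_occ_idn)
);
""")

# The OBJECT is stored once; each EVENT is stored once.
conn.execute("INSERT INTO object_ldg VALUES (1, 'vase')")
conn.executemany(
    "INSERT INTO event_ldg VALUES (?, ?)",
    [(10, "acquisition"), (11, "loan"), (12, "cataloguing")],
)
# Links are added only where an OBJECT took part in an EVENT.
conn.executemany(
    "INSERT INTO object_event_ldg VALUES (?, ?)",
    [(1, 10), (1, 11), (1, 12)],
)

events_for_object = [
    row[0]
    for row in conn.execute(
        """
        SELECT e.event_nam
        FROM object_event_ldg AS oe
        JOIN event_ldg AS e ON e.event_occ_idn = oe.event_occ_idn
        WHERE oe.object_occ_idn = ?
        ORDER BY e.event_occ_idn
        """,
        (1,),
    )
]
object_rows = conn.execute("SELECT COUNT(*) FROM object_ldg").fetchone()[0]
```

However many EVENTs the OBJECT participates in, object_ldg still
holds a single row for it.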
*** Standard: Differences between data elements and data values
are resolved.
Relational:
PEOPLE PERSON LDG
ROLE LDG
Non-Relational:
CALLIGRAPHER
PAINTER
PRINTER
DONOR
The non-relational examples above are typical of data defined in a
flat-file OBJECT record. In the non-relational examples four
pieces of data are defined as roles, and each will be populated
with a person's name. Conceivably, the same person's name could
populate all four of the non-relational data definitions. In
addition, that same person may be logically related to additional
objects.
Relational modeling and technology solve both these anomalies by
separating a person from a role he plays and creating a data group
for each. Once information about a person is stored in a database,
it can be linked to many roles related to the same object, and it
can be linked to many different objects.
Another benefit occurs when a new ROLE is desired: Instead of
defining a new piece of data, one need only add a new data value
to the ROLE database.
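A minimal Python sketch of this idea follows. It is an
illustration only: the in-memory dictionaries stand in for
PEOPLE PERSON LDG, ROLE LDG, and a linking structure, and every
identifier value and name in it is invented.

```python
# ROLE values are rows of data, not field names.
roles = {1: "CALLIGRAPHER", 2: "PAINTER", 3: "PRINTER", 4: "DONOR"}
persons = {100: "Person A"}

# Each link is (object id, person id, role id).
object_people_role = [
    (1, 100, 1),  # Person A is calligrapher of object 1
    (1, 100, 2),  # ... and painter of the same object
    (2, 100, 4),  # ... and donor of a different object
]


def add_role(name):
    """Add a new ROLE value; no data structure is redefined."""
    new_id = max(roles) + 1
    roles[new_id] = name
    return new_id


def roles_of(person_id, object_id):
    """List the role names a person plays for one object."""
    return [
        roles[role_id]
        for (obj, person, role_id) in object_people_role
        if person == person_id and obj == object_id
    ]


engraver_id = add_role("ENGRAVER")
object_people_role.append((2, 100, engraver_id))
```

The person is stored once, plays several roles for one object, and
is linked to a second object; the new role arrives as a data value.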
*** Standard: An Element appears in one, and only one, LDG. The
exception is a foreign key, which may appear in multiple
intersection LDGs.
Relational:
OBJECT LDG
OBJECT MARK LDG
Non-Relational:
MARK1
MARK2
SIGNATURE
This example was taken from a flat-file OBJECT record. These three
data elements appeared in every OBJECT record, whether they were
populated or not. Accepting that SIGNATURE is a kind of MARK,
there are three MARK data elements in the flat-file OBJECT record.
By removing the MARKs from the OBJECT record and creating a
Repetition Entity called OBJECT MARK LDG, it is now possible to
document an unlimited number of MARKs without defining additional
data elements. Data elements within the OBJECT MARK LDG describe
the MARK fully, eliminating the need for the SIGNATURE data
element in the flat-file structure.
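The conversion described above might look like the following
Python sketch. The flat-file field names come from the example;
the record values and the output keys "mark_type" and "mark_txt"
are hypothetical and are not Elements taken from the CIDOC model.

```python
# A flat-file OBJECT record with fixed mark fields (values invented).
flat_record = {
    "OBJECT_IDN": "1995.1",
    "MARK1": "maker's stamp",
    "MARK2": "",
    "SIGNATURE": "signed lower right",
}

# SIGNATURE is treated as a kind of MARK.
MARK_FIELDS = {"MARK1": "mark", "MARK2": "mark", "SIGNATURE": "signature"}


def to_object_marks(record):
    """Return one OBJECT MARK row per populated mark field."""
    rows = []
    for field, kind in MARK_FIELDS.items():
        value = record.get(field, "")
        if value:  # empty fields produce no row at all
            rows.append(
                {
                    "object_idn": record["OBJECT_IDN"],
                    "mark_type": kind,
                    "mark_txt": value,
                }
            )
    return rows


marks = to_object_marks(flat_record)
```

An object with ten marks would simply yield ten rows, and an
object with none would yield no rows.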
B. Naming Logical Data
Data dictionary names reflect the abstract, process-independent
nature of a relational data model. The following standards for
naming logical data impose a structure which facilitates
understanding a complex set of data requirements.
*** Standard: Nouns are used in singular form.
Relational:
OBJECT LDG
EVENT ACTION LDG
OBJECT MARK LDG
Non-Relational:
OBJECTS LDG
EVENT ACTIONS LDG
OBJECT MARKS LDG
*** Standard: Logical data names are ordered by facet, or
segment, according to the following formula:
PRIMEWORD MODIFIER(S) CLASSWORD/SUFFIX
The facets are separated by a space.
CLASSWORD applies only to Elements, and SUFFIX applies to LDGs.
The purpose of using CLASSWORD and SUFFIX is to indicate
at-a-glance what kind of dictionary entry one sees. The dictionary
can be expanded to document other kinds of information such as
Users, Applications, Systems, and Modules, for which one might
choose suffixes of USE, APP, SYS, and MOD.
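To make the formula concrete, the sketch below splits a logical
name into its facets. It is a simplification: it assumes a
single-word PRIMEWORD and treats everything between the first and
last facet as MODIFIERs, so it gives no special meaning to the
ampersand in intersection names. The function and class names are
illustrative only.

```python
from typing import NamedTuple, Tuple


class LogicalName(NamedTuple):
    primeword: str
    modifiers: Tuple[str, ...]
    last_facet: str  # CLASSWORD for Elements, SUFFIX for LDGs


def split_name(name):
    """Split a space-separated logical name into its facets."""
    facets = name.split()
    if len(facets) < 2:
        raise ValueError("need at least PRIMEWORD and CLASSWORD/SUFFIX")
    return LogicalName(facets[0], tuple(facets[1:-1]), facets[-1])
```

For example, split_name("OBJECT MARK OCC IDN") yields the
PRIMEWORD "OBJECT", the MODIFIERs "MARK" and "OCC", and the
CLASSWORD "IDN".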
Following are standards for each facet of a logical name:
*** Standard: PRIMEWORD represents the name of a primary entity
to which a LDG or Element belongs. It must be the first facet in a
name.
Relational:
OBJECT LDG
OBJECT CONDITION NAM
OBJECT MEASURE LDG
OBJECT MARK OCC IDN
Non-Relational:
LDG OBJECT
NAME CONDITION OBJECT
MEASURE OBJECT LDG
IDN OCC OBJECT MARK
*** Standard: MODIFIER qualifies and further defines a LDG or an
Element emanating from a major entity. Ordering of multiple
modifiers is left to right from general to specific.
Examples:
OBJECT LDG
OBJECT MARK LDG
OBJECT MARK TRANSCRIPTION LDG
OBJECT MARK TRSCRPTN TRANSLN LDG
(TRANSCRIPTION and TRANSLATION abbreviated in the above
example because of software length constraints)
In the above example the placement of modifiers is left to right
from general to specific. OBJECT MARK LDG indicates that MARK is
an attribute of OBJECT; OBJECT MARK TRANSCRIPTION LDG indicates
that TRANSCRIPTION is an attribute of a MARK on an OBJECT; and
OBJECT MARK TRSCRPTN TRANSLN LDG indicates that TRANSLATION is an
attribute of a TRANSCRIPTION of a MARK on an OBJECT.
The LDGs above are examples of the Repetition Entity.
*** Standard: The key identifier of an LDG is indicated by an
Element containing the standard modifier "OCC". The modifier "OCC"
immediately precedes the Element CLASSWORD "IDN" (see CLASSWORDs
below).
Key Identifier in this context is defined as the unique identifier
by which a computer recognizes a unique occurrence of a data
group. The identifier may be machine-generated to guarantee
uniqueness.
Examples:
EVENT OCC IDN
CLASSIFICATION TERM OCC IDN
PLACE ADDRESS OCC IDN
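One possible way to machine-generate such identifiers is a simple
counter per LDG, sketched below in Python. The class name and the
choice of sequential integers are assumptions made for
illustration; the guide does not prescribe a generation scheme.

```python
import itertools


class OccurrenceIds:
    """Hands out OCC IDN values, one independent sequence per LDG."""

    def __init__(self):
        self._counters = {}

    def next_id(self, ldg_name):
        # Create the counter for this LDG on first use, then advance it.
        counter = self._counters.setdefault(ldg_name, itertools.count(1))
        return next(counter)


ids = OccurrenceIds()
first_event = ids.next_id("EVENT LDG")
second_event = ids.next_id("EVENT LDG")
first_place = ids.next_id("PLACE ADDRESS LDG")
```

Uniqueness here holds only within one LDG and one running
process; a real system would normally rely on the database to
issue the keys.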
*** Standard: CLASSWORD defines the intrinsic or inherent nature
of an Element. It is the last facet of an Element name.
The following CLASSWORDs are mutually exclusive categories which
define the nature of an Element and answer the question "What is
it?"
* AMT Amount (numeric) Indicates a monetary amount. (How much?)
* CDE Code (alphanumeric) Predefined values which represent
specific names or terms and are formulated by the systematic
use of symbols, letters, or numbers.
Ex: Codes for country names, i.e., UK is a code for the United
Kingdom, FR for France, etc. Codes may be standard, universal, or
specific to a local system. Multiple code sets may exist for the
same entity, as is the case for country names.
* CNT Count (numeric) Indicates a non-monetary numeric quantity
or accumulation. (How many?)
* FLG Flag (alphanumeric) Indicates a binary state or condition
where only two opposite values are possible, and where the
values have no function other than to indicate a described
state or condition. (YES or NO, ON or OFF, IS or IS NOT)
* IDN Identifier (alphanumeric) Non-coded data which identifies
an entity; not necessarily unique. (Ex: Museum catalog number,
donor catalog number, exhibition catalog number, specimen tag
number, and employee number cannot be guaranteed to be unique
within a database.)
* NAM Name (alphanumeric) Alphanumeric data which documents an
appellation, or name, given to a person or organization, place,
thing, event, or concept. May be a single word or a short
phrase; different in nature from "TXT".
* TME Time (alphanumeric) Identifies a duration or period of
time, including dates, or a specific instant in which something
occurs. (When?)
Format is standard ISO (International Organization for
Standardization) format:
YYYYMMDDHHMMSS.SS
YYYY year
MM month
DD day
HH hour
MM minute
SS second
.SS tenths, hundredths of second
* TXT Text (alphanumeric) Textual data which is imprecisely
defined, has an unpredictable structure, and does not fit into
one of the above classifications. Typically consists of notes,
remarks, descriptions, and comments.
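As an illustration of the TME layout listed above, the following
Python sketch parses and writes a full YYYYMMDDHHMMSS.SS value
with the standard datetime module. It assumes every component is
present; partial values such as a year alone are not handled, and
the function names are illustrative.

```python
from datetime import datetime

# strptime's %f accepts one to six fractional digits, so it reads
# the two-digit ".SS" part as hundredths of a second.
TME_FORMAT = "%Y%m%d%H%M%S.%f"


def parse_tme(value):
    """Parse a full TME string into a datetime."""
    return datetime.strptime(value, TME_FORMAT)


def format_tme(moment):
    """Write a datetime back out, truncated to hundredths of a second."""
    hundredths = moment.microsecond // 10000
    return moment.strftime("%Y%m%d%H%M%S") + f".{hundredths:02d}"


stamp = parse_tme("19950401123045.25")
```

Round-tripping the sample value through parse_tme and format_tme
returns the original string.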
The following examples illustrate how CLASSWORD is used in naming
a data element:
Relational:
OBJECT PART CNT
CALENDAR NAM
CONCEPT APPELLATION NAM
PLACE ADDRESS BUILDING IDN
Non-Relational:
NUMBER OF OBJECT PARTS
NAME OF CALENDAR
NAME GIVEN TO CONCEPT
BUILDING NUMBER
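A small check of this convention can be sketched in Python. The
table repeats the CLASSWORDs listed above; the function name is
illustrative, and the check looks only at the last facet of a
name, not at whether the rest of the name follows the standards.

```python
CLASSWORDS = {
    "AMT": "Amount",
    "CDE": "Code",
    "CNT": "Count",
    "FLG": "Flag",
    "IDN": "Identifier",
    "NAM": "Name",
    "TME": "Time",
    "TXT": "Text",
}


def classword_of(element_name):
    """Return the CLASSWORD ending an Element name, or None."""
    facets = element_name.split()
    if not facets:
        return None
    last = facets[-1]
    return last if last in CLASSWORDS else None
```

The relational names above all end in a recognized CLASSWORD,
while names such as "NUMBER OF OBJECT PARTS" and "BUILDING
NUMBER" do not.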
*** Standard: The standard SUFFIX for LDGs is "LDG".
Examples:
OBJECT LDG
OBJECT MARK LDG
*** Standard: The ampersand - "&" - is the standard character for
documenting the linking of one LDG with another, indicating
relationships among entities.
Examples:
OBJECT & EVENT LDG
OBJECT NOTE & PEOPLE PERSON LDG
OBJECT & PEOPLE & ROLE LDG
*** Standard: Each facet in a logical data name is spelled in
full. Abbreviations are used when needed to accommodate the
32-character length limit imposed by the current software which
documents the model.
If abbreviations are necessary, begin with the MODIFIER facets,
from specific to general (right to left), when possible. CLASSWORD
and SUFFIX are not abbreviated.
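The abbreviation rule can be sketched as follows. This is an
illustration, not the procedure used by the CIDOC software: the
abbreviation table holds only the two abbreviations that appear in
this guide, and the function name is hypothetical.

```python
MAX_LENGTH = 32
ABBREVIATIONS = {"TRANSCRIPTION": "TRSCRPTN", "TRANSLATION": "TRANSLN"}


def fit_name(name, limit=MAX_LENGTH):
    """Abbreviate MODIFIERs right to left until the name fits."""
    facets = name.split()
    # Skip index 0 (PRIMEWORD) and the last facet (CLASSWORD/SUFFIX),
    # which are never abbreviated.
    for i in range(len(facets) - 2, 0, -1):
        if len(" ".join(facets)) <= limit:
            break
        facets[i] = ABBREVIATIONS.get(facets[i], facets[i])
    result = " ".join(facets)
    if len(result) > limit:
        raise ValueError(f"no listed abbreviation shortens {name!r} enough")
    return result
```

Applied to "OBJECT MARK TRANSCRIPTION TRANSLATION LDG", the
sketch abbreviates TRANSLATION first and TRANSCRIPTION second,
arriving at the 32-character name used in the model.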
C. Adapting Standards to Local Environments
While reviewing the standards in this document, there are
considerations to keep in mind, especially if information will be
stored in a commercial software package such as a data dictionary
or a CASE (computer-assisted software engineering) tool. A few of
these considerations are listed below:
* Some software does not permit spaces to be used between facets
of a name; a dash or underscore may be required.
Examples:
OBJECT & EVENT LDG
OBJECT-&-EVENT-LDG
OBJECT_&_EVENT_LDG
* The software which produced the CIDOC Data Model documentation
accommodates use of the ampersand (&) to link one LDG to
another. Other software products preclude the use of special
characters. Another single character may be substituted, or the
linking character may be omitted altogether.
Examples:
OBJECT & EVENT LDG
OBJECT A EVENT LDG
OBJECT N EVENT LDG
OBJECT EVENT LDG
* Some software packages allow only upper case or only mixed case
alphabetic characters in a dictionary name, while others allow
a choice of upper case, lower case, mixed case, and special
characters including spaces.
* A dictionary name may be limited in length to a specific number
of characters. The software used in the accompanying reports
allows a maximum of 32 characters, thus forcing abbreviations
in complex names. The abbreviations are predetermined to assure
consistency.
* Become familiar with all the features of a software package
before setting standards for its use.
* If multiple software packages are used, consider compatibility.
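The separator and linking-character adaptations listed above can
be sketched as one small Python helper. The function and parameter
names are illustrative; case rules and length limits are not
handled here.

```python
def adapt_name(name, separator=" ", link="&"):
    """Rewrite a logical name for software with stricter naming rules.

    separator: replaces the space between facets (e.g. "-" or "_").
    link: replaces the ampersand; pass "" to omit it altogether.
    """
    facets = []
    for facet in name.split():
        if facet == "&":
            if link:  # an empty link drops the facet entirely
                facets.append(link)
        else:
            facets.append(facet)
    return separator.join(facets)
```

For instance, adapt_name("OBJECT & EVENT LDG", separator="_")
gives "OBJECT_&_EVENT_LDG", and passing link="" gives
"OBJECT EVENT LDG".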
IV. Data Dictionary Reports
The term data dictionary is used to describe 1) a repository for
the definition of logical metadata and 2) a DBMS-specific
description of a schema, or record layout, for storing physical
data. It is the first definition which documents the CIDOC data
model.
There are three reports comprising the documentation package:
LIST OF ENTITIES BY TYPE, ENTITY CONTENTS REPORT, and USED-BY
DIRECTLY.
The LIST OF ENTITIES BY TYPE lists the Elements first and then the
LDGs, each in alphabetical order.
The ENTITY CONTENTS REPORT contains a full description of Elements
and LDGs, entries appearing together in alphabetical order. The
VALUES attribute (field) in an Element entry is intended to
further define logically the Element by providing examples of real
data values which might appear in a physical implementation. The
CONTAINS attribute (field) in an LDG entry lists the Elements
which comprise the LDG. Other fields are self-explanatory.
The USED-BY DIRECTLY report lists alphabetically each Element along with
the LDGs in which it is found.
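The relationship between the CONTAINS field and the USED-BY
DIRECTLY report can be sketched in Python as an inverted index.
The sample entries are a small assumed subset; in particular, the
Elements shown inside OBJECT & EVENT LDG are assumed foreign keys
and are not copied from the actual reports.

```python
from collections import defaultdict

# CONTAINS: LDG name -> the Elements it holds (sample entries only).
contains = {
    "OBJECT LDG": ["OBJECT OCC IDN", "OBJECT CNT"],
    "EVENT LDG": ["EVENT OCC IDN"],
    "OBJECT & EVENT LDG": ["OBJECT OCC IDN", "EVENT OCC IDN"],
}


def used_by_directly(contains):
    """Invert CONTAINS: Element name -> sorted list of LDGs using it."""
    used_by = defaultdict(list)
    for ldg, elements in contains.items():
        for element in elements:
            used_by[element].append(ldg)
    # Sort Elements and their LDG lists alphabetically, as in the report.
    return {element: sorted(ldgs) for element, ldgs in sorted(used_by.items())}


report = used_by_directly(contains)
```

In this sample only the key identifiers appear under more than
one LDG, consistent with the rule that an Element belongs to a
single LDG except when it serves as a foreign key.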
* Pat Reed - Smithsonian Institution, OIT, A&I 2310, MRC 433 *
* Ph:(202)357-4059 Fax:(202)786-2687 Email:preed@sivm.si.edu *