Data Modeling Essentials (2005), part 7

Chapter 10 Conceptual Data Modeling
Simsion-Witt_10 10/11/04 8:49 PM

[Figure 10.24 A typical guide to notations used in a data model. Legend: mandatory relationship; optional relationship; entity; subtype (inner box) inheriting attributes and relationships from supertype (outer box). Example entity classes: Measurement Unit, Claim Payment, Claim Payment/Recovery Type, Claim Payment Type, Claim Recovery Type.]

10.15 Comparison with the Process Model

One of the best means of verifying a data model is to ensure that it includes all the necessary data to support the process model. This is particularly effective if the process model has been developed relatively independently, as it makes available a second set of analysis results as a cross-check. (This is not an argument in favor of data and process modelers working separately; if they work effectively together, the verification will take place progressively as the two models are developed.) There will be little value in checking against the process model if an extreme form of data-driven approach has been taken and processes have been mechanically derived from the data model.

There are a number of formal techniques for mapping process models against data models to ensure consistency. They include matrices of processes mapped against entity classes, entity life cycles, and state transition diagrams.

Remember, however, that the final database may be required to support processes as yet undefined and, hence, not included in the process model. Support for the process model is therefore a necessary but not sufficient criterion for accepting a data model.

10.16 Testing the Model with Sample Data

If sample data is available, there are few better ways of communicating and verifying a data model than to work through where each data item would be held.
The approach is particularly appropriate when the data model represents a new and unfamiliar way of organizing data: fitting some existing data to the new model will provide a bridge for understanding, and may turn up some problems or oversights.

We recall a statistical analysis system that needed to be able to cope with a range of inputs in different formats. The model was necessarily highly generalized and largely the work of one specialist modeler. Other participants in its development were at least a little uncomfortable with it. Half an hour walking through the model with some typical inputs was far more effective in communicating and verifying the design than the many hours previously spent on argument at a more abstract level (and it revealed areas needing more work).

10.17 Prototypes

An excellent way of testing a sophisticated model, or part of a model, is to build a simple prototype. Useful results can often be achieved in a few days, and the exercise can be particularly valuable in winning support and input from process modelers, especially if they have the job of building the prototype.

One of the most sophisticated (and successful) models in which we have been involved was to support a product management database and associated transaction processing. The success of the project owed much to the early production of a simple PC prototype, prior to the major task of developing a system to support fifteen million accounts. A similar design, which was not prototyped, failed at a competitor organization, arguably because of a lack of belief in its workability.

10.18 The Assertions Approach

In this section, we look at a rigorous technique for reviewing the detail of data models by presenting them as a list of plain language assertions.
In Section 3.5, we saw that if we named a relationship according to some simple rules, we could automatically generate a plain language statement that fully described the relationship, including its cardinality and optionality, and, indeed, some CASE products provide this facility.

The technique described here extends the idea to cover the entire data model diagram. It relies on sticking to some fairly simple naming conventions, consistent with those we have used throughout this book. Its great strength is that it presents the entire model diagram in a nondiagrammatic linear form, which does not require any special knowledge to navigate or interpret. We have settled, after some experimentation, on a single numbered list of assertions with a check box against each in which reviewers can indicate that they agree with, disagree with, or do not understand the assertion.

The assertions cover the following metadata:

1. Entity classes, each of which may be a subtype of another entity class
2. Relationships with cardinality and optionality at each end (the technique is an extension of that described in Section 3.5)
3. Attributes of entity classes (and possibly relationships), which may be marked as mandatory or optional (and possibly multivalued)
4. Intersection entity classes implementing binary “many-to-many” relationships or n-ary relationships
5. Uniqueness constraints on individual attributes or subsets of the attributes and relationships associated with an entity class
6. Other constraints.

10.18.1 Naming Conventions

In order to be able to generate grammatically sensible assertions, we have to take care in naming the various components of the model.
If you are following the conventions that we recommend, the following rules should be familiar to you:

■ Entity class names must be singular and noncollective (e.g., Employee or Employee Transaction but not Employees, Employee Table, or Employee History).
■ Entity class definitions must be singular and noncollective (e.g., for an entity class named Injury Nature, “a type of injury that can be incurred by a worker,” not “a reference list of the injuries that can be incurred by a worker,” nor “injuries sustained by a worker”). They should also be indefinite (i.e., commencing with “a” or “an” rather than “the”; hence “a type of injury incurred by a worker” rather than “the type of injury incurred by a worker”).
■ Relationship names must be in infinitive form (e.g., “deliver” rather than “delivers” or “deliverer” and “be delivered by” rather than “is delivered by” or “delivery”). There is an alternative set of assertion forms to support attributes of relationships; if this is used, alternative relationship names must also be provided in the 3rd person singular form (“delivers,” “is delivered by”).
■ Attribute definitions must refer to a single instance (e.g., for an attribute named Total Price, “the price paid inclusive of tax” not “the prices paid inclusive of tax”). They should also be definite (i.e., commencing with “the” rather than “a” or “an”; hence “the price paid inclusive of tax” rather than “a price paid inclusive of tax”).
■ Attribute and entity class constraints must start with “must” or “must not,” and any other data item referred to should be qualified so as to make clear precisely which instance of that data item we are referring to (e.g., “[End Date] must not be earlier than the corresponding Start Date” rather than “must not be earlier than Start Date”).

10.18.2 Rules for Generating Assertions

In the assertion templates that follow:

1. The symbols < and > are used to denote placeholders for which the nominated metadata items can be substituted.
2. The symbols { and } are used to denote sets of alternative wordings separated by the | symbol (e.g., {A|An} indicates that either “A” or “An” may be used). Which alternative is used may depend on:
   a. The context (e.g., “A” or “An” is chosen to correspond to the name that follows).
   b. A property of the component being described (e.g., “must” or “may” is chosen depending on the optionality of the relationship being described).

The examples should make these conventions clear.

10.18.2.1 Entity Class Assertions

For each entity class, we can make an assertion of the form:

“{A|An} <Entity Class Name> is <Entity Class Definition>.”

(e.g., “A Student is an individual person who has enrolled in a course at Smith College.”)

For each entity class that is marked as a subtype (subclass) of another entity class, we can make an assertion of the form:

“{A|An} <Entity Class Name> is a type of <Superclass Name>, namely <Entity Class Definition>.”

(e.g., “A Distance Learning Student is a type of Student, namely a student who does not attend classes in person but who uses the distance learning facilities provided by Smith College.”)

10.18.2.2 Relationship Assertions

For each relationship, we can make an assertion of the form:

“Each <Entity Class 1 Name> {must|may} <Relationship Name> {just one <Entity Class 2 Name>|one or more <Entity Class 2 Plural Name>} that {may|must not} [6] change over time.”

(e.g., “Each Professor may teach one or more Classes that may change over time.”)

For recursive relationships, however, this assertion type reads better if worded as follows:

“Each <Entity Class 1 Name> {must|may} <Relationship Name> {just one other <Entity Class 2 Name>|one or more other <Entity Class 2 Plural Name>} that {may|must not} change over time.”

(e.g., “Each Employee may report to just one
other Employee.”)

We found in practice that the form of this assertion for optional relationships (i.e., with “may” before the relationship name) was not strong enough to alert reviewers who required that the relationship be mandatory, so an additional assertion was added for each optional relationship:

“Not every <Entity Class 1 Name> has to <Relationship Name> {{a|an} <Entity Class 2 Name>|<Entity Class 2 Plural Name>}.” (nonrecursive)

or

“Not every <Entity Class 1 Name> has to <Relationship Name> {another <Entity Class 2 Name>|other <Entity Class 2 Plural Name>}.” (recursive)

(e.g., “Not every Organization Unit has to consist of other Organization Units.”)

We have also found that those relationships that are marked as optional solely to cope with population of one entity class occurring before the other (e.g., a new organization unit is created before employees are reassigned to that organization unit) require an additional assertion of the form:

“Each <Entity Class 1 Name> should ultimately <Relationship Name> {{a|an} <Entity Class 2 Name>|<Entity Class 2 Plural Name>}.”

(e.g., “Each Organization Unit should ultimately be assigned Employees.”)

[6] Depending on whether the relationship is transferable or non-transferable.

10.18.2.3 Attribute Assertions

For each single-valued attribute of an entity class, we can make assertions [7] of the form:

“Each <Entity Class Name> {must|may} have {a|an} <Attribute Name> which is <Attribute Definition>.
No <Entity Class Name> may have more than one <Attribute Name>.”

(e.g., “Each Student must have a Home Address, which is the address at which the student normally resides during vacations. No Student may have more than one Home Address.”)

Note that the must/may choice is based on whether the attribute is marked as optional.
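The relationship and single-valued attribute templates above are mechanical enough to generate from metadata. The following is a minimal sketch in Python, not the authors' tooling: the `Relationship` class, the function names, and the first-letter rule for choosing “a” or “an” are all assumptions made for illustration, and plural names are supplied by hand rather than derived.

```python
from __future__ import annotations

from dataclasses import dataclass


def article(name: str) -> str:
    """Choose "a" or "an" by first letter (a rough stand-in for the {a|an} rule)."""
    return "an" if name[:1].lower() in {"a", "e", "i", "o", "u"} else "a"


@dataclass
class Relationship:
    """Hypothetical metadata record for one direction of a relationship."""
    entity_1: str            # e.g. "Professor"
    name: str                # infinitive form, e.g. "teach"
    entity_2: str            # singular name, e.g. "Class"
    entity_2_plural: str     # plural name, e.g. "Classes"
    mandatory: bool          # selects "must" or "may"
    many: bool               # selects "one or more" or "just one"
    transferable: bool = True
    recursive: bool = False


def relationship_assertions(r: Relationship) -> list[str]:
    """Fill in the relationship template, plus the extra 'Not every' form if optional."""
    modal = "must" if r.mandatory else "may"
    other = "other " if r.recursive else ""
    if r.many:
        target = f"one or more {other}{r.entity_2_plural}"
    else:
        target = f"just one {other}{r.entity_2}"
    change = "may" if r.transferable else "must not"
    out = [f"Each {r.entity_1} {modal} {r.name} {target} that {change} change over time."]
    if not r.mandatory:
        if r.recursive:
            obj = f"other {r.entity_2_plural}" if r.many else f"another {r.entity_2}"
        else:
            obj = r.entity_2_plural if r.many else f"{article(r.entity_2)} {r.entity_2}"
        out.append(f"Not every {r.entity_1} has to {r.name} {obj}.")
    return out


def attribute_assertions(entity: str, attribute: str, definition: str,
                         mandatory: bool) -> list[str]:
    """Fill in the two single-valued attribute assertions (both are always made)."""
    modal = "must" if mandatory else "may"
    return [
        # The comma before "which" follows the worked example rather than the template.
        f"Each {entity} {modal} have {article(attribute)} {attribute}, which is {definition}.",
        f"No {entity} may have more than one {attribute}.",
    ]


if __name__ == "__main__":
    teach = Relationship("Professor", "teach", "Class", "Classes",
                         mandatory=False, many=True)
    home = attribute_assertions(
        "Student", "Home Address",
        "the address at which the student normally resides during vacations",
        mandatory=True)
    for line in relationship_assertions(teach) + home:
        print(line)
```

Keeping the singular and plural names as separate fields mirrors the templates, which ask for an <Entity Class 2 Plural Name> rather than trying to pluralize automatically.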
Again, the “may” form of this assertion is not strong enough to alert reviewers who required that the attribute be mandatory, so we added for each optional attribute:

“Not every <Entity Class Name> has to have {a|an} <Attribute Name>.”

(e.g., “Not every Service Provider has to have a Contact E-mail Address.”)

This particular type of assertion highlights the importance of precise assertion wording. Originally this assertion type read:

“{A|An} <Entity Class Name> does not have to have {a|an} <Attribute Name>.”

(e.g., “A Service Provider does not have to have a Contact E-mail Address.”)

However, that led to one reviewer commenting, “Yes they do have to have one in case they advise us of it.” Clearly that form of wording allowed for confusion between provision of an attribute for an entity class and population of that attribute.

If the model includes multivalued attributes, then for each such attribute we can make assertions [8] of the form:

“Each <Entity Class Name> {must|may} have <Attribute Plural Name> which are <Attribute Definition>.
{A|An} <Entity Class Name> may have more than one <Attribute Name>.”

(e.g., “Each Flight may have Operating Days, which are the days on which that flight operates. Each Flight may have more than one Operating Day.”)

[7] These are not alternatives; both assertions must be made.
[8] Again these are not alternatives; both assertions must be made.

If the model includes attributes of relationships, then for each single-valued attribute of a relationship, we can make assertions of the form:

“Each combination of <Entity Class 1 Name> and <Entity Class 2 Name> {must|may} have {a|an} <Attribute Name> which is <Attribute Definition>.
No combination of <Entity Class 1 Name> and <Entity Class 2 Name> may have more than one <Attribute Name>.”

(e.g., “Each combination of Student and Course must have an Enrollment Date, which is the date on which the student enrolls in the course. No combination of Student and Course may have more than one Enrollment Date.”)

Similarly, if the model includes multivalued attributes as well as attributes of relationships, then for each such attribute, we can make assertions [9] of the form:

“Each combination of <Entity Class 1 Name> and <Entity Class 2 Name> {must|may} have <Attribute Plural Name> which are <Attribute Definition>.
A combination of <Entity Class 1 Name> and <Entity Class 2 Name> may have more than one <Attribute Name>.”

(e.g., “Each combination of Student and Course may have Assignment Scores which are the scores achieved by that student for the assignments performed on that course. A combination of Student and Course may have more than one Assignment Score.”)

[9] Again these are not alternatives; both assertions must be made.

All assertions about relationships we have previously described relied on the relationship being named in each direction using the infinitive form (the form that is grammatically correct after “may” or “must”); if a 3rd person singular form (“is” rather than “be,” “reports to” rather than “report to”) of the name of each relationship with attributes is also recorded, alternative assertion forms are possible.

If the attribute is single-valued:

“Each <Entity Class 1 Name> that <Relationship Alternative Name> {a|an} <Entity Class 2 Name> {must|may} have {a|an} <Attribute Name> which is <Attribute Definition>.
No <Entity Class 1 Name> that <Relationship Alternative Name> {a|an} <Entity Class 2 Name> may have more than one <Attribute Name> for that <Entity Class 2 Name>.”
(e.g., “Each Student that enrolls in a Course must have an Enrollment Date, which is the date on which the student enrolls in the course. No Student that enrolls in a Course may have more than one Enrollment Date for that Course.”)

If the attribute is multivalued:

“Each <Entity Class 1 Name> that <Relationship Alternative Name> {a|an} <Entity Class 2 Name> {must|may} have <Attribute Plural Name> which are <Attribute Definition>.
{A|An} <Entity Class 1 Name> that <Relationship Alternative Name> {a|an} <Entity Class 2 Name> may have more than one <Attribute Name> for that <Entity Class 2 Name>.”

(e.g., “Each Student that enrolls in a Course may have Assignment Scores, which are the scores achieved by that student for the assignments performed on that course. Each Student that enrolls in a Course may have more than one Assignment Score for that Course.”)

Note that each derived attribute should include in its <Attribute Definition> the calculation or derivation rules for that attribute.

If the model includes the attribute type of each attribute (see Section 5.4), then for each attribute of an entity class we can make an assertion of the form:

“The <Attribute Name> of {a|an} <Entity Class Name> is (and exhibits the properties of) {a|an} <Attribute Type Name>.”

(e.g., “The Departure Time of a Flight is (and exhibits the properties of) a TimeOfDay.”)

The document containing the assertions should then contain in its front matter a list of all attribute types used and their properties. If these are negotiable with stakeholders, they should be included as assertions (i.e., each should be given a number and a check box).

10.18.2.4 Intersection Assertions

There are three types of intersection entity class to consider:

1. Those implementing a binary many-to-many relationship for which only one combination of each pair of instances is allowed (i.e., if implemented in a relational database, the primary key would consist only of the foreign keys of the tables representing the two associated entity classes). The classic example is Enrollment where each Student may only enroll once in each Course.
2. Those implementing a binary many-to-many relationship for which more than one combination of each pair of instances is allowed (i.e., if implemented in a relational database, the primary key would consist not only of the foreign keys of the tables representing the two associated entity classes, but also an additional attribute, usually a date). The classic example is Enrollment where a Student may enroll more than once in each Course.
3. Those implementing an n-ary relationship.

For each attribute of an intersection entity class of the first type, we can make assertions [10] of the form:

“There can only be one <Data Item Name> for each combination of <Associated Entity Class 1 Name> and <Associated Entity Class 2 Name>.
For any particular <Associated Entity Class 1 Name> a different <Data Item Name> can occur for each <Associated Entity Class 2 Name>.
For any particular <Associated Entity Class 2 Name> a different <Data Item Name> can occur for each <Associated Entity Class 1 Name>.”

(e.g., “There can only be one Conversion Factor for each combination of Input Measurement Unit and Output Measurement Unit. For any particular Input Measurement Unit a different Conversion Factor can occur for each Output Measurement Unit. For any particular Output Measurement Unit a different Conversion Factor can occur for each Input Measurement Unit.”)

Note that <Data Item Name> can be:

1. An attribute name
2. The name of an entity class associated with the intersection entity class via a nonidentifying relationship.
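The three assertions for an intersection entity class of the first type can be produced by straightforward substitution. This is a small Python sketch under that reading; the function name `intersection_assertions` is hypothetical and not part of the source text.

```python
def intersection_assertions(data_item: str, entity_1: str, entity_2: str) -> list[str]:
    """Return the three assertions for one data item of a first-type intersection.

    All three must be made; they are not alternatives.
    """
    return [
        f"There can only be one {data_item} for each combination of "
        f"{entity_1} and {entity_2}.",
        f"For any particular {entity_1} a different {data_item} "
        f"can occur for each {entity_2}.",
        f"For any particular {entity_2} a different {data_item} "
        f"can occur for each {entity_1}.",
    ]


if __name__ == "__main__":
    for line in intersection_assertions(
            "Conversion Factor", "Input Measurement Unit", "Output Measurement Unit"):
        print(line)
```

Run as a script, this prints the three sentences of the Conversion Factor example, one per line.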
For each attribute of an intersection entity class of the second or third type, we can make assertions [12] of the form:

“There can only be one <Data Item Name> for each combination of <Identifier Component 1 Name>, <Identifier Component 2 Name>, . . . and <Identifier Component n Name>.
For any particular combination of <Identifier Component 1 Name> . . . and <Identifier Component n-1 Name> a different <Data Item Name> can occur for each <Identifier Component m Name>.”

[10] Again, these are not alternatives; all assertions must be made.
[11] On what <Data Item Name> can refer to: for example, the intersection entity class Enrollment may have identifying relationships to Student and Course but a nonidentifying relationship to Payment Method and attributes of Enrollment Date and Payment Date. <Data Item Name> can refer to any of those last three.
[12] Again these are not alternatives; all assertions must be made.

Note that:

1. There is an <Identifier Component Name> for each part of the identifier of the intersection entity class, and it is expressed as one of:
   a. The name of an entity class associated with the intersection entity class via an identifying relationship
   b. The name of the attribute included in the identifier of the intersection entity class.
2. An assertion of the second form above must be produced for each identifier component of each intersection entity class, in which the name of that identifier component is substituted for <Identifier Component m Name>, and all other identifier components appear in the list following “combination of.”

Thus, in the case of Enrollment where a Student may enroll more than once in each Course:

“There can only be one Achievement Score for each combination of Student, Course, and Enrollment Date.
For any particular combination of Course and Enrollment Date, a different Achievement Score can occur for each Student.
For any particular combination of Student and Enrollment Date, a different Achievement Score can occur for each Course.
For any particular combination of Student and Course, a different Achievement Score can occur for each Enrollment Date.”

10.18.2.5 Constraint Assertions

For each attribute of an entity class on which there is a uniqueness constraint, we can make an assertion of the form:

“No two <Entity Class Plural Name> can have the same <Attribute Name>.”

(e.g., “No two Students can have the same Student Number.”)

For each set of data items of an entity class on which there is a uniqueness constraint, we can make an assertion of the form:

“No two <Entity Class Plural Name> can have the same combination of <Data Item 1 Name>, <Data Item 2 Name>, . . . and <Data Item n Name>.”

[...]

Chapter 11 Logical Database Design

[...] Exclusion of Entity Classes from the Database

In some circumstances an entity class may have been included in the conceptual data model to provide context, and there is no actual requirement for that application to maintain data corresponding to that entity class. It is also possible that the data is to be held in some medium other than the relational database: nondatabase files, XML streams, and so on.

[...] substitute for careful modeling of subtypes and supertypes, and to consider the appropriate level for implementation. Identification of useful data classifications is part of the data modeling process, not something that should be left to some later task of view definition. If subtypes and supertypes are not recognized in the conceptual modeling stage, we cannot [...]
11.4 Basic Column Definition

[...] column

11.4.3 Derivable Attributes

Since the logical data model should not specify redundant data, derivable attributes in the conceptual data model should not become columns in the logical data model. However, the designer of the physical data model needs to be advised of derivable attributes so as to decide whether they should be stored as columns in the database or calculated “on the fly.” We therefore [...]

[...] the database is ported to another DBMS supporting similar structures (e.g., another relational DBMS or a new version of the same DBMS having different performance properties), the logical data model can be used as a baseline for the new physical data model. The task of transforming the conceptual data model to a relational logical model is quite straightforward, certainly more so than the conceptual modeling [...]

[...] Rate/100.0))

Figure 11.7 A table and a view defining a derivable attribute.

11.4.5 Complex Attributes

In general, unless the target DBMS provides some form of row datatype facility (such as Oracle™’s “nested tables”), built-in complex datatypes (such as foreign currencies or timestamps with associated time zones), or constructors with which to create such datatypes, each [...] columns

11.4.8 Column Datatypes

If the target DBMS and the datatypes available in that DBMS are known, the appropriate DBMS datatype for each domain (see Section 5.4.3) can be identified and documented. Each column representing an attribute should be assigned the appropriate datatype based on the domain of the corresponding attribute. Each column in a foreign key should be given the same datatype as the corresponding [...]
[...] In such cases, if our chosen CASE tool does not allow us to show many-to-many relationships in the conceptual data model without creating a corresponding intersection table in the logical data model, we should delete the relationship on the basis that it is derivable (and hence redundant); we do not want to generate an intersection table that contains nothing but derivable data. If you [...]

[...] the conceptual data model are normally replaced by standard relational structures in the logical data model. Since we are retaining the documentation of the conceptual data model, we do not lose the business rules and other requirements represented by the subtypes we created in that model. This is important since there is more than one way to represent a supertype/subtype set in a logical data model and [...]

[...] Another factor is the ability to present data in alternative ways. As mentioned in Chapter 1, we do not always access the tables of a relational database directly. Usually, we access them through views, which consist of data from one or more tables combined or selected in various ways. We can use the standard facilities available for constructing views to present data at the subtype or supertype level, [...]

[...] well-defined transformations from the conceptual data model, the logical data model reflects business information requirements without being obscured by any changes required for performance; in particular, it embodies rules about the properties of the data (such as functional dependencies, as described in Section 2.8.1). These rules cannot always be deduced from a physical data model, which may have been denormalized [...]

11.3.3 Classification Entity Classes

As discussed in Section 7.2.2.1, [...]