Managing time in relational databases - P21

Query Encapsulation

As we have already pointed out, production queries against Asserted Versioning databases do not have to check for TEI or TRI violations. The maintenance processes carried out by the AVF guarantee that asserted version tables will already conform to those semantic requirements. For example, when joining from a TRI child to a TRI parent, these queries do not have to check that the parent object is represented by an effective-time set of contiguous and non-overlapping rows whose end-to-end time period fully includes that of the child row. Asserted Versioning already guarantees that those parent version rows [meet] within an episode, and that they [fill⁻¹] the effective time period of the child row.

Ad hoc queries against Asserted Versioning databases can be written directly against asserted version tables. But as far as possible, they should be written against views, in order to simplify the query-writing task of predominantly non-technical query authors. So we recommend that a basic set of views be provided for each asserted version table. Additional subject-matter-specific views written against these basic views could also be created. Some basic views that we believe might prove useful for these query authors are:

(i) The Conventional Data View, consisting of all currently asserted current versions in the table. This is a one-row-per-object view.

(ii) The Current Versions View, consisting of all currently asserted versions in the table, past, present and future. This view satisfies all the requirements met by any of the best practice version tables described in Chapter 4.

(iii) The Episode View, consisting of one current assertion for each episode. That is, the current version for current episodes, the last version for past episodes, and the latest version for future episodes.
This view is useful because it filters out the “blow-by-blow” history which version tables provide, and leaves only a “latest row” to represent each episode of an object of interest.

(iv) The Semantic Logfile View, consisting of all no longer asserted versions in the table. This view collects all asserted version data that we no longer claim is true, and should be of particular interest to auditors.

(v) The Transaction File View, consisting of all near future asserted versions. These are deferred assertions that will become currently asserted data soon enough that the business is willing to let them become current by means of the passage of time.

Chapter 16 CONCLUSION 387

(vi) The Staging Area View, consisting of all far future asserted versions. These are deferred assertions that are still a work in progress. They might be incomplete data that the business fully intends to assert once they are completed. They might also be hypothetical data, created to try out various what-if scenarios.

We also note that existing queries against conventional tables will execute properly when their target tables are converted to asserted version tables. In the conversion, the tables are given new names. For example, we use the suffix “_AV” on asserted version tables, and only on those tables. One of the views provided on each table, then, selects exactly those columns that made up the original table, and all and only those rows that dynamically remain currently asserted and currently in effect. This dynamic view provides, as a queryable object, a set of data that is row for row and column for column identical to the original table. The view itself is given the name the original table had, and every column has the name it originally had. This provides temporal upward compatibility for all queries, whether embedded in application code or free-standing.
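
To make the renaming-plus-view idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. Only the “_AV” suffix comes from the text; the original table policy(policy_nbr, policy_type, premium), the temporal column names eff_beg, eff_end, asr_beg and asr_end, the closed-open [begin, end) periods, ISO date strings and the '9999-12-31' end-of-time value are assumptions for illustration, not the AVF's actual schema. Because the view filters on CURRENT_DATE, SQLite re-evaluates it on every query, so rows drift in and out of it as time passes.

```python
import sqlite3

END_OF_TIME = "9999-12-31"  # assumed "until further notice" marker (hypothetical)


def build_demo() -> sqlite3.Connection:
    """Create a hypothetical asserted version table plus a view that keeps
    the original table's name and columns (temporal upward compatibility)."""
    con = sqlite3.connect(":memory:")
    con.executescript("""
        -- The converted table gets the _AV suffix.
        CREATE TABLE policy_AV (
            policy_nbr  TEXT    NOT NULL,  -- original column (object identifier)
            eff_beg     TEXT    NOT NULL,  -- effective time  [eff_beg, eff_end)
            eff_end     TEXT    NOT NULL,
            asr_beg     TEXT    NOT NULL,  -- assertion time  [asr_beg, asr_end)
            asr_end     TEXT    NOT NULL,
            policy_type TEXT    NOT NULL,  -- original column
            premium     INTEGER NOT NULL   -- original column
        );

        -- The view takes the original table's name and exactly its columns,
        -- keeping only rows currently asserted AND currently in effect.
        CREATE VIEW policy AS
        SELECT policy_nbr, policy_type, premium
        FROM policy_AV
        WHERE asr_beg <= CURRENT_DATE AND CURRENT_DATE < asr_end
          AND eff_beg <= CURRENT_DATE AND CURRENT_DATE < eff_end;
    """)
    rows = [
        # past version, still currently asserted
        ("P861", "2000-01-01", "2010-01-01", "2000-01-01", END_OF_TIME, "home", 100),
        # current version, currently asserted: the only P861 row in the view
        ("P861", "2010-01-01", "9000-01-01", "2010-01-01", END_OF_TIME, "home", 150),
        # future version, currently asserted
        ("P861", "9000-01-01", END_OF_TIME, "2010-01-01", END_OF_TIME, "home", 175),
        # past assertion: no longer claimed to be true
        ("P861", "2010-01-01", "9000-01-01", "2005-01-01", "2010-01-01", "home", 140),
        # deferred assertion: not asserted until far in the future
        ("P861", "2010-01-01", END_OF_TIME, "9000-01-01", END_OF_TIME, "home", 999),
        # a second object with a single current version
        ("P555", "2012-01-01", END_OF_TIME, "2012-01-01", END_OF_TIME, "auto", 80),
    ]
    con.executemany("INSERT INTO policy_AV VALUES (?, ?, ?, ?, ?, ?, ?)", rows)
    return con


if __name__ == "__main__":
    con = build_demo()
    # An unchanged "legacy" query still works against the original table name.
    print(con.execute("SELECT * FROM policy ORDER BY policy_nbr").fetchall())
```

Run on any date between 2012 and 9000, the legacy-style query prints [('P555', 'auto', 80), ('P861', 'home', 150)]: of the five P861 rows, only the currently asserted current version passes the filter. A real conversion would also need indexes and the AVF's maintenance logic, which this sketch omits.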

We conclude that Asserted Versioning does provide query encapsulation for bi-temporal data, and also temporal upward compatibility for queries.

The Internalization of Pipeline Datasets

Non-current data is often found in numerous nooks and crannies of conventional databases. Surrounding conventional tables whose rows have no time periods explicitly attached to them, and which represent our current beliefs about what their objects are currently like, there may be various history tables, transaction tables, staging area tables and developer-maintained logfile tables. In some cases, temporality has even infiltrated some of those tables themselves, transforming them into some variation on one or another of the four types of version tables which we described in Chapter 4.

When we began writing, we knew that deferred transactions and deferred assertions went beyond the standard bi-temporal semantics recognized in the computer science community. We knew that they corresponded to insert, update or delete transactions written but not yet submitted to the DBMS. The most familiar collections of transactions in this state, we recognized, are those called batch transaction datasets.

But as soon as we identified the nine logical categories of bi-temporal data, we realized that deferred transactions and deferred assertions dealt with only three of those nine categories: future assertions about past, present or future versions. What, then, we wondered, did the three categories of past assertions correspond to? The answer is that past assertions play the role of a DBMS semantic logfile, one specific to a particular production table. Of course, by now we understand that past assertions do not make it possible to fully recreate the physical state of a table as of any past point in time, because of deferred assertions, which are not, by definition, past assertions.
Instead, they make it possible to recreate what we claimed, at some past point in time, was the truth about the past, present and future of the things we were interested in at the time. In this way, past assertions support a semantic logfile, and allow us to recreate what we once claimed was true, as of any point in time in the past. They provide the as-was semantics for bi-temporal data.

But Asserted Versioning also supports a table-specific physical logfile. It does so with the row create date. With this date, we can almost recreate everything that was physically in a table as of any past point in time, no matter where in assertion time or effective time any of those rows are located.²

This leaves us with only three of the nine categories: the current assertion of past, present and future versions of objects. The current assertions of current versions, of course, are the conventional data in an asserted version table. This leaves currently asserted past versions and currently asserted future versions. But these are nothing new to IT professionals. They are what IT best practice version tables have been trying to manage for several decades.

Now it all comes together. Instead of conventional physical logfiles, Asserted Versioning supports queries which make both semantic logfile data and physical logfile data available. Instead of batch transaction datasets, Asserted Versioning keeps track of what the database will look like when those transactions are applied, which, for asserted version tables, means when those future assertions pass into currency. Instead of variations on best practice version tables which support some part of the semantics of versioning, Asserted Versioning is an enterprise solution which implements versioning, in every case, with the same schemas and with support for the full semantics of versioning, whether or not the specific business requirements, at the time, specify those full semantics.

² The exception is deferred assertions that have been moved backwards in assertion time. Currently, Asserted Versioning does not preserve information about the far future assertion time these assertions originally existed in.

With all these various physical datasets internalized within the production tables they are directed to or derived from, Asserted Versioning eliminates the cost of managing them as distinct physical data objects. Asserted Versioning also eliminates the cost of coordinating maintenance to them. There is no latency as updates to production tables ripple out to downstream copies of that same data, such as separate history tables, because those copies no longer exist. On the inward-bound side, there is also no latency. As soon as a transaction is written, it becomes part of its target table. The semantics supported here is, for maintenance transactions, “submit it and forget it”.

We conclude that Asserted Versioning does support the semantics of the internalization of pipeline datasets.

Performance

We have provided techniques for indexing, partitioning, clustering and querying an Asserted Versioning database. We have recommended key structures for primary keys, foreign keys and search keys, and recommended the placement of temporal columns in indexes for optimal performance. We have also shown how to improve performance with the use of currency flags. All these techniques help to provide query performance in Asserted Versioning databases which is nearly equivalent to the query performance in equivalent conventional databases.

We conclude that queries against even very large Asserted Versioning databases, especially those queries retrieving currently asserted current versions of persistent objects, will perform as well or nearly as well as the corresponding queries against a conventional database.
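
The nine logical categories mentioned in the discussion of pipeline datasets arise from crossing a row's position in assertion time (past, current or future assertion) with its position in effective time (past, current or future version). The Python sketch below classifies a row that way and maps each category to the dataset that, according to the text, it internalizes. The helper names, the closed-open periods and the date(9999, 12, 31) end-of-time marker are hypothetical choices for illustration, not part of the AVF.

```python
from datetime import date

END_OF_TIME = date(9999, 12, 31)  # assumed end-of-time marker (hypothetical)


def tense(beg: date, end: date, now: date) -> str:
    """Place the closed-open period [beg, end) relative to `now`."""
    if end <= now:
        return "past"
    if now < beg:
        return "future"
    return "current"  # beg <= now < end


def category(asr_beg: date, asr_end: date, eff_beg: date, eff_end: date,
             now: date) -> tuple[str, str]:
    """Return (assertion tense, version tense): one of 3 x 3 = 9 categories."""
    return tense(asr_beg, asr_end, now), tense(eff_beg, eff_end, now)


def internalized_dataset(asr: str, eff: str) -> str:
    """Name the pipeline dataset a category stands in for (hypothetical helper)."""
    if asr == "past":
        return "semantic logfile"           # no longer asserted
    if asr == "future":
        return "batch transaction dataset"  # deferred assertions
    # Currently asserted rows, split by the tense of the version.
    return {
        "current": "conventional table",
        "past": "version table (past versions)",
        "future": "version table (future versions)",
    }[eff]


if __name__ == "__main__":
    now = date(2014, 1, 26)
    # Asserted from 2010 until withdrawn in mid-2013; version effective since 2012.
    cat = category(date(2010, 1, 1), date(2013, 6, 1), date(2012, 1, 1), END_OF_TIME, now)
    print(cat, internalized_dataset(*cat))
```

For example, with now = date(2014, 1, 26), a row asserted from 2010-01-01 until 2013-06-01 about a version effective since 2012-01-01 falls into ('past', 'current'), part of the semantic logfile. The sketch does not separate near-future from far-future deferred assertions, which the text distinguishes as transaction-file and staging-area data.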

Enterprise Contextualization

As temporal data has become increasingly important, much of it has migrated from being reconstructable temporal data to being queryable temporal data. But much of that queryable temporal data is still isolated in data warehouses or other historical databases, although some of it also exists in production databases as history tables or as version tables. Often, this queryable temporal data fails to distinguish between data which reflects changes in the real world and data which corrects mistakes in earlier data.

So business needs for a collection of temporal data against which queries can be written are often difficult to meet. Some of the needed data may be in a data warehouse; the rest of it may be contained in various history tables and version tables in the production database, and the odds of those history tables all using the same schemas and all being updated according to the same rules are not good. As for version tables, we have seen how many different kinds there are, and how difficult it can be to write queries that extract exactly the desired data from them.

We need an enterprise solution to the provision of queryable bi-temporal data. We need one consistent set of schemas, across all tables and all databases. We need one set of transactions that update bi-temporal data, and enforce the same temporal integrity constraints, across all tables and all databases. We need a standard way to ask for uni-temporal or bi-temporal data. And we need a way to remove all temporal logic from application programs, isolate it in a separate layer of code, and invoke it declaratively.

Asserted Versioning is that enterprise solution.

Asserted Versioning as a Bridge and as a Destination

Asserted Versioning, either in the form of the AVF or of a home-grown implementation of its concepts, has value both as a bridge and as a destination.
As a bridge to a standards-based, vendor-supported implementation of bi-temporal data management, Asserted Versioning is a way to begin migrating databases and applications right away, using the DBMSs available today and the SQL available today. As a destination, Asserted Versioning is an implementation of a more complete semantics for bi-temporality than has yet been defined in the academic literature.

Asserted Versioning as a Bridge

Applications which manage temporal data intermingle code expressing subject-matter-specific business rules with code for managing the many different forms (history tables, version tables and the like) in which temporal data is stored. Queries which access temporal data in these databases cannot be written correctly without a deep knowledge of the specific schemas used to store the data, and of both the scope and limits of the semantics of that data. Assembling data from two or more temporal tables, whether in the same or in different physical databases, is likely to require complicated logic to mediate the discrepancies between different implementations of the same semantics.

As a bridge to the new SQL standards and to DBMS support for them, Asserted Versioning standardizes temporal semantics by removing history tables, various forms of version tables, transaction datasets, staging areas and logfile data from databases. In their place, Asserted Versioning provides a standard canonical form for bi-temporal data, that form being the Asserted Versioning schema used by all asserted version tables.

By implementing Asserted Versioning, businesses can begin to remove temporal logic from their applications, and at each point where often complex temporal logic is hardcoded inside an application program, they can begin to replace that code with a simple temporal insert, update or delete statement. Sometimes this will be difficult work. Some implementations of versioning, for example, are more convoluted than others.
The code that supports those implementations will be correspondingly difficult to identify, isolate and replace. But if a business is going to avail itself of standards-based temporal SQL and commercial support for those temporal extensions, as it surely will, then this work will have to be done sooner or later. With an Asserted Versioning Framework available to the business, that work can begin sooner rather than later. It can begin right now.

Asserted Versioning as a Destination

Even if the primary motivation for using the AVF (ours or a home-grown version) is as a bridge to standards-based and vendor-implemented bi-temporal functionality, that is certainly not its only value. For as soon as the AVF is installed, hundreds of person hours will typically be saved on every new project to introduce temporal data into a database. Based on our own consulting experience, which jointly spans about half a century and several dozen client engagements, we can confidently say, without exaggeration, that many large projects involving temporal data will save thousands of person hours.

Here's how. Temporal data modeling work that would otherwise have to be done will be eliminated. Project-specific designs for history tables or version tables, likely differing in some way from the many other designs that already exist in the databases across the enterprise, will no longer proliferate. Separate code to maintain these idiosyncratically different structures will no longer have to be written. Temporal entity integrity rules and temporal referential integrity rules will no longer be overlooked, or only partially or incorrectly implemented. Special instructions to those who will write the often complex sets of SQL transactions required to carry out what is a single insert, update or delete action from a business user perspective will no longer have to be provided and remembered each time a transaction is written.
Special instructions to those who will write queries against these tables, possibly joining them with slightly different temporal tables designed and written by some other project team, will no longer have to be provided and remembered each time a query is written.

When the first set of tables is converted to asserted version tables, seamless real-time access to bi-temporal data will be immediately available for that data. This is declaratively specified access, with the procedural complexities encapsulated within the AVF. In addition, the benefits of the internalization of pipeline datasets will also be made immediately available, this being one of the principal areas in which Asserted Versioning extends bi-temporal semantics beyond the semantics of the standard model.

We conclude that Asserted Versioning has value both as a bridge and as a destination. It is a bridge to a standards-based SQL that includes support for PERIOD datatypes, Allen relationships and the declarative specification of bi-temporal semantics. It is a destination in the sense that it is a currently available solution which provides the benefits of declaratively specified, seamless real-time access to bi-temporal data, including the extended semantics of objects, episodes and internalized pipeline datasets.

Ongoing Research and Development

Bi-temporal data is an ongoing research and development topic within the computer science and DBMS vendor communities. Most of that research will affect IT professionals only as products delivered to us, specifically in the form of enhancements to the SQL language and to relational DBMSs. But bi-temporal data and its management by means of Asserted Versioning's conceptual and software frameworks is an ongoing research and development topic for us as well. Some of this ongoing work will appear as future releases of the AVF.
Some of it will be published on our website, AssertedVersioning.com, and some of it will be made available as seminars. Following is a partial list of topics that we are working on as this book goes to press.

(i) An Asserted Versioning Ontology. A research topic. We have begun to formalize Asserted Versioning as an ontology by translating our Glossary into a first-order predicate logic (FOPL) axiomatic system. The undefined predicates of the system are being collected into a controlled vocabulary. Multiple taxonomies will be identified as KIND-OF threads running through the ontology. Theorems will be formally proved, demonstrating how automated inferencing can extract useful information from a collection of statements that are not organized as a database of tables, rows and columns.

(ii) Asserted Versioning and the Relational Model. A research topic. Bi-temporal extensions to the SQL language have been blocked for over 15 years, in large part because of objections that those extensions violate Codd's relational model and, in particular, his Information Principle. We will discuss those objections, especially as they apply to Asserted Versioning, and respond to them.

(iii) Deferred Transaction Workflow Management and the AVF. A development topic. When deferred assertion groups are moved backwards in assertion time, and when isolation cannot be maintained across the entire unit of work, violations of bi-temporal semantics may be exposed to the database user. We are developing a solution that identifies semantic components within and across deferred assertion groups, and moves those components backwards in a sequence that preserves temporal semantic integrity at each step of the process.

(iv) Asserted Versioning and Real-Time Data Warehousing. A methodology topic. Asserted Versioning supports bi-temporal tables in OLTP source system databases and/or Operational Data Stores.
It is a better solution to the management of near-term historical data than is real-time data warehousing, for several reasons. First, much near-term historical data remains operationally relevant, and must be as accessible to OLTP systems as current data is. Thus, it must either be maintained in ad hoc structures within OLTP systems, or retrieved from the data warehouse with poorly performing federated queries. Second, data warehouses, and indeed any collection of uni-temporal data, do not support the important as-was vs. as-is distinction. Third, real-time feeds to data warehouses change the warehousing paradigm. Data warehouses originally kept historical data about persistent objects as a time series of periodic snapshots. Real-time updating of warehouses forces versioning into warehouses, and the mixture of snapshots and versions is conceptually confused and confusing. Asserted Versioning makes real-time data warehousing neither necessary nor desirable.

(v) Temporalized Unique Indexes. A development topic. Values which are unique to one row in a conventional table may appear on any number of rows when the table is converted to an asserted version table. So unique indexes defined on conventional tables no longer hold after the conversion. To make those indexes unique again, both an assertion time period and an effective time period must be added to them. This reflects the fact that although those values are no longer unique across all rows in the converted table, they remain unique across all rows in the table at any one point in time, specifically at any one combination of assertion and effective time clock ticks.

(vi) Instead Of Triggers. A development topic. Instead Of triggers make views updatable. These updatable views make Asserted Versioning's temporal transactions look like conventional SQL. When invoked, the triggered code recognizes insert, update and delete statements as temporal transactions.
As described in this book, it will translate them into multiple physical transactions, apply TEI and TRI checks, and manage the processing of those physical transactions as atomic and isolated units of work. The AVF's use of Instead Of triggers is still work in progress as we go to press.

(vii) Java and Hibernate. A research and development topic. Hibernate is an object/relational persistence and query service framework for Java. It hides the complexities of SQL, and functions as a data access layer supporting object-oriented semantics (not to be confused with the semantics of objects, as Asserted Versioning uses that term). Hibernate and other frameworks can be used to invoke the AVF logic to enforce TEI and TRI while maintaining an Asserted Versioning bi-temporal database.

(viii) Archiving. A methodology topic. An important archiving issue is how to archive integral semantic units, i.e. how to archive without leaving “dangling references” to archived data in the source database. Assertions, versions, episodes and objects define integral semantic units, and we are developing an archiving strategy, and AVF support for it, based on those Asserted Versioning concepts.

(ix) Star Schema Temporal Data. A methodology topic. Bi-temporal dimensions can make the “cube explosion problem” unmanageable, and bi-temporal semantics do not apply to fact tables the same way they apply to dimension tables. We are developing a methodology for supporting both versioning and the as-was vs. as-is distinction in both fact and dimension tables.

Going Forward

We thank our readers who have stuck with us through an extended discussion of some very complex ideas. For those who would like to learn more about bi-temporal data, and about Asserted Versioning, we recommend that you visit our website, AssertedVersioning.com, and our webpage at Elsevier.com. At our website, we have also created a small sample database of asserted version tables.
Registered users can write both maintenance transactions and queries against that database. Because these tables contain data from all nine temporal categories, we recommend that interested readers first print out the contents of these tables before querying them. It is by comparing the full contents of those tables to query result sets that the work of each query can best be understood, and the semantic richness of the contents of Asserted Versioning databases best be appreciated.

Glossary References

Glossary entries whose definitions form strong interdependencies are grouped together in the following list. The same glossary entries may be grouped together in different ways at the end of different chapters, each grouping reflecting the semantic perspective of each chapter. There will usually be several other, and often many other, glossary entries that are not included in the list, and we recommend that the Glossary be consulted whenever an unfamiliar term is encountered.

ad hoc query
production query
Allen relationships
time period
as-is query
as-was query
asserted version table
assertion
assertion time

[...]

405 406 THE ASSERTED VERSIONING GLOSSARY
[...] refer to them using the word “date.” This is done for the same reason that all examples of points in time in the text, unless otherwise noted, are dates. This reason is simply convenience. Periods of time in either of the two bi-temporal dimensions are delimited by their starting point in time and ending point in time. These points in time may be timestamps, dates, ...

[...] books, introduced and developed Kimball's event-centric approach to managing historical data. Concepts such as dimensional data marts, the fact vs. dimension distinction, and star schemas and snowflake schemas are all grounded in Kimball's work, as is the entire range of OLAP and business intelligence software.
2000: Developing Time-Oriented Database Applications in SQL. R. T. Snodgrass, Developing Time-Oriented ...
[...] “Unobvious Redundancies in Relational Data Models, Part 2.” InfoManagement Direct (September 2001). http://www.information-management.com/infodirect/20010921/4017-1.html
Tom Johnston. “Unobvious Redundancies in Relational Data Models, Part 3.” InfoManagement Direct (September 2001). http://www.information-management.com/infodirect/20010928/4037-1.html
Tom Johnston. “Unobvious Redundancies in Relational Data ...

[...] Business and Information System.” IBM Systems Journal (1988), 27(1). To the best of our knowledge, this article is the origin of data warehousing in just as incontrovertible a sense as Dr. E. F. Codd's early articles were the origins of relational theory.
1996: Building the Data Warehouse. William Inmon, Building the Data Warehouse, 2nd ed. (John Wiley, 1996). (The first edition was apparently published in ...

[...] timestamps, dates, or any other point in time recognizable by the DBMS. As defined in this Glossary, they are clock ticks.
Components. Components of a definition are other Glossary entries used in the definition. Listing the components of every definition separately makes it easier to pick them out and follow cross-reference trails. The Components sections of these definitions are also working notes towards a formal ...

[...] relationships, have an inverse. The inverse of an Allen relationship or relationship group, between two time periods which do not both begin and end on the same clock tick, is the relationship in which the two time periods are reversed. Following Allen's original notation, we use a superscript suffix (x⁻¹) to denote the inverse relationship. Inverse relationships exist in all cases where one of the two time periods ...

[...] encapsulation in the first article, we did not distinguish between temporal and physical transactions. All in all, we do not believe that these articles can usefully be consulted to gain additional insight into the topics discussed in this book. Although we intended them as instructions to other modelers and developers on how to implement bi-temporal data in today's DBMSs, we now look back on them as an on-line ...

[...] available in PDF form, at no cost, at Dr. Snodgrass's website: http://www.cs.arizona.edu/people/rts/publications.html
2000: Primary Key Reengineering Projects. Tom Johnston. “Primary Key Reengineering Projects: The Problem.” Information Management Magazine (February 2000). http://www.information-management.com/issues/20000201/1866-1.html
Tom Johnston. “Primary Key Reengineering Projects: The Solution.” Information ...

[...] the main focus of Date, Darwen, and Lorentzos's book is column-level versioning. While the main focus of our book and Snodgrass's is on implementing temporal data management with today's DBMSs and today's SQL, the main focus of their book is on describing language extensions that contain new operators for manipulating versioned data.
401 402 Appendix BIBLIOGRAPHICAL ESSAY
2007: Time and Time Again. This ...

[...] number of installments, began in the May 2007 issue of DM Review magazine, now Information Management. The entire set, amounting to some 50 articles and columns combined, ended in June of 2009. Although we had designed and built bi-temporal databases prior to writing these articles, our ideas evolved a great deal in the process of writing them. For example, although we emphasized the importance of maintenance ...

Posted: 26/01/2014, 08:20
