Designing the Operational Data Store

来源:互联网 发布:linux系统查看cpu个数 编辑:程序博客网 时间:2024/06/05 17:05

Information Management Magazine, July 1998

Bill Inmon

Recentlythere has been controversy over the validity and makeup of thearchitectural structure known as the operational data store (ODS). Someskeptics question the existence of the ODS. This argument is quitestrange because the ODS is one of the most pervasive architecturalstructures found in information systems today. The notion that the ODSis not a legitimate structure is news to SAP, Oracle Financials andPeopleSoft--three of the most widely implemented pieces of software inthe 1990s which happen to contain major components which are decidedlyODSs. While some aspects of these software packages are beyond thebounds of an ODS, many of the features of these software packagessquarely fit the paradigm of an ODS.

Likewhat you see? Click here to sign up for Information Management's dailynewsletter to get the latest news, trends, commentary and more.

As further evidence ofthe health of the ODS, in a recent private conference, seveninformation systems directors of large, well-known companies spent timedescribing their environment. The ODS was a prominent feature of eachof these companies' information systems architecture. So it is peculiarthat industry experts are questioning the validity of the ODS. Perhapsthese experts simply do not understand what an ODS is and whatfunctions it performs.

The Architectural Positioning

Inorder to have a discussion about ODSs, the conversation best beginswith a schematic that shows how an ODS is architecturally positioned.Figure 1 shows the classical positioning of the ODS.

Advertisement

<SCRIPT language="JavaScript1.1"SRC="http://ad.doubleclick.net/adj/information-management.com/;abr=!ie;pg=ros;sz=468x60;pos=3;tile=5;ord=13763546?"></SCRIPT>

InFigure 1 the ODS is seen to be an architectural structure that is fedby integration and transformation (i/t) programs. These i/t programscan be the same programs as the ones that feed the data warehouse orthey can be separate programs. The ODS, in turn, feeds data to the datawarehouse.

Some operational data traverses directly into the datawarehouse through the i/t layer while other operational data passesfrom the operational foundation into the i/t layer, then into the ODSand on into the data warehouse.

An ODS is an integrated, subject- oriented, volatile (including update), current-valued structure designedto serve operational users as they do high performance integratedprocessing. (Note: For a comprehensive discussion of the subject ofoperational data stores, refer to the book, Building the Operational Data Store,by W. H. Inmon, Claudia Imhoff and Greg Battas, published by John Wiley& Sons. This article will not try to restate concepts anddescriptions that have been in the public domain for quite a while.)

Theessence of an ODS is the enablement of integrated, collective on-lineprocessing. An ODS delivers consistent high transactionperformance--two to three seconds. An ODS supports on-line update. AnODS is integrated across many applications. An ODS provides afoundation for collective, up-to- the-second views of the enterprise.And, at the same time, the ODS supports decision support processing.

Becauseof the many roles that an ODS fulfills, it is a complex structure. Itsunderlying technology is complex. Its design is complex. Monitoring andmaintaining the ODS is complex.

The ODS takes a long time toimplement (e.g., SAP). The ODS requires changing or replacing oldlegacy systems that are unintegrated.

The Dual Role of the ODS

Thereis a very dual role played by the ODS. On the one hand, the ODS isdecidedly operational. The ODS provides high response time and highavailability and is certainly qualified to act as the basis ofmission-critical systems. On the other hand, the ODS has some veryclear DSS characteristics. The ODS is integrated, subject oriented andsupports some important kinds of decision support.

The Users--Farmers and Explorers

Thisarticle will focus on one of the more misunderstood aspects of theODS--the foundation of the design. In order to understand thefoundation of the design of the ODS, you first need to understand thattwo very different types of users are attracted to the ODS--farmers andexplorers.

The first user of the ODS is a user who can be calleda "farmer." Farmers are those people who do the same task repetitively.Farmers know what they want when they set out to search for something.Farmers look at small amounts of data with each transaction. Farmersalmost always find what they want. Farmers usually find small flakes ofgold, not huge nuggets, at the completion of their transaction. Farmersoperate in a world of structure--structured data, structuredprocessing, structured procedures and so forth.

The other type ofuser that is served by the ODS is the quot;explorer." The explorer isthe antithesis of the farmer. The explorer operates in a random manner.The explorer does not know what he/she is looking for at the outset ofthe analysis. Explorers operate in a heuristic mode. Explorers look atvery large sets of data. Explorers look for associations between typesof data, patterns that are useful and relationships that haveheretofore never been discovered. The explorer often finds nothing as aresult of an analysis, but occasionally the explorer finds huge nuggetsof gold. Explorers operate in a pattern that defies prediction. Theexplorer operates in an almost completely unstructured manner.

The ODS and Explorers and Farmers

TheODS must satisfy the needs of both the farmer and the explorer; andbecause of this paradox, the design of the ODS is a difficult task inthe best of circumstances.

The Basis of Design in DSS

Theclassical design of the structures found in the DSS environment beginswith a data model, which reflects the informational needs of thecorporation. Figure 2 shows the steps leading to a DSS design.

Normalizedtables are generated from the data model. These tables constitute whatcan be described as a logical design. The many normalized tables arecombined into a form of physical design that can be described aslightly normalized design. In a lightly normalized design, tables arecombined on the basis of containing common keys and general commonusage.

The design technique of creating normalized/lightlynormalized structures based on a data model that has been describedhere fits many instances of DSS design. But there is a fly in theointment of this approach. When the issues of performance where many tables must be joined, performance where there are many occurrences of data that will populate the design, and simplicitywhere users find it unnatural to join many tables together to representdata in a form comprehensible to the end user each time the end userdoes a transaction are considered, the design technique of light normalization yields marginal results.

Analternate design approach is to take into consideration the volume andusage of the data. When the volume and usage of the data are factoredinto the design, a mutant form of normalization is achieved. The lightnormalization turns into heavy normalization, and a structure known asthe "star join" is created. (See Figure 3.)

Thereare two essential parts to a star join-- fact tables and dimensiontables. (Note: For an in-depth discussion of the subject ofmultidimensional design, refer to Ralph Kimball's book, The Data Warehouse Toolkit: Practical Techniques for Building Dimensional Data Warehouses,published by John Wiley & Sons. This book is the definitive sourcefor the subject of multidimensional database design.) The fact tablerepresents the structure that holds the majority of the occurrences ofthe data. Fact tables typically combine data and cross reference keysfrom a variety of other tables.

The other type of table thatparticipates in a star join is the dimension table. Dimension tablescontain data which is not terribly voluminous. Dimension tables arerelated to fact tables by means of a foreign key relationship.

Facttables are efficient to access because data has been prejoined into thetable at the moment of loading. The end user is able to access facttables efficiently because the fact tables are extremely streamlined intheir design. In addition, the fact table is familiar to the end user,in terms of the day-to- day structuring of data that the end user isaccustomed to seeing.

By building star joins, the designer hascreated a structure for efficient access, large volumes of data andnatural end-user viewing. However, there is a problem with star joins.In order to know how to create the star join, the designer must makeassumptions about the usage of the data. Stated differently, withoutknowing the predominant pattern of access and usage of the data, youcannot create a star join. At the heart of the design of any star joinis the implicit understanding of how the data in the star join is to beused. Unfortunately, one department will look at data very differentlyfrom another department. The star join for finance will be verydifferent than the star join for production, for example.

Thereis a second problem with star join structures, and that problem is thaton-line update plays havoc with the underlying data management requiredto make the star join complete. In a DSS world where there is noupdate, this is not a problem. But in an ODS world where on-line updateis a normal event, the inability of the star join to gracefully handleupdates presents a special challenge.

A Dilemma

Thus, theODS designer has a dilemma. On the one hand, the designer wishes tohave efficiency of access and the ability to handle large amounts ofdata. On the other hand, the ODS designer must design the system to beable to accommodate a wide variety of users. The following tableillustrates the dilemma of the ODS database designer:

Thedesigner in the ODS environment faces Hobson's choice. Neither designapproach--normalized or star join--is optimal for the ODS. Bothapproaches have their strengths and weaknesses.

The way thesophisticated designer goes about solving this apparent contradictionis to go back to the users of the system. For those parts of the systemused primarily by explorers, a normalized design is optimal. Explorersdo not know how they are going to use the system, so normalizationsuits them just fine. For those parts of the system used primarily byfarmers, a star join approach is optimal. Since farmers have apredictable and repetitive usage pattern, a star join can be created toallow them optimal access. Figure 4 shows this dual design approach forthe ODS.

Thenext factor that must be accounted for is the issue of update or pureDSS processing. Some farmers do no update. They are the "pure" DSSprocessors. Other farmers do update as a regular part of their ODSprocessing.

Explorers, however, seldom do on-line update. Ifexplorers do update at all, it is by creating sweeping batch programsthat march across entire tables and make massive changes. But explorersare not known for making changes, certainly not on-line updates. Figure5 shows that the proper basis of design for an ODS is entirelydependent on who is using the ODS and what kind of work they are doing.

If the ODS is used only byfarmers doing DSS processing, then an exclusive star join approach isin order for the entire ODS. But if update processing is being done byfarmers or if there is usage of the ODS by explorers to any extent,then one or the other form of normalization is in order. If the ODS isused only by explorers, then a normalized approach is in order for the entire ODS.

Thisarticle has addressed the architectural structure of an ODS and how itis architecturally positioned. The ODS has a dual design objective,which is quite different from other database structures found in theworld of DSS and operational systems.

BillInmon is universally recognized as the father of the data warehouse. Hehas more than 35 years of database technology management experience anddata warehouse design expertise. His books have been translated intonine languages. He is known globally for his seminars on developingdata warehouses and has been a keynote speaker for many major computingassociations. For more information, visit www.inmongif.com and www.inmoncif.com. Inmon may be reached at (303) 681-6772.

For more information on related topics, visit the following channels:

  • DW Basics
  • Operational Data Store
原创粉丝点击