Lawrence Corr of DecisionOne Consulting visits South Africa again this November to run a Data Warehouse Design and Development Master Class, hosted by Alicornio Africa. In this article he questions the corporate information factory approach to data warehousing.
Data warehouses have been with us since the early 1990s. From almost the very beginning there have been two schools of thought on how they should be designed. Either you believed entity relationship modelling and normalised data models were the right way to design all databases, including data warehouses, or you believed that dimensional modelling and star schemas were more suitable for query and reporting. Each approach had its champion: Bill Inmon, the ‘father of data warehousing’, advocated the normalised enterprise data warehouse (EDW), and Ralph Kimball recommended dimensional models. So began the Inmon vs. Kimball holy war.
By the late nineties Kimball’s techniques had gained widespread acceptance amongst practitioners and database vendors alike, who found that dimensional data warehouses were easier for business users to understand and had better query performance than their 3NF equivalents.
Undaunted, Inmon mounted an influence-preserving rearguard action in the form of the Corporate Information Factory (CIF). This conceptual framework for decision support begrudgingly embraced dimensional models in their ‘right’ place: data marts sourced from the EDW. Inmon and his followers argued that the CIF offered the best of both worlds, employing the necessary mixture of normalised and dimensional database designs for data management and end user access benefits.
However, for many dimensional practitioners, including myself, the CIF offers the worst of all worlds, as it continues to lock away the detailed data in a virtually unqueryable EDW database and advocates providing most users with dimensional data marts containing summarised data.
Why would you do this? One CIF argument, that most users will not require detail data, simply doesn’t hold water. It is true that business intelligence style reports seldom display detail records, but users typically need to constrain these highly aggregated reports by attributes that are only available at the detail level. They also need the flexibility to drill down not just on the standard hierarchies of time, location and organisation but on any attribute. Isn’t one of the main reasons we go down the data warehousing route to free business users from the constraints of summary data?
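To make the point concrete, here is a minimal sketch in Python using the standard sqlite3 module. It assumes a tiny atomic-level star schema; the table names (fact_sales, dim_product), the packaging attribute and the sample rows are all invented for illustration and come from no real system. The report is a highly aggregated one (sales by category), yet the constraint is on an attribute that only exists at the detail level.

```python
import sqlite3

# Hypothetical attribute names; only these may be used for grouping or filtering.
PRODUCT_ATTRIBUTES = {"product_name", "category", "packaging"}


def build_sales_star(conn):
    """Create an illustrative atomic-level star: one fact table, one dimension."""
    conn.executescript("""
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT,
            packaging    TEXT   -- detail-level attribute, outside any hierarchy
        );
        CREATE TABLE fact_sales (
            date_key    TEXT,
            product_key INTEGER REFERENCES dim_product(product_key),
            quantity    INTEGER,
            amount      INTEGER
        );
    """)
    conn.executemany("INSERT INTO dim_product VALUES (?, ?, ?, ?)", [
        (1, "Cola 330ml", "Drinks", "Can"),
        (2, "Cola 2l", "Drinks", "Bottle"),
        (3, "Crisps", "Snacks", "Bag"),
    ])
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
        ("2004-11-01", 1, 10, 5),
        ("2004-11-01", 2, 4, 6),
        ("2004-11-02", 1, 6, 3),
        ("2004-11-02", 3, 8, 4),
    ])
    conn.commit()


def sales_by(conn, group_attr, filter_attr=None, filter_value=None):
    """Total sales amount grouped by any product attribute, optionally
    constrained by another attribute, including detail-level ones."""
    if group_attr not in PRODUCT_ATTRIBUTES:
        raise ValueError(f"unknown attribute: {group_attr}")
    sql = (
        f"SELECT p.{group_attr}, SUM(f.amount) "
        "FROM fact_sales AS f "
        "JOIN dim_product AS p ON p.product_key = f.product_key "
    )
    params = []
    if filter_attr is not None:
        if filter_attr not in PRODUCT_ATTRIBUTES:
            raise ValueError(f"unknown attribute: {filter_attr}")
        sql += f"WHERE p.{filter_attr} = ? "
        params.append(filter_value)
    sql += f"GROUP BY p.{group_attr} ORDER BY p.{group_attr}"
    return conn.execute(sql, params).fetchall()
```

Calling sales_by(conn, "category", "packaging", "Can") on this star gives a category-level total restricted to canned products, and sales_by(conn, "packaging") drills down on an attribute that sits in no standard hierarchy. Neither question could be answered from a mart that had been pre-summarised to category level.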
Experience has taught many of us that it is no use introducing a shiny new KPI dashboard or ‘Executive Information System’, as we used to call them, without making the detail behind it readily available too.
The CIF architecture of course has an answer to this. A full CIF implementation augments the EDW and OLAP data marts with several additional databases: an operational data store (ODS), operational marts, and data mining and exploration warehouses, which do provide more detail for those who need it. If this sounds like a BIG IT solution then it is. There are many stages for data to pass through and many locations and formats in which to hold the same data, and they will inevitably introduce additional complexities, costs and latency compared to loading the atomic-level detail data directly into a dimensional warehouse.
If you or your company are being sold the CIF approach, here are a few questions to ask your CIF consultant and the likely answers you will get.
Why are so many database layers needed?
Part of the CIF justification comes from Inmon’s early writing, which identifies users by simplistic behaviour labels such as explorers, tourists and farmers that bear no relationship to actual work patterns. In reality users don’t fall into pigeon holes, and the line between operational reporting and analytical reporting got blurred a long time ago. Users don’t want to have to go to several different marts or warehouses and perform disconnected analysis.
Why can’t a data warehouse which provides all the necessary functionality be built using dimensional models?
A typical response is “dimensional modelling starts and ends with its focus primarily on the individual business unit”. This would only be true if you chose not to consider the big picture and intended to build a standalone data mart. Ralph Kimball has written many times about the use of conformed dimensions and a dimensional bus matrix when planning an enterprise dimensional warehouse. The CIF argument chooses to completely ignore dozens of articles and,
Another CIF response is that “dimensional models focus on known requirements and presuppose the questions. They can’t answer the unknown question and that is the role of the data warehouse” or “star schema limits the usefulness of data marts for complete and unbiased data mining and statistical analysis …”. Most of these CIF answers are based on the premise that dimensional models can only cope with aggregated data. In most cases there isn’t a technical limitation anymore, and without imposing an artificial one the need for an EDW goes away. By the way, the data mining and statistical analysis argument is a red herring: these tools don’t concern themselves with the data model, they want flat file or table extracts, which can be built from any data model.
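As a rough illustration of that last point, the sketch below flattens a small star schema into a single CSV extract of the kind a mining or statistics tool might read. It is only a sketch under assumptions: the schema, the column names and the sample rows are hypothetical, and Python’s standard sqlite3 and csv modules stand in for whatever database and extract tooling you actually use.

```python
import csv
import io
import sqlite3


def make_demo_star():
    """Build a hypothetical in-memory star: two dimensions and one fact table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_key INTEGER PRIMARY KEY, segment TEXT, region TEXT);
        CREATE TABLE dim_product (
            product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_orders (
            customer_key INTEGER, product_key INTEGER,
            quantity INTEGER, amount INTEGER);
        INSERT INTO dim_customer VALUES (1, 'Retail', 'North');
        INSERT INTO dim_customer VALUES (2, 'Corporate', 'South');
        INSERT INTO dim_product VALUES (10, 'Drinks');
        INSERT INTO dim_product VALUES (20, 'Snacks');
        INSERT INTO fact_orders VALUES (1, 10, 2, 30);
        INSERT INTO fact_orders VALUES (2, 20, 1, 12);
        INSERT INTO fact_orders VALUES (1, 20, 5, 60);
    """)
    return conn


def flat_extract_csv(conn):
    """Join every fact row to its dimensions and return one flat CSV table,
    one line per atomic fact, with a header row."""
    cursor = conn.execute("""
        SELECT c.segment  AS segment,
               c.region   AS region,
               p.category AS category,
               f.quantity AS quantity,
               f.amount   AS amount
        FROM fact_orders AS f
        JOIN dim_customer AS c ON c.customer_key = f.customer_key
        JOIN dim_product  AS p ON p.product_key  = f.product_key
        ORDER BY f.rowid
    """)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return buffer.getvalue()
```

The extract is just the fact table with its dimension attributes joined in. A normalised model could produce the same file with more joins, which is the point: the tool sees the flat table, not the model behind it.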
How do you build a CIF incrementally?
CIF proponents will now agree that you need to build data warehouses incrementally and that while it is desirable to design an EDW only a subset of the enterprise data need be loaded per iteration. But what is this subset or unit of iteration? They typically talk about subject areas such as Product or Customer. Neither of these entity-focused subject areas represents a manageable chunk of corporate data that I would choose to bite off in a single iteration. Surely product or customer cut right across your business and implicate almost every significant source system. The dimensional modelling approach is to iterate by business process or significant measurable event. These are our manageable subject areas and our integer unit of deliverable work is a star schema. This typically limits our scope to a single operational source per iteration. If the scope must be wider there is the opportunity for teams to work on multiple sources and stars in parallel.
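Iterating by business process can be sketched with a toy version of the bus matrix mentioned earlier. The processes and dimensions below are invented examples, not taken from any real project, and the planning logic is my own simplification: each iteration delivers one star, builds only the dimensions that do not exist yet, and reuses the conformed ones already in place.

```python
from collections import Counter

# Hypothetical bus matrix: business process -> dimensions its star schema uses.
BUS_MATRIX = {
    "orders": {"date", "customer", "product"},
    "shipments": {"date", "customer", "product", "warehouse"},
    "payments": {"date", "customer"},
}


def conformed_dimensions(matrix):
    """Dimensions shared by more than one business process."""
    counts = Counter(dim for dims in matrix.values() for dim in dims)
    return sorted(dim for dim, uses in counts.items() if uses > 1)


def plan_iterations(matrix, order):
    """For each process, in delivery order, list the dimensions that must be
    built new and the ones reused from earlier iterations."""
    built = set()
    plan = []
    for process in order:
        dims = matrix[process]
        plan.append((process, sorted(dims - built), sorted(dims & built)))
        built |= dims
    return plan
```

With this matrix, plan_iterations(BUS_MATRIX, ["orders", "shipments", "payments"]) shows the first star carrying the cost of the shared dimensions, while later stars add little or nothing new. That is why the effort per iteration tends to fall once the conformed dimensions are in place.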
OK, the gloves need to come off. I don’t believe a full implementation of the corporate information factory exists anywhere outside of a book by Bill Inmon or Claudia Imhoff. I believe the CIF is an expensive smoke screen which has given an extended lease of life to the tired old EDW. We need to stop propping it up with more data marts with fancy names. It is time to consign both the CIF and normalised data warehouse to the great IT burial ground and move on.
Lawrence Corr
From 15 to 19 November 2004 at the Unisys Auditorium, Sunninghill, Johannesburg, Lawrence Corr and Joe Caserta will present a Data Warehouse Design and Development Master Class which includes the Extended Dimensional Analysis and Design and ETL Design courses, respectively. For further information and for course registrations, contact Fiona King at Alicornio Africa on (011) 258 8739, email FionaK@alicornio.co.za or visit www.alicornio.co.za
Lawrence Corr is a leading data warehouse design specialist and highly experienced educator. He has taught data warehouse design courses across Europe, North America and Africa. Having worked in decision support for the past 18 years, he later channelled his expertise into the arena of data warehousing and has since 1996 become recognised as a world leader in the field. He has advised on numerous data warehouse projects in Europe, USA, Middle East and Africa and has developed and reviewed designs for clients within the industries of Insurance, Aerospace, Pharmaceuticals, Engineering, Manufacturing, Telecoms, Financial Services and Retail. In recent years, he held the position of data warehouse practice leader at Linc Systems Corporation, CT, USA and was vice-president of data warehousing products at Teleran Technologies, NJ, USA. In 2000 he was invited by Dr. Ralph Kimball to become an associate and has taught data warehousing classes for Kimball University in Europe and South Africa. He now works independently through his own company based in the UK, providing dimensional data warehouse consultancy and education worldwide.
About Alicornio Africa
For the past six years, Alicornio Africa has serviced an impressive array of blue chip and other companies. In addition, they have continuously added to the knowledge pool of their industry by hosting leading international experts in the field in training and discussion forums. They have established themselves as leaders in the education of the expert and layman alike in the intricacies of the highly complex arena of business intelligence, information integration management, data warehousing architecture, design and implementation. Alicornio Africa’s consulting services are both product and non-product specific. The company distributes and implements DOC1 for Customer Communication Management, the Sagent Solution for Information Integration and Istante software for RTE.