Turning Data Iinto Useful Information

Susan AndreSurvival Strategy Column

Data warehouse projects sometimes fail because basic issues are ignored. Fundamentals such as identifying a trusted source, training users and ongoing analysis should be addressed from the outset.

With nearly any project, mart or warehouse, the rub will be both the extraction and the cleansing of data. In some cases extraction proves to be a big undertaking and the time required to clean the data is often under-estimated.

Trust

"Confidence in the quality of the data is essential for the project to be a success."

Companies that populate the initial repository with legacy data will face a similar problem. However, by implementing a few policies early on, these problems will pose a minimal delay on the project. The chances are that various systems will have different ways of entering data.

The data warehouse team should examine all data sources, decide on a uniform structure, map which changes need to be made to which data source, and devise a plan for dealing with anomalies.

Identify a trustworthy source

When the question of data accuracy becomes an issue, a source needs to be chosen and used as the baseline. It may be necessary to spot-check the database for accuracy to ensure the quality of the chosen system.

Developers should also tell users how data quality has been established and invite them to comment on this issue. Confidence in the quality of the data is essential for the project to be a success.

When dealing with data, it's not just the data acquisition, it's also the integrity of the data that affects the warehouse. If there isn't a way to check back to see if the data is fine when it comes from the source system, there will be problems. Confidence in the quality of the data is essential for the project to be a success.

While modelling the data and developing extraction methods and policies, we can decide on several other policies of importance such as how often to reload records, what are the data transformation rules and the metadata strategy.

Train users

Training occurs at the prototyping stage and beyond. A good technique is to include respected, helpful people in the prototyping committee. This way the warehouse help desk might be relieved of some of its calls. Training should also be geared toward a mix of users. The executive branch will probably run different types of queries than those run by the sales department. Also, this is the time to introduce online documentation and avenues where users can freely report bugs and offer suggestions for improvements.

Ongoing analysis

Finally, feedback is important. Zero feedback could mean people aren't using the system. Part of the initial budget and timeline should include periodic interviews with the pre-engineering team and/or the prototyping team. Developers can then be sure that the system is responding to the needs of users as the company grows. Also, at three months, six months, and semi-annually thereafter, the performance of the system should be modelled again and compared to previous models. Developers can watch how performance develops as users grow comfortable with the system and incorporate it more tightly into their work culture.

The important point to remember is that we are building a production system for business analysis. This means that even though we are moving through the analysis-to-build process in a rapid iterative form, all the due diligence processes required to properly certify the data warehouse for production must still be complied with.

The aggressive time frames inherent in today's design of operational and analytic information systems dictate that parallel work streams become the norm. Knowledge workers need to get information from their current production systems even though little or no help is available from IS, queries against the data need to be asked in business terms, and questions are often ad hoc or nonprocedural in nature.

What differentiates data warehousing from operational systems design is that the process of data collection and operational reporting can be precisely determined. We also need to ensure that during the requirements gathering process we understand what it is that we need to build.

More than 90% of all initial user requirements will be operational-reporting based. If the client is not in marketing or senior management, he/she will ask for operational reporting and not long-term trending and business direction types of enquiries. The general rule of thumb is to assume an ODS will be required and not a data warehouse. If designing an ODS we will build a data store which is: Not time variant (limited time window for review and analysis). Integrated (in terms of data collection and consolidation). Volatile (overwriting data on a continual basis and not retain history). Subject-oriented (common profiles refined from collection of subsets of same topic information).

What we will require as a common building block for this environment is a data staging process where we can start thinking about the extract, transform and load (ETL) process and how it will evolve as the data mart or ODS grows, and integrate it within an overall data warehouse architecture. Again, both the infrastructure and analysis teams need to put their heads together to determine how much ETL is enough for the current set of project requirements and what to consider as we move forward.

A few simple rules guide the interaction between the components of a system. First, in a business context, managers should attend to relationships at all levels within their organisations. The second rule is that small changes can have large effects. And third, interesting and unpredictable properties can be expected to emerge from a system. As a result, it is hard, if not impossible, to implement a strategic plan for anything but the short-term. A hoped-for direction can be set but not the ultimate goal.

The challenge for managers is to feel comfortable merely setting the direction for the future, and to be ready to adapt and evolve as the environment changes.