Unpacking Untruths about Modern Data Warehousing

Susan Andre

What is on the wish list of most business information users? They are yearning for new analytical solutions that:

  • Support multiple reporting tools in a self-service mode that is easy to use
  • Allow rapid ingestion of all kinds of new data files without consulting IT
  • Do not need extensive modeling
  • Scale to large data volumes while delivering performance
  • Support advanced analytics such as machine learning and text mining
  • Allow users to cleanse and process the data iteratively
  • Track lineage of data for compliance
  • Offer advanced yet easy search features to explore structured, unstructured, internal, and external data from multiple sources in one secure place

The fallacy:

  • A single solution is available to match the above needs
  • The solution that fits all of the above criteria is the Data Lake
  • All data is magically available for analytical purposes and fit for use

Indeed, these are the promises of eager vendors who have little or no practical experience in information integration and the preparation of data for analytical purposes.

Important factors such as accuracy and quality must be weighed against the demands for flexibility and self-service, while data governance is applied at the same time.

Let’s unpack the untruths:

  • Demand: Support multiple reporting tools in a self-service mode that is easy to use
    • Reality: The ease of use of current tools is questionable. Technical prowess is a prerequisite for assembling and cleaning data and for gleaning insight from it
  • Demand: Allow rapid ingestion of all kinds of new data files without consulting IT
    • Reality: Technical expertise is required to deal with the challenges of setting up access to various data sources, data cleansing, data integration and data privacy. Complexity has only increased with the advent of many different data sources
  • Demand: Do not need extensive modeling
    • Reality: Correctly modelled data provides a framework that guides users. Without it, finding relevant, fit-for-purpose data will be difficult, if not virtually impossible
  • Demand: Scale to large data volumes while delivering performance
    • Reality: Correctly modelled data is fully scalable and will contain the lowest level of detailed information available. Performance tuning remains a technical task in both traditional and modern environments
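To make the point about modelling concrete, here is a minimal sketch of a dimensionally modelled (star schema) data set, using Python's built-in sqlite3 module. The table names, column names and sample rows are all hypothetical and chosen purely for illustration; a real warehouse would have far more dimensions and history handling. The idea it shows is that the fact table holds the lowest level of detail, while the dimension tables give users a predictable way to slice it.

```python
import sqlite3


def build_star_schema(conn):
    """Create a tiny, hypothetical star schema and load sample rows."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_key  INTEGER PRIMARY KEY,
            product_name TEXT,
            category     TEXT
        );
        CREATE TABLE dim_date (
            date_key      INTEGER PRIMARY KEY,
            calendar_date TEXT,
            month         TEXT
        );
        -- The fact table stores one row per sale: the lowest level of detail.
        CREATE TABLE fact_sales (
            product_key INTEGER REFERENCES dim_product(product_key),
            date_key    INTEGER REFERENCES dim_date(date_key),
            quantity    INTEGER,
            amount      REAL
        );
        """
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Widget", "Hardware"), (2, "Gadget", "Hardware"), (3, "Manual", "Books")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20240101, "2024-01-01", "2024-01"), (20240215, "2024-02-15", "2024-02")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            (1, 20240101, 2, 20.0),
            (2, 20240101, 1, 15.0),
            (3, 20240215, 4, 40.0),
            (1, 20240215, 1, 10.0),
        ],
    )


def sales_by_category(conn):
    """Total sales amount per product category, via the product dimension."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY p.category
        """
    ).fetchall()
    return dict(rows)


def sales_by_month(conn):
    """Total sales amount per month, via the date dimension."""
    rows = conn.execute(
        """
        SELECT d.month, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        GROUP BY d.month
        """
    ).fetchall()
    return dict(rows)


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    build_star_schema(connection)
    print(sales_by_category(connection))
    print(sales_by_month(connection))
```

Both questions are answered from the same detailed facts simply by joining to a different dimension, which is the kind of guidance for users that an unmodelled pile of files does not offer.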

Setting up the analytical information platform continues to present a myriad of challenges. Advocates of “Data Lake only” implementations tend not to mention the following issues:

  • Complexities are just moved to the analytical area
  • Inconsistent and overlapping data
  • Hidden difficulties and costly maintenance
  • Redundant data extracts and transformations
  • No standardization
  • Little re-use
  • Problems due to lack of metadata
  • Duplicated data
  • Data privacy complications
  • Ongoing data integration issues

Data Lakes and Data Warehouses are complementary concepts emerging from different business needs and technological possibilities. Functions should be distributed based on best fit. Urgent new insights based on modern and mostly external data sources (big data) can be gleaned from the Data Lake using new technologies. The traditional dimensionally modelled Data Warehouse continues to support the day-to-day analytical requests.

While vast quantities of raw data form an ideal basis for predictive and prescriptive analytics, it is still the traditional Data Warehouse that provides a consistent, reconciled, integrated and accurate data foundation across business functions and departments, whilst offering legally binding and traceable processes.

Consideration must be given to keeping data and processes well managed, so that business continuity and legal compliance are maintained.

Written by Susan Andrè

Alicornio Africa offers courses on how to complement and extend the lifespan of the existing Data Warehouse by positioning the role of the Data Lake. The courses give a comprehensive, technology-independent overview of all aspects of BI / DW and the related areas.

Concepts, Design and Modelling for Extended Data Warehousing teaches sound principles to be applied no matter what technology is used.