Author: Etienne Oosthuysen
I am increasingly asked by customers – Is the Data Warehouse dead?
In technology terms, 30 years is a long time. This is how old the Data Warehouse is – that makes the Data Warehouse an old timer. Can we consider it a mature yet productive worker, or is it a worker gearing up for a pension?
I come from the world of Data Warehouse architecture and in the mid to late naughties (2003 to 2010) whilst working for various high profile financial service institutions in the London market, Data Warehouses were considered all important and companies spent a lot of money on their design, development, and maintenance. The prevailing consensus was that you could not get meaningful, validated and trusted information to business users for decision support without a Data Warehouse (whether it followed an Inmon, or a Kimbal methodology – the pros and cons of which are not under the spotlight here). The only alternative for companies without the means to commit to the substantial investment typically associated with a Data Warehouse was to allow Report Writers to develop code against the source systems database (or a landed version thereof), but this, of course, leads to the proliferation of reports, and it caused a massive maintenance nightmare and it went against every notion of a single trusted source of the truth.
Jump ahead to 2011, and businesses started showing a reluctance to invest in Data Warehouses – a trend that accelerated from that point onward. My observations of the reasoning for this ranged from the cost involved, the lack of quick ROI, a low take-up rate, difficulty to align it to ongoing business change, and, more recently, a change in the variety, volume and velocity of data that businesses are interested in.
Does all of this mean the Data Warehouse is dead/ dying? Is it an old timer getting ready for pension, or does it still have years of productive contribution to a corporate data landscaper left?
My experience across the Business Intelligence and Data Analytics market, across multiple industries and technology taught me that:
A Data Warehouse is no longer a must-have for meaningful, validated and trusted information to the business users for decision support. As explained in the previous article the PaaS, SaaS and IaaS services that focus on Data Analytics (for example the Cortana Intelligence Suite in Azure (https://www.microsoft.com/en-au/cloud-platform/cortana-intelligence-suite), or the Amazon Analytics Products (https://aws.amazon.com/products/analytics/) allows for modular solutions that can be provisioned as required which collectively answers all the Data Analytics challenges and ensures data gets to users (no matter where it originates, its format, its velocity or its volume) fast, validated and in a business-friendly format.
But this does not mean that these modular Data Platforms that use a clever mix of PaaS, Saas, and IaaS services can easily provide some of the fundamental services provided by a Data Warehouse (or more accurately, components typically associated with a Data Warehouse), such as:
- Where operational systems do not track history and the analytical requirements require such history to be tracked through (for example slowly changing dimensionality type 2).
- Where business rules and transformations are so complex that it makes sense to define the rules and transformations by way of detailed analysis and for it to be hardcoded into the landscape through code and materialised data in structures that the business can understand and is often reused (for example dimensions and facts resulting from complex business rules and transformations).
- Where complex hierarchies are required by the reporting and self-service layer.
- To assist regulatory requirements such as proven data lineage, reconciliation, and retention by law (for example for Solvency II, Basel II and III and Sarbanes-Oxley).
Where these requirements exist, a Data Warehouse (or more accurately, components typically associated with a Data Warehouse) is required. But even in those cases, a Data Warehouse (or more accurately, components typically associated with a Data Warehouse) will merely form part of a much larger Data Analytics Data Landscape. It will perform the workloads described above, and there is a larger data story delivered by complimentary services.
In the past, Data Warehouses were key to delivering optimized analytical models that normally manifested themselves in materialized Data Mart Star Schemas (the end result of a series of layers such as ODS, staging, etc.) Such optimized analytical models are now instead handled by business-friendly metadata layers (e.g. Semantic Models) that source data from any appropriate source of information, bringing fragmented sources together in models that are quick to develop and easy for the business to consume. These sources include those objects typically associated with a Data Warehouse/ Mart (for example materialized SCD2 Dimensions, materialized facts resulting from complex business rules, entities created for regulatory purposes, etc.) and they are blended with data from a plethora of additional sources. The business user still experiences that clean and easy to consume Star Schema-like model. The business-friendly metadata layer becomes the Data Mart, but is easier to develop, provides a quicker ROI, is much more responsive to business change, etc.
Conclusion
The data warehouse is not dead but its primary role as we knew it is fading. It is becoming complementary to a larger Data Analytics Platforms we see evolving. Some of its components will continue to fulfil a central role, but it will be surrounded by all manner of services and collectively these will fulfil the organisation’s data needs.
In addition, we see the evolution of Data Warehouse as a Service (DWaaS). This is not a Data Warehouse in the typical sense of the word as spoken of in this article, but rather a service optimized for Analytical Workloads. Can it serve those requirements typically associated with a Data Warehouse such as SCD2, materialization due to complex rules, hierarchies or regulatory requirements? Absolutely. But its existence does not change the need for those modular targeted architectures and the need for a much larger Data Analytics Data Landscape using a variety of PaaS (including DWaaS), IaaS and SaaS. It merely makes the hosting of typical DW workloads much simpler, better performing and more cost-effective. Examples of DWaaS are Microsoft’s Azure SQL DW and Amazon’s Redshift.