Thursday, August 15, 2019

Data Lake vs Data Warehouse

Is a data lake part of your data warehouse platform or does the data lake sit beside it? There is a fair amount of ambiguity as to what a data lake is and how it should fit into your overall data strategy. 

I believe data lakes (coupled with elastic cloud storage and compute) are a game changer in both the DW and BI world. Your data warehousing strategy should be part of your data lake strategy, not the other way around. While you don't have to throw away everything you have done or learned in your traditional ETL and DW world, the fundamentals have changed.

To take advantage of your data and build better BI/analytics, you must build atop a solid data lake foundation. And this goes well beyond the failed Big Data and Hadoop projects that many enterprises have experienced in the recent past.

While Hadoop was a necessary step forward at the time, it was and is an evolutionary dead end - RIP Hadoop. Cloud data lakes are the future, and they are more than putting your data into S3 buckets.

Well-architected data lakes are the culmination of a succinct data management strategy that leverages the strengths of cloud services along with many traditional DW best practices and data governance policies.

