Roel Peters

Data Lakehouse

What is a data lakehouse?

For years, two competing paradigms existed side by side: the data lake and the data warehouse. The first offers extremely cheap storage for unstructured data; the second offers easy querying of structured data.

A data lakehouse is a component of the data architecture that combines the benefits of a data lake and a data warehouse. More specifically, you get both cheap storage and the ability to query the objects you stored as if they were structured data.

As an end user, this means that your organization’s data is stored as files (often .parquet) alongside associated metadata files, all structured in a specific way, so it can be read as raw files or queried as tables. Popular table formats that define this structure are Apache Hudi, Apache Iceberg and Delta Lake.
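To make the idea of "data files plus metadata files" concrete, here is a minimal sketch of a toy table layout using only the Python standard library. It is an illustration under loud assumptions, not the on-disk format of Hudi, Iceberg or Delta Lake: CSV stands in for Parquet, and the `_log` directory and the `commit` and `scan` helpers are hypothetical names made up for this example.

```python
import csv
import json
import tempfile
from pathlib import Path
from typing import Iterator, Optional


def commit(table_dir: Path, rows: list) -> int:
    """Write rows to a new data file and record it in a new metadata file."""
    log_dir = table_dir / "_log"  # hypothetical metadata directory
    log_dir.mkdir(parents=True, exist_ok=True)
    version = len(list(log_dir.glob("*.json")))  # next version number

    # 1. The data itself is just a file sitting in cheap storage.
    data_file = f"part-{version:05d}.csv"
    with open(table_dir / data_file, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

    # 2. The metadata file records which data file this version adds.
    entry = {"add": data_file, "schema": list(rows[0]), "num_rows": len(rows)}
    (log_dir / f"{version:05d}.json").write_text(json.dumps(entry))
    return version


def scan(table_dir: Path, as_of: Optional[int] = None) -> Iterator[dict]:
    """Yield the table's rows by replaying the metadata up to version as_of."""
    log_files = sorted((table_dir / "_log").glob("*.json"))
    if as_of is not None:
        log_files = log_files[: as_of + 1]
    for log_file in log_files:
        entry = json.loads(log_file.read_text())
        with open(table_dir / entry["add"], newline="") as f:
            yield from csv.DictReader(f)


# Usage: two commits produce two data files and two metadata files.
table = Path(tempfile.mkdtemp()) / "orders"
commit(table, [{"country": "BE", "amount": 10}, {"country": "NL", "amount": 5}])
commit(table, [{"country": "BE", "amount": 7}])

# Query the files as if they were one structured table.
total_be = sum(int(r["amount"]) for r in scan(table) if r["country"] == "BE")
rows_at_v0 = list(scan(table, as_of=0))  # the table as it was after commit 0
```

The point of the sketch is that readers never list the storage directory themselves: they ask the metadata which files make up the table, which is also what makes it possible to read an older version of the same table.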

If you wonder what the big deal is about data lakehouses, keep in mind the benefits described above: cheap storage and structured querying on the same data. In practice, this means that data analysts and business intelligence experts can work with the same data sources as their data science colleagues.

