Data Warehouse

What is a data warehouse?

A data warehouse is a central repository that consolidates an organization's data. The data in a data warehouse often comes from a variety of sources across marketing, sales, finance and operations. In most cases, the data in the warehouse has already been cleaned and processed for consumption.

From a conceptual point of view, a data warehouse shares many properties with a traditional database. From a technical perspective, however, a data warehouse is optimized for analytical queries over large volumes of data rather than for day-to-day transactional workloads. A data warehouse can also contain multiple databases, each with its own tables. Most data warehouse vendors offer their technology in the cloud with consumption-based pricing.

A data warehouse offers multiple benefits to an organization:

  • By combining all data sources, decision-makers can make informed decisions
  • Because all the data is consolidated in a central repository, it can be easily consumed
  • Because a cloud data warehouse can scale its storage as data grows, an organization can keep a long history of its data
  • When all data goes through managed data pipelines, it is more likely to be consistent and accurate
  • Because analytics is separated from the operational databases, analytical queries do not interfere with operational processes, and each system can be optimized for its specific purpose
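
As an illustration of the first benefit, the sketch below joins data from two sources in one place to answer a question that neither source could answer alone. It is a minimal example under stated assumptions: SQLite stands in for a real data warehouse, and the table names, columns and figures are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a data warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Hypothetical tables that would normally be loaded from separate source systems.
conn.executescript("""
CREATE TABLE sales (month TEXT, channel TEXT, revenue REAL);
CREATE TABLE marketing_spend (month TEXT, channel TEXT, spend REAL);
INSERT INTO sales VALUES
    ('2024-01', 'email', 1200.0),
    ('2024-01', 'ads',   3000.0),
    ('2024-02', 'ads',   4500.0);
INSERT INTO marketing_spend VALUES
    ('2024-01', 'email',  100.0),
    ('2024-01', 'ads',   1500.0),
    ('2024-02', 'ads',   1500.0);
""")

# Combine sales and marketing data: revenue earned per unit of spend, per channel.
query = """
SELECT s.channel,
       SUM(s.revenue) AS revenue,
       SUM(m.spend) AS spend,
       SUM(s.revenue) / SUM(m.spend) AS revenue_per_spend
FROM sales AS s
JOIN marketing_spend AS m
  ON m.month = s.month AND m.channel = s.channel
GROUP BY s.channel
ORDER BY s.channel
"""
report = conn.execute(query).fetchall()

for channel, revenue, spend, ratio in report:
    print(f"{channel}: revenue={revenue}, spend={spend}, revenue per spend={ratio}")
```

In practice the same query would run against the warehouse itself, with the two tables kept up to date by data pipelines.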

The difference from a data lake

Like a data warehouse, a data lake is a centralized repository for data. Unlike a data warehouse, however, a data lake does not require a tabular format and can contain semi-structured and unstructured data. A data lake and a data warehouse can coexist in the same data pipeline: the data lake holds the raw data, which is then processed and stored in the data warehouse, ready for consumption.
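
The flow from data lake to data warehouse can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: the raw records, the cleaning rules and the `orders` table are all hypothetical, a Python list stands in for the data lake, and SQLite stands in for the warehouse.

```python
import json
import sqlite3

# Hypothetical raw records as they might sit in a data lake: semi-structured
# JSON lines with inconsistent fields and formatting.
RAW_LAKE_RECORDS = [
    '{"order_id": 1, "customer": " Alice ", "amount": "19.90", "channel": "web"}',
    '{"order_id": 2, "customer": "bob", "amount": 5}',
    '{"order_id": 3, "customer": "Carol", "amount": "n/a", "notes": {"gift": true}}',
]


def clean(record):
    """Return an (order_id, customer, amount, channel) row, or None if unusable."""
    try:
        amount = float(record["amount"])
    except (KeyError, TypeError, ValueError):
        return None  # drop records without a usable amount
    customer = str(record.get("customer", "")).strip().title()
    return (record["order_id"], customer, amount, record.get("channel", "unknown"))


def load_into_warehouse(raw_lines):
    """Parse raw lake records, clean them, and store them in a tabular format."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse (assumption)
    conn.execute(
        "CREATE TABLE orders ("
        "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, channel TEXT)"
    )
    parsed = (clean(json.loads(line)) for line in raw_lines)
    rows = [row for row in parsed if row is not None]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return conn


warehouse = load_into_warehouse(RAW_LAKE_RECORDS)
loaded = warehouse.execute(
    "SELECT order_id, customer, amount, channel FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

The raw records keep whatever shape they arrived in, while the warehouse table only receives rows that fit its fixed, tabular schema.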