
Modern Data Stack Diagrammed

30 Mar 2022

This post is dedicated to all those engineering managers who’ve found themselves suddenly having to manage a data team and now need to get up to speed before Tuesday’s planning meeting.

Modern Data Stack Data Flow Diagram

A Data Warehouse stores structured data that’s been transformed and augmented to support regular and ad-hoc reporting and analysis. It’s the database that Tableau connects to.

A Data Mart is a data repository that supports specialized reporting needs, e.g., near real-time reporting, or purchase records with masked customer names.
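
To make the masked-names example concrete, here is a minimal sketch of a data mart built as a view over a warehouse table. It uses an in-memory SQLite database as a stand-in for the warehouse; the table, view, and column names (`warehouse_purchases`, `mart_purchases_masked`, and so on) and the masking rule are assumptions made up for illustration, not a recommended schema.

```python
import sqlite3


def build_masked_purchases_mart(conn):
    """Create a view that exposes purchases with customer names masked.

    Hypothetical mart: it keeps only the first letter of each name.
    """
    conn.execute(
        """
        CREATE VIEW IF NOT EXISTS mart_purchases_masked AS
        SELECT
            purchase_id,
            substr(customer_name, 1, 1) || '***' AS customer_name,
            amount_cents,
            purchased_at
        FROM warehouse_purchases
        """
    )


# In-memory SQLite stands in for the real warehouse (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_purchases ("
    "purchase_id INTEGER PRIMARY KEY, customer_name TEXT, "
    "amount_cents INTEGER, purchased_at TEXT)"
)
conn.executemany(
    "INSERT INTO warehouse_purchases VALUES (?, ?, ?, ?)",
    [
        (1, "Ada Lovelace", 1999, "2022-03-01"),
        (2, "Grace Hopper", 4500, "2022-03-02"),
    ],
)

build_masked_purchases_mart(conn)

# Analysts query the mart, never the underlying table.
rows = conn.execute(
    "SELECT purchase_id, customer_name, amount_cents "
    "FROM mart_purchases_masked ORDER BY purchase_id"
).fetchall()
print(rows)
```

The point of the view is that the reporting team gets its own narrow, purpose-built surface while the full records stay in the warehouse.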

A Data Lake is a collection of repositories of unstructured or semi-structured data, in which data is stored in as close to “raw” form as possible. Data lakes composed of BigQuery and Google Cloud Storage are common.

A Data Lakehouse provides a programmatic interface to data stored in a data lake. In practice it’s a collection of tools, services and wiki entries with instructions on how to retrieve the data.
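
As a rough sketch of how the lake and lakehouse ideas fit together, the snippet below lands raw payloads untouched in a partitioned folder (the lake) and exposes one small function for reading them back (the programmatic interface). A local temp directory stands in for object storage such as Google Cloud Storage, and the folder layout and the function names `land_raw_event` and `read_events` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def land_raw_event(lake_root, source, day, payload):
    """Append one payload, unmodified, to <root>/<source>/<day>/events.jsonl."""
    partition = Path(lake_root) / source / day
    partition.mkdir(parents=True, exist_ok=True)
    with open(partition / "events.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(payload) + "\n")


def read_events(lake_root, source, day):
    """Return every raw event for one source and day, or [] if none landed."""
    path = Path(lake_root) / source / day / "events.jsonl"
    if not path.exists():
        return []
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# A temp directory stands in for the real bucket (an assumption).
lake_root = tempfile.mkdtemp()
land_raw_event(lake_root, "stripe", "2022-03-30", {"id": "evt_1", "amount": 1999})
land_raw_event(
    lake_root, "stripe", "2022-03-30",
    {"id": "evt_2", "amount": 4500, "refunded": True},
)

events = read_events(lake_root, "stripe", "2022-03-30")
print(events)
```

Note that the two events don't share a schema. That is the "raw" part; making sense of the differences is left to whoever reads the data.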

Bonus definition: Reverse ETL. This is the process of copying data back into the ERP system. Extracting data from multiple sales channels (Stripe, PayPal, out-of-band contracts) to generate a consolidated sales report, and then uploading that report into Dynamics 365, is an example of reverse ETL.
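
The sales-report example can be sketched in a few lines. This is an illustration under stated assumptions, not a real integration: the record shape (`month`, `amount_cents`), the function names, and the `send` callback are all hypothetical, and no actual Stripe, PayPal, or Dynamics 365 API is called. In a real pipeline `send` would wrap whatever import mechanism your ERP offers.

```python
from collections import defaultdict


def consolidate_sales(*channels):
    """Sum sales per month across any number of channel extracts."""
    totals = defaultdict(int)
    for records in channels:
        for record in records:
            totals[record["month"]] += record["amount_cents"]
    return [{"month": m, "total_cents": totals[m]} for m in sorted(totals)]


def upload_to_erp(report, send):
    """Push each report row to the ERP through the supplied `send` callable."""
    for row in report:
        send(row)
    return len(report)


# Made-up extracts, one list per sales channel.
stripe = [
    {"month": "2022-02", "amount_cents": 12000},
    {"month": "2022-03", "amount_cents": 8000},
]
paypal = [{"month": "2022-03", "amount_cents": 5000}]
contracts = [{"month": "2022-03", "amount_cents": 250000}]

report = consolidate_sales(stripe, paypal, contracts)

# A plain list stands in for the ERP so the sketch runs anywhere.
sent = []
uploaded = upload_to_erp(report, sent.append)
print(report, uploaded)
```

Passing `send` in as a parameter keeps the consolidation logic testable without touching the ERP at all.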


👋🏽 If you’re up to your elbows in performance reports and need an extra pair of hands supporting your data team and infrastructure, say hi. I’m available for hire 🎉.