The Existential Threat of Data Quality

Chad Sanderson

May 30, 2022

And Why the Modern Data Stack Can't Solve It

8 Comments

Arpit Choudhury

Jun 1, 2022

Great read Chad, easy to follow and compelling!

Robert Gröver

Jun 24, 2022

Great read, and the most important point is that this is an organizational topic at its root. Data quality issues mostly originate from the sources and source processes, and they impact the data warehouse and the analytics downstream. As outlined, the problem constellation is the separated ownership of source systems and of reporting and analytics, combined with an overarching data quality issue. There needs to be an organizational function above both, to which data quality issues and their impact can be escalated, in order to decide whether the effort of changing the source is justified and needs to be done.

Rick Sjogren

Jun 20, 2022

Interesting read. I expect that when you start trying to answer those 4 questions, you will discover that you need to regulate your data environment and that the regulation will be imposed rather than self-imposed. Enter Data Governance and possibly interface agreements or data contracts.

One thing to consider when implementing data governance is what happens when one of the teams doesn’t play nicely. That is, what sort of ‘enforceable’ sanctions will the business apply to someone who breaks the rules? Look at your company’s financial governance and how it responds to internal rule breakers. That should form the basis for your data governance.

Not only that, data governance sanctions should be the equal of financial governance sanctions. Anything less and senior management will ride roughshod over you and you’ll be back to where you started but will have spent a lot of money and time to get there.

Spencer Weeks

Jun 7, 2022

I’ve always thought it would be great to embed DEs on product/engineering teams in the same way some teams have an ops engineer and a QA person. I guess it is similar to having embedded (or decentralized) analysts and would require a strong foundation to maintain consistency.

Excited to follow along and hear more of your thoughts.

Deck1187hw

Jun 2, 2022

Good post. My 2 cents: having a data team that supports the modern data stack and establishes the rules on how to ingest, document, and test the raw datasets, while letting the product teams be responsible for their own data ingestion, is a win, at least for us. Whatever happens afterwards (operational/activation data via reverse ETL, the reporting layer, or even the ML layer) is another story, and it gives us the flexibility to use that raw, cleaned data the way we want.
