8 Comments

Great read Chad, easy to follow and compelling!

Great read. The most important point is that this is, at its root, an organizational issue. Data quality issues mostly originate in the source systems and source processes, and they impact the data warehouse and the analytics downstream. As outlined, the problem is the combination of separate ownership of the source systems and of reporting and analytics, together with data quality issues that cut across both. There needs to be an organizational function above both, to which data quality issues and their impact can be escalated, so that it can decide whether the effort to make changes at the source is justified and needs to be done.

Interesting read. I expect that when you start trying to answer those 4 questions, you will discover that you need to regulate your data environment, and that the regulation will have to be imposed rather than left to self-regulation. Enter Data Governance and possibly interface agreements or data contracts.

One thing to consider when implementing data governance is what happens when one of the teams doesn’t play nicely. That is, what sort of ‘enforceable’ sanctions will the business apply to someone who breaks the rules? Look at your company’s financial governance and how it responds to internal rule breakers. That should form the basis for your data governance.

Not only that, data governance sanctions should be the equal of financial governance sanctions. Anything less and senior management will ride roughshod over you and you’ll be back to where you started but will have spent a lot of money and time to get there.

I’ve always thought it would be great to embed DEs on product/engineering teams in the same way some teams have an ops engineer and a QA person. I guess it is similar to having embedded (or decentralized) analysts, and it would require a strong foundation to maintain consistency.

Excited to follow along and hear more of your thoughts.

Good post. My 2 cents: having a data team that supports the modern data stack and establishes the rules for how to ingest, document and test the raw datasets, while letting the product teams be responsible for their own data ingestion, is a win, at least for us. Whatever happens afterwards (operational/activation data via reverse ETL, a reporting layer, or even an ML layer) is another story, and this setup gives us the flexibility to use that cleaned raw data the way we want.

This is going to be fun 🔥
