Programmatic Accountability in Batch
Great post :) As an ex-DWH team in a company DL scenario, we frequently find ourselves tackling problems like this. This is a really nice and broad overview!
One of our current projects is a company-wide dictionary on how to name fields and their contents. It serves as the basis for both data discovery input and contract generation. Let’s see where we end up.
This post is pure gold. It's a must-read if you own a data platform or a data warehouse and you want to ensure that data quality gets better over time. The challenge is to get the executive buy-in for the upfront effort and investment required.
Well done with this post. It is concise yet very informative.
I have one question though, you have stated that:
Similar to contracts in production services, contracts in the warehouse should be implemented in code and version controlled. The implementation of contracts can take many forms depending on your data tech stack and can be spread across tools.
Considering our tech stack includes dbt as well, could you consider the dbt model itself (with tests, metadata, metrics, etc.) the definition of a data contract?
The advantage of that over Protobuf, for example, is that I don't need to write custom code to set up the monitor. As you mentioned, dbt + Great Expectations can validate the schema and the semantic layer.
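To make the question a bit more concrete, here is a minimal sketch of what "a contract implemented in code" could look like, independent of dbt, Great Expectations, or Protobuf. It is plain Python and not any tool's API; `FieldContract`, `validate`, `ORDERS_CONTRACT`, and the column names are all hypothetical, made up for illustration. It covers both a schema check (field present, type, nullability) and a simple semantic rule.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class FieldContract:
    """One column in a hypothetical data contract."""
    name: str
    dtype: type
    nullable: bool = False
    # Optional semantic rule, e.g. "amount must be non-negative".
    check: Optional[Callable[[Any], bool]] = None


def validate(rows, contract):
    """Return a list of violations; an empty list means the contract holds."""
    violations = []
    for i, row in enumerate(rows):
        for field in contract:
            if field.name not in row:
                violations.append(f"row {i}: missing field '{field.name}'")
                continue
            value = row[field.name]
            if value is None:
                if not field.nullable:
                    violations.append(f"row {i}: '{field.name}' is null")
                continue
            if not isinstance(value, field.dtype):
                violations.append(
                    f"row {i}: '{field.name}' is not {field.dtype.__name__}"
                )
                continue
            if field.check is not None and not field.check(value):
                violations.append(f"row {i}: '{field.name}' failed its check")
    return violations


# Hypothetical contract for an "orders" table; it lives in version control.
ORDERS_CONTRACT = [
    FieldContract("order_id", int),
    FieldContract("amount", float, check=lambda v: v >= 0),
    FieldContract("coupon_code", str, nullable=True),
]

if __name__ == "__main__":
    batch = [
        {"order_id": 1, "amount": 9.5, "coupon_code": None},
        {"order_id": "x", "amount": -1.0},
    ]
    for problem in validate(batch, ORDERS_CONTRACT):
        print(problem)
```

A dbt model with column tests would express roughly the same information declaratively, which is the appeal: the checks come with the model definition instead of living in hand-written validation code like the above.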