Implementing Data Contracts for Entities
Great article! It is good to read about an actual technical implementation of a topic that is mostly discussed at a more conceptual and functional level. I would love to get some insight into a technical solution like this, but for the typical batch/file-based exchange of data between producers and the data lake/warehouse.
"The entity gates of reality" :)
Love it. Having implemented these kinds of things throughout my career, I agree that the main problem, as you point out, is the enforcement. This ends up going down one of two paths:
1) The battle over the PR. Criteria are added to the PRs of the product engineers/app owners (or as you say, Data Producers). This almost never works in my experience, though I'd like to know how people have been successful with it.
2) Chargebacks. If you screw up, you pay. This is 'the stick' approach. This type of model is generally hard to implement at tech and digital-native companies that have less mature budget management, though it is fairly common at large organizations in tech, digital, or really any industry. I'm also curious about success stories on when to enforce 'the stick' at startups, scaleups, etc.
Your Data Quality Camp link does not work.
That's a pretty great article, and it makes a lot of sense to me.
I'd say the requirements for contracts are pretty spot on and draw a lot from proven software engineering practices.
I'd say the hardest to implement are "Data contracts must be enforced at the producer level" and then "Data contracts cover semantics" (so I am really curious to see part 3).
Data contracts must be enforced at the producer level - that seems hardest. On one hand, the producer (e.g. the software engineering team creating the product or service) needs to be aware of the impact of changes in the semantics of the data (changes which may not alter product behaviour but would alter analytics). On the other hand, there are purely seasonal changes (normal semantic/data drift) that depend on things outside even the producer's control and can still change the semantics of the data.
"A contract by definition requires enforcement." I probably missed this in the text, but who will be in charge of this enforcement? Let's say I have a DevOps team and a DataOps team with product owners, and then a CIO at the top of the org.