Yet another clear explanation from one of the best communication teams in the data community! We are obviously living through an evolution of data platforms, and reading Chad and Mark is like seeing ahead into the future of data-governance-as-code.
Nice post, but I still haven't found a technical implementation. For example, with Collibra SaaS as the catalog, imagine some "magic easy trick" has created the data product table schemas, all tagged etc. I know Collibra has a REST API which I think supports CRUD operations on all assets (tables, columns), but I still haven't found realistic posts or implementations covering two things:
1. DDL (schema, masking, etc.): either Collibra pushes down to the database (imagine Snowflake), or some custom client code keeps polling Collibra for changes, right? And if there are any, it generates DDL based on the schema changes returned by the REST API.
2. Data quality (data contracts): here I imagine, like in the flow diagram you showed in one of your posts, schema validation happening at certain points. I suppose that prior to this, some custom client code calls the Collibra REST API, pulls the specific schema, and saves it locally, to be used at a couple of points in the flow to compare against the actual schema of the data file. (This probably already exists when using Iceberg, but for plain files in S3 you need either inferSchema(files) or some other automatic approach. AWS Glue seems poor here, as it needs a manual trigger and then polling to check whether schema inference has completed.)
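The polling idea in point 1 could be sketched roughly as below. This is a minimal illustration, not a Collibra-specific implementation: it assumes the catalog schema has already been fetched via the REST API and parsed into a plain name-to-type dict (the fetch and the dict format are assumptions), and it only diffs that against the warehouse's current columns to emit DDL.

```python
# Hypothetical sketch: diff a catalog-sourced schema against the current
# warehouse schema and emit ALTER TABLE statements. The dict shapes and
# table name are illustrative assumptions; fetching from the catalog's
# REST API and reading INFORMATION_SCHEMA are omitted.

def diff_to_ddl(table, catalog_cols, db_cols):
    """catalog_cols / db_cols: {column_name: sql_type} dicts -> list of DDL."""
    ddl = []
    for name, sql_type in catalog_cols.items():
        if name not in db_cols:
            ddl.append(f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};")
        elif db_cols[name] != sql_type:
            # Note: Snowflake only allows limited in-place type changes
            # (e.g. widening VARCHAR); other changes need a column rebuild.
            ddl.append(
                f"ALTER TABLE {table} ALTER COLUMN {name} SET DATA TYPE {sql_type};"
            )
    for name in db_cols:
        if name not in catalog_cols:
            ddl.append(f"ALTER TABLE {table} DROP COLUMN {name};")
    return ddl

# Illustrative data, not real catalog output:
catalog = {"id": "NUMBER", "email": "VARCHAR", "created_at": "TIMESTAMP_NTZ"}
current = {"id": "NUMBER", "email": "NUMBER"}
for stmt in diff_to_ddl("analytics.customers", catalog, current):
    print(stmt)
```

In practice the generated statements would be reviewed or gated before execution rather than applied blindly, since drops and type changes are destructive.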
I would love feedback on the above. I tried posting in the Collibra community, but their forum has very low activity (compared with the Snowflake one). Looking forward to your insights. Thanks 🙏✨😊
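The validation step in point 2 could look something like this minimal sketch. It assumes the contract schema was previously pulled from the catalog and cached locally as a name-to-type dict, and that the actual file schema has been inferred by some reader (for Parquet on S3, something like pyarrow's schema reading would be one option instead of Glue); both dict formats here are assumptions.

```python
# Hypothetical sketch: compare an inferred file schema against a cached
# contract schema at a validation point in the flow. Both inputs are
# {column_name: type_name} dicts; the type names are illustrative.

def validate_schema(contract, actual):
    """Return a list of human-readable violations (empty = contract satisfied)."""
    violations = []
    for name, expected in contract.items():
        if name not in actual:
            violations.append(f"missing column: {name}")
        elif actual[name] != expected:
            violations.append(
                f"type mismatch on {name}: expected {expected}, got {actual[name]}"
            )
    for name in actual:
        if name not in contract:
            violations.append(f"unexpected column: {name}")
    return violations

# Illustrative data, not real catalog output:
contract = {"id": "int64", "email": "string"}
inferred = {"id": "int64", "email": "int64", "extra": "string"}
for v in validate_schema(contract, inferred):
    print(v)
```

A pipeline would typically quarantine or reject the file when the list is non-empty, rather than letting it flow downstream.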
Thank you for your kind words, Gordon, and for taking the time to read this!
I really like the grocery shopping analogy, thank you.
Also noticed that the link to "The Rise of Data Contracts" is broken.