7 Comments
Jul 8, 2023 · Liked by Chad Sanderson

Thank you for this. Framing data mesh as data’s application of microservices was helpful for me as someone relatively new to both. Overall, this has me thinking about what architectures are possible and ideal for my enterprise. Looking forward to your next post.


Really enjoyed this entry. Wondering if you'd be willing to elaborate on this point, which probably could be its own blog post.

When talking about source of truth datasets and the ability to fork the data for new R&D purposes, you also noted that "Data producers should be aware of changes to their dependencies when these promotions occur. They should be aware of the impact backward incompatible changes might have on data dependents as well."

My question is: what approaches have you seen work well for helping data producers stay aware of all their downstream dependencies when many independent R&D projects could depend on the same underlying data?

author

You're right, that one line could almost certainly be a blog post on its own. I think there are levels to increasing the awareness of data producers. I always recommend starting from the foundations: what are the business-critical data assets at the company, what is the lineage of those assets, and what happens if the pipeline fails or the upstream data changes unexpectedly? There are programmatic approaches to this that are more sophisticated, but I find most data teams don't even know what is important, where the data comes from, or what the risk profile of those assets is.

Communicating the dependencies to producers is a really good starting point, but ultimately you want to alert folks in the developer workflow, as early as possible.
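To make that concrete, here's a rough sketch of the kind of check you could run in a producer's CI so a breaking schema change surfaces before merge. All the names here (the schemas, the consumer registry) are made up for illustration; in a real setup they'd come from a contract registry or catalog.

```python
# Hypothetical CI check: fail a producer's pull request when a proposed schema
# drops or retypes columns that registered downstream consumers depend on.
# Schemas and the consumer registry are illustrative placeholders.

CURRENT_SCHEMA = {"order_id": "string", "amount": "decimal", "currency": "string"}
PROPOSED_SCHEMA = {"order_id": "string", "amount_cents": "integer"}  # breaking change

# In practice this mapping would come from a catalog or contract registry.
registered_consumers = {
    "amount": ["finance_daily_revenue", "ml_fraud_features"],
    "currency": ["finance_daily_revenue"],
}

def breaking_changes(current: dict, proposed: dict) -> list[tuple[str, str]]:
    """Return (column, reason) pairs for removed or retyped columns."""
    issues = []
    for column, dtype in current.items():
        if column not in proposed:
            issues.append((column, "removed"))
        elif proposed[column] != dtype:
            issues.append((column, f"retyped {dtype} -> {proposed[column]}"))
    return issues

if __name__ == "__main__":
    problems = breaking_changes(CURRENT_SCHEMA, PROPOSED_SCHEMA)
    for column, reason in problems:
        dependents = registered_consumers.get(column, ["unknown"])
        print(f"BLOCKING: column '{column}' {reason}; downstream dependents: {dependents}")
    if problems:
        raise SystemExit(1)  # fail the CI job so the producer sees the impact before merging
```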


Doesn’t event sourcing address this issue? Having a persistent stream of raw events that consumers can subscribe to, even after the fact? That said, I have found adopting event sourcing with existing systems is much harder, because people never really get over the fact that their old database is no longer the source of truth but just one particular read model. It's easier to do for new systems.
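For anyone unfamiliar, here's a minimal sketch of the idea: an append-only log of immutable events, with read models derived purely by replaying it, so a consumer added later can still reconstruct full state. It's illustrative only, not tied to any particular broker or framework.

```python
# Minimal event-sourcing sketch: producers append immutable events to a log;
# any consumer, even one that subscribes after the fact, can replay the full
# log to build its own read model.

from dataclasses import dataclass, field
from typing import Callable

@dataclass(frozen=True)
class Event:
    kind: str
    payload: dict

@dataclass
class EventLog:
    events: list[Event] = field(default_factory=list)

    def append(self, event: Event) -> None:
        self.events.append(event)  # append-only; events are never mutated

    def replay(self, apply: Callable[[dict, Event], dict], state: dict) -> dict:
        for event in self.events:
            state = apply(state, event)
        return state

def account_balance(state: dict, event: Event) -> dict:
    # One possible read model: balances per account, derived purely from events.
    account = event.payload["account"]
    if event.kind == "deposit":
        state[account] = state.get(account, 0) + event.payload["amount"]
    elif event.kind == "withdrawal":
        state[account] = state.get(account, 0) - event.payload["amount"]
    return state

log = EventLog()
log.append(Event("deposit", {"account": "a1", "amount": 100}))
log.append(Event("withdrawal", {"account": "a1", "amount": 30}))

# A consumer that subscribes "after the fact" still gets the full history.
print(log.replay(account_balance, {}))  # {'a1': 70}
```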

author

Yes and no. Event sourcing is good, but only if A) the events are what the data team actually needs, B) the database does not evolve separately from the events themselves, and C) all new data from the database is regularly captured as an event. In practice, these three things almost never happen together.

Beyond the practicalities of implementing event sourcing, most database engineers don't know where in the production code an event is emitted, and tracking down all the event sources can be complicated and time-consuming. There are also cases where multiple microservices collect the same data, which doesn't solve point A. There are some clever ways to build aggregate topics and enrich events from multiple sources, but it's a good amount of work and already requires a strong degree of buy-in.
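As a toy illustration of the aggregate-topic idea, here's what joining events from two services into one enriched event can look like. Topic names and fields are invented, and a real stream processor (Kafka Streams, Flink, etc.) would also have to handle late and out-of-order events.

```python
# Illustrative sketch of an "aggregate topic": events from two services are
# joined to emit one enriched event that analytics consumers read.

orders_topic = [
    {"order_id": "o-1", "customer_id": "c-9", "total": 42.0},
]
customers_topic = [
    {"customer_id": "c-9", "segment": "enterprise", "region": "EU"},
]

def build_enriched_orders(orders, customers):
    # Keep the latest customer profile seen per key; join orders against it.
    latest_customer = {c["customer_id"]: c for c in customers}
    for order in orders:
        profile = latest_customer.get(order["customer_id"], {})
        yield {**order, "segment": profile.get("segment"), "region": profile.get("region")}

for event in build_enriched_orders(orders_topic, customers_topic):
    print(event)  # emitted to the aggregate topic downstream consumers subscribe to
```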

I see event sourcing a lot like Data Mesh: a great theoretical end state with lots of tactical and cultural roadblocks in the way.


Well articulated, thanks Chad!


Great write-up, Chad! As someone deeply involved in the data lifecycle and a big supporter of democratizing data, I found your take on microservices and data extremely insightful. Your emphasis on the need for a trustworthy source of truth resonates well with the concepts we're advocating at dlt. It's fascinating to see how our thoughts align, especially when it comes to ensuring data quality and ownership.

I particularly appreciate your discussion of incrementally promoting data assets to higher quality once their use case is established. At dlt, we're passionate about embedding governance from the ground up. In our recent blog post, Governance and Democracy in a Data Mesh World (https://dlthub.com/blog/governance-democracy-mesh), we touched on this by highlighting the importance of shift-left data democracy (SLDD). We support schema evolution and data contracts to maintain data integrity right from the ingestion phase. This approach helps address backward-compatibility and data-quality issues without burdening producers with unnecessary overhead until it’s needed.
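To make that concrete, here's a rough sketch of attaching a schema contract to a resource at ingestion time; the exact schema_contract keys and modes shown are illustrative and may differ between dlt versions.

```python
# Rough sketch: a dlt resource with a schema contract enforced at ingestion.
# Contract keys and modes are illustrative and may vary by dlt version.
import dlt

@dlt.resource(
    name="users",
    schema_contract={
        "tables": "evolve",          # new tables may still appear
        "columns": "freeze",         # unexpected new columns fail the load instead of silently landing
        "data_type": "discard_row",  # rows with incompatible types are dropped
    },
)
def users():
    yield {"id": 1, "email": "a@example.com"}

pipeline = dlt.pipeline(pipeline_name="users_demo", destination="duckdb", dataset_name="raw_users")
pipeline.run(users())
```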

Looking forward to reading more of your insightful perspectives!

Best,

Aman Gupta

DLT Team
