Data mesh: the beginning, revisited

In 2019, Nextdata founder Zhamak Dehghani wrote a blog post titled “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” which kicked off the data mesh movement.

Since then, we’ve seen tremendous changes in how organizations think about and use their data. But the pain points and challenges of centralized data management that Zhamak surfaced more than five years ago are as pressing as ever. In many ways, these issues have grown worse.

This article revisits the key points from Zhamak’s original blog post and references the seminal texts that helped inspire it—all of which are even more relevant today as enterprise AI use cases are finally reaching production.

Takeaways from the original article 

When “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh” was published, organizations had spent nearly a decade applying domain-driven design to their software architecture, yet had disregarded domain concepts in their data platforms.

Three generations of IT infrastructure led to the use of centralized, monolithic data lakes, which had become an increasingly problematic practice.

Source: martinfowler.com

In response, these were Zhamak’s key points:

Each generation of enterprise data platforms faced a distinct set of challenges

  • First generation: Proprietary DW and BI platforms incurred large costs and technical debt
  • Second generation: Big Data ecosystems, with data lakes as the assumed solution, led to long batch jobs and a centralized team of hyper-specialized data engineers
  • Third generation: Data lakes with streaming added, hosted in the cloud, promised cost optimization and real-time data analytics but relied on the same failing Big Data-era patterns

The paradigm shift toward data mesh is based on how data converges with other key technology advances:

  • Convergence of data and distributed domain-driven architecture
  • Convergence of data and product thinking
  • Convergence of data and self-serve platform design
  • Convergence of data governance and automation

This requires a new set of governing principles accompanied by new language: 

  • Serving instead of ingesting
  • Discovering and using instead of extracting and loading
  • Publishing events as streams instead of flowing data around via centralized pipelines
  • An ecosystem of data products instead of a centralized data platform
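
As a rough illustration of this shift in language, the sketch below models a data product that serves its own data and a registry that consumers use to discover it. Every name here (DataProduct, Mesh, output_port, order_events) is hypothetical and chosen only for this example; it is not an API from the original article or from any data mesh tool.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class DataProduct:
    """A domain-owned data product that serves its own data (hypothetical shape)."""
    name: str
    domain: str
    description: str
    # The owning domain supplies the function that yields records on demand.
    output_port: Callable[[], Iterable[dict]]

    def serve(self) -> Iterator[dict]:
        """Serve records to a consumer instead of being ingested by a central pipeline."""
        return iter(self.output_port())


@dataclass
class Mesh:
    """A toy registry standing in for discovery across the ecosystem of data products."""
    products: Dict[str, DataProduct] = field(default_factory=dict)

    def publish(self, product: DataProduct) -> None:
        self.products[product.name] = product

    def discover(self, keyword: str) -> List[DataProduct]:
        """Find products whose description mentions the keyword (case-insensitive)."""
        kw = keyword.lower()
        return [p for p in self.products.values() if kw in p.description.lower()]


def order_events() -> Iterable[dict]:
    """The orders domain publishes its events as a stream (here, a generator)."""
    yield {"order_id": 1, "status": "placed"}
    yield {"order_id": 1, "status": "shipped"}


mesh = Mesh()
mesh.publish(
    DataProduct(
        name="orders.events",
        domain="orders",
        description="Stream of order lifecycle events",
        output_port=order_events,
    )
)

# A consumer discovers and uses the product rather than extracting and loading it.
found = mesh.discover("order")
events = list(found[0].serve())
```

In this sketch the orders domain keeps ownership of how its data is produced, and the consumer only needs the registry and the product's serving interface.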

This article and Zhamak’s book, Data Mesh: Delivering Data-Driven Value at Scale, launched a wave of decentralized data ownership that continues today. Since 2019, within our community we’ve talked with more than 600 enterprise organizations that have implemented elements of data mesh.

This is a positive change, but it’s only the beginning. The reality is that data mesh has not been implemented fully, and in many cases, it has been implemented poorly, with “me-too” tools making big promises and failing to deliver.

Current trends in AI place even more extreme demands on data products within an enterprise organization. The data rates needed to train and fine-tune LLMs have increased dramatically, while the shape of this data has become steadily more complex. This makes the need for authentic, successful implementations of data mesh even more important. 

Data professionals are well-served by internalizing the first principles and key concepts of data mesh to avoid pitfalls and make the best decisions as they navigate this transition. The following references inspired Zhamak’s early writing on the subject and remain prescient today.

A recommended reading list

Domain-Driven Design: Tackling Complexity in the Heart of Software
Eric Evans
Addison-Wesley (2003-08-20)
https://oreil.ly/T5saX

Previous approaches to software architecture put technology first and treated domains as second-class concepts. This text challenged the notion of building software on one canonical model. Instead, DDD moved toward a multi-model approach based on concepts such as bounded contexts and ubiquitous language. When organizations put domains, and the people working in those domains, ahead of technology, they arrive at a different abstraction for technology. This thinking leads to a new domain-oriented unit of work, which shapes the boundaries for data products.

Thinking in Systems
Donella Meadows
Chelsea Green (2008-12-03)
https://oreil.ly/wbsbw

An important influence on computer science in general, this text describes how to apply systems thinking to a system that reaches a state of dynamic equilibrium while information continuously flows through it. Maintaining that equilibrium requires carefully adjusting system behaviors using leverage points and feedback loops. In data mesh, the governance model is federated computational governance: it continuously pursues a dynamic equilibrium between localizing decisions (so that domains can go fast) and globalizing or centralizing decisions (so that everyone can go far). Chapter 5 in the Data Mesh book explores this in detail.
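
One way to picture the balance between local and global decisions is policies expressed as code: a small set of global checks applied to every data product, plus checks that each domain defines for itself. The sketch below is only an assumption-laden illustration; the policy names, the descriptor fields, and the violations helper are all hypothetical and do not come from the Data Mesh book.

```python
from typing import Callable, Dict, List

# A policy is a function that inspects a data product descriptor and passes or fails.
Policy = Callable[[dict], bool]

# Global policies (hypothetical): decided once, enforced for every domain.
GLOBAL_POLICIES: Dict[str, Policy] = {
    "has_owner": lambda d: bool(d.get("owner")),
    "pii_is_tagged": lambda d: all("pii" in col for col in d.get("columns", [])),
}


def violations(descriptor: dict, local_policies: Dict[str, Policy]) -> List[str]:
    """Return the names of failed policies, global checks first, then local ones."""
    failed = [f"global:{name}" for name, check in GLOBAL_POLICIES.items()
              if not check(descriptor)]
    failed += [f"local:{name}" for name, check in local_policies.items()
               if not check(descriptor)]
    return failed


# Local policy (hypothetical): the payments domain sets its own retention rule.
payments_policies: Dict[str, Policy] = {
    "retention_under_a_year": lambda d: d.get("retention_days", 0) <= 365,
}

descriptor = {
    "owner": "payments-team",
    "columns": [
        {"name": "email", "pii": True},
        {"name": "amount", "pii": False},
    ],
    "retention_days": 400,
}

result = violations(descriptor, payments_policies)
```

Keeping global and local checks in separate collections means a domain can add rules of its own without being able to override the shared ones.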

The Design of Everyday Things
Don Norman
Basic Books (1988-01-01)
https://oreil.ly/cotSU

This text calls out affordances as a principle of interaction: “The term affordances refers to the relationship between a physical object and a person (or for that matter, any interacting agent, whether animal or human, or even machines and robots).” We need to think of data products in terms of product design, providing the appropriate affordances for both the humans and the automated agents involved. For example, in GenAI, do the data products used for fine-tuning your LLM include benchmarks, evals, guardrails for bias and safety, as well as human-in-the-loop feedback when you need to update the model?

Who Can You Trust?
Rachel Botsman
Portfolio Penguin (2017-11-14)
https://rachelbotsman.com/books/

This text launched a radical shift in understanding the mechanics of how trust gets built, managed, lost, or repaired in the digital era. Taking this to heart, how do we establish trust between a human and a data product? We often assume data is one thing (e.g., obtained from a data lake) and that metadata is another (e.g., obtained from a catalog). A data product consumer cannot trust data until they’ve understood its metadata, which means that separation leaves a gap, and the gap limits trust. Also check out Rachel Botsman’s excellent article “Trust-Thinkers” which cuts to the chase, describing trust quite eloquently as “A confident relationship with the unknown.”
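
To make the gap concrete, here is a minimal sketch of the alternative: a read from a data product that returns the records together with their metadata, so a consumer can check trust signals before using the data. The field names (owner, last_updated, completeness) and the thresholds are assumptions made for illustration, not a standard.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List


@dataclass(frozen=True)
class Metadata:
    """Trust signals that travel with the data (hypothetical fields)."""
    owner: str
    schema: dict
    last_updated: datetime
    completeness: float  # fraction of required fields populated, from 0.0 to 1.0


@dataclass(frozen=True)
class DataProductRead:
    """One read from a data product: the records and their metadata together."""
    records: List[dict]
    metadata: Metadata


def is_trustworthy(
    read: DataProductRead,
    now: datetime,
    max_age: timedelta = timedelta(days=1),
    min_completeness: float = 0.95,
) -> bool:
    """Check freshness and completeness using only what arrived with the data."""
    fresh = now - read.metadata.last_updated <= max_age
    complete = read.metadata.completeness >= min_completeness
    return fresh and complete


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)

fresh_read = DataProductRead(
    records=[{"order_id": 1, "status": "shipped"}],
    metadata=Metadata(
        owner="orders-domain",
        schema={"order_id": "int", "status": "str"},
        last_updated=datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc),
        completeness=0.99,
    ),
)

stale_read = DataProductRead(
    records=fresh_read.records,
    metadata=Metadata(
        owner="orders-domain",
        schema=fresh_read.metadata.schema,
        last_updated=datetime(2024, 12, 20, 6, 0, tzinfo=timezone.utc),
        completeness=0.99,
    ),
)

fresh_ok = is_trustworthy(fresh_read, now)
stale_ok = is_trustworthy(stale_read, now)
```

Because the metadata arrives with the records, the consumer does not need a separate catalog lookup to decide whether to rely on the data.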
