Solutions, use cases, and case studies
A hot topic in enterprise software, data mesh is a new approach to thinking about data based on a distributed architecture for data management. The idea is to make data more accessible and available to business users by directly connecting data owners, data producers, and data consumers. Data mesh aims to improve business outcomes of data-centric solutions as well as drive adoption of modern data architectures.
From the business point of view, data mesh introduces new ideas around “data product thinking.” In other words, it treats data as a product that fulfills a “job to be done”: for example, improving decision-making, helping detect fraud, or alerting the business to changes in supply chain conditions. To create high-value data products, companies must address culture and mindset shifts and commit to a more cross-functional approach to business domain modeling.
From the technology side, Oracle’s view on the data mesh involves three important new focus areas for data-driven architecture:
- Data product thinking as a first-class concern
- Decentralized data architectures with domain-oriented data ownership
- Data in motion, powered by event-driven data ledgers and stream processing
Other concerns, such as self-service tooling for nontechnical users and strong federated data governance models, are just as important for data mesh architecture as they are for other, more centralized and classical data management methodologies.
A data mesh approach is a paradigm shift toward thinking about data as a product. Data mesh introduces the organizational and process changes that companies need in order to manage data as a tangible capital asset of the business. Oracle’s perspective on data mesh architecture calls for alignment across operational and analytic data domains.
A data mesh aims to link data producers directly to business users and, to the greatest degree possible, remove the IT middleman from the projects and processes that ingest, prepare, and transform data resources.
Oracle’s focus on data mesh has been on providing a platform for our customers that can address these emerging technology requirements. This includes tools for data products; decentralized, event-driven architectures; and streaming patterns for data in motion. For data product domain modeling and other sociotechnical concerns, Oracle aligns with the work being done by the thought leader in data mesh, Zhamak Dehghani.
Investing in a data mesh can yield impressive benefits; a detailed list of benefits and use cases appears later in this article.
Data mesh is still in the early stages of market maturity. So while you may see a variety of marketing content about solutions that claim to be “data mesh,” often these so-called data mesh solutions don’t fit the core approach or its principles.
A proper data mesh is a mindset, an organizational model, and an enterprise data architecture approach with supporting tools. A data mesh solution should combine data product thinking, decentralized data architecture, domain-oriented data ownership, distributed data in motion, self-service access, and strong data governance.
Conversely, no single off-the-shelf product or platform is, by itself, a data mesh.
The sad truth is that the monolithic data architectures of the past are cumbersome, expensive, and inflexible. Over the years, it has become clear that most of the time and cost of a digital business platform, from applications to analytics, is sunk into integration efforts. Consequently, most platform initiatives fail.
While data mesh is not a silver bullet for centralized, monolithic data architectures, its principles, practices, and technologies are designed to solve some of the most pressing, unaddressed modernization objectives for data-driven business initiatives.
Several converging technology trends, from the decentralization of applications and data to real-time event streaming, led to the emergence of data mesh as a solution.
To learn more about why data mesh is needed today, read Zhamak Dehghani’s original 2019 paper: How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.
The decentralized strategy behind data mesh aims to treat data as a product by creating a self-service data infrastructure to make data more accessible to business users.
When the theory moves to practice, it is necessary to deploy enterprise-class solutions for mission-critical data; that’s where Oracle can provide a range of trusted solutions to power an enterprise data mesh.
Data mesh is more than just a new tech buzzword. It is a newly emerging set of principles, practices, and technology capabilities that make data more accessible and discoverable. The data mesh concept distinguishes itself from prior generations of data integration approaches and architectures by encouraging a shift away from the giant, monolithic enterprise data architectures of the past toward a modern, distributed, decentralized, data-driven architecture of the future. The foundation of the data mesh concept rests on the key attributes described below.
A mindset shift is the most important first step toward a data mesh. The willingness to embrace the learned practices of innovation is the springboard towards successful modernization of data architecture.
These learned practice areas include design thinking and the jobs-to-be-done theory.
Design thinking methodologies bring proven techniques that help break down the organizational silos that frequently block cross-functional innovation. The jobs-to-be-done theory is the critical foundation for designing data products that fulfill specific end-consumer goals, or jobs to be done; the job is what defines the product’s purpose.
Although the data product approach initially emerged from the data science community, it is now being applied to all aspects of data management. Instead of building monolithic technology architectures, data mesh focuses on the data consumers and the business outcomes.
While data product thinking can be applied to other data architectures, it is an essential part of a data mesh. For pragmatic examples of how to apply data product thinking, the team at Intuit wrote a detailed analysis of their experiences.
Products of any kind—from raw commodities to items at your local store—are produced as assets of value, intended to be consumed, and have a specific job to be done. Data products can take a variety of forms, depending on the business domain or problem to be solved.
A data product is created for consumption, is typically owned outside of IT, and requires the tracking of additional attributes, such as ownership, freshness and quality guarantees, and the consumers it serves.
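As an illustration of what such tracking might look like, here is a minimal Python sketch; the attribute names (owner, freshness SLA, schema reference, and so on) are hypothetical examples rather than a prescribed standard:

```python
from dataclasses import dataclass, field

@dataclass
class DataProduct:
    """Hypothetical descriptor for a data product's consumer-facing attributes."""
    name: str                # e.g., "customer_churn_scores"
    domain: str              # owning business domain, e.g., "marketing"
    owner: str               # accountable product owner, usually outside of IT
    schema_ref: str          # pointer to the published schema or data contract
    freshness_sla: str       # e.g., "updated within 5 minutes of source commit"
    quality_checks: list = field(default_factory=list)  # validation rules applied
    consumers: list = field(default_factory=list)       # registered downstream users

churn_scores = DataProduct(
    name="customer_churn_scores",
    domain="marketing",
    owner="jane.doe@example.com",          # placeholder owner
    schema_ref="registry://schemas/churn_scores/v3",
    freshness_sla="updated hourly",
    quality_checks=["non-null customer_id", "score in [0, 1]"],
)
```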
Decentralized IT systems are a modern reality, and with the rise of SaaS applications and public cloud infrastructure (IaaS), the decentralization of applications and data is here to stay. Application software architectures are shifting away from the centralized monoliths of the past to distributed microservices (a service mesh). Data architecture will follow the same trend toward decentralization, with data becoming more distributed across a wider variety of physical sites and across many networks. We call this a data mesh.
A mesh is a network topology that enables a large group of nonhierarchical nodes to work together collaboratively.
Common tech examples include Wi-Fi mesh home networks and the microservices service mesh noted above.
Data mesh is aligned to these mesh concepts and provides a decentralized way of distributing data across virtual and physical networks and across vast distances. Legacy monolithic data integration architectures, such as ETL and data federation tools, and even more recent public cloud services, such as AWS Glue, require a highly centralized infrastructure.
A complete data mesh solution should be capable of operating in a multicloud framework, potentially spanning on-premises systems, multiple public clouds, and even edge networks.
In a world where data is highly distributed and decentralized, the role of information security is paramount. Unlike highly centralized monoliths, distributed systems must delegate the activities necessary to authenticate and authorize various users to different levels of access. Securely delegating trust across networks is hard to do well.
Considerations include how to authenticate, authorize, and audit data access consistently across many independently operated systems.
Security within any IT system can be difficult, and it is even more difficult to provide high security within distributed systems. However, these are solvable problems.
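As one minimal sketch of the underlying idea, the snippet below shows token-based delegation with a verifiable signature, so a remote service can check a consumer’s claims without calling back to a central authority. A real deployment would use an established standard such as OAuth 2.0/JWT with managed keys; the shared secret and claim names here are placeholders:

```python
import base64
import hashlib
import hmac
import json

SHARED_SECRET = b"placeholder-secret"  # in practice, per-service keys from a KMS

def sign_claims(claims: dict) -> str:
    """Issue a tamper-evident token that a remote service can verify locally."""
    body = base64.urlsafe_b64encode(json.dumps(claims).encode())
    tag = hmac.new(SHARED_SECRET, body, hashlib.sha256).hexdigest()
    return f"{body.decode()}.{tag}"

def verify(token: str):
    """Return the claims if the signature checks out; otherwise None."""
    body, tag = token.rsplit(".", 1)
    expected = hmac.new(SHARED_SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(tag, expected):
        return None  # signature mismatch: reject the delegated claims
    return json.loads(base64.urlsafe_b64decode(body))

token = sign_claims({"sub": "analyst-42", "scope": ["read:orders"]})
print(verify(token))  # {'sub': 'analyst-42', 'scope': ['read:orders']}
```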
A core tenet of data mesh is the notion of distributed ownership and responsibility. The best practice is to federate the ownership of data products and data domains to the people in an organization who are closest to the data. In practice, this may align to source data (for example, raw data sources, such as operational applications and systems of record) or to analytic data (for example, composite or aggregate data formatted for easy consumption by data consumers). In both cases, the producers and consumers of the data are often aligned to business units rather than to IT organizations.
Old ways of organizing data domains often fall into the trap of aligning with technology solutions, such as ETL tools, data warehouses, and data lakes, or with the structural organization of a company (human resources, marketing, and other lines of business). However, for a given business problem, the data domains are often best aligned to the scope of the problem being solved, the context of a particular business process, or the family of applications in a specific problem area. In large organizations, these data domains usually cut across internal organizations and technology footprints.
The functional decomposition of data domains takes on an elevated, first-class priority in a data mesh. Various domain modeling methodologies can be retrofitted to data mesh architecture, including classical data warehouse modeling (such as Kimball and Inmon) and data vault modeling, but the most common methodology currently being tried in data mesh architecture is domain-driven design (DDD). The DDD approach emerged from microservices functional decomposition and is now being applied in a data mesh context.
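As a hedged illustration of DDD applied to data, the sketch below models the same business fact as two different events, each owned by its own bounded context, with an explicit translation at the boundary; the domain and field names are hypothetical:

```python
from dataclasses import dataclass
from datetime import datetime

# Two bounded contexts model an "order" differently; each domain owns its schema.

@dataclass
class SalesOrderPlaced:      # the event as the Sales domain defines it
    order_id: str
    customer_id: str
    total: float
    placed_at: datetime

@dataclass
class ShipmentRequested:     # the event as the Fulfillment domain defines it
    order_ref: str
    destination: str
    requested_at: datetime

def translate(event: SalesOrderPlaced, destination: str) -> ShipmentRequested:
    """Explicit translation at the context boundary keeps each model clean."""
    return ShipmentRequested(
        order_ref=event.order_id,
        destination=destination,
        requested_at=datetime.utcnow(),
    )
```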
An important area where Oracle has added to the data mesh discussion is in elevating the importance of data in motion as a key ingredient of a modern data mesh. Data in motion is fundamentally essential for taking the data mesh out of the legacy world of monolithic, centralized batch processing, and its capabilities answer several core data mesh questions.
These questions are not just a matter of “implementation details”; they are centrally important to the data architecture itself. A domain-driven design for static data will use different techniques and tools than a dynamic, data-in-motion process of the same design. For example, in dynamic data architectures, the data ledger is the central source of truth for data events.
Ledgers are a fundamental component of making a distributed data architecture function. Just as with an accounting ledger, a data ledger records the transactions as they happen.
When we distribute the ledger, the data events become “replayable” in any location. In this sense, a ledger is a bit like an airplane flight recorder: the replayable record of events can be used for high availability and disaster recovery.
Unlike centralized and monolithic datastores, distributed ledgers are purpose-built to keep track of atomic events and/or transactions that happen in other (external) systems.
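To make the replay idea concrete, here is a toy, in-memory sketch of an append-only event ledger; a production ledger (a distributed event stream or database change log) adds durability, ordering guarantees, and replication:

```python
class EventLedger:
    """Toy append-only ledger: records events and replays them from any offset."""

    def __init__(self):
        self._log = []

    def append(self, event: dict) -> int:
        self._log.append(event)
        return len(self._log) - 1  # offset of the recorded event

    def replay(self, from_offset: int = 0):
        """Yield events in order, e.g., to rebuild a replica after a failure."""
        yield from self._log[from_offset:]

ledger = EventLedger()
ledger.append({"op": "INSERT", "table": "orders", "row": {"id": 1, "amount": 42.0}})
ledger.append({"op": "UPDATE", "table": "orders", "row": {"id": 1, "amount": 45.0}})

# A downstream replica in another location can recover by replaying the history:
for event in ledger.replay():
    print(event)
```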
A data mesh does not prescribe just one kind of ledger. Depending on the use cases and requirements, a data mesh can make use of several different types of event-driven data ledgers.
Together, these ledgers can act as a sort of durable event log for the whole enterprise, providing a running list of data events happening on systems of record and systems of analytics.
Polyglot data streams are more prevalent than ever, varying by event type, payload, and transaction semantics. A data mesh should support the necessary stream types for a variety of enterprise data workloads; a short sketch contrasting two of these payload styles follows the list below.
Simple events:
- Base64/JSON—raw, schemaless events
- Raw telemetry—sparse events
Basic app logging/Internet of Things (IoT) events:
- JSON/Protobuf—may have schema
- MQTT—IoT-specific protocols
Application business process events:
- SOAP/REST events—XML/XSD, JSON
- B2B—exchange protocols and standards
Data events/transactions:
- Logical change records—LCR, SCN, URID
- Consistent boundaries—commits versus operations
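As promised above, here is a small sketch contrasting a raw, schemaless event with an LCR-style data event; the field names are illustrative rather than any specific product’s format:

```python
import json

# A raw, schemaless telemetry event: minimal structure, no transaction semantics.
raw_event = json.dumps({"sensor": "pump-7", "temp_c": 81.4})

# A logical change record (LCR)-style data event: carries the table, operation,
# and commit metadata so consumers can preserve transactional consistency.
change_event = json.dumps({
    "table": "ORDERS",
    "op": "UPDATE",
    "before": {"ORDER_ID": 1, "STATUS": "NEW"},
    "after": {"ORDER_ID": 1, "STATUS": "SHIPPED"},
    "txn": {"commit_scn": 10442, "commit_ts": "2024-05-01T12:00:00Z"},
})
```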
Stream processing is how data is manipulated within an event stream. Unlike simple “lambda functions,” a stream processor maintains the state of a dataflow within a particular time window and can apply much more advanced analytic queries to the data. Capabilities range from basic to sophisticated, as the list below shows; a stateful windowing sketch follows the list.
- Basic data filtering
- Simple ETL
- CEP and complex ETL
- Stream analytics
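The sketch below illustrates the stateful windowing described above: a tumbling-window aggregation implemented in plain Python. A real stream processor adds out-of-order handling, checkpointing, and scale-out, so treat this as a conceptual sketch only:

```python
from collections import defaultdict

def tumbling_window_sums(events, window_seconds=60):
    """Sum amounts per key per time window for an in-order event stream.

    `events` is an iterable of (epoch_seconds, key, amount) tuples; state is
    held only for the window being built, then emitted when the window closes.
    """
    current_window, sums = None, defaultdict(float)
    for ts, key, amount in events:
        window = int(ts // window_seconds)
        if current_window is not None and window != current_window:
            yield current_window, dict(sums)  # window closed: emit aggregate
            sums = defaultdict(float)
        current_window = window
        sums[key] += amount
    if current_window is not None:
        yield current_window, dict(sums)

stream = [(0, "acct-1", 10.0), (30, "acct-2", 5.0), (65, "acct-1", 7.5)]
for window, totals in tumbling_window_sums(stream):
    print(window, totals)  # window 0: both accounts; window 1: acct-1 only
```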
Of course, there are more than just three attributes of a data mesh. We’ve focused on the three above as a way to bring attention to attributes that Oracle believes are some of the new and unique aspects of the emerging modern data mesh approach.
Other important data mesh attributes include:
A successful data mesh fulfills use cases for operational as well as analytic data domains. The following seven use cases illustrate the breadth of capabilities that a data mesh brings to enterprise data.
“By integrating real-time operational data and analytics, companies can make better operational and strategic decisions.” (MIT Sloan School of Management)
Looking beyond 'lift and shift' migrations of monolithic data architectures to the cloud, many organizations also seek to retire their centralized applications of the past and move toward a more modern microservices application architecture.
But legacy application monoliths typically depend on massive databases, raising the question of how to phase the migration plan to decrease disruption, risks, and costs. A data mesh can provide an important operational IT capability for customers doing phased transitions from monolith to mesh architecture: for example, keeping the legacy database and the new microservices in sync while individual services are carved out and migrated.
In the lingo of microservices architects, this approach uses a bidirectional transaction outbox to enable the strangler fig migration pattern, one bounded context at a time.
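A minimal sketch of the outbox half of that pattern follows, using SQLite for brevity; the table and topic names are hypothetical. The key point is that the business row and the event describing it commit in one atomic transaction, for a separate relay process (not shown) to publish to the ledger later:

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT,
                         topic TEXT, payload TEXT, published INTEGER DEFAULT 0);
""")

def place_order(customer: str, amount: float) -> None:
    with conn:  # one atomic transaction: both rows commit, or neither does
        cur = conn.execute(
            "INSERT INTO orders (customer, amount) VALUES (?, ?)",
            (customer, amount))
        conn.execute(
            "INSERT INTO outbox (topic, payload) VALUES (?, ?)",
            ("orders.placed",
             json.dumps({"order_id": cur.lastrowid, "amount": amount})))

place_order("acme", 99.0)
print(conn.execute("SELECT topic, payload FROM outbox").fetchall())
```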
Business-critical applications require very high KPIs and SLAs around resiliency and continuity. Regardless of whether these applications are monolithic, microservices or something in between, they can’t go down!
For mission-critical systems, a distributed eventual-consistency data model is usually not acceptable. However, these applications must operate across many data centers. This raises the business continuity question, “How can I run my apps across more than one data center while still guaranteeing correct and consistent data?”
Regardless of whether the monolithic architectures are using ‘sharded datasets’ or the microservices are being set up for cross-site high availability, the data mesh offers correct, high-speed data at any distance.
A data mesh can provide the foundation for decentralized yet 100% correct data across sites.
A modern, service mesh–style platform uses events for data interchange. Rather than depending on batch processing in the data tier, data payloads flow continuously when events happen in the application or datastore.
For some architectures, microservices need to exchange data payloads with each other. Other patterns require interchange between monolithic applications or datastores. This raises the question, “How can I reliably exchange microservice data payloads among my apps and datastores?”
A data mesh can supply the foundation technology for microservices-centric data interchange.
Microservices patterns, such as event sourcing, CQRS, and transaction outbox, are commonly understood solutions; a data mesh provides the tooling and frameworks to make these patterns repeatable and reliable at scale.
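For instance, here is event sourcing in miniature, with the current state derived by replaying an ordered event history rather than read from a mutable record; the event types are hypothetical:

```python
events = [
    {"type": "AccountOpened", "balance": 0},
    {"type": "Deposited", "amount": 100},
    {"type": "Withdrawn", "amount": 30},
]

def apply(state: dict, event: dict) -> dict:
    """Fold one event into the aggregate's state."""
    if event["type"] == "AccountOpened":
        return {"balance": event["balance"]}
    if event["type"] == "Deposited":
        return {"balance": state["balance"] + event["amount"]}
    if event["type"] == "Withdrawn":
        return {"balance": state["balance"] - event["amount"]}
    return state  # unknown events are ignored

state = {}
for e in events:
    state = apply(state, e)
print(state)  # {'balance': 70}
```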
Beyond microservice design patterns, the need for enterprise integration extends to other IT systems, such as databases, business processes, applications, and physical devices of all types. A data mesh provides the foundation for integrating data in motion.
Data in motion is typically event-driven. A user action, a device event, a process step, or a datastore commit can all initiate an event with a data payload. These data payloads are crucial for integrating Internet of Things (IoT) systems, business processes and databases, data warehouses, and data lakes.
A data mesh supplies the foundation technology for real-time integration across the enterprise.
Large organizations will naturally have a mix of old and new systems, monoliths and microservices, operational and analytic data stores; a data mesh can help to unify these resources across differing business and data domains.
Analytic data stores may include data marts, data warehouses, OLAP cubes, data lakes, and data lakehouse technologies.
Generally speaking, there are only two ways to bring data into these analytic datastores: loading it in periodic batches or streaming it in continuously as events occur.
A data mesh provides the foundation for a streaming data ingest capability.
Ingesting events by stream can reduce the impact on the source systems, improve the fidelity of the data (important for data science), and enable real-time analytics.
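Here is a hedged sketch of stream-based ingest, assuming a Kafka-compatible event ledger and the open source kafka-python client; the topic, broker address, and sink are placeholders:

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

# Continuously consume change events and append them to an analytic store,
# instead of running periodic bulk extracts against the source system.
consumer = KafkaConsumer(
    "orders.changes",                    # hypothetical CDC topic
    bootstrap_servers=["broker:9092"],   # placeholder broker address
    value_deserializer=lambda b: json.loads(b.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    # Replace print() with the warehouse or lake writer of your choice.
    print(f"ingest offset={message.offset}: {event}")
```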
Once data is ingested into the analytic datastores, there is usually a need for data pipelines to prepare and transform it across different data stages or data zones. This data refinement process is often needed for downstream analytic data products.
A data mesh can provide an independently governed data pipeline layer that works alongside the analytic datastores, providing core services such as preparing and transforming data across stages and zones.
These data pipelines should be capable of working across different physical datastores (such as marts, warehouses, or lakes) or as a “pushdown data stream” within analytic data platforms that support streaming data, such as Apache Spark and other data lakehouse technologies.
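As one example of such a “pushdown data stream,” here is a hedged sketch using PySpark Structured Streaming; the topic, schema, and storage paths are placeholders, and the job assumes the Spark Kafka connector is available:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("pipeline-sketch").getOrCreate()

# Read raw change events from a hypothetical topic in the event ledger.
raw = (spark.readStream.format("kafka")
       .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
       .option("subscribe", "orders_raw")                 # placeholder topic
       .load())

# Refine: parse the payload, enforce a schema, and filter bad records.
refined = (raw.selectExpr("CAST(value AS STRING) AS json")
           .select(F.from_json(
               "json", "order_id STRING, amount DOUBLE, ts TIMESTAMP").alias("o"))
           .select("o.*")
           .filter(F.col("amount") > 0))

# Write the refined zone continuously; checkpointing makes the flow resumable.
query = (refined.writeStream.format("parquet")
         .option("path", "/lake/silver/orders")           # placeholder zone path
         .option("checkpointLocation", "/lake/_chk/orders")
         .start())
query.awaitTermination()
```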
Events are continuously happening. The analysis of events in a stream can be crucial for understanding what is happening from moment to moment.
This kind of time-series-based analysis of real-time event streams may be important for real-world IoT device data and for understanding what is happening in your IT data centers or across financial transactions, such as fraud monitoring.
A full-featured data mesh will include the foundational capabilities to analyze events of all kinds, across many different types of event time windows.
Like data pipelines, streaming analytics may be capable of running within established data lakehouse infrastructure or separately, as cloud native services.
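To illustrate the time-window analysis described above, here is a plain-Python sketch that flags bursts of card transactions inside a sliding window, in the spirit of fraud monitoring; the thresholds and event shapes are hypothetical:

```python
from collections import deque

def flag_bursts(events, window_seconds=60, threshold=3):
    """Flag a card that produces more than `threshold` transactions within
    any sliding `window_seconds` window of an in-order event stream."""
    recent = {}  # card_id -> deque of recent event timestamps
    for ts, card_id in events:
        q = recent.setdefault(card_id, deque())
        q.append(ts)
        while q and ts - q[0] > window_seconds:
            q.popleft()  # drop events that fell out of the window
        if len(q) > threshold:
            yield ts, card_id, len(q)  # possible fraud burst

stream = [(0, "card-9"), (10, "card-9"), (20, "card-9"), (25, "card-9"), (40, "card-9")]
for alert in flag_bursts(stream):
    print("ALERT:", alert)  # fires at t=25 and t=40 for card-9
```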
Those at the leading edge of data integration are seeking real-time operational and analytic data integration across a diverse collection of resilient datastores. Innovation has been relentless as data architecture evolves toward streaming analytics: operational high availability has enabled real-time analytics, and data engineering automation is simplifying data preparation, empowering data scientists and analysts with self-service tools.
Build an operational and analytical mesh across the whole data estate
Putting all these data management capabilities to work in a unified architecture will benefit every data consumer. A data mesh will help your global systems of record and systems of engagement operate reliably in real time, bringing that real-time data to line-of-business managers, data scientists, and your customers. It also simplifies data management for your next-generation microservices applications. Using modern analytical methods and tools, your end users, analysts, and data scientists will be even more responsive to customer demand and competitive threats. To read about a well-documented example, see Intuit’s goals and results.
Benefit from a data mesh on point projects
As you adopt your new data product mindset and operational model, it is important to develop experience with each of these enabling technologies. On your data mesh journey, you can achieve incremental benefits by evolving your fast-data architecture into streaming analytics, leveraging your operational high-availability investments for real-time analytics, and providing real-time, self-service analytics for your data scientists and analysts.
How a data mesh compares with related data fabric, app-dev integration, and analytic data store technologies:

| Capability | Data mesh | Data integration (data fabric) | Metacatalog (data fabric) | Microservices (app-dev integration) | Messaging (app-dev integration) | Data lakehouse (analytic data store) | Distributed DW (analytic data store) |
|---|---|---|---|---|---|---|---|
| People, process, and methods: | | | | | | | |
| Data product focus | Available | Available | Available | 1/4 offering | 1/4 offering | 3/4 offering | 3/4 offering |
| Technical architecture attributes: | | | | | | | |
| Distributed architecture | Available | 1/4 offering | 3/4 offering | Available | Available | 1/4 offering | 3/4 offering |
| Event-driven ledgers | Available | Not available | 1/4 offering | Available | Available | 1/4 offering | 1/4 offering |
| ACID support | Available | Available | Not available | Not available | 3/4 offering | 3/4 offering | Available |
| Stream oriented | Available | 1/4 offering | Not available | Not available | 1/4 offering | 3/4 offering | 1/4 offering |
| Analytic data focus | Available | Available | Available | Not available | Not available | Available | Available |
| Operational data focus | Available | 1/4 offering | Available | Available | Available | Not available | Not available |
| Physical and logical mesh | Available | Available | Not available | 1/4 offering | 3/4 offering | 3/4 offering | 1/4 offering |
Investing in a data mesh can yield benefits across the data estate, including:
- Faster, data-driven innovation cycles
- Reduced costs for mission-critical data operations
- Multicloud data liquidity
  - Unlock data capital to flow freely
- Real-time data sharing
  - Ops-to-ops and ops-to-analytics
- Edge, location-based data services
  - Correlate real-world (IRL) device/data events
- Trusted microservices data interchange
  - Event sourcing with correct data
  - DataOps and CI/CD for data
- Uninterrupted continuity
  - >99.999% uptime SLAs
  - Cloud migrations
- Automate and simplify data products
  - Multi-model data sets
- Time series data analysis
  - Deltas/changed records
  - Event-by-event fidelity
- Eliminate full data copies for operational datastores
  - Log-based ledgers and pipelines
- Distributed data lakes and warehouses
  - Hybrid/multicloud/global
  - Streaming integration/ETL
- Predictive analytics
  - Data monetization and new data services for sale
Digital transformation is very, very hard, and unfortunately, most companies fail at it. Over the years, technology, software design, and data architecture have become increasingly distributed as modern techniques move away from highly centralized and monolithic styles.
Data mesh is a new concept for data—a deliberate shift toward highly distributed and real-time data events, as opposed to monolithic, centralized, and batch-style data processing. At its core, data mesh is a cultural mindset shift to put the needs of data consumers first. It is also a real technology shift, elevating the platforms and services that empower a decentralized data architecture.
Use cases for data mesh encompass both operational data and analytic data, which is one key difference from conventional data lakes/lakehouses and data warehouses. This alignment of operational and analytic data domains is a critical enabler for driving more self-service for data consumers. Modern data platform technology can help remove the middleman and connect data producers directly to data consumers.
Oracle has long been the industry leader in mission-critical data solutions and has fielded some of the most modern capabilities to empower a trusted data mesh.