Consistent access and delivery with Data Integration

Data integration patterns help create a unified, accurate, and consistent view of enterprise data within an organization. This data is often heterogeneous: it lives in different locations and is stored in a variety of formats.

The approaches used to achieve data integration goals will depend largely on the Quality-of-Service (QoS) and usage characteristics surrounding different sets of data. A data integration strategy helps to logically—and perhaps also physically—combine different sets of enterprise data sources to expose the data services needed by your organization.

Understanding data integration patterns and applying them well can help organizations build an effective data integration strategy. The sections that follow describe these patterns in detail.

Legacy data gateways for microservices



Application architecture has evolved by fragmenting the backend implementation into independent microservices and functions. However, there is still a gap in how this evolution has dealt with data, because it tends to avoid dealing with static environments.

At the same time, microservices encourage developers to create new polyglot data persistence layers, which then need to be composed to deliver business value. How can we apply what we have learned from API gateways to these new data stores?

In this discussion of legacy data, Hugo Guerrero talks about the behavior of data gateways and API gateways, the different data gateway types and their architectures, and the extended data-proxy for hybrid cloud deployments.

 

Pattern 1. Data consolidation

Data consolidation involves designing and implementing a data integration process that feeds a datastore with complete, enriched data. This approach allows for data restructuring, a reconciliation process, thorough cleansing, and additional steps for aggregation and further enrichment.

Extract, transform, load (ETL)

ETL offloads the transformation of raw data into usable data from the target datastore, performing it in a separate step before the data is loaded. However, this transformation process can become a bottleneck. In cloud computing, ETL's traditional benefit of reducing load on the target server adds little value.
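
As a minimal sketch of the ETL flow, assuming an in-memory SQLite database as the target and a hypothetical list of raw order records as the source, the cleansing happens in the pipeline before anything reaches the target. The function names `extract`, `transform`, and `load` are illustrative, not part of any specific tool:

```python
import sqlite3

# Hypothetical raw records extracted from a source system.
RAW_ORDERS = [
    {"id": "1", "customer": " Alice ", "amount": "19.50"},
    {"id": "2", "customer": "bob", "amount": "5.25"},
    {"id": "3", "customer": "", "amount": "7.50"},  # incomplete: no customer
]


def extract():
    """Extract: read raw records from the (hypothetical) source."""
    return list(RAW_ORDERS)


def transform(rows):
    """Transform in the pipeline: cleanse, normalize types, drop invalid rows."""
    clean = []
    for row in rows:
        customer = row["customer"].strip().title()
        if not customer:
            continue  # reject records that fail validation
        clean.append((int(row["id"]), customer, round(float(row["amount"]), 2)))
    return clean


def load(conn, rows):
    """Load: the target only ever receives already-transformed data."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


target = sqlite3.connect(":memory:")
load(target, transform(extract()))
print(target.execute("SELECT * FROM orders ORDER BY id").fetchall())
# [(1, 'Alice', 19.5), (2, 'Bob', 5.25)]
```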

Extract, load, transform (ELT)

ELT is highly scalable: you can store as much data as you need in its raw form and get it to the target quickly. No specialized transformation infrastructure is needed before the data lands at its destination, because the transformation happens afterward, in the target system.
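
A minimal ELT sketch, assuming an in-memory SQLite database as the target and hypothetical raw order rows: the data lands in a staging table unchanged, and the transformation then runs inside the target using its own SQL engine. The table names are illustrative:

```python
import sqlite3

# Hypothetical raw records; in ELT they are loaded as-is, with no cleansing.
RAW_ORDERS = [
    ("1", " Alice ", "19.50"),
    ("2", "bob", "5.25"),
    ("3", "", "7.50"),
]

target = sqlite3.connect(":memory:")

# Load: land the raw data in a staging table, untouched.
target.execute("CREATE TABLE raw_orders (id TEXT, customer TEXT, amount TEXT)")
target.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", RAW_ORDERS)

# Transform: run the cleansing inside the target with its own SQL engine.
target.execute(
    """
    CREATE TABLE orders AS
    SELECT CAST(id AS INTEGER)  AS id,
           TRIM(customer)       AS customer,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
    WHERE TRIM(customer) <> ''
    """
)
target.commit()

print(target.execute("SELECT * FROM orders ORDER BY id").fetchall())
# [(1, 'Alice', 19.5), (2, 'bob', 5.25)]
```

Because the raw table is kept, a different transformation can be run later without extracting the data again.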

Pattern 2. Data federation

Data federation uses a pull approach: data is retrieved from the underlying source systems on demand, which provides real-time access to it. Data federation creates a virtualized view of the data without replicating or moving the source system data.

Composite service

A composite service implements the aggregator pattern. It combines the data from different, distinct services in a meaningful way and serves this response to the consuming application.
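
The aggregator idea can be sketched with Python's `asyncio`, assuming three hypothetical downstream services (simulated here as coroutines; real ones would be network calls). The composite service calls them concurrently and merges the results into one response shaped for the consumer:

```python
import asyncio


# Hypothetical downstream services; in practice these would be HTTP or gRPC calls.
async def customer_service(customer_id):
    await asyncio.sleep(0.01)  # simulate network latency
    return {"id": customer_id, "name": "Alice"}


async def order_service(customer_id):
    await asyncio.sleep(0.01)
    return [{"order_id": 1, "total": 19.99}, {"order_id": 2, "total": 5.00}]


async def loyalty_service(customer_id):
    await asyncio.sleep(0.01)
    return {"tier": "gold"}


async def customer_summary(customer_id):
    """Composite service: call the services concurrently, then merge the results."""
    customer, orders, loyalty = await asyncio.gather(
        customer_service(customer_id),
        order_service(customer_id),
        loyalty_service(customer_id),
    )
    # Combine the distinct responses into a single payload for the consumer.
    return {
        "customer": customer["name"],
        "tier": loyalty["tier"],
        "order_count": len(orders),
        "lifetime_value": round(sum(o["total"] for o in orders), 2),
    }


print(asyncio.run(customer_summary(42)))
```

Calling the services concurrently keeps the composite's latency close to that of the slowest service rather than the sum of all three. Error handling, such as timeouts or partial responses when one service is down, is left out for brevity.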

Data virtualization or Enterprise Information Integration (EII)

Data virtualization (EII) combines large sets of diverse data sources in a way that makes them appear to a data consumer as a single, uniform data source. It uses data abstraction to provide a common data access layer.
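
To illustrate the idea, the sketch below puts one uniform interface over two hypothetical sources with different formats and field names: a SQLite table and a CSV feed. Every query pulls from the sources at request time, and no data is copied into a new store. `CustomerView` and the source contents are invented for this example:

```python
import csv
import io
import sqlite3

# Source 1: a relational database (hypothetical CRM system).
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, name TEXT, region TEXT)")
crm.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Alice", "EMEA"), (2, "Bob", "APAC")],
)

# Source 2: a CSV feed with different field names (hypothetical legacy ERP).
ERP_CSV = "customer_id,full_name,region\n3,Carol,NA\n4,Dan,EMEA\n"


class CustomerView:
    """Virtual data layer: one schema and one query method over both sources.

    Nothing is copied into a new datastore; each query pulls from the sources.
    """

    def _from_crm(self):
        for id_, name, region in crm.execute(
            "SELECT id, name, region FROM customers ORDER BY id"
        ):
            yield {"id": id_, "name": name, "region": region}

    def _from_erp(self):
        for row in csv.DictReader(io.StringIO(ERP_CSV)):
            # Map the ERP's field names onto the common schema.
            yield {"id": int(row["customer_id"]), "name": row["full_name"], "region": row["region"]}

    def query(self, region=None):
        rows = list(self._from_crm()) + list(self._from_erp())
        return [r for r in rows if region is None or r["region"] == region]


view = CustomerView()
print([r["name"] for r in view.query(region="EMEA")])  # ['Alice', 'Dan']
```

A real virtualization engine would typically also push filters such as `region` down to each source instead of filtering in memory, as this sketch does.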

Pattern 3. Data propagation

Data propagation promotes data updates at two levels. At the application level, an event in the source application triggers processing in one or more target applications. At the datastore level, an event in the source system triggers updates in the source datastore, and these change events are then replicated in near real time to one or more target datastores.
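
Application-level propagation can be sketched with a minimal in-process publish/subscribe bus, assumed here as a stand-in for a real message broker. One event from the source application triggers processing in two hypothetical target applications. The datastore-level path is not shown:

```python
from collections import defaultdict


class EventBus:
    """Minimal in-process publish/subscribe bus (a stand-in for a message broker)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_type, handler):
        self._subscribers[event_type].append(handler)

    def publish(self, event_type, payload):
        # Every target application subscribed to this event type is notified.
        for handler in self._subscribers[event_type]:
            handler(payload)


bus = EventBus()
shipping_queue = []
invoices = []

# Two hypothetical target applications react to the same source event.
bus.subscribe("order_placed", lambda e: shipping_queue.append(e["order_id"]))
bus.subscribe("order_placed", lambda e: invoices.append((e["order_id"], e["total"])))

# The source application emits an event when an order is placed.
bus.publish("order_placed", {"order_id": 101, "total": 42.0})
print(shipping_queue, invoices)  # [101] [(101, 42.0)]
```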

Enterprise Application Integration (EAI)

EAI can be distributed, lightweight, and scalable for elastic operating environments; the integration itself may be deployed as a containerized application.

Enterprise Data Replication (EDR)

In distributed and microservices architectures, replication allows applications to be more reliable. 

The data a service needs can be replicated, colocated with that service, and stored in a form that is more usable by it. This reduces overhead and latency.

Six reasons to love Camel K

Based on the famous Apache Camel, Camel K is designed and optimized for serverless and microservices architectures. In this article, discover six ways that Camel K transforms how developers work with Kubernetes, Red Hat OpenShift, and Knative on cloud platforms.

Data integration common practices

Change data capture

Change data capture (CDC) detects data change events in a source datastore and triggers an update process in another datastore or system. CDC is usually implemented as either trigger-based or log-based. In the trigger-based approach, transaction events are logged in a separate shadow table, which can be replayed on a regular basis to copy those events to the target system. Log-based CDC, also known as transaction log tailing, identifies data change events by scanning transaction logs. This approach is often preferred because it can be applied to many data change scenarios and, thanks to its minimal overhead, can support systems with extremely high transaction volumes.

Event sourcing

Event sourcing is a pattern that ensures all changes to an application's state are stored as a sequence of events. These events can then be used for temporal queries, allowing past states to be reconstructed and activity to be replayed. This pattern is useful for creating audit logs, for debugging, and for use cases that require reconstructing the state at a specific point in time.
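
A minimal event-sourcing sketch, assuming a hypothetical bank account whose state is its balance: only events are stored, and a temporal query rebuilds the balance as of any timestamp by replaying events up to that point. The event kinds and integer timestamps are illustrative:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: int  # hypothetical logical time for this sketch
    kind: str       # "deposited" or "withdrawn"
    amount: int


# The append-only log is the source of truth; balances are never stored directly.
EVENTS = [
    Event(1, "deposited", 100),
    Event(2, "withdrawn", 30),
    Event(3, "deposited", 50),
    Event(4, "withdrawn", 20),
]


def apply(balance, event):
    """Fold one event into the current state."""
    if event.kind == "deposited":
        return balance + event.amount
    if event.kind == "withdrawn":
        return balance - event.amount
    raise ValueError(f"unknown event kind: {event.kind}")


def balance_at(events, as_of):
    """Temporal query: rebuild the state by replaying events up to `as_of`."""
    balance = 0
    for event in events:
        if event.timestamp > as_of:
            break  # events are assumed to be stored in timestamp order
        balance = apply(balance, event)
    return balance


print(balance_at(EVENTS, as_of=2))  # 70
print(balance_at(EVENTS, as_of=4))  # 100
```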

Streaming data and event stream processing

Event stream processing (ESP) involves taking action on a series of data points that originate from a system that continuously creates data. In this context, an event is a data point in the system, and the stream is the continuous delivery of those events. This series of events is also referred to as streaming data. The actions taken in response to these events include aggregations, analytics, transformations, enrichment, and ingestion into another datastore.
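
As an example of an aggregation over streaming data, the sketch below groups a stream of hypothetical `(timestamp, sensor, value)` events into fixed, non-overlapping (tumbling) windows and emits per-sensor totals. It assumes events arrive in timestamp order; a real stream processor would also need to handle late or out-of-order events:

```python
from collections import defaultdict


def tumbling_window_sums(events, window_size):
    """Aggregate (timestamp, sensor, value) events into fixed-size windows.

    Emits a window's totals as soon as an event from a later window arrives,
    so it works on an unbounded iterator without holding the whole stream.
    Assumes events arrive in timestamp order.
    """
    current_window = None
    sums = defaultdict(float)
    for ts, sensor, value in events:
        window = ts // window_size
        if current_window is not None and window != current_window:
            yield current_window * window_size, dict(sums)
            sums.clear()
        current_window = window
        sums[sensor] += value
    if current_window is not None:
        yield current_window * window_size, dict(sums)  # flush the last window


stream = iter([
    (0, "s1", 1.0), (3, "s2", 2.0), (7, "s1", 3.0),  # window starting at 0
    (10, "s1", 4.0), (12, "s1", 5.0),                # window starting at 10
])

for window_start, totals in tumbling_window_sums(stream, window_size=10):
    print(window_start, totals)
```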

Distributed caching and in-memory data grids

Caching provides storage capacity on a system so that future requests for data can be served more quickly. Data is placed in a cache because it is frequently accessed, or because it is a duplicate copy of data stored in another datastore. The overarching goal of caching is to improve performance.
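
The cache-aside pattern, one common way to apply caching, can be sketched as follows. `SlowStore` is a hypothetical backing datastore, and entries expire after a time-to-live so that stale copies do not live forever. This runs in a single process; a distributed cache or in-memory data grid spreads such entries across several nodes, which the sketch does not attempt:

```python
import time


class SlowStore:
    """Hypothetical backing datastore; counts reads so the cache's effect is visible."""

    def __init__(self, data):
        self._data = data
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._data.get(key)


class CacheAside:
    """Cache-aside: check the cache first, fall back to the store on a miss."""

    def __init__(self, store, ttl_seconds=60.0, clock=time.monotonic):
        self._store = store
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and entry[1] > now:
            return entry[0]  # hit: served from memory
        value = self._store.get(key)  # miss or expired: read from the store
        self._entries[key] = (value, now + self._ttl)
        return value

    def invalidate(self, key):
        # Call when the underlying data changes so readers don't see a stale copy.
        self._entries.pop(key, None)


store = SlowStore({"user:1": "Alice"})
cache = CacheAside(store, ttl_seconds=30)
for _ in range(3):
    cache.get("user:1")
print(store.reads)  # 1
```

Injecting the clock makes expiry easy to test without sleeping.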

Data integration use cases

Data replication

CDC can be used for data replication to multiple databases, data lakes, or data warehouses, to ensure each resource has the latest version of the data. In this way, CDC can provide multiple distributed and even siloed teams with access to the same up-to-date data.

Auditing

Given today's strict data compliance requirements and heavy penalties for noncompliance, it is essential to keep a history of the changes made to your data. CDC can be used to save data changes for auditing or archiving requirements.

Microservice data exchange

CDC can be used to sync microservices with monolithic applications, enabling the seamless transfer of data changes from legacy systems to microservices-based applications.
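
A consumer that keeps a microservice's local store in sync might look like the sketch below. It assumes change events shaped like a simplified Debezium change event payload (`op`, `before`, `after`); real events also carry fields such as `source` and `ts_ms`, and reading them from Kafka is omitted. The row contents are made up:

```python
import json

# Simplified Debezium-style payloads: "op" is "c" (create), "u" (update),
# "d" (delete), or "r" (snapshot read); row state is in "before" and "after".
EVENTS = [
    '{"op": "r", "before": null, "after": {"id": 1, "name": "Alice", "tier": "silver"}}',
    '{"op": "c", "before": null, "after": {"id": 2, "name": "Bob", "tier": "bronze"}}',
    '{"op": "u", "before": {"id": 1, "name": "Alice", "tier": "silver"},'
    ' "after": {"id": 1, "name": "Alice", "tier": "gold"}}',
    '{"op": "d", "before": {"id": 2, "name": "Bob", "tier": "bronze"}, "after": null}',
]


def apply_change(local_store, raw_event):
    """Keep a microservice's local copy in sync with rows changed in the monolith."""
    if raw_event is None:
        return  # tombstone record (null value) that may follow a delete
    event = json.loads(raw_event)
    op = event["op"]
    if op in ("c", "r", "u"):
        row = event["after"]
        # Keep only the fields this service needs, in its own shape.
        local_store[row["id"]] = {"name": row["name"], "tier": row["tier"]}
    elif op == "d":
        local_store.pop(event["before"]["id"], None)


customers = {}
for raw in EVENTS:
    apply_change(customers, raw)
print(customers)  # {1: {'name': 'Alice', 'tier': 'gold'}}
```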

Mono-to-micro strangler pattern

Through an incremental approach, you can take scoped components and move them to a new microservices architecture. Use CDC to stream changes from the monolithic database over to the microservices database and the other way around.
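
The routing side of the strangler pattern can be sketched as a facade that sends requests for migrated components to the new service and everything else to the monolith. The handlers and path prefixes are hypothetical, and keeping the two databases in sync with CDC, as described above, is not shown:

```python
# Hypothetical handlers: the legacy monolith and a newly extracted microservice.
def monolith_handler(path):
    return f"monolith handled {path}"


def inventory_service_handler(path):
    return f"inventory service handled {path}"


class StranglerFacade:
    """Routes each request to the new service if its component has been migrated."""

    def __init__(self, legacy):
        self._legacy = legacy
        self._migrated = {}  # path prefix -> new handler

    def migrate(self, prefix, handler):
        # Move one scoped component at a time; everything else stays on the monolith.
        self._migrated[prefix] = handler

    def handle(self, path):
        for prefix, handler in self._migrated.items():
            if path.startswith(prefix):
                return handler(path)
        return self._legacy(path)


facade = StranglerFacade(monolith_handler)
print(facade.handle("/inventory/42"))  # monolith handled /inventory/42
facade.migrate("/inventory", inventory_service_handler)
print(facade.handle("/inventory/42"))  # inventory service handled /inventory/42
print(facade.handle("/orders/7"))      # monolith handled /orders/7
```

Simple prefix matching is enough for a sketch; a real gateway would match on path segments so that, for example, `/inventory-reports` is not routed by accident.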

Battle of the in-memory data stores

Have you ever wondered what the relative differences are between two of the more popular open source in-memory data stores and caches? A cache is a smaller, faster storage layer placed in front of a slower primary store, such as a database on disk, to speed up retrieval, while an in-memory data store relies on machine memory to store retrievable data.

In this DevNation Tech Talk, the DevNation team describes those differences and, more importantly, gives live demonstrations of the key capabilities that could have a major impact on the architecture of your Java applications.

Data stores vs. caches

Get started with hands-on data integration

Lesson

Change data capture with Debezium

20 minutes | Intermediate

Monitor your change data capture (CDC) events with Debezium, a set of distributed services that identifies row-level changes in your databases so that you can respond to them.

Lesson

Send events to Apache Kafka with Reactive Messaging

25 minutes | Beginner

Create a Quarkus application that uses the MicroProfile Reactive Messaging extension to send events to Apache Kafka. Build real-time streaming data pipelines and streaming applications that transform or react to the streams of data.

Lesson

Get started with Camel Kafka Connectors

20 minutes | Beginner

Widen the scope of possible integrations beyond the external systems supported by Kafka Connect connectors alone.