In enterprise software testing, teams often rely on multiple platforms for various stages of the quality lifecycle. ReportPortal excels at real-time test reporting and AI-powered failure analysis, while Polarion ALM serves as the single source of truth for full software testing lifecycle management. The challenge is keeping failure analysis verdicts in sync between these two systems without burdening QE teams with manual data entry.
In this article, I will describe how we built an event-driven synchronization solution using Streams for Apache Kafka, Debezium Change Data Capture (CDC), and Quarkus with SmallRye Reactive Messaging to automatically sync failure verdicts from multiple ReportPortal instances to Polarion, achieving eventual consistency with minimal lag and zero manual intervention.
The problem: Two platforms and divergent timelines
QE teams use a routing service to deliver xUnit test results to ReportPortal and Polarion. But there is no single path for results to reach these two systems.
Some teams upload concurrently to both platforms in one request. Other teams upload sequentially, sending results to ReportPortal first for failure analysis, then to Polarion later. Teams can also upload to ReportPortal first and only push to Polarion after analysis is complete.
Even with concurrent uploads, results do not land in both systems at the same time. Each target system has its own processing pipeline, ingestion latency, and import mechanics. ReportPortal may finish processing a launch while the corresponding Polarion test run is still being created, or vice versa. This means there is no reliable ordering between when a test record appears in Polarion and when the corresponding failure analysis verdict finalizes in ReportPortal.
This timing mismatch is what makes synchronization fundamentally difficult. The sync solution cannot assume the Polarion test runs already exist when a verdict updates in ReportPortal, nor can it assume that ReportPortal analysis happens after the Polarion import. It must handle every ordering of events gracefully.
In addition to this sequencing challenge, failure analysis introduces a human-time dependency:
- Test results uploaded to ReportPortal and Polarion (in any order or timing).
- Failure analysis happens on ReportPortal, either manually by engineers or automatically via test failure analysis (TFA) integration, minutes, hours, or even days later.
- The analyzed verdict (e.g., product bug, automation bug, system issue, or no defect) and linked defect tickets must get back to Polarion.
Without automation, teams had to manually copy failure verdicts, linked defect URLs, and resolution details from ReportPortal to Polarion for every failed test case across every test run. This was error-prone, time-consuming, and often simply skipped.
Why change data capture?
We chose the CDC approach because it captures every verdict change at the database level the moment it occurs, with no API limitations, no user intervention, and no polling overhead. It also decouples the sync trigger entirely from the upload path: regardless of whether results were uploaded concurrently, sequentially, or at completely different times, the CDC event fires the instant a verdict updates in ReportPortal.
We evaluated the following three approaches before settling on CDC for data capture:
- Polling-based sync: Periodically querying the ReportPortal APIs for changes. The ReportPortal API doesn't expose issue-level change feeds or webhooks, making this approach inefficient and incomplete.
- Manual trigger sync: Requires users to invoke a second CLI command after completing analysis, causing added friction and reliance on user discipline.
- CDC-based auto sync: Monitoring the ReportPortal PostgreSQL database directly for verdict changes using Debezium—fully automatic, real-time, and transparent to users.
Architecture overview
This section outlines the architecture of the solution, which is structured for scalability, reliability, and efficient data processing. It consists of four key layers:
ReportPortal PostgreSQL (multiple instances)
|
[ Debezium CDC Connectors on Kafka Connect ]
|
[ Red Hat Streams for Apache Kafka ]
|
[ Quarkus Microservices with SmallRye Reactive Messaging ]
|
[ Polarion REST API ]
Layer 1: CDC capture with Debezium on Kafka Connect
Each ReportPortal instance runs its own PostgreSQL database. We deploy a Debezium PostgreSQL connector per instance on the Kafka Connect platform (managed by Strimzi on Red Hat OpenShift Container Platform). The connector monitors the issue table in each ReportPortal database using PostgreSQL's native logical decoding (pgoutput plug-in).
The key configuration uses Debezium's built-in transformation pipeline:
spec:
  class: io.debezium.connector.postgresql.PostgresConnector
  config:
    database.dbname: reportportal
    table.include.list: public.issue
    plugin.name: pgoutput
    replica.identity.autoset.values: public.issue:FULL
    heartbeat.interval.ms: 1000
    transforms: filter,reroute
    transforms.filter.type: io.debezium.transforms.Filter
    transforms.filter.language: jsr223.groovy
    transforms.filter.condition: "value.op == 'u'"
    transforms.reroute.type: io.debezium.transforms.ByLogicalTableRouter
    transforms.reroute.topic.regex: (.*)public.issue(.*)
    transforms.reroute.topic.replacement: rp-cdc-issue
Two transforms do the heavy lifting:
- Filter: A Groovy-based filter captures only update operations (value.op == 'u'), ignoring inserts and deletes. We only care about verdict changes, not initial issue creation.
- Reroute: All CDC events from all ReportPortal instances are routed to a single rp-cdc-issue Kafka topic. Downstream consumers use the physical table identifier (e.g., reportportal-{team}-debezium) to determine the origin of the change. This identifier preserves the team's identity, indicating which ReportPortal instance generated the data change.
Layer 2: Streams for Apache Kafka as the event backbone
Streams for Apache Kafka (based on the Strimzi operator) provides the event streaming backbone. The Kafka cluster runs in KRaft mode (no ZooKeeper dependency) with a replication factor of 3 and minimum in-sync replicas of 2 for durability.
The architecture uses six Kafka topics to coordinate between microservices:
- rp-cdc-issue (Producer: Debezium; Consumer: Sync Connector): CDC events from the RP issue table
- rp-fail-record-sink (Producer: RP Connector; Consumer: Record Updater): Failure records with RP item details
- polar-test-run-sink (Producer: Polar Processor; Consumer: Record Updater): New Polarion test run metadata
- polar-feedback-sink (Producer: Polar Connector; Consumers: Sync Connector, Record Updater): Polarion test run URL feedback
- rp-polar-item-sync (Producer: Sync Connector; Consumer: Record Updater): Individual test item sync results
- rp-polar-sync-status (Producer: Sync Connector; Consumer: Record Updater): Overall test run sync completion
This topic-per-concern design decouples the data flow. Each microservice reads from and writes to specific topics, enabling independent scaling and failure isolation. Critically, Kafka acts as a buffer that absorbs the timing differences between the ReportPortal and Polarion pipelines. Events are durably stored and processed whenever the downstream consumer is ready.
Layer 3: Quarkus and SmallRye Reactive Messaging
The microservices built with Quarkus 3.x and Java 21 use SmallRye Reactive Messaging (the Quarkus implementation of the MicroProfile Reactive Messaging specification) as the bridge between Kafka topics and application logic. This combination makes event-driven development easy, dramatically reducing the boilerplate of building Kafka-driven services.
Annotation-driven Kafka consumers
Instead of writing Kafka consumer loops, poll configurations, and offset management code, SmallRye Reactive Messaging allows us to declare Kafka consumers with a single @Incoming annotation. Here is the CDC event consumer that drives the entire sync:
@ApplicationScoped
public class Import {

    @Incoming("rp-cdc-issue")
    @Transactional
    public CompletionStage<Void> cdcConsume(
            Record<CDCKeyContainer, CDCValueContainer> containerRecord) {

        // Extract team identity from the Debezium physical table identifier
        String rpTeam = containerRecord.key().payload
                .dbzPhysicalTableIdentifier.split("-debezium")[0];
        Issue after = containerRecord.value().payload.after;

        // Only process updates with actual verdicts
        if (!containerRecord.value().payload.op.equals("u")) {
            return CompletableFuture.completedFuture(null);
        }
        if (!verdictList.contains(after.issueType)) {
            return CompletableFuture.completedFuture(null);
        }

        // Look up the matching Polarion test case and sync
        RPPolarSync rpLaunchInfo = rpPolarUtils
                .getRPLaunchWithTeamAndItemID(rpTeam, after.issueID);
        PolarTestRun polarTestRun = rpPolarUtils
                .getPolarTestRun(rpLaunchInfo);
        consumeUtil.processCDC(polarTestRun, after.issueID,
                after.issueType, after.issueDescription, rpLaunchInfo, ...);

        return CompletableFuture.completedFuture(null);
    }
}
The framework handles consumer group management, offset commits, deserialization, and error routing. Our code focuses entirely on business logic.
Multiple channels in a single service
A single Quarkus microservice can consume from and produce to multiple Kafka topics simultaneously using separate @Incoming and @Channel annotations. The sync connector consumes from three topics and produces to two as follows:
// Consuming from three different Kafka topics in the same service
@Incoming("rp-cdc-issue") // Debezium CDC events
public CompletionStage<Void> cdcConsume(...) { ... }
@Incoming("polar-feedback-sink") // Polarion feedback events
public void polarConsume(...) { ... }
@Incoming("rp-fail-record-sink") // RP failure records
public void rpFailRecordConsume(...) { ... }
// Producing to two outgoing topics using @Channel emitters
@Channel("rp-polar-item-sync")
Emitter<RPPolarSync> rpPolarItemSyncEmitter;
@Channel("rp-polar-sync-status")
Emitter<RPPolarSync> rpPolarSyncEmitter;
Each channel maps to its Kafka topic through configuration properties, keeping the wiring declarative and the code clean.
Declarative channel configuration
All Kafka channel bindings are defined in application.properties using the MicroProfile Reactive Messaging naming convention. Each channel gets its own connector type, consumer group, serializer/deserializer, and failure strategy, as shown in this snippet:
# Incoming CDC channel with dead-letter queue for failed messages
mp.messaging.incoming.rp-cdc-issue.connector=smallrye-kafka
mp.messaging.incoming.rp-cdc-issue.group.id=rp-polar-sync
mp.messaging.incoming.rp-cdc-issue.failure-strategy=dead-letter-queue
mp.messaging.incoming.rp-cdc-issue.key.deserializer=com.example.CDCKeyDeserializer
mp.messaging.incoming.rp-cdc-issue.value.deserializer=com.example.CDCValueDeserializer
# Outgoing sync status channel
mp.messaging.outgoing.rp-polar-sync-status.connector=smallrye-kafka
mp.messaging.outgoing.rp-polar-sync-status.value.serializer=io.quarkus.kafka.client.serialization.ObjectMapperSerializer
The failure-strategy=dead-letter-queue setting automatically routes messages that fail deserialization or processing to a dead-letter topic, preventing a single bad message from blocking the entire consumer. You don't need a custom error-handling infrastructure.
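By default, SmallRye derives the dead-letter topic name from the channel name, but it can be overridden. A sketch of the relevant channel attributes (the topic name below is a hypothetical choice, not the one used in our deployment):

```properties
# Route failed messages for this channel to an explicitly named DLQ topic
mp.messaging.incoming.rp-cdc-issue.failure-strategy=dead-letter-queue
mp.messaging.incoming.rp-cdc-issue.dead-letter-queue.topic=rp-cdc-issue-dlq
# Serializers used when re-publishing the failed record to the DLQ
mp.messaging.incoming.rp-cdc-issue.dead-letter-queue.value.serializer=org.apache.kafka.common.serialization.StringSerializer
```

A separate consumer (or a human operator) can then inspect the DLQ topic to diagnose and replay poison messages.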
Built-in health checks
With quarkus-smallrye-health and health-enabled=true on each channel, Quarkus automatically exposes Kafka connectivity status through Kubernetes readiness and liveness probes. If a consumer loses connection to a Kafka topic, it marks the pod as unhealthy, and Kubernetes handles the restart.
Quarkus Scheduler for retry logic
Because results land in ReportPortal and Polarion at different times and in an unpredictable order, a CDC event may arrive before the corresponding Polarion test run exists in the sync database. Quarkus provides @Scheduled annotation support for handling this timing mismatch. The auto-finalization retry scheduler periodically reprocesses CDC events that could not be matched when they first arrived.
@ApplicationScoped
public class SyncRetryScheduler {

    @Scheduled(every = "300s", identity = "currentItemCompleter")
    public void currentItemCompleter() {
        OffsetDateTime now = OffsetDateTime.now();
        // Retry recent unmatched verdicts (last 24 hours)
        Result<Record> staleTeamIssues = rpPolarUtils
                .getStaleTeamIssues(now.minusHours(24), now.minusSeconds(30));
        updateStaleRecords(staleTeamIssues, false);
    }

    @Scheduled(cron = "{sync.retry.cron.expr}", identity = "oldItemCompleter")
    public void oldItemCompleter() {
        OffsetDateTime now = OffsetDateTime.now();
        // Daily sweep for older unsynced verdicts (last 7 days)
        Result<Record> staleTeamIssues = rpPolarUtils
                .getStaleTeamIssues(now.minusDays(7), now.minusHours(24));
        updateStaleRecords(staleTeamIssues, true);
    }
}
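The cron expression for the daily sweep is resolved from configuration. A minimal sketch of the corresponding application.properties entry; the 02:00 schedule shown here is a hypothetical value, as the actual expression is deployment-specific:

```properties
# Hypothetical schedule: daily sweep at 02:00 (Quartz-style cron, the Quarkus default)
sync.retry.cron.expr=0 0 2 * * ?
```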
GraalVM native images for production
Compiling all microservices as GraalVM native images for production deployment provides fast startup times and small memory footprints, which is important for scaling a service to respond to CDC events with low latency. The Quarkus build handles GraalVM native image configuration, including reflection registration for Jackson-serialized DTOs as follows:
@RegisterForReflection
@JsonIgnoreProperties(ignoreUnknown = true)
public class CDCValueContainer {
public Payload payload;
}
The Quarkus development advantage
The combination of Quarkus + SmallRye Reactive Messaging reduced the amount of infrastructure code we had to write. The following lists what the framework handles for us versus what we had to build:
- Kafka consumer lifecycle
- Framework-provided: SmallRye manages poll loops, offsets, rebalancing
- Custom code: None
- Error handling
- Framework-provided: Dead-letter queues, deserialization failure handlers
- Custom code: Business-level error routing
- Health checks
- Framework-provided: Auto-exposed per channel
- Custom code: Custom DB health check
- Serialization
- Framework-provided: ObjectMapperSerializer
- Custom code: Custom Debezium CDC deserializers
- Scheduled tasks
- Framework-provided: @Scheduled with cron support
- Custom code: Retry business logic
- REST clients
- Framework-provided: Quarkus REST Client + Fault Tolerance
- Custom code: Polarion/RP API call logic
- Native compilation
- Framework-provided: GraalVM build plug-in
- Custom code: @RegisterForReflection annotations
Layer 4: Polarion REST API integration
The sync connector updates Polarion through its REST API, setting the following:
- Test record result (e.g., passed, failed, or blocked)
- Resolution ticket as rich text HTML links to Jira/GitHub issues, extracted from the verdict description
- Comment with the full verdict description from ReportPortal
- RP Launch URL on the test run for cross-reference back to ReportPortal
- Work item status (e.g., setting automation bugs to needsupdate)
How to handle the sequencing problem
The core design challenge is that events from the ReportPortal pipeline and the Polarion pipeline arrive in an unpredictable order. A CDC verdict event may arrive before the Polarion test run URL is known, or a Polarion feedback event may arrive before the ReportPortal launch record has been persisted. The solution handles this through several mechanisms:
- Correlation database: A shared PostgreSQL database maps ReportPortal launches, test items, and issue IDs to Polarion test runs and test cases. The record updater and sync connector write to and read from this database as events arrive from either side.
- Exponential backoff retry: When a Polarion feedback event arrives before the corresponding Polarion test run record is ready, the sync connector retries with exponential backoff (100ms to 128s) before giving up.
- Stale record scheduler: CDC events that arrive before their Polarion counterparts exist are saved to a tracking table. The @Scheduled jobs sweep every 5 minutes for recent unmatched records and daily for older ones, retrying the sync when both sides are finally available.
- Idempotent updates: All Polarion REST API updates are idempotent, so replaying a verdict sync produces the same result regardless of how many times it runs.
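The exponential backoff schedule can be expressed as a small pure function. A sketch assuming delays double from the 100 ms base up to the 128 s cap mentioned above (the class and method names are illustrative):

```java
// Sketch of the retry delay schedule: doubling from a base delay, capped at a maximum.
public class Backoff {

    // Delay in milliseconds before the given retry attempt (0-based).
    public static long delayMs(int attempt, long baseMs, long maxMs) {
        long delay = baseMs;
        // Double the delay once per prior attempt, stopping early at the cap
        // to avoid long overflow on large attempt counts.
        for (int i = 0; i < attempt && delay < maxMs; i++) {
            delay *= 2;
        }
        return Math.min(delay, maxMs);
    }
}
```

After the final attempt at the cap, the record falls through to the stale record scheduler described above rather than being dropped.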
This combination means the system converges to a consistent state regardless of which pipeline finishes first or the time difference between the two uploads.
Scale to multiple ReportPortal instances
One of the key advantages of using Kafka Connect for CDC is how easily it scales to multiple source databases. Each ReportPortal instance gets its own Debezium connector deployed as a KafkaConnector custom resource on the OpenShift Container Platform.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
name: reportportal-team-a-debezium-connector
labels:
strimzi.io/cluster: my-connect-cluster
spec:
class: io.debezium.connector.postgresql.PostgresConnector
config:
database.hostname: reportportal-team-a-postgresql.svc
database.dbname: reportportal
topic.prefix: reportportal-team-a-debezium
table.include.list: public.issue
plugin.name: pgoutput
transforms: filter,reroute
transforms.filter.type: io.debezium.transforms.Filter
transforms.filter.language: jsr223.groovy
transforms.filter.condition: "value.op == 'u'"
transforms.reroute.type: io.debezium.transforms.ByLogicalTableRouter
transforms.reroute.topic.regex: (.*)public.issue(.*)
transforms.reroute.topic.replacement: rp-cdc-issue
Adding a new ReportPortal instance to the sync requires only deploying a new KafkaConnector resource with the instance's database connection details and registering the team in the sync database.
There are no code changes needed. The reroute transform sends all events to the same rp-cdc-issue topic, and the team name embedded in the Debezium topic prefix allows downstream consumers to identify the source instance. In our production environment, we run multiple connectors monitoring separate ReportPortal instances, all feeding into the same pair of Quarkus sync microservices.
Auto-finalized verdicts
Not all verdicts come from humans. TFA, ReportPortal's built-in automation analyzer, and AI agent analysis comments can auto-classify failures. The sync connector detects these by checking the autoAnalyzed flag in the CDC event (set by Automation Analyzer) and a known marker string in the issue description (set by TFA or AI Agent).
Auto-finalized verdicts are especially susceptible to the sequencing problem, since automated analysis often completes before the Polarion test run has been registered. The sync connector saves these verdicts to a tracking table, and the scheduled jobs retry them until the corresponding Polarion test run becomes available.
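The detection logic itself is a simple predicate. A minimal sketch, assuming hypothetical field and marker names; the actual CDC payload shape and marker string are internal to the sync connector:

```java
// Sketch of auto-finalized verdict detection.
public class AutoVerdictDetector {

    // Hypothetical marker embedded in the issue description by TFA / AI Agent.
    static final String TFA_MARKER = "[TFA]";

    // True when the verdict was produced by automation rather than a human:
    // either the Automation Analyzer set the autoAnalyzed flag, or the
    // description carries a known automation marker.
    public static boolean isAutoFinalized(boolean autoAnalyzed, String description) {
        return autoAnalyzed
                || (description != null && description.contains(TFA_MARKER));
    }
}
```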
Eventual consistency in practice
In practice, when both pipelines complete within a normal window, the latency between a user updating a verdict in ReportPortal and the corresponding Polarion record being updated is typically under a few seconds. When uploads are separated by longer intervals, the system catches up as soon as both sides are available.
The event-driven architecture guarantees eventual consistency between ReportPortal and Polarion, regardless of upload path or timing:
- Durability: Kafka's replication (factor=3, min ISR=2) prevents the loss of CDC events.
- Ordering: Debezium preserves the order of changes per database row.
- Convergence: The correlation database, exponential backoff retries, and scheduled sweeps ensure both sides eventually match up, no matter which pipeline finishes first.
- Dead-letter queues: SmallRye Reactive Messaging routes poison messages to dead-letter topics automatically, preventing a single bad event from blocking the pipeline.
- Idempotency: Polarion REST API updates are idempotent. Replaying the same verdict update produces the same result.
- Completion tracking: The connector tracks sync status per test item and marks the Polarion test run as finished only when it syncs all items.
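The completion-tracking idea reduces to a small bookkeeping structure. A sketch under the assumption that each failed item has a numeric ID; the class is illustrative, not the connector's actual implementation (which persists this state in the correlation database):

```java
import java.util.HashSet;
import java.util.Set;

// Sketch: a test run is "finished" only once every failed item has a synced verdict.
public class SyncTracker {

    private final Set<Long> pending = new HashSet<>();

    public SyncTracker(Set<Long> failedItemIds) {
        pending.addAll(failedItemIds);
    }

    // Record one synced item; returns true when no items remain,
    // i.e., the Polarion test run can be marked "finished".
    public boolean markSynced(long itemId) {
        pending.remove(itemId);
        return pending.isEmpty();
    }
}
```

Because markSynced is idempotent per item, replayed sync events cannot push the run to "finished" prematurely.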
The test run lifecycle with sync
With sync enabled, the Polarion test run lifecycle changes to accommodate the asynchronous verdict flow.
1. Submit test results with sync enabled
|
2. Results routed to RP and Polarion (concurrent or sequential)
- Results persist into each system at different times
- Polarion test run created with status: "inprogress"
- Failed test cases set to "Waiting" status
|
3. Correlation records built as events arrive from both pipelines
- RP failure records, Polarion test run URLs, test case mappings
- Timing differences handled by retries and scheduled sweeps
|
4. Failure analysis on ReportPortal
- TFA or AI Agent auto-classifies known failures
- Engineers analyze remaining failures
|
5. Debezium captures each verdict change in real-time
|
6. Quarkus Sync Connector processes CDC event
- Maps RP verdict to Polarion result
- "Waiting" -> "Passed" / "Failed" / "Blocked"
- Adds linked Jira/GitHub defect URLs
- Tags RP test item with sync status
|
7. All items synced -> Test run marked "finished"

Deployment on OpenShift
The entire solution runs on OpenShift, managed by ArgoCD. Environment promotion follows a standard dev → stage → prod pipeline, with ArgoCD auto-sync in dev/stage and manual sync gating for production.
- Strimzi operator manages the Kafka cluster (KRaft mode) and Kafka Connect deployment with Debezium plug-ins.
- Debezium connectors are deployed as KafkaConnector CRDs, one per ReportPortal instance, all managed declaratively.
- Quarkus microservices are built as GraalVM native images for fast startup and low memory footprint.
- Horizontal pod autoscaler scales the sync connector and record updater based on load.
- OpenTelemetry provides distributed tracing across the event pipeline with trace IDs propagated through Kafka message headers.
Key takeaways
CDC eliminates polling and API limitations. By capturing changes at the database WAL level, Debezium bypasses ReportPortal's API constraints and provides real-time change detection with zero impact on application performance. Kafka Connect simplifies multi-source CDC: adding a new ReportPortal instance is just another declaratively managed connector resource.
This event-driven design absorbs timing differences. By using Kafka as a durable buffer alongside a correlation database with scheduled retries, the system handles any ordering of events between the two platforms and converges to consistency regardless of upload path.
Quarkus and SmallRye Reactive Messaging make event-driven development productive. Annotation-driven consumers, declarative channel configuration, built-in dead-letter queues, and auto health checks allow us to focus on business logic instead of messaging infrastructure. The same @Incoming/@Channel pattern works across all our microservices.
GraalVM native images keep latency low. Fast startup and small memory footprints mean our sync services respond to CDC events quickly and scale efficiently under the Kubernetes HPA.
Streams for Apache Kafka provides production-grade infrastructure. KRaft mode, Strimzi operator management, and OpenShift-native deployment give us a reliable event backbone without the operational overhead of managing Kafka directly.