In this article, we will discuss how Red Hat OpenShift Service Mesh 3 facilitates advanced traffic management, observability, and security policies. As microservices have become the standard for modern applications, we’ve found that with great flexibility comes great complexity. What starts as a simple design of a few independent services can quickly grow into a tangled web of communication.
It’s a common challenge. As your architecture scales, so does the difficulty of routing traffic, securing communication, and gaining visibility into your system. This ever-growing network of service-to-service communication is where many teams find themselves spending a surprising amount of time and effort.
When services wrote the rules
Imagine the last time you ordered food on an app like Uber Eats. From your perspective, it’s a simple, seamless experience. You browse, tap the order button, and a short time later, a delivery driver is at your door. You didn’t have to worry about the restaurant’s schedule, how the driver was routed through city traffic, or whether your payment was handled securely.
Before modern food delivery platforms existed, every restaurant had to be its own delivery service. This do-it-yourself model is a perfect analogy for how microservices were managed in the past. It led to a chaotic, inefficient, and often insecure system.
Let’s look at how this translates to the microservices world:
- Security: In the old delivery model, you had to trust every individual restaurant with your credit card details. This was a major security risk. Because each restaurant had its own isolated payment system, it created countless vulnerable endpoints. A small, local restaurant might not have the resources to implement the same level of security and encryption as a major payment processor. A single breach at one of these restaurants could expose thousands of customer credit card numbers, and there would be no way to track which restaurant was responsible. This lack of centralized security meant the entire system was only as strong as its weakest link. This mirrors the microservice problem of every service having to write its own code to secure communication and manage credentials—a tedious, error-prone, and inconsistent approach.
- Traffic management: A popular new restaurant couldn’t handle a huge surge in orders without hiring more drivers, leading to long waits. There was no central system to balance the load. In the microservices world, this meant every service had to handle its own resilience, with developers writing custom code for retries and circuit breakers to prevent cascading failures.
- Observability: If your order was late, you had no idea where the driver was or what the holdup was. You just had to call the restaurant and hope they could help. In the microservices world, this is like having a request get lost between services, with no single dashboard to trace its journey. Debugging becomes a frustrating ping-pong of logs between different teams.
This decentralized approach led to fragmented, fragile, and difficult-to-manage applications, leaving critical questions unanswered: How can we build a unified, cohesive system out of this chaos? How do you ensure that all communication is secure, that you can see where a delay is happening, and that a single overloaded service doesn’t cause a domino effect of failures?
This platform puts it together
This is where a service mesh comes in: a smart, invisible network that handles all the hard parts for you. The real trick is that it places a tiny, intelligent helper right next to every one of your services. This helper is called a sidecar, and it is powered by Envoy, a high-performance proxy. Your service simply focuses on its core job, while the sidecar handles all network communication on its behalf, doing all the heavy lifting behind the scenes.
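In practice, attaching these sidecars requires no changes to your services at all. With Istio-based meshes, you typically opt a whole namespace into automatic sidecar injection with a single label (the namespace name here is illustrative, and depending on how your mesh is installed, a revision label such as istio.io/rev may be used instead):

```yaml
# Opt every pod deployed to this namespace into automatic
# Envoy sidecar injection (namespace name is illustrative).
apiVersion: v1
kind: Namespace
metadata:
  name: food-delivery
  labels:
    istio-injection: enabled
```

From then on, any pod scheduled in that namespace gets an Envoy proxy injected alongside it; the application container itself is untouched.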
OpenShift Service Mesh 3.0 (OSSM 3) is a powerful implementation of a service mesh, designed to streamline your microservices and make them more resilient. It’s a dedicated, invisible infrastructure layer that handles all the messy work. With this new platform, you gain the benefits of a large, professional delivery service for all your applications.
The control plane is the brain of the operation, running a component called Istiod. It’s the central authority that takes your high-level rules and translates them into thousands of tiny, specific configurations for each sidecar.
The shift to OSSM 3’s new architecture marks a significant change for existing Red Hat OpenShift users. While previous versions were based on the Maistra.io project, a Red Hat-specific distribution of Istio, OSSM 3 aligns directly with upstream Istio.io. This strategic pivot is accompanied by a new, streamlined operator called Sail, which focuses exclusively on managing the core service mesh components while allowing observability tools to be managed independently. In essence, it simplifies the architecture to provide a more modern, flexible, and future-proof service mesh solution for the OpenShift platform. Most importantly, OSSM 3 brings native support for complex, multi-primary, multi-network topologies, enabling the creation of a single, secure, and resilient application network that spans multiple clusters. But I’ll get into that later.
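To give a feel for how lean this model is, here is a minimal sketch of the custom resource the Sail operator reconciles into a running control plane. Treat the exact apiVersion, version string, and field values as illustrative and check the OSSM 3 documentation for your release:

```yaml
# Minimal sketch: the Sail operator turns this resource into a
# running Istiod control plane. Values are illustrative.
apiVersion: sailoperator.io/v1
kind: Istio
metadata:
  name: default
spec:
  namespace: istio-system   # where Istiod runs
  version: v1.24.1          # upstream Istio version to deploy
  updateStrategy:
    type: InPlace           # control plane upgrade strategy
```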
In this analogy:
- Your Service is the Restaurant: It cooks the food and handles the business logic, like processing a payment or managing a menu.
- The Envoy Proxy Sidecar is the Delivery Driver: a tiny, intelligent helper attached to every service.
- The Istio Control Plane is the Uber Eats Dispatch System: the central brain that manages all the drivers.
Let’s peel back the layers and see how this works in practice.
Secure order handover
In a traditional, decentralized system, what’s to stop a fake driver from walking into a restaurant and claiming to be there for an order? How do you prevent someone who is not the customer from accepting the delivery? A modern food delivery app solves this with a two-way authentication system. The driver’s app must securely confirm its identity with the restaurant’s tablet before the order is handed over. Then the platform verifies the customer, often with a unique PIN that must be provided to the driver.
In the microservices world, the service mesh provides a similar, automated secure handover for every single request. When your order service needs to communicate with the payment service, the Envoy proxy assigned to each service handles the entire exchange. These proxies perform a mutual TLS (mTLS) handshake—a rapid, automated digital ID check that verifies the identity of both services.
Under the hood, the Istio control plane acts as a trusted authority, managing its own Certificate Authority (CA) to secure the mesh. It issues a unique, short-lived cryptographic certificate to every sidecar proxy. When a client service needs to communicate with a server service, the sidecar proxies on both ends automatically perform the mTLS handshake, a cryptographic process that verifies the identity of both the client and the server by exchanging their certificates. Only after successful verification is a secure, encrypted tunnel established for the traffic. Your services remain completely unaware of this process. They simply send and receive standard HTTP requests, with security handled transparently at the network layer.
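Enforcing this posture mesh-wide is a single, declarative policy. As a sketch using the standard Istio security API (naming the policy "default" and placing it in the control plane namespace applies it to the whole mesh):

```yaml
# Require mTLS for all service-to-service traffic in the mesh.
apiVersion: security.istio.io/v1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT   # reject any plaintext traffic between sidecars
```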
Traffic management
Imagine you’ve developed a new algorithm to match drivers to orders more efficiently. Instead of a high-risk, full-scale rollout, you want to test this new system in a controlled way. You can configure the central dispatch system to say, “For the next week, send 5% of all new orders to the driver pool managed by the new algorithm, and the rest to the standard pool”. This is a canary deployment, allowing you to test new features in a safe, controlled way.
But the platform’s power goes far beyond that.
- A/B testing: Let’s say you want to test two different “new user” promotions. The dispatch system can be configured to send users with a “first-time order” flag to a service that offers Promotion A, while all other users are sent to Promotion B. The service mesh gives you the power to route traffic based on specific user attributes, not just a simple percentage, enabling true A/B testing.
- Circuit breaking: What if a restaurant’s kitchen gets completely overwhelmed with orders and starts failing? The dispatch system can automatically see this and cut off new orders to that restaurant for a short time to give it a chance to recover. This prevents a single overwhelmed restaurant from causing a chain reaction of failures across the entire system.
- Fault injection: To truly test the resilience of your application, you can simulate a problem. The dispatch system can intentionally introduce a 5-second delay to 1% of deliveries to see how the app and its customers handle the wait. This allows you to perform chaos engineering and test your application’s behavior under real-world conditions.
With the service mesh, you create a simple VirtualService configuration rule that tells the control plane how to manage this traffic. The control plane pushes this rule to all the relevant proxies, which then intelligently distribute the traffic, ensuring safe, controlled, and resilient application behavior.
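To make this concrete, here is a hedged sketch of the canary rule from the dispatch example, using the standard Istio traffic APIs. The service name (matching) and version labels are illustrative: a DestinationRule defines the two driver pools as subsets, and a VirtualService splits traffic 95/5 between them.

```yaml
# Define the two "driver pools": pods labeled version=v1 and
# version=v2 become routable subsets (names are illustrative).
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: matching
spec:
  host: matching
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
---
# Send 5% of orders to the new algorithm, the rest to the standard pool.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: matching
spec:
  hosts:
  - matching
  http:
  - route:
    - destination:
        host: matching
        subset: v1
      weight: 95
    - destination:
        host: matching
        subset: v2
      weight: 5
```

Circuit breaking from the list above is a DestinationRule traffic policy: eject a failing backend for a cooldown period so it can recover (service name and thresholds are illustrative):

```yaml
# Stop sending orders to an overwhelmed "kitchen": after 5
# consecutive 5xx errors, eject the backend for 30 seconds.
apiVersion: networking.istio.io/v1
kind: DestinationRule
metadata:
  name: kitchen
spec:
  host: kitchen
  trafficPolicy:
    outlierDetection:
      consecutive5xxErrors: 5
      interval: 10s
      baseEjectionTime: 30s
      maxEjectionPercent: 100
```

And fault injection is just another VirtualService clause, mirroring the 5-second delay on 1% of requests described above:

```yaml
# Chaos testing: delay 1% of requests to "deliveries" by 5 seconds.
apiVersion: networking.istio.io/v1
kind: VirtualService
metadata:
  name: deliveries
spec:
  hosts:
  - deliveries
  http:
  - fault:
      delay:
        percentage:
          value: 1.0
        fixedDelay: 5s
    route:
    - destination:
        host: deliveries
```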
Observability: Live delivery map
The dispatch system provides a real-time tracking app that shows the journey of every order, from the moment it leaves the kitchen to the second it’s delivered. If a customer’s order is delayed, you can see the precise location of the driver and any potential holdups.
Similarly, the service mesh provides you with a live map of your application network. This is made possible by a wealth of data it automatically collects.
- Metrics: The service mesh continuously gathers high-level operational data, like the total number of orders placed, the average delivery time, and the number of failed deliveries. This data is collected by tools like Prometheus and visualized in dashboards like Grafana, giving you a high-level view of the entire operation’s health.
- Logs: For every single delivery, the service mesh provides a detailed “trip record” that logs every event, such as when the driver arrived at the restaurant, when the food was picked up, and when it was handed off. This gives you a granular, auditable record for every request that flows through your system.
- Distributed tracing: This is the game-changer that ties it all together. Every request that enters the mesh is given a unique trace ID, like an order number. As the request moves between services, each proxy adds a new “span” to the trace, recording key information like the duration and status. Tools like Jaeger or Tempo collect this data, allowing you to visualize the entire journey of a single request and pinpoint exactly where a slowdown or failure occurred.
- Service graph: All of this data is brought together in a service visualization tool like Kiali. Kiali provides a real-time, visual map of your entire service mesh, showing how your services are connected and the health of the traffic flowing between them. It’s the live delivery map for your entire application.
By integrating these powerful tools, the service mesh turns debugging into a clear, unified, and powerful diagnostic process.
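Much of this telemetry is on by default, but it can be tuned declaratively too. As a sketch, Istio’s Telemetry resource lets you set the trace sampling rate mesh-wide; this assumes a tracing provider named "otel" is already registered in your mesh configuration, and the provider name and rate are illustrative:

```yaml
# Sample 10% of requests for distributed tracing, mesh-wide.
apiVersion: telemetry.istio.io/v1
kind: Telemetry
metadata:
  name: mesh-default
  namespace: istio-system
spec:
  tracing:
  - providers:
    - name: otel   # assumes this provider exists in the mesh config
    randomSamplingPercentage: 10
```

Tools like Jaeger, Tempo, and Kiali then pick up this data without any changes to the services themselves.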
Tying it all together
Ultimately, OpenShift Service Mesh 3 marks a fundamental shift in distributed system architecture. It replaces the traditional model of embedding networking logic in each application with a unified data plane composed of uniform sidecar proxies that are remotely configured by a central control plane. This architectural pattern is what enables the mesh's most powerful features without requiring any code changes.
This includes establishing a zero-trust security posture through automatic mutual TLS; enabling declarative traffic management for fine-grained routing and resilience patterns like circuit breakers; and providing universal observability, as each proxy consistently generates metrics, logs, and traces. The result is a powerful separation of concerns, liberating developers to focus purely on business logic while the platform handles the complexity of the network.
Connecting the unconnectable
Modern applications rarely live in a single cluster. For high availability, disaster recovery, or geographic distribution, they must span multiple clusters, often across different networks and clouds. Connecting these distributed services securely has always been a major architectural challenge.
OpenShift Service Mesh 3 directly addresses this with its native support for multi-primary, multi-network topologies. It allows you to create a single, unified service mesh from a federation of independent clusters. Each cluster remains autonomous and resilient, but services can securely communicate across cluster boundaries when needed.
This powerful capability is a game-changer for building truly resilient, global applications. But how does it work exactly? In my next article, we’ll get our hands dirty and demonstrate how to build this unified, secure network across clusters with OpenShift Service Mesh 3.