Prior to working at Red Hat, I worked for a software company building financial software for large institutions. From that experience I knew that some customers require, or demand, a very aggressive Service Level Agreement (SLA).

If we consider an SLA of 99.999% (generally referred to as “five nines”), this allows for roughly six seconds of unavailability or downtime over a full week; anything more and penalties would have to be paid. To provide this level of uptime, it is essential to have a strategy for high availability (HA). This got me thinking --- how could this be achieved with OpenShift and JBoss Enterprise Application Platform (EAP) 7?

For an initial test, I thought I’d try to get a simple HA Servlet working with session sharing to see how EAP 7 works in a cluster of pods within OpenShift.

In subsequent articles I intend to increase the complexity of the solution to support most aspects of what I see as typical large scale applications today.

From what I could discover doing online research, the easiest way to get started would be to use a preloaded operating system via a virtual machine (VM). Because I use OSX, I wanted easy-to-use VM and image management, which led me to this article, Installation Guide – Red Hat Customer Portal, which explains how to install VirtualBox and Vagrant and how to download the Red Hat Container Development Kit (CDK).

Setting up the environment

Downloading the Container Development Kit (CDK) – Get Started does require you to have a zero-dollar developer subscription; however, this is not a problem --- as the name implies, the subscription is offered to developers at zero cost: http://developers.redhat.com/register. You will be prompted to register as you download the CDK if you do not already have an account: http://developers.redhat.com/download-manager/file/cdk-2.1.0.zip (It's pretty quick and painless.)

Once I signed up for Red Hat Developers and the CDK had downloaded, I followed the installation instructions for Vagrant and VirtualBox on the Mac. I was then able to follow section 4.3 of the document, entitled “Setting Up Container Development Kit Software Components”, and I provisioned the OpenShift virtual image and started it by running:

vagrant up

Once the OpenShift virtual image was started, I went to a browser and connected to the console URL shown in the startup messages of the OpenShift environment. For my instance this was https://10.1.2.2:8443/console, which does not have a trusted certificate, so I had to accept the certificate to get to the web page --- no problem. I was presented with the login screen. The default admin userid and password are:

admin / admin

Lastly, I knew I would also want to interact with the OpenShift instance using the command line interface (CLI) --- it’s far easier to perform certain tasks this way. The OpenShift Enterprise 3.1 | CLI Reference | Get Started with the CLI explained how to do this on OSX.
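Once the CLI was installed, logging in to the local instance from a terminal looks roughly like this (the URL and credentials are the same as those used for the console above):

oc login https://10.1.2.2:8443 -u admin -p admin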

Getting a Servlet running

With the environment up and running, I needed a Servlet to test the system --- the first thing to do was learn how to configure OpenShift to run my Servlet.

I found a great book at Summit 2016, “OpenShift for Developers” by Grant Shipley and Graham Dumpleton, which refers to the GitHub repository GitHub – gshipley/book-insultapp-final. This book shows you how to deploy and run a Servlet inside OpenShift.

It was relatively straightforward to get everything up and running, and the book provides a very clear step-by-step process for deploying the above Servlet-based application to OpenShift --- so I will not copy the same details here; however, in a nutshell:

  1. I started the OpenShift environment and then logged into the OpenShift console.
  2. I then clicked the “New Project” button on the home-page of the console to create the project for this application.
  3. Once the project was created I used the “Add to project” link at the top of the project page to add a new application to this new project.
  4. I chose to create a new WildFly application, then copied the link to the book-insultapp-final GitHub repository into the field labelled as “Git Repository URL”: https://github.com/gshipley/book-insultapp-final
  5. After clicking the “Create” button at the bottom of this form, the OpenShift system added all the necessary components to build and deploy this GitHub project as a new application into the project.
  6. Once the process had completed, there was a link to the new application on the project overview page within the console.

(Note that the Servlet example in the book does not exploit session sharing within EAP, nor does it demonstrate the HA aspect of EAP 7 running within the pods.)
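As an aside, the same deployment can also be driven from the CLI rather than the console. A rough sketch (the project name here is just an example, and the exact WildFly image stream name or tag available in your CDK may differ):

oc new-project insultapp
oc new-app wildfly~https://github.com/gshipley/book-insultapp-final.git --name=insultapp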

For my case, testing HA and session sharing, I needed to start several pods for a web application and then have the sessions survive as the number of pods decreased (either for load levelling, or due to a crash of a pod). With this strategy, clients should not see any major interruption within their browser, even if server pods go offline or sessions are failed over to a different data center.

As a note, the creation of the application on OpenShift did not always work, because the docker-formatted image it depended on was not yet present on the OpenShift instance. To get around this, I had to run additional script-based commands to load the images onto the OpenShift instance before running the OpenShift console process of building a project and application.

These script-based commands are performed in a terminal window with the current working directory set to the location where the CDK was installed, followed by this subdirectory:

components/rhel/rhel-ose

To start a new terminal window which is running a shell inside the OpenShift environment I used:

vagrant ssh

Then I ran:

docker pull <imagerepositoryname>

Where <imagerepositoryname> is the location of the image. For example:

docker pull registry.access.redhat.com/jboss-eap-6/eap64-openshift:1.2

Next, the default CDK that I downloaded and installed did not have any JBoss EAP 7 based templates. It did have WildFly, which can be used to test things as per the discussion above, but as I specifically wanted to test EAP I started to search for how I could add EAP-based templates into my OpenShift environment.

I was pointed at GitHub – jboss-openshift/application-templates: OpenShift application templates supporting JBoss Middleware based applic… Its readme explains that you need to issue the following CLI command to install the templates. (To run ‘oc’ commands you must have the OpenShift environment running, using the vagrant up command above.) I downloaded the GitHub repository to my local disk and then, from its root folder, I issued:

oc create -f jboss-image-streams.json -n openshift

This added the additional definitions to the OpenShift environment so that, when the “Add to Project” link was clicked, further templates (such as WildFly and EAP) were visible in the interface. I was now in a position to create a new project with my HA Servlet-based application rather than the simple test Servlet mentioned above.

Getting an HA Servlet running

To test the HA features I was interested in, I created my own Servlet with a simple counter that increases each time the page is accessed; the counter is stored in the HttpSession. If a pod is removed, the sessions associated with that pod should be transferred to one of the remaining pods, and the counter should continue to increase rather than being reset back to one.

The code for the HA example Servlet can be found at GitHub – markeastman/haexample: HA with EAP example for OpenShift --- it is very simple. This is my personal repository, but feel free to clone or copy it as you want. The only class is HAExampleServlet, and it just increments a counter, stores it in the session and then returns a message.
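For reference, the core of such a servlet looks roughly like the following minimal sketch (the class in the repository may differ in its details, and for EAP to replicate the HTTP session between pods the web application must also be marked <distributable/> in its web.xml):

import java.io.IOException;

import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

@WebServlet("/")
public class HAExampleServlet extends HttpServlet {
    @Override
    protected void doGet(HttpServletRequest request, HttpServletResponse response)
            throws IOException {
        // Fetch (or create) the HTTP session and bump the counter stored in it.
        HttpSession session = request.getSession(true);
        Integer count = (Integer) session.getAttribute("count");
        count = (count == null) ? 1 : count + 1;
        session.setAttribute("count", count);

        // Report the session id, the counter and the pod (hostname) serving this request.
        String pod = System.getenv().getOrDefault("HOSTNAME", "unknown");
        response.getWriter().printf("From session %s, for the %d time on pod %s%n",
                session.getId(), count, pod);
    }
}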

To deploy this into OpenShift, I first logged into the console using admin/admin and then created a project called “haexample”. Since I had installed the EAP 7 templates I could choose the EAP 7 template; however, from experience I knew the build would fail until I had pulled the image, so before creating the application from the template I ran the following commands:

vagrant ssh
docker pull registry.access.redhat.com/jboss-eap-7/eap70-openshift:1.2

Next, within the OpenShift console I clicked the “haexample” project and then, at the top of the overview page, I clicked the link labelled “Add to Project” and selected the EAP 7:1.2 based template. For the Git Repository URL field of the presented form I entered the location of my GitHub repository, https://github.com/markeastman/haexample.git, and a branch name of “master”.

This started a build that created the image and deployed a pod with my Servlet in it. To test it, I went to the URL shown on the overview page for the project, along with the Servlet context (in my example this was http://haexample-haexample.rhel-cdk.10.1.2.2.xip.io/haexample). Success! The page was up and the counter value was displayed.
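The same check can also be scripted from a terminal; re-using a cookie jar keeps curl on the same HTTP session between requests (the URL is the one from my environment above):

curl -c cookies.txt -b cookies.txt http://haexample-haexample.rhel-cdk.10.1.2.2.xip.io/haexample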

To check the HA aspect I started two pods and waited for them to come up. Once they had, I used Chrome and Firefox to connect to the Servlet to see the messages.

I hoped the two browsers would connect to different pods, but in case they did not, I used Safari as well. I made sure the counters in each session were greater than one, so that I could later detect whether a counter had been reset.

I then scaled the pods down to one and clicked refresh in each browser. For the remaining pod, the sessions stayed and the counter incremented. For the session that had been on the removed pod, the load balancer within OpenShift failed over to the remaining pod, which created a new session and reset the counter to one. Clearly clustering and HA were not working (this was expected, as I had not yet configured HA).
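As a side note, scaling the pods up and down can also be done from the CLI rather than the console; for example:

oc scale dc/haexample --replicas=2
oc scale dc/haexample --replicas=1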

Attempting HA

To learn how to turn on HA within OpenShift and EAP, I was pointed towards OpenShift Enterprise 3.1 | Using Images | xPaaS Middleware Images | JBoss Data Grid and the section entitled “Forming a Cluster using the OpenShift JBoss Data Grid xPaaS Images”.

For a group of pods to work as a cluster, they need to know about each other so that they can share data between themselves, and they need to know when those other pods are no longer available. This detection mechanism is often implemented using a “ping” based approach where each pod will issue a “ping” command to every other pod and they will return a simple message to say that they are still working. (As a verb, ping means "to get the attention of" or "to check for the presence of" another party online. The computer acronym was contrived to match the submariners' term for the sound of a returned sonar pulse.)

The default ping implementation for EAP within OpenShift is KubePing, so I followed the details on how to set the variables for getting it to work. I issued the following CLI commands:

oc login
admin
admin
oc project haexample
oc env dc/haexample -e OPENSHIFT_KUBE_PING_NAMESPACE=haexample OPENSHIFT_KUBE_PING_LABELS=app=haexample

The same document indicated that the “LABELS” should be “application=appname”; however, when I checked the labels defined for the service I found that the label was defined as

“app: haexample”

To check your own labels within your service, you can go to the OpenShift console, select the “Browse -> Services” menu option on the left, and then select the service that you created. In my case it was called “haexample”.

When the service page is displayed, there will be an “Actions” menu in the top right --- one of the actions is “Edit YAML”. When selected, a popup window will be displayed with the YAML definition of the service. The labels are near the top of the content.
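The same label information is also available from the CLI; for example:

oc get service haexample -o yaml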

To make sure the cluster could detect all the pods within my service, I set OPENSHIFT_KUBE_PING_LABELS to match the label definition contained within the service YAML --- hence OPENSHIFT_KUBE_PING_LABELS=app=haexample. (If the labels are not correct, the EAP 7 instance in each pod cannot see any of the other pods in the service, and each forms a cluster of one containing only itself.)

The clustering document also stated that I needed to execute the following command to ensure each pod has permission to list all the other pods in the service definition:

oc policy add-role-to-user view system:serviceaccount:haexample:default -n haexample

Note that in the above I used ‘haexample’ within both the service account name and the project name. If you used a different project name you will need to change the command to match.
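For a different project name, the general form of the command would be:

oc policy add-role-to-user view system:serviceaccount:<projectname>:default -n <projectname>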

Now that the clustering configuration had been completed, I started a number of pods on my service and looked at the log files for each pod to make sure clustering had been enabled. To do this, I clicked the center of the pod circle on the project overview page and then clicked each pod in turn.

For each pod there is a tab at the top of the display that says “Logs” and when selected you can see the log output for the EAP running inside that pod.
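The logs can also be read from the CLI, which is quicker when there are several pods; for example:

oc get pods
oc logs <pod-name>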

I checked within the logs of each pod that the first line had something like:

Service account has sufficient permissions to view pods in kubernetes (HTTP 200). Clustering will be available.

Prior to issuing the oc policy statement above, the top line of the log file stated that there were insufficient permissions and so clustering was being disabled.

Further down the log file I looked for:

06:35:13,478 INFO  [org.jgroups.protocols.openshift.KUBE_PING] (MSC service thread 1-1) namespace [haexample] set; clustering enabled

Note that the project name is within the [] brackets just after the namespace keyword. In my case it matches my service name.

Even further down the log file I found the cluster statement. Note that haexample-3-e0qi1 was the name of the pod to which this log relates:

06:35:19,255 INFO  [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) ISPN000094: Received new cluster view for channel web: [haexample-3-e0qi1] (1) [haexample-3-e0qi1]

This statement did not seem right, as I was expecting to see both pod names defined and inserted into the cluster. When I tried the browser to test the HA session failover, it did not work and I started the long process of trying to figure out what I had not done correctly.

I looked at the KubePing code (an example is here) and found that it had some additional debug information that would be helpful, so I created the configuration directory within my project and added a sample standalone-openshift.xml file into it. (When OpenShift deploys an application, it checks for this folder and copies the xml file to the runtime area of the container image.)

Using this technique, I could replace the same file within the built image and thereby configure the application server differently. This file took some time to locate, as it needs to have the correct keywords ready for substitution at runtime. A colleague sent me the following instructions to help get the correct template file for my environment:

You will need two terminal windows in the host environment (in my case OSX). In the first one, run:

docker run --rm -ti --name tempimage registry.access.redhat.com/jboss-eap-7/eap70-openshift:1.2 bash

This will give you a bash prompt; it has started up the image but substituted the normal startup script for bash. Leave that window, and in the second terminal, run the following command:

docker cp tempimage:/opt/jboss/eap/standalone/standalone-openshift.xml my-local-file.xml

That will copy out the configuration file, unsubstituted, since the startup script has not yet run (this is why we started the image the way we did in the first terminal window).

You can then copy the contents of the my-local-file.xml file that the second command created into your project’s configuration/standalone-openshift.xml. Within this XML file, I added the following logger to the logging section:

<logger category="org.openshift.ping">
   <level name="DEBUG"/>
</logger>

I then checked the change into git and pushed it to my public repository, https://github.com/markeastman/haexample.git (git add, git commit, git push). Note that if you have already cloned or used my repository for the “Add to Project” step referred to above, this file will already be present within your deployed image.
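For completeness, the git steps were roughly the following (the path assumes the configuration directory described above):

git add configuration/standalone-openshift.xml
git commit -m "Enable org.openshift.ping debug logging"
git push origin master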

To rebuild the image and deploy the pod again I issued the following command:

oc start-build haexample

When the pods started up again I checked the logs and saw the additional information I needed to figure out what was going wrong:

06:35:34,500 FINE  [org.openshift.ping.kube.Client] (thread-3,ee,haexample-3-e0qi1) getPods(haexample, app=haexample) = [Pod[podIP=172.17.0.4, containers=[Container[ports=[]]]], Pod[podIP=172.17.0.5, containers=[Container[ports=[]]]]]

When checking the code for KubePing.java, I noticed that the pods would only form a cluster if their containers exposed a port whose name matched the env variable OPENSHIFT_KUBE_PING_PORT_NAME (which defaults to ‘ping’).

I did not have any named ports within the containers, and hence no cluster was formed. From looking at the documentation on DNS_PING, I noticed it used port 8888, and I also noticed that Undertow in EAP 7 had booted with the following message:

Starting UndertowServer on port 8888 for channel address: haexample-3-e0qi1

So I started to look at how I could define such a named port.

When I looked at the YAML for the deployment configuration, I saw that it defined the container ports, so I added an additional port entry to this array:

ports:
  - containerPort: 8080
    protocol: TCP
  - containerPort: 8443
    protocol: TCP
  - containerPort: 8778
    protocol: TCP
  - name: ping
    containerPort: 8888
    protocol: TCP
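For a non-default port name, the matching environment variable can be set in the same way as the earlier KubePing variables; for example:

oc env dc/haexample -e OPENSHIFT_KUBE_PING_PORT_NAME=ping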

I left the name as the default, “ping”, but this could be changed as long as the environment variable OPENSHIFT_KUBE_PING_PORT_NAME was also defined to match (as sketched above). When I changed the YAML configuration and saved it, the OpenShift system automatically redeployed the pods. Upon inspection of the log files I now saw two pods defined with named ping ports:

07:08:36,010 FINE  [org.openshift.ping.kube.Client] (Thread-0 (ActiveMQ-server-org.apache.activemq.artemis.core.server.impl.ActiveMQServerImpl@13986e41-626916456)) getPods(haexample, app=haexample) = [Pod[podIP=172.17.0.7, containers=[Container[ports=[Port[name=ping, containerPort=8888]]]]], Pod[podIP=172.17.0.8, containers=[Container[ports=[Port[name=ping, containerPort=8888]]]]]]

Further down the logs as the pods started I saw the cluster had picked up both pods:

Received new cluster view for channel web: [haexample-4-z44yb] (2) [haexample-4-z44yb, haexample-4-iec7f]

When I ran the browser tests again and then scaled down to one pod, I saw that one browser displayed a consistent session id and an incrementing counter on the remaining pod, whereas the other browser kept its session id and counter value but was now served by the remaining pod:

Prior to downscale:

From session PqBJ4AiPYKUdD2yfTydEedGMRiLM9vHPRTM45sFG, for the 3 time on pod haexample-4-z44yb
From session 5ZO7_d47CQ_tEUGLj_B_jLGvLCEKWKUo4iWvZ5Nt, for the 3 time on pod haexample-4-iec7f

After downscale:

From session PqBJ4AiPYKUdD2yfTydEedGMRiLM9vHPRTM45sFG, for the 4 time on pod haexample-4-iec7f
From session 5ZO7_d47CQ_tEUGLj_B_jLGvLCEKWKUo4iWvZ5Nt, for the 4 time on pod haexample-4-iec7f

You can see that, after scaling down, the first line of each block of output has the same session id and an incremented counter, whereas the pod name has changed to the remaining pod.

Conclusion

Within this blog entry, I have shown how to install a local OpenShift environment running within its own VM, and then, within that same instance, how to deploy a simple Servlet --- as described in the book by Grant Shipley and Graham Dumpleton. I then used that knowledge to create a new project that deployed my HA example Servlet as a service.

To get EAP running as a cluster within the pods for a service, I had to follow the configuration documentation, but this turned out not to be sufficient. In addition to the details within the document, I had to define a container port named “ping” that was exposed by the EAP system --- this acted as a valid ping protocol provider.

Although the named port 8888 was needed to get the cluster to form, I have so far been unable to confirm that this is indeed the correct port to use.

Please feel free to post comments if you have any questions, thoughts to share, or improvements to this process.

 
