Background

At JUDCon 2013 in Boston, Scott Cranton and I presented a talk entitled Resilient Enterprise Messaging with Fuse & Red Hat Enterprise Linux. This technical article is the follow-up work from that presentation.

JBoss A-MQ is built on ActiveMQ, a robust messaging platform that supports STOMP, JMS, and AMQP, and embraces modern principles of Message Oriented Middleware (MOM). It is built from the ground up to be loosely coupled and asynchronous in nature, which gives ActiveMQ native high availability capabilities. An administrator can easily configure an ActiveMQ master/slave architecture with a shared filesystem. In the future, this will be augmented with Replicated LevelDB.

Red Hat Enterprise Linux is the popular, stable, and scalable Linux distribution whose High Availability and Resilient Storage Add-Ons are built on CMAN, RGManager, Corosync, and GFS2. These add-ons expand upon the high availability capabilities built into ActiveMQ and provide a robust and complete solution for enterprise deployments that require deeper clustering capabilities.

There are two main architectures commonly used to provide fault tolerance to a messaging server. The first is master/slave, which is easy to configure but requires 2N resources as it scales. The second is made up of active nodes and redundant nodes, where a redundant node can take over for any one of the active nodes should it fail. The active/redundant architecture requires more software and more initial configuration, but uses only N+1 or N+2 resources as it scales.

This article will explore the technical requirements and best practices for designing and building an N+1 architecture using JBoss A-MQ, Red Hat Enterprise Linux, the High Availability Add-On, and the Resilient Storage Add-On.

Scaling

The native High Availability features of ActiveMQ allow it to scale quite well and will satisfy most messaging requirements. As the messaging platform grows, administrators can add pairs of clustered servers while sharding Queues and Topics. Even shared storage can be scaled by providing different LUNs or NFSv4 shares to each pair of clustered servers. The native High Availability in ActiveMQ provides great scaling and decent fault resolution in smaller environments (2-6 nodes).

As a messaging platform grows larger (6+ nodes), the paired, master/slave architecture of the native high availability can start to require a lot of hardware resources. For example, at 10 nodes of messaging, the 2N architecture will require another 10 nodes of equal or greater resources for resiliency, for a total of 20 messaging nodes. At this point, it becomes attractive to investigate an N+1 or N+2 architecture. In an N+2 architecture, for example, acceptable fault tolerance may be provided to this same 10-node messaging platform with only two extra nodes of equal or greater resources, requiring a total of only 12 nodes instead of 20.

Fault Tolerance

The native master/slave architecture of ActiveMQ provides decent fault detection. Both nodes are configured to provide the exact same messaging service. As each node starts, it attempts to gain a lock on the shared filesystem. Whichever node gets the lock first starts serving traffic. The slave node then periodically attempts to get the lock. If the master fails, the slave will obtain the lock the next time it tries and will then provide access to the same message queues the master served. This works great for basic hardware and software failures.
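In broker configuration terms, shared-storage master/slave needs nothing more than both brokers pointing their persistence store at the same directory. Here is a minimal sketch, assuming a KahaDB store on an illustrative shared mount (/mnt/shared is not part of this tutorial's environment):

<broker xmlns="http://activemq.apache.org/schema/core" brokerName="amq">
    <persistenceAdapter>
        <!-- Both brokers point at the same directory; the file lock
             in this store determines which one becomes the master -->
        <kahaDB directory="/mnt/shared/amq/kahadb"/>
    </persistenceAdapter>
</broker>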

As fault tolerance requirements expand, failure scenario checking can be mapped into service clustering software such as the Red Hat Enterprise Linux High Availability Add-On. This allows administrators to write advanced checks and expand upon them as lessons are learned in production. Administrators can ensure that the ActiveMQ processes and shared storage are monitored, as with the native master/slave high availability, but can also utilize JMX checks, network port checks, and even techniques as advanced as internal looking glass services to ensure that the messaging platform is available to its clients. Finally, the failover logic can be embedded in the clustering software, which allows administrators to create failover domains and easily add single nodes as capacity is required.
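As an illustration of what such an expanded check might look like, the sketch below verifies both the transport port and the broker's JMX health MBean. The port, broker name, credentials, and jmxterm path are assumptions that mirror the init script shown later in this article:

#!/bin/bash
# Illustrative health check, not part of the tutorial's cluster config.
# Fails if nothing is listening on the OpenWire port or if the broker's
# Health MBean does not report "Good".

HOST=localhost
PORT=61616

# 1. Is anything listening on the transport port?
(echo > /dev/tcp/$HOST/$PORT) 2>/dev/null || exit 1

# 2. Does the broker's own health service agree?
echo "bean org.apache.activemq:brokerName=amq,service=Health,type=Broker
get CurrentStatus" | \
    java -jar /root/jmxterm.jar -l $HOST:1099 -u admin -p admin -n | grep -q Good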

From a fault tolerance perspective, it is worth stressing that in either architecture, the failover node must have equal or greater resources. For example, if the workload running on a messaging node requires 65% of its memory, CPU, or network bandwidth, the failover node must be able to satisfy those requirements. Even if the workload fits comfortably on the day the messaging platform is put into production, requirements typically grow over time. If, for example, at the end of two years the workload has grown to 85%, it may now require more capacity than an undersized failover node can provide, causing an outage. It is an anti-pattern to have failover nodes with less resource capacity than the production nodes.

Architecture

To create an example architecture that demonstrates client-to-broker and broker-to-broker message flow, we will configure two brokers as separate services in the clustering software. The producer and consumer will be sample Java code run from any computer that has network connectivity to the messaging cluster. Messages will flow as follows:

Resilient Messaging Architecture - Logical Architecture

The two services AMQ-East and AMQ-West will be limited to their respective failover domains, East and West. This prevents AMQ-East and AMQ-West from running on the same physical node during normal operation. Failover domains also allow administrators to architect several failover servers and scale out capacity within the clustering software. For this tutorial, the environment will be composed of the following systems and failover domains.

amq01.example.com: Producer and consumer code will be run here
amq02.example.com: Member of East Failover Domain
amq03.example.com: Member of West Failover Domain
amq04.example.com: Failover node for both domains.

ActiveMQ

The following installation should be performed identically on amq02.example.com, amq03.example.com, and amq04.example.com.

Installation

ActiveMQ is supported by Red Hat in two configurations. It can be run inside a Karaf container, which is the preferred method in non-clustered environments or when configuration will be managed by the Fuse Management Console (built on ZooKeeper). ActiveMQ can also be run directly on the JVM; this is the preferred method for clustered setups and is the method employed in this tutorial.

Red Hat provides a zip archive with standalone ActiveMQ in the extras directory:

/opt/jboss-a-mq-6.0.0.redhat-024/extras/apache-activemq-5.8.0.redhat-60024-bin.zip

For simplicity, the standalone version is installed and linked in /opt:

cp /opt/jboss-a-mq-6.0.0.redhat-024/extras/apache-activemq-5.8.0.redhat-60024-bin.zip /opt
cd /opt
unzip apache-activemq-5.8.0.redhat-60024-bin.zip
ln -s apache-activemq-5.8.0.redhat-60024 apache-activemq

Configuration

ActiveMQ comes with several example configuration files. Use the static network-of-brokers configuration files as a foundation for the clustered pair. Notice that the ports differ by default; this is so that both brokers can run on the same machine. In a production environment, you may or may not want this to be possible.

cd /opt/apache-activemq/conf
cp activemq-static-network-broker1.xml activemq-east.xml
cp activemq-static-network-broker2.xml activemq-west.xml

vim activemq-west.xml

The original network-of-brokers configuration files are set up so that both brokers run on the same machine. Change the default from localhost to the AMQ-East failover IP.
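In the example lab, AMQ-East is reached at 10.3.77.51:61616 (the same address the producer uses later in this article), so the network connector in activemq-west.xml would look something like this illustrative sketch:

<networkConnectors>
    <!-- West pushes messages to East over the AMQ-East floating IP
         instead of localhost -->
    <networkConnector uri="static:(tcp://10.3.77.51:61616)"/>
</networkConnectors>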

Storage

The storage requirements in a clustered ActiveMQ messaging platform based on RGManager are different from those of the ActiveMQ Master/Slave architecture. A standard filesystem such as Ext3, Ext4, or BTRFS, or a shared filesystem such as NFS or GFS2, can be used. This allows the architect to use the filesystem that maps best to functional and throughput requirements. Each filesystem has advantages and disadvantages.

EXT4

This is an obvious choice for reliability and throughput. There is no lock manager, which can provide better performance. However, an EXT4 filesystem is not clustered, so its mounts and unmounts must be managed by the cluster software, which takes extra time during a failover.
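With RGManager, one illustrative way to express this is an fs resource inside the service definition, so the cluster mounts the filesystem before starting the broker and unmounts it on relocation. The device and mount point below are assumptions, not part of this tutorial's lab:

<service domain="East" name="AMQ-East" recovery="relocate">
    <!-- Cluster-managed mount: adds mount/unmount time to every failover -->
    <fs name="amq-east-data" device="/dev/mapper/amq-east" mountpoint="/srv/fuse" fstype="ext4"/>
    <script file="/etc/init.d/camq-east" name="camq-east"/>
</service>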

GFS2

We have seen good success with GFS2. It allows each node in the cluster to mount the filesystem by default and removes the need for the cluster software to handle mounts and unmounts, providing quicker failover during an outage. GFS2 has the advantage of typically residing on Fibre Channel storage and, as such, is out of band from the standard corporate network.

NFS

Unlike the Master/Slave architecture built into ActiveMQ, any version of NFS can be used with the clustered architecture. Like GFS2, NFS has the advantage of being mounted at boot, providing quicker failovers during an outage. NFS traditionally uses the standard enterprise network and, in my experience, may be susceptible to interference from other users on the network.
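Because the share can simply be mounted at boot on every node, nothing cluster-specific is required. An illustrative /etc/fstab entry (the server name and export path are assumptions):

filer.example.com:/export/amq  /srv/fuse  nfs  defaults  0 0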

Clustering

The following installation should be performed identically on amq02.example.com, amq03.example.com, and amq04.example.com.

Init Scripts

These init scripts were developed to be scalable. Configuration data is kept in a separate file in /etc/sysconfig. As new brokers are added, the administrator can simply copy the wrapper script; for example, camq-west could be copied to camq-north to start a North broker, as sketched below.
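For example, a hypothetical North broker (the name is illustrative, not part of this tutorial) could be added along these lines:

# The wrapper derives the profile ("north") from its own file name
cp -a /etc/init.d/camq-west /etc/init.d/camq-north

# Note: the profile-handling block in /etc/init.d/camq below only knows
# "east" and "west", so a matching elif branch would also be required,
# along with an activemq-north.xml configuration file.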

Attentive readers may also notice artifacts in this code implying that these scripts are also used for clustered systems that rely on the Fuse Management Console, which is built on ZooKeeper. While that setup will not be used in this tutorial, most of the required infrastructure is included in these scripts.

/etc/sysconfig/camq

JBOSS_AMQ_HOME=/opt/jboss-a-mq
ACTIVEMQ_HOME=/opt/apache-activemq
FABRIC_SCRIPTS_HOME=/root/src/external-mq-fabric-client/scripts
CLUSTER_DATA_DIR=/srv/fuse

CONTAINER_USERNAME=admin
CONTAINER_PASSWORD=admin
CONTAINER_SSH_HOST=host117.phx.salab.redhat.com
CONTAINER_SSH_PORT=8101
ZOOKEEPER_PASSWORD=admin
ZOOKEEPER_URL=host117.phx.salab.redhat.com:2181

/etc/init.d/camq

#!/bin/bash
#
# camq This starts and stops clustered ActiveMQ (A-MQ) brokers
#
# chkconfig: - 99 01
# description: starts clustered amq server
#
# processname: /usr/lib/jvm/java/bin/java
# config: /etc/sysconfig/camq
# pidfile: /var/run/camq.pid
#
# Return values according to LSB for all commands but status:
# 0 - success
# 1 - generic or unspecified error
# 2 - invalid or excess argument(s)
# 3 - unimplemented feature (e.g. "reload")
# 4 - insufficient privilege
# 5 - program is not installed
# 6 - program is not configured
# 7 - program is not running
#

# Source function library.
. /etc/init.d/functions

# Check config
test -f /etc/sysconfig/camq || exit 6
source /etc/sysconfig/camq

KARAF_CLIENT=$JBOSS_AMQ_HOME/bin/client
AMQ=$JBOSS_AMQ_HOME/bin/amq

test -x $AMQ || exit 5
test -x $KARAF_CLIENT || exit 5

RETVAL=0

# Logic to handle naming and case problems
if [ "$2" == "east" ]
then
profile="east"
cap_profile="East"
elif [ "$2" == "west" ]
then
profile="west"
cap_profile="West"
fi

pidfile="/var/run/camq.pid"
prog="/usr/lib/jvm/java/bin/java"
#OPTIONS="-server -Xms128M -Xmx512M -XX:+UnlockDiagnosticVMOptions -XX:+UnsyncloadClass -XX:PermSize=16M -XX:MaxPermSize=128M -Dcom.sun.management.jmxremote -Djava.endorsed.dirs=/usr/lib/jvm/java/jre/lib/endorsed:/usr/lib/jvm/java/lib/endorsed:JBOSS_AMQ_HOME/lib/endorsed -Djava.ext.dirs=/usr/lib/jvm/java/jre/lib/ext:/usr/lib/jvm/java/lib/ext:JBOSS_AMQ_HOME/lib/ext -Dkaraf.name=A-MQ-${cap_profile} -Dorg.fusesource.mq.fabric.server-default.cfg:config=JBOSS_AMQ_HOME/etc/activemq-${profile}.xml -Dkaraf.instances=JBOSS_AMQ_HOME/instances -Dkaraf.home=JBOSS_AMQ_HOME -Dkaraf.base=JBOSS_AMQ_HOME -Dkaraf.data=JBOSS_AMQ_HOME/data -Djava.io.tmpdir=JBOSS_AMQ_HOME/data/tmp -Djava.util.logging.config.file=JBOSS_AMQ_HOME/etc/java.util.logging.properties -Dkaraf.startLocalConsole=false -Dkaraf.startRemoteShell=true -classpath JBOSS_AMQ_HOME/lib/karaf-jaas-boot.jar:JBOSS_AMQ_HOME/lib/karaf.jar org.apache.karaf.main.Main"
OPTIONS="-Xms1G -Xmx1G -Djava.util.logging.config.file=logging.properties -Dcom.sun.management.jmxremote -Djava.io.tmpdir=/tmp -Dactivemq.classpath=$ACTIVEMQ_HOME/conf -Dactivemq.home=$ACTIVEMQ_HOME -Dactivemq.base=$ACTIVEMQ_HOME -Dactivemq.conf=$ACTIVEMQ_HOME/conf/ -Dactivemq.data=$CLUSTER_DATA_DIR -jar $ACTIVEMQ_HOME/bin/activemq.jar start xbean:file:$ACTIVEMQ_HOME/conf/activemq-$profile.xml"

status(){
    # Query the broker's Health MBean via jmxterm; grep's exit status
    # becomes the result, so status succeeds only if the broker reports "Good"
    echo "bean org.apache.activemq:brokerName=amq,service=Health,type=Broker
get CurrentStatus" | java -jar /root/jmxterm.jar -l service:jmx:rmi://$HOSTNAME:44444/jndi/rmi://$HOSTNAME:1099/karaf-A-MQ-${cap_profile} -u admin -p admin -n | grep Good
    RETVAL=$?
    exit $RETVAL
}

start(){
    echo -n $"Starting AMQ server"
    # Launch the broker in the background and record its PID
    nohup $prog $OPTIONS &>/dev/null &
    echo $! > ${pidfile}-$profile

    RETVAL=$?
    if test $RETVAL = 0 ; then
        touch /var/lock/subsys/camq-$profile
    fi
    return $RETVAL
}

stop(){
    echo -n $"Stopping AMQ server"
    # Kill the PID recorded by start(), allowing a 2 second delay
    killproc -p ${pidfile}-$profile -d 2 $prog
    RETVAL=$?
    echo
    rm -f /var/lock/subsys/camq-$profile
    return $RETVAL
}

restart(){
    stop
    start
}

usage(){
    echo "Usage: $0 {status PROFILE|start PROFILE|stop PROFILE|restart PROFILE}"
    exit 3
}

test $2 || usage

# See how we were called.
case "$1" in
status)
status
;;
start)
start
;;
stop)
stop
;;
restart)
restart
;;
*)
usage
esac

exit $RETVAL

/etc/init.d/camq-east

#!/bin/bash
/etc/init.d/camq $1 `basename $0| cut -f2 -d'-'`

/etc/init.d/camq-west

#!/bin/bash
/etc/init.d/camq $1 `basename $0| cut -f2 -d'-'`
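Each wrapper simply passes the action through to /etc/init.d/camq and derives the profile from its own file name, so the brokers can be driven like any other init service:

service camq-east start    # starts the East broker (profile "east")
service camq-west status   # JMX health check of the West broker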

Cluster Configuration File

This configuration file was generated by Luci for a three-node cluster built in the Solutions Architect's Lab at Red Hat (not the Crunchtools Lab). It is provided for guidance and may require additional tuning for your environment.

/etc/cluster/cluster.conf
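The full Luci-generated file is not reproduced here; the following is a minimal illustrative sketch of the relevant structure for this topology. Fencing is omitted for brevity (a production cluster requires real fence devices), and the attribute values are assumptions consistent with the failover domains described above:

<?xml version="1.0"?>
<cluster config_version="1" name="AMQ">
  <clusternodes>
    <clusternode name="amq02.example.com" nodeid="1"/>
    <clusternode name="amq03.example.com" nodeid="2"/>
    <clusternode name="amq04.example.com" nodeid="3"/>
  </clusternodes>
  <rm>
    <failoverdomains>
      <!-- East: amq02 preferred, amq04 is the shared failover node -->
      <failoverdomain name="East" ordered="1" restricted="1">
        <failoverdomainnode name="amq02.example.com" priority="1"/>
        <failoverdomainnode name="amq04.example.com" priority="2"/>
      </failoverdomain>
      <!-- West: amq03 preferred, amq04 is the shared failover node -->
      <failoverdomain name="West" ordered="1" restricted="1">
        <failoverdomainnode name="amq03.example.com" priority="1"/>
        <failoverdomainnode name="amq04.example.com" priority="2"/>
      </failoverdomain>
    </failoverdomains>
    <service domain="East" name="AMQ-East" recovery="relocate">
      <script file="/etc/init.d/camq-east" name="camq-east"/>
    </service>
    <service domain="West" name="AMQ-West" recovery="relocate">
      <script file="/etc/init.d/camq-west" name="camq-west"/>
    </service>
  </rm>
</cluster>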

Test Code & Cluster Failover

The test code is pulled from the Fuse by Example repository on GitHub.

Producer

Modify the producer code to send messages to the AMQ-East broker. Make the following change.

vim /root/src/external-mq-fabric-client/simple-producer/src/main/resources/jndi.properties

#java.naming.provider.url = discovery:(fabric:a-mq-east)?maxReconnectDelay=1000
java.naming.provider.url = failover:(tcp://10.3.77.51:61616)?maxReconnectDelay=1000

Consumer

Modify the consumer code to receive messages from the AMQ-West broker. Make the following change.

vim /root/src/external-mq-fabric-client/simple-consumer/src/main/resources/jndi.properties

#java.naming.provider.url = discovery:(fabric:a-mq-west)
java.naming.provider.url = failover:(tcp://10.3.77.52:61618)?maxReconnectDelay=1000

Cluster

To set up the experiment, make sure that the cluster is in the following state:

clustat
Cluster Status for AMQ @ Thu Aug 8 19:42:33 2013
Member Status: Quorate

 Member Name                       ID   Status
 ------ ----                       ---- ------
 amq02.example.com                 1    Online, Local, rgmanager
 amq03.example.com                 2    Online, rgmanager
 amq04.example.com                 3    Online, rgmanager

 Service Name                 Owner (Last)                  State
 ------- ----                 ----- ------                  -----
 service:AMQ-East             amq02.example.com             started
 service:AMQ-West             amq03.example.com             started

Run the Code

Run the producer in one terminal and the consumer in another. You will see the messages flow:

Terminal One (amq01.example.com): Build & Run

cd /root/src/external-mq-fabric-client/simple-producer
mvn -U clean install
mvn exec:java

[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building MQ-Fabric Client Example :: Simple Producer 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) @ simple-producer >>>
[INFO]
[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) @ simple-producer <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ simple-producer ---
19:58:48 INFO Successfully connected to tcp://10.3.77.51:61616
19:58:48 INFO Sending to destination: queue://fabric.simple this text: 1. message sent
19:58:49 INFO Sending to destination: queue://fabric.simple this text: 2. message sent
19:58:49 INFO Sending to destination: queue://fabric.simple this text: 3. message sent
19:58:50 INFO Sending to destination: queue://fabric.simple this text: 4. message sent
19:58:51 INFO Sending to destination: queue://fabric.simple this text: 5. message sent
...

Terminal Two (amq01.example.com): Build & Run

cd /root/src/external-mq-fabric-client/simple-consumer
mvn -U clean install
mvn exec:java

[INFO] Scanning for projects...
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] Building MQ-Fabric Client Example :: Simple Consumer 2.0.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] >>> exec-maven-plugin:1.2.1:java (default-cli) @ simple-consumer >>>
[INFO]
[INFO] <<< exec-maven-plugin:1.2.1:java (default-cli) @ simple-consumer <<<
[INFO]
[INFO] --- exec-maven-plugin:1.2.1:java (default-cli) @ simple-consumer ---
20:00:51 INFO Successfully connected to tcp://10.3.77.52:61618
20:00:51 INFO Start consuming messages from queue://fabric.simple with 120000ms timeout
20:00:51 INFO Got 1. message: 125. message sent
20:00:51 INFO Got 2. message: 126. message sent
20:00:51 INFO Got 3. message: 127. message sent
...

Testing Failover

Resilient Messaging Architecture - Failover State 1

Terminal Three (amq04.example.com): Failover Tests

Now tell the clustering software to fail the AMQ-West service back and forth between amq03.example.com and amq04.example.com. This example fails back and forth five times, but the count can easily be changed to 25 or 500 for more robust testing.

for i in {1..5}; do clusvcadm -r AMQ-West; sleep 20; done

As the service fails back and forth, you will notice the consumer output messages like the following. You can also fail the AMQ-East service back and forth; you will see similar messages on the producer side.

...
19:58:34 INFO Got 593. message: 593. message sent
19:58:35 INFO Got 594. message: 594. message sent
19:58:35 INFO Got 595. message: 595. message sent
19:58:36 WARN Transport (tcp://10.3.77.52:61618) failed, reason: java.io.EOFException, attempting to automatically reconnect
19:58:53 INFO Successfully reconnected to tcp://10.3.77.52:61618
19:58:53 INFO Got 596. message: 596. message sent
19:58:53 INFO Got 597. message: 597. message sent
19:58:54 INFO Got 598. message: 598. message sent
...

Last updated: July 3, 2023