XML editing with Bash script

Countless products use XML files, whether for data persistence, serialization or mere configuration. This is even more true of the Red Hat middleware portfolio, the JBoss projects having always been keen on using this format for configuration files, on top of the ones specified by JEE such as the famous (or infamous?) web.xml. While the XML format has some definite qualities, it is not the easiest format to parse, and this often causes issues when integrating a product inside an RPM or designing an automated installation procedure.

As I’ve been working on such automation for most of my career, I’ve picked up a bunch of nifty tricks and also designed some useful practices that I wanted to share on this blog.

Command lines

While one can use ‘sed’ or ‘awk’ to process XML files, it is always a tricky job. Indeed, those tools, contrary to the XML standard, assume that the spacing and line breaks within a file are meaningful. For instance, if a ‘sed’ statement assumes that the XML attribute to edit is on the same line as the node tag, it will break as soon as the file layout is modified, even though the XML file remains valid.
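
To make this concrete, here is a small sketch. The sample ‘server’ element and the ‘sed’ expression are made up for illustration and are not taken from a real JBoss file. The same element is written twice, once on a single line and once with the attribute on its own line. Both are equivalent XML, but the ‘sed’ statement only edits the first one:

```shell
#!/bin/bash
# Hypothetical sample data: the same element, laid out in two equivalent ways.
one_line='<server name="server-one" group="main"/>'
two_lines='<server name="server-one"
        group="main"/>'

# A naive edit that assumes 'group' sits on the same line as the tag name.
rename_group() {
  sed -e 's;<server \(.*\)group="main";<server \1group="other";'
}

edited_one_line=$(printf '%s\n' "${one_line}" | rename_group)
edited_two_lines=$(printf '%s\n' "${two_lines}" | rename_group)

echo "${edited_one_line}"   # the attribute is changed
echo "${edited_two_lines}"  # left untouched: sed works line by line
```

The second variant goes through ‘sed’ unchanged, without any error or warning, which is exactly what makes this kind of breakage hard to spot in an automated procedure.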

Along with this, it is also extremely difficult to rely on such editing tools to perform rather crucial XML changes such as adding a child node or removing a complete block. Bottom line: those awesome and standard tools are simply not the best ones for the job.

In this section, we will therefore introduce a couple of tools, available on any good Linux distribution (or easy to install), that will provide better support to handle XML content.

Validation

One of the good things with XML is that it’s a structured format. However, the bad thing is that it’s quite easy to break that structure. For this reason, it’s pretty important, when editing such a file within a script, to check before and after editing that the content is still well-formed XML.

I quite recently discovered the command ‘xmlwf’, which comes with the ‘expat’ package and performs exactly this kind of check:

$ xmlwf /tmp/index.xml
/tmp/index.xml:825:2: mismatched tag

While quite old, and not perfect (for instance, an invalid file does not result in the command returning a non-zero status), this command is still quite handy to me on a daily basis.
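
Since the exit status alone cannot be trusted, a script can instead treat any output from ‘xmlwf’ as a failure, as the command prints nothing for a well-formed file. The function below is a minimal sketch of that idea. The name ‘is_well_formed’ is mine and is not part of ‘expat’:

```shell
#!/bin/bash
# Succeeds only if xmlwf exits cleanly AND prints nothing for the given file.
# 'is_well_formed' is a hypothetical helper name, not something shipped with expat.
is_well_formed() {
  local report status=0
  # Capture both output streams; remember the exit status without aborting under 'set -e'.
  report=$(xmlwf "${1}" 2>&1) || status=$?
  if [ "${status}" -ne 0 ] || [ -n "${report}" ]; then
    echo "${report}" >&2
    return 1
  fi
  return 0
}
```

A script can then simply call ‘is_well_formed /tmp/index.xml || exit 1’ before and after each edit.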

XML editing

If ‘xmlwf’ is helpful, the hard part in handling XML files certainly does not lie in their validation, but in their editing. As stated previously, adding or removing child elements, or tweaking attributes, is simply not easy to achieve with the regular script tricks. Fortunately, another useful command comes to our rescue for this purpose: ‘xsltproc’, which ships with ‘libxslt’ (itself built on top of ‘libxml2’).

This command processes an XML file using an XSLT style sheet, enabling one to easily modify its structure while ensuring the file remains valid. As the command allows one to pass parameters to the style sheet, it is also quite a handy tool for script usage. Let’s look at a concrete example to see how one can leverage this.

Adding a server to a server group in JBoss AS host definition

Editing the XML structure using XSLT

Since the release of JBoss AS 7 (which is used as a base for JBoss EAP 6), the JEE application server offers a new mode of operation, called domain mode, which allows you to run several instances of the server, even across several systems, as a whole. One key configuration file of this feature is the ‘domain/configuration/host.xml’ file, which describes how many instances should be run on one host.

This example will focus on editing this file, within a script, to add server definitions to it.

The first step here consists of writing an appropriate style sheet. Sadly (and no one will get an argument from me about that), XSLT instructions are not that easy to understand, especially if you come from a regular RHEL administrator background and never learned the basics. While I would love to provide some enlightenment to the reader on this topic, it is simply off topic here, so I will just show the content of the style sheet I designed to add a server entry to the host.xml (it is saved as ‘add-server.xsl’, the name used by the scripts below):

<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:domain="urn:jboss:domain:1.4">

  <xsl:param name="server-name"/>
  <xsl:param name="server-group"/>
  <xsl:param name="port-offset"/>

  <xsl:template match="*" priority="-1">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="node()|@*" priority="-2">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="domain:servers">
    <xsl:element name="servers">
      <xsl:apply-templates select="node()|@*"/>
      <xsl:text>	</xsl:text>
      <xsl:message>Adding server '<xsl:value-of select="$server-name"/>' to '<xsl:value-of select="$server-group"/>'</xsl:message>
      <xsl:element name="server">
        <xsl:attribute name="name"><xsl:value-of select="$server-name"/></xsl:attribute>
        <xsl:attribute name="group"><xsl:value-of select="$server-group"/></xsl:attribute>
        <xsl:attribute name="auto-start">true</xsl:attribute>
        <xsl:text>
</xsl:text><xsl:text>	</xsl:text><xsl:text>	</xsl:text>
        <xsl:element name="socket-bindings">
            <xsl:attribute name="port-offset"><xsl:value-of select="$port-offset"/></xsl:attribute>
        </xsl:element>
        <xsl:text>
</xsl:text><xsl:text>	</xsl:text><xsl:text>	</xsl:text>
	<profile name="full-ha"/>
        <xsl:text>
</xsl:text><xsl:text>	</xsl:text>
      </xsl:element>
      <xsl:text>
</xsl:text>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

Here are some key points regarding the style sheet above:

  1. Three XSL parameters (their values need to be passed to the style sheet) define the server’s name, the server group it belongs to, and the port offset value. This last value is quite important as, to peacefully share the same network interface, each instance of the JBoss AS server needs to start its services (HTTP, JMS, and so on) on a different set of ports. The port offset is therefore used to shift the default port values for each instance.
  2. A couple of ‘xsl:template’ instructions then define how the style sheet should treat ANY element (node, text,…) of the file it processes. In our case, the default behavior is to simply copy them as they are to the resulting document. Of course, we override this behavior for the ‘servers’ node, to add our server definition, in the last ‘xsl:template’ instruction.
  3. The last instruction contains all the required code to append a new server definition. An important point to note in this part of the code is the use of our three parameters described above with the instruction ‘xsl:value-of’.

Let’s now see how we can edit this file, using ‘xsltproc’, to add a server definition:

add-server.sh:
#!/bin/bash

readonly JBOSS_HOME=/opt/jboss-eap-6
readonly INSTANCE_ID=1
readonly PORT_OFFSET=100
readonly SERVER_GROUP=${SERVER_GROUP:-'main'}

# the parameter names must match the 'xsl:param' names of the style sheet
xsltproc --stringparam server-name "server${INSTANCE_ID}" \
         --stringparam server-group "${SERVER_GROUP}" \
         --stringparam port-offset "${PORT_OFFSET}" \
         add-server.xsl \
         "${JBOSS_HOME}/domain/configuration/host.xml"

Running this script, one will get the resulting new document on the standard output:

<host>
    ...
    <servers>
        <server name="server-one" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="100"/>
        </server>
        <server name="server-two" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="200"/>
        </server>
        <server name="server-three" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="300"/>
        </server>
    <server name="server1" group="main" auto-start="true">
        <socket-bindings port-offset="100"/>
        <profile name="full-ha"/>
    </server>
</servers>
</host>

Adding several server definitions

From here, it is quite easy to enhance the script to create as many instances as needed, automatically calculating the required port shift value:

add-server.sh:
#!/bin/bash

readonly ORIGINAL_FILE=${1}
readonly TARGET_FILE=${2:-$(mktemp)}
readonly SERVER_GROUP=${SERVER_GROUP:-'main'}

current_file="${ORIGINAL_FILE}"

for instanceId in {0..2}
do
  result_file=$(mktemp)
  # each pass reads the output of the previous one
  xsltproc --stringparam server-name "server${instanceId}" \
           --stringparam server-group "${SERVER_GROUP}" \
           --stringparam port-offset "$(( instanceId * 100 ))" \
           'add-server.xsl' \
           "${current_file}" > "${result_file}"
  current_file="${result_file}"
done
cp "${current_file}" "${TARGET_FILE}"
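
The port shift for each instance is plain integer arithmetic, which the shell can evaluate with ‘$(( ))’. This form avoids a classic pitfall of ‘expr’, where an unquoted ‘*’ is expanded by the shell as a file name pattern before ‘expr’ ever sees it. A small sketch, assuming a step of 100 ports between two instances as above; the function name ‘port_offset_for’ is only illustrative:

```shell
#!/bin/bash
# Step between the port ranges of two instances (assumption: 100, as in the script above).
readonly PORT_STEP=100

# Prints the port offset for a given instance id. Hypothetical helper name.
port_offset_for() {
  echo $(( ${1} * PORT_STEP ))
}

# Build a short summary of the offsets for three instances.
offsets=''
for instance_id in 0 1 2
do
  offsets="${offsets}server${instance_id}=$(port_offset_for "${instance_id}") "
done
echo "${offsets}"
```

With an id starting at 0, the first instance keeps the default ports, and each following one is moved 100 ports further.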

Removing previous server definitions

The provided ‘host.xml’ file comes with a set of predefined servers, given as an example. Before adding our own server definitions, using the script presented above, we’ll need to remove all the existing ones. This is rather easy to implement: we just need to copy the whole XML structure except the content of the ‘servers’ element:

rm-all-servers.xsl:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:domain="urn:jboss:domain:1.4">

  <xsl:template match="domain:host">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" priority="-1">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="node()|@*" priority="-2">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="domain:servers">
    <xsl:element name="servers"/>
  </xsl:template>
</xsl:stylesheet>

Trouble with name spacing…

While the command itself is certainly not to blame here, it is worth mentioning that one can run into trouble with name spaces, and more especially with their associated ‘xmlns’ attribute. Indeed, style sheet processing may sometimes add or remove such attributes, and this is sadly quite difficult to work around. Nevertheless, this is probably the only XML idiosyncrasy that I have not defeated in a satisfactory manner using ‘xsltproc’.

The example above is a good illustration of this situation. If you run the command yourself and compare the resulting file with the original, you’ll see that the name space attribute of the ‘host’ node has been removed, which will sadly cause JBoss to refuse to start. After some investigation, I failed to come up with an elegant solution to this problem, so I simply fell back on a good old ‘sed’ statement. Once the configuration file has been edited, this statement just adds the missing attribute, as we’ll see in the script below.

#!/bin/bash

readonly JBOSS_HOME=${JBOSS_HOME:-/opt/jboss/jboss-eap}
readonly HOST_FILE="${JBOSS_HOME}/domain/configuration/host.xml"
readonly EDITED_FILE=$(mktemp)
readonly RESULT_FILE=$(mktemp)

set -e # fails on the first error

# xmlwf may exit with 0 even on a broken file, so any output counts as a failure
is_well_formed() {
  local report status=0
  report=$(xmlwf "${1}" 2>&1) || status=$?
  if [ "${status}" -ne 0 ] || [ -n "${report}" ]; then
    echo "${report}" >&2
    return 1
  fi
  return 0
}

echo -n "Checking if original host.xml is valid... "
is_well_formed "${HOST_FILE}"
echo 'Done.'

echo -n "Deleting all previous server definitions... "
xsltproc 'rm-all-servers.xsl' "${HOST_FILE}" > "${EDITED_FILE}"
echo 'Done.'

echo -n "Adding server instances to host.xml... "
./add-server.sh "${EDITED_FILE}" "${RESULT_FILE}"
echo 'Done.'

echo -n "Adding missing name space attribute... "
# -i edits the file in place; without it sed only prints to the standard output
sed -i -e 's;<host ;<host xmlns="urn:jboss:domain:1.4" ;' "${RESULT_FILE}"
echo 'Done.'

echo -n "Checking if resulting file is still valid... "
is_well_formed "${RESULT_FILE}"
echo 'Done.'

echo -n "Replacing host.xml... "
cp "${HOST_FILE}" "${HOST_FILE}.bck" # backing up never hurts...
cp "${RESULT_FILE}" "${HOST_FILE}"
echo 'Done.'
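
The ‘sed’ statement used for the name space can be tried on its own. The sketch below applies it to a one-line sample. The ‘name="master"’ attribute is only an assumed example of what the ‘host’ element may look like once its name space has been stripped:

```shell
#!/bin/bash
# Assumed sample: a 'host' opening tag that lost its xmlns attribute.
stripped='<host name="master">'

# Same substitution as in the script: re-insert the name space right after the tag name.
fixed=$(printf '%s\n' "${stripped}" \
  | sed -e 's;<host ;<host xmlns="urn:jboss:domain:1.4" ;')

echo "${fixed}"
```

Two things are worth keeping in mind. The pattern ‘<host ’ ends with a space, so it expects at least one attribute after the tag name, and a bare ‘<host>’ would be left unchanged. Also, ‘sed’ writes to the standard output unless it is given the ‘-i’ option, so the result has to be written back to the file one way or another.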

Final words

As one can see in the script above, the resulting procedure, which automates the addition of server definitions to a host.xml, is pretty simple to both understand and maintain. It could easily be integrated into an RPM, or simply run by a deployment tool or a configuration management tool such as Puppet (or by Kickstart when the host is set up).

But one thing is certain now: having an XML configuration file is no longer a blocker to properly automating deployment or designing maintenance scripts. With those two command line tools and a fair understanding of XSLT, the sky is the limit (well, it’s not an excuse to go crazy on this…).
