XML editing with Bash script

Countless products use XML files, whether for data persistence, serialization or mere configuration. This is even more true of the Red Hat middleware portfolio: the JBoss projects have always been keen on this format for configuration files, on top of the ones specified by JEE such as the famous (or infamous?) web.xml. While the XML format has some definite qualities, it is not the easiest format to parse, and this often causes issues when integrating a product into an RPM or designing an automated installation procedure.

As I’ve been working on such automation for most of my career, I’ve picked up a bunch of nifty tricks and also designed some useful practices that I wanted to share on this blog.

Command lines

While one can use ‘sed’ or ‘awk’ to process XML files, it is always a tricky job. Indeed, those tools, unlike the XML standard, assume that spacing within the files is structured and relevant. For instance, if a ‘sed’ statement assumes that the XML attribute to edit is on the same line as the node tag, it will break as soon as the file spacing is modified, even though the XML file remains valid.
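
To make this concrete, here is a minimal sketch of that failure mode. The two XML fragments and the `rename_group` function are made up for the illustration; they are not taken from any real configuration file.

```shell
#!/bin/bash
# Two fragments that are equivalent as XML and differ only in spacing.
one_line='<server name="a" group="main"/>'
two_lines='<server name="a"
        group="main"/>'

# A line-oriented edit which assumes that the 'group' attribute sits
# on the same line as the opening of the 'server' tag.
rename_group() {
  sed -e 's;\(<server .*\)group="main";\1group="other";'
}

edited_one=$(printf '%s\n' "${one_line}" | rename_group)
edited_two=$(printf '%s\n' "${two_lines}" | rename_group)

echo "${edited_one}"   # the attribute has been changed
echo "${edited_two}"   # left untouched: the pattern never matched
```

The second fragment goes through ‘sed’ unchanged and without any error, which is exactly what makes this kind of breakage hard to notice in a script.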

Along with this, it is also extremely difficult to rely on such editing tools to perform rather crucial XML changes, such as adding a child node or removing a complete block. The bottom line is: those awesome and standard tools are simply not the best ones for the job.

In this section, we will therefore introduce a couple of tools, available on any good Linux distribution (or easy to install), that will provide better support to handle XML content.

Validation

One of the good things with XML is that it’s a structured format. However, the bad thing with it is that it’s quite easy to break that structure. For this reason, when editing such a file within a script, it’s pretty important to check both before and after the edit that the file is still proper, well-formed XML.

I quite recently discovered the command ‘xmlwf’, which comes with the ‘expat’ package and performs exactly this kind of check:

$ xmlwf /tmp/index.xml
/tmp/index.xml:825:2: mismatched tag

While quite old, and not perfect (for instance, an invalid file does not make the command return a non-zero status), this command is still quite handy to me on a daily basis.
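
Since the exit status cannot be relied upon here, one possible workaround is to treat any output from the checker as a failure. The sketch below assumes that ‘xmlwf’ prints nothing for a well-formed file; `assert_well_formed` and `XML_CHECKER` are hypothetical names chosen for this example.

```shell
#!/bin/bash
# Command used to check the file; 'xmlwf' by default (assumption: it
# prints nothing at all when the file is well-formed).
XML_CHECKER=${XML_CHECKER:-xmlwf}

# Hypothetical helper: fails if the checker exits with a non-zero
# status OR prints anything, whichever happens.
assert_well_formed() {
  local file="$1" report status
  report=$("${XML_CHECKER}" "${file}" 2>&1)
  status=$?
  if [ "${status}" -ne 0 ] || [ -n "${report}" ]; then
    printf '%s\n' "${report}" >&2
    return 1
  fi
  return 0
}
```

A script can then call `assert_well_formed /tmp/index.xml || exit 1` before and after each edit.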

XML editing

If ‘xmlwf’ is helpful, the hard part of handling XML files certainly does not lie in their validation, but in their editing. As stated previously, adding or removing child elements, or tweaking attributes, is simply not easy to achieve with the regular script tricks. Fortunately, another useful command, from the ‘libxslt’ package (itself built on ‘libxml2’), comes to our rescue for this purpose: xsltproc.

This allows you to process an XML file using an XSLT style sheet, enabling one to easily modify its structure while ensuring the file remains valid. As the command allows one to pass parameters to the style sheet, it is also quite a handy tool for script usage. Let’s look at a concrete example to see how one can leverage this.

Adding a server to a server group in JBoss AS host definition

Editing the XML structure using XSLT

Since the release of JBoss AS 7 (which is used as a base for JBoss EAP 6), the JEE application server offers a new mode of operation, called domain mode, which allows you to run several instances of the server, even across several systems, as a whole. One key configuration file of this feature is the ‘domain/configuration/host.xml’ file, which describes how many instances should be run on one host.

This example will focus on editing this file, within a script, to add server definitions to it.

The first step here consists of writing an appropriate style sheet. Sadly (and no one will get an argument from me about that), XSLT instructions are not that easy to understand, especially if you come from a regular RHEL administrator background and never learned the basics. While I would love to enlighten the reader on this topic, it is simply off topic here, so I will just show the content of the style sheet I designed to add a server entry to the host.xml:

add-server.xsl:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:domain="urn:jboss:domain:1.4">

  <xsl:param name="server-name"/>
  <xsl:param name="server-group"/>
  <xsl:param name="port-offset"/>

  <xsl:template match="*" priority="-1">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="node()|@*" priority="-2">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="domain:servers">
    <xsl:element name="servers">
      <xsl:apply-templates select="node()|@*"/>
      <xsl:text>	</xsl:text>
      <xsl:message>Adding server '<xsl:value-of select="$server-name"/>' to '<xsl:value-of select="$server-group"/>'</xsl:message>
      <xsl:element name="server">
        <xsl:attribute name="name"><xsl:value-of select="$server-name"/></xsl:attribute>
        <xsl:attribute name="group"><xsl:value-of select="$server-group"/></xsl:attribute>
        <xsl:attribute name="auto-start">true</xsl:attribute>
        <xsl:text>
</xsl:text><xsl:text>	</xsl:text><xsl:text>	</xsl:text>
        <xsl:element name="socket-bindings">
            <xsl:attribute name="port-offset"><xsl:value-of select="$port-offset"/></xsl:attribute>
        </xsl:element>
        <xsl:text>
</xsl:text><xsl:text>	</xsl:text><xsl:text>	</xsl:text>
	<profile name="full-ha"/>
        <xsl:text>
</xsl:text><xsl:text>	</xsl:text>
      </xsl:element>
      <xsl:text>
</xsl:text>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>

Here are some key points regarding the style sheet above:

  1. Three XSL parameters: their values need to be provided to the style sheet. They define the server’s name, the server group it belongs to and the port offset. This last value is quite important since, to peacefully share the same network interface, each instance of the JBoss AS server needs to start its services (HTTP, JMS, and so on) on a different set of ports. The port-offset value is therefore used to shift the default port values for each instance.
  2. A couple of ‘xsl:template’ instructions then define how the style sheet should treat ANY element (node, text,…) of the files it processes. In our case, the default behavior is simply to copy them as they are to the resulting document. Of course, we override this behavior for the ‘servers’ node, to add our server definition, in the last ‘xsl:template’ instruction.
  3. The last instruction contains all the code required to append a new server definition. An important point to note in this part of the code is the use of our three parameters, described above, with the instruction ‘xsl:value-of’.

Let’s now see how we can edit this file, using ‘xsltproc’, to add a server definition:

add-server.sh:
#!/bin/bash

readonly JBOSS_HOME=/opt/jboss-eap-6
readonly INSTANCE_ID=1
readonly PORT_OFFSET=100
readonly SERVER_GROUP=${SERVER_GROUP:-'main'}

# parameter names must match the xsl:param names declared in add-server.xsl
xsltproc --stringparam server-name "server${INSTANCE_ID}" \
         --stringparam server-group "${SERVER_GROUP}" \
         --stringparam port-offset "${PORT_OFFSET}" \
         add-server.xsl \
         "${JBOSS_HOME}/domain/configuration/host.xml"

Running this script, one will get the resulting new document on the standard output:

<host>
    ...
    <servers>
        <server name="server-one" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="100"/>
        </server>
        <server name="server-two" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="200"/>
        </server>
        <server name="server-three" group="main-server-group" auto-start="true">
            <socket-bindings port-offset="300"/>
        </server>
    <server name="server1" group="main" auto-start="true">
        <socket-bindings port-offset="100"/>
        <profile name="full-ha"/>
    </server>
</servers>
</host>

Adding several server definitions

From here, it is quite easy to enhance the script to automatically create as many instances as needed, automatically calculating the required port shift value:

add-server.sh:
#!/bin/bash

readonly ORIGINAL_FILE="${1}"            # host.xml to start from
readonly TARGET_FILE="${2:-$(mktemp)}"   # where the final document is written
readonly SERVER_GROUP="${SERVER_GROUP:-main}"

current_file="${ORIGINAL_FILE}"

for instanceId in {0..2}
do
  # each pass reads the previous result and adds one more server to it
  result_file=$(mktemp)
  xsltproc --stringparam server-name "server${instanceId}" \
           --stringparam server-group "${SERVER_GROUP}" \
           --stringparam port-offset "$(( instanceId * 100 ))" \
           'add-server.xsl' \
           "${current_file}" > "${result_file}"
  current_file="${result_file}"
done
cp "${current_file}" "${TARGET_FILE}"
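
The port offset calculation can be checked on its own, without touching any XML file. Here is a minimal sketch using shell arithmetic expansion; `port_offset_for` is a hypothetical helper name, and the factor of 100 is simply the one used in the loop above.

```shell
#!/bin/bash
# Hypothetical helper: port offset for a given instance number.
port_offset_for() {
  echo $(( $1 * 100 ))
}

offsets=""
for instance_id in 0 1 2
do
  # builds a space separated list of "name:offset" pairs
  offsets="${offsets}${offsets:+ }server${instance_id}:$(port_offset_for "${instance_id}")"
done
echo "${offsets}"
```

Arithmetic expansion is a bit safer than ‘expr’ for this: with ‘expr’, an unescaped `*` is subject to pathname expansion by the shell before ‘expr’ ever sees it.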

Removing previous server definitions

The provided ‘host.xml’ file comes with a set of predefined servers, given as an example. Before adding our own server definitions with the script presented above, we’ll need to remove all the existing ones. This is rather easy to implement: we just need to copy the whole XML structure except the content of the ‘servers’ element:

rm-all-servers.xsl:
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:domain="urn:jboss:domain:1.4">

  <xsl:template match="domain:host">
    <xsl:copy>
      <xsl:apply-templates select="node()|@*"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="*" priority="-1">
    <xsl:element name="{name()}">
      <xsl:apply-templates select="node()|@*"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="node()|@*" priority="-2">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="domain:servers">
    <xsl:element name="servers"/>
  </xsl:template>
</xsl:stylesheet>

Trouble with name spacing…

While the command itself is certainly not to blame here, it is worth mentioning that one can run into trouble with name spaces, especially with their associated attribute ‘xmlns’. Indeed, style sheet processing may sometimes add or remove such attributes, and this is sadly quite difficult to work around. Nevertheless, this is probably the only XML idiosyncrasy that I have not defeated in a satisfactory manner using ‘xsltproc’.

The example above is a good illustration of this situation. If you run the command yourself and compare the resulting file with the original, you’ll see that the name space attribute of the ‘host’ node has been removed, which will sadly cause JBoss to refuse to start. After some investigation, I failed to come up with an elegant solution to this problem, so I simply fell back on a good old ‘sed’ statement. Once the configuration file has been edited, the statement just adds the missing attribute, as we’ll see in the script below.

#!/bin/bash

readonly JBOSS_HOME="${JBOSS_HOME:-/opt/jboss/jboss-eap}"
readonly HOST_FILE="${JBOSS_HOME}/domain/configuration/host.xml"
readonly EDITED_FILE=$(mktemp)
readonly RESULT_FILE=$(mktemp)

set -e # fails on the first error

# xmlwf prints any problem it finds; as noted earlier, its exit status may stay 0
echo -n "Checking if original host.xml is valid... "
xmlwf "${HOST_FILE}"
echo 'Done.'

echo -n "Deleting all previous server definitions... "
xsltproc 'rm-all-servers.xsl' "${HOST_FILE}" > "${EDITED_FILE}"
echo 'Done.'

echo -n "Adding server instances to host.xml... "
./add-server.sh "${EDITED_FILE}" "${RESULT_FILE}"
echo 'Done.'

echo -n "Adding missing name space attribute... "
sed -i -e 's;<host ;<host xmlns="urn:jboss:domain:1.4" ;' "${RESULT_FILE}"
echo 'Done.'

echo -n "Checking if resulting file is still valid... "
xmlwf "${RESULT_FILE}"
echo 'Done.'

echo -n "Replacing host.xml... "
cp "${HOST_FILE}" "${HOST_FILE}.bck" # backing up never hurts...
cp "${RESULT_FILE}" "${HOST_FILE}"
echo 'Done.'
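
To see what the ‘sed’ statement does on its own, here is a small sketch that applies the same substitution to a single line. The sample `<host name="master">` tag is a made-up input, not a copy of a real host.xml.

```shell
#!/bin/bash
# Hypothetical sample: a 'host' opening tag that lost its name space.
stripped='<host name="master">'

# Same substitution as in the script above, applied to standard input.
restored=$(printf '%s\n' "${stripped}" \
  | sed -e 's;<host ;<host xmlns="urn:jboss:domain:1.4" ;')

echo "${restored}"
```

Note that the pattern only matches when `<host` is followed by a space, so a bare `<host>` tag would be left as it is, and that running the substitution twice on the same file would add the attribute twice.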

Final words

As one can see in the script above, the resulting procedure, which automates the addition of server definitions to a host.xml, is pretty simple to both understand and maintain. It could easily be integrated into an RPM, or simply run by a deployment tool or a configuration management tool such as Puppet (or by Kickstart when the host is set up).

But one thing is certain now: having an XML configuration file is no longer a blocker to properly automating deployment or designing maintenance scripts. With those two command line tools and a fair understanding of XSLT, the sky is the limit (well, it’s not an excuse to go crazy on this…).


  • Hello,

    Why do you use xmlwf when xmllint (from libxml2) does a proper job of validation, as it returns a proper error if the validation fails? You can test well-formedness but also be stricter (DTD validation, etc).

    And, bonus point, libxml2 is from the same developer as libxslt (which actually uses libxml2), and he is working for Red Hat.

    Cheers

  • I haven’t heard of ‘xmlwf’ before. When it comes to XML validation my go-to tool has always been ‘xmllint’, provided by the ‘libxml2’ package.

    This gist I made shows a *full* example of the output for valid and invalid documents: https://gist.github.com/tbielawa/7829506

    The short example looks like this:

    $ xmllint --noout --xinclude ./Virtual-Disk-Operations.xml
    ./Virtual-Disk-Operations.xml:13: parser error : Opening and ending tag mismatch: bar line 12 and foo

    ^

    xmllint will also return non-zero for invalid documents! Just another tool to have in your arsenal.

  • One more thing I meant to mention: xmllint also provides options for specifying Document Type Definitions (DTDs) or RelaxNG schemas (no schematron support though) to ensure that the document conforms to the syntax and grammar of any schema you’re writing against.

  • Hi,

    At the time I looked for an XML validation command, my googling led me to xmlwf, without ever mentioning the capabilities of ‘xmllint’ in this domain. So, thanks for this feedback, I’ll be sure to update my script asap to use ‘xmllint’ in lieu of xmlwf!

    (I plan to release another article on this topic, I’ll integrate a quick overview of ‘xmllint’ if I can).

    Cheers !

  • DV

    Heya,

    I’m DV, the main author of libxml2 and libxslt, feel free to grab me, I’m on various IRC channels for example 🙂 . xmllint can of course be used to verify (or fix) well-formedness, check validity, convert encodings, clean up superfluous namespace declarations, indent, etc … see xmllint --help and poke me if needed !

  • Hi Daniel,

    I remember you 😉 – we had a chat (a long time ago), I think at one of your talks at Solutions Linux Paris! I even think this was when I realized I could process style sheets within a shell, using tools from libxml… which down the road led to this post.

    Thanks for the feedback and the support offer through IRC – it won’t be forgotten 😉

  • +1 for `xmllint` that I also use as the engine to reflow xml document while in Sublime Text editor.

    And…

    Not xml manipulation but still related in some way.

    If you are looking for a tool to colorize and reflow xml documents directly in the shell I recommend `pygmentize`

    That with some helper alias, becomes also a nice `less` alternative able to colorize most programming languages and conf files:

    $ which catc
    alias catc='pygmentize -O style=monokai -f console256 -g'

  • sbts

    xmlstarlet seems to be the parser/editor of choice
    this post certainly suggests it.
    http://stackoverflow.com/questions/1554143/bash-script-to-edit-xml-file

    • Well, again, xmlwf is just the command I found when I worked on the script. I never said it was the best, nor the most suited. What I do appreciate about it is its simplicity: no extra arg, no parameter, just print out the error and set status to != 0.

      But again, feel free to validate the XML correctness with whatever command you appreciate the most. And if you are RHEL customer, you should probably use the ones provided by libxml2, as you’ll be fully supported 😉

      • sbts

        xmlstarlet is more than just a validation tool.
        It allows you to extract a full tag list, values etc.
        Even better it allows changing, adding, deleting a value by key.
        No stylesheets needed in most cases either.

        Seems like a winner all round.

  • Yes, but that was never the point of this article! 😉 I never said “xmlwf” is the best validation tool in the world. I just said, if you write a shell script which does XML validation, please check that the content is OK using a tool like “xmlwf”.

    The point was never to argue that xmlwf is the most complete validation tool out there. IMHO, I like it for this specific use case, but I would have no issue using xmlstarlet or something else if I needed to perform more advanced steps.

  • Peter Pitess

    in add-server.sh : not xlstproc but xsltproc 🙂

  • george carey

    @Dv xmllint --c14n foo.xml > cfoo.com does not generate the same canonicalised file for two functionally equivalent xml files, which would allow the two xml files to be compared with a diff command in a lot of cases. Can you make it do so?

  • george carey

    @Dv Example of what I am talking about.
    Xml file 1:

    Xml file 2:

    These don’t get made canonically equal even though they are!
    Same with many other functionally equivalent xml examples.

  • george carey

    My xml got stripped out on post. ?