Using XSLT to Replace Arbitrary XML Located Within An XML Document


I spent some time today working on an interesting XML/XSL problem. Not interesting like cold fusion research, but interesting like what-a-boring-day interesting.

Two elements in the document contained strings with embedded XML that needed to be changed into something else. For example:

<legal-xml>String with<arbitrary-xml/> inside <arbitrary2-xml>of</arbitrary2-xml></legal-xml>

XSL has no problem finding the <legal-xml> element; finding and changing the <arbitrary-xml> elements inside it is the interesting part.

Here is the XML input:

<?xml version="1.0" encoding="UTF-8"?>

<documents>
    <document>
        <title>This is <bold-term>my</bold-term> title</title>
        <description>
            <separator/>This is the <bold-term>book's</bold-term><separator/>description<separator/>
        </description>
    </document>
</documents>

In the interest of full disclosure it is safe for me to tell you that the file I was working on did not look anything like the XML document above. Contain your disappointment.

In the document, the <title> and <description> elements contain strings with additional embedded XML tags, in this case <bold-term> and <separator>. What I needed to do was change <bold-term> into the HTML <b> tag and <separator> into asterisks.

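My first thought was to use fn:replace(), but fn:replace() works on strings, and the pieces I wanted to replace are XML elements, so it would not work in this case. (It is also an XPath 2.0 function, so an XSLT 1.0 processor would not offer it anyway.) I was going to have to use <xsl:template>s. The solution is in the XSL below.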

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <html>
      <head>
        <title>An XSL example</title>
      </head>
      <body>
        Title: <xsl:apply-templates select="/documents/document/title" />
        Description: <xsl:apply-templates select="/documents/document/description" />
      </body>
    </html>
  </xsl:template>

  <xsl:template match="bold-term" >
    <b><xsl:value-of select="." /></b>
  </xsl:template>

  <xsl:template match="separator" >***</xsl:template>
</xsl:stylesheet>

By selecting <title> and <description> with <xsl:apply-templates> and letting their child nodes be processed one at a time, I was able to use <xsl:template>s to do the dirty work for me. For example, the template for the <bold-term> element:

<xsl:template match="bold-term" >
    <b><xsl:value-of select="." /></b>
</xsl:template>

takes care of pulling the text surrounded by <bold-term> and printing it between the <b> tags. The template for <separator> is only slightly different; all I wanted to do was put three asterisks in place of the element, which I did quite unceremoniously.
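One caveat with <xsl:value-of>: it flattens everything inside <bold-term> to a plain string, so an element nested inside a bold term would never reach its own template. The sample input above has no such nesting, so the following is only a sketch under that assumption. It swaps <xsl:value-of> for <xsl:apply-templates> so that nested elements keep being processed:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <!-- Variant of the bold-term template: apply-templates (instead of
       value-of) lets any nested element hit its own template. -->
  <xsl:template match="bold-term"><b><xsl:apply-templates /></b></xsl:template>

  <!-- Same separator template as in the stylesheet above. -->
  <xsl:template match="separator">***</xsl:template>
</xsl:stylesheet>
```

With a hypothetical fragment such as <bold-term>my<separator/>term</bold-term>, the asterisks would end up inside the <b> element, whereas the <xsl:value-of> version would silently drop the separator.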

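Yes, there is a reason why the <separator> template is on one line: I wanted the asterisks on the same line as the description. The line breaks and indentation around literal text in a template are part of that text, so they are copied to the output along with it. If I had formatted the template by putting the asterisks on a separate line: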

<xsl:template match="separator" >
  ***
</xsl:template>

the output would have looked more like:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>An XSL example</title>
  </head>
  <body>
        Title: This is <b>my</b> title
        Description:
  ***
  This is the <b>book's</b>
  ***
  description
  ***
  </body>
</html>

Instead, by keeping the <separator> template on one line, I was able to generate the output I wanted.

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>An XSL example</title>
  </head>
  <body>
        Title: This is <b>my</b> title
        Description: ***This is the <b>book's</b> ***description***</body>
</html>
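If the one-line template bothers you, wrapping the literal text in <xsl:text> is a common alternative. This is a sketch rather than what I actually used: whitespace-only text nodes in a stylesheet are stripped, so only the content of <xsl:text> should reach the output, and the template can be indented however you like.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />

  <xsl:template match="bold-term">
    <b><xsl:value-of select="." /></b>
  </xsl:template>

  <!-- The line breaks around xsl:text are whitespace-only text nodes, which
       the processor strips from the stylesheet; only the asterisks are emitted. -->
  <xsl:template match="separator">
    <xsl:text>***</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

The multi-line version shown earlier behaves differently because the line breaks and the asterisks sit in the same text node there, and a text node that contains anything other than whitespace is kept whole.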

If you are new to XSL, you might be wondering how the text that was not surrounded by any elements made it into the output. Take the title:

This is <bold-term>my</bold-term> title

Since the only template that matches anything in this string is the one for <bold-term>, you might expect only the word “my” to be printed. Instead, the entire string was printed. The answer is in XSLT's built-in template rules. There is no template for <title>, so the built-in rule for elements applies templates to each of its children, and the built-in rule for text nodes copies the text to the output. So instead of generating:

<b>my</b>

it generated:

This is <b>my</b> title
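For reference, the built-in rules behave as if the two templates below had been declared, which is how the XSLT 1.0 specification describes them. Treat this as a standalone illustration rather than something to paste into the stylesheet above: declaring them explicitly gives them normal precedence, and the match on / would clash with the root template I already have.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Built-in rule for the root node and for elements:
       output nothing for the node itself, just keep walking its children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Built-in rule for text and attribute nodes: copy the string value. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="." />
  </xsl:template>
</xsl:stylesheet>
```

Run on its own against the input document, a stylesheet like this should emit just the text content with every tag dropped, which is exactly the behaviour that carried “This is” and “title” into my output.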

I worked on this using XML Copy Editor, which lets me create an XML file in one tab and an XSL file in another, and view the output of the transform in a third simply by pressing F8 in the XML tab. Simplicity at its best. I highly recommend it.

  1. April 15, 2010 at 1:51 pm

    Awesome! One thing, though. What if I did want it to ignore everything but “my”? Suppose I had a block of HTML in a node and all I wanted to pull was an img, for instance.

    • cvalcarcel
      April 18, 2010 at 1:49 pm

      Great question! Without having an example of your input and what you expect as output I can only make something up.

      Let’s say that the input document listed above is left alone.

      My output can look like this with just a slight change:

      <?xml version="1.0" encoding="UTF-8"?>
      <html>
        <head>
          <title>An XSL example</title>
        </head>
        <body>
              Title: This is <b>my</b> title
              Description: <b>my</b><b>book's</b></body>
      </html>
      

      If the XSL is changed from:

      Description: <xsl:apply-templates select="/documents/document/description" />
      

      to:

      Description: <xsl:apply-templates select="//bold-term" />
      

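      Then the output will exclude the extra text and only include the text within <bold-term>. Because //bold-term searches the whole document, the <bold-term> from the title shows up in the description line as well.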

      Was that even close?
