In Memoriam – William Valcarcel

September 11, 2008

William Valcarcel

11/14/1946 – 9/11/2001

Gone, but not forgotten. He is missed every day.

From my daughter, Lindley, whom I love very much and also miss every day:

http://www.9-11heroes.us/v/William_Valcarcel.php


Using XSLT to Replace Arbitrary XML Located Within An XML Document

September 6, 2008

I spent some time today working on an interesting XML/XSL problem. Not interesting like cold fusion research, but interesting like what-a-boring-day interesting.

Inside of two document elements are strings with embedded XML that needed to be changed into something else. For example:

<legal-xml>String with<arbitrary-xml/> inside <arbitrary2-xml>of</arbitrary2-xml></legal-xml>

XSL has no problem finding the <legal-xml> element; finding and changing the <arbitrary-xml> elements buried inside its text is the interesting part.

Here is the XML input:

<?xml version="1.0" encoding="UTF-8"?>

<documents>
    <document>
        <title>This is <bold-term>my</bold-term> title</title>
        <description>
            <separator/>This is the <bold-term>book's</bold-term><separator/>description<separator/>
        </description>
    </document>
</documents>

In the interest of full disclosure it is safe for me to tell you that the file I was working on did not look anything like the XML document above. Contain your disappointment.

In the document the <title> and <description> elements contain the strings with additional embedded XML tags; in this case <bold-term> and <separator>. What I needed to do was change <bold-term> into the HTML <b> tag and the <separator> into asterisks.

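My first thought was to use fn:replace(), but that would not work in this case: fn:replace() operates on strings, and the things I wanted to replace are XML elements, not text. (It is also an XPath 2.0 function, so it is not available in an XSLT 1.0 stylesheet like the one below.) I was going to have to use <xsl:template>s. The solution is in the XSL below.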

<?xml version="1.0" encoding="UTF-8"?>

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes" />
  <xsl:template match="/">
    <html>
      <head>
        <title>An XSL example</title>
      </head>
      <body>
        Title: <xsl:apply-templates select="/documents/document/title" />
        Description: <xsl:apply-templates select="/documents/document/description" />
      </body>
    </html>
  </xsl:template>

  <xsl:template match="bold-term" >
    <b><xsl:value-of select="." /></b>
  </xsl:template>

  <xsl:template match="separator" >***</xsl:template>
</xsl:stylesheet>

By applying templates to <title> and <description>, I let the processor walk through their child nodes one at a time, which meant I could use <xsl:template>s to do the dirty work for me. For example, the template for the <bold-term> element:

<xsl:template match="bold-term" >
    <b><xsl:value-of select="." /></b>
</xsl:template>

takes care of pulling out the text surrounded by <bold-term> and printing it between <b> tags. The template for <separator> is only slightly different; all I wanted to do was put three asterisks in place of the element, which I did quite unceremoniously.
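One caveat with xsl:value-of: it returns the string value of the element, so any markup nested inside a <bold-term> would be flattened to plain text. The sample document has no such nesting, so the following is only a sketch under the assumption that a bold term might someday contain another element such as <separator>. It recurses with xsl:apply-templates instead. The "/" template from the stylesheet above is left out for brevity.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Wrap the element's content in <b>, but keep processing its children
       so that nested elements still get their own templates applied. -->
  <xsl:template match="bold-term">
    <b><xsl:apply-templates /></b>
  </xsl:template>

  <!-- Unchanged from the original stylesheet. -->
  <xsl:template match="separator">***</xsl:template>

</xsl:stylesheet>
```

For the input shown earlier the result should be the same as with xsl:value-of, since each <bold-term> there contains only text.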

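Yes, there is a reason why the <separator> template is on one line: I wanted the asterisks on the same line as the description. Everything between the opening and closing tags of the template, line breaks and indentation included, is part of the text that gets copied to the output. If I had formatted the template by putting the asterisks on a separate line: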

<xsl:template match="separator" >
  ***
</xsl:template>

the output would have looked more like:

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>An XSL example</title>
  </head>
  <body>
        Title: This is <b>my</b> title
        Description:
  ***
  This is the <b>book's</b>
  ***
  description
  ***
  </body>
</html>

Instead, by keeping the <separator> template on one line, I was able to generate the output I wanted.

<?xml version="1.0" encoding="UTF-8"?>
<html>
  <head>
    <title>An XSL example</title>
  </head>
  <body>
        Title: This is <b>my</b> title
        Description: ***This is the <b>book's</b> ***description***</body>
</html>
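If the one-line template bothers you, xsl:text is another way to get the same result. This is a minimal sketch, assuming a standard XSLT 1.0 processor: whitespace-only text in a stylesheet is stripped, so the line breaks and indentation around <xsl:text> never reach the output, and only the asterisks inside it are emitted.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The whitespace before and after <xsl:text> is whitespace-only,
       so the processor discards it; the output is exactly "***". -->
  <xsl:template match="separator">
    <xsl:text>***</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

This lets you format the template however you like without changing what it produces.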

If you are new to XSL, you might be wondering how the text that was not surrounded by any elements made it into the output.

This is <bold-term>my</bold-term> title

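Since the only template that matches anything in this string is the one for <bold-term>, you might expect only the word “my” to be printed. Instead, the entire string was printed. The answer is in XSL's built-in template rules. I never wrote a template for <title>, so the built-in rule for elements took over and simply applied templates to each of its children. The built-in rule for text nodes then copied each piece of plain text to the output, so instead of generating: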

<b>my</b>

it generated:

This is <b>my</b> title
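The XSLT 1.0 specification describes the built-in rules as equivalent to the templates below. You never have to write them yourself; spelling them out here is only meant to show where the surrounding text comes from.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Built-in rule for the root node and for elements:
       produce nothing for the node itself, just process its children. -->
  <xsl:template match="*|/">
    <xsl:apply-templates />
  </xsl:template>

  <!-- Built-in rule for text and attribute nodes: copy the text through. -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="." />
  </xsl:template>

  <!-- Built-in rule for comments and processing instructions: do nothing. -->
  <xsl:template match="processing-instruction()|comment()" />

</xsl:stylesheet>
```

When <title> is processed, the first rule visits its three children in order: the text "This is ", the <bold-term> element (handled by my own template), and the text " title". The second rule copies the two text nodes through.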

I worked on this using XML Copy Editor, which lets me create an XML file in one tab and an XSL file in another, then view the output of the transform in a third simply by pressing F8 in the XML tab. Simplicity at its best. I highly recommend it.