XML and XSLT

Not terribly long ago, XML was the darling of the web. HTML4 was reformulated as XHTML 1.0, SOAP messages were XML, and let us not forget XMLHttpRequest.

XML’s 15 minutes of fame came and went, however, when JSON entered the scene. Your Ajax request today is far more likely to receive a JSON response that can be parsed into native objects than an XML document that has to be parsed like markup; HTML5 parted ways with XML compliance; REST replaced SOAP for most development (and while you can, of course, send and receive XML with REST, it’s certainly less common).

XML’s passing is hardly to be lamented – JSON really is the better choice in many instances. But out of XML’s celebrity status, its human-readable/editable similarity to markup, and the need/desire to employ it in a wide variety of scenarios, came some interesting benefits that linger today – one of them being Extensible Stylesheet Language Transformations (XSLT). As promised previously, let’s take a look at what XSLT is, and what it can offer us.

So what is XSLT?

Per MDN:

[XSLT] is an XML-based language used, in conjunction with specialized processing software, for the transformation of XML documents. Although the process is referred to as “transformation,” the original document is not changed; rather, a new XML document is created based on the content of an existing document. Then, the new document may be serialized (output) by the processor in standard XML syntax or in another format, such as HTML or plain text.

So, let’s break this into a problem/solution formulation.

Problem:

You need a [client|user]-editable document in a markup-like format (or just want to completely separate content from presentation details) that nonetheless will not itself be HTML – maybe you don’t want to burden the client/user with needing to understand the structure of an HTML document, maybe it needs to contain (meta-)data that HTML doesn’t easily accommodate, maybe the data contained therein needs to be transformed in various ways, or it needs to be transformed into multiple formats (e.g. not just HTML, but also PDF or PostScript, and you don’t want to rely on tools like wkhtmltopdf).

Solution:

XML+XSLT. If it seems like you’d need a fairly specific set of circumstances to ever need to delve into this tech, you’d be right – XSLT isn’t particularly common for just that reason. When the circumstances do crop up, however, it can prove the best tool for the job. There are some pitfalls to watch out for, though – especially if you’re hoping to use the support baked into many modern browsers.

Examples

Some easily available references (listed below) can offer you a full picture of what XSLT (or at least, XSLT 1.0, which is the form baked into most modern browsers) looks like. Rather than reproduce that detail, let’s take a look at what an XSL document might look like.

The general structure of a well-formed XSLT document will be modular – you want to take advantage of xsl:template elements to separate concerns, keeping your code well-organized and potentially reusable. Oftentimes you’ll nest template calls within templates with increasing specificity, or to target other parts of the XML document you want to include at a certain place within the structure of your new document.

For instance, let’s consider this snippet, which describes a side-nav element for an HTML transformation of some XML (in the DocBook format):

<div class="sidebar affix">
    <div id="tabContent" class="tab-content">
        <div id="tocTab" class="tab-pane active">
            <ul class="nav nav-list">
                <xsl:apply-templates select="//book">
                </xsl:apply-templates>
            </ul>
        </div>
    </div>
</div>

Unseen at the top is the opening xsl:template tag (presumably with match="/") that matches the root of the XML document. In the snippet above, for each book node we encounter, we’re applying the relevant template:

<!--Template for book select on side nav -->
<xsl:template match="book">
    <a href="#{@xml:id}">
        <li class="branch">
            <i class="more icon-chevron-right"></i> <xsl:value-of select="info/title" />
        </li>
    </a>
    <div class="chapterSection" style="display:none;">
        <ul class="nav nav-list">
            <xsl:apply-templates select="chapter">
            </xsl:apply-templates>
        </ul>
    </div>
</xsl:template>

The snippet above shows our template, with its match attribute indicating which nodes it matches, so when we call apply-templates on book nodes, this is the template that gets used. The {@xml:id} is an attribute value template: the curly braces enclose an XPath expression – in this case, one that fetches an attribute to include in the anchor href.
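As a rough sketch of what the attribute value template stands for, the two anchors below should end up with the same href; the second spells the shorthand out with xsl:attribute. This template is illustrative only (it is not part of the stylesheet shown above, and would replace the book template rather than sit alongside it).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Illustrative only: both anchors receive the same href value -->
    <xsl:template match="book">
        <!-- Shorthand: the braces evaluate the XPath expression @xml:id -->
        <a href="#{@xml:id}">
            <xsl:value-of select="info/title" />
        </a>

        <!-- Longhand: build the same attribute explicitly -->
        <a>
            <xsl:attribute name="href">
                <xsl:text>#</xsl:text>
                <xsl:value-of select="@xml:id" />
            </xsl:attribute>
            <xsl:value-of select="info/title" />
        </a>
    </xsl:template>

</xsl:stylesheet>
```

The shorthand only works inside attributes of literal result elements (and a handful of XSLT attributes); anywhere else, the braces are just text.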

You’ll notice that within the ul.nav.nav-list we have another template call – as noted previously, this allows us to separate out the various templates for various sections and functionality, so that they can be used elsewhere, and are more readable/maintainable than hundreds (or thousands) of lines that do everything all clumped together.
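For orientation, here is a hypothetical input document that the templates above would match. The ids and titles are invented for illustration, and the set wrapper is an assumption. Note that no DocBook namespace is declared: in XSLT 1.0 an unprefixed name like book in a match pattern only matches elements in no namespace, so a namespaced DocBook 5 document would need prefixed patterns instead.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical input: ids and titles are made up for this example -->
<set>
    <book xml:id="user-guide">
        <info><title>User Guide</title></info>
        <chapter xml:id="getting-started">
            <info><title>Getting Started</title></info>
            <section xml:id="install">
                <title>Installation</title>
                <para>Run the installer.</para>
            </section>
        </chapter>
    </book>
</set>
```

With this input, select="//book" finds the one book element, info/title supplies the link text, and @xml:id supplies the anchor target.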

You can have more than one template for a given node name, by specifying a mode for the template – for instance:

<!-- Template for chapter select on side nav -->
<xsl:template match="chapter">
    <a href="#{@xml:id}">
        <li class="nav-header">
            <i class="more icon-chevron-right"></i> <xsl:value-of select="info/title" />
        </li>
    </a>
    <div class="sectionLinks" style="display:none;">
        <xsl:for-each select="section">
            <a href="#{@xml:id}">
                <li class="nav-subitem"><xsl:value-of select="title" /></li>
            </a>
        </xsl:for-each>
    </div>
</xsl:template>
<!-- Template for chapters in content view -->
<xsl:template match="chapter" mode="inner">
    <div id="{@xml:id}" class="chapter">
        <h1 class="chapterTitle"><xsl:value-of select="info/title" /></h1>
        <xsl:for-each select="section">
            <div id="{@xml:id}" class="section">
                <h3 class="sectionTitle"><xsl:value-of select="title" /></h3>
                <xsl:apply-templates select="*[not(self::title)]">
                </xsl:apply-templates>
            </div>
        </xsl:for-each>
    </div>
</xsl:template>
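Which of the two chapter templates runs depends on the mode attribute of the xsl:apply-templates call. A minimal sketch, assuming a hypothetical root template and using cut-down stand-ins for the two chapter templates above (the content class name is also an assumption):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- Hypothetical root template, for illustration only -->
    <xsl:template match="/">
        <html>
            <body>
                <ul class="nav nav-list">
                    <!-- No mode: picks the side-nav chapter template -->
                    <xsl:apply-templates select="//chapter" />
                </ul>
                <div class="content">
                    <!-- mode="inner": picks the content-view chapter template -->
                    <xsl:apply-templates select="//chapter" mode="inner" />
                </div>
            </body>
        </html>
    </xsl:template>

    <!-- Cut-down stand-in for the side-nav template -->
    <xsl:template match="chapter">
        <li><xsl:value-of select="info/title" /></li>
    </xsl:template>

    <!-- Cut-down stand-in for the content-view template -->
    <xsl:template match="chapter" mode="inner">
        <h1><xsl:value-of select="info/title" /></h1>
    </xsl:template>

</xsl:stylesheet>
```

An apply-templates call without a mode never selects a template that declares one, and vice versa, which is what keeps the two chapter templates from colliding.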

XSLT is a Turing-complete language, so for the most part, anything you might do in another language is doable in XSLT – that said, it finds its best fit as a templating language, rather than in general-purpose use.
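To give a feel for the general-purpose side, here is a small sketch of recursion in XSLT 1.0: a named template that calls itself to compute a factorial. The template name and the hard-coded input of 5 are made up for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <xsl:output method="text" />

    <!-- Hypothetical named template: computes n! by calling itself -->
    <xsl:template name="factorial">
        <xsl:param name="n" select="1" />
        <xsl:choose>
            <!-- Base case: stop recursing at 1 (or below) -->
            <xsl:when test="$n &lt;= 1">1</xsl:when>
            <xsl:otherwise>
                <!-- Capture the result of the recursive call in a variable -->
                <xsl:variable name="rest">
                    <xsl:call-template name="factorial">
                        <xsl:with-param name="n" select="$n - 1" />
                    </xsl:call-template>
                </xsl:variable>
                <xsl:value-of select="$n * $rest" />
            </xsl:otherwise>
        </xsl:choose>
    </xsl:template>

    <!-- Entry point: ignores the input document and outputs 5! = 120 -->
    <xsl:template match="/">
        <xsl:call-template name="factorial">
            <xsl:with-param name="n" select="5" />
        </xsl:call-template>
    </xsl:template>

</xsl:stylesheet>
```

It works, but twenty-odd lines for a factorial is a fair illustration of why XSLT is happier as a templating language.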

Static Compilation

So, you’re convinced that XSL is going to solve your problems (whatever those may be) – you’ll be able to create one data document, and then as many XSL documents as you need to present your data anywhere (HTML for the web, PDF for the execs, plain text for the company records, ASCII art for your own amusement…). Sounds good. You’re creating those statically, I assume?

Because if you’re going to rely on the baked-in XSLT support in browsers, you’ll likely run into some problems. The references below will help you navigate some of the ones found in Trident (IE) and Gecko (Firefox) – the MDN reference is particularly good about calling out where support is lacking. Even so, you’re likely to encounter issues with certain versions of the clients not performing exactly the same (for instance, Firefox 20 broke XSL support from the local file system – Firefox 20.0.1 fixed it).
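For reference, relying on the browser’s built-in processor usually means pointing the XML document at its stylesheet with an xml-stylesheet processing instruction. A minimal sketch, where book.xsl is a hypothetical file name assumed to sit next to the document:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file name: book.xsl is assumed to live alongside this file -->
<?xml-stylesheet type="text/xsl" href="book.xsl"?>
<book xml:id="user-guide">
    <info><title>User Guide</title></info>
</book>
```

Opening a document like this is where the cross-browser differences described above tend to show up.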

My suggestion is, if you’re offering the XML+XSL from a web server that can perform the transform for you (such as Apache with mod-xslt), use that instead, and save yourself the QA time. Otherwise, use a processor like Saxon to produce a static version of your output to display, and re-run it when you make changes – you lose the ability to simply update the XML and have those changes instantly reflected, but gain cross-browser stability. You’ll also gain access to newer versions of the XSLT and XQuery specifications than the browsers offer.

re: The Future

As someone who’s used XSLT for a few projects and counts the learning curve as time well-spent, I’m in a different position than someone considering learning it for the first time. Given how sparsely used it is, is it even worth the time investment?

As an answer, I direct you to consider this thread from the Chromium group, discussing deprecating XSLT support in Blink (Google’s fork of WebKit). The consensus by the end seems to be that, while support for it in the browser is ‘nice-to-have’, it’s too seldom used to justify its size in the binary, or its place in the code. At the same time, any number of individuals were citing their specific use cases (or even multi-year careers) built on XSLT – while the open web seems little interested in XSLT, corporate intranets are another story.

The answer would seem, then – if you have a specific use case at hand where XSLT would be useful, based on what you know of it – yes, go ahead and add it to your developer’s toolbelt (but be sure not to count on browser support of the standard in the future!). If you’re just passingly curious, pass on by – there are other, potentially more useful technologies out there, waiting for your attention.

Personally, I would very much like to see an approach like XSLT for JSON make an appearance (hopefully with the same emphasis on simplicity that JSON itself enjoys) – for now, we fill that gap with JavaScript templates and DOM manipulation, for better or for worse.

References

I prefer to learn by doing – pick a project that can benefit from XSLT, and go for broke! Both the MDN and MSDN references for XSLT are quite good – I’d suggest using both as you learn, with one sometimes offering details that the other glosses over.

If you’d prefer the dead-tree approach, Michael Kay’s XSLT 2.0 and XPath 2.0 Programmer’s Reference comes well-recommended.
