Do not underestimate W3C specs

By Confusion on Saturday 20 September 2008 10:56 - Comments (4)
Categories: Software engineering, XML, Views: 1455

On many forums, I see people posting questions that they could easily answer themselves if they would only read the specification for the relevant technology. Of course, specifications are generally known to be voluminous and dense, and consequently hard to read and sparse on directly relevant information. You are usually better off finding a tutorial or reference guide, or simply using Google.

However, this is not so with the W3C specs. If you have a problem in HTML, XML, XSLT, XPath, etc., the relevant W3C spec is really the first place you should look. They are very readable, full of examples, and have two other major advantages over, for instance, w3schools.com:
  • They are always complete
  • They are always correct
Especially the last one is not to be underestimated, lest you spend a day on a problem that turns out to be an error in a w3schools tutorial.
* Confusion pleads guilty to the last

XML attributes are pointless?

By Confusion on Thursday 04 September 2008 20:17 - Comments (7)
Categories: Software engineering, XML, Views: 1879

When you put together an XSD to specify how certain XML documents should look, you are always confronted with the choice to make parts of the information you wish to convey either an element or an attribute. Some people seem to think attributes are pointless, and proponents of the XML alternatives JSON and YAML sometimes say as much.

Now I'm not going to make a thorough analysis of when to use elements and when to use attributes, because that has already been done on a great number of occasions. What I wish to do is give a simple example of a case in which an attribute makes the information that is to be conveyed easier to understand:

NB. After H!ghGuy's first response I realized I forgot to give an important piece of information: my intention is to store responses to a questionnaire. This is part of a larger piece of XML, storing, for instance, a name, address, etc.

XML:
<questionnaire>
  <answer question="1">D</answer>
  <answer question="3b">Cow</answer>
  <answer question="last">144</answer>
</questionnaire>


If the 'question' reference needs to be an element, you get either

XML:
<questionnaire>
  <answer>
    <question>1</question>
    <answer_contents>D</answer_contents>
  </answer>
  ...
</questionnaire>

in which case the fact that the element requires an awkward name is indication enough that something is wrong, or

XML:
<questionnaire>
  <answer>
    <question>1</question>
    D
  </answer>
  ...
</questionnaire>

Here D is not actually an 'element' in the way 'element' is used above, but it is an obvious alternative.

In the last case, the question seems to be part of the answer element, just like the actual answer is part of the answer element. That just doesn't sit right with me, as they have a decidedly different relationship to the concept of an 'answer'. In JSON and YAML the same solutions appear and I think all are equally unsatisfying.

In short: child elements together constitute the parent element. If you need to provide a piece of meta-information or want to 'annotate' an element, using an attribute is a clear way in which to distinguish this kind of information from the constitutive kind.
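
To see how this distinction reads from the consuming side, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The function name read_answers is made up for this illustration; the XML is the questionnaire from above. The attribute is looked up as an annotation on the answer, while the element content is the answer itself.

```python
import xml.etree.ElementTree as ET

QUESTIONNAIRE = """
<questionnaire>
  <answer question="1">D</answer>
  <answer question="3b">Cow</answer>
  <answer question="last">144</answer>
</questionnaire>
"""


def read_answers(xml_text):
    """Map each question reference (attribute) to its answer (element content).

    read_answers is a hypothetical helper, not part of any library.
    """
    root = ET.fromstring(xml_text)
    return {answer.get("question"): answer.text for answer in root.findall("answer")}


answers = read_answers(QUESTIONNAIRE)
print(answers)  # {'1': 'D', '3b': 'Cow', 'last': '144'}
```

With the element-only variants, the same code would have to dig the question reference out of the children of each answer before it could tell which answer it was looking at.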

An XSLT to turn XML into CSV

By Confusion on Friday 29 August 2008 21:23 - Comments (2)
Categories: Software engineering, XML, Views: 2395

Today I needed to turn an XML file into a CSV file. I was sure someone would have solved this problem before, but I could not find an appropriate XSLT. The problem can be separated into two subproblems: one is 'flattening' the XML, by which I mean turning it from

XML:
<root>
  <element>
    <foo>1</foo>
    <bar>
      <baz>2</baz>
      <fooz>3</fooz>
    </bar>
  </element>
  <element>
    <foo>1</foo>
  </element>
</root>

into

XML:
<root>
  <element>
    <foo>1</foo>
    <bar.baz>2</bar.baz>
    <bar.fooz>3</bar.fooz>
  </element>
  <element>
    <foo>1</foo>
  </element>
</root>

considering I am interested in converting each 'element' into a CSV line.
The 'namespaced' element names are required, because they serve as the CSV column headers and they are required to be unique (which, for our case, is guaranteed by this approach).
The other subproblem is converting XML to CSV, of which the main challenge was making sure the last element is not followed by a comma.

In the end, I came up with the templates below to print the values of the 'childless' elements.

XML:
    <xsl:template match="//element">
        <xsl:apply-templates select="*" />
        <xsl:text>&#x0A;</xsl:text>
    </xsl:template>

    <xsl:template match="//element//*">
        <xsl:choose>
            <xsl:when test="count(child::*) > 0">
                <xsl:apply-templates select="*" />
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>"</xsl:text>
                <xsl:value-of select="."/>
                <xsl:text>"</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
        <xsl:if test="position() != last()">
            <xsl:text>,</xsl:text>
        </xsl:if>
    </xsl:template>

The upper template applies the lower template to the child nodes of element nodes called 'element', one 'element' at a time. The lower template determines whether the node has any child elements. If it doesn't, it prints the node. Otherwise, it recursively applies this template to the child nodes that were present. Finally, a comma is placed for each element that isn't the last in the node-set. What's a bit tricky here is that no comma is placed for the last grandchild of an 'element', so you might expect those to be missing, but that one is provided by the comma after the element itself.
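
For comparison, the same two subproblems can be sketched outside XSLT. The Python below is only an illustration under a few assumptions: it uses the standard library (xml.etree.ElementTree and csv), the names flatten and to_csv are invented for this sketch, and unlike the templates above it also writes the dotted names as a header row and pads columns that are missing from an 'element' with empty values.

```python
import csv
import io
import xml.etree.ElementTree as ET

SOURCE = """
<root>
  <element>
    <foo>1</foo>
    <bar>
      <baz>2</baz>
      <fooz>3</fooz>
    </bar>
  </element>
  <element>
    <foo>1</foo>
  </element>
</root>
"""


def flatten(node, prefix=""):
    """Yield (dotted_name, text) pairs for every childless descendant of node."""
    for child in node:
        name = prefix + child.tag
        if len(child):  # the child has child elements of its own: recurse
            yield from flatten(child, name + ".")
        else:
            yield name, (child.text or "").strip()


def to_csv(xml_text, row_tag="element"):
    """Turn each row_tag child of the root into one CSV line (hypothetical helper)."""
    rows = [dict(flatten(row)) for row in ET.fromstring(xml_text).findall(row_tag)]
    # Columns appear in the order in which they are first seen across all rows.
    columns = list(dict.fromkeys(name for row in rows for name in row))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=columns,
                            quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing columns are filled with an empty string
    return out.getvalue()


print(to_csv(SOURCE))
# "foo","bar.baz","bar.fooz"
# "1","2","3"
# "1","",""
```

The csv module takes care of the separators, so the 'no comma after the last value' bookkeeping that the XSLT needs disappears here.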

JSON to XML conversion

By Confusion on Saturday 21 June 2008 12:48 - Comments (3)
Categories: Software engineering, XML, Views: 2864

If a form POST needs to pass a lot of structured data, it might be advisable to prescribe an XML format in which the data should be passed. This enables validation of the data with an XSD and the readability makes debugging easier. However, building an XML document in javascript is neither elegant nor easy. It is much easier to just build the object structure and convert it to a JSON string, for instance using this helper library (2.3 KB when 'compressed' with a decent utility; smaller if you don't need the 'parse' method and strip it).

Allowing JSON to be posted requires the receiver to convert the JSON to XML before validating it, which poses three problems:
  1. JSON doesn't have any namespaces
  2. JSON doesn't distinguish between elements and attributes, like XML does
  3. JSON doesn't really care about ordering
If the elements in the XML are all from the same namespace, the first problem can be remedied by adding the relevant default namespace declaration to the root element. If multiple namespaces are required, there are two solutions:
  1. Namespace the JSON elements in some way ( { ns1_element1: "bar" } )
  2. Magically determine which element is from which namespace
Obviously, both solutions have their problems. In the first case, you need a (synchronised) mapping of namespace identifiers to the actual namespaces that would need to be declared and, depending on usage, that mapping may be necessary in the javascript as well. In the second case, different namespaces cannot declare the same elements, which is quite limiting.

The second problem can be solved in much the same way: by prefixing the attributes-to-be, for instance with xml_attr_<elementname>. Another solution is keeping a (synchronised) list of the attributes appearing in the XSD, checking for their presence and converting them if required. Again, both solutions have problems, similar to the solutions to the first problem.

The third problem can again be solved in two ways: either by requiring the JSON to be constructed in the correct order and keeping that order intact when converting the JSON to XML (for instance by using these classes, modified by changing the HashMap in JSONObject to a LinkedHashMap) or you could write code to magically impose the order required by the XSD on the resulting XML. The last solution poses a pretty daunting task, while the first solution is easier, provided the users receive clear feedback about ordering problems when testing the JSON they constructed.
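
The simplest combination of these choices (one default namespace, a prefix marking attributes, and the order of the JSON kept as-is) can be sketched in a few lines. This is a Python illustration, not the Java classes mentioned above, and it rests on assumptions: the marker xml_attr_ is used as a plain key prefix, the names fill and json_to_xml and the sample payload are invented, and arrays, booleans and mixed content are not handled. Python dictionaries keep insertion order, which plays the role of the LinkedHashMap here.

```python
import json
import xml.etree.ElementTree as ET

NAMESPACE = "http://xml.mydomain.nl/meaningful-path/1.0"
ATTR_PREFIX = "xml_attr_"  # assumed marker for keys that should become attributes


def fill(element, value):
    """Populate element from a decoded JSON value, keeping the key order."""
    if isinstance(value, dict):
        for key, item in value.items():
            if key.startswith(ATTR_PREFIX):
                element.set(key[len(ATTR_PREFIX):], str(item))
            else:
                fill(ET.SubElement(element, key), item)
    elif value is not None:
        element.text = str(value)


def json_to_xml(json_text):
    """Convert a single-rooted JSON object to an XML string (hypothetical helper)."""
    data = json.loads(json_text)  # dicts keep the order of the JSON text
    (root_name, content), = data.items()
    root = ET.Element(root_name)
    # Problem 1: put everything in one namespace via a default declaration.
    root.set("xmlns", NAMESPACE)
    fill(root, content)
    return ET.tostring(root, encoding="unicode")


# Made-up payload, purely for illustration.
posted = '{"order": {"xml_attr_id": "42", "customer": "Alice", "item": "book"}}'
xml_text = json_to_xml(posted)
print(xml_text)
```

The resulting string can then be handed to whatever XSD validator the receiver uses; an ordering mistake in the JSON shows up there as an ordinary validation error.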

After this analysis, our conclusion was that for our case, allowing the webdevelopers to post JSON is an acceptable solution, because:
  • All our elements are in one namespace,
  • The webdevelopers see no problem in making sure the JSON is ordered correctly
  • An attribute prefix does not really limit the possibilities, as long as it is documented, the webdevelopers are kept aware of it and clear feedback points out where they forgot to mark an attribute
The only question that remains on my part is: is it really much easier/cleaner to construct JSON than to construct XML in javascript?

A namespace gotcha in XSL transformations

By Confusion on Thursday 19 June 2008 16:56 - Comments (1)
Categories: Software engineering, XML, Views: 1136

Say you have an xml document that conforms to the schema it references

XML:
<?xml version="1.0" encoding="UTF-8"?>
<root 
   xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xml.mydomain.nl/meaningful-path/1.0 schema.xsd"
    xmlns="http://xml.mydomain.nl/meaningful-path/1.0">

    <foo>Foo!</foo>
</root>

You use the default namespace for the namespace from which you will reference the most elements, to keep the document as short and readable as possible.

Now you want to transform this bit of XML using an XSLT:

XML:
<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns="http://xml.mydomain.nl/meaningful-path/1.0"
    version="1.0">

    <xsl:output method="xml" indent="yes" encoding="UTF-8" />
    
    <xsl:strip-space elements="*" /> 

    <xsl:template match="/">
        <xsl:apply-templates select="//root"/>
    </xsl:template>
    <xsl:template match="root">
      <bar><xsl:value-of select="foo"/></bar>
    </xsl:template>
</xsl:stylesheet>

and you expect the output to read

XML:
<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xml.mydomain.nl/meaningful-path/1.0 schema.xsd" 
xmlns="http://xml.mydomain.nl/meaningful-path/1.0">
    <bar>Foo!</bar>
</root>

Unfortunately, this won't work, because of this tiny fact from section 2.4 of the XSLT specification:
"The default namespace is not used for unprefixed names."
As a result, only this will work:

XML:
<?xml version="1.0" encoding="UTF-8"?> 
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns:ns1="http://xml.mydomain.nl/meaningful-path/1.0"
    version="1.0">

    <xsl:output method="xml" indent="yes" encoding="UTF-8" />
    
    <xsl:strip-space elements="*" /> 

    <xsl:template match="/">
        <xsl:apply-templates select="//ns1:root"/>
    </xsl:template>
    <xsl:template match="ns1:root">
      <bar><xsl:value-of select="ns1:foo"/></bar>
    </xsl:template>
</xsl:stylesheet>


I'm still not sure why this is the case, but I do know it took me quite a while to figure out...
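
The same behaviour shows up in other tools that look up elements by path. As a small illustration (using Python's standard xml.etree.ElementTree rather than an XSLT processor, so it is an analogy, not the XSLT case itself), an unprefixed name in a lookup refers to an element in no namespace and therefore finds nothing in a document that uses a default namespace. The prefix ns1 is bound locally, just as in the stylesheet above.

```python
import xml.etree.ElementTree as ET

NS = "http://xml.mydomain.nl/meaningful-path/1.0"
DOCUMENT = f'<root xmlns="{NS}"><foo>Foo!</foo></root>'

root = ET.fromstring(DOCUMENT)

# An unprefixed name means "foo in no namespace", so this lookup finds nothing.
unprefixed = root.find("foo")

# Binding a prefix to the namespace and using it in the path does work.
prefixed = root.find("ns1:foo", {"ns1": NS})

print(unprefixed)      # None
print(prefixed.text)   # Foo!
```

The prefix is only a local alias: the document itself never mentions ns1, yet the lookup succeeds because both sides agree on the namespace URI.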

NB. I know this schema.xsd schemaLocation reference will only work for a local file and even then only in some cases. Replace schema.xsd with
http://xml.mydomain.nl/meaningful-path/1.0/schema.xsd before nitpicking about syntax :)