Do not underestimate W3C specs
On many forums, I see people posting questions that they could easily answer themselves, if they would only read the specification for the relevant technology. Of course, specifications are generally known to be voluminous and dense, and consequently hard to read and sparse on relevant information. You are usually better off finding a tutorial or reference guide, or simply using Google.
Not so with the W3C specs, however. If you have a problem with HTML, XML, XSLT, XPath, etc., the relevant W3C spec really is the first place you should look. They are very readable, littered with examples, and have two other major advantages over, for instance, w3schools.com:
- They are always complete
- They are always correct*
* Confusion pleads guilty of the last
XML attributes are pointless?
When you put together an XSD to specify how certain XML documents should look, you are always confronted with the choice of making each part of the information you wish to convey either an element or an attribute. Some people seem to think attributes are pointless, and proponents of the XML alternatives JSON and YAML sometimes opine as much.
Now I'm not going to make a thorough analysis of when to use elements and when to use attributes, because that has already been done on a great number of occasions. What I wish to do is give a simple example of a case in which an attribute makes the information that is to be conveyed easier to understand:
NB. After H!ghGuy's first response I realized I forgot to give an important piece of information: my intention is to store responses to a questionnaire. This is part of a larger piece of XML, storing, for instance, a name, address, etc.
XML:
If the 'question' reference needs to be an element, you get either
XML:
in which case the fact that the element requires an awkward name is indication enough that something is wrong, or
XML:
Here D is not actually an 'element' in the way 'element' is used above, but it is an obvious alternative.
In the last case, the question seems to be part of the answer element, just like the actual answer is part of the answer element. That just doesn't sit right with me, as they have a decidedly different relationship to the concept of an 'answer'. In JSON and YAML the same solutions appear and I think all are equally unsatisfying.
In short: child elements together constitute the parent element. If you need to provide a piece of meta-information or want to 'annotate' an element, using an attribute is a clear way in which to distinguish this kind of information from the constitutive kind.
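The "constitutive" reading of child elements shows up quite literally in the DOM API. The sketch below is illustrative only: the original XML examples did not survive, so the element names answer, question and value and the id q1 are hypothetical. It parses both variants with the JDK's standard DOM parser and shows that the text content of an answer with a question attribute is just the answer, while the child-element variant folds the question reference into the answer's content.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

public class AnswerReader {
    /** Parses a small XML string and returns its root element. */
    public static Element parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Attribute variant (hypothetical names): the question reference annotates the answer.
        Element annotated = parseRoot("<answer question=\"q1\">yes</answer>");
        System.out.println(annotated.getAttribute("question")); // q1
        System.out.println(annotated.getTextContent());         // yes

        // Child-element variant: the reference becomes part of the answer's content.
        Element nested = parseRoot("<answer><question>q1</question><value>yes</value></answer>");
        System.out.println(nested.getTextContent());            // q1yes
    }
}
```

getTextContent concatenates the text of all descendants, so in the second variant "q1" and "yes" end up as one value, which is exactly the mixing of meta-information and content described above.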
An XSLT to turn XML into CSV
Today I needed to turn an XML file into a CSV file. I was sure someone would have solved this problem before, but I could not find an appropriate XSLT. The problem can be separated into two subproblems: one is 'flattening' the XML, by which I mean turning it from
XML:
into
XML:
considering I am interested in converting each 'element' into a CSV line.
The 'namespaced' element names are required, because they serve as the CSV column headers and they are required to be unique (which, for our case, is guaranteed by this approach).
The other subproblem is converting the flattened XML to CSV, where the main challenge was making sure the last value on a line is not followed by a comma.
In the end, I came up with the templates below to print the values of the 'childless' elements.
XML:
The upper template applies the lower template to the child nodes of each element node called 'element', one 'element' at a time. The lower template determines whether the node has any child elements. If it doesn't, it prints the node's value. Otherwise, it recursively applies itself to the node's child elements. Finally, a comma is placed after each node that isn't the last in its node-set. What's a bit tricky here is that no comma is placed after the last grandchild of an 'element', so you might expect that comma to be missing from the output. It is supplied by the comma placed after the grandchild's parent (the child of 'element') itself, unless that parent is the last child, in which case the line simply ends there.
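The original templates are not preserved (only their opening line, a template matching "//element", survives), so here is a reconstruction sketch of the approach described above, not the original stylesheet. Assumptions: the rows are elements literally named element, flattened column names join the element path with "_" (the original flattened form is not shown), and a root template with xsl:for-each takes the place of the separate "//element" template. It runs on the JDK's built-in XSLT 1.0 processor through javax.xml.transform.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XmlToCsv {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:strip-space elements='*'/>"
      // Header line from the first <element>, then one CSV line per <element>.
      + "<xsl:template match='/'>"
      + "<xsl:apply-templates select='(//element)[1]/*' mode='header'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "<xsl:for-each select='//element'>"
      + "<xsl:apply-templates select='*' mode='csv'/>"
      + "<xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      // Leaf: print its value; otherwise recurse. Comma unless last in the node-set.
      + "<xsl:template match='*' mode='csv'>"
      + "<xsl:choose>"
      + "<xsl:when test='not(*)'><xsl:value-of select='.'/></xsl:when>"
      + "<xsl:otherwise><xsl:apply-templates select='*' mode='csv'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:template>"
      // Same walk, but print each leaf's flattened name (e.g. b_c) as the column header.
      + "<xsl:template match='*' mode='header'>"
      + "<xsl:choose>"
      + "<xsl:when test='not(*)'>"
      + "<xsl:for-each select='ancestor-or-self::*[ancestor::element]'>"
      + "<xsl:value-of select='name()'/>"
      + "<xsl:if test='position() != last()'>_</xsl:if>"
      + "</xsl:for-each>"
      + "</xsl:when>"
      + "<xsl:otherwise><xsl:apply-templates select='*' mode='header'/></xsl:otherwise>"
      + "</xsl:choose>"
      + "<xsl:if test='position() != last()'>,</xsl:if>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    public static String toCsv(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            // The JDK serializer may write the platform line separator; normalize to \n.
            return out.toString().replace("\r\n", "\n");
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<element><a>1</a><b><c>2</c><d>3</d></b><e>4</e></element>"
                + "<element><a>5</a><b><c>6</c><d>7</d></b><e>8</e></element>"
                + "</root>";
        System.out.print(toCsv(xml));
    }
}
```

For the sample input this prints a header "a,b_c,b_d,e" followed by "1,2,3,4" and "5,6,7,8". Note how the comma after b covers its last grandchild d, as described above. Values containing commas or quotes are not escaped in this sketch.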
JSON to XML conversion
If a form POST needs to pass a lot of structured data, it might be advisable to prescribe an XML format in which the data should be passed. This enables validation of the data with an XSD, and the readability makes debugging easier. However, building an XML document in JavaScript is neither elegant nor easy. It is much easier to just build the object structure and convert it to a JSON string, for instance using this helper library (2.3 KB when 'compressed' with a decent utility; smaller if you don't need the 'parse' method and strip it).
Allowing JSON to be posted requires the receiver to convert the JSON to XML before validating it, which poses three problems:
- JSON doesn't have any namespaces
- JSON doesn't distinguish between elements and attributes, like XML does
- JSON doesn't really care about ordering
The first problem can be solved in two ways:
- Namespace the JSON elements in some way ({ ns1_element1: "bar" })
- Magically determine which element is from which namespace
The second problem can be solved in much the same way: by prefixing the attributes-to-be, for instance with xml_attr_<elementname>. Another solution is keeping a (synchronised) list of the attributes appearing in the XSD, checking for their presence and converting them if required. Again, both solutions have problems, similar to the solutions to the first problem.
The third problem can again be solved in two ways: either by requiring the JSON to be constructed in the correct order and keeping that order intact when converting the JSON to XML (for instance by using these classes, modified by changing the HashMap in JSONObject to a LinkedHashMap), or by writing code that magically imposes the order required by the XSD on the resulting XML. The latter is a pretty daunting task, while the former is easier, provided the users receive clear feedback about ordering problems when testing the JSON they constructed.
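Why ordering matters at all: an xs:sequence in the XSD fixes the order of its child elements, so XML converted from unordered JSON can fail validation even when every value is present. The sketch below uses the JDK's javax.xml.validation API; the person/name/address schema is a hypothetical stand-in for the real XSD. It returns the validator's message, which is the kind of feedback the web developers would need when their JSON is out of order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class OrderCheck {
    // Hypothetical schema: xs:sequence requires <name> before <address>.
    static final String XSD =
        "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
      + "<xs:element name='person'><xs:complexType><xs:sequence>"
      + "<xs:element name='name' type='xs:string'/>"
      + "<xs:element name='address' type='xs:string'/>"
      + "</xs:sequence></xs:complexType></xs:element>"
      + "</xs:schema>";

    private static final Schema SCHEMA = compile(XSD);

    private static Schema compile(String xsd) {
        try {
            return SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                    .newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalStateException("invalid schema", e);
        }
    }

    /** Returns null if the document is valid, otherwise the validator's error message. */
    public static String validate(String xml) {
        Validator validator = SCHEMA.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return null;
        } catch (SAXException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(validate("<person><name>Jan</name><address>Main St</address></person>"));
        System.out.println(validate("<person><address>Main St</address><name>Jan</name></person>"));
    }
}
```

The first call prints null; the second prints the validator's complaint about the unexpected element. Passing that message back to whoever posted the JSON is one way to provide the "clear feedback" mentioned above.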
After this analysis, our conclusion was that for our case, allowing the web developers to post JSON is an acceptable solution, because:
- All our elements are in one namespace
- The web developers see no problem in making sure the JSON is ordered correctly
- An attribute prefix does not really limit the possibilities, as long as it is documented, the web developers are kept aware of it, and clear feedback points out where they forgot to mark an attribute
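Put together, the three decisions give a fairly small converter. The sketch below is not the classes mentioned above: it assumes the JSON has already been parsed into nested Maps, Lists and scalars (Java's standard library has no JSON parser), uses the hypothetical prefix xml_attr_ for attributes, and declares one default namespace on the root element. Child elements come out in the Map's iteration order, which is insertion order for a LinkedHashMap; this is the same reason for the HashMap-to-LinkedHashMap change mentioned above.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JsonToXml {
    // Hypothetical naming convention: keys with this prefix become attributes.
    static final String ATTR_PREFIX = "xml_attr_";

    /** Converts parsed JSON (nested Maps/Lists/scalars) to XML in one default namespace. */
    public static String convert(String rootName, String namespace, Map<String, ?> json) {
        StringBuilder sb = new StringBuilder();
        writeElement(sb, rootName, json, namespace);
        return sb.toString();
    }

    private static void writeElement(StringBuilder sb, String name, Object value, String ns) {
        sb.append('<').append(name);
        if (ns != null) {
            // Declared once on the root; descendants inherit the default namespace.
            sb.append(" xmlns=\"").append(escape(ns)).append('"');
        }
        if (value instanceof Map) {
            Map<?, ?> obj = (Map<?, ?>) value;
            // First pass: prefixed keys become attributes of this element.
            for (Map.Entry<?, ?> e : obj.entrySet()) {
                String key = String.valueOf(e.getKey());
                if (key.startsWith(ATTR_PREFIX)) {
                    sb.append(' ').append(key.substring(ATTR_PREFIX.length()))
                      .append("=\"").append(escape(String.valueOf(e.getValue()))).append('"');
                }
            }
            sb.append('>');
            // Second pass: other keys become child elements, in the Map's iteration order.
            for (Map.Entry<?, ?> e : obj.entrySet()) {
                String key = String.valueOf(e.getKey());
                if (key.startsWith(ATTR_PREFIX)) continue;
                if (e.getValue() instanceof List) {
                    // A JSON array becomes repeated sibling elements with the same name.
                    for (Object item : (List<?>) e.getValue()) writeElement(sb, key, item, null);
                } else {
                    writeElement(sb, key, e.getValue(), null);
                }
            }
        } else {
            sb.append('>').append(escape(String.valueOf(value)));
        }
        sb.append("</").append(name).append('>');
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, Object> answer = new LinkedHashMap<>();
        answer.put("xml_attr_question", "q1");
        answer.put("value", "yes");
        Map<String, Object> json = new LinkedHashMap<>();
        json.put("name", "Jansen");
        json.put("answer", answer);
        json.put("tag", Arrays.asList("a", "b"));
        System.out.println(convert("questionnaire", "urn:example:questionnaire", json));
    }
}
```

The sketch does not check that keys are valid XML names, and attributes are emitted before children regardless of where they appear in the JSON. Running the result through an XSD validator afterwards catches the remaining ordering mistakes.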
A namespace gotcha in XSL transformations
Say you have an XML document that conforms to the schema it references:
XML:
You use the default namespace for the namespace from which you will reference the most elements, to keep the document as short and readable as possible.
Now you want to transform this bit of XML using an XSLT:
XML:
and you expect the output to read
XML:
Unfortunately, this won't work, because of this tiny fact from section 2.4 of the XSLT specification:
The default namespace is not used for unprefixed names.
As a result, only this will work:
XML:
I'm still not sure why this is the case, but I do know it took me quite a while to figure out...
NB. I know this schema.xsd schemaLocation reference will only work for a local file, and even then only in some cases. Replace schema.xsd with http://xml.mydomain.nl/meaningful-path/1.0/schema.xsd before nitpicking about syntax.
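Since the original document and stylesheets did not survive, here is a minimal, self-contained reproduction of the gotcha using the JDK's built-in XSLT 1.0 processor. The namespace URI urn:example:orders and the order/item elements are hypothetical. Both stylesheets declare the same namespace; only the one that binds it to a prefix and uses that prefix in the XPath expression finds the element.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class NamespaceGotcha {
    // Input whose elements are in a (hypothetical) default namespace.
    public static final String DOC =
        "<order xmlns='urn:example:orders'><item>apple</item></order>";

    // Same namespace as the stylesheet's default: XPath ignores it, so 'order/item'
    // looks for elements in no namespace and selects nothing.
    public static final String UNPREFIXED =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns='urn:example:orders'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='order/item'/></xsl:template>"
      + "</xsl:stylesheet>";

    // Namespace bound to the prefix 'o' and used in the expression: this selects the item.
    public static final String PREFIXED =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
      + " xmlns:o='urn:example:orders'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'><xsl:value-of select='o:order/o:item'/></xsl:template>"
      + "</xsl:stylesheet>";

    public static String run(String xslt, String xml) {
        try {
            StringWriter out = new StringWriter();
            TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xslt)))
                    .transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("unprefixed: [" + run(UNPREFIXED, DOC) + "]");
        System.out.println("prefixed:   [" + run(PREFIXED, DOC) + "]");
    }
}
```

The unprefixed version prints an empty value and the prefixed one prints "apple". XSLT 2.0 later added an xpath-default-namespace attribute that sets a default namespace for unprefixed names in XPath expressions and patterns, but the JDK's built-in processor only implements XSLT 1.0.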