Avoid storing configuration data in your revision control system

By Confusion on Tuesday 20 October 2009 22:02 - Comments (3)
Categories: Java, Software engineering, XML, Views: 3.283

After a discussion with a colleague this afternoon, I thought I'd share the following: you should avoid storing configuration data in your revision control system. Authentication credentials in particular should not be in there. Here's why:
  1. When securing servers and networks, things like the server hosting an RCS don't get the same priority as, say, your web facing production server. Mistakes are easy to make and you can simply use Google to find 'accidentally' web facing RCS's that expose passwords.
  2. There will be plenty of copies 'out there', outside of your control. How many developers have that data stored on their machine? How careful are they with their laptops and your production passwords?
  3. Access to the configuration is limited to those that should be able to change it: no accidental changes by a junior performing a careless check in.
  4. If you can use the exact same build for your development, test/staging and production environments, then you can cleanly separate code problems from configuration problems. If you need to rebuild a distributable archive to have the build process include environment-specific configuration, there will always be the doubt that some other difference may have sneaked in.
  5. It's much easier to change the configuration if you don't have to make a new build to deploy the change.
Now I specifically say 'avoid' and not 'do not ever', because many frameworks do not make this separation particularly easy. In the Java world, standard tools and frameworks like Maven, Spring and Hibernate all put obstacles in the way of keeping sensitive configuration data out of RCS's.

Maven is a build tool that offers all kinds of build-time placeholder substitution capabilities, which is diametrically opposed to this advice. Spring does dependency injection and the configuration to wire your application together strongly attracts other types of configuration data to be included with it. And if you are paranoid enough to give production databases different names, so you can never accidentally run a test against a production database: how do you get that name into your Hibernate OR mappings at startup time?

It takes careful thought and thorough understanding of the build and startup processes, but in my opinion it is well worth it. Every time I deploy a new version of the one application in which configuration and code are completely separated, where I just have to drop a new .jar and restart, I dance with joy.
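
As a minimal sketch of what such a separation can look like in plain Java, not tied to Maven, Spring or Hibernate: the application reads its settings at startup from a file that lives outside the .jar. The system property name `app.config` and the keys `db.url` and `db.user` are made up for this illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

class ExternalConfig {

    // Hypothetical system property that names the configuration file.
    static final String LOCATION_PROPERTY = "app.config";

    /** Parses key=value pairs; kept apart from file access so it is easy to test. */
    static Properties parse(Reader reader) {
        Properties properties = new Properties();
        try {
            properties.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return properties;
    }

    /** Loads the file named by -Dapp.config=..., failing fast when it is not set. */
    static Properties loadFromSystemProperty() {
        String location = System.getProperty(LOCATION_PROPERTY);
        if (location == null) {
            throw new IllegalStateException(
                    "Start the JVM with -D" + LOCATION_PROPERTY + "=/path/to/file");
        }
        Path path = Paths.get(location);
        try (Reader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            return parse(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties config = loadFromSystemProperty();
        // The credentials never appear in the source tree or in the build.
        System.out.println("Connecting to " + config.getProperty("db.url")
                + " as " + config.getProperty("db.user"));
    }
}
```

Started with something like `java -Dapp.config=/path/to/production.properties -cp app.jar ExternalConfig`, the same build picks up a different file in each environment, and that file can be readable only by the people who should be able to change it.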

Ultrashort JAXB tutorial

By Confusion on Tuesday 03 February 2009 20:49 - Comments (3)
Categories: Java, Software engineering, XML, Views: 6.274

Today I had to use JAXB for XML-object binding. While searching for an introduction, I noticed that most articles on JAXB are overly long and concerned with all kinds of asides that detract from the basics, which were all I needed. Therefore I now present: an ultrashort JAXB tutorial.

I expect you to know:
  • What XML-object binding is and why you would want to use it.
  • How to use Java and solve classpath issues and such.
  • How to use Linux (or translate the instructions to Windows).
  • How to determine intermediate steps I left out (like 'extract the zip')
Requirements:
  • An XML file that you wish to 'unmarshal' into an object tree. Example:
    XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <rootElement xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns="http://some.sensible.url/foo"
        xsi:schemaLocation="http://some.sensible.url/foo foo.xsd">

        <subElement>
            <foo>1</foo>
            <baz>2</baz>
        </subElement>
        <subElement>
            <foo>2</foo>
            <baz>4</baz>
        </subElement>
        
    </rootElement>
  • An XML Schema description of the structure of the XML. Example:

    XML:
    <?xml version="1.0" encoding="UTF-8"?>
    <schema xmlns="http://www.w3.org/2001/XMLSchema" 
        targetNamespace="http://some.sensible.url/foo" 
        xmlns:f="http://some.sensible.url/foo" 
        elementFormDefault="qualified">

        <element name="rootElement">
            <complexType>
                <sequence>
                    <element name="subElement" type="f:subElement" maxOccurs="unbounded" />
                </sequence>
            </complexType>
        </element>
        
        <complexType name="subElement">
            <sequence>
                <element name="foo" type="int" />
                <element name="baz" type="int" />
            </sequence>
        </complexType>
    </schema>
Tutorial
  1. Use a Java 6 SDK (which has JAXB) or download JAXB (and use a Java 5+ JDK)
  2. Generate the objects from the xml schema by issuing
    jaxb-ri/bin/xjc.sh -p com.company.app.pakkage.of.objects \
    -d /path/to/com/company/app/pakkage/of/objects \
    /path/to/xml-structure.xsd
  3. Assuming the root element of your xml is named 'rootElement' (and available as com.company.app.pakkage.of.objects.RootElement):

    Java:
    package com.company.app;

    import java.io.File;
    import javax.xml.bind.*;
    import com.company.app.pakkage.of.objects.*;

    public class Example {

        public static void main(String[] args) throws JAXBException {

            // The context scans the package that xjc generated the classes into.
            final JAXBContext jaxbContext =
                JAXBContext.newInstance("com.company.app.pakkage.of.objects");
            final Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            // unmarshal() returns Object, so cast to the generated root class.
            final RootElement rootElement =
                (RootElement) unmarshaller.unmarshal(new File("/path/to/xml.xml"));
        }
    }
  4. Profit.

Do not underestimate W3C specs

By Confusion on Saturday 20 September 2008 10:56 - Comments (4)
Categories: Software engineering, XML, Views: 2.850

On many forums, I see people posting questions that they could easily answer themselves, if they would only read the specification for the relevant technology. Now of course, specifications are generally known to be voluminous and dense and consequently hard to read and sparse on relevant information. You are better off finding a tutorial, reference guide or simply using Google.

However, not so with the w3c specs. If you have a problem in html, xml, xslt, xpath, etc., the relevant w3c spec is really the first place you should look. They are very readable, littered with examples and have two other major advantages over, for instance, w3schools.com:
  • They are always complete
  • They are always correct
Especially the last one is not to be underestimated, lest you spend a day on a problem that turns out to be an error in a w3schools tutorial.
* Confusion pleads guilty of the last

XML attributes are pointless?

By Confusion on Thursday 04 September 2008 20:17 - Comments (7)
Categories: Software engineering, XML, Views: 3.295

When you put together an XSD to specify how certain XML documents should look, you are always confronted with the choice of making each piece of information you wish to convey either an element or an attribute. Some people seem to think attributes are pointless, and proponents of XML alternatives such as JSON and YAML sometimes opine as much.

Now I'm not going to make a thorough analysis of when to use elements and when to use attributes, because that has already been done on a great number of occasions. What I wish to do, is to give a simple example of a case in which an attribute makes the information that is to be conveyed easier to understand:

NB. After H!ghGuy's first response I realized I forgot to give an important piece of information: my intention is to store responses to a questionnaire. This is part of a larger piece of XML, storing, for instance, a name, address, etc.

XML:
<questionnaire>
  <answer question="1">D</answer>
  <answer question="3b">Cow</answer>
  <answer question="last">144</answer>
</questionnaire>


If the 'question' reference needs to be an element, you get either

XML:
<questionnaire>
  <answer>
    <question>1</question>
    <answer_contents>D</answer_contents>
  </answer>
  ...
</questionnaire>

in which case the fact that the element requires an awkward name is indication enough that something is wrong, or

XML:
<questionnaire>
  <answer>
    <question>1</question>
    D
  </answer>
  ...
</questionnaire>

Here D is not actually an 'element' in the way 'element' is used above, but it is an obvious alternative.

In the last case, the question seems to be part of the answer element, just like the actual answer is part of the answer element. That just doesn't sit right with me, as they have a decidedly different relationship to the concept of an 'answer'. In JSON and YAML the same solutions appear and I think all are equally unsatisfying.

In short: child elements together constitute the parent element. If you need to provide a piece of meta-information or want to 'annotate' an element, using an attribute is a clear way in which to distinguish this kind of information from the constitutive kind.
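
To make the distinction concrete, here is a small sketch of how the attribute version reads from Java using the standard DOM API. The class and method names are my own for this illustration. The annotation (which question) comes from the attribute and the constitutive part (the answer) comes from the element content, so the two never get mixed up.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class QuestionnaireReader {

    /** Maps each question reference (the attribute) to its answer (the element content). */
    static Map<String, String> readAnswers(String xml) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList answers = document.getElementsByTagName("answer");
            Map<String, String> result = new LinkedHashMap<>();
            for (int i = 0; i < answers.getLength(); i++) {
                Element answer = (Element) answers.item(i);
                // Meta-information: which question this answer belongs to.
                String question = answer.getAttribute("question");
                // Constitutive information: the answer itself.
                result.put(question, answer.getTextContent());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable questionnaire", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<questionnaire>"
                + "<answer question=\"1\">D</answer>"
                + "<answer question=\"3b\">Cow</answer>"
                + "<answer question=\"last\">144</answer>"
                + "</questionnaire>";
        // Prints: {1=D, 3b=Cow, last=144}
        System.out.println(readAnswers(xml));
    }
}
```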

An XSLT to turn XML into CSV

By Confusion on Friday 29 August 2008 21:23 - Comments (3)
Categories: Software engineering, XML, Views: 9.618

Today I needed to turn an XML file into a CSV file. I was sure someone would have solved this problem before, but I could not find an appropriate XSLT. The problem can be separated into two subproblems: one is 'flattening' the XML, by which I mean turning it from

XML:
<root>
  <element>
    <foo>1</foo>
    <bar>
      <baz>2</baz>
      <fooz>3</fooz>
    </bar>
  </element>
  <element>
    <foo>1</foo>
  </element>
</root>

into

XML:
<root>
  <element>
    <foo>1</foo>
    <bar.baz>2</bar.baz>
    <bar.fooz>3</bar.fooz>
  </element>
  <element>
    <foo>1</foo>
  </element>
</root>

considering I am interested in converting each 'element' into a CSV line.
The 'namespaced' element names are required, because they serve as the CSV column headers and they are required to be unique (which, for our case, is guaranteed by this approach).
The other subproblem is converting XML to CSV, of which the main challenge was making sure the last element is not followed by a comma.
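
To illustrate the naming scheme only, here is a sketch in Java with the DOM API. It is not the XSLT approach this post is about, and the class and method names are made up. It walks one 'element' and collects its childless descendants under their dotted names.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DottedColumns {

    /** Flattens the first element called rowName into dotted-name -> value pairs. */
    static Map<String, String> flattenFirst(String xml, String rowName) {
        try {
            Document document = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element row = (Element) document.getElementsByTagName(rowName).item(0);
            Map<String, String> columns = new LinkedHashMap<>();
            collect(row, "", columns);
            return columns;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not flatten the document", e);
        }
    }

    /** Recurses into child elements, prefixing names with the path of their ancestors. */
    private static void collect(Element parent, String prefix, Map<String, String> columns) {
        for (Node child = parent.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (!(child instanceof Element)) {
                continue; // skip whitespace and other non-element nodes
            }
            Element element = (Element) child;
            String name = prefix + element.getTagName();
            if (hasChildElements(element)) {
                collect(element, name + ".", columns);
            } else {
                columns.put(name, element.getTextContent());
            }
        }
    }

    private static boolean hasChildElements(Element element) {
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String xml = "<root><element><foo>1</foo>"
                + "<bar><baz>2</baz><fooz>3</fooz></bar></element></root>";
        // Prints: {foo=1, bar.baz=2, bar.fooz=3}
        System.out.println(flattenFirst(xml, "element"));
    }
}
```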

In the end, I came up with the templates below to print the values of the 'childless' elements.

XML:
    <xsl:template match="//element">
        <xsl:apply-templates select="*" />
        <xsl:text>&#x0A;</xsl:text>
    </xsl:template>

    <xsl:template match="//element//*">
        <xsl:choose>
            <xsl:when test="count(child::*) > 0">
                <xsl:apply-templates select="*" />
            </xsl:when>
            <xsl:otherwise>
                <xsl:text>"</xsl:text>
                <xsl:value-of select="."/>
                <xsl:text>"</xsl:text>
            </xsl:otherwise>
        </xsl:choose>
        <xsl:if test="position() != last()">
            <xsl:text>,</xsl:text>
        </xsl:if>
    </xsl:template>

The upper template applies the lower template to the child elements of each node called 'element', one 'element' at a time, and ends the line with a newline. The lower template checks whether the node has any child elements. If it doesn't, it prints the node's value in quotes. Otherwise, it recursively applies itself to those child elements. Finally, a comma is written after every node that isn't the last in its node-set. The tricky bit is that no comma is written after the last grandchild of an 'element', so you might expect a comma to go missing there. It doesn't: that comma is written after the parent node itself, provided the parent isn't the last in its own node-set.
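
For anyone who wants to try the templates from Java, here is a minimal sketch using the JDK's built-in javax.xml.transform API. The surrounding `xsl:stylesheet` element, the `xsl:output method='text'` and the `xsl:strip-space` declaration are my additions to make the fragment a complete stylesheet; the two templates are the ones above. The class name is made up.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XmlToCsv {

    // The two templates from the post, wrapped in an assumed stylesheet element.
    static final String STYLESHEET =
          "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='text'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='//element'>"
        + "<xsl:apply-templates select='*'/>"
        + "<xsl:text>&#x0A;</xsl:text>"
        + "</xsl:template>"
        + "<xsl:template match='//element//*'>"
        + "<xsl:choose>"
        + "<xsl:when test='count(child::*) > 0'>"
        + "<xsl:apply-templates select='*'/>"
        + "</xsl:when>"
        + "<xsl:otherwise>"
        + "<xsl:text>\"</xsl:text><xsl:value-of select='.'/><xsl:text>\"</xsl:text>"
        + "</xsl:otherwise>"
        + "</xsl:choose>"
        + "<xsl:if test='position() != last()'><xsl:text>,</xsl:text></xsl:if>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Runs the stylesheet over the given XML and returns the CSV text. */
    static String toCsv(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<root>"
                + "<element><foo>1</foo><bar><baz>2</baz><fooz>3</fooz></bar></element>"
                + "<element><foo>1</foo></element>"
                + "</root>";
        System.out.print(toCsv(xml));
    }
}
```

For the example document this should print one line `"1","2","3"` and one line `"1"`. Note that the second line has fewer fields than the first, because these templates only print the elements that are actually present.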