By Peter Bell

Basic Heuristics When Developing an XML Schema

There are a couple of decisions you need to make when transforming an abstract grammar into a concrete XML syntax. Since this is my first time developing an XML schema, I thought I'd throw out some thoughts on handling them and see what other people think.

In general terms, when you have a piece of information that needs to be described in your XML schema, you can make it an attribute or an element. For example:

<Property name="FirstName" />

or

<Property>
<Name>FirstName</Name>
</Property>

There seems to be quite a bit of debate over when to use each of these approaches, but coming from an abstract grammar, that problem has already been well thought through. For me, you only add an element to a schema when there could be more than one of that concept within its parent, or when the concept itself has attributes.
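
To make the two forms above concrete from the consumer's side, here is a minimal sketch using Python's standard xml.etree.ElementTree module. The helper name property_name is hypothetical and not part of any schema discussed here; the point is only that both spellings carry the same information and differ in where a reader has to look for it.

```python
import xml.etree.ElementTree as ET

# The two spellings from the post.
attribute_form = '<Property name="FirstName" />'
element_form = "<Property><Name>FirstName</Name></Property>"


def property_name(xml_text):
    """Return the property name from either form (hypothetical helper)."""
    prop = ET.fromstring(xml_text)
    # Attribute form: the value lives on the Property tag itself.
    if "name" in prop.attrib:
        return prop.attrib["name"]
    # Element form: the value is the text of a child Name element.
    return prop.findtext("Name")


print(property_name(attribute_form))
print(property_name(element_form))
```

Either way the caller ends up with the same string, which is why the choice is mostly about readability and about how the concept might grow later.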

More than One
Obviously, if a BusinessObject can have 0..n Properties (where BusinessObject and Property are both concepts in your abstract grammar), you're going to want to use elements for the properties . . .

<BusinessObject name="User">
<Property>FirstName</Property>
<Property>LastName</Property>
</BusinessObject>

works

<BusinessObject name="User" property="FirstName" property="LastName" />

obviously doesn't, because XML does not allow the same attribute name to appear twice on one element.
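
A quick way to check this for yourself is to feed both snippets to a parser. This is a small sketch with Python's xml.etree.ElementTree; try_parse is a hypothetical helper name. The repeated elements parse fine, while the repeated attribute is rejected as not well-formed.

```python
import xml.etree.ElementTree as ET

repeated_elements = (
    '<BusinessObject name="User">'
    "<Property>FirstName</Property>"
    "<Property>LastName</Property>"
    "</BusinessObject>"
)
repeated_attributes = (
    '<BusinessObject name="User" property="FirstName" property="LastName" />'
)


def try_parse(xml_text):
    """Return the root element, or None if the text is not well-formed."""
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError:
        return None


root = try_parse(repeated_elements)
# Collect the text of every Property child, in document order.
property_names = [p.text for p in root.findall("Property")]
print(property_names)

# The duplicate "property" attribute makes the parser give up.
print(try_parse(repeated_attributes) is None)
```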

The Concept Also has Attributes
Let's take a Property concept which might have a DataType attribute/concept. At first you might decide just to do:

<Property name="FirstName" DataType="Name" />

. . . treating DataType as an attribute of the Property element. However, if you have the ability to parameterize a data type, you will need to treat it as a separate element:

<Property name="FirstName">
<DataType name="Name" field="TextBox" Size="20" />
</Property>
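
As a rough illustration of why the nested form is handy, here is a sketch (again Python's xml.etree.ElementTree, with variable names of my own choosing) that reads the parameterized DataType back out. The child element's attributes come through as an ordinary mapping of parameters, which a single DataType="Name" attribute could not hold.

```python
import xml.etree.ElementTree as ET

xml_text = (
    '<Property name="FirstName">'
    '<DataType name="Name" field="TextBox" Size="20" />'
    "</Property>"
)

prop = ET.fromstring(xml_text)
data_type = prop.find("DataType")

# Every attribute on the DataType element becomes one parameter.
params = dict(data_type.attrib)

print(prop.get("name"))
print(params)
```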

There is also a third possible use case for making a concept an attribute rather than an element. It is usually described in terms of the concept being "something independent" of the element it is contained within, broadly analogous to associated vs. composed objects in OOP. I haven't really come across a case yet where this has seemed like a meaningful rationale to me, as I see both elements and attributes as being capable of pointing to anything, including rich, independent, associated concepts. But then this is all really about human readability, so there is going to be an element of preference/style involved.

The only other thing I've noticed that is fairly specific to the XML concrete syntax is the idea of element text as opposed to an XML attribute. In XML both of the following are valid:

<Property>FirstName</Property>

<Property name="FirstName" />

In the first case the information is held as element text between the opening and closing tags, and in the second it is a named attribute of the tag. One obvious limitation of the first option is that it can only be used for one value per tag (but it doesn't have to be limited to tags with only a single value: you can combine any number of attributes with a single text value within the tag).
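
Here is a small sketch of that combination, read with Python's xml.etree.ElementTree. The dataType and required attributes are made up for the example and are not part of the schema discussed above; the point is that one element can carry several named attributes plus exactly one text value.

```python
import xml.etree.ElementTree as ET

# Hypothetical attributes (dataType, required) alongside a single text value.
xml_text = '<Property dataType="Name" required="true">FirstName</Property>'

prop = ET.fromstring(xml_text)

# The one unnamed value is the element text...
text_value = prop.text
# ...while any number of named values sit in the attributes.
named_values = dict(prop.attrib)

print(text_value)
print(named_values)
```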

There are a number of strengths and weaknesses enumerated here and here. It is interesting to see some of the opinions and trade-offs between the two. Anyone got any other "big thoughts" on attributes vs. element text?

Some additional resources here.

Thoughts?
