By Peter Bell

Writing your first XSD (XML Schema)

One of the biggest benefits of XML is the ability to validate not only the structure of your documents but also (within limits) the data within your elements and attributes. The most common way of describing both the structure and the data validations is an XML Schema (XSD).

This article is a beginner's guide (from a beginner, to other beginners) to writing simple XSDs. If you're new to them, check it out. If you're more experienced, please use the "add comments" link at the bottom to point out all of the woeful simplifications and misunderstandings I've probably introduced :-> . . .

Let's start by looking at a simple XML document:

<?xml version="1.0" encoding="UTF-8"?>
<address>
   <address1>My Street</address1>
   <address2>My Floor</address2>
   <city>New York</city>
   <state>NY</state>
   <zip>10004</zip>
</address>

It is a simple way to capture basic US address information. How would we start to write an XSD for this? Well, have a look below at the first cut.

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<xs:element name="address">
   <xs:complexType>
   <xs:sequence>
      <xs:element name="address1" type="xs:string" />
      <xs:element name="address2" type="xs:string" />
      <xs:element name="city" type="xs:string" />
      <xs:element name="state" type="xs:string" />
      <xs:element name="zip" type="xs:string" />
   </xs:sequence>
   </xs:complexType>
</xs:element>
</xs:schema>

It's all XML
The first thing you'll notice is that the schema is itself a well-formed XML file. It declares its XML version (1.0), says that it uses UTF-8 encoding and uses the w3.org XML Schema namespace.
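
One detail worth a quick sketch: the "xs" prefix is just a local nickname for the namespace URI, so you will see both xs: and xsd: in other people's schemas. The fragment below is a minimal illustration (not part of the address schema) that binds the same namespace to "xsd" instead:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same namespace URI as before, bound to the prefix "xsd" instead of "xs" -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

<!-- A single illustrative declaration; the prefix must match on every schema tag and built-in type -->
<xsd:element name="city" type="xsd:string" />

</xsd:schema>
```

What matters to a validator is the URI, not the prefix, as long as you use the prefix consistently within one file.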

Simple or Complex?
It defines a root element of "address" which is a complex type. What does that mean? Well, every element is simple or complex. If it contains sub-elements or has attributes, it is a complex type. If it ONLY has text content (no sub-elements and no attributes), it is a simple type. Attributes themselves are always simple types.
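
As a rough sketch of the difference, the schema below declares one simple element and one complex element side by side. The "kind" attribute is a hypothetical addition for illustration only; it is not part of the address document above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<!-- Simple: text content only, no child elements, no attributes -->
<xs:element name="city" type="xs:string" />

<!-- Complex: contains a child element and carries an attribute -->
<xs:element name="address">
   <xs:complexType>
   <xs:sequence>
      <xs:element name="zip" type="xs:string" />
   </xs:sequence>
   <!-- "kind" is a made-up attribute, shown only to illustrate the rule -->
   <xs:attribute name="kind" type="xs:string" />
   </xs:complexType>
</xs:element>
</xs:schema>
```

Note that attribute declarations go after the sequence inside the complexType.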

Sequence
The sequence states that you must have exactly one address1, address2, city, state and zip tag within every address for it to be valid, and that they must appear in that order.
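
To make that concrete, here is a sketch of a document that is still perfectly well-formed XML but should fail validation against the first-cut schema above, because it breaks the sequence in two ways:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Well-formed XML, but NOT valid against the first-cut address schema: -->
<!-- address2 is missing, and zip appears before state. -->
<address>
   <address1>My Street</address1>
   <city>New York</city>
   <zip>10004</zip>
   <state>NY</state>
</address>
```

If you wanted address2 to be optional, XSD lets you add minOccurs="0" to its element declaration; by default every element in a sequence must appear exactly once.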

Type
XSD supports both its own built-in data types and the ability to define custom data types that restrict or extend the base types. In this case it states that all of the (simple) elements are strings, which is one of the most forgiving data types.
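
For comparison, here is a small sketch using some of the stricter built-in types. The "order" element and its children are hypothetical names chosen for illustration; only the xs: type names come from XML Schema itself.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

<!-- Hypothetical "order" element, used only to illustrate built-in types -->
<xs:element name="order">
   <xs:complexType>
   <xs:sequence>
      <!-- Whole numbers greater than zero -->
      <xs:element name="quantity" type="xs:positiveInteger" />
      <!-- Decimal numbers such as 19.99 -->
      <xs:element name="price" type="xs:decimal" />
      <!-- Dates in YYYY-MM-DD form -->
      <xs:element name="shipDate" type="xs:date" />
      <!-- true, false, 1 or 0 -->
      <xs:element name="giftWrap" type="xs:boolean" />
   </xs:sequence>
   </xs:complexType>
</xs:element>
</xs:schema>
```

With types like these, a value such as "lots" in quantity would be rejected by a validating parser, whereas xs:string would let it through.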

Other Bits and Pieces
From what I can tell, convention is moving towards using lowerCamelCase for naming elements and attributes. More important is to choose a format and stick with it (lowerCamelCase, CamelCase, alllower, ALLUPPER, all-lower-hyphenated, etc.) as XML is case sensitive, so MyAddress is NOT the same tag as myAddress.

Additional Schema Resources
Here is an introductory chapter which provides a good starting point. W3Schools has a pretty good tutorial on XSDs. You could also check out the definitive resources:

Comments
You should take a look at RelaxNG (http://relaxng.org/) for schema definition too. The syntax (at least in my opinion) is a bit more readable and easier to follow. You get elements like "zeroOrMore" and "oneOrMore"; the downside is that it's a bit more verbose than XSD, but there's a compact version that makes up for it. Definitely worth checking out if you're new to schema design!
# Posted By Wayne Graham | 6/14/07 4:51 PM
Peter,

Good start. I would think about enumerating state.

<xsd:element name="state" type="statetype"/>
<xsd:simpleType name="statetype">
<xsd:restriction base="xsd:string">
   <xsd:enumeration value="AL"/>
   <xsd:enumeration value="AK"/>
   <xsd:enumeration value="AS"/>
   <xsd:enumeration value="AZ"/>
   <xsd:enumeration value="AR"/>
   <xsd:enumeration value="CA"/>
   <xsd:enumeration value="CO"/>
   <xsd:enumeration value="CT"/>
...
   <xsd:enumeration value="WY"/>
</xsd:restriction>
</xsd:simpleType>

And if you don't need to worry about Canucks (in which case just pretty up the regexp) put a pattern on zip:
<xsd:element name="zip" type="zipformat"/>
<xsd:simpleType name="zipformat">
<xsd:restriction base="xsd:string">
<xsd:pattern value="\d{5}"/>
</xsd:restriction>
</xsd:simpleType>
# Posted By Ron Alexander | 6/14/07 4:57 PM
@Wayne,

I looked at Relax NG pretty carefully. A couple of issues. Firstly ColdFusion (to my knowledge) wouldn't validate against Relax NG which would have meant I'd have had to transliterate Relax NG to XSD before processing which was one more moving part than I wanted. I get that Relax NG is superior, but from what I understand it doesn't implicitly provide the range of data type validations (it thinks that is a separate concern which is a valid viewpoint) so I wasn't absolutely sure how I'd get the type validations which was what I really needed from the system.

@Ron,

Many thanks for the comment! But you're stealing my thunder (:->) - THAT goes in the third article! I am *really* building this up one step at a time - as much so I can feel like I know what I'm doing, as two days ago I barely knew what Schemas were, and for tonight I have to deliver a bunch of schemas that fully describe all the imports for an e-commerce system. You gotta love deadlines :->
# Posted By Peter Bell | 6/14/07 5:08 PM
You're right...the Xalan processor with CF won't process RelaxNG. I hadn't even thought of this before I wrote it, but I dumped it in favor of Saxon for our XML needs since I needed to be able to do XSLT 2.0 transformations.

For datatypes, it's a little confusing, but basically use the W3C XML Schema datatypes with something like this:

<grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
<start>
<element name="statetype">
<choice>
<value type="string">AL</value>
<value type="string">AK</value>
<value type="string">AS</value>
<value type="string">AZ</value>
</choice>
</element>
</start>
</grammar>

One tool we've had a lot of success with for this stuff (because we've moved from SGML and DSSSL to XML with XSLT2) is Trang (http://www.thaiopensource.com/relaxng/trang.html), which can convert DTDs, Schemas, and RelaxNG schemas into any one of the other formats.

That being said, I still think that XSD is the predominant language for schemas right now, and a tool folks should have in their toolbox. I do think that RelaxNG is a lot easier to learn, and with editors like oXygen, you can generate all types of wonderful documentation and convert the different versions with the click of a button to deploy as needed.
# Posted By Wayne Graham | 6/15/07 10:12 AM
Hi Peter,

Nice intro... but it would have been even better if you could put up things like how to validate the XML file against the DTD...
# Posted By Dav R | 7/9/07 1:47 AM