For Matt Woodward: The Limitations of XML
Case 1: Simple, Non-Hierarchical Data
The first reason not to use XML is that for simple, non-hierarchical data it isn't necessary. If you only have property/value pairs for a given config file, you can use a simple =-delimited format with a single pair per line, which does the job with fewer characters and makes the config data more readable.
Here are two sample XML config files for property value pairs:
<property name="Property1" value="Value1" />
<property name="Property2" value="Value2" />
<property name="Property3" value="Value3" />
<property name="Property4" value="Value4" />

<property name="Property1">
<value>Value1</value>
</property>
<property name="Property2">
<value>Value2</value>
</property>
<property name="Property3">
<value>Value3</value>
</property>
<property name="Property4">
<value>Value4</value>
</property>
And here is what the equivalent =-delimited config file would look like:
Property1=Value1
Property2=Value2
Property3=Value3
Property4=Value4
It allows you to fit more information on a page and requires less visual processing to grok. If your data is not hierarchical and you are confident that it will not become so (a big assumption, yet one that is valid in certain use cases), then XML is probably not the best concrete syntax to use.
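To give a feel for how little machinery the delimited format needs, here is a minimal sketch in Python (not CFML, purely for illustration). The function name `parse_properties` and the rule that lines starting with `#` are comments are my own assumptions, not part of any standard.

```python
def parse_properties(text):
    """Parse 'name=value' lines into a dict.

    Blank lines and lines starting with '#' are skipped (an assumed
    convention). Only the first '=' on a line separates name from value.
    """
    config = {}
    for line_number, raw_line in enumerate(text.splitlines(), start=1):
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        name, separator, value = line.partition("=")
        if not separator:
            raise ValueError(
                f"line {line_number}: expected name=value, got {raw_line!r}"
            )
        config[name.strip()] = value.strip()
    return config


sample = """\
Property1=Value1
Property2=Value2
Property3=Value3
Property4=Value4
"""

config = parse_properties(sample)
print(config["Property3"])  # prints Value3
```

The whole "parser" is a dozen lines, which is roughly the point: for flat data there is not much for XML tooling to add.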
Case 2: Scriptable configuration
Let's say that I want to describe my bean dependencies for a DI/IoC framework. And let's say that in my application, every business object has a Service class that depends on a DAO class. How would I express that in XML? Below is an example for three business objects.
<bean id="UserService" class="com.User">
<constructor-arg name="UserDAO">
<ref bean="UserDAO" />
</constructor-arg>
</bean>
<bean id="ProductService" class="com.Product">
<constructor-arg name="ProductDAO">
<ref bean="ProductDAO" />
</constructor-arg>
</bean>
<bean id="ArticleService" class="com.Article">
<constructor-arg name="ArticleDAO">
<ref bean="ArticleDAO" />
</constructor-arg>
</bean>
Now imagine an application with 70 or 80 business objects and with 2-3 standard dependencies per business object. I'd show a code sample, but I'm just too lazy to type that much XML. I'd rather just write:
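<cfscript>
// Build one bean definition per business object instead of hand-writing each one.
BusinessObjectList = "User,Product,Article";
variables.Bean = StructNew();
for (Count = 1; Count lte ListLen(BusinessObjectList); Count = Count + 1)
{
ObjectName = ListGetAt(BusinessObjectList, Count);
// Create the struct for this bean before assigning its keys.
variables.Bean[ObjectName] = StructNew();
variables.Bean[ObjectName].Path = "com.#ObjectName#";
variables.Bean[ObjectName].ConstructorDependencies = "#ObjectName#DAO";
}
</cfscript>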
You can argue about whether this should use a configuration bean to encapsulate the structure of the config data, but it is hard to argue against the need for looping and logic constructs in large configuration files. If you have any doubt, check out the discussions in the Java world, which clearly highlight the strengths and weaknesses of XML for different use cases.
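The same idea can be taken one step further: keep the loop, and let it emit the verbose XML for you. The following is only a sketch in Python (chosen for illustration, not CFML), using the standard library's `xml.etree.ElementTree`. The `<beans>` root element, the function name `build_beans`, and the `extra_dependencies` parameter are assumptions of mine; `ET.indent` needs Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def build_beans(object_names, extra_dependencies=None):
    """Generate one <bean> per business object, each depending on its DAO.

    extra_dependencies is a hypothetical hook: a dict mapping an object
    name to additional bean names it needs beyond the standard DAO.
    """
    extra_dependencies = extra_dependencies or {}
    beans = ET.Element("beans")  # assumed root element
    for name in object_names:
        bean = ET.SubElement(
            beans, "bean", {"id": f"{name}Service", "class": f"com.{name}"}
        )
        for dependency in [f"{name}DAO"] + extra_dependencies.get(name, []):
            arg = ET.SubElement(bean, "constructor-arg", {"name": dependency})
            ET.SubElement(arg, "ref", {"bean": dependency})
    return beans


beans = build_beans(
    ["User", "Product", "Article"],
    extra_dependencies={"Article": ["ProductService"]},
)
ET.indent(beans)  # pretty-print in place (Python 3.9+)
print(ET.tostring(beans, encoding="unicode"))
```

Going from three business objects to eighty is then a matter of extending the list, and the one-off exceptions live in a small dictionary rather than being buried in the middle of a long file.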
Conclusions
For safely and fully describing hierarchical data without using looping and conditional constructs, XML is the "least bad solution to the problem". Try replacing XML with something else and the most you can do without losing semantic distinctions is to replace the angle brackets with white space, as in Python (and for some use cases even that isn't possible). You can remove the name of each tag or attribute, but then your document is much more brittle and depends on remembering conventions (for example, that the second argument is the field length). This is actually OK for certain use cases where an expert user is looking for a terse syntax for describing a DSL that they work with frequently.
For instance, I prefer:
User extends BaseObject
@FirstName: Name: Optional
@LastName: Name: Optional
To:
<object name="User" extends="BaseObject">
<attribute name="FirstName" DataType="Name" Required="No" />
<attribute name="LastName" DataType="Name" Required="No" />
</object>
It is clearly much more concise and I can work better with it (and can write a simple parser to translate it at compile time into XML if I want to use XPath or XSLT to work with it), but it does give up meaning and safety (in terms of redundancy) for conciseness.
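As a rough idea of what such a "simple parser" might look like, here is a sketch in Python (not CFML, just for illustration) that turns the terse form into an XML tree. Several things here are assumptions on my part: the function name `dsl_to_xml`, the `name` and `extends` attributes on the `<object>` element, and the `Required` keyword as the counterpart to `Optional`.

```python
import xml.etree.ElementTree as ET


def dsl_to_xml(source):
    """Translate '<Name> extends <Base>' plus '@Field: Type: Optional' lines to XML."""
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty definition")

    header = lines[0].split()
    if len(header) != 3 or header[1] != "extends":
        raise ValueError(f"expected '<Name> extends <Base>', got {lines[0]!r}")
    # Attribute names on <object> are assumed for this sketch.
    root = ET.Element("object", {"name": header[0], "extends": header[2]})

    for line in lines[1:]:
        if not line.startswith("@"):
            raise ValueError(f"expected '@Field: Type: Optional', got {line!r}")
        parts = [part.strip() for part in line[1:].split(":")]
        if len(parts) != 3 or parts[2] not in ("Optional", "Required"):
            raise ValueError(f"malformed attribute line: {line!r}")
        field_name, data_type, requirement = parts
        ET.SubElement(root, "attribute", {
            "name": field_name,
            "DataType": data_type,
            "Required": "No" if requirement == "Optional" else "Yes",
        })
    return root


definition = """\
User extends BaseObject
@FirstName: Name: Optional
@LastName: Name: Optional
"""

user = dsl_to_xml(definition)
print(ET.tostring(user, encoding="unicode"))
# The result is an ordinary element tree, so XPath-style queries work on it.
print(len(user.findall("attribute[@Required='No']")))  # prints 2
```

Notice that the positional meaning of each field (name, then type, then optionality) lives only in the parser. That is exactly the redundancy the XML form spells out and the terse form gives up.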
XML is a great solution to many problems, but not the one true solution to all problems, so I hope I've clarified some of the use cases where it is a less appropriate solution. Matt?!


When you need a lightweight format for talking to Flash, URL-encoded variables are much shorter and more compact than XML. This matters not just for data size, since the Flash Lite player can only parse so much data on a limited phone CPU, but also because users (in the USA at least) are charged out the yang for data use, so a responsible developer only sends what is needed. It's basically your config example with & as the delimiter instead of a newline.
Additionally, CSV is another older, yet more compact format that "just works".
Finally, JSON can represent data structures more compactly than XML can.
All 3 are alternatives, but just don't get a lot of press because the Enterprises typically push these smaller formats to the limit.
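For a rough sense of the size difference, here is a small Python sketch (Python only for illustration) that serializes the same four flat pairs as URL-encoded variables, CSV, JSON, and XML using the standard library. The `<properties>` root element and the variable names are assumptions, and the exact counts will vary with your data.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode

settings = {
    "Property1": "Value1",
    "Property2": "Value2",
    "Property3": "Value3",
    "Property4": "Value4",
}

# URL-encoded variables: name=value pairs joined with '&'.
url_encoded = urlencode(settings)

# CSV: one "name,value" row per pair.
csv_buffer = io.StringIO()
csv.writer(csv_buffer).writerows(settings.items())
as_csv = csv_buffer.getvalue()

# JSON without optional whitespace.
as_json = json.dumps(settings, separators=(",", ":"))

# XML with one <property> element per pair (root element name is assumed).
root = ET.Element("properties")
for name, value in settings.items():
    ET.SubElement(root, "property", {"name": name, "value": value})
as_xml = ET.tostring(root, encoding="unicode")

for label, payload in (
    ("url-encoded", url_encoded),
    ("csv", as_csv),
    ("json", as_json),
    ("xml", as_xml),
):
    print(f"{label:12} {len(payload):4d} characters")

# The compact form still round-trips back to the original pairs.
round_tripped = dict(parse_qsl(url_encoded))
print(round_tripped == settings)
```

This only covers flat data, of course. Once the data becomes hierarchical, URL-encoded variables and CSV drop out of the comparison.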
<cfscript>
BusinessObjectList = "User,Product,Article";
For (Count = 1; Count lte listlen(BusinessObjectList); Count = Count + 1)
{
ObjectName = ListGetAt(BusinessObjectList, Count);
variables.Bean[ObjectName].Path = "com.#ObjectName#";
variables.Bean[ObjectName].ConstructorDependencies = "#ObjectName#DAO";
//yuck factor
if(ObjectName eq "Article"){
variables.Bean[ObjectName].ConstructorDependencies = listAppend(variables.Bean[ObjectName].ConstructorDependencies, "ProductService");
}
};
</cfscript>
OR
<!-- yum factor -->
<bean id="UserService" class="com.User">
<constructor-arg name="UserDAO">
<ref bean="UserDAO" />
</constructor-arg>
</bean>
<bean id="ProductService" class="com.Product">
<constructor-arg name="ProductDAO">
<ref bean="ProductDAO" />
</constructor-arg>
</bean>
<bean id="ArticleService" class="com.Article">
<constructor-arg name="ArticleDAO">
<ref bean="ArticleDAO" />
<ref bean="ProductService" />
</constructor-arg>
</bean>
variables.Bean.Article.ConstructorDependencies = ListAppend(variables.Bean.Article.ConstructorDependencies, "ProductService"); and there would be no need to put that within my loop.
Also, indulge me. Write out the XML for 80 business objects with 3 dependencies, not just for 3 with 1. For the length of example either of us could be bothered to type, I think XML is better. For larger files with lots of repetition it is painful, which is why there has been a big push in Spring to support easier scripted config files: many people in the Java world already use Spring for larger projects than most people in the CF world use CS for.
Side issue: as per my previous posting, let's allow comma-delimited lists! Your XML was slightly incorrect. The end should have read:
<constructor-arg name="ArticleDAO">
<ref bean="ArticleDAO" />
</constructor-arg>
<constructor-arg name="ProductService">
<ref bean="ProductService" />
</constructor-arg>
I'd prefer to support aliasing elsewhere and just to use
ConstructorBeanList = "ArticleDAO,ProductService";
Agreed, although CSV doesn't support hierarchical data, and I think you have to be careful with JSON as it is much less self-describing. For data transfer where every bit counts it is a good solution, but for a manually edited configuration file I'd say there is a little more chance of error (as with my custom example above), so it is only best for certain use cases. I certainly agree that in the right cases both of these are superior to XML.
You do realise at the end of the day, it's a lot easier for most people to utilise an already built XML parser with a custom XSD or DTD than write their own parser for their own DSL...?
When I look at XML Schemas and DTDs these days that's what I see - a really simple way to build a grammar for a DSL, and since the parser is pretty much handed to you on a platter with XML, it just means you have to write evaluation (tree walking).
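To make the "tree walking" part concrete, here is a minimal sketch in Python (used for illustration only) with the standard library's `xml.etree.ElementTree`. The `<beans>` root element and the function name `read_dependencies` are assumptions. Note that ElementTree only checks well-formedness; validating against an XSD or DTD would need a separate tool.

```python
import xml.etree.ElementTree as ET

BEANS_XML = """
<beans>
  <bean id="UserService" class="com.User">
    <constructor-arg name="UserDAO">
      <ref bean="UserDAO" />
    </constructor-arg>
  </bean>
  <bean id="ArticleService" class="com.Article">
    <constructor-arg name="ArticleDAO">
      <ref bean="ArticleDAO" />
    </constructor-arg>
    <constructor-arg name="ProductService">
      <ref bean="ProductService" />
    </constructor-arg>
  </bean>
</beans>
"""


def read_dependencies(xml_text):
    """Walk the parsed tree and map each bean id to the beans it references."""
    root = ET.fromstring(xml_text)  # the parser is handed to us
    dependencies = {}
    for bean in root.iter("bean"):
        # The evaluation step: collect every <ref> nested under this bean.
        dependencies[bean.get("id")] = [ref.get("bean") for ref in bean.iter("ref")]
    return dependencies


dependencies = read_dependencies(BEANS_XML)
for bean_id, refs in dependencies.items():
    print(bean_id, "->", ", ".join(refs))
```

All of the lexing and parsing happens inside `ET.fromstring`; the only code left to write is the loop that gives the elements meaning.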
So while I agree with you that it's a lot easier to write something like
User extends BaseObject
@FirstName: Name: Optional
@LastName: Name: Optional
I question if most people can/will write parsers to do it? Maybe with some of the newer tools coming out to write parsers? But can people be bothered? Who knows.
Just a thought.
Agreed 100%. I'm actually a fan of XML for this very reason. But Matt asked for a good reason *not* to use XML so I provided two. If he'd been anti-XML and asked for good reasons to use it I would have provided those instead. I think it is always important to be able to argue both sides of something even if you happen to have a preference - keeps you intellectually honest :->
All that said, parsing simple name/value pair config files is trivial, and if you just cfset into a struct, no parser is required at all. Of course, for those of us who are considering playing around with parsers, what we need is a tutorial from someone in the CF world who's given them a try. I don't know, maybe someone with the chops to create a great community framework who has experience with (let's say) ANTLR?!!!
:->
We've been playing around with actually using Ruby to generate our Spring config - we got so sick of having a 3000-plus-line beanspace definition, and the semantic hoops we had to jump through to express what we needed to express, that m'colleague Jan came up with "Springy" -
http://www.trampolinesystems.com/weblog/wiring-up-...
(Yes, I know, Springy is not a great name - but we tried and tried to come out with a furry-animal-related acronym, but we eventually failed...at least it's trampoline-related :) )
Great link - many thanks! Ended up turning into a posting . . .
http://www.pbell.com/index.cfm/2007/2/6/Language-M...
Cheers!
Are you trying to kill me here? ;)
Sometimes I think you are trying to kill me. Arg, it's 2am, and I'm going to bed now.
Anyway, what are you whining about? These are all simple, lightweight postings. At least it isn't all about abstract grammars and refactoring of DSLs!