The Truth About XML vs. Scripting
Programmers are really language designers – creating layers of abstraction between their general purpose programming language and the domain problems they need to solve to allow their programs to be more elegant.
Have you ever written a custom tag to abstract the rendering of a paged table list? How about a tag or CFC method that hides the SQL to perform simple access to a table? If so, then congratulations – you’re a language designer! You have added a declarative layer to your code that allows you to use a more abstract language that you designed for rendering table lists or for doing simple SQL queries. (Declarative simply means that you are telling the computer what you want – not how to do it. SQL and RegEx are two well-known general purpose declarative languages, but any time you create a CFC or a custom tag (or a config file) you are creating your own little language – or, as they are called, a “Domain Specific Language”. DSLs can be either horizontal – data access, validation, workflow, etc. – or vertical – insurance, stock trading, etc.)
Once you start looking at programming as the art of writing layers of language to allow the clearer expression of intent, you start to get a different perspective because things that used to look very different now all look to be very similar.
Let’s say that our goal is to design a very simple language for returning a list of users from a database (I’m going to make the example trivial so we can focus on the concepts – not the problem domain, which can actually get quite complex once you start to support joins and havings and all of the rest). A first step might be to create a custom tag (or hopefully these days a data access object written as a CFC – let’s call it UserDAO). Let’s add a method with a signature of getUsers(Filter: string, Orderby: string) which allows us to get a subset of the users in the database using some kind of filter and to control the order in which they return. So, while it may seem a stretch for such a simple case, I have actually created a (very) small domain specific language for describing the filtering and ordering of users.
We could even describe the abstract syntax of the language:
getUsers ::= [Filter: string] [Orderby: string]
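As a rough illustration of the in-language form of this grammar, here is a minimal sketch in Python rather than ColdFusion. The Users table, its columns, the sample rows and the make_demo_dao helper are all assumptions made up for the example; only the getUsers(Filter, Orderby) shape comes from the text above.

```python
import sqlite3
from typing import List, Optional, Tuple


class UserDAO:
    """Hypothetical data access object; table and column names are assumed."""

    def __init__(self, connection: sqlite3.Connection) -> None:
        self.connection = connection

    def get_users(self, filter_sql: Optional[str] = None,
                  orderby: Optional[str] = None) -> List[Tuple[str, str]]:
        # Both arguments are optional, mirroring
        # getUsers ::= [Filter: string] [Orderby: string]
        sql = "SELECT FirstName, LastName FROM Users"
        if filter_sql:
            # The filter is treated as trusted SQL in this sketch; real
            # code would have to guard against injection.
            sql += " WHERE " + filter_sql
        if orderby:
            sql += " ORDER BY " + orderby
        return self.connection.execute(sql).fetchall()


def make_demo_dao() -> UserDAO:
    """Build an in-memory database with a few made-up users."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE Users (FirstName TEXT, LastName TEXT)")
    connection.executemany(
        "INSERT INTO Users VALUES (?, ?)",
        [("Zoe", "Brown"), ("Adam", "Baker"), ("Carl", "Smith")],
    )
    return UserDAO(connection)
```

A caller would then write something like make_demo_dao().get_users("LastName LIKE 'B%'", "FirstName"), and that single method call is the whole "sentence" in this tiny language.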
Now usually we wouldn’t think about this, but the next choice is the concrete syntax in which to write this “language”. By default we all use in-language DSLs, just writing the “language” using method calls in either ColdFusion tags or CFScript, but that is only one possible notation. It would be quite possible to create an XML implementation of the abstract syntax (which would look quite like calling a ColdFusion custom tag). It might look something like:
<getUsers Filter="LastName = 'b%'" Orderby="FirstName" />
We could even create a custom concrete syntax for describing the same intent. Perhaps something like:
getUsers: “LastName = ‘b%’”: “FirstName”
That concrete syntax is using colons for attribute delimiters and new lines for tag delimiters, making the parsing of it fairly straightforward.
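To make the split between abstract and concrete syntax a little more tangible, here is a minimal Python sketch in which two notations are parsed into the same abstract structure. The GetUsers class, both parser functions and the exact XML attribute names are assumptions for illustration, and the custom form is written with straight quotes.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GetUsers:
    """Abstract syntax: getUsers ::= [Filter: string] [Orderby: string]"""
    filter: Optional[str] = None
    orderby: Optional[str] = None


def parse_xml(text: str) -> GetUsers:
    # Assumed XML notation: <getUsers Filter="..." Orderby="..." />
    element = ET.fromstring(text)
    if element.tag != "getUsers":
        raise ValueError(f"unexpected tag: {element.tag}")
    return GetUsers(element.get("Filter"), element.get("Orderby"))


# Custom notation: getUsers: "filter": "orderby" (both parts optional).
_CUSTOM = re.compile(r'getUsers(?::\s*"([^"]*)")?(?::\s*"([^"]*)")?\s*')


def parse_custom(line: str) -> GetUsers:
    match = _CUSTOM.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"not a getUsers statement: {line!r}")
    return GetUsers(match.group(1), match.group(2))
```

Whichever notation the author of a statement picks, the rest of the system only ever sees a GetUsers value, which is the sense in which the notation is a separate decision from the language design.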
The important distinction to be made is between the process of language design and the selection of a concrete syntax for describing it. I have often heard it said that a downside of XML config files is the extra tags you have to learn – this just isn’t true. Assuming that an XML document has been well designed, the tags you have to learn are just the abstract syntax for a domain specific language that allows you to more elegantly express the intent of a concept than writing it natively in ColdFusion. Now, you don’t need XML for that. You could just write a set of method calls against methods that implemented the same grammar, but if you want to use those abstractions, you’re going to have to learn the concepts – whether they are expressed as an XML schema (tags) or an in-language extension (a set of method signatures).
Choosing a Concrete Syntax
The question then is how do you choose a concrete syntax? Firstly, you have to know your options. The first option is an in-language extension. In ColdFusion the options for this are UDFs, custom tags and CFCs. The second option is XML. The third option is a custom textual syntax and the fourth option is a non-textual representation – anything from a spreadsheet or data table to a graphical interface.
Briefly, special cases are sometimes best served by a data table (think state machines or some rules driven systems) or a graphical interface (for describing layout or describing things like object models). For other cases, the first choice is between in-language and external DSLs. In-language DSLs are simple and flexible. You can mix your DSL with native language elements so you can easily use loops and conditionals without having to manually add them to your XML DSL (Fusebox!). On the other hand, you don’t get any of the free grammatical validation with an in-language DSL, and because of that flexibility people often misuse the language, adding imperative coding in the wrong places – flexibility is both a benefit and a curse!
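To show what grammatical validation means in practice, here is a minimal Python sketch that checks an XML document against the vocabulary of a DSL. Python's standard library has no DTD or schema validator, so the sketch hand-rolls the kind of check a validating XML parser would otherwise give for free. The GRAMMAR table, the queries wrapper element and the validate function are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Set

# Hypothetical DSL vocabulary: each allowed tag maps to its allowed attributes.
GRAMMAR: Dict[str, Set[str]] = {
    "queries": set(),
    "getUsers": {"Filter", "Orderby"},
}


def validate(text: str) -> List[str]:
    """Return a list of grammar violations; an empty list means valid."""
    errors = []
    # iter() walks the root and all of its descendants in document order.
    for element in ET.fromstring(text).iter():
        if element.tag not in GRAMMAR:
            errors.append(f"unknown tag <{element.tag}>")
            continue
        for name in sorted(set(element.attrib) - GRAMMAR[element.tag]):
            errors.append(f"unknown attribute {name!r} on <{element.tag}>")
    return errors
```

A stray loop element wrapped around a getUsers call is rejected here, whereas the equivalent loop written around a method call in an in-language DSL would pass straight through.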
If you do decide to create an external DSL, you’ll probably start with XML because all of the grammatical validation comes for free. I actually prefer custom textual syntaxes to XML for anything I’m going to be using a lot. For example, if I want to describe an object and its attributes I’d prefer:
Article: extends BaseObject
Title: text(100): required
HTML: ntext: optional
to:
<object name="Article" extends="BaseObject">
<attribute name="Title" sqldatatype="text(100)" required="true" />
<attribute name="HTML" sqldatatype="ntext" required="false" />
</object>
However, to do this requires writing your own parsers, which right now is still a bit of a pain (although expect advances in Language Oriented Programming to simplify this somewhat in 2007/2008).
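For a syntax this small the parser does not have to be elaborate. Here is a minimal Python sketch for the three-line Article example above. The ObjectDef and Attribute classes, the parse_object function and the rule that every attribute line has exactly three colon-separated parts are assumptions, not a description of any existing tool.

```python
from dataclasses import dataclass, field
from typing import List

SAMPLE = """\
Article: extends BaseObject
Title: text(100): required
HTML: ntext: optional
"""


@dataclass
class Attribute:
    name: str
    sql_type: str
    required: bool


@dataclass
class ObjectDef:
    name: str
    extends: str
    attributes: List[Attribute] = field(default_factory=list)


def parse_object(source: str) -> ObjectDef:
    # New lines delimit statements; blank lines are ignored.
    lines = [line.strip() for line in source.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty definition")
    # Header line: "<Name>: extends <Base>"
    name, _, rest = lines[0].partition(":")
    keyword, _, base = rest.strip().partition(" ")
    if keyword != "extends" or not base.strip():
        raise ValueError(f"bad header: {lines[0]!r}")
    definition = ObjectDef(name.strip(), base.strip())
    # Attribute lines: "<Name>: <sql type>: required|optional"
    for line in lines[1:]:
        parts = [part.strip() for part in line.split(":")]
        if len(parts) != 3 or parts[2] not in ("required", "optional"):
            raise ValueError(f"bad attribute line: {line!r}")
        definition.attributes.append(
            Attribute(parts[0], parts[1], parts[2] == "required"))
    return definition
```

The pain starts when the syntax grows: nesting, quoting, comments and useful error messages are where a hand-written parser like this one stops being a ten-minute job.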
So, when you’re solving a problem, start by working up an abstract syntax, and then select one or more concrete syntaxes to support (no reason why you can’t support diagrams, XML AND in-language scripting if you want to). As per my article in GAQU2, an in-language approach using base classes to evolve your abstract grammar is often the simplest approach, although I’m working up a system for external DSLs that will make creating and editing them as easy as modifying the syntax of a set of method calls.


"Assuming that an XML document has been well designed, the tags you have to learn are just the abstract syntax for a domain specific language that allows you to more elegantly express the intent of a concept than writing it natively in ColdFusion."
Until you said this: " Now, you don’t need XML for that."
But, the one claim I don't understand is this:
"On the other hand, you don’t get any of the free grammatical validation with an in language DSL."
Of course you do - you get the compiler and runtime environments. Unless, those aren't "free" for some reason? Certainly they check your grammar though, don't they?
Yeah, I could see why you'd take issue if you thought I was going to say XML was the ONLY way to do that :->
Free grammatical validation: the XML can be validated directly against your grammar - not a very generalized ColdFusion grammar. With scripting you can throw in custom scripts and write method calls in the wrong order and do a bunch of other things that an XML validation would catch. There are still cases (semantic constraints) that the XML doesn't catch, but an XML validation will be strictly against your DSL's grammar and that adds some value. Of course, you could write the validations yourself, but now you're writing a parser - either to look at a custom textual syntax or to pre-introspect your CF scripts to make sure that the scripts meet your grammatical constraints. Either way it is a pain.
What I *don't* have a good feeling for is how much work that saves you and the percentage of errors it catches - it might be like static typing - just replace it with good unit tests and you're done. For that I'll have to experiment, but it is definitely a valid benefit of XML - whether or not it outweighs the rank ugliness (IMPO) of all of those darned angle brackets!!!
"With scripting you can throw in custom scripts and write method calls in the wrong order and do a bunch of other things that an XML document would validate and catch."
For this, I see the ability to throw in custom script as a good thing, at least in the cases I am most familiar with. For instance, in cfrails, which I've been working on most recently, I'm working on an API to make the "scaffolding" highly customizable (including, as you so often like to say, having richer metadata). On top of that, not only do I want to be able to configure this object within its class, I want to be able to add, remove, and change behavior.
At least as far as I've been able to tell, XML would only hinder this process (as would a custom language with its own parser). So of course, with different goals come different solutions.
Further, I am unconcerned about calling methods in the wrong order (I guess not unconcerned, but I feel I ought not to require a certain order). I know sometimes the limitations of the language may push you into bad abstractions, but I've yet to come across a case (at least, that I remember) where I was /forced/ into providing an API which required you to call methods in a certain order. If it is the case that you need a certain order, I'd look into providing a better abstraction. (Of course, I am guilty of requiring a certain order, and even recently, but if I am serious about it, I'll try really hard to find a way around it).
However, there is one exception to that in CF - and that is the init() method. But, I don't feel all that bad about using it, since as far as I can tell, it is pretty much standard. But, even though it is, I've still run into trouble with forgetting to call it.
That is true of most everything. Need experience and retrospectives to figure out if it worked for you.
Agree 100%. The strength of external DSLs is also their weakness - stronger syntactic validation. If you want to FORCE people to stay within a DSL, an external DSL (using XML or whatever) is ideal. If you want to have the flexibility of mixing in script in your GPL like CF or Ruby, you obviously want to do everything "in language".
It's like the debate in the Spring world between XML and programmatic configuration. The truth is that for 20 or 50 objects, an XML file is ideal, but once you have a system with hundreds of objects, you want to use a higher level of abstraction with looping and conditionals and enums full of object names to describe your object dependencies at a higher level (so you can do things like saying "for each business object #ObjectName#Service depends on #ObjectName#DAO"). One solution is to keep the XML syntax and to write an XML generator, but the other approach is to embed DI syntax in your language using language extensions or an API. All down to use cases!
Actually, if I have my choice, most times I won't be doing web projects in Java. I like more expressive languages a bit better. =)
Will post code as soon as I get a chance! Look out for updated version of CF Template plus a new CF Gen generation system for orchestrating and iterating template based code generation using a metabased approach.
I did write a generator in CF for some of my Java that interacts with the database, but it wasn't nearly as substantial as what you're talking about.
I have always felt like the criticism of XML for this purpose was a little too strong. XML may not always be the best solution, but you still have to learn the API whether it is XML or not.