By Peter Bell

Should you use XML for your configuration files?

I keep coming back to the question of where or when you should consider using XML for your configuration files in ColdFusion. One of the oft-touted benefits of XML configuration in Java (you can change the configuration without having to recompile your code) is irrelevant in the ColdFusion world, but there are still a lot of reasons to consider using XML for your configuration files . . .

Why Configuration?
Often there are a number of properties that your application depends upon, ranging from the name of the data source to the information required to wire your beans together in ColdSpring or LightWire. The question is where to put them. One way is just to sprinkle them throughout your code, setting the properties wherever they are required. In practice that can make your applications hard to maintain, as you have to search through all of your files to find the various properties you might want to change.

So, most developers create some sort of centralized file (or set of files) for storing those variables. There are three common approaches: a programmatic cfinclude, an ini file and an XML file.

INI or XML?
For simple config files, as Hal Helms pointed out in the first half of a great article, you can use an INI file (in the second half he suggests that XML is a better approach – as does Joe Rinehart in a separate article - thanks Jim for the links!).

While INI files can work for simple data, as soon as you get data that has depth to it (perhaps a list of objects that each have properties), INI files get pretty messy, pretty fast. And given that most applications (and hence most configuration files) get bigger and more complex over time, it is worth starting out with XML if you want a static configuration file. Jim Collins created a nice project the other day to make it easier to get started with your own XML config files.
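To make the "depth" point concrete, here is a minimal sketch in Python (used only as a language-neutral illustration; the post itself is about ColdFusion) that reads a hypothetical XML config describing a list of objects, each with its own properties. The element and attribute names (`config`, `object`, `property`) are invented for this example. An INI file has no native nesting, so the same data would need a naming convention such as `[object.User]` sections to fake it.

```python
import xml.etree.ElementTree as ET

# Hypothetical config: a list of objects, each with its own properties.
# Element and attribute names are illustrative only.
CONFIG_XML = """
<config>
  <datasource>myDSN</datasource>
  <object name="User">
    <property name="FirstName" type="string"/>
    <property name="Email" type="email"/>
  </object>
  <object name="Order">
    <property name="Total" type="decimal"/>
  </object>
</config>
"""

def load_objects(xml_text):
    """Return {object name: [(property name, property type), ...]}."""
    root = ET.fromstring(xml_text)
    objects = {}
    for obj in root.findall("object"):
        objects[obj.get("name")] = [
            (prop.get("name"), prop.get("type"))
            for prop in obj.findall("property")
        ]
    return objects
```

Calling `load_objects(CONFIG_XML)` gives one entry per object with its ordered property list, and the nesting comes straight from the file's structure rather than from a naming convention.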

The Benefits of XML
Some of the benefits of an XML configuration file are as follows:

  • You can use standard third-party tools to create the XML config file and to check it for syntactic accuracy (well-formedness) and, if you publish a DTD, structural validity.
  • There is an argument that you should distinguish between configuration (data) and programming (code). XML enforces that distinction.
  • While Jim's example is pretty simple, you can create sophisticated schemas for "deep" information, which is particularly useful when you have data that contains data that contains data. Doing that in a flat format such as an INI file can be a real problem.
  • XML configs are designed to be written programmatically as well as being able to be edited manually in anything from a text editor to an XML editor.
  • Anyone technical working with your application will know exactly what an XML file is and how to use and validate it against a DTD.
  • You have much more granularity for capturing errors. If you cfinclude a file that builds a struct, any syntax error will bring your application (or your try/catch block) down. With XML you can validate against a DTD and then validate the given values, knowing you will have a syntactically and structurally accurate data structure.
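The two-stage error handling described in the last point can be sketched in Python as a language-neutral illustration. Stage 1 catches malformed XML; stage 2 checks individual values on data already known to be well-formed. Python's standard `xml.etree` does not validate against a DTD, so that step is only marked by a comment; a validating parser such as lxml (`lxml.etree.DTD`) could slot in there. The `setting` element, the `cacheMinutes` key and the `ConfigError` class are hypothetical.

```python
import xml.etree.ElementTree as ET

class ConfigError(Exception):
    """Raised when the config file is malformed or holds a bad value."""

def load_config(xml_text):
    # Stage 1: well-formedness. An unclosed tag is reported here,
    # before any application code touches the data.
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ConfigError(f"config is not well-formed XML: {exc}") from exc

    # (A DTD check, via a validating parser, would sit here.)
    settings = {s.get("name"): s.get("value") for s in root.findall("setting")}

    # Stage 2: value checks. A missing or empty value defaults to "0".
    cache = settings.get("cacheMinutes") or "0"
    if not cache.isdigit():
        raise ConfigError(f"cacheMinutes must be a whole number, got {cache!r}")
    settings["cacheMinutes"] = int(cache)
    return settings
```

Each failure produces a message that names the config file as the culprit, rather than surfacing later as an unrelated runtime error.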

Jared also makes some good comments (scroll down to the comments) about why he prefers to use XML files.

And if you can think of any other benefits to XML, please post them in the comments below – this was a quick posting so I’m sure the list isn’t comprehensive.

Programmatic Configuration
However, there are things that you cannot do easily with an XML file. What happens if you want the base directory path to include #application.name#, or to use ExpandPath() or other functions to dynamically set other properties? What if you want to switch the datasource between your production and test databases based on the value of a URL variable or a session key? What happens if you want to set a time based on Now(), or to take advantage of all of the other features that a programming language provides?

Well, a configuration file is for manually entered configuration information. Anything you configure based on variables such as application.name or functions like now() or expandpath() shouldn’t be in your config file – it should be elsewhere in your configuration routine.

As for changing the data source based on a URL or session key, you can have two XML files, one for each data source. This does make it important that your applications support multiple XML config files. You don’t want to have to put the same directory configuration information into two different config files just because you want to be able to change data source programmatically; that would not be very DRY (Don’t Repeat Yourself). Another approach would be to build awareness of production vs. test values into your XML schema. That can work, but I tend to prefer to keep the schemas simple and have lots of small XML files; otherwise your schema could get rather complex if there were many potential classes of variability it needed to know about (although any thoughts on adding such information to schemas would be appreciated).
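The "lots of small XML files" approach can be sketched as a merge in which later files override earlier ones. Shared directory settings then live in exactly one file, and only the data source file is swapped. This Python sketch is a language-neutral illustration. The `<setting name value>` format, the file contents and the `use_test_db` flag (standing in for the URL or session check) are all hypothetical.

```python
import xml.etree.ElementTree as ET

def parse_settings(xml_text):
    """Flatten <setting name="..." value="..."/> elements into a dict."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): s.get("value") for s in root.iter("setting")}

def merge_configs(*xml_texts):
    """Merge several small config files; later files override earlier ones."""
    merged = {}
    for text in xml_texts:
        merged.update(parse_settings(text))
    return merged

# Hypothetical files: shared directory settings plus one file per data source.
SHARED = '<config><setting name="uploadDir" value="/files/uploads"/></config>'
PRODUCTION_DB = '<config><setting name="dsn" value="prodDSN"/></config>'
TEST_DB = '<config><setting name="dsn" value="testDSN"/></config>'

def config_for(use_test_db):
    # The runtime switch only decides which data source file is added.
    return merge_configs(SHARED, TEST_DB if use_test_db else PRODUCTION_DB)
```

The directory setting is written once, so switching data sources never requires duplicating it.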

Should you use XML for Configuration?
Firstly, I must state: I don’t like XML. I find it verbose, ugly and with an extremely low signal-to-noise ratio. I personally think XML is visually extremely intrusive and annoying, and the thought of explicitly repeating structural information in every record seems painfully inefficient. That said, I've looked at other formats (such as JSON) and I understand why there is much more chance for error (especially when a file is manually created) if you don't have all of that redundant information with each record.

There are valid use cases where a programmatic config file makes sense. Martin Fowler had some good comments about programmatic config files. However, most of the time, an XML config file makes more sense and if necessary you can just support programmatic as well as XML interfaces to your configuration information.

Any thoughts?

Comments
Despite your ending caveats about XML, it is interesting to see you have come full circle on this one...
# Posted By Brian Rinaldi | 11/25/06 12:42 PM
Yeah, although honestly, I'm still wavering! I fundamentally dislike XML. Every time I see an XML file I want to take up Ruby programming :-> I'm just trying to be intellectually honest and point out the benefits that XML clearly provides.

I'm still looking for a weak spot, a plan of attack: a valid reason to remove XML entirely from my future development plans. Some things just seem like a lot of work in XML, and I don't know that XML schemas are the right solution for describing what are basically simple domain-specific languages. On the other hand, lots of little languages gets old real fast when you have to learn each one's unique syntax, which is where a syntactical lingua franca like XML comes in.

So many smart people in the community use XML, I've got a feeling I'll end up using it, but we'll see!

Expect to see a lot of wavering on the blog. There are plenty of people who blog about things they know, showing you the ending point. I think one of the benefits of this blog is that it shows someone trying to figure stuff out, making mistakes, changing their mind, etc. I think it's easier to get something if you see other people going through the process of trying to figure it out rather than just hearing the pronouncements on high after they've figured it all out!
# Posted By Peter Bell | 11/25/06 12:50 PM
Your blog post is very timely, as I have also been spending some time thinking about XML and its use in configuration files. I think this method makes a lot of sense. But what I am trying to think through is a way to use an XML file to build a form that edits configuration data. I am often frustrated that when I add a new feature to an app that needs a config setting, I then have to go back and code the form changes, add a new column to the DB table that holds the config data, etc. So I think using an XML file as a configuration template, a CFC to drive the config process, and a simple database table to hold the config data is going to be very helpful. That way, the next time I have to add a new entry to keep track of "somepathlocation", I add an entry to my XML file, and the new entry will make its way to the config database without any schema changes. I would like to think of it as an extensible configuration system.
# Posted By Marc | 11/25/06 12:56 PM
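Marc's "XML template plus simple table" idea can be sketched with a generic name/value table. New settings become new rows rather than new columns, so adding "somepathlocation" needs no schema change. This is a Python illustration using the standard `sqlite3` module, not Marc's CFC. The template format, the table layout and both function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical template: each <entry> declares a setting and its default.
TEMPLATE = """
<configtemplate>
  <entry name="siteName" default="My Site"/>
  <entry name="somepathlocation" default="/var/data"/>
</configtemplate>
"""

def sync_template(conn, template_xml):
    """Insert any template entries missing from the generic name/value table.

    Existing values are left untouched, so edits made through an admin form
    survive; adding a setting never requires an ALTER TABLE.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS config (name TEXT PRIMARY KEY, value TEXT)"
    )
    for entry in ET.fromstring(template_xml).findall("entry"):
        conn.execute(
            "INSERT OR IGNORE INTO config (name, value) VALUES (?, ?)",
            (entry.get("name"), entry.get("default")),
        )
    conn.commit()

def read_config(conn):
    """Return all stored settings as a dict."""
    return dict(conn.execute("SELECT name, value FROM config"))
```

Running `sync_template` again after adding an `<entry>` inserts only the new row.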
Hi Marc,

This is not dissimilar to issues I'm toying with. In practice, though, doing this is really quite difficult for two reasons: field types and (more importantly) "deep" data.

If your config file is just a bunch of text boxes, generating the form and db to hold it (do you even NEED a db to hold it?) is trivial. Adding different field types and their properties just requires a richer schema, although if you want to include anything requiring value lists (check boxes, radio buttons, multiple selects or single selects), you need to add the value list titles and values to your XML.

It is when you get into "deep" data that generating the config admin becomes difficult. If you are configuring n objects, each of which can have n attributes that need a form to manage their data, generating the admin system automatically is still feasible (I'm working on some patterns), but it does become non-trivial.

Keep me posted on how things go - maybe we can share some ideas?!
# Posted By Peter Bell | 11/25/06 1:10 PM
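The "richer schema" point (field types, plus value lists with titles and values for list-based controls) can be sketched in Python as a language-neutral illustration. It renders a flat set of fields only; the harder "deep" case is not attempted. The `<field>` and `<option>` format and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical richer schema: each field has a type, and list-based
# types carry their own <option title="..." value="..."/> value list.
FIELDS_XML = """
<fields>
  <field name="siteName" type="text" label="Site name"/>
  <field name="theme" type="select" label="Theme">
    <option title="Light" value="light"/>
    <option title="Dark" value="dark"/>
  </field>
</fields>
"""

def render_field(field, current=""):
    """Render one config field as an HTML form control."""
    name, label = field.get("name"), field.get("label", field.get("name"))
    if field.get("type") == "select":
        options = "".join(
            '<option value="{0}"{1}>{2}</option>'.format(
                escape(opt.get("value")),
                " selected" if opt.get("value") == current else "",
                escape(opt.get("title")),
            )
            for opt in field.findall("option")
        )
        control = f'<select name="{escape(name)}">{options}</select>'
    else:
        control = f'<input type="text" name="{escape(name)}" value="{escape(current)}">'
    return f"<label>{escape(label)} {control}</label>"

def render_form(fields_xml, values):
    """Render every field, pre-filling current values where known."""
    root = ET.fromstring(fields_xml)
    return "\n".join(
        render_field(f, values.get(f.get("name"), "")) for f in root.findall("field")
    )
```

Adding a new select field then means adding XML only. Once fields contain nested fields, though, the renderer itself has to recurse.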
Peter: Thanks for the kind words. Great article. I hope the community finds config.cfc useful.
Any feedback anyone has would be welcome. I just want to thank Nic Tunney (nictunney.com) for his time and help in developing it.
# Posted By Jim Collins | 11/25/06 2:13 PM
Hi Jim,

Thanks to you and Nic for putting the resource together!
# Posted By Peter Bell | 11/25/06 2:18 PM
I just took a look at the Config.cfc. It's a nice example. But I am a huge fan of the programmatic configuration file, so it takes a lot to sway me. I am still not convinced of the benefits XML has over anything else.

Why is there a SetVariables() method? Shouldn't this just be part of the Init() method? I mean why call both methods when the entire point of the CFC is to set variables.

It rubs me the wrong way that the CFC refers to scopes outside of its "known" universe. Granted, the entire point of this CFC is to set variables, but one of the first things I learned about CFCs is that they should not refer to scopes outside themselves.

I think it is not great that the path to the config file is hard coded in the Config cfc. This is probably something that should be passed in during Init() so that the programmer has more flexibility in how they want to organize their code.

Now, what happens if I want to expand the way the config works? Something more complex? I have to update both the Config.cfc and the config xml. How often does that happen? Not often.

I don't want to come off that I am attacking the project. I think it is a cool concept. I just want to make sure someone is fighting in the Programmatic corner :D

You know what would sell me... if someone took my Config.cfc and converted it to go along the lines of this new Config.cfc. If that was easy to do and useful to use, then I would be 100% converted. From what I hear about XML, this should be a fairly easy-to-moderate task:

http://www.bennadel.com/index.cfm?dax=blog:364.vie...

I am not messing around here. You convert my Config.cfc to work nicely with XML and I will sing the praises of xml config to everyone I know. Looking forward to some sweet solutions!! Please help me learn :D
# Posted By Ben Nadel | 11/25/06 7:03 PM
My take on this is that if your config file needs to be shared with other applications, then XML provides that universal compatibility, but if it's uniquely for your own CF app then don't bother; use programmatic CFML, as it gives you a great deal of flexibility if you need a smart config file, e.g. some config vars are based on where it's hosted, whether it's a dev/stage/prd environment, or perhaps the state of data in your db.

If you want a nicely formatted config file that's easily understood by a team of programmers, then a .ini file is a good way to go, but I much prefer using CFML, as it comes armed with the whole array of programmatic tools. I very rarely can get away with using a totally static configuration.
# Posted By Gary Fenton | 11/26/06 6:36 AM
Hi Ben,

A few comments from someone who is still swaying on this debate!

It is OK for a cfc to speak to external scopes as long as that is its purpose. You could inject an application and session facade into this, but I don't see any problem with a small number of cfcs being application or session scope aware as long as that is their primary job.

The config path could be made programmatic in 2.0.

I think the thing to distinguish is between a config file and the configuration process. I always used to think of them as the same thing, but if you think of the job of a configuration file as to allow someone to manually provide config data, it makes more sense using XML than if you think of its job as to configure the application. Imagine breaking out your config file into two (a tighter separation of concerns). The first would be responsible for getting properties hard set by the programmer - data. The second would be responsible for configuring your application using a combination of the data provided plus all of your cool programmatic stuff. So instead of having a single file with expandpath() and variable settings, you break them into two, making the process of obtaining static configuration data separate from the process of configuring your application using a combination of that data plus all of your dynamic config stuff. I think that is where XML configs (and non-programmatic configs in general) make more sense.

Make sense?
# Posted By Peter Bell | 11/26/06 3:55 PM
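The two-file split Peter describes can be sketched in Python (a language-neutral illustration). Stage 1 reads only hand-entered data. Stage 2, the configuration process, combines that data with dynamic values. Here `os.path.join` stands in for ColdFusion's ExpandPath(), and the `environment` argument stands in for whatever dev/production switch an application uses. The setting names and function names are hypothetical.

```python
import os
import xml.etree.ElementTree as ET

def load_static_config(xml_text):
    """Stage 1: read hand-entered data only. No logic lives here."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): s.get("value") for s in root.findall("setting")}

def configure_app(static, app_root, environment):
    """Stage 2: combine the static data with dynamic values.

    os.path.join stands in for ExpandPath(); `environment` is a
    hypothetical switch such as "dev" or "prod".
    """
    settings = dict(static)  # leave the static data untouched
    settings["uploadPath"] = os.path.join(app_root, static["uploadDir"])
    settings["dsn"] = static[f"{environment}DSN"]
    return settings
```

With this split, the XML file stays free of logic, while all the "cool programmatic stuff" lives in ordinary code that can use any function it needs.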
Hi Gary,

I'd argue that config files are seldom shared between applications. However, think of "shared between programmers" and there is a stronger case for using XML config files when lots of people will be modifying the config file. There are two obvious use cases: one is where you are working in a larger team of programmers (where "large" means > 1 in this case), and the other is if you're writing an app that other people will use/maintain, whether you are a consultant creating a project for another company to maintain in-house or you're writing a framework.

If it's only you writing and modifying the app I agree that the benefits of XML probably aren't worth the overhead of the small amount of extra work it takes to create the DTD and process the file.

Also see comment above. The distinction is that a CONFIG FILE DOESN'T CONFIG YOUR APP. It just provides the static data for your configuration code to use to config the app. You put your expandpath() and other dynamic config stuff in your configuration code, not your configuration file. Obviously this doesn't make sense for the smallest applications with small amounts of true config data.
# Posted By Peter Bell | 11/26/06 4:00 PM
Hi Peter,

I still can't see the advantage of using XML over CFML. The following couldn't be any clearer if there are 1 or 100 programmers:

<!--- relative url for images --->
<cfset imgURL="stuff/images/">

A bonus is that you don't have to run xml validation - the risk of a programmer typing invalid cfml is lower than typing invalid xml. (In my opinion.) If I was handing the app over to non-CF clients to manage then I think a .ini file is the most human-readable format and also the most forgiving - no brackets or quote marks are required so you don't have to ensure you close up after each line.

I hear what you're saying about keeping the config of the app separate from the static setup. I guess it depends on the app itself. In my current app I have about 8 static vars and 30+ dynamic ones. In this case it's less hassle to keep everything in a single file: the static stuff first, followed by the dynamic vars that depend on the static bits. I'd really need a ton of static vars to justify putting them in a separate file.

TBH I don't think there's a right or wrong answer, except when the lead programmer says one is the correct way! :-)
# Posted By Gary Fenton | 11/26/06 7:21 PM
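Gary's point that an INI file is forgiving (values need no quotes and there are no closing tags to balance) can be seen with Python's standard `configparser`. It is used here only as an illustration, not as something a CF app would use. The section and key names are made up, apart from `imgURL`, which echoes his CFML example.

```python
import configparser

# Hypothetical settings file: no quotes or closing tags around values.
INI_TEXT = """
[paths]
imgURL = stuff/images/

[site]
title = Acme Widgets
itemsPerPage = 20
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep key case (ConfigParser lowercases keys by default)
parser.read_string(INI_TEXT)

img_url = parser["paths"]["imgURL"]
per_page = parser.getint("site", "itemsPerPage")  # converted from text
```

The trade-off is the one raised in the post: sections give one level of grouping, and anything deeper needs its own convention.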
Hey Gary,

Firstly, with 8 static variables, it really doesn't matter. Look at a ColdSpring XML file for an application with 200 beans and you'll get an idea of the scale of config file where XML becomes a no-brainer (not saying it needs THAT much data, but at 8 variables, this is definitely more of a theoretical discussion). If I had 8 variables, I'd leave them in a CF file unless I was in the mood to practice my XML skills for the future and had a boss who'd let me play.

The example you give is very clear, but what happens if the programmer who is adding a new module adds some more config lines to your 1000-line config file and gets a strange error (maybe "invalid path, could not find xxxxx")? The error is due to the fact that they didn't provide a well-formed path for the third of the twelve new beans they added, but that might not be obvious at all from the error message, so they end up hunting for a bug in your 10k loc app when the real problem is a line they forgot to add to the config file. Of course you can write code to validate the config file, but the point is that if you use XML, many (not all, but many) of those validations will be performed automatically against a DTD with a single line of code, so it is easy to identify a whole class of errors with a simple "XML file doesn't match DTD" error.

I don't think there are right or wrong answers, but I do think there are pretty clear rules of thumb for classes of use case where a given approach will usually be better. For xml configs it is large amounts of static configuration data and multiple programmers that make them more compelling.
# Posted By Peter Bell | 11/26/06 7:40 PM
It's just aesthetically cleaner to have the specific implementation details in a separate file.
The idea of having a lot of values hardcoded in my app makes me cringe, because it's just ugly.
It's also a lot easier when you have more than one deployment site.
Both ColdFusion and JRun use XML files for configuration information, for good reason.
# Posted By Jim Collins | 11/26/06 8:58 PM
I think too much emphasis is placed on pulling every possible detail into an external configuration file. If you need to make a change, change the code. If the code is well-factored, that shouldn't be a problem. If you're following the DRY principle, every detail should be expressed once, somewhere in the code, where it's not hard to find, and easy to change.

Okay, well, there are a few exceptions.

1. If you want to be able to change a detail after compiling, it's good to put that detail in a text file that doesn't require a compiler or any other fancy tools.

2. If a detail needs to be changed by someone who is not capable of changing the code, obviously it needs to be defined outside the code.

3. If a detail depends on the environment in which it's hosted, it's good to keep it separate from the application. That way you don't lose your configuration every time you deploy a new version of the app.


I've circumvented #1 by using a language that I don't have to compile.

I deal with #2 a lot, but the people who aren't capable of changing the code don't have access to files on the server, so I have to build front ends for them to use.

When I'm building web sites that are hosted in one place, #3 doesn't apply. I am trying to move my dev vs. production details into a config file, but it's a really tiny file.

If you don't pull details out of the application until you have a specific reason to do so, I think you'll find your configuration files are a lot smaller and simpler, and you can get away with formats that are less verbose than XML.
# Posted By Patrick McElhaney | 11/27/06 8:46 AM
Hi Patrick,

So to take your example, where in the code would you document all of your bean dependencies (replacing the ColdSpring XML config file)? Also, how is putting well-structured configuration information in a higher-level programming language better than constraining it within an XML format, with the ability to check for a whole class of errors just by writing a simple DTD? It seems to me that for any non-trivial configuration information (bean dependencies, the relationship between models and views, etc.), being able to remove classes of errors by having a DTD for correctness checking will actually simplify your code by removing a lot of validation code you'd otherwise have to write.
# Posted By Peter Bell | 11/27/06 9:41 AM
Peter,

I don't have beans. I don't use ColdSpring.

I eliminate a whole class of errors by not using XML. I also avoid having to write simple DTDs that way.

I don't have non-trivial configuration information. See my previous post.

Patrick
# Posted By Patrick McElhaney | 11/27/06 2:29 PM
I don't really understand the whole DTD issue. I mean, I get it, it validates the XML config file... what I don't get is why go through the trouble of writing it? It just makes something else to maintain. Wouldn't it just be easier to update the XML file, load it, and debug it?

I mean, the Config process is expecting a validly formatted XML file. It's not just a parsing issue but a usage issue. If you require a NAME and a VALUE tag for something, and someone doesn't follow this format, the process will crap out. Then you debug it (rather quickly I assume as it's just a formatting issue) and carry on.

Since this is all done in the DEV environment, I just don't see the usefulness of making 100% sure your config XML is proper before re-initializing your application.

My 2 cents.
# Posted By Ben Nadel | 11/27/06 2:33 PM
@ Patrick, Guess I'd have to see a real world application to understand how you approach these things. As with everything else, whatever works for your use case!

@Ben, The benefit of the DTD is a little like the benefits of strong typing in a compiled language, only more so. Let's say you are maintaining a 50k loc application I wrote. You need to add a new bean, so you go into the ColdSpring XML and add the appropriate properties. If you enter badly formed XML or miss an essential property, CS can use a DTD to give you a really nice error message like "all beans require a class path". If CS used a programmatic config, who knows what kind of error message might pop up where. You might get an easy-to-debug message, but it could also be a really obtuse failure that you never even consider has anything to do with the config changes you just made. In an extreme case the error might not even crop up until integration testing, by which time you'd be hunting through a given component for an error when it was really a configuration problem. That is where the benefits of the DTD come in. Writing a DTD is much easier than coding the equivalent config file validation from scratch in CF. That's why it is so cool.
# Posted By Peter Bell | 11/27/06 3:20 PM
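The kind of check a DTD expresses declaratively ("every bean must have an id and a class") can be approximated in a few lines of code, which makes the contrast with a vague downstream failure easier to see. This Python sketch is a hand-written stand-in for DTD validation, not ColdSpring's actual behavior or error text. The file follows the familiar `<bean id="..." class="...">` pattern, and the ids and class paths are made up.

```python
import xml.etree.ElementTree as ET

# A ColdSpring-style bean file; ids and class paths are made up.
BEANS_XML = """
<beans>
  <bean id="userService" class="model.UserService"/>
  <bean id="orderService"/>
</beans>
"""

REQUIRED_ATTRIBUTES = ("id", "class")

def check_beans(xml_text):
    """Return a list of human-readable problems; empty means the file passed."""
    problems = []
    for position, bean in enumerate(ET.fromstring(xml_text).iter("bean"), start=1):
        label = bean.get("id") or f"bean #{position}"
        for attr in REQUIRED_ATTRIBUTES:
            if not bean.get(attr):
                problems.append(f"{label}: missing required attribute '{attr}'")
    return problems
```

Running the check at load time reports `orderService` by name, instead of letting the missing class surface later as an unrelated-looking runtime error.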
Peter,

I see what you are saying in terms of having a better error message. But as far as:

"Writing a DTD is much easier than coding the equivalent config file validation from scratch in CF. That's why it is so cool."

I wouldn't even bother having programmatic validation. That's what I mean about letting the config process just crap out: it doesn't validate, it just tries to run as if it was good data. I don't think it's worth arguing about this, though, as I think this is something that relates to project size, and my projects aren't as big as other people's, so it's not an issue.
# Posted By Ben Nadel | 11/27/06 3:25 PM
All just comes down to how it craps out. If the config file craps out, that is cool (not ideal if it's 1000 lines long, but not horrible). The thing you have to think through is whether the config file could load successfully even if crucial data was missing and whether it could create any "hard to track down" errors. That is when you want to catch as many of those with validation of the config file as possible using a combination of a DTD and possibly custom validations for requirements that a DTD can't catch.
# Posted By Peter Bell | 11/27/06 3:39 PM
Understood. Yeah, I am sure we have all had our fair share of chasing down poorly defined errors. That can be such a nightmare in and of itself. Makes sense.
# Posted By Ben Nadel | 11/27/06 3:42 PM
I'm a bit late to the party, but somebody on the Reactor mailing list posted a couple links to your site and I just saw these entries. I'll add my anti-XML voice to the party. I HATE XML. I would even prefer to have JSON config files than XML, but even that annoys me somewhat. Like you, RoR is very, very tempting, so much so that the framework I'm building is inspired by it (yes, I know CF on Wheels exists, but I wanted to build things my way).

In the end, I really think it's a matter of preference. There are logical arguments for and against, but I think preference ought to count for something. Just like it is for frameworks. Some people love and swear by Model Glue or Mach-II. I personally don't like them, I prefer the Rails approach. It doesn't necessarily mean it's better or worse, it just clicks better for me. For others it doesn't. Pick your poison...
# Posted By Thomas Messier | 3/24/07 12:00 PM
I noticed the link on the Reactor mailing list (Thanks Dan!). I think that preference is a really valid concern. Personally these days I use config beans where I can do stuff like:
addProperty("FirstName", "Name"); to add a property to an object. I find it works better for me than any of the alternatives, but I still fully understand and support why lots of people use XML for their classes of problems.

LightWire (http://lightwire.riaforge.org) has a pretty good example now of my approach to using config files and it works quite nicely for me while still giving me benefits over just manually setting values in a struct using CF set.
# Posted By Peter Bell | 3/24/07 12:22 PM
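The "config bean" style mentioned above declares configuration through method calls, so the host language itself checks the syntax. Only the single call `addProperty("FirstName", "Name")` appears in the comment. The class below is a hypothetical Python illustration and not LightWire's API. Treating the second argument as a property type and rejecting duplicates are assumptions made for the sketch.

```python
class ObjectConfig:
    """Hypothetical config bean: properties are declared with method calls
    instead of XML elements, so the host language checks the syntax."""

    def __init__(self, object_name):
        self.object_name = object_name
        self.properties = {}

    def add_property(self, name, property_type):
        # Assumption: the second argument is a property type.
        if name in self.properties:
            raise ValueError(f"{self.object_name}.{name} is already defined")
        self.properties[name] = property_type
        return self  # allow chained calls

user = ObjectConfig("User").add_property("FirstName", "Name").add_property("Email", "Email")
```

Because each setting is a method call, custom checks such as the duplicate test run as the configuration is built, which covers some of the ground a DTD would cover for an XML file.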
BlogCFC was created by Raymond Camden. This blog is running version 5.005.