Should you use XML for your configuration files?
Why Configuration?
Often there are a number of properties that your application depends upon: everything from the name of the data source to the information required to wire your beans together in ColdSpring or LightWire. The question is where to put them. One way is just to sprinkle them over your code, setting the properties wherever they are required. In practice that can make your applications hard to maintain, as you have to search through all of your files to find the various properties you might want to change.
So, most developers create some sort of centralized file (or set of files) for storing those variables. There are three common approaches: a programmatic cfinclude, an INI file and an XML file.
INI or XML?
For simple config files, as Hal Helms pointed out in the first half of a great article, you can use an INI file (in the second half he suggests that XML is a better approach – as does Joe Rinehart in a separate article - thanks Jim for the links!).
While INI files can work for simple data, as soon as you get data that has depth to it (perhaps a list of objects that each have properties), INI files get pretty messy, pretty fast. And given that most applications (and hence most configuration files) get bigger and more complex over time, it is worth starting out with XML if you want a static configuration file. Jim Collins created a nice project the other day to make it easier to get started with your own XML config files.
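To make the "depth" problem concrete, here is a minimal sketch, written in Python purely for illustration (the same idea applies in CFML). The bean names, section names and keys are all invented for the example. The INI version has to fake nesting with a naming convention that the reading code must know about, while the XML version carries the nesting in the format itself.

```python
import configparser
import xml.etree.ElementTree as ET

# Hypothetical INI file: nesting is faked with "bean.<name>" section names.
INI_TEXT = """
[beans]
names = userService,userDAO

[bean.userService]
class = model.UserService
depends = userDAO

[bean.userDAO]
class = model.UserDAO
depends =
"""

# The same hypothetical data as XML: the nesting is part of the format.
XML_TEXT = """
<beans>
  <bean id="userService" class="model.UserService">
    <property name="userDAO" ref="userDAO"/>
  </bean>
  <bean id="userDAO" class="model.UserDAO"/>
</beans>
"""


def beans_from_ini(text):
    """Rebuild the bean list by following the section naming convention."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    beans = {}
    for name in parser["beans"]["names"].split(","):
        section = parser["bean." + name]
        depends = [d for d in section["depends"].split(",") if d]
        beans[name] = {"class": section["class"], "depends": depends}
    return beans


def beans_from_xml(text):
    """Read the same structure directly from nested elements."""
    root = ET.fromstring(text.strip())
    return {
        bean.get("id"): {
            "class": bean.get("class"),
            "depends": [p.get("ref") for p in bean.findall("property")],
        }
        for bean in root.findall("bean")
    }
```

Both functions return the same dictionary here, but the INI reader only works as long as everyone editing the file remembers the naming convention.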
The Benefits of XML
Some of the benefits of an XML configuration file are as follows:
- You can use standard third party tools to create the XML config file and to check it for syntactical and semantic accuracy if you publish a DTD.
- There is an argument that you should distinguish between configuration (data) and programming (code). XML enforces that distinction.
- While Jim's example is pretty simple, you can create sophisticated schemas for "deep" information which is particularly useful when you have data that contains data that contains data. Doing that in a traditional config file can be a real problem.
- XML configs are designed to be written programmatically as well as edited by hand in anything from a text editor to an XML editor.
- Anyone technical working with your application will know exactly what an XML file is and how to use and validate it against a DTD.
- You have much more granularity for capturing errors. If you include a struct, any syntax error will bring your application (or your try/catch block) down. With XML you can validate against a DTD and then validate the given values, knowing you will have a syntactically and semantically accurate data structure.
Jared also makes some good comments (scroll down to the comments) about why he prefers to use XML files.
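The last point, about error granularity, can be sketched roughly as follows. This is Python used only for illustration, and the `<setting>` element layout and the `port` rule are assumptions made up for the example. Python's standard library has no DTD validator, so the hand-written checks in the second stage stand in for what a DTD plus value checks would give you.

```python
import xml.etree.ElementTree as ET


def load_settings(xml_text):
    """Return (settings, errors); bad config input never raises."""
    try:
        root = ET.fromstring(xml_text.strip())
    except ET.ParseError as exc:
        # Stage 1: the file is not well-formed XML at all.
        return {}, ["config is not well-formed XML: %s" % exc]

    settings, errors = {}, []
    # Stage 2: the structure is sound, so check individual values.
    for setting in root.findall("setting"):
        name, value = setting.get("name"), setting.get("value")
        if not name or value is None:
            errors.append("every <setting> needs a name and a value")
        elif name == "port" and not value.isdigit():
            errors.append("port must be a whole number, got %r" % value)
        else:
            settings[name] = value
    return settings, errors
```

A syntax problem and a bad value come back as two different kinds of message, instead of one failure somewhere inside the application.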
And if you can think of any other benefits to XML, please post them in the comments below – this was a quick posting so I’m sure the list isn’t comprehensive.
Programmatic Configuration
However, there are things that you cannot do easily with an XML file. What happens if you want the base directory path to include #application.name#, or to use expandPath() or other functions to dynamically set other properties? What if you want to change the datasource between your production and test db based on the value of a URL variable or a session key? What happens if you want to set a time based on Now() or to take advantage of all of the other features that a programming language provides?
Well, a configuration file is for manually entered configuration information. Anything you configure based on variables such as application.name or functions like now() or expandpath() shouldn’t be in your config file – it should be elsewhere in your configuration routine.
As for changing data source based on a URL or session key, you can have two XML files, one for each data source. This does make it important that your applications support multiple XML config files. You don’t want to have to put the same directory configuration information into two different config files just because you want to be able to change data source programmatically. That would not be very DRY (Don’t Repeat Yourself). Another approach would be to build awareness of production vs. test values into your XML schema. That can work, but I tend to prefer to keep the schemas simpler and have lots of simple little XML files. Otherwise your schema could get a little complex if there were lots of potential classes of variability it needed to know about (although any thoughts on adding such information to schemas would be appreciated).
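As a rough sketch of the "lots of simple little XML files" idea (in Python, for illustration only): the shared settings live in one file and each environment gets its own tiny file holding only what differs. The file contents, setting names and the `build_config` helper are all hypothetical, and the XML is held in strings here so the example is self-contained.

```python
import xml.etree.ElementTree as ET

# Shared settings, written once (hypothetical values).
SHARED_XML = """<config>
  <setting name="uploadDir" value="/var/app/uploads"/>
  <setting name="imgURL" value="stuff/images/"/>
</config>"""

# One tiny file per environment, holding only what differs.
PRODUCTION_XML = '<config><setting name="datasource" value="prodDB"/></config>'
TEST_XML = '<config><setting name="datasource" value="testDB"/></config>'


def read_settings(xml_text):
    """Turn <setting name="..." value="..."/> elements into a dict."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): s.get("value") for s in root.findall("setting")}


def build_config(environment):
    """Merge the shared settings with one environment-specific file."""
    per_environment = {"production": PRODUCTION_XML, "test": TEST_XML}
    config = read_settings(SHARED_XML)
    config.update(read_settings(per_environment[environment]))
    return config
```

The choice of environment (from a URL variable, a session key or anything else) stays in code, while the directory settings are only ever written down once.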
Should you use XML for Configuration?
Firstly, I must state: I don’t like XML. I find it verbose and ugly, with an extremely low signal-to-noise ratio. I personally think XML is visually extremely intrusive and annoying, and the thought of explicitly repeating structural information in every record seems painfully inefficient. That said, I've looked at other formats (such as JSON, for instance) and I understand why there is much more chance for error (especially when a file is manually created) if you don't have all of that redundant information with each record.
There are valid use cases where a programmatic config file makes sense. Martin Fowler had some good comments about programmatic config files. However, most of the time, an XML config file makes more sense and if necessary you can just support programmatic as well as XML interfaces to your configuration information.
Any thoughts?



I'm still looking for a weak spot - a plan of attack. A valid reason to remove XML entirely from my future development plans. Some things just seem like a lot of work in XML and I don't know that XML schemas are the right solution for describing what are basically simple domain specific languages. On the other hand, lots of little languages get old real fast when you have to learn each one's unique syntax, which is where a syntactical lingua franca like XML comes in.
So many smart people in the community use XML, I've got a feeling I'll end up using it, but we'll see!
Expect to see a lot of wavering on the blog. There are plenty of people who blog about things they know, showing you the ending point. I think one of the benefits of this blog is that it shows someone trying to figure stuff out, making mistakes, changing their mind, etc. I think it's easier to get something if you see other people going through the process of trying to figure it out rather than just hearing the pronouncements on high after they've figured it all out!
This is not dissimilar to issues I'm toying with. In practice, though, doing this is really quite difficult for two reasons - field types and (more importantly) "deep" data.
If your config file is just a bunch of text boxes, generating the form and db to hold it (do you even NEED db to hold it?) is trivial. Adding different field types and their properties just requires a richer schema, although if you want to include anything requiring value lists (check boxes, radio buttons, multiple selects or single selects) you need to add the value list titles and values to your XML.
It is when you get into "deep" data that generating the config admin becomes difficult. If you are configuring n-objects which in turn can have n-attributes - each of which needs a form to manage its data - generating the admin system automatically is still feasible (I'm working on some patterns), but it does become non-trivial.
Keep me posted on how things go - maybe we can share some ideas?!
Any feedback anyone has would be welcome. I just want to thank Nic Tunney (nictunney.com) for his time and help in developing it.
Thanks to you and Nic for putting the resource together!
Why is there a SetVariables() method? Shouldn't this just be part of the Init() method? I mean, why call both methods when the entire point of the CFC is to set variables?
It rubs me the wrong way that the CFC refers to scopes outside of its "known" universe. Granted, the entire point of this CFC is to set variables, but one of the first things I learned about CFCs is that they should not refer to scopes outside themselves.
I think it is not great that the path to the config file is hard coded in the Config cfc. This is probably something that should be passed in during Init() so that the programmer has more flexibility in how they want to organize their code.
Now, what happens if I want to expand the way the config works? Something more complex? I have to update both the Config.cfc and the config xml. How often does that happen? Not often.
I don't want to come off that I am attacking the project. I think it is a cool concept. I just want to make sure someone is fighting in the Programmatic corner :D
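The suggestions in this comment (one Init() step, no hard-coded path, no reaching into outside scopes) might look something like the sketch below. It is written in Python for illustration and is not the Config.cfc being discussed; the `Config` class, its `get` method and the `<setting>` layout are all hypothetical.

```python
import xml.etree.ElementTree as ET


class Config:
    """Hypothetical config reader: everything happens in one init step."""

    def __init__(self, config_path):
        # The caller decides where the file lives; nothing is hard-coded here.
        root = ET.parse(config_path).getroot()
        # Values are kept inside the object instead of being written
        # into application or session scope.
        self._values = {
            s.get("name"): s.get("value") for s in root.findall("setting")
        }

    def get(self, name, default=None):
        """Look up one setting, with an optional fallback."""
        return self._values.get(name, default)
```

The calling code would then create it with something like `Config("/path/to/config.xml")` and decide for itself which scope, if any, to store the object in.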
You know what would sell me... if someone took my Config.cfc and converted it to go along the lines of this new Config.cfc. If that was easy to do and useful to use, then I would be 100% converted. From what I hear about XML, this should be a fairly easy to moderate task:
http://www.bennadel.com/index.cfm?dax=blog:364.vie...
I am not messing around here. You convert my Config.cfc to work nicely with XML and I will sing the praises of xml config to everyone I know. Looking forward to some sweet solutions!! Please help me learn :D
If you want a nicely formatted config file that's easily understandable between a team of programmers then a .ini file is a good way to go, but I much prefer using CFML as it comes armed with the whole array of programmatic tools. I can very rarely get away with using a totally static configuration.
A few comments from someone who is still swaying on this debate!
It is OK for a cfc to speak to external scopes as long as that is its purpose. You could inject an application and session facade into this, but I don't see any problem with a small number of cfcs being application or session scope aware as long as that is their primary job.
The config path could be made programmatic in 2.0.
I think the thing to distinguish is between a config file and the configuration process. I always used to think of them as the same thing, but if you think of the job of a configuration file as to allow someone to manually provide config data, it makes more sense using XML than if you think of its job as to configure the application. Imagine breaking out your config file into two (a tighter separation of concerns). The first would be responsible for getting properties hard set by the programmer - data. The second would be responsible for configuring your application using a combination of the data provided plus all of your cool programmatic stuff. So instead of having a single file with expandpath() and variable settings, you break them into two, making the process of obtaining static configuration data separate from the process of configuring your application using a combination of that data plus all of your dynamic config stuff. I think that is where XML configs (and non-programmatic configs in general) make more sense.
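Here is a minimal sketch of that two-step split, in Python for illustration only. The setting names, the `load_static` and `configure` functions and the derived keys are assumptions invented for the example; `os.path` and `datetime.now()` are used as rough analogues of expandPath() and Now().

```python
import os
import xml.etree.ElementTree as ET
from datetime import datetime

# Hand-entered, static data only (hypothetical values).
STATIC_XML = """<config>
  <setting name="appName" value="myApp"/>
  <setting name="imgURL" value="stuff/images/"/>
</config>"""


def load_static(xml_text):
    """Step 1: read only the data someone typed into the config file."""
    root = ET.fromstring(xml_text)
    return {s.get("name"): s.get("value") for s in root.findall("setting")}


def configure(static, base_dir, now=None):
    """Step 2: derive the dynamic values in code, not in the file."""
    config = dict(static)
    # Rough analogue of expandPath(): resolve a relative path against a base.
    config["imgPath"] = os.path.normpath(os.path.join(base_dir, static["imgURL"]))
    # Rough analogue of Now(): a value that only exists at run time.
    config["startedAt"] = now or datetime.now()
    # A value built from another setting, like using the app name in a path.
    config["logFile"] = os.path.join(base_dir, static["appName"] + ".log")
    return config
```

The XML file stays free of function calls, and the configuration routine stays free of hand-typed values.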
Make sense?
I'd argue that config files are seldom shared between applications. However, think of "shared between programmers" and there is a stronger case for using XML config files when lots of people will be modifying the config file. Two obvious use cases are where you are working in a larger team of programmers (where "large" means > 1 in this case) and where you're writing an app that other people will use/maintain - whether you are a consultant creating a project for another company to maintain in-house or whether you're writing a framework.
If it's only you writing and modifying the app I agree that the benefits of XML probably aren't worth the overhead of the small amount of extra work it takes to create the DTD and process the file.
Also see comment above. The distinction is that a CONFIG FILE DOESN'T CONFIG YOUR APP. It just provides the static data for your configuration code to use to config the app. You put your expandpath() and other dynamic config stuff in your configuration code - not your configuration file. Obviously this doesn't make sense for the smallest applications with small amounts of true config data.
I still can't see the advantage of using XML over CFML. The following couldn't be any clearer if there are 1 or 100 programmers:
<!--- relative url for images --->
<cfset imgURL="stuff/images/">
A bonus is that you don't have to run xml validation - the risk of a programmer typing invalid cfml is lower than typing invalid xml. (In my opinion.) If I was handing the app over to non-CF clients to manage then I think a .ini file is the most human-readable format and also the most forgiving - no brackets or quote marks are required so you don't have to ensure you close up after each line.
I hear what you're saying about keeping the config of the app separate from the static setup. I guess it depends on the app itself. In my current app I have about 8 static vars and 30+ dynamic ones. In this case it's less hassle to keep everything in a single file - the static stuff first, followed by the dynamic vars that depend on the static bits. I'd really need a ton of static vars to justify putting it in a separate file.
TBH I don't think there's a right or wrong answer, except when the lead programmer says one is the correct way! :-)
Firstly, with 8 static variables, it really doesn't matter. Look at a ColdSpring XML file for an application with 200 beans and you'll get an idea of the scale of config file where XML becomes a no-brainer (not saying it needs THAT much data, but at 8 variables, this is definitely more of a theoretical discussion). If I had 8 variables, I'd leave them in a CF file unless I was in the mood to practice my XML skills for the future and had a boss who'd let me play.
The example you give is very clear, but what happens if a programmer who is adding a new module adds some more config lines to your 1000 line config file and gets a strange error (maybe "invalid path, could not find xxxxx")? The error is due to the fact that they didn't provide a well-formed path for the third of the twelve new beans they added, but that might not be obvious at all from the error message, so they're hunting for a bug in your 10k LOC app when they should have been adding a line that they forgot in the config file. Of course you can write code to validate the config file, but the point is that if you use XML, many (not all, but many) of those validations will be performed automatically against a DTD just by running a simple line of code, so it is easy to identify a whole class of errors with a simple "XML file doesn't match DTD" error.
I don't think there are right or wrong answers, but I do think there are pretty clear rules of thumb for classes of use case where a given approach will usually be better. For xml configs it is large amounts of static configuration data and multiple programmers that make them more compelling.
The idea of having a lot of values hardcoded in my app makes me cringe, because it's just ugly.
It's also a lot easier when you have more than one deployment site.
Both ColdFusion and JRun use XML files for configuration information, for good reason.
Okay, well, there are a few exceptions.
1. If you want to be able to change a detail after compiling, it's good to put that detail in a text file that doesn't require a compiler or any other fancy tools.
2. If a detail needs to be changed by someone who is not capable of changing the code, obviously it needs to be defined outside the code.
3. If a detail depends on the environment in which it's hosted, it's good to keep it separate from the application. That way you don't lose your configuration every time you deploy a new version of the app.
I've circumvented #1 by using a language that I don't have to compile.
I deal with #2 a lot, but the people who aren't capable of changing the code don't have access to files on the server, so I have to build front ends for them to use.
When I'm building web sites that are hosted in one place, #3 doesn't apply. I am trying to move my dev vs. production details into a config file, but it's a really tiny file.
If you don't pull details out of the application until you have a specific reason to do so, I think you'll find your configuration files are a lot smaller and simpler, and you can get away with formats that are less verbose than XML.
So to take your example, where in the code would you document all of your bean dependencies (replacing the ColdSpring XML config file)? Also, how is putting well structured configuration information in a higher level programming language better than constraining it within an XML format with the ability to check for a whole class of errors just by writing a simple DTD? It seems to me that for any non-trivial configuration information (bean dependencies, the relationship between models and views, etc.), being able to remove classes of errors by having a DTD for correctness checking will actually simplify your code by removing a lot of validation code you'd otherwise have to write.
I don't have beans. I don't use ColdSpring.
I eliminate a whole class of errors by not using XML. I also avoid having to write simple DTDs that way.
I don't have non-trivial configuration information. See my previous post.
Patrick
I mean, the Config process is expecting a validly formatted XML file. It's not just a parsing issue but a usage issue. If you require a NAME and a VALUE tag for something, and someone doesn't follow this format, the process will crap out. Then you debug it (rather quickly I assume as it's just a formatting issue) and carry on.
Since this is all done in the DEV environment, I just don't see the usefulness of making 100% sure your config XML is proper before re-initializing your application.
My 2 cents.
@Ben, The benefit of the DTD is a little like the benefits of strong typing in a compiled language - only more so. Let's say you are maintaining a 50k LOC application I wrote. You need to add a new bean, so you go into the ColdSpring XML and add the appropriate properties. If you enter badly formed XML or miss an essential property, CS can use a DTD to give you a really nice error message like "all beans require a class path". If CS used a programmatic config, who knows what kind of error message might pop up where. You might get an easy to debug message, but it could also be a really obtuse failure that you never even consider has anything to do with the config changes you just made. In an extreme case the error might not even crop up until integration testing, by which time you'd be hunting through a given component for an error when it was really a configuration problem. That is where the benefits of the DTD come in. Writing a DTD is much easier than coding the equivalent config file validation from scratch in CF. That's why it is so cool.
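To illustrate the kind of message being described, here is a small Python sketch of a check run once when the bean file is loaded. It is only a hand-written stand-in for DTD validation, not how ColdSpring actually validates its files, and the `check_beans` function and the required attribute list are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Assumed rules for the example: every bean needs these two attributes.
REQUIRED_BEAN_ATTRIBUTES = {"id": "an id", "class": "a class path"}


def check_beans(xml_text):
    """Return a list of readable problems found in a bean definition file."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return ["bean file is not well-formed XML: %s" % exc]

    problems = []
    for position, bean in enumerate(root.findall("bean"), start=1):
        for attribute, description in REQUIRED_BEAN_ATTRIBUTES.items():
            if not bean.get(attribute):
                # Name the offending bean so the message points at the config.
                label = bean.get("id") or "bean #%d" % position
                problems.append("%s: all beans require %s" % (label, description))
    return problems
```

Because the check runs against the config file itself, the message names the bean that is wrong rather than surfacing later as an unrelated-looking failure in some component.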
I see what you are saying in terms of having a better error message. But as far as:
"Writing a DTD is much easier than coding the equivalent config file validation from scratch in CF. That's why it is so cool."
I wouldn't even bother having programmatic validation. That's what I mean about letting the config process just crap out - it doesn't validate, it just tries to run as if it was good data. I don't think it's worth arguing about this, though, as I think this is something that relates to project size and my projects aren't as big as other people's, so it's not an issue.
In the end, I really think it's a matter of preference. There are logical arguments for and against, but I think preference ought to count for something. Just like it is for frameworks. Some people love and swear by Model Glue or Mach-II. I personally don't like them, I prefer the Rails approach. It doesn't necessarily mean it's better or worse, it just clicks better for me. For others it doesn't. Pick your poison...
addProperty("FirstName", "Name"); to add a property to an object. I find it works better for me than any of the alternatives, but I still fully understand and support why lots of people use XML for their classes of problems.
LightWire (http://lightwire.riaforge.org) has a pretty good example now of my approach to using config files and it works quite nicely for me while still giving me benefits over just manually setting values in a struct using CF set.