Where Should The Metadata Go?
Setting the Scene
So, let's say a client needs a new site with content management and some custom objects, and let's assume that we can't just reuse existing business objects. They're not looking for Attorneys and PracticeAreas; they want Insects and Regions in sub-Saharan Africa. With a bit of training, an analyst can elicit at least the core requirements, such as the Properties of the business objects. But how should they record that a Region has an AnnualRainfall and a GDP, and is related to one or more ClimateZones?
Entry Options
Well, let's start by looking at some of the possible ways the analyst could enter that data:
- Code - We could focus on analysts with a background in programming and get them to use an IDE to write code (method calls, setting properties, naming classes, adding annotations, in-language DSLs, literate programming, etc.) that describes the requirements. Benefits: This is a common approach, used in Ruby on Rails and Django amongst others. It is one of the simplest approaches to implement, and it keeps everything in a single set of artifacts that programmers can understand without having to learn how various external files are parsed. There are also no real dependencies other than the framework that powers the app. Downsides: All of the metadata is tied to a specific programming language, and code is generally quite hard to transform automatically into other concrete syntaxes. Of all the options, this is the least likely to suit a software product line, where you need to be able to automatically transform statements across hundreds of projects over a number of years as your architectural decisions evolve.
- Simple Config Files - Excel documents, comma-delimited files and text files with property:value pairs can all play a part in configuring applications, but because they don't handle hierarchical data well, they are usually limited to simple configuration properties. Benefits: Fairly easy to manage, and they can use tools business users are comfortable with, like Excel. Downsides: The lack of support for hierarchical data limits the classes of problems such files can solve.
- XML - XML has proven to be a capable way of encoding and sharing hierarchical metadata within and between applications. While many (myself included) don't love the signal-to-noise ratio, a good XML editor and the available tooling for editing, validating and parsing make XML files (especially with XSDs) a good default approach to encoding and editing metadata. Benefits: Great tooling support, easy to provide users with a validating editor, easy to process and transform in any language. Downsides: Sometimes perceived as ugly. Verbose: the repetition it requires (opening and closing tags, attribute names) slows data entry compared to a "little language".
- Little Languages - Writing a parser for your own little language gives you more control over your syntax, so if you prefer indentation over brackets or end tags to denote the end of a block, or want to implement any other syntactic conventions, you are free to do so. Benefits: Complete control over syntax, so you can create a very tight syntax for efficient data entry. If a lot of data is to be entered, this can be an important point. Downsides: You now need to document and teach users your syntax, so it had better be easy to learn and worth the overhead of learning it compared to a syntax like XML, which many people already understand. There is also the time and effort required to write the parser, even using tools like ANTLR.
- Content Management - Metadata is just data. If you want to create an interface for data entry, why not create a content management system? Benefits: Support for much richer hinting, validation and the like, plus the ability to create wizards for occasional users and grids for quicker entry by more experienced developers. Downsides: The overhead of creating the system, and the server round-tripping, which can make data entry painful for experienced developers with a lot of data to enter.
- Boxes and Arrows - One of the most popular approaches in the wider MDA and DSM community is diagrammatic tooling. There is a benefit in being able to display your DSLs visually, but given the overhead of creating the editor and the extra time it often takes to enter data with a diagrammatic tool, this certainly wouldn't be my default approach. Benefits: Provides a nice visual overview of your statements, and for some use cases, like workflows, it's more intuitive to build them graphically. Downsides: The overhead of creating the tooling (even using Eclipse EMF or Microsoft DSL Tools), and the fact that data entry often takes longer in a graphical system than in something like a grid.
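To make the XML option concrete, here is a minimal sketch of how the Region requirement from the opening scenario might be encoded as an XML DSL, and how little code it takes to load it into an in-memory model in a general-purpose language (Python here). The element and attribute names (`object`, `property`, `relationship`, `target`, `cardinality`) and the `BusinessObject` class are a hypothetical schema invented for illustration, not a standard or any particular framework's format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field

# Hypothetical XML encoding of the Region requirement (names are illustrative only).
REGION_XML = """
<object name="Region">
  <property name="AnnualRainfall" type="decimal"/>
  <property name="GDP" type="decimal"/>
  <relationship name="ClimateZones" target="ClimateZone" cardinality="1..*"/>
</object>
"""


@dataclass
class BusinessObject:
    name: str
    properties: dict = field(default_factory=dict)     # property name -> type name
    relationships: list = field(default_factory=list)  # (name, target, cardinality)


def load_object(xml_text: str) -> BusinessObject:
    """Parse a single <object> element into an in-memory metadata model."""
    root = ET.fromstring(xml_text.strip())
    obj = BusinessObject(name=root.get("name"))
    for prop in root.findall("property"):
        obj.properties[prop.get("name")] = prop.get("type")
    for rel in root.findall("relationship"):
        obj.relationships.append(
            (rel.get("name"), rel.get("target"), rel.get("cardinality"))
        )
    return obj


if __name__ == "__main__":
    region = load_object(REGION_XML)
    print(region.properties)
    print(region.relationships)
```

For contrast, the Code option would declare the same facts directly in source (for example, as a Region class with typed attributes): quicker to get started, but the metadata then exists only in that language's syntax.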
And the Winner Is?
For my use case, putting the DSLs in the code is a non-starter. It seems cool, but I KNOW I'm going to be changing just about all of my architectural assumptions, and encoding my DSLs in a format that I can't perform model transformations on would be sheer madness. The Ruby crowd have some cool ideas with literate programming, but I'd still like to see someone show me how to perform model transformations on hundreds of Ruby projects to change the API they are coded to without manually modifying the statements. I know some people who are looking into that kind of problem space for statically typed languages, but it's a lot easier to perform model transforms on data in a database or an XML file, so while I kinda like the idea of annotations to encode my intent within my class files, it doesn't work for my personal use case. Simple text files are too limiting, and I often have to deal with hierarchical data, so that leaves me with XML, little languages, content management and/or a visual editor.
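As a rough illustration of why metadata held as data is easier to evolve than metadata held as code, here is a minimal Python sketch of a model transformation over XML metadata files. It assumes a hypothetical schema in which `<relationship>` elements carry a `cardinality="min..max"` attribute, and a hypothetical architectural change that splits that attribute into separate `min` and `max` attributes. The same function can then be run over every metadata file in every project instead of hand-editing each statement.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def migrate(xml_text: str) -> str:
    """Split the hypothetical cardinality="min..max" attribute into min/max attributes."""
    # Note: the default ElementTree parser discards XML comments and processing instructions.
    root = ET.fromstring(xml_text.strip())
    for rel in root.iter("relationship"):
        cardinality = rel.attrib.pop("cardinality", None)
        if cardinality is not None:
            low, _, high = cardinality.partition("..")
            rel.set("min", low)
            rel.set("max", high or low)  # a bare "1" is treated as exactly one
    return ET.tostring(root, encoding="unicode")


def migrate_projects(root_dir: str) -> int:
    """Rewrite every *.xml file under root_dir in place; return the number of files processed."""
    processed = 0
    for path in Path(root_dir).rglob("*.xml"):
        path.write_text(migrate(path.read_text(encoding="utf-8")), encoding="utf-8")
        processed += 1
    return processed


if __name__ == "__main__":
    sample = (
        '<object name="Region">'
        '<relationship name="ClimateZones" target="ClimateZone" cardinality="1..*"/>'
        "</object>"
    )
    print(migrate(sample))
```

A real migration would also need to preserve comments and formatting, and to be tested before it touched hundreds of projects, but the shape of the job stays the same: load the model, rewrite it, save it.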
For me, the overhead of writing little languages doesn't really make sense right now. I'm interested in the idea of literate programming and I don't really love angle brackets or end tags, but if it takes me an extra 10-20 minutes to build an e-commerce system using XML DSLs rather than little languages, well, it just isn't worth the overhead of writing the parser to save a few minutes per project, so XML works fine.
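For comparison, here is a toy sketch of a little language for the same Region metadata, with a hand-written, line-based parser in Python. The syntax is invented purely for illustration: an unindented line names a business object, `name: type` lines declare properties, and `name -> Target cardinality` lines declare relationships. Even this toy skips comments, escaping, useful error recovery and user documentation, and that is where much of the real overhead of a little language lies.

```python
# Hypothetical little-language source for the Region requirement.
REGION_SRC = """
Region
  AnnualRainfall: decimal
  GDP: decimal
  ClimateZones -> ClimateZone 1..*
"""


def parse_little(text: str) -> dict:
    """Parse the toy syntax into {object: {"properties": {...}, "relationships": [...]}}."""
    objects = {}
    current = None
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not raw[0].isspace():
            # An unindented line starts a new business object.
            current = {"properties": {}, "relationships": []}
            objects[line] = current
        elif current is None:
            raise SyntaxError(f"line {lineno}: indented line before any object")
        elif "->" in line:
            # Relationship: name -> Target cardinality
            name, rest = line.split("->", 1)
            target, cardinality = rest.split()
            current["relationships"].append((name.strip(), target, cardinality))
        elif ":" in line:
            # Property: name: type
            name, type_name = line.split(":", 1)
            current["properties"][name.strip()] = type_name.strip()
        else:
            raise SyntaxError(f"line {lineno}: cannot parse {line!r}")
    return objects


if __name__ == "__main__":
    print(parse_little(REGION_SRC))
```

The payoff is terser entry than the equivalent XML; the cost is everything this toy leaves out, which is the trade-off weighed above.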
As for visual editors, they tend not to be my preference, but honestly, even if they were, the overhead of getting up to speed with Eclipse EMF would just be too high for this revision, so I'm going to rule out visual editors for now. I think I'll end up revisiting visual editors and making them an optional projection for editing metadata, but that's something we can get into in 2008!
So, for 2007, that leaves a shootout between XML and a content management system for entering my metadata. But I think this post has run long enough already, so my thoughts on those two options will have to wait for my next posting!
Thoughts?




