The Power of Active Data Models
When was the last time you wrote an e-commerce store that kept the products in a text file? Of course it is possible, and there are use cases where this is the best approach, but I think most developers have now standardized on storing data in relational databases for most of their web-based applications. Why? Well, the search, filtering and reporting capabilities make it easy to pull the appropriate subset of the data required by a given screen, we have well-established patterns for easily accessing and editing records, and the solution scales well to large data sets.
But metadata (configuration information, bean dependencies, object models, workflow rules, etc.) is also data, so would it not make sense to store the metadata in a database as well (an approach often called an “Active Data Model”)? As always, it depends.
If you are only going to reuse the metadata occasionally and are only going to have a few hundred pieces of metadata, a database for storing the metadata is probably overkill. If, however, you want to create a powerful Software Product Line, storing your metadata in a database makes it much easier to re-use and to create feature modeling and configuration tools to deskill the process of associating the appropriate data to a given application you’re trying to build.
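As a rough illustration of what storing metadata in a database can look like, here is a minimal sketch using Python's built-in sqlite3 module. The table names (meta_object, meta_attribute), the columns and the sample rows are all hypothetical, made up for this example rather than taken from any particular tool. It shows the point made above: once the object model lives in tables, pulling the subset that belongs to one application is an ordinary query.

```python
import sqlite3

# Hypothetical metabase layout: one table of objects, one of their attributes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE meta_object (
        id   INTEGER PRIMARY KEY,
        app  TEXT NOT NULL,
        name TEXT NOT NULL
    );
    CREATE TABLE meta_attribute (
        id        INTEGER PRIMARY KEY,
        object_id INTEGER NOT NULL REFERENCES meta_object(id),
        name      TEXT NOT NULL,
        data_type TEXT NOT NULL
    );
""")

# Sample metadata for two made-up applications.
conn.executemany(
    "INSERT INTO meta_object (id, app, name) VALUES (?, ?, ?)",
    [(1, "store", "Product"), (2, "store", "Customer"), (3, "blog", "Post")],
)
conn.executemany(
    "INSERT INTO meta_attribute (object_id, name, data_type) VALUES (?, ?, ?)",
    [
        (1, "title", "string"),
        (1, "price", "decimal"),
        (2, "email", "string"),
        (3, "body", "text"),
    ],
)


def attributes_for_app(connection, app):
    """Return (object, attribute, type) rows for a single application."""
    cursor = connection.execute(
        """
        SELECT o.name, a.name, a.data_type
        FROM meta_object o
        JOIN meta_attribute a ON a.object_id = o.id
        WHERE o.app = ?
        ORDER BY o.name, a.name
        """,
        (app,),
    )
    return cursor.fetchall()


store_model = attributes_for_app(conn, "store")
```

A configuration tool would build on queries like attributes_for_app to let someone pick the pieces of metadata that belong in a given application.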
There are two major constraints with a metabased approach. The first is that it works best with fairly simple grammars. As your abstract grammars become more complex, it becomes much less useful to store your languages in a database. The other is that, historically, evolving metabased DSLs has been very painful: every time you want to change your syntax you need to edit the metabase schema and the code for accessing the data (as well as the templates and transformations which always have to be changed when you refactor a DSL – irrespective of concrete syntax). In practice, this meant that the benefits of a metabased approach were typically limited to mature DSLs which didn’t change.
The problem with this is that as agile development becomes more important, it is becoming (in my mind) essential to practice “agile application generation” with a system designed for the easy refactoring of DSLs over time (in fact I’ve actually submitted an experience report on this for Agile 2007 although I have no idea whether it’ll be selected or not!).
I’ll be posting more shortly on an approach I have developed as a first step towards a “language workbench for active data modelers” which will automate the process of regenerating your metabase schemas and data access code as you modify the abstract grammar of your DSLs using a simple text file. Eventually I’d hope to add quite powerful refactoring support for DSLs, but that raises some interesting issues I haven’t had time to think through yet.
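To make the general idea concrete, here is a small Python sketch of regenerating a schema from a text description of a grammar. It is not the approach or the tool mentioned above. The text format (an object name on its own line, followed by indented "name: type" lines), the TYPE_MAP table and both function names are assumptions invented for this example.

```python
# Hypothetical mapping from grammar types to SQL column types.
TYPE_MAP = {"string": "TEXT", "decimal": "NUMERIC", "integer": "INTEGER"}

# Hypothetical grammar file contents.
GRAMMAR = """\
Product
    title: string
    price: decimal
Customer
    email: string
"""


def parse_grammar(text):
    """Parse object header lines followed by indented 'name: type' lines."""
    model = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        if raw[0].isspace():
            # Indented line: an attribute of the most recent object.
            if current is None:
                raise ValueError("attribute before any object: " + raw.strip())
            name, _, type_name = raw.strip().partition(":")
            model[current].append((name.strip(), type_name.strip()))
        else:
            # Unindented line: start a new object.
            current = raw.strip()
            model[current] = []
    return model


def generate_schema(model):
    """Emit one CREATE TABLE statement per object in the model."""
    statements = []
    for obj, attrs in model.items():
        columns = ["id INTEGER PRIMARY KEY"]
        columns += [f"{name} {TYPE_MAP[type_name]}" for name, type_name in attrs]
        statements.append(f"CREATE TABLE {obj.lower()} ({', '.join(columns)});")
    return statements


schema = generate_schema(parse_grammar(GRAMMAR))
```

Editing the text and rerunning the generator replaces hand-editing the schema. A real tool would also have to migrate existing data and regenerate the data access code, which this sketch leaves out.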



One of the benefits of a database is that it is, in fact, a rigid structure. This is quite nice when analyzing an implemented system, or providing a common ground for teams of developers to work from. After all, the data and relationships typically model a good bit of the business rules in an application.
One of the drawbacks of a database is the same rigidity. Changing a relationship in a currently implemented system can break a lot of code. Adding a new field often results in having to drag the data through multiple layers of code and also account for the new field in several places: client-side validation, server-side validation, business objects, gateways, DAOs, etc.
Having a metabase is seemingly quite dynamic by nature. I can't wait to see the solutions you have for addressing this.
DW
I guess you could look at one element of what I'm playing with as providing support for a bunch of refactorings at a domain level. It should be easy in an application to perform refactorings such as "add object attribute" or "add object relationship" with a single declarative statement flowing out to generate all of the code and the schema. The only reason why such changes are difficult is that we're programming too "close to the metal" a lot of the time. There will always be edge cases, but the more artifacts we can automatically generate the easier life will get!
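Here is a minimal Python sketch of what an "add object attribute" refactoring might look like when one call drives both the schema change and the regenerated code. The add_attribute function, the TYPES table and the in-memory model are hypothetical names made up for illustration. They are not taken from LightBase, CFGen or any other tool.

```python
# Hypothetical type table: grammar type -> (SQL column type, Python type name).
TYPES = {"string": ("TEXT", "str"), "integer": ("INTEGER", "int")}


def add_attribute(model, obj, name, type_name):
    """Apply an 'add object attribute' change and return the generated artifacts.

    Returns a tuple of (SQL migration statement, regenerated class source).
    """
    if obj not in model:
        raise KeyError(f"unknown object: {obj}")
    if any(existing == name for existing, _ in model[obj]):
        raise ValueError(f"{obj} already has an attribute named {name}")

    # Update the single source of truth first.
    model[obj].append((name, type_name))

    # Artifact 1: the schema change.
    sql_type = TYPES[type_name][0]
    migration = f"ALTER TABLE {obj.lower()} ADD COLUMN {name} {sql_type};"

    # Artifact 2: the class source, regenerated from the whole model entry.
    fields = "\n".join(f"    {n}: {TYPES[t][1]}" for n, t in model[obj])
    class_source = f"@dataclass\nclass {obj}:\n{fields}\n"
    return migration, class_source


model = {"Product": [("title", "string")]}
migration, class_source = add_attribute(model, "Product", "stock", "integer")
```

The change is stated once, against the model, and every artifact is derived from it. Validation rules, forms and gateways could be generated from the same entry in the same way.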
That's where my interest lies. Specifically lately, in making changes to the database reflect instantaneously through the code, without breaking it (which, I've pretty much accomplished to my satisfaction - for that part anyway). Like you, I'm building "richer" metadata as I need it - to use your example, the DB knows SSN as a string, when in fact it is really a string matching xxx-xx-xxxx where x = any digit. But, currently I'm not storing that information in a database.
Typically, to add just one column you'll need to change code in at least 2 spots - your model and your view. And that doesn't count the places where you haven't followed the DRY principle, and gateways/daos/beans/whatever other places you might need to change it. I'd like to change it in one spot - the database - and be done.
Sounds like you'll enjoy LightBase and CFGen!
a) when it's available, and
b) when I've got extra time
I'd also love to see what you (and others there) are saying at Frameworks conference, but for school reasons, I won't be there (plus, if I do go anywhere, it'd be to New Orleans to see BB King if he's still playing this weekend).
When I said "I've pretty much accomplished it": I'm using the DB as a starting point for that metadata, and letting the programmer describe it in more depth in the code through an in-language DSL, as opposed to using a DB to store that information. No particular reason other than I'm just not that far along (was working on doing that for an in-house thing, but it's been dropped for the moment because I'm too busy with work and school).
Nevertheless, since I'm building it up as I work on a "real" project, I haven't put in a lot of rich meta-data since I haven't yet had the need for it. But, it does have things like month-year dates, dates, and date-times (as opposed to just a date-time that the db offers). Likewise, there is no support for binary files to be stored, and some other not-often-used data types.
I'm glad you're around blogging about this. It certainly helps me flesh all my ideas out a lot better, and makes me think about things I wouldn't normally think about. Thanks for that!