Metadata Storage: Where should it go?
The great thing about configuration in the code (in-language DSLs) is that it is very easy to implement and modify. Often the starting point when developing a simple language is to encapsulate the implementation of the concepts using class files and implement the language using method calls. In-language DSLs also allow for a mixture of DSL and native language concepts, so if you need to support looping, conditional logic and other concepts that are often best implemented in a 3GL, an in-language DSL can be a good choice. Of course, there can be a value in enforcing constraints, so sometimes it is better to have an external DSL that lets you control which statements can be made in the language. In addition, external DSLs are more likely to be maintainable by less technical users and can be programming language independent (you only have to replicate the generator or interpreter in each language - not all of the statements).
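To make the trade-off concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the BusinessObject class, the field method, the ALLOWED_TYPES set and the JSON layout are assumptions, not the format of any real tool. The first half shows DSL statements as method calls mixed with a native loop; the second half shows the same kind of statements held as plain data and checked against a constrained vocabulary before use.

```python
import json


class BusinessObject:
    """Hypothetical in-language DSL: each method call is one DSL statement."""

    def __init__(self, name):
        self.name = name
        self.fields = []

    def field(self, name, type_="string", required=False):
        self.fields.append({"name": name, "type": type_, "required": required})
        return self  # returning self lets statements be chained


# In-language: DSL statements mixed freely with a native loop.
customer = BusinessObject("Customer").field("email", required=True)
for n in range(1, 4):
    customer.field(f"phone{n}")

# External: statements are plain data, so the vocabulary can be constrained.
ALLOWED_TYPES = {"string", "int", "decimal"}  # assumed vocabulary


def load_external(text):
    """Build a BusinessObject from JSON statements, rejecting unknown types."""
    spec = json.loads(text)
    obj = BusinessObject(spec["object"])
    for f in spec["fields"]:
        type_ = f.get("type", "string")
        if type_ not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {type_}")
        obj.field(f["name"], type_, f.get("required", False))
    return obj


order = load_external('{"object": "Order", "fields": [{"name": "total", "type": "decimal"}]}')
```

The loop in the first half has no equivalent in the JSON form, which is the flexibility you give up. In return, the JSON statements could be read by an interpreter written in any language.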
If you decide to go with an external DSL two of the most common formats for storing the metadata are databases and XML files. On the whole, XML files work well for relatively small amounts of metadata (tens or hundreds of statements) that you don't need to be able to reuse too efficiently (try searching and merging collections of hundreds of XML documents - it can be done, but it isn't as simple or performant as a single query against a database). They are fairly easy to parse and process and are very simple to develop and view (the tooling for working with database-backed metadata is much less sophisticated - although that is something we are looking into improving). On the other hand, databases excel at storing, sorting, filtering and accessing large quantities of data.
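As a rough illustration of that difference, the sketch below answers one question ("which objects have an email field?") both ways, using only the Python standard library. The XML layout, the table schema and all names are assumptions made up for this example.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Two hypothetical metadata documents, one per business object.
XML_DOCS = [
    '<object name="Customer"><field name="email" type="string"/>'
    '<field name="age" type="int"/></object>',
    '<object name="Order"><field name="total" type="decimal"/>'
    '<field name="email" type="string"/></object>',
]


def fields_from_xml(doc):
    """Return (object, field, type) tuples for one XML document."""
    root = ET.fromstring(doc)
    return [(root.get("name"), f.get("name"), f.get("type"))
            for f in root.findall("field")]


# XML approach: every document is parsed and scanned to answer the question.
xml_hits = sorted(obj
                  for doc in XML_DOCS
                  for obj, field, _ in fields_from_xml(doc)
                  if field == "email")

# Database approach: load the statements once, then ask with a single query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field (object TEXT, name TEXT, type TEXT)")
for doc in XML_DOCS:
    conn.executemany("INSERT INTO field VALUES (?, ?, ?)", fields_from_xml(doc))
db_hits = [row[0] for row in conn.execute(
    "SELECT object FROM field WHERE name = ? ORDER BY object", ("email",))]
```

With two documents the approaches are equally easy. The gap only opens up once there are hundreds of files to parse for every question.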
At SystemsForge, we use a database to store our master metadata library so we can easily associate the appropriate standard, optional and custom metadata to a project, but we then often create XML files for specifying the functionality of a given app.
Any thoughts on where to use what kind of storage mechanism?



When you say you can put metadata "in the code (configuration scripts) [or] in text files", what type of file is your source code? I assumed you are talking more about the representation of your metadata than /where/ it is stored?
I like a combination of DSLs (either internal or external, or perhaps both) and DBs. In fact, if you could do a combination of the two it might be even better, except that anyone who needs to figure out WTF is going on will have to look in two different places to find out. But, I imagine you could come up with some canonical view and use that, which would then do all the mapping to sources behind the scenes.
When you say that for an "external DSL two of the most common formats for storing the metadata are databases and XML files. " What metadata are you talking about? The DSL has metadata?
You also said "there can be a value in enforcing constraints." What kinds of value?
Finally, when you said "external DSLs are more likely to be maintainable by less technical users," I suspect you meant that scripts using the DSLs can be maintainable by less technical users, not the DSLs themselves, right?
Sorry, not trying to be overly critical. Only trying to make the content even better! =) (But you know that anyway!)
Good distinctions. I considered noting that 3GL code is usually also stored in text files, but didn't want to muddy the core point. Regarding the metadata, I'm basically looking at the statements within the DSL as my metadata. If I have a DSL for describing business objects, then I'm calling that my business object metadata, although we both know there is no real distinction between data and metadata - it is all just data (for that matter, so is 3GL code if you want to take a sufficiently inclusive view).
The value in enforcing constraints? Now I *know* you're a Ruby guy (we don't need no stinking XML with constrained grammars when we can use flexible in-language DSLs we can mix with the core language :->). Let's say you want to be able to reuse your statements across n programming languages, or store them in a database as part of a feature modeler to allow for industrial reuse of metadata. Try doing that with lines intermingled within a 3GL. If you don't constrain your grammar using an external DSL (whether using boxes and arrows, XML, Excel spreadsheets or a custom textual format) you're going to have a really hard time reusing metadata without manually cutting and pasting, and your code is likely to be limited to the 3GL you embedded it into. That's fine if you want to code a *little* faster, but if you want to generate thousands of custom applications it just isn't efficient enough.
Good distinction on DSL vs statements within them - I did indeed mean the statements rather than the languages themselves.
NP re keeping me on my toes. Honestly I am going through this stuff very quickly as background to help me to make some new code gen decisions, so I was more writing it for myself than others, but please keep on keeping me honest!
Well, asking about the types of value in enforcing constraints was mostly an exercise more than a question for /me/. Thanks for the clear answer.
As for the metadata question, really I felt it was unclear the way you phrased it in the initial post. I also see how DSLs can be metadata for describing applications, but the way you put it sounded like you had metadata describing your DSLs. That may be the case, but I don't know how that might work. (other than a DSL to describe DSLs being viewed that way, which is basically BNF, I think)
I know the value of posting these for yourself. It gives good documentation of your thoughts and design decisions and has memory implications and a slew of other benefits. But I like to ask the questions to flesh it out for /my/ use... Well, I don't mind if others get some benefit too. =)
Keep up the good work.