To create a DSL, you don’t need to know much about the structure of the DSL at all. Just create your API or XML file or whatever and you’re good to go. If, however, you want to create a system that makes it easy to manage the structure (grammar) of your DSLs (which is important if you want to be able to change that structure easily as your DSLs evolve), it turns out that you need to be able to formally describe the structure of any DSL you might want to create.
Traditionally, Backus-Naur Form (BNF) has been a common way to describe language grammars. More recently, Extended BNF (EBNF) has become more popular. EBNF is just a simpler, more concise way of writing BNF and can be automatically translated into BNF: the two can describe exactly the same set of grammars (you could look at them as two different concrete syntaxes embodying the same abstract syntax or, to put it more simply, two different ways of writing the same thing).
In its simplest form, you could say that in BNF a symbol is defined by an expression made up of other symbols. BNF allows you to document the syntax required for a given expression to be syntactically valid (in most languages there are also semantic constraints, which mean that some syntactically well-formed constructs are not valid because they don't meet the semantic constraints of the language. I'll get into providing a semantic constraint language later :->).
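To make "a symbol is defined by an expression of other symbols" a bit more concrete, here is a rough sketch in Python. The toy grammar, the `GRAMMAR` dictionary layout and the `match` / `is_valid` helpers are all made up for illustration; this is not a real parser, and it would loop forever on a left-recursive grammar.

```python
# Hypothetical toy grammar, written first in BNF and then as plain Python data:
#
#   <greeting>   ::= <salutation> <name>
#   <salutation> ::= "hello" | "hi"
#   <name>       ::= "world" | "peter"
#
# Each symbol maps to a list of alternatives; each alternative is a sequence
# of symbols. Anything that is not a key in GRAMMAR is treated as a literal token.
GRAMMAR = {
    "greeting": [["salutation", "name"]],
    "salutation": [["hello"], ["hi"]],
    "name": [["world"], ["peter"]],
}


def match(symbol, tokens, pos=0):
    """Return every token position reachable after matching `symbol` at `pos`."""
    if symbol not in GRAMMAR:  # literal token
        if pos < len(tokens) and tokens[pos] == symbol:
            return {pos + 1}
        return set()
    ends = set()
    for alternative in GRAMMAR[symbol]:
        positions = {pos}
        for part in alternative:
            # Continue matching from every position the previous part reached.
            positions = {e for p in positions for e in match(part, tokens, p)}
        ends |= positions
    return ends


def is_valid(text, start="greeting"):
    """True if the whole whitespace-separated text matches the start symbol."""
    tokens = text.split()
    return len(tokens) in match(start, tokens)


print(is_valid("hello world"))   # True
print(is_valid("world hello"))   # False
```

Note that this only checks syntax: a sentence can pass `is_valid` and still break a semantic rule that the grammar has no way to express.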
Interestingly, when you look at it, XML definition languages are semantically very similar to BNF in terms of the sets of grammars they can describe. Look at DTDs, XML Schema and RELAX NG and they have very similar capabilities (mainly because grammars are hierarchical data sets, so they have a fundamental set of properties that any grammar description language will probably converge on over time). The main difference with XML grammars is that they distinguish between elements (which get their own angle brackets) and attributes (which don’t get their own angle brackets; they are contained within their parent’s brackets). For example, in <book lang="en"><title>Dune</title></book>, title is an element and lang is an attribute of book.
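The element/attribute split is easy to see from the way an XML parser hands the data back. Here is a small sketch using Python's standard `xml.etree.ElementTree` module; the `person` document is a made-up sample, not something from a real schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: `id` is an attribute of <person>, while <name> and
# <email> are child elements with their own angle brackets.
DOC = '<person id="42"><name>Ada</name><email>ada@example.com</email></person>'

person = ET.fromstring(DOC)

# Attributes live inside the parent's start tag and come back as a mapping.
attributes = dict(person.attrib)

# Child elements are separate nodes, each with its own tag and text.
child_elements = [(child.tag, child.text) for child in person]

print(attributes)       # {'id': '42'}
print(child_elements)   # [('name', 'Ada'), ('email', 'ada@example.com')]
```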
As I look at the features that such grammar description languages provide, the only thing that is possibly missing is the concept of inheritance: for the most part they don’t provide the ability for an element (or symbol) to extend another element/symbol and inherit its attributes (type extension in XML Schema is the notable exception). More importantly, the formats with the most momentum (EBNF for language grammars and RELAX NG for XML grammars) don’t have syntaxes that I love, so I’m going to be playing with a few different concrete syntaxes until I come up with one that I do.
In the meantime, the abstract syntax I’ll be using to describe all of my DSLs can be described as follows:
- A Language is composed of 1..n Concepts.
- A Concept has 0..n Attributes.
- Each Attribute can be optional or required (a minimum of 0 or 1 values) and can hold either a single value or up to n values (a maximum of 1 or n), giving a cardinality of [0/1..1/n].
- An Attribute can either be a primitive data type (integer, float, string, enumeration/list, datetime, etc.) or another Concept.
- And possibly: any Concept can inherit from any other Concept and can extend or override any of the inherited Attribute type or cardinality (0/1..1/n) definitions.
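The list above could be sketched as a handful of Python dataclasses. This is only one possible reading of the abstract syntax: the class layout, the field names (`value_type`, `min_count`, `max_count`, `parent`), the `MANY` marker and the `Person` / `Employee` sample are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union

PRIMITIVES = {"integer", "float", "string", "enumeration", "datetime"}
MANY = None  # stands in for "n" in the cardinality [0/1..1/n]


@dataclass
class Attribute:
    name: str
    value_type: Union[str, "Concept"]  # a primitive type name or another Concept
    min_count: int = 0                 # 0 = optional, 1 = required
    max_count: Optional[int] = 1       # 1 = single value, MANY = up to n values

    def __post_init__(self):
        if isinstance(self.value_type, str) and self.value_type not in PRIMITIVES:
            raise ValueError(f"unknown primitive type: {self.value_type}")
        if self.min_count not in (0, 1) or self.max_count not in (1, MANY):
            raise ValueError("cardinality must be [0/1..1/n]")


@dataclass
class Concept:
    name: str
    attributes: Dict[str, Attribute] = field(default_factory=dict)
    parent: Optional["Concept"] = None  # the "and possibly" inheritance link

    def add(self, attribute):
        self.attributes[attribute.name] = attribute
        return self

    def all_attributes(self):
        """Inherited Attributes, with this Concept's own definitions overriding them."""
        inherited = self.parent.all_attributes() if self.parent else {}
        return {**inherited, **self.attributes}


@dataclass
class Language:
    name: str
    concepts: Dict[str, Concept] = field(default_factory=dict)

    def add(self, concept):
        self.concepts[concept.name] = concept
        return self

    def validate(self):
        if not self.concepts:
            raise ValueError("a Language needs at least one Concept")


person = (Concept("Person")
          .add(Attribute("name", "string", 1, 1))
          .add(Attribute("emails", "string", 0, MANY)))

# Employee extends Person, overrides a cardinality and adds a Concept-typed Attribute.
employee = (Concept("Employee", parent=person)
            .add(Attribute("emails", "string", 1, MANY))
            .add(Attribute("manager", person, 0, 1)))

hr = Language("HR").add(person).add(employee)
hr.validate()
print(list(employee.all_attributes()))  # ['name', 'emails', 'manager']
```

Keeping overrides as a simple dictionary merge means a child Concept can change a type or a cardinality without the parent knowing anything about it, which seems to be the behaviour the last bullet is after.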
The beauty of having such a formal grammar definition is that it makes it possible to automate the code that stores and accesses metadata held in a database: changes to the schema definition can be used to automatically modify the data tables and the MetaDAO, making it much easier to evolve and refactor your DSLs. More about that over the next few days as I lock down a syntax, write a simple parser and find a few hours to hack a prototype together!
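As a very rough idea of what "changes to the schema definition modify the data tables" might look like, here is a sketch that compares two versions of a schema and emits SQL statements. Everything here is an assumption: the dictionary layout, the `SQL_TYPES` mapping and the `migration` function are hypothetical, it only handles single-valued primitive Attributes, it ignores type changes, and it says nothing about how the real MetaDAO would work.

```python
# Hypothetical mapping from primitive types to SQL column types.
SQL_TYPES = {
    "integer": "INTEGER",
    "float": "REAL",
    "string": "TEXT",
    "datetime": "TIMESTAMP",
}


def migration(old, new):
    """Return SQL statements that move the tables from schema `old` to schema `new`.

    Both schemas have the shape {concept_name: {attribute_name: primitive_type}}.
    """
    statements = []
    for concept, attrs in new.items():
        if concept not in old:
            # A new Concept becomes a new table.
            columns = ", ".join(f"{name} {SQL_TYPES[t]}" for name, t in attrs.items())
            statements.append(f"CREATE TABLE {concept} ({columns});")
            continue
        for name, t in attrs.items():
            if name not in old[concept]:
                statements.append(
                    f"ALTER TABLE {concept} ADD COLUMN {name} {SQL_TYPES[t]};")
        for name in old[concept]:
            if name not in attrs:
                statements.append(f"ALTER TABLE {concept} DROP COLUMN {name};")
    for concept in old:
        if concept not in new:
            statements.append(f"DROP TABLE {concept};")
    return statements


old_schema = {"person": {"name": "string"}}
new_schema = {
    "person": {"name": "string", "age": "integer"},
    "project": {"title": "string"},
}

for statement in migration(old_schema, new_schema):
    print(statement)
# ALTER TABLE person ADD COLUMN age INTEGER;
# CREATE TABLE project (title TEXT);
```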