Code Generation 101: Implementing Active Code Gen
Active vs. Passive Code Generation
With passive code generation, you generate code once and then edit it manually. Passive code generation usually isn't ideal: we spend so much more time modifying and maintaining apps than creating the first cut (even during the original development phase) that it's not nearly as big a win as you deserve from code gen.
With active code generation, every time your metadata changes (whether you're using XML, programmatic config files, reading database metadata or whatever else), you re-generate your files, thus effectively raising the level of abstraction and allowing you to change your applications much more quickly as well as to create them much more quickly.
The most important point to realize is that if you're doing passive code generation, you could be doing active code generation just by tweaking your architecture. Code generators aren't active or passive (well, some wizards are purely passive, but most tools like Illudium can be used to generate code at any time). It's your architecture that determines whether you can regenerate or not. And the only thing you need to do to create a system using active code generation? Don't edit the generated files.
A Simple Solution
There are LOTS of approaches for handling this. You can use AOP, class or object based mixins, or even protected code blocks (where you DO edit the generated files but tell the generator not to mess with the protected blocks), but by far the most common default approach is inheritance. Here's a really simple starting point that'll work for generating most class files:
1. Create a tree under your components directory for generated components. Don't ever edit the files in there.
2. Get your generator to generate files prefixed with Generated, e.g. GeneratedUserService.cfc.
3. Write a separate generator to passively create your custom files once (e.g. com.user.UserService.cfc). These extend the generated classes.
4. When you want to add custom methods, put them in the passively generated files.
5. Add to your continuous integration/build process a call to your generator so that you blow away generated files every time you commit code and run tests. That way you're never tempted to hack the generated files as they'll be blown away before you run your tests.
With this simple process you can keep using the generator to do what it does while still having a safe place to put all of your custom code. Inheritance isn't the only solution and it isn't always the best solution, but for generating services, DAOs, business objects and the like it's a pretty good starting point. You can always get fancy with model weaving or code weaving techniques once you've got this simple win in place.
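The five steps above can be sketched in a few lines. This is a minimal illustration in Python rather than CFML, and everything in it is an assumption made for the example: the `generated/` and `custom/` directory names, the metadata shape (entity name mapped to a list of fields), and the `find_by_*` methods. The point is only the split between files that are wiped and rebuilt on every run and files that are written once and then left alone.

```python
import shutil
import tempfile
from pathlib import Path


def render_generated(entity, fields):
    """Source for the generated base class. Rebuilt on every run."""
    lines = [f"class Generated{entity}Service:"]
    for field in fields:
        lines.append(f"    def find_by_{field}(self, value):")
        lines.append(f"        return ('{entity}', '{field}', value)")
    return "\n".join(lines) + "\n"


def render_custom(entity):
    """Source for the one-time stub that extends the generated class.

    The import path is hypothetical and depends on how the project is laid out.
    """
    module = f"generated.generated_{entity.lower()}_service"
    return (
        f"from {module} import Generated{entity}Service\n\n\n"
        f"class {entity}Service(Generated{entity}Service):\n"
        f"    pass  # custom methods go here\n"
    )


def generate(root, metadata):
    generated_dir = Path(root) / "generated"
    custom_dir = Path(root) / "custom"

    # Active part: blow away the generated tree and rebuild it from metadata.
    shutil.rmtree(generated_dir, ignore_errors=True)
    generated_dir.mkdir(parents=True, exist_ok=True)
    custom_dir.mkdir(parents=True, exist_ok=True)

    for entity, fields in metadata.items():
        name = entity.lower()
        generated_file = generated_dir / f"generated_{name}_service.py"
        generated_file.write_text(render_generated(entity, fields))

        # Passive part: create the custom file only if it does not exist yet,
        # so hand-written code is never overwritten.
        custom_file = custom_dir / f"{name}_service.py"
        if not custom_file.exists():
            custom_file.write_text(render_custom(entity))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        generate(root, {"User": ["id"]})
        generate(root, {"User": ["id", "email"]})  # metadata changed, so regenerate
        print((Path(root) / "generated" / "generated_user_service.py").read_text())
```

Calling `generate` from the build or CI step is what keeps the generated tree disposable: anything hacked into it by hand disappears on the next run, while the files under `custom/` survive.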



I'm really glad you mentioned that. It's the main problem I have with passive code generation, and the problem I was trying to solve with cfrails (though, with what I always called synthesis - nothing is ever written).
Good post, and good point. I hadn't thought about using an extra layer to take it a step further. I know you didn't mention it in those terms, or even in the terms that gave me the idea, but still, you gave me the idea. Thanks!
(I know that sounds awfully cryptic - that's because the wheels in my head are turning and I'm not quite sure of what I'm thinking, so putting it in words is that much harder.)
Right now I actually use all synthesis too, and I'm also playing with annotations on class files instead of XML. It's amazing how many different approaches you can try and each one has its own set of benefits and limitations . . .
Hopefully when those wheels stop turning there will be a blog post there somewhere?!
There's something to be said for using hooks inside methods, like before() and after(), which could be used for further customization as well. CF8's missing method handler could make that easy to implement, with before and after handlers for any method called, without having to actually create one for every method in the framework.
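As a rough sketch of that idea in Python rather than CF8: `__getattr__` plays the role of the missing method handler, because Python only calls it when an attribute is not found on the object itself. The `HookedProxy` class and the `before_<name>` / `after_<name>` naming convention are assumptions made up for this illustration, not part of any framework.

```python
class HookedProxy:
    """Forwards every method call to a target, running optional hooks around it."""

    def __init__(self, target, hooks):
        self._target = target
        self._hooks = hooks

    def __getattr__(self, name):
        # Only reached for names the proxy does not define itself,
        # which is every method of the wrapped target.
        attr = getattr(self._target, name)
        if not callable(attr):
            return attr

        def wrapper(*args, **kwargs):
            before = getattr(self._hooks, f"before_{name}", None)
            after = getattr(self._hooks, f"after_{name}", None)
            if before is not None:
                before(*args, **kwargs)
            result = attr(*args, **kwargs)
            if after is not None:
                result = after(result)
            return result

        return wrapper


class GeneratedUserService:
    """Stands in for generated code that is never edited."""

    def save(self, name):
        return f"saved {name}"


class UserHooks:
    """Hand-written customization, kept apart from the generated class."""

    def __init__(self):
        self.log = []

    def before_save(self, name):
        self.log.append(f"validating {name}")

    def after_save(self, result):
        return result.upper()


hooks = UserHooks()
service = HookedProxy(GeneratedUserService(), hooks)
```

Here `service.save("ann")` runs `before_save`, then the generated `save`, then `after_save`, and no hook methods had to be declared on the generated class.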
I imagine that even using passive generators, if you made even /that/ small a change, it could have very positive effects for not ever having to touch the generated code. I see most things in the backend side as unit-level calls, so that wouldn't be a problem for them. In the view, however, even the logical units have things within them you'd want to change, so that would require a bit more thought in going about creating something reusable.
There are definitely lots of ways of merging generated and custom code. I think inheritance is a good baseline solution for class files, but you may want to add explicit events/notifications. You might also want to use AOP, and if you're being really fancy, you could consider model weaving, which is AOP for models. There are also mixins, protected blocks and the like. View code is definitely the hardest problem. View helpers help, as you can generate and then extend those as per any other class file, but actual layouts are difficult to active gen. View snippets/fragments can help, but views aren't trivial to actively gen.
Good post.
I have very little experience with active code generation, but it seems like the "don't edit the generated code" rule restricts your flexibility in the types of code that you can write or requires everything to be extensible (at which point I get a suspicion that you could just include your meta-data in your code somehow).
How do you allow for complexity of resulting code without making the code generator or the resulting code unwieldy? (hopefully this doesn't come off sounding incredulous, I really do intend this as a legitimate question)
Really you can look at code gen as just a way of implementing a higher level of abstraction. I look at generation as "compiling" DSLs and frameworks/code synthesis/APIs as "interpreting" them. Most DSLs for describing facets of your application can be either interpreted or compiled. So, saying don't edit the generated code is like saying don't edit the framework you've written. In both cases, with an elegant API, you don't need to edit the framework/generated code as you provide the necessary extension points.
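To make the "compile vs. interpret" distinction concrete, here is a small Python sketch under invented assumptions: the `FIELDS` list stands in for whatever metadata describes your application, and `Record` and `generate_class` are hypothetical names. The same metadata is first interpreted by a generic class at run time, then compiled into source code for a dedicated class.

```python
FIELDS = ["id", "email"]  # hypothetical metadata describing one entity


# "Interpreting": a generic framework class reads the metadata at run time.
class Record:
    def __init__(self, fields, **values):
        self._values = {field: values.get(field) for field in fields}

    def get(self, field):
        return self._values[field]


# "Compiling": a generator turns the same metadata into source code.
def generate_class(name, fields):
    args = ", ".join(f"{field}=None" for field in fields)
    lines = [f"class {name}:", f"    def __init__(self, {args}):"]
    for field in fields:
        lines.append(f"        self.{field} = {field}")
    return "\n".join(lines) + "\n"


source = generate_class("User", FIELDS)

# A real generator would write `source` to a file in the generated tree.
# It is executed in place here only to show that the output is valid code.
namespace = {}
exec(source, namespace)
User = namespace["User"]

interpreted = Record(FIELDS, email="a@example.com")
compiled = User(email="a@example.com")
```

Both `interpreted.get("email")` and `compiled.email` return the same value. In neither case does the caller touch the framework class or the generated source.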
For most class files, you want to actively generate files with the generated code and have each one extended by a custom class which is passive (1 time) generated and where the coder can add their custom code. It really isn't unwieldy. For instance, if you use Reactor (or IronSpeed or Deklarit to name just a couple of .net generators) you're already doing this. Typically you hide away the generated code and by extending it, have access to the generated methods without having to mess with them.
Of course, you can also include metadata in your code. You can create structs with metadata, you can use XML, you can read metadata from a database table, you can add annotations to your classes and methods - these are all valid ways of raising the level of abstraction in your programming. You can write custom tags or APIs and call them - that also achieves the same goal. The important thing to realize is that while each of these approaches can be used, each also has strengths and limitations. Writing elegant, maintainable, DRY code is fundamentally hard, so the more different approaches you are comfortable with, the more tools you have in your toolbox to solve problems. If you just use code gen or APIs or db metadata or custom tags (or any other single approach), you're seeing everything as a nail because you only have a hammer. By practicing with all of the different tools, you're much more likely to have the allen key or power driver you need when you come across a problem where the hammer isn't working so well.
Of course, I know I'm probably preaching to the choir here!
Thanks for the great answer! I'm still not sure I am sold on active generation (at least for my own purposes), but that really does help put it into a larger perspective.
I really like your comments about using lots of tools. I have found that it is common to have people use only code generation and not other methods of abstraction or only other methods of abstraction but never any code generation.
I think you provided a good explanation of why it helps to use different approaches in concert.
That is encouraging to hear.
Currently, I use a code generator (passive, not active), but I try to continually refactor my code so that the code generator is doing less and less of the work.
Of course! However, I don't know enough in .NET yet to take on that task, and the types of things I was doing in Ruby didn't lend themselves to code generation - mostly algorithms for AI/bioinformatics stuff, nothing repetitive enough to generate. How would you generate a genetic algorithm, for example?
As for the generic algorithms, it's the same as anything else. If you find yourself writing repetitive code where you can abstract out higher level concepts into a DSL, code generation (or an API/framework) makes sense. If you're not seeing such patterns, you *could* generate the code just by writing templates with no variables and with a 1:1 correspondence between template files and generated files, but it would be a little pointless!!!
I will of course consider it when I have enough experience to see where it would be helpful (or when I'm doing repetitive webwork and have already explored the .NET community offerings)
The other part - you missed the key letter in geneTic, not geneRic =). It would be like trying to say, "I need to sort something, let me see if I can write a program that will generate quicksort for me," without telling it what quicksort is, just that you need it.
Of course, you'd need to know quicksort first - unless you can write a generator that can think on its own.
Agreed re: .NET - no point generating until repetition is a problem. I think you have to write something a couple of times by hand before it makes sense.
Re: genetic vs. generic, apologies. I knew you meant genetic, just answered too soon after getting out of bed :->
Was just making the point that in the trivial edge case, you can write a generator for code with no repetition. You just write a template for each script with a 1:1 correspondence and no metadata and use the generator to transform each template into the appropriate script by applying no transformations to the templates (so it kind of becomes more of a build than a generation process). It is generally pointless to do this, but it makes the important point that you CAN generate any code that you can write manually. The only question then is which code is WORTH the extra overhead of a generator with a build process to generate scripts from templates.
The answer then depends on how much repetitive code exists within the solution. For example, if you generate 90% of your code based on a higher level of abstraction, you can take two approaches for the custom 10%. You can either have an architecture with extension points for the 10%, which you write in your IDE, or you can create custom templates for the 10% which map 1:1 to the generated scripts. The benefit of the first approach is better IDE support for the custom programming. The benefits of the second approach are a single build process and set of templates used to generate 100% of the code, and easier refactoring from a template which generates code 1:1 to higher level templates with metadata doing more of the heavy lifting, because you already have a generator, templates and a transformation process.
I guess the closest example is: do you use CF to generate all of your HTML files, or do you just include HTML snippets? The benefit of HTML snippets is that they can be created by designers in Dreamweaver, you don't need to worry about #'s, etc. The benefit of CF is that it'll be easier to drop in a CFIF or CFLOOP if and when you need to drive the HTML dynamically based on metadata - you don't have to change the file extension and the calls - you can just add the data driven constructs right into your CF template.
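A small Python sketch of the 1:1 case, using the standard library's `string.Template`. The `generate` function and both templates are invented for illustration. The first template has no placeholders, so generation is effectively a copy. The second shows what the same pipeline looks like once metadata starts doing some of the work.

```python
from string import Template


def generate(template_text, metadata=None):
    """Run one template through the generator, with optional metadata."""
    return Template(template_text).substitute(metadata or {})


# Trivial edge case: no variables, so the output equals the template 1:1.
STATIC_TEMPLATE = "def greet():\n    return 'hello'\n"

# The same kind of template after refactoring, now driven by metadata.
PARAM_TEMPLATE = (
    "def find_by_${field}(rows, value):\n"
    "    return [row for row in rows if row['${field}'] == value]\n"
)

static_output = generate(STATIC_TEMPLATE)
email_finder = generate(PARAM_TEMPLATE, {"field": "email"})
```

Because the static template already goes through `generate`, moving it to a metadata-driven template later only changes the template text and the metadata passed in, not the build process.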
Would like to have a generator that could figure out quicksort without me writing code, but I'm guessing best I could do is a simple DSL for describing/selecting sort algorithms and something that'd pull in the appropriate sort algorithm script based on my configuration DSL.
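One way such a configuration DSL might look, sketched in Python. The `sort <dataset> using <algorithm>` syntax, the `load_config` function and the `ALGORITHMS` registry are all made up for this example. The sort implementations themselves still have to be written by hand. The DSL only selects among them.

```python
def quicksort(items):
    """Simple, non-in-place quicksort."""
    if len(items) <= 1:
        return list(items)
    pivot, rest = items[0], items[1:]
    smaller = [x for x in rest if x < pivot]
    larger = [x for x in rest if x >= pivot]
    return quicksort(smaller) + [pivot] + quicksort(larger)


def insertion_sort(items):
    """Builds a sorted copy by inserting one item at a time."""
    result = []
    for item in items:
        pos = len(result)
        while pos > 0 and result[pos - 1] > item:
            pos -= 1
        result.insert(pos, item)
    return result


# Registry of hand-written algorithms the DSL is allowed to pick from.
ALGORITHMS = {"quicksort": quicksort, "insertion": insertion_sort}


def load_config(text):
    """Parse lines of the form 'sort <dataset> using <algorithm>'."""
    chosen = {}
    for line in text.strip().splitlines():
        words = line.split()
        if len(words) != 4 or words[0] != "sort" or words[2] != "using":
            raise ValueError(f"cannot parse line: {line!r}")
        dataset, algorithm = words[1], words[3]
        if algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {algorithm!r}")
        chosen[dataset] = ALGORITHMS[algorithm]
    return chosen


CONFIG = """
sort prices using quicksort
sort names using insertion
"""

sorters = load_config(CONFIG)
```

With this in place, `sorters["prices"]` and `sorters["names"]` are ordinary functions, and swapping algorithms is a one-word change in the configuration text.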
Although, it might be fun to put all of those algorithms in an AI application generator, or even just as part of a search solution in a shopping cart. (things like SIPs and Other customers bought = related items)