By Peter Bell

Introducing Delta Driven Development

You've just built an e-commerce application. The client comes back asking for some changes. They've decided that despite all of their protestations to the contrary in the design phase, products actually must be associated to multiple categories, not just one. This is when traditional development starts to suck . . .

With traditional development you would write a script to modify the database schema to create a joining table, write another data transformation script to copy all existing product to category associations to the joining table, write a third script to get rid of the old tbl_Product.CategoryID field and then modify your code, tests and documentation to capture the new relationship between products and categories.
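To make those three scripts concrete, here is a minimal sketch using Python's built-in sqlite3 module against an in-memory database. Only tbl_Product.CategoryID comes from the example above; the joining table name tbl_ProductCategory and the ProductID and Name columns are assumptions made for illustration. The code, test and documentation changes would still have to be made by hand afterwards.

```python
import sqlite3


def migrate_to_many_categories(conn: sqlite3.Connection) -> None:
    """Hand-written migration from one category per product to many."""
    cur = conn.cursor()
    # Script 1: change the schema by adding a joining table.
    cur.execute(
        "CREATE TABLE tbl_ProductCategory ("
        " ProductID INTEGER NOT NULL,"
        " CategoryID INTEGER NOT NULL,"
        " PRIMARY KEY (ProductID, CategoryID))"
    )
    # Script 2: copy the existing associations into the joining table.
    cur.execute(
        "INSERT INTO tbl_ProductCategory (ProductID, CategoryID)"
        " SELECT ProductID, CategoryID FROM tbl_Product"
        " WHERE CategoryID IS NOT NULL"
    )
    # Script 3: get rid of tbl_Product.CategoryID. The table is rebuilt
    # rather than altered so the sketch also runs on older SQLite versions.
    cur.execute(
        "CREATE TABLE tbl_Product_new (ProductID INTEGER PRIMARY KEY, Name TEXT)"
    )
    cur.execute(
        "INSERT INTO tbl_Product_new SELECT ProductID, Name FROM tbl_Product"
    )
    cur.execute("DROP TABLE tbl_Product")
    cur.execute("ALTER TABLE tbl_Product_new RENAME TO tbl_Product")
    conn.commit()


def demo() -> list:
    """Build a tiny old-style database, migrate it, return the new links."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tbl_Product ("
        " ProductID INTEGER PRIMARY KEY, Name TEXT, CategoryID INTEGER)"
    )
    conn.executemany(
        "INSERT INTO tbl_Product VALUES (?, ?, ?)",
        [(1, "Kettle", 10), (2, "Teapot", 10), (3, "Mug", None)],
    )
    migrate_to_many_categories(conn)
    return conn.execute(
        "SELECT ProductID, CategoryID FROM tbl_ProductCategory ORDER BY ProductID"
    ).fetchall()
```

Even in this toy form, one small change in intent turns into three separate steps that have to be written, ordered and tested by hand.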

If this was a rare occurrence it might not matter too much, but we spend almost all of our development time modifying systems - not creating them. We build a first cut of the system and then often spend weeks or months modifying it from what the client originally asked for to what they actually want. And then the application goes into production and needs to be maintained and modified for years to come. Maintenance is the *big problem* in software engineering and a traditional artifact based approach (working on database schema, data, program code, tests and documents as separate artifacts) can make small changes to the application a large undertaking.

Imagine if instead, we applied transformations to our applications. Instead of manually going through all of the steps to add a property to a business object or to change a relationship from 1..1 to 1..* we just described the parameters of the transformation required and allowed an automated system to regenerate all of the artifacts (code, docs, tests and scripts to transform the db structure and to transform the data based on the transformation rules). In this way we would be able to easily transform any given code base or database to a new version of the system (and for reversible transformations, back again). We would also be writing much DRYer applications as we would be expressing our real intent (move from 1..1 to 1..*) rather than describing all of the things we want the system to do (changing the db schema, copying the data, cleaning up the schema, modifying the code, the docs and the tests).
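As a rough illustration of what "describing the parameters of the transformation" could look like, here is a small Python sketch. The OneToManyDelta class, its method names and the tbl_ naming convention are all hypothetical, and the generated SQL is only one possible dialect; a real system would also regenerate code and tests from the same description.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class OneToManyDelta:
    """Hypothetical delta: relationship owner -> target goes from 1..1 to 1..*."""

    owner: str   # business object that holds the relationship, e.g. "Product"
    target: str  # business object being referenced, e.g. "Category"

    def join_table(self) -> str:
        # Assumed naming convention for the generated joining table.
        return f"tbl_{self.owner}{self.target}"

    def schema_and_data_sql(self) -> List[str]:
        """Generate the schema change, data copy and cleanup statements."""
        o, t, j = self.owner, self.target, self.join_table()
        return [
            f"CREATE TABLE {j} ({o}ID INTEGER NOT NULL, {t}ID INTEGER NOT NULL, "
            f"PRIMARY KEY ({o}ID, {t}ID))",
            f"INSERT INTO {j} ({o}ID, {t}ID) "
            f"SELECT {o}ID, {t}ID FROM tbl_{o} WHERE {t}ID IS NOT NULL",
            f"ALTER TABLE tbl_{o} DROP COLUMN {t}ID",
        ]

    def doc_line(self) -> str:
        """Generate the matching line of documentation."""
        return f"Each {self.owner} can be associated with many {self.target} records."


# The whole change request is expressed as one line of intent.
delta = OneToManyDelta(owner="Product", target="Category")
statements = delta.schema_and_data_sql()
```

The point of the sketch is that the only thing written by hand is the last two lines; everything else is derived from the two parameters.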

What do you think? Obviously this is non-trivial to solve in the general case, but I've been working a lot with my Software Product Line and for a useful set of transformations this seems like an approach that might be implementable.

I have been fascinated ever since I read a number of papers on transformation based program development while doing background research for my "automated evolution of statements in evolving domain specific languages" paper for the DSM forum at ooPSLA last year, but I'm really starting to think that a mainstream capability to allow you to express intent in terms of the required deltas could be incredibly useful. After all, the intent is to "change relationship" and that would be an incredibly concise way to express that intent.

Putting aside (for a moment) the practicalities of implementing such a system (expect some more postings on that over the summer when I get the time to work on this for real), how would you like to be able to simply express the changes required to your application and to have the changes in the underlying artifacts implemented automatically?

Right now I see it as something I would use for applying transformations to models rather than code. So if you wanted to change the relationship between business objects, I could automate that as I describe those relationships using a DSL so there are lots of techniques I could apply to implement the changes to the model (and therefore to the underlying artifacts that the model generates) automatically. Writing something that would parse and apply transformations automatically to code written in a 3GL would be much more difficult (one more reason why writing code in a 3GL should be a last resort rather than a first choice for implementing most application intent).

Thoughts?

Comments
Migrations for code? Sounds interesting. But I agree - I don't think it's possible in the general space - you'd need to /tightly/ define the constraints and my initial reaction is that you couldn't be anywhere near Turing complete.
# Posted By Sammy Larbi | 3/26/08 10:07 AM
Yeah, the attempt isn't to be Turing complete, but to create a DSL for describing transformations in terms of core domain, controller and view concepts that would allow for the automated migration of entire apps which are primarily described in terms of DSLs.
# Posted By Peter Bell | 3/26/08 10:23 AM
@Peter,

As far as this:

"how would you like to be able to simply express the changes required to your application and to have the changes in the underlying artifacts implemented automatically"

I think that you are making a big leap of faith in saying that this can even be done "simply". My gut tells me that there is almost an infinite number of small cases of data transformations:

* Take a phone number and break it into three fields.
* Take a state field - varchar, "New York" and replace it with a "state_id" that maps to a table that has abbreviated states (NY).
* Replace field company name with a company ID, but make sure all company names are spelled correctly such that JP Morgan and J.P. Morgen map to the same ID.
* Take all of the "contact data" out of "Attorney" and create a Contact record that associates to an "Attorney" record so we can re-use contact type information.
* Have "DateUpdated" column update not only when "Attorney" is updated, but also when any associated "PracticeArea" is updated as well.

Just a few things that pop into mind to demonstrate that the implementation of ALL of these are rather simplistic when done manually, right? Even your use cases are pretty simple to implement; in fact, I would argue that reaching right into the database for some of it would be even faster than creating scripts. The difficulty here is not the implementation, but rather describing what needs to be done in a really "easy way" that can be automated.

I am very curious to see what people suggest as far as how to "simply express" such a large problem space.

Is this where you might create some sort of domain specific language? I feel like this concept is so big that I can't even wrap my head around it. I look forward to seeing other people's answers as I do agree this is a problem - that we spend SO much time updating our applications, not building them.
# Posted By Ben Nadel | 3/26/08 10:24 AM
Hi Ben,

There is no question that there are a large number of special and edge cases. The goal of any system such as this is to elegantly dissect the problem space into a small number of large, parameterizable chunks that cover a substantial subset of the problem space.

This really builds on the work I've been doing in describing web applications in terms of DSLs and the work on automating the transformation of statements in evolving DSLs as the same classes of transformation that apply to grammars can also apply to statements within those grammars.

For example, taking the state field and replacing it with a state ID is an "attribute to element" style transformation which is something I've already played with automating.
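A minimal Python sketch of that "attribute to element" idea, working on plain dictionaries rather than a real database: the function name and the row layout are assumptions for illustration, and it assumes the text values are already spelled consistently (mapping "New York" to an abbreviation such as NY would need an extra lookup supplied by a person).

```python
from typing import Dict, List, Tuple


def attribute_to_element(
    rows: List[Dict[str, object]], attribute: str, id_field: str
) -> Tuple[List[Dict[str, object]], Dict[object, int]]:
    """Replace a repeated attribute value with an ID into a lookup table."""
    lookup: Dict[object, int] = {}
    new_rows: List[Dict[str, object]] = []
    for row in rows:
        value = row[attribute]
        # Give each distinct value the next free ID the first time it is seen.
        if value not in lookup:
            lookup[value] = len(lookup) + 1
        # Copy the row without the old attribute and add the new ID field.
        new_row = {k: v for k, v in row.items() if k != attribute}
        new_row[id_field] = lookup[value]
        new_rows.append(new_row)
    return new_rows, lookup


contacts = [
    {"name": "Ann", "state": "New York"},
    {"name": "Bob", "state": "Ohio"},
    {"name": "Cy", "state": "New York"},
]
new_contacts, states = attribute_to_element(contacts, "state", "state_id")
```

The same parameterized function covers any "pull this field out into its own table" change, which is the kind of large, reusable chunk described above.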

You can't really just reach into the db as you want something that can be run against a dev, test and production server, so for a single server manually making the change would be faster, but you'd have a broken app for a chunk of time which is often not acceptable.

This is definitely about creating a DSL for describing the transformations, and I'm hoping I can reuse the work I did on this for grammar evolution. Will keep you posted when I get a chance to play with it . . .
# Posted By Peter Bell | 3/26/08 10:37 AM
I would think the key would be to generalize a database schema - no mean feat in and of itself. Somehow create a data schema whereby all fields were standardized and the data structure was standardized based on a model-entity relationship structure. Then, assuming you were strict with your apps in adhering to that form, you could /theoretically/ create a generic transformation of relationships, or even just use a descriptor language to define the app, and have your meta-app actually create the db tables, model code, etc. Though, as Ben pointed out - dirty data would kill you faster than anything - JP Morgan vs. J.P. Morgan.
# Posted By Jeremy French | 3/26/08 10:40 AM
@Peter,

Good point about the dev / production environments; I always forget about that. I spend so much time in the dev world, I forget that things done there also need to be done in production :)

I am looking forward to getting a better feel for this DSL. I spend so much time in the nitty gritty of things that I lose the ability to see things Abstractly.
# Posted By Ben Nadel | 3/26/08 10:43 AM
Hi Jeremy,

Thanks for the comment! The starting point I have used was to come up with a standard grammar for describing business objects, properties and relationships. The db schema is indeed generated based on the object model (with annotations reflecting the preferred form of table inheritance and distinguishing calculated from persisted properties). So the implementation would apply transformations to the model (which would flow down to the code, docs, tests and db schema) and to the data.

As for dirty data, there are always classes of transformations that require human assist, so the app would have to be able to run a non-intrusive pass to determine whether there would be any issues and then provide a UI or a set of heuristics for working with elements of the transformation that would need human intervention. An example in grammar evolution is that while add transforms can usually be automated, remove transforms often break applications (which is why deprecation is such an important concept when working with languages). The solution is to run a pass to report on whether the deleted element is used anywhere within a given application being migrated and allowing for some kind of UI for making case by case decisions for remediation where a delete causes problems with a given app.
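The reporting pass for a remove transform could be sketched roughly as follows in Python. The RemoveProperty class, the "Object.property" reference format and the artifact names are all hypothetical; the only idea taken from the comment above is that the pass changes nothing and just reports where a person needs to decide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class RemoveProperty:
    """Hypothetical remove transform on a business object property."""

    object_name: str
    property_name: str


def dry_run(
    delta: RemoveProperty, references_by_artifact: Dict[str, List[str]]
) -> List[str]:
    """Non-intrusive pass: list the artifacts that still use the property.

    An empty result means the removal can be applied automatically;
    anything else needs a case by case decision.
    """
    key = f"{delta.object_name}.{delta.property_name}"
    return sorted(
        name for name, refs in references_by_artifact.items() if key in refs
    )


# Which "Object.property" references each artifact of the app contains.
app_references = {
    "product_list_view": ["Product.Name", "Product.Price"],
    "product_report": ["Product.Name", "Product.LegacyCode"],
    "import_job": ["Product.LegacyCode"],
}
conflicts = dry_run(RemoveProperty("Product", "LegacyCode"), app_references)
safe = dry_run(RemoveProperty("Product", "Weight"), app_references)
```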
# Posted By Peter Bell | 3/26/08 10:46 AM
I suppose if you can control your requirements (and perhaps being the great Peter Bell you can since you probably reject more work than many of us combined get offered), this would be easier accomplished.

My DRY method is to have a vanilla code generator. Then have a generator that makes the special modifications for an application's requirements. That way, all of the redundant tasks are separate from the special features tasks.

As for modifying an existing application, I suppose it would be just as good to create a generator that creates the changes.... but I would think that would have to integrate into the special feature generator somehow. I'm not real sure on this one.

I think there is a lot of similarity, but what you describe is much more robust. Is it achievable? I think it is. At what cost? Please let me know.
# Posted By shag | 3/26/08 11:36 AM
@Shag,

As I work with resellers, I often have to take on projects to keep the reseller happy that I might not otherwise accept, so while I have plenty of options, I often have to work with some funky requirements. That's why all my DSLs allow for custom code to extend them. This solution wouldn't work for the custom code, but for most clients that is a small part of the project so not a big deal and for the other clients, I inform them that the custom code will make maintenance more expensive, so the trade off is there to make.

As for how easy, quick or cheap to build this system is, I have no idea, but I'll certainly keep everyone posted!
# Posted By Peter Bell | 3/26/08 12:03 PM
This is a great idea. Change control is big in the world of project management with methodologies such as Prince2 and there is often a big disconnect between project management and development when it comes to requirement changes.

In reply to Jeremy French's comments there has been quite a lot of work done on this in the area of Web MVC Frameworks. Ruby on Rails has migrations.
http://wiki.rubyonrails.org/rails/pages/Understand...

The CakePHP MVC Framework has competing suggestions for this. I have to say that I agree with the post linked to below on this subject that YAML makes a much better notation for this than the rather obtuse format of PHP arrays.
http://joelmoss.info/switchboard/blog/2562:Migrations_making_their_way_into_the_CakePHP_core

Clean, well thought out and tested notation is absolutely vital to the success of this whole idea as we humans are not very familiar with analysing data on the deltas of a system. I don't know if YAML is part of the answer, but certainly some notation of the direction and order of changes is important. I think a tool to filter the data would be vital as in a sense what the developer has to do is unit-test and debug the changes not code them.

One thing I will say about this idea is that if you create what you might describe as delta use-cases, i.e. the changes that you can expect then they are THE test cases for the correctness of a given architecture for a system. Within bounds one could say that any system architecture is usable for any requirements set (don't quote me on that though!), it is when the requirements change that the suitability of a given architecture is tested.
# Posted By thatstephen | 3/31/08 11:34 AM
Hello Peter Bell and others. My friend Shag asked me to look at this conversation and I want to try and contribute if I can.

First of all, "how would you like to be able to simply express the changes required to your application and to have the changes in the underlying artifacts implemented automatically?" my answer is "I would like that very much". Throughout my application development career, I have been trying to solve this very problem.

I'm afraid I am not up to speed with the majority of terminology going back and forth, so I don't know how to map what I have done on this problem to what y'all have done. So, if I may briefly describe my current approach, could someone kindly translate, then maybe I can contribute better...

My approach to this problem has been 2 fold, from the "top" and from the "bottom", and I have not met in the "middle" yet. But, am I working on the same problem as you, and does this contribute?

From the top: I use a structured SDLC, including Planning, Analysis, Design, Implementation and Support. (I know this is really out-dated, but it still works in my environment). Each of these stages produces design documents. So, I made a web-based interface to define all of the documents in xml, and then I made code generators to produce database definitions and source code from these documents. (From a practical point of view, there is still "tweaking" to be done, but I am able to generate a lot of code up front).

Now, suppose you take these xml design documents, that implement my SDLC, together with the generated database definition and source code, zip it all together and call it version A. Now, when it's time to make a change, I start at the top with the first design document, and flow down through all the other documents making changes, then re-generate the database and source code, and what you have is version B and the difference between the two is the Total Change for the new feature or bug fix. Because the documents are in xml, I think it is possible to come up with an explicitly defined transformation between version A and B. Is this similar to what you are working on?
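As a small illustration of deriving an explicit transformation from two versions of an xml document, here is a Python sketch using the standard library's xml.etree.ElementTree. The model, object and attribute element names are invented for the example and are not the actual design document format; a real diff would also need to handle renames, relationships and changed types.

```python
import xml.etree.ElementTree as ET
from typing import List, Set, Tuple


def model_attributes(xml_text: str) -> Set[Tuple[str, str]]:
    """Collect (object name, attribute name) pairs from one model version."""
    root = ET.fromstring(xml_text)
    return {
        (obj.get("name"), attr.get("name"))
        for obj in root.iter("object")
        for attr in obj.iter("attribute")
    }


def diff_models(version_a: str, version_b: str) -> List[Tuple[str, str, str]]:
    """Express the change from A to B as a list of add/remove steps."""
    before, after = model_attributes(version_a), model_attributes(version_b)
    added = [("add", obj, name) for obj, name in sorted(after - before)]
    removed = [("remove", obj, name) for obj, name in sorted(before - after)]
    return added + removed


VERSION_A = (
    '<model><object name="Part">'
    '<attribute name="sku"/><attribute name="weight"/>'
    "</object></model>"
)
VERSION_B = (
    '<model><object name="Part">'
    '<attribute name="sku"/><attribute name="color"/>'
    "</object></model>"
)
total_change = diff_models(VERSION_A, VERSION_B)
```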

From the bottom: on my job I extend the functionality of our main ERP with new modules. This means logically extending the ERP object model with new attributes and relationships, independent of the ERP.

I have designed something called "Bmod" or "Business Modeler" which I call an "arbitrarily complex object relational database". This tool allows me to create whatever business model I need through a web interface, independent of the ERP business model, yet dynamically connected to it so that I don't have to store too much redundant data, and so that changes in the ERP automatically flow into my extended modules.

In SQL Server, there are ONLY TWO MAIN TABLES: Objects and Attributes. Through a web interface I define virtual tables, and then behind-the-scenes SQL stored procedures dynamically generate flat tables that match the virtual definitions in the Objects and Attributes. These flat tables are a COPY of the source data housed in the ERP system and the Objects and Attributes, which are then used by client applications.

But if at any time I change one or more virtual table definitions, including how tables are related together, then ALL of the flat tables are automatically regenerated.

That's not all. Only attributes beyond my ERP are stored by Bmod. For example, in the ERP are Parts, and there are X attributes tracked, but I need X + 2 for my custom module, so I define a virtual table in Bmod that links to X attributes in the ERP, but then stores the other 2 in the main Attributes table. Then, when Bmod creates the flat tables, it goes first to the ERP to get the X attributes, then to the internal Attribute table for the extra 2, and generates a new flat table with X + 2.
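The X + 2 flow described above might be sketched like this in Python, with plain dictionaries and tuples standing in for the ERP and the Attributes table. Every name here (build_flat_table, the column names, the tuple layout) is a hypothetical stand-in, not the real Bmod design.

```python
from typing import Dict, List, Tuple


def build_flat_table(
    erp_rows: Dict[int, Dict[str, object]],
    erp_columns: List[str],
    extra_columns: List[str],
    attribute_rows: List[Tuple[int, str, object]],
) -> List[Dict[str, object]]:
    """Regenerate one flat table from a virtual table definition."""
    # Index the generic Attributes rows by (object id, attribute name).
    extras = {(obj_id, name): value for obj_id, name, value in attribute_rows}
    flat: List[Dict[str, object]] = []
    for obj_id in sorted(erp_rows):
        row: Dict[str, object] = {"id": obj_id}
        # First the X attributes that live in the ERP ...
        for col in erp_columns:
            row[col] = erp_rows[obj_id].get(col)
        # ... then the extra ones stored in the Attributes table.
        for col in extra_columns:
            row[col] = extras.get((obj_id, col))
        flat.append(row)
    return flat


erp_parts = {1: {"part_no": "P-1", "weight": 2.5}}
attributes = [(1, "supplier_rating", "A"), (1, "recycled", True)]
flat_parts = build_flat_table(
    erp_parts, ["part_no", "weight"], ["supplier_rating", "recycled"], attributes
)
```

Because the flat table is derived entirely from the definition, changing the list of columns and calling the function again is all a regeneration takes in this sketch.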

I could go on and on about other Bmod details, but for now, is this work I have done related to your overall goal, and how does my SDLC code generator and my Bmod database tool relate to the concepts you have been using?

Thanks!
# Posted By Lance Denton | 4/2/08 4:29 PM
@Lance,

Sounds like you've been working on a very similar kind of problem. You're right that having the requirements in XML makes it fairly straightforward to apply transformations to the requirements to implement the delta in requirements, and I also have a somewhat similar system where I generate code and db tables based on declaratively described business objects, attributes, behaviors, relationships, validations, etc.

Would love to learn more about your system at some time! Are there any conferences you're planning on attending this year?
# Posted By Peter Bell | 4/2/08 4:52 PM
No conferences scheduled. Just lofty goals for pushing monster projects through...

I've been casually looking for a free, yet reliable wiki site to perhaps share the design of the Bmod tool, as I think there is more to it than just solving my current business problems. Any suggestions?

btw - the company I work for does global supply chain management, and we are constantly changing the way we do business to stay competitive, so our business models and applications are always changing. This is the environment where I am working to solve this topic of conversation. So, it's not IT theory to me, but real business survival.

Lance
# Posted By Lance Denton | 4/2/08 5:16 PM
@Lance, Know exactly how that goes. I'm completely at the other end of the spectrum - generating lots of relatively simple web applications, but creating the tooling for generating them all efficiently has required me to go deep into IT theory - just to be able to achieve the business goals (which is what got me into this in the first place as a business owner).
# Posted By Peter Bell | 4/2/08 5:20 PM