By Peter Bell

Rethinking the Data Access Layer

There are a number of approaches to implementing a data access layer. You're got procedural tools like Steve Bryants excellent DataMgr and ORM systems like the extremely popular Transfer from Mark Mandell and Reactor from Doug Hughes.

In this short series I want to examine exactly what I need out of a DAL and how I plan to approach solving this problem in a somewhat different way that Steve, Mark or Doug have taken . . .

Speaking in Tongues
What do I need my DAL to do? Well, firstly I need to abstract SQL specific differences between MySQL and MSSQL, so I need it to take declaritive query requirements that can be expressed cross platform and to transform those into SQL specific dialiects if necessary/appropriate.

Get Associations
Secondly, the most common requirement is to get associated has-one or has-many relationships. If a category has-many products, I want to have something like a CategoryService.getAssociatedProducts(CategoryID) for getting all of the products associated to that category.

Cascading Deletes
I also need the ability for deleting composed objects and object specific joins. If I delete product 3, I also want to delete all of the ProducttoCategory joins for product 3. If I delete a user and addresses are composed as opposed to associated in my object model for a given project, I want to delete their addresses (but not associated objects like the company that they work for which could be associated to multiple users).

Manage Joins
I also need my DAL to manage joins, so if I save a product with a ProductCategoryIDList of 2,4,5,6, I probably need to write four records to a ProducttoCategory table as well as saving the ID list (for easier editing) in the Product table. I also need a strategy for rich joining tables - such as a ProducttoAttributeOptions table which not only allows you to associate the colors red, green and blue to a given product, but also to change the weight, price or SKU based on the option picked by an end user.

Magic Fields
I'm also going to be adding some "magic fields" and naming conventions for my own specific use case to keep track of last updated, created date, who edited the record, who owns it and also a bunch of versioning and revisioning features that I happen to need and will wrap either in or (more likely) just above my DAL.

I considered seriously implementing mapping of property names to database column names, but decided that my use case is for green field apps, so while it is fairly easy to implement, it is a layer of complexity I don't need.

Everything I Need - and Nothing More
True ORMs take on the extremely difficult job of automatically mapping between the object and the relational paradigms. With ORMs you can save a fully loaded object and it'll do all of the necessary inserts and updates to all of the affected tables. However (and maybe it is just me), I really haven't come across a use case where I needed the full power of an ORM.

There are three times I feel like an ORM would be useful. When I want to get an object loaded with some or all of its associated objects (I want to display a category and want its associated products and subcategories), when I want to auto populate joining tables (when I save a product and want to save the ProducttoCategory joins), and when I want to automate the clean up of deleting join records and composed objects on deleting a specific object. Other than those three requirements (and the ability to speak different dialects of SQL), I don't find myself needing a full blown ORM. For instance, while I may need to get the addresses associated to a user, when it is time to save them - I can just save them. I don't need to save them as part of the user - I can just insert or update them based on their unique identifier. Loading them into a User object and then User.save()ing is more work than I need to do to just to ensure that AddressUserID is set correctly.

Any thoughts? Are there any other things you find that your ORM does for you that you couldn't easily achieve in another way? (I'm talking about support for query types - caching and other such features are a discussion/addition for another day).

Comments
The requirements you list seem like the basic things an ORM would handle, so I am a little confused on what the difference is between that and a "full-blown ORM". An ORM is after by definition about relating objects data to relational data...it is not actually, at its core, about abstracting your SQL for multiple databases. This to me is a side-benefit of using an ORM that is only made fully realistic with something like Hibernate's HQL, Transfer's TQL and Reactor's OO Queries.

In fact, the not needing to write the CRUD queries is in and of itself not the problem ORMs were meant to solve IMO. They were meant to solve the often complicated code required to translate the relationships in your database into the relationships between your objects - it handles CRUD more or less as a means to achieve this. I mean if all you want is your CRUD handled, you could simply use a code generator right?

My point is, the requirements you listed in your next to last paragraph are *exactly* what ORM is about.
# Posted By Brian Rinaldi | 3/22/07 12:13 PM
Actually, I'm going to disagree with you a little. There is no question that ORMs DO solve those problems, but they also solve a number of other problems for saving fully loaded objects and the like. I'm just pointing out that there are three specific use cases where a DAL of some type is really important to me. Next I'm going to show how I approach solving each of the three problems and what I end up with is something much less flexible or powerful than Transfer or Reactor but that meets my use cases perfectly (actually, one thing I forgot to include was support for aggregate subqueries in the select which neither Transfer or Reactor is terribly focused on but which is a common use case for me).

My point is not that ORMs do this, but that most ORM implementations do *more* than this.

Make sense?
# Posted By Peter Bell | 3/22/07 12:34 PM
Good list. I think that cascading deletes should be implemented in or by the database, not the ORM/DAL. I understand the convenience of doing it in the DAL, particularly when it comes to invalidating object graphs from the cache. This is all fine and good for most use cases; small apps, simple data structures, small data sets. However, in even modestly sized systems, this can give you extremely undesirable performance.
# Posted By Paul | 3/22/07 1:35 PM
Hi Paul,

Good point. I think there are a number of optimizations that can be implemented depending on your use cases. I'm implementing everything fairly naively and can always re-write the templates to generate more within the db layer (e.g. a bunch of triggers), but then you get into potentially db specific SQL generation, so that for me is definitely a phase 2 undertaking - not least because the systems I'm currently working on don't have horribly large cascading deletes or frequent delete actions (it is something an admin does occasionally - not that a user does regularly).
# Posted By Peter Bell | 3/22/07 1:48 PM
Great list!

I couldn't help myself from responding to how well DataMgr meets these needs.
http://steve.coldfusionjournal.com/datamgr_rethink...

I'll say that I think DataMgr is a much better approach than code generation. A code generator still leaves you having to look through a bunch of SQL to perform a simple task and still requires you to write a bunch of trivial SQL for simple changes.

As to handling cascading deletions in the database. This works very well if you know what database you are using and if you always use hard deletes (often a risky move), but if not then you need to handle that in your code somewhere.
# Posted By Steve Bryant | 3/22/07 3:00 PM
Hi Steve,

Definitely a thought that keeps running through my mind!!! Can you remind me - does DataMgr support non-integer ID's such as ProductSKU or UserEmail as unique identifiers and for joins?
# Posted By Peter Bell | 3/22/07 3:53 PM
the biggest problem I see with most ORMs is that they try to over do thing and in the process complicate things to no end.

To me an ORM should do two thing, speed up the mondain task of creating CRUD statements and beans.

I see no need for caching and all the other things that the ORM's out there throw in. It gets very hard to try and use them especially when there is little or no documentation for most of them.

I'm already thinking about borrowing some code from Reactor and creating my own simple basic ORM to distribute.
# Posted By Tony Petruzzi | 3/22/07 3:56 PM
I would add caching to the list. At first glance it seems like an extra layer that isn't intrinsic to the DAL. However, I'd argue that for true ORM, you must be able to rely on object identity and not just key or property equality, and unless your ORM only emits value objects that implies some form of caching.
To put it another way, key equality is a relational concept, so if your ORM leaves you relying on key equality it hasn't quite mapped all the way - it has left you with part of the relational world protruding into your object world.

- Jaime
# Posted By Jaime Metcher | 3/22/07 3:59 PM
Peter,

DataMgr should be able to support a primary key of any data type. I have only tried it with integers and GUIDs myself, but I can't think of any reason why other data types wouldn't work as well (though DataMgr couldn't create primary key values for other data types).

Tony,

Although I haven't played with any ORM yet, Transfer does actually have pretty good documentation.
# Posted By Steve Bryant | 3/22/07 4:09 PM
I'm really hoping to be able to "roll my own" subset of ORM features and to do it sitting on top of the heavy lifting being done by DataMgr as I really like the fact that it is NOT an ORM, so I can wrap my own objects and rules as I want to without actually having to take care of all of the SQL. I also just love the DB simulation and "ActiveSchema" generation of tables on the fly which both would be very helpful in my particular use case (ActiveSchema is probably less useful if you have a DBA who won't let you create or modify tables without them running the script personally).

I'll give it a good go tomorrow and if I'm able to do it, I will definitely blog about how it goes . . .
# Posted By Peter Bell | 3/22/07 4:20 PM
@Jaime, I agree that for a complete ORM, that is a funcitonal requirement, but I don't need an ORM, I just need to write less SQL :->
# Posted By Peter Bell | 3/22/07 4:21 PM
Before I begin, I should say that I haven't used your DataMgr, Steve. I took a quick peak and it looks neat. I do have to admit that I'm a bit incredulous of your performance tests. DataMgr faster than cfquery for selects?? Something is amiss. But I digress...

I've been doing development for quite a long time, and in all of those years, I've only had to migrate an application to a different database platform twice. I understand the 'cool' factor of this feature - but in the real world I don't think it's terribly useful. Granted, I don't distribute any products.

>As to handling cascading deletions in the database. This works very well if you know what
>database you are using and if you always use hard deletes (often a risky move), but if not
>then you need to handle that in your code somewhere.

If I'm using a database, I *always* know what kind it is. I can hardly imagine meeting a CF developer who had no idea what kind of database they were using on a particular project. I just think that's a preposterous qualifier for an argument.

At the risk of being nit-picky...having the database handle cascading delete does not necessarily imply hard deletes or out-of-the-box DBMS functionality (i.e. ON DELETE CASCADE). I agree that in some situations you'll have to handle this in 'your code somewhere.' For me, that code is in stored procedures, not CF. It just as easy to generate SPs as it is CFCs or ad-hoc queries.
# Posted By Paul | 3/22/07 4:23 PM
@Paul,

Sean Corfield asked a question at one of his presos - how many of you have had to move a project from one db platform to another. Believe it or not, about 60% of the hands went up.

However, I don't see that as the main benefit (although now I'm deving on both Mac/MySQL and Parallels/MSSQL it is more important than it used to be). I like the auto joining, the db schema creation and the db sim features. That said I can completely see lots of use cases where none of those would be relevant. No app works for all use cases, but there are definite use cases where DataMgr rocks. Same goes for Transfer or Reactor or ObjectBreeze - although they would be different sets of use cases that may just overlap to some extent.
# Posted By Peter Bell | 3/22/07 4:29 PM
Paul,

Yeah, I couldn't figure out how DataMgr could be faster than cfquery either. After all, it just ends up running cfquery internally. Probably some weird thing where one cfquery just happened to run slow and throw the average off. I certainly don't mean to suggest code will run *faster* with DataMgr.

I will try to put together a good performance test application so that you can see the test code and try it yourself.

I'm sure SPs are easy for cascading logical deletes. For me, I liked that I could quickly put together an application with that feature that could run on any database DataMgr supports. Then if someone wants my application to run on their database, they just have to add support for that database to DataMgr (by adding a new file of about 250 lines of code).

As Peter said, DataMgr isn't the right solution for every problem - just one more tool available.
# Posted By Steve Bryant | 3/22/07 4:41 PM
Peter - 60%! I'm shocked. I'm sure that number would plummet if the survey could reach the non-blog-reading, non-conference-attending coldfusion crowd (which is enormous)...but it's shocking none the less.

Steve - I wholeheartedly agree with you, writing less code is a good thing. Getting a project off the ground quickly is a good thing. DataMgr and the like certainly make people's lives easier, including mine. That being said, I don't think that custom, app-specific, written-from-the-ground DALs are a bad thing. I think I'm shuttling this discussion off it's original topic.
# Posted By Paul | 3/22/07 4:57 PM
@Paul - apart from the scenario of someone migrating their database systems, if you develop applications for distribution you need to provide alternate database platforms. I guess alot of people are in this situation (BLOGCFC is an example)

@peter - I was looking at the Managejoins requirement and I wonder about storing join ids in the product(eg) table as a delimited list. Wouldn't like to manage deleteProductCategory function, although I might be missing something
# Posted By Richard | 3/25/07 11:28 AM
Hi Pete. While you listed DataMgr, Transfer and Reactor, you've overlooked a solution that I personally know does all of these things (with the possible exception of not being entirely certain what is being described in the "Manage Joins" paragraph). And I suppose it could owe a lot to the fact that the ORM solution is a built-in part of the larger onTap framework rather than being offered as a wholly separate API purely for the purpose of ORM. The ORM features in the framework (which I've always described as "sql abstraction") could be refactored as a totally separate system inside of an hour. I could help you with it or maybe just pull it out myself if you're interested in looking it over. Although what is actually an implementation of the Active Record design pattern is described in the framework as a DAO.

I've been using this system to do all these things across Access, SQL Server, MySQL and Oracle for a while now, and with dare I say more intuitive code than Reactor. The guts of CategoryService.getAssociatedProducts for example would be something like getDatasource().getStatement("select").init("ProductTable").join("ProductToCategoryTable").filter("categoryid",arguments.categoryid).execute()
Cascading deletes are handled automatically in the DAO (ActiveRecord) object unless you specifically disable them.

The system is also designed with named-locks based on table names to eliminate the race condition in which one user attempts to update a related record with a foreign key that another user is simultaneously deleting (which normally results in the expected error from the database to maintain data integrity the way the foreign key is supposed to). In the event that the foreign key is not a required field it simply nulls the field, otherwise it throws a custom error that's easier to handle to allow you to determine how to tell the user "hey, we think x got deleted, pick something else". Using the table name and primary key in the lock name ensures that the locks don't appreciably degrade performance.
# Posted By ike | 4/20/07 12:37 PM
Incidentally the onTap implementation also eliminates most or all of the problem Paul is describing in which the cascading deletes wouldn't scale up well. It doesn't load and delete() each object it just creates a query that deletes them all. In some uncommon use cases this may require some additional work to implement the delete for an object, but the DAO is designed with the ability to register listeners for its actions (create, update, delete and read), so you can register a listener on your DAO to perform any special-case cleanup required for your delete while the object is still locked.
# Posted By ike | 4/20/07 12:43 PM
I met Isaac at cf.objetive() last year and have had a look around the onTap framework in the past. The site is down right now, but I will certainly have a look as it is always good to learn from different sources.

That said, I have a system in and working quite nicely and I have support built right into it for optional versioning, rollback, soft deletes, approval required, auditing, rollback and the like, so I think I'll probably keep going down this road. Also, I'm really looking for a consstent tool I can just implement in n-languages down the line. Thanks for the comments though!
# Posted By Peter Bell | 4/20/07 1:02 PM
Ike = Isaac??? If that's you, it was great to meet you and I did enjoy looking around the code base you gave me :-> The way you wrote the comment I thought it was by a third party!
# Posted By Peter Bell | 4/20/07 1:03 PM
Actually part of the way the cascading deletes in the onTap DAO maintain performance as the application scales up is by using the sub-queries you mentioned... Iirc they're fairly simple... here's some pseudocode that should be relatively close (without looking, 'cause I haven't written one of these in a while):

ds = getDatasource();
del = getStatement("delete").init("table");
subqry = getStatement("select").init("table2","fk").filter("col",id);
del.filter("fk",subqry,"in").execute();

This becomes the query:

DELETE FROM table WHERE fk IN (SELECT fk FROM table2 WHERE col = <cfqueryparam value="#id#" cfsqltype="cf_sql_something">)

Which is one of the other things I really like about the system -- it ensures cfqueryparam in every query and I never have to write out the cfqueryparam tags. :)
# Posted By ike | 4/20/07 1:06 PM
Very cool - I'll definitely have to think through approaches to doing that!
# Posted By Peter Bell | 4/20/07 1:13 PM
Hey, yeah, it's me... I just started commenting before I'd finished reading all the comments. :P That's why I commented a couple o' few times. :)

Good to know you've got something that works. :) I honestly have no idea how this would be implemented in for example PHP although I can't imagine it would be terribly different. It has to be an OO language (you don't want to know what this looked like when I was doing it with CF5), but beyond that I guess it would only depend on how the language handles ad hoc sql statements.

Unfortunately I've been having problems with the CrystalTech server the framework site is on -- it's a dog honestly, I didn't realize how lame it was before I took it over. It's an old Celeron machine with only half as much RAM as my notebook! Anyway it goes down a lot and I really want to put the framework site on something more recent and reliable but haven't had the cash. But I'll restart it now that I know it's down again. :) Anyway, it's up now and if it goes down you can get the core framework download here: http://www.fusiontap.com/ontap.zip

I myself am recently trying to find funding for a new commercial project.
# Posted By ike | 4/20/07 1:32 PM
Hi Isaac,

Yep. PHP probably not on the short list. I'd be picking from Java, C#, Ruby and Python.

Sorry to hear about hosting problems. Will download latest code base now.

Best of luck with the new project!
# Posted By Peter Bell | 4/20/07 1:47 PM
Thanks. :) I'm not familiar enough with C# to make any kind of judgements, though I wouldn't think Ruby or Python would be much different or terribly difficult. I wouldn't want to have to do it in Java myself, not that it couldn't be done but merely because Java is so time-consuming (in spite of the success I've had working with it). I will admit that some of the underlying code for it is pretty dense and not always easy to grasp just because it's designed to emulate so much of the language of SQL. But the ultimate goal of it was much like the goals of Ruby / Rails with regard to making it easy to produce platform-independant code and reduce the amount of query-writing. But having emulated much of the language has allowed me to do some really nice things with regard to reducing SQL-writing as well as allowed me to encapsulate some complicated CF-code related to querying, such as statement.andOrFilter("column",form.search) which allows the user to enter and/or keywords in a search form or statement.filterTimeSpan("datecolumn",startdate,enddate) which (assuming I hadn't butchered the syntax) returns all the records where the datecolumn is between the start and end dates. I believe there's even an overlapping filter for finding records that represent overlapping time spans. On the whole it just makes me really happy to have an easy way to reduce my coding footprint, 'cause then I don't have to spend so much time in repetition and it frees me up to work on ergonomics. I imagine from your comment about writing less sql, much the same sort of reason you had for posting the original blog entry. :)
# Posted By ike | 4/20/07 3:44 PM
Sounds nice. Yep. I've been genning my applications since about 2002 - my goal is not to have to write any code at all - although I'm not all the way there yet! I have an old procedural generator that works, but I'm rewriting a full OO gen (for about the twelfth time - I guess practice makes perfect) an I'm just upgrading the way I do ORM.
# Posted By Peter Bell | 4/20/07 3:51 PM
BlogCFC was created by Raymond Camden. This blog is running version 5.005.