Rethinking the Data Access Layer
In this short series I want to examine exactly what I need out of a DAL and how I plan to approach solving this problem in a somewhat different way that Steve, Mark or Doug have taken . . .
Speaking in Tongues
What do I need my DAL to do? Well, firstly I need to abstract SQL specific differences between MySQL and MSSQL, so I need it to take declaritive query requirements that can be expressed cross platform and to transform those into SQL specific dialiects if necessary/appropriate.
Get Associations
Secondly, the most common requirement is to get associated has-one or has-many relationships. If a category has-many products, I want to have something like a CategoryService.getAssociatedProducts(CategoryID) for getting all of the products associated to that category.
Cascading Deletes
I also need the ability for deleting composed objects and object specific joins. If I delete product 3, I also want to delete all of the ProducttoCategory joins for product 3. If I delete a user and addresses are composed as opposed to associated in my object model for a given project, I want to delete their addresses (but not associated objects like the company that they work for which could be associated to multiple users).
Manage Joins
I also need my DAL to manage joins, so if I save a product with a ProductCategoryIDList of 2,4,5,6, I probably need to write four records to a ProducttoCategory table as well as saving the ID list (for easier editing) in the Product table. I also need a strategy for rich joining tables - such as a ProducttoAttributeOptions table which not only allows you to associate the colors red, green and blue to a given product, but also to change the weight, price or SKU based on the option picked by an end user.
Magic Fields
I'm also going to be adding some "magic fields" and naming conventions for my own specific use case to keep track of last updated, created date, who edited the record, who owns it and also a bunch of versioning and revisioning features that I happen to need and will wrap either in or (more likely) just above my DAL.
I considered seriously implementing mapping of property names to database column names, but decided that my use case is for green field apps, so while it is fairly easy to implement, it is a layer of complexity I don't need.
Everything I Need - and Nothing More
True ORMs take on the extremely difficult job of automatically mapping between the object and the relational paradigms. With ORMs you can save a fully loaded object and it'll do all of the necessary inserts and updates to all of the affected tables. However (and maybe it is just me), I really haven't come across a use case where I needed the full power of an ORM.
There are three times I feel like an ORM would be useful. When I want to get an object loaded with some or all of its associated objects (I want to display a category and want its associated products and subcategories), when I want to auto populate joining tables (when I save a product and want to save the ProducttoCategory joins), and when I want to automate the clean up of deleting join records and composed objects on deleting a specific object. Other than those three requirements (and the ability to speak different dialects of SQL), I don't find myself needing a full blown ORM. For instance, while I may need to get the addresses associated to a user, when it is time to save them - I can just save them. I don't need to save them as part of the user - I can just insert or update them based on their unique identifier. Loading them into a User object and then User.save()ing is more work than I need to do to just to ensure that AddressUserID is set correctly.
Any thoughts? Are there any other things you find that your ORM does for you that you couldn't easily achieve in another way? (I'm talking about support for query types - caching and other such features are a discussion/addition for another day).



In fact, the not needing to write the CRUD queries is in and of itself not the problem ORMs were meant to solve IMO. They were meant to solve the often complicated code required to translate the relationships in your database into the relationships between your objects - it handles CRUD more or less as a means to achieve this. I mean if all you want is your CRUD handled, you could simply use a code generator right?
My point is, the requirements you listed in your next to last paragraph are *exactly* what ORM is about.
My point is not that ORMs do this, but that most ORM implementations do *more* than this.
Make sense?
Good point. I think there are a number of optimizations that can be implemented depending on your use cases. I'm implementing everything fairly naively and can always re-write the templates to generate more within the db layer (e.g. a bunch of triggers), but then you get into potentially db specific SQL generation, so that for me is definitely a phase 2 undertaking - not least because the systems I'm currently working on don't have horribly large cascading deletes or frequent delete actions (it is something an admin does occasionally - not that a user does regularly).
I couldn't help myself from responding to how well DataMgr meets these needs.
http://steve.coldfusionjournal.com/datamgr_rethink...
I'll say that I think DataMgr is a much better approach than code generation. A code generator still leaves you having to look through a bunch of SQL to perform a simple task and still requires you to write a bunch of trivial SQL for simple changes.
As to handling cascading deletions in the database. This works very well if you know what database you are using and if you always use hard deletes (often a risky move), but if not then you need to handle that in your code somewhere.
Definitely a thought that keeps running through my mind!!! Can you remind me - does DataMgr support non-integer ID's such as ProductSKU or UserEmail as unique identifiers and for joins?
To me an ORM should do two thing, speed up the mondain task of creating CRUD statements and beans.
I see no need for caching and all the other things that the ORM's out there throw in. It gets very hard to try and use them especially when there is little or no documentation for most of them.
I'm already thinking about borrowing some code from Reactor and creating my own simple basic ORM to distribute.
To put it another way, key equality is a relational concept, so if your ORM leaves you relying on key equality it hasn't quite mapped all the way - it has left you with part of the relational world protruding into your object world.
- Jaime
DataMgr should be able to support a primary key of any data type. I have only tried it with integers and GUIDs myself, but I can't think of any reason why other data types wouldn't work as well (though DataMgr couldn't create primary key values for other data types).
Tony,
Although I haven't played with any ORM yet, Transfer does actually have pretty good documentation.
I'll give it a good go tomorrow and if I'm able to do it, I will definitely blog about how it goes . . .
I've been doing development for quite a long time, and in all of those years, I've only had to migrate an application to a different database platform twice. I understand the 'cool' factor of this feature - but in the real world I don't think it's terribly useful. Granted, I don't distribute any products.
>As to handling cascading deletions in the database. This works very well if you know what
>database you are using and if you always use hard deletes (often a risky move), but if not
>then you need to handle that in your code somewhere.
If I'm using a database, I *always* know what kind it is. I can hardly imagine meeting a CF developer who had no idea what kind of database they were using on a particular project. I just think that's a preposterous qualifier for an argument.
At the risk of being nit-picky...having the database handle cascading delete does not necessarily imply hard deletes or out-of-the-box DBMS functionality (i.e. ON DELETE CASCADE). I agree that in some situations you'll have to handle this in 'your code somewhere.' For me, that code is in stored procedures, not CF. It just as easy to generate SPs as it is CFCs or ad-hoc queries.
Sean Corfield asked a question at one of his presos - how many of you have had to move a project from one db platform to another. Believe it or not, about 60% of the hands went up.
However, I don't see that as the main benefit (although now I'm deving on both Mac/MySQL and Parallels/MSSQL it is more important than it used to be). I like the auto joining, the db schema creation and the db sim features. That said I can completely see lots of use cases where none of those would be relevant. No app works for all use cases, but there are definite use cases where DataMgr rocks. Same goes for Transfer or Reactor or ObjectBreeze - although they would be different sets of use cases that may just overlap to some extent.
Yeah, I couldn't figure out how DataMgr could be faster than cfquery either. After all, it just ends up running cfquery internally. Probably some weird thing where one cfquery just happened to run slow and throw the average off. I certainly don't mean to suggest code will run *faster* with DataMgr.
I will try to put together a good performance test application so that you can see the test code and try it yourself.
I'm sure SPs are easy for cascading logical deletes. For me, I liked that I could quickly put together an application with that feature that could run on any database DataMgr supports. Then if someone wants my application to run on their database, they just have to add support for that database to DataMgr (by adding a new file of about 250 lines of code).
As Peter said, DataMgr isn't the right solution for every problem - just one more tool available.
Steve - I wholeheartedly agree with you, writing less code is a good thing. Getting a project off the ground quickly is a good thing. DataMgr and the like certainly make people's lives easier, including mine. That being said, I don't think that custom, app-specific, written-from-the-ground DALs are a bad thing. I think I'm shuttling this discussion off it's original topic.
@peter - I was looking at the Managejoins requirement and I wonder about storing join ids in the product(eg) table as a delimited list. Wouldn't like to manage deleteProductCategory function, although I might be missing something
I've been using this system to do all these things across Access, SQL Server, MySQL and Oracle for a while now, and with dare I say more intuitive code than Reactor. The guts of CategoryService.getAssociatedProducts for example would be something like getDatasource().getStatement("select").init("ProductTable").join("ProductToCategoryTable").filter("categoryid",arguments.categoryid).execute()
Cascading deletes are handled automatically in the DAO (ActiveRecord) object unless you specifically disable them.
The system is also designed with named-locks based on table names to eliminate the race condition in which one user attempts to update a related record with a foreign key that another user is simultaneously deleting (which normally results in the expected error from the database to maintain data integrity the way the foreign key is supposed to). In the event that the foreign key is not a required field it simply nulls the field, otherwise it throws a custom error that's easier to handle to allow you to determine how to tell the user "hey, we think x got deleted, pick something else". Using the table name and primary key in the lock name ensures that the locks don't appreciably degrade performance.
That said, I have a system in and working quite nicely and I have support built right into it for optional versioning, rollback, soft deletes, approval required, auditing, rollback and the like, so I think I'll probably keep going down this road. Also, I'm really looking for a consstent tool I can just implement in n-languages down the line. Thanks for the comments though!
ds = getDatasource();
del = getStatement("delete").init("table");
subqry = getStatement("select").init("table2","fk").filter("col",id);
del.filter("fk",subqry,"in").execute();
This becomes the query:
DELETE FROM table WHERE fk IN (SELECT fk FROM table2 WHERE col = <cfqueryparam value="#id#" cfsqltype="cf_sql_something">)
Which is one of the other things I really like about the system -- it ensures cfqueryparam in every query and I never have to write out the cfqueryparam tags. :)
Good to know you've got something that works. :) I honestly have no idea how this would be implemented in for example PHP although I can't imagine it would be terribly different. It has to be an OO language (you don't want to know what this looked like when I was doing it with CF5), but beyond that I guess it would only depend on how the language handles ad hoc sql statements.
Unfortunately I've been having problems with the CrystalTech server the framework site is on -- it's a dog honestly, I didn't realize how lame it was before I took it over. It's an old Celeron machine with only half as much RAM as my notebook! Anyway it goes down a lot and I really want to put the framework site on something more recent and reliable but haven't had the cash. But I'll restart it now that I know it's down again. :) Anyway, it's up now and if it goes down you can get the core framework download here: http://www.fusiontap.com/ontap.zip
I myself am recently trying to find funding for a new commercial project.
Yep. PHP probably not on the short list. I'd be picking from Java, C#, Ruby and Python.
Sorry to hear about hosting problems. Will download latest code base now.
Best of luck with the new project!