By Peter Bell

OO and the DB – Approaches to ORMs

In an ideal world, all of our objects would just exist. No databases, no joins, no discussions over table inheritance. We’d just go to an object and the data we want would be there.

This is not an ideal world.

There is no such thing as the perfect ORM as database access is one of the tasks that has the greatest impact on application performance (this is more than just premature optimization) and every application wants different information in a different way. I want to look in this posting at some broad approaches to retrieving relational data for an OO application and some of the implications of each.

If you were an OO programmer who’d never seen a database (if such a creature existed), you might well decide that ORM is easy. Every time someone calls a getter or setter, just generate the SQL to get or set that particular property (assuming you want to persist the current state rather than waiting for a “save” method call) and you’re done. Fact is, such an ORM WOULD be pretty easy to write. But if you wanted to display 8 attributes for a list of 50 records, you’d also have 400 sequential database calls which might have a small impact on your page performance (think “click” – make tea, drink tea, come back in time to watch page load).

At the other extreme, you might suggest just fully loading an object, doing whatever you want and then persisting everything when a save method is called. Unfortunately, that also has a (usually unacceptable) performance penalty. Let’s say you want to view a customer. But a customer has 200 invoices, each of which has 20 products and a couple of addresses. A customer also has a three hundred employees each of which as a home and work address in a separate address table. This is a pretty simple object, yet returning it naively (assuming n-level deep loading) would require 1 customer query, 200 “products in invoice” queries, 200 “invoice address” queries, and 300 “employee address” queries. Give that you only wanted to check the customers fax number, waiting for 701 queries to be performed doesn’t seem to be ideal.

And thus was born lazy loading. Lots of implementations are available, but a common one is where (in the above case) you’d just run the customer query, and only if your code called MyCustomer.getInvoices() would you run the invoice query. And then only if your code called MyCustomer.getInvoice(17).getAddresses() would your code get the addresses for invoice 17. This is still not ideal as if you wanted the addresses for a bunch of invoices it might be more efficient to write a single query to return addresses for a group of invoices, but it’s kind of like democracy – the least bad system we’ve managed to come up with so far!

However, there are a couple of use cases that I also want to address in my “baby ORM” (just enough code to keep my clients websites working). One is partially loaded objects and the other is accessing aggregates.

Selective Selects
If we go back to the world we knew before OO, we knew that Select * was bad. Even “Select” followed by a list of all of the field names was only to be used if you NEEDED all of the field names. If I want to return a list of 40 page titles for an admin system there is no reason for me to return the HTML for every page so I’d Select PageID,Title, not Select PageID,Title,HTML. I’d like my ORM to do the same, allowing me to select a subset of my attribute name list to return in a given method.

Appropriate Aggregation
Another problem I often have is that I may need to return a list of categories and the number of products in each one. Using an ORM this can often create an n+1 query problem where if I have 12 categories, I need to run 13 queries, making this unacceptably non-performant.

If I was just writing the SQL, I’d just use a subquery in the select statement (warning: this works great in MSSQL, I’ve been told you’re better to use a join in MySQL). It might look something like:

SELECT cat.categoryTitle,cat.CategoryID, ( SELECT count(prod.pordictID) FROM tbl_Product prod WHERE prod.productCategoryID = cat.CategoryID ) AS productCount FROM tbl_Category cat

(my thanks to Al Davidson for the code snippet).

I just think it would be nice for me to be able to pass the name of any property of any associated object and an aggregate operation (avg, count, sum, min, max or valuelist*) to my ORM and get back that field as part of a single query using an autogenerated subquery. So, if a Category has many products, I could just ask for Category.Title and Product.ID.Count() to get my list of category titles along with the number of products in each.

* Yeah, I know valuelist isn't a standard db aggregate, but you can return a comma delimited list of (say) product titles within a category (which is a use case that comes up quite often) using Coalesce in MSSQL.

Related Queries
The above features are cool and I’ll try to fit them in later this week, but the starting point for an ORM is related queries. getProductsinCategory(), getUsersinCompany(), etc. Relationships can be has-one or has-many, may or may not include an intermediate joining table (which may or may not have associated data in that table) and must have some kind of filter defined for the relationship (almost always based on a “foreign key” – whether or not it’s defined as a FK in the db schema).

If you look at related queries, the first question is where should they go? Obviously if I want the products in a category I should go ask the category service (or object). I should call CategoryService.getProductsinCategory(CategoryID) or Category.getProductsinCategory() (which I find a little easier and more elegant but plenty of people dislike). What should it do? I won’t spend a lot of time on this, but I keep my DAOs protected (package access so only UserService or UserObject can call UserDAO and actually I route all calls through #ObjectName#Service) and as much as possible I don’t want the category DAO to have to know about the product table (the aggregates above are the one exception to this for performance purposes), so I basically just get Category.getProductsinCategory() to call ProductService.getProductsinCategory(CategoryID) which calls the ProductDAO that returns the necessary products. Specifically the product DAO returns a recordset, the Product service loads that into an IBO and then returns the IBO to the category (or category service) that requested it.

The really nice thing about this approach is that it just requires a single base DAO method and a single IBO getAssociated() method. The ProductService already has a base getbyFilter (which is all this really is), so without intermediate joining tables I only need to write two methods and then set a bunch of metadata to have about 10-20% of the benefits of a decent ORM. For intermediate joining tables I’ll probably need another method or two.

Comments
Of course, if anyone wants to check out a real ORM (who hasn;t done so already), start with Doug Hughes' Reactor, then compare and contrast Mark Mandells Transfer. Finally have a look at Hibernate in the Java world to see what is probably the most comprehensive ORM around. You could even ask Kurt W. whether cfhibernate is still in play - I've kind of lost touch with that since CF United.
# Posted By Peter Bell | 10/3/06 7:25 AM
And of course Nic Tunneys ObjectBreeze. Not 100% sure if that is still being developed/supported though.
# Posted By Peter Bell | 10/3/06 7:26 AM
BlogCFC was created by Raymond Camden. This blog is running version 5.005.