By Peter Bell

What Should a Persistence Framework Do?

At its simplest, the goal of a persistence framework is to simplify and speed up the process of handling persistence of your applications. What a persistence framework needs to do depends heavily on your use cases. Here are the core problems I would personally like a persistence framework to solve . . .

Write My SQL
I don't want to have to write my select, insert, update and delete statements. I just want to provide a little metadata describing how my objects are persisted and then I want my persistence framework to create the appropriate SQL (whether through pre-generation, runtime generation or runtime code synthesis).
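
As a rough illustration of the runtime-generation option, here is a minimal Python sketch. The metadata layout, the table name tbl_user and the function names are all made up for this example and are not taken from any particular framework.

```python
# Hypothetical metadata: the table, its primary key and its persisted columns.
USER_META = {
    "table": "tbl_user",
    "key": "UserID",
    "columns": ["FirstName", "LastName", "Email"],
}


def select_sql(meta):
    """Build a parameterized SELECT-by-key statement from the metadata."""
    cols = ", ".join([meta["key"]] + meta["columns"])
    return f"SELECT {cols} FROM {meta['table']} WHERE {meta['key']} = ?"


def insert_sql(meta):
    """Build a parameterized INSERT covering every persisted column."""
    cols = ", ".join(meta["columns"])
    marks = ", ".join("?" for _ in meta["columns"])
    return f"INSERT INTO {meta['table']} ({cols}) VALUES ({marks})"


def update_sql(meta):
    """Build a parameterized UPDATE-by-key statement."""
    sets = ", ".join(f"{c} = ?" for c in meta["columns"])
    return f"UPDATE {meta['table']} SET {sets} WHERE {meta['key']} = ?"


def delete_sql(meta):
    """Build a parameterized DELETE-by-key statement."""
    return f"DELETE FROM {meta['table']} WHERE {meta['key']} = ?"
```

The same metadata could just as well drive pre-generation (writing these strings out to files at build time); the sketch only shows that one description of the object is enough to derive all four statements.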

Create Joins Automatically for Has-one Relationships
If a User has-one Boss, I'd like to be able to ask for FirstName, LastName and Boss.Title and have my persistence framework generate the appropriate left-outer-join code for me.
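
One way this could work, sketched in Python. The metadata shape, the table names, the aliases and the BossID foreign key are assumptions made for the example.

```python
# Hypothetical metadata: User has-one Boss, linked through a BossID column.
USER_META = {
    "table": "tbl_user",
    "alias": "u",
    "has_one": {
        "Boss": {"table": "tbl_boss", "alias": "b", "fk": "BossID", "key": "BossID"},
    },
}


def select_with_joins(meta, properties):
    """Turn names like 'Boss.Title' into columns plus the LEFT OUTER JOINs they need."""
    columns, joins = [], []
    for prop in properties:
        if "." in prop:
            rel_name, column = prop.split(".", 1)
            rel = meta["has_one"][rel_name]
            columns.append(f"{rel['alias']}.{column} AS {rel_name}{column}")
            join = (
                f"LEFT OUTER JOIN {rel['table']} {rel['alias']} "
                f"ON {meta['alias']}.{rel['fk']} = {rel['alias']}.{rel['key']}"
            )
            if join not in joins:  # join each related table only once
                joins.append(join)
        else:
            columns.append(f"{meta['alias']}.{prop}")
    sql = f"SELECT {', '.join(columns)} FROM {meta['table']} {meta['alias']}"
    return " ".join([sql] + joins)
```

Asking for ["FirstName", "LastName", "Boss.Title"] produces a single statement with one left outer join, so users without a boss still appear in the list.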

Aggregate Subqueries
If a User has-many Auctions and I want a list of FirstName, LastName and AuctionsTotalWon, I'd like to provide a little bit of metadata describing the has-many aggregate operation (AuctionsTotalWon = Count(AuctionsWon) or TotalAuctionSpend = Sum(AuctionsWon.Price)) and have the persistence framework generate the appropriate SQL to return that in a single statement.
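
A minimal Python sketch of the idea, using correlated subqueries. The table names, the WinnerUserID foreign key and the AuctionID column are assumptions made for the example; only the aggregate names come from the text above.

```python
# Hypothetical aggregate metadata for User has-many Auctions (won).
AGGREGATES = {
    "AuctionsTotalWon": {
        "func": "COUNT", "table": "tbl_auction", "column": "AuctionID", "fk": "WinnerUserID",
    },
    "TotalAuctionSpend": {
        "func": "SUM", "table": "tbl_auction", "column": "Price", "fk": "WinnerUserID",
    },
}


def aggregate_column(name, parent_alias="u", parent_key="UserID"):
    """Render one aggregate as a correlated subquery column."""
    agg = AGGREGATES[name]
    return (
        f"(SELECT {agg['func']}({agg['column']}) FROM {agg['table']} "
        f"WHERE {agg['table']}.{agg['fk']} = {parent_alias}.{parent_key}) AS {name}"
    )


def user_list_sql(aggregate_names):
    """Build a single SELECT returning user columns plus the requested aggregates."""
    cols = ["u.FirstName", "u.LastName"] + [aggregate_column(n) for n in aggregate_names]
    return f"SELECT {', '.join(cols)} FROM tbl_user u"
```

A GROUP BY over a join would be another way to get the same result; subqueries are used here only because each aggregate stays independent of the others.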

Get Associated
There should be a facility for me to ask an object to getAssociated(RelationshipName) and get back an IBO loaded with the associated object (has-one) or 0..* objects (has-many) so I can easily Order.getAssociated("OrderItem") or User.getAssociated("Address") - ideally with support for specializing the filter and overloading the default order by so if I want to get the addresses for a user ordered by three different properties, I don't have to define three different relationships between users and their addresses.
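
To illustrate the filter and order-by overrides, here is a small Python sketch that only builds the query; loading the result into an IBO is left out. The relationship registry, table name and column names are hypothetical.

```python
# Hypothetical relationship registry: one User -> Address relationship
# with a default ordering that callers may override.
RELATIONSHIPS = {
    ("User", "Address"): {"table": "tbl_address", "fk": "UserID", "order_by": "AddressID"},
}


def get_associated_sql(owner, relationship, extra_filter=None, order_by=None):
    """Build the query for owner.getAssociated(relationship).

    extra_filter and order_by are trusted SQL fragments written by the
    developer, never raw user input.
    """
    rel = RELATIONSHIPS[(owner, relationship)]
    where = f"{rel['fk']} = ?"
    if extra_filter:
        where += f" AND ({extra_filter})"
    ordering = order_by or rel["order_by"]
    return f"SELECT * FROM {rel['table']} WHERE {where} ORDER BY {ordering}"
```

With this shape, three different orderings of a user's addresses are three calls against the same relationship rather than three relationship definitions.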

Support Paging
I need all list queries to be able to accept a page number and number of records per page so that I can easily get paged lists.
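
The arithmetic is simple enough to sketch in Python. This version assumes a database that accepts the LIMIT ... OFFSET syntax (SQLite, MySQL and PostgreSQL do; other databases need a different clause), and the function names are made up for the example.

```python
def page_clause(page, per_page):
    """Return (sql_suffix, params) for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be positive")
    offset = (page - 1) * per_page  # rows to skip before this page starts
    return "LIMIT ? OFFSET ?", (per_page, offset)


def paged(sql, page, per_page):
    """Append the paging clause to an existing list query."""
    suffix, params = page_clause(page, per_page)
    return f"{sql} {suffix}", params
```

Paging is only stable if the list query has a deterministic ORDER BY, so a framework would probably want to insist on one.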

Allow for Versioning
I tend to build version and soft-delete capable CMSs a lot of the time, so I need to be able to simply describe my versioning and delete rules in one place and have those applied to all of the objects implementing that versioning/deletion strategy.
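
A Python sketch of the "describe it in one place" idea, covering only the soft-delete half (versioning would follow the same pattern). The strategy names, object names and the IsDeleted column are assumptions made for the example.

```python
# Hypothetical strategies, defined once and shared by every object that names them.
STRATEGIES = {
    "hard": {"soft_delete_column": None},
    "soft": {"soft_delete_column": "IsDeleted"},
}

# Hypothetical object metadata: each object just names its strategy.
OBJECTS = {
    "Page": {"table": "tbl_page", "key": "PageID", "strategy": "soft"},
    "LogEntry": {"table": "tbl_log", "key": "LogID", "strategy": "hard"},
}


def delete_sql(object_name):
    """A soft-delete object is flagged; a hard-delete object is removed."""
    meta = OBJECTS[object_name]
    flag = STRATEGIES[meta["strategy"]]["soft_delete_column"]
    if flag:
        return f"UPDATE {meta['table']} SET {flag} = 1 WHERE {meta['key']} = ?"
    return f"DELETE FROM {meta['table']} WHERE {meta['key']} = ?"


def list_sql(object_name):
    """List queries for soft-delete objects automatically hide flagged rows."""
    meta = OBJECTS[object_name]
    flag = STRATEGIES[meta["strategy"]]["soft_delete_column"]
    sql = f"SELECT * FROM {meta['table']}"
    return f"{sql} WHERE {flag} = 0" if flag else sql
```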

Cascading Deletes
If I delete an object, any composed (as opposed to associated) sub-objects or joins should also be deleted automatically. For example, if an Order has-many OrderItem (composed), deleting the order would delete the related OrderItems. If a User has-many Roles (associated - a security role may be shared by more than one User), the Roles wouldn't be deleted with the User. In the case of a many-many (Product many-to-many Category), if I delete a Product, it should delete the Product and the ProducttoCategory joining records, but not the Category.
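
The three cases can be sketched in Python as a function that returns the DELETE statements to run, children first. The relationship metadata and the tbl_ table names are assumptions made for the example.

```python
# Hypothetical relationship metadata, keyed by the object being deleted.
RELATIONSHIPS = {
    "Order": [{"kind": "composed", "table": "tbl_orderitem", "fk": "OrderID"}],
    "User": [{"kind": "associated", "table": "tbl_role", "fk": "UserID"}],
    "Product": [
        {"kind": "many_to_many", "join_table": "tbl_producttocategory", "fk": "ProductID"},
    ],
}


def delete_statements(object_name, table, key):
    """Return the DELETE statements for one object, dependent rows first."""
    statements = []
    for rel in RELATIONSHIPS.get(object_name, []):
        if rel["kind"] == "composed":
            # Composed children cannot outlive their parent.
            statements.append(f"DELETE FROM {rel['table']} WHERE {rel['fk']} = ?")
        elif rel["kind"] == "many_to_many":
            # Only the joining rows go; the object on the far side stays.
            statements.append(f"DELETE FROM {rel['join_table']} WHERE {rel['fk']} = ?")
        # "associated" objects are shared, so nothing is deleted for them.
    statements.append(f"DELETE FROM {table} WHERE {key} = ?")
    return statements
```

All statements for one delete take the same key value as their single parameter and would normally run inside one transaction.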

Nice to Have's
In addition to the above items which for me are requirements, there are a number of other things that might be nice from a persistence framework:

Handle Column Aliases
This isn't a biggie for me, but if you're working with legacy db's it might be nice to be able to refer to the FNAMEVCHAR150 column as FirstName instead.
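
A tiny Python sketch of the mapping. FNAMEVCHAR150 comes from the example above; LNAMEVCHAR150, the table name and the function name are invented for illustration.

```python
# Property name -> legacy column name. Properties not listed map to themselves.
ALIASES = {
    "FirstName": "FNAMEVCHAR150",
    "LastName": "LNAMEVCHAR150",  # hypothetical legacy column
}


def aliased_select(table, properties):
    """Select legacy columns under their friendly property names."""
    cols = []
    for prop in properties:
        column = ALIASES.get(prop)
        cols.append(f"{column} AS {prop}" if column else prop)
    return f"SELECT {', '.join(cols)} FROM {table}"
```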

Support Inheritance
The ideal system would support single table per class hierarchy, table per class (joins) and table per concrete class inheritance strategies. At the very least it needs to support single table inheritance with a boolean field for each concrete class so a record can be of more than one type (a company might be both a vendor and a customer).
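
The boolean-field variant of single table inheritance might look like this in Python. The tbl_company table and the IsVendor and IsCustomer columns are assumptions made for the example.

```python
# Hypothetical mapping from concrete class name to its boolean flag column.
TYPE_FLAGS = {"Vendor": "IsVendor", "Customer": "IsCustomer"}


def list_sql(type_name):
    """List every company row that counts as the given concrete type."""
    return f"SELECT * FROM tbl_company WHERE {TYPE_FLAGS[type_name]} = 1"


def types_of(row):
    """Return every concrete type a row belongs to; it can be more than one."""
    return [name for name, flag in TYPE_FLAGS.items() if row.get(flag)]
```

A single discriminator column could not express the "both a vendor and a customer" case, which is the reason for one flag per concrete class.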

Value Objects
Value objects (those that are defined only by their values and that don't have an identity per se) are often best composed into other tables. If an Order has a BillingAddress which is a value object (it's a simple commerce system without an address book feature and you don't want it to have a lifespan or editing profile different from that of the order), you want to be able to persist the BillingAddress within the Order table.
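
A Python sketch of flattening a value object into its parent's row and rebuilding it on load. The Address fields and the BillingAddress column prefix are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Address:
    """A value object: no identity of its own, equal when its values are equal."""
    Street: str
    City: str
    PostalCode: str


@dataclass
class Order:
    OrderID: int
    BillingAddress: Address


def to_row(order):
    """Flatten the order and its billing address into one Order-table row."""
    row = {"OrderID": order.OrderID}
    for f in fields(order.BillingAddress):
        row[f"BillingAddress{f.name}"] = getattr(order.BillingAddress, f.name)
    return row


def from_row(row):
    """Rebuild the order and its embedded address from a single row."""
    address = Address(**{f.name: row[f"BillingAddress{f.name}"] for f in fields(Address)})
    return Order(OrderID=row["OrderID"], BillingAddress=address)
```

Because the address lives in the Order row, it is saved, loaded and deleted with the order and never needs a table or key of its own.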

Provide Caching
Some kind of caching mechanism for improving performance would be useful.

Not So Sure's
There are also some features that I've not had much of a need for to date.

Cascading Saves
I don't often find myself persisting an object graph. Usually I'm saving an object (its direct properties) or adding, editing or deleting one or more related objects. The only time I would find a cascading save useful would be if I had value objects that I wanted to compose within the same row as their parent (e.g. BillingAddress within Order in a simple commerce system where there was no Address book feature).

What else do you want out of a persistence framework? What do you find in such frameworks that you don't need? Input appreciated.

Comments
Personally, I think the phrase "persistence framework" has it backwards -- it's really a de-persistence framework.

Take a step back and look at the low level of adoption of object databases and even technologies such as object serialization in Java. The deadliest aspect of the "impedance mismatch" between object systems and relational databases is really a difference between the kind of data structures that can live in memory in an application and the kind of persistent data structures that last a long time on disk and will need to be gradually migrated over time to reflect changing business realities.

The strategy that works best in the long term is to start with the design of the persistent data structures and work outward: so it's more of a matter of extending the domain of SQL into object systems rather than extending object systems into the database.

Minimizing configuration is important. I've developed a "passive record" in PHP that configures itself by introspecting the database. When I'm doing rapid development projects, I just don't have time to maintain multiple artifacts to describe the database structure. The experience of working with symfony plug-ins, in particular, shows how ORM configuration files don't scale.

I've been thinking about Microsoft's experience with LINQ, which uses some really clever ideas to integrate SQL queries with languages such as C# and VB. LINQ is successful because it's ducked the hard questions which may not be answerable: it supports a simple 1-1 mapping between object structures and database structures and does it with flair. On the other hand, developers are cold on the Entity Framework, which is trying to tackle the general ORM problem -- ultimately because that problem doesn't have a solution, or rather, beyond a certain level of complexity, ORM systems have a way of introducing more problems than they solve.
# Posted By Paul Houle | 7/14/08 10:59 AM
@Paul, Always nice to have a reasoned, thoughtful comment I can completely disagree with :-)

I also deal with RAD development of systems, but do so by describing the domain model and using that (with some annotations) to generate the db schema as well as the code to interact with it. I find the object model way more useful for developing non-trivial applications than starting with a schema.

I've seen Hibernate used in some pretty large scale systems. How specifically do you find that it doesn't scale?
# Posted By Peter Bell | 7/14/08 11:16 AM
Peter,

Have you checked out my Groovy/Hibernate project for CF? http://www.barneyb.com/barneyblog/projects/cf-groo...? It handles every item on your list, and while it doesn't let you persist CFCs, if you squint your eyes, Groovy classes sure look like CFCs written in CFSCRIPT. Plus they don't have a lot of the syntactic problems that CFML has.

Regarding Paul's comment of eliminating duplication, Hibernate supports that very well, I think. You have your entities, you have your mapping, and you have your schema. If they're the same, you only need entities, and Hibernate will figure out the mappings and even build your schema for you. So you only need use what you actually need.
# Posted By Barney | 7/14/08 12:20 PM
@Barney, Most of what I'm doing here is reviewing the latest "Java Persistence with Hibernate" book and I'm trying to identify which of the features of Hibernate I need along with which ones I don't. Then I'll lock the next version of my persistence API. From there I'll then determine the best way to implement the API. Hibernate is definitely on the short list, although having written a simple data mapper that does much of what I need, my own simple little framework might also be an option.

I like that Hibernate/nHibernate is available in most languages as eventually I need to support n-3gl's with my solution, but I'll reserve the decision on implementation until I've locked my API and then I might do a quick spike both ways to see what's quickest/easiest.

I have definitely been watching your Groovy/Hibernate work and will no doubt be pinging you with questions about that as it looks great and Groovy was on my short list anyway! Interestingly I'm not seeing the adoption of Groovy in the Java world that I'd expected (and the cool kids are already onto Scala :-) ) but I like the promise of what it offers, so I'll definitely give your stuff a play!
# Posted By Peter Bell | 7/14/08 12:28 PM
@Paul: I'm right there with you in regards to starting with the DB and working outward. Sorry Peter!

I've created a homebrewed CF ORM that does database introspection to generate objects on the fly, with all the column names from the associated db table as properties all set up. It also uses Peter's awesome IBO methodology to step around CF's object instantiation performance penalty.

The huge bonus is that I can drop this ORM into any project that already has a db established, and get up and running creating, manipulating and persisting objects right away. And if I add a column to a table, I just need to refresh the application and I have the property available to me without any additional changes.

Peter's approach would be great if you were starting from scratch in a project, i.e. no database existed yet, but I'm not often in that situation.
# Posted By Josh Nathanson | 7/14/08 3:38 PM
@Josh, I don't do a lot of brown field development (existing db), but if I did, my answer would be a one time import from the db schema to provide a model for the application that I could then add richness to. For all the reasons I've mentioned before, I don't think the db is a good source of metadata - it only knows that you have a varchar(9) when really it's an SSN, so it isn't smart enough to generate a rich admin UI, rich validations and the like as the metadata just isn't specific enough. I find I can save a LOT of coding by spending a little more time annotating my model with richer metadata about the intent of properties, relationships, validations, etc. If your workflow was based on changing fields in the db, you'd have to write a script to capture deltas in the db schema, but personally, I'd rather describe changes in the model, introspect the db to see what is different and then either transform the db schema or generate the SQL for the migration to give to your dba.
# Posted By Peter Bell | 7/14/08 3:43 PM
Well being in the position of having created an ORM myself, dating all the way back to CF5, I'm not sure how helpful my input would be, given its obvious bias. I will say however that the tools that have become DataFaucet recently do and have done most of these things for a long while. At a minimum I'm going to add the paging feature you described as a "must have", not because you described it that way but simply because it makes sense given how common that is in general web development. I may also add some features for handling composed objects like the BillingAddress you described using a varchar or clob column and an XML packet (although it's hard to imagine not wanting a separate table / object if the data is going to be long enough to want a clob).
# Posted By ike | 7/14/08 9:55 PM
"I don't think the db is a good source of metadata - it only know that you have a varchar(9) when really it's a SSN so it isn't smart enough to generate a rich admin UI, rich validations and the like as the metadata just isn't specific enough."

I have to agree with Peter on this. I like to import the important metadata from the database and then add the richness to it as well. I have a generator that I've been perfecting along the way, making it smarter with each revision so that it can generate more of the work. At the moment, the most time consuming task is selecting validation types (server and client side) and a few other config items that I like to have in place.

@Peter, I'm glad to see I have a similar approach to what you described in the last comment. I remember asking you about this on IM regarding storing the metadata in the db and it has been a great help!
# Posted By Hatem Jaber | 7/14/08 10:00 PM
When I said that ORM configuration files "don't scale", I was talking about a specific problem that turns up in the symfony framework and other places where there are multiple modules created and managed by different entities.

The configuration management problems involved are a bear: to add one column to a table I had to fork the project for a plug-in, and I needed to go through a complex and error-prone procedure to make sure that the database, models and everything else were synchronized.

I'm obsessive about having the ability to maintain multiple development, test and production servers (Requirement 0). To support that, I need to be able to migrate the system between versions by a procedure that's almost entirely automated. I generally do that with Ruby-influenced migration files: these are generally SQL scripts, but they can be programs written in another language if something can't be easily expressed in SQL.

I see configuration management as the most basic technology underlying a web system. I find that open-source and commercial web applications and frameworks are often sorely deficient in that area. For instance, I really should be able to make a clone of my WordPress blog at another URL in just two minutes, just by copying the files, copying the database, and changing:

1) the database name
2) the location of files on the disk
3) the root url

Instead, WordPress encodes full URL paths throughout the database, so you need to do a set of undocumented operations on the database to do this kind of migration.

If any web framework makes it hard to do CM the way I want, it can jump in the lake for all I care.
# Posted By Paul Houle | 7/15/08 10:12 AM