By Peter Bell

ORM: Approaches to Inheritance

Inheritance is one of the core concepts in OO programming, but supporting inheritance in your relational database requires you to make some tough decisions. Implementing an ORM requires support for some or all of those decisions.

Lets say you have a Company abstract class which is extended by concrete Customer and Vendor classes. This means that you have customers and vendors which share certain “company” attributes, but there are no companies in your systems that are not either customers or vendors (or possibly both). How do you store (and retrieve) information for customers and vendors?

There are three solutions to this problem: one table per class, one table per concrete class, and one table per abstract class/class hierarchy. Each has strengths and weaknesses.

One Table Per Class
In this case, you would have a company table for all of the shared properties, a customer table for all extended customer properties and a vendor table for all extended vendor properties. This is an extremely common solution and is often the best way to go. The downside is that to access a customer or vendor, you need to join two tables, so there is a performance hit and if you’re not using an ORM, all of your queries are a little more difficult to write. However, it keeps all of the tables normalized and allows a single company to be both a customer and a vendor. Usually, you distinguish whether a company is a customer and/or a vendor using a simple inner join (as opposed to a left outer join from the company to the customer or vendor tables which would return all companies). The theory is that if a company is a vendor, it’ll have a record in the vendor table so it’ll be returned as part of the inner join and if it doesn’t have such a record it won’t.

One Table Per Abstract Class
This is the opposite of the first approach. In this case you just have a single table for customers and vendors. Your dba will probably commit seppuku before allowing this, however as it can provide for highly denormalized tables. If there are 10 company properties, 10 customer properties and 10 vendor properties, any company that is only a customer or a vendor will have 10 empty fields in the database table which everyone learnt in database design 101 is a really bad thing. To make it worse, you also add an additional bit field for each concrete class, so you have a Vendor bit and a Customer bit so you can easily return all vendors and/or all customers in a query.

One Table Per Concrete Class
In this case you simply have a customer table and a vendor table and put your customers in one and vendors in the other. If there is no overlap between customers or vendors this is the most performant solution, but it is seldom used as there is usually at least the potential of an overlap between the sets which them makes this a messy approach with a real chance of data getting out of synch.

Which to Choose?
If you are absolutely sure there will never be any overlap in membership between concrete classes (a customer will never be a vendor), one table per concrete class is both normalized and performant and is the ideal choice. In practice this is almost never the case.

If you don’t have too many records and want to keep your SQL simple, it is often expedient to have just one table (one table per abstract class/class hierarchy). It is messy, tacky and really works quite well! If you have a large number of records, having to search through all of the records for each query can become a problem, and if you have a lot of concrete classes this becomes impractical as you might have hundreds of unused fields in your base class, but where you can get away with it, this is often more performant and easier to code that the one table per class approach.

If you do have a very large number of records or lots of concrete classes (or a dba who won’t allow you to commit sacrilege in their database), one table per class is often the approach you’ll end up using. In practice I think an ORM must support one table per abstract class/class hierarchy and one table per class. The one table per concrete class is a “nice to have” rather than an essential for most use cases.

Comments
I propose a compromise with the DBA -- one table per class, but put the join in a view so as to simplify the queries.
# Posted By chuck | 12/4/06 5:01 PM
Hi Chuck,

It is definitely an approach, but bear in mind you can't edit and insert into views so you need to have or write an ORM that handles the joins on insert and update.

More importantly, with deep inheritance trees and related objects you can get real performance issues. Imagine getting a list of all of the construction projects an admin user manages where AdminUser extends SiteUser extends User and ConstructionProject extends BusnessProject extends Project. This is a bad example, but real world situations like this come up all the time in complex apps. I've just got a six table join for a simple query!

One table per class has its place, but there is no perfect relational solution to class hierarchies - it's a matter of compromises!
# Posted By Peter Bell | 12/4/06 6:30 PM
BlogCFC was created by Raymond Camden. This blog is running version 5.005.