By Peter Bell

Integrating Legacy and Web Specific Data

I’m working on an interesting project right now where we need to handle a mixture of legacy data from a small accounting system and web specific data that needs to be added to some products using a web based admin. This post looks at a possible approach to the problem. Got any better ideas?! . . .

Extending Products
For this particular application, the authoritative source for all product data is in their accounting system. Unfortunately, the accounting system doesn’t provide a mechanism for managing things like a WYSIWYG product description, so the client wants to be able to upload products from the ERP system while managing a handful of product fields using a web based admin system.

Complete or Incremental
The main type of variability is that a given application might want to use a complete or an incremental upload. With a complete upload, all products are deleted and re-imported, with an incremental upload, only additions, edits and deletes are transferred. In general terms an incremental upload is more efficient (especially with large data sets) but many smaller clients don’t have an easy way to report on which products have and haven’t been edited or added in a given period of time in their ERP systems. That is certainly true for this client so they are requesting a complete upload on a regular basis.

As a side note, the one issue with an incremental system is an appropriate mechanism for notifying of deletes. If you have a “soft delete” system (where deletion is really just the setting of a status or a hide bit), it is simply another type of edit, but with hard deletes the best solution I have come up with to date is to support both a delete list where the user provides a comma delimited list of unique identifiers for the records to delete and an ImportAction column in each import file with 0 for save (insert or update), 1 for insert, 2 for update and 3 for delete. Anyone got a better approach to that?

Handling Extended Attributes
The question then is what to do with extended product attributes that are only managed in the web system. One approach is to break them out into a separate joined table and allow your ORM of choice to handle the join, returning the appropriate object (or you could use a db view). Another is to include the extended attributes in the same table. For an incremental upload there is no particular need for an additional table, but with a true “delete and re-create” process, it is a practical necessity.

Just thinking about this for a while it seems to me more elegant to keep the extended data in the main table (assuming that the only reason it is being handled differently is due to limitations in the accounting package and not because it fundamentally is a separate composed object). Of course, implementing this would mean that we’d have to replace a naive drop and re-import with a pseudo-incremental update by importing the new data into a separate table, comparing it to the current data, creating a list of the necessary additions, edits and deletes and then executing them. While this is quite a bit of work and processing, on the whole data is more expensive than processor cycles so I’m going to propose an approach to implementing pseudo-incremental updates which has the benefit of not disturbing extended data and of allowing for auditing and rollback of “complete” uploads by turning them into a set of discrete loggable, auditable operations.

[update] Just come up with quite a nice pattern for implementing this. Will blog about it shortly. Does anyone else have any ideas about this?

Comments
Peter, we do quite a bit of this type of updates where I work and before I give what I would do, are the fields in the Object imported editable on the web-based system?
# Posted By Phil Duba | 12/15/06 4:06 PM
Most of the time, the imported fields are read only, although in some cases the imported fields do need to support being edited via the website with us exporting those changes to the accounting system via a nightly scheduled file export process.
# Posted By Peter Bell | 12/15/06 5:50 PM
Well, in that case our applications that deal with legacy data or data from another system that's a system of record for that piece of data usually have two types of feeds: one insert/update and one delete/disable. The insert/update feed will always have the legacy applications primary id in it and we store that in some field in our destination table(s). The import tool will look for that field first, and if found perform an update, otherwise, it does an insert. As for the delete/disable, usually our applications will just receive that id and our import tool will set the deleted flag for that set of data.
# Posted By Phil | 12/15/06 11:04 PM
Hi Phil,

Yeah. that's where I'd like to go, but my clients don't have the resources in house to generate such feeds! All they know is what is in the accounting package now - not when changes were made. We both know it isn't that hard to write a logging tool that logs changes, but it is more than they are up for and I don't know the API for their package (or want to learn it), so I just have to have a system capable of taking a full upload and discerning any changes automatically. It isn't actually that bad as long as the number of records isn't crazy and most of my clients have thousands rather than millions of records, so performance isn't too much of an issue.

I'm going to support the two feeds, but also add a layer capable of generating them from a complete update file that I'll just drop in an Import table and query against.
# Posted By Peter Bell | 12/16/06 12:24 PM
One other small distinctoin. To cover all use cases there is an argument for supporting save (insert OR update), insert AND update as explicit insert and update feeds can catch errors and allow for special cases that a "save" feed (which lacks the "is your intent to insert or update" info) may not catch. Not usually an issue, but there are special cases. Blogged about it at:

http://www.pbell.com/index.cfm/2006/11/15/Insert-v...

Thanks again for the comments!
# Posted By Peter Bell | 12/16/06 12:27 PM
BlogCFC was created by Raymond Camden. This blog is running version 5.005.