By Peter Bell

Approaching Imports

It is funny how you can know the benefits of a technology and yet miss obvious use cases.

I have a number of clients that need to import simple, non-hierarchical data for providing addresses, historic order information and the like. The problem I sometimes have is that the agreed format and the data provided usually don't match . . .

Me: "The customer number is always an integer like we agreed - right?"
Client: "Sure, the customer number is a number."
Me: "Well, the validation is failing and the import isn't working."
Client: "Must be something wrong with your import routine."
Me: "Actually, it is on line 47 of your import file, you have a customer number that isn't an integer."
Client: "Not possible, ALL of our customer numbers are numbers - that's why we call them customer NUMBERS" (clearly getting exasperated with what must be a mentally defective programmer who doesn't get this VERY BASIC concept).
Me: "On line 47, the customer number for Mr. Smith is M1276. That's not a number."
Client: "Oh yeah, well of course - he signed up in our Manhasset store in person, so of course it'd start with an M. But there are only a handful of those - we can make an exception for them, right? Almost all of the customer numbers are numbers. What's the big deal anyway? It works in MAS-90 . . ."

The dialog is made up, but it is a pretty representative sample of what happens when data formats and small business owners (and their consultants) collide. Not a pretty process. The problem for us is that the cost of communication is the main driver of our costs - we can generate code pretty quickly, but a half hour phone conference still takes us 30 minutes like anyone else.

The trick is to agree on the data format (including constraints like data types and maximum lengths) as executable documentation. So what you need is some kind of constraint language, and there are plenty of those around. It would also be nice to use one that lets the customers verify their data with third party tools, so there could be no argument about whether the data was invalid - just a decision for the client to make on whether to modify the constraints or the data.
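To make "executable documentation" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the field names, the `SPEC` table, the CSV layout with a header line, and the `validate` helper are all hypothetical, not the format of any real import routine.

```python
import csv
import io

# Hypothetical agreed format: field name -> (type name, maximum length).
SPEC = {
    "customer_number": ("integer", 10),
    "last_name": ("string", 50),
}


def check_value(value, type_name, max_length):
    """Return a description of the problem, or None if the value is acceptable."""
    if len(value) > max_length:
        return f"longer than {max_length} characters"
    if type_name == "integer" and not (value.isascii() and value.isdigit()):
        return "not an integer"
    return None


def validate(csv_text, spec=SPEC):
    """Check each data row of a CSV that has a header line; return error strings."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # The header is line 1, so data rows start at line 2.
    # (Assumes no quoted field spans more than one physical line.)
    for line_number, row in enumerate(reader, start=2):
        for field, (type_name, max_length) in spec.items():
            value = row.get(field) or ""
            problem = check_value(value, type_name, max_length)
            if problem:
                errors.append(f"line {line_number}: {field} {value!r} is {problem}")
    return errors


sample = "customer_number,last_name\n1275,Jones\nM1276,Smith\n"
for error in validate(sample):
    print(error)  # line 3: customer_number 'M1276' is not an integer
```

The point of the sketch is that the same table serves as the documentation and as the check, so a disagreement about Mr. Smith's customer number becomes a line number and a rule rather than a phone call.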

I just (VERY belatedly) realized that if we agreed on an XML Schema (XSD) as the documentation of an input (or some simpler, more comfortably human readable format we could automagically transform into a well formed Schema), then the clients themselves could test the validity of their data and we could cut down on the amount of communication required. Yes, I know this is pretty much equivalent to having a eureka moment that the Pope is probably a Catholic, but for some reason the neurons never fired, the pathways never connected, and I never "got" that I might be able to do this.

To be fair, I think it was in part due to the well publicized complexity of XSDs. I've heard it suggested they can be a little complex for a programmer that doesn't use XML often, and it is safe to say that in their native form, Bill who runs the local concrete distributorship (who needs the export feature) is not going to be very happy working with them. But by creating a fairly simple syntax (probably with meaningful whitespace) and then either a parser or (with a bit of luck) a set of transformations, I should be able to automagically turn the human readable syntax into XSDs. Clients can then work with the human readable syntax, understand the XSDs well enough to see that they're just expressing the same concepts with more characters and angle brackets, and use third party tools to prove whether or not their data is good.
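As a rough sketch of what that transformation might look like, the Python below turns a made-up, whitespace-based syntax into an XSD using only the standard library. The syntax itself is an assumption for illustration: the first line names the record, and each indented line is `field type [max_length]`, with the length only written on string fields. The `to_xsd` helper is likewise hypothetical, and a real version would need error handling and more types.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialize with the conventional xs: prefix


def xs(tag):
    """Build a namespace-qualified tag name for ElementTree."""
    return f"{{{XS}}}{tag}"


def to_xsd(spec_text):
    """Translate the hypothetical human readable syntax into an XSD string."""
    lines = [line for line in spec_text.splitlines() if line.strip()]
    schema = ET.Element(xs("schema"))
    record = ET.SubElement(schema, xs("element"), name=lines[0].strip())
    complex_type = ET.SubElement(record, xs("complexType"))
    sequence = ET.SubElement(complex_type, xs("sequence"))
    for line in lines[1:]:
        parts = line.split()
        field, type_name = parts[0], parts[1]
        if len(parts) > 2:
            # A maximum length becomes a restriction on the base type.
            element = ET.SubElement(sequence, xs("element"), name=field)
            simple = ET.SubElement(element, xs("simpleType"))
            restriction = ET.SubElement(
                simple, xs("restriction"), base=f"xs:{type_name}"
            )
            ET.SubElement(restriction, xs("maxLength"), value=parts[2])
        else:
            ET.SubElement(sequence, xs("element"), name=field, type=f"xs:{type_name}")
    return ET.tostring(schema, encoding="unicode")


spec = """
customer
    customer_number  integer
    last_name        string 50
"""
print(to_xsd(spec))
```

Three lines of spec expand into a nested `xs:schema` document, which is the "same concepts with more characters and angle brackets" trade-off in miniature. Checking a data file against the generated schema would then be a job for whichever third party XSD validator the client prefers.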

Thoughts?

Comments
Watching as Peter (Archipetees?) runs naked and wet through the agora, shouting "Eureka!" as he makes his way through the crowd. =)

Good catch - and I enjoyed reading the story.
# Posted By Sammy Larbi | 6/14/07 4:33 PM
I think I like the comment more than the story :->
# Posted By Peter Bell | 6/14/07 4:41 PM