By Peter Bell

Ideas on Performant, DB Agnostic Export Solutions?

Let's say you have tens of thousands of records that you need to be able to export from a db.

Ideally you'd like to write a db agnostic script (so not relying on MSSQL DTS or any other DB specific technology) for handling fairly large exports (the kind that are going to time out if you using a simple cfquery/cfoutput/cffile solution).

Any thoughts on how best to approach this? I need to do something that'll work for MSSQL2000, MSSQL2005, MySQL5 and ideally (but optionally) for PostGres.

Any thoughts appreciated!

Comments
I'd pay the 13$ and purchase http://www.sqlscriptbuilder.com/

I've reviewed this tool before.

http://www.nodans.com/index.cfm/2007/5/21/A-tool-t...



DW
# Posted By Dan Wilson | 1/17/08 4:55 PM
Hi Dan,

Thanks. Looks like a great tool for db migration, so I'll definitely check it out for that!

Here I'm looking for a way to export a specific table automatically to a csv or excel file so when client clicks button on the website it exports the data to file and then provides them with a link to download the file. Obviously as the number of records grows, just running a cfquery/cfoutput/cffile is becoming an issue . . .
# Posted By Peter Bell | 1/17/08 5:18 PM
i remembered someone writing about this. after a couple of searches, i remembered it was ray camden. here is the link:

http://www.coldfusionjedi.com/index.cfm/2007/11/28...

there are comments about large db's. i know in cf7, there is a limit on how many loops cfoutput would make (somewhere around 10-15K). not sure how timeouts and such work if you were to just run a cfc job, but i could see a query that take a count and loops through several iterations of smaller queries.

another tool that allows you to run from a command line is aquafold, and it will dump to csv or excel and other cool things. they cross platform/db. anything you can connect with odbc/jdbc, linux, unix, mac-os, windows.. all around killer tool.

hope this helps
# Posted By shag | 1/17/08 7:19 PM
Hmmm...

select * from table

Shouldn't be too bad with 15k rows, unless there is a bunch of nasty joining going on, or perhaps filtering on non-indexed data. But what about having a view? Or maybe the problem is elsewhere? (are you missing a cfsetting tag?)

If it is a problem, how about spanning a new process that does it, writes the file, and then alerts the system which could then alert the user?
# Posted By Sammy Larbi | 1/17/08 10:42 PM
The problem seems to be the cfloop which seems to die at around 10-15k records irrespective of setting the requesttimeout to 0. That was the issue I was running into . . .
# Posted By Peter Bell | 1/17/08 11:12 PM
Could you use the dbinfo bits in CF8 (or a bit of manual config) to locate the PK column, and then batch your loop up into blocks of 500 or something, appending to the export file ?
# Posted By Tom Chiverton | 1/18/08 5:47 AM
I've not set the timeout to 0, but instead to something ridculous like 50000.

What about the separate process, something running in the background?
# Posted By Sammy Larbi | 1/18/08 8:29 AM
# Posted By Aaron Longnion | 1/18/08 9:59 AM
Perl.
# Posted By Zach | 1/18/08 2:36 PM
Peter, I wonder if Steve Bryant's DataMgr component would suit your needs?
http://datamgr.riaforge.org/
# Posted By Tom Mollerus | 1/21/08 4:21 PM
@Tom, That'd be the backup if I couldn't do anything nicer.

@Sam, tmeout 0 is equivalent of "don't time out". Problem appears to be a limit on looping throuygh records in CF7 of 10-15,000 records.

@Aaron, thanks for the link, although that doesn't appear to avoid the core problem of looping through lots of records in CF.

@Zac, don't think I hadn't thought of it! I don't really think a web scipting language is optimized to do this kind of work so while I'm not a big Perl fan, I was considering firing off a Python or Ruby script - it's still on my list of possibles.

@Tom, I'd think Steve's component would run into the same issues as I doubt it's optimized for exporting large data sets, but I do generally like DataMgr and while I haven't used it for a real project to date, it's always on the short list of solutions I evaluate when looking at any given project.
# Posted By Peter Bell | 1/21/08 6:43 PM
Peter - I know 0 is the equivalent of don't time out... what I was wondering is if CF behaves differently in your case for "don't time out" vs. "time out after a really long time."

Was just wondering is all.

That said, I'd probably still rather fire off an external process that could notify when complete. (or maybe just an external process that was quite a bit quicker).
# Posted By Sammy Larbi | 1/21/08 7:04 PM
@Sam, yeah, have tried both ways in the past. Seems like the 0 is fine, but the cfoutput is an issue.

You got a favorite language for doing this kind of work?
# Posted By Peter Bell | 1/21/08 7:25 PM
No real favorite - just whatever works going from simplest to most complex (or what I believe to be that order).

First thing I'd do is look to see if the SQL server (small "s") had an API that I could use to say "generateXLSFromTable()" (first looking to load via COM in CF, otherwise via Java or C#).

If that didn't exist, I'd give Active Record in Ruby a shot first, since it wouldn't be more than a few lines (or equivalent in other scripting language - I just happen to like Ruby).

If that was still slow, I'd try Ruby without ActiveRecord, or maybe JRuby, but you'd probably like to use a quick-started version of it since it is slow to load compared to run (I forget what it's called, but could probably find it with some searching).

If that was still too slow, I'd drop down and do it in Java or C# "by hand".

If that was still horrendously slow, I'd figure out a way to spawn a thread and notify the user when it was complete. It wouldn't be too difficult. At worst you could update a value in a DB or just check the date on the file, and when it matched what it should be (checking every so often via AJAX/CF), then notify the user.

That starts at the simplest and moves forward in increasing complexity, and I prefer that path. Some people would rather start at the most complex thing they know will work just to save the hassle on the rest.
# Posted By Sammy Larbi | 1/21/08 9:13 PM
Oh, in the last option I assumed a single-user system. If there is a possibility of two people running reports, you'd want to do some synchronization. Probably telling it to name the file as a GUID and storing that in an array in the session would be good enough.
# Posted By Sammy Larbi | 1/21/08 9:16 PM
Something that I've done in the past to deal with very large recordsets is to have the request deal with small blocks of the large recordset, then redirect to itself while passing the current startrow in the URL. Each iteration could append the results of the export to an external file.
# Posted By Tom Mollerus | 1/21/08 9:17 PM
@tom, i was thinking something similar when reading sammy's update.

can you take a count and divide it into 10k records and append the same file until you have all records?

is this to be in csv or something? does cfreport allow >10-15k iterations? not familiar with the report writer.
# Posted By shag | 1/21/08 11:57 PM
BlogCFC was created by Raymond Camden. This blog is running version 5.005.