Is Search Really Easier in Ruby than in CF?
Sami Hoda, amongst others, was using Lucene, and while it wasn't difficult, it didn't seem trivial either (check out the comments, which include Sami's code and a link to a CFDJ article).
To add insult to injury, I just read this post about how easy it is to integrate Lucene with Ruby. There are many words I associate with Ruby. Elegant, fun, terse and "unproven but promising" all come to mind (and I'll admit the last one is getting better, but if I needed 24x7 I'd still sleep better with solid Java middleware instead).
But easier? Than ColdFusion? The language that brought us cfquery, cffile and cfsearch?!
To be fair, the article doesn't even MENTION spidering, so I think the simplicity is being overplayed, but it did hit a raw nerve. Why is spidered full-text search so (relatively) difficult and (relatively) undocumented in CF? Are there really so few people with that use case? I just never thought of spidered full-text search as an esoteric requirement or an edge case...
I'll be putting together some kind of simple tool for automating the vspider stuff, but if we're talking wishlist, reliable vspider support in the CF Admin ranks a lot higher for me than interfaces and nulls!


I know you didn't ask, but...
The CFSEARCH implementation in BlueDragon is based on Lucene, and includes the ability to spider web sites. See Section 4.2.10 of the BD CFML Enhancements Guide:
http://www.newatlanta.com/products/bluedragon/self...
BD 7.0 adds support for Word and PDF docs, and for multiple languages (BD 6.2 only supports text documents in English).
And, yes, all of this is included in all BD editions, including BD.NET and the free BD Server edition.
Cheers,
Vince
You are right that I didn't ask, but I'm glad that you pointed it out. BD is an important part of the community (as are you) and it's great to get a BD take on all such issues.
Agreed 100%. The question is "why"?! The use cases for document-based search and spider-based search are completely different. Almost every site I build wants spidered search, and some also want content-specific document (well, usually database) searches.
One of these is handled beautifully and the other - well, the other isn't yet handled beautifully!!!
I always ran into problems where I'd see CF code in my search results, plus the classic: how do I exclude directories, files, etc. from the results? It would work, but it was never 100%.
That's when I started looking at vspider, and it turned out to be an elegant solution, AFTER I struggled to get it working. vspider is such a powerful tool, and as Peter said, I'm really surprised more people don't use it... Maybe in CF9?
Agreed, I had exactly the same problems. Especially as people often expect a Google-like, page-based search where groups of content have a context. The file-based approach just doesn't work for that when you have multiple independent content areas that depend on the "page" you are on. The same article may display in different parts of the site with different ancillary content, and even different formatting, depending on the context of the page.
The pages don't really exist so you can't just search the html files, and the underlying content items don't map directly to a single page so you can't replicate page context using Verity against the db.
There are good use cases for document-based (not spidered) Verity search, but I actually come across those less often than clients who just want a spidered site search.
As Jim says, maybe CF9, although I will try to package up a little utility/generator tool for creating and running the appropriate commands, kind of like a vspider.cfc that handles all of the grunt work. When I do, if it seems valuable enough, I'll post it on RIAForge.
I'm guessing I'll also need cfexecute for the bat files I generate, and I'll have to tie into some kind of scheduling mechanism. I'll probably just create a single scheduled task for a vspider.cfm that then runs all of the bat files in a given directory, or something like that. It should only be an evening's worth of messing around as long as I don't run into any silly stuff. I'll try to drop it in over the weekend or next week.
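For what it's worth, here's a rough sketch of the "run every bat file in a directory" step that the scheduled vspider.cfm would do. It's in Python rather than CFML purely to keep it short and testable; in CF the same loop would presumably be a cfdirectory listing plus a cfexecute per file. The function names (`find_batch_files`, `run_all`) are hypothetical, invoking each file via `cmd /c` assumes a Windows box, and the contents of the bat files (the actual vspider commands) are deliberately left out, since those depend on your collections.

```python
import subprocess
from pathlib import Path
from typing import Callable, List, Tuple


def find_batch_files(directory: str) -> List[Path]:
    """Return the .bat files directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() == ".bat"
    )


def _run_with_cmd(args: List[str]) -> int:
    """Default runner: execute the command and return its exit code (Windows-only)."""
    return subprocess.run(args).returncode


def run_all(
    directory: str,
    runner: Callable[[List[str]], int] = _run_with_cmd,
) -> List[Tuple[str, int]]:
    """Run each batch file in turn (hypothetical helper, not a CF API).

    Keeps going after a failure so one broken collection doesn't stop the
    others from being re-indexed; returns (file name, exit code) pairs.
    """
    results = []
    for bat in find_batch_files(directory):
        # "cmd /c" runs the batch file and then exits (assumes Windows).
        code = runner(["cmd", "/c", str(bat)])
        results.append((bat.name, code))
    return results
```

Passing a fake `runner` makes it easy to check which files would be run without touching vspider at all, which is handy while the generated bat files are still being debugged.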