By Peter Bell

Is Search Really Easier in Ruby than in CF?

I was having some real problems getting full text site search to work using Verity's vspider, and from the information I could find out there, I wasn't the only one (eventually got it working with a bunch of help from Jim and Tony!).

Sami Hoda, amongst others, was using Lucene, and while it wasn't difficult, it didn't seem to be trivial to do (check out the comments, which include Sami's code and a link to a CFDJ article).

To add insult to injury, I just read this post talking about how easy it is to integrate Lucene with Ruby. There are many words I associate with Ruby. Elegant, fun, terse and "unproven but promising" are all terms that come to mind (and I'll admit the last one is getting better, but if I needed 24x7 I'd still sleep better with solid Java middleware instead).

But easier? Than ColdFusion? The language that brought us cfquery, cffile and cfsearch?!

To be fair, the article doesn't even MENTION spidering, so I think the simplicity is being overplayed, but it did hit a raw nerve. Why is spidered full text search so (relatively) difficult and (relatively) undocumented in CF? Are there really so few people who have that use case? I just never thought of spidered full text search as an esoteric requirement or an edge case . . .

I'll be putting together some kind of simple tool for automating the vspider stuff, but if we're talking wishlist, adding vspider reliably to CF Admin is a lot higher for me than interfaces and nulls!

Search Alternatives to Verity

Over the weekend when I was having problems with Verity I spent some time researching mid-priced alternatives. I finally got Verity working, so this isn’t an issue for me now, but I thought I’d post what I found just in case it might be of use to anyone else.

[More]

What is UP with vspider?

All I want to do is create a simple spider based site wide search. I did this for a project in about 10 minutes using a third party hosted solution, but that’s $250/project/year and I’d rather have an in-house solution.

[LATEST UPDATE] Got it working - follow the comments to see the gotchas to look out for and how to use the utilities for testing. I'll try to wrap this craziness with some kind of simple generator, but it may be a while before I get around to it.

I have CFMX 7.0.2 so I don’t need to download the style files. I’m on Windows, so I don’t have to worry about the Linux problems. I don’t want to spend $3,000 for the standard edition of the ColdFusion Search Expansion Pack, but I can map all of my sites to http://localhost/site_name and with my architecture that works (a subdirectory is OK and I don’t have any absolute URLs or SSL pages to index), so no problems there. I can successfully create a collection using either a batch file or a command line, with the extended parameters in a .txt file using the -cmdfile syntax recommended by Adobe. I can add the collection to the CF Admin successfully and it shows 104 documents and 631KB, so that is working.

But I’m getting “There was a problem executing the CFSearch tag with the following collections” with a 1705 error code. I read that Verity can require a restart after indexing, so I’ve tried that without luck.
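
For anyone trying to reproduce this, here is a minimal sketch of a cfsearch call against a spidered collection, wrapped in cftry so the full error message and detail are shown instead of just the summary line. The collection name site_name, the query name siteResults and the form.criteria field are placeholders assumed for illustration; substitute whatever names are registered in the CF Admin.

```cfm
<!--- Hypothetical names: site_name, siteResults and form.criteria are placeholders --->
<cfparam name="form.criteria" default="">

<cftry>
    <!--- Search the collection registered in the CF Admin by name (not by path) --->
    <cfsearch name="siteResults"
              collection="site_name"
              criteria="#form.criteria#"
              maxrows="25">

    <!--- url, title, score and summary are standard cfsearch result columns --->
    <cfoutput query="siteResults">
        <p>
            <a href="#siteResults.url#">#siteResults.title#</a> (score: #siteResults.score#)<br>
            #siteResults.summary#
        </p>
    </cfoutput>

    <cfcatch type="any">
        <!--- Show the full message and detail so an error code such as 1705 is visible --->
        <cfoutput>
            <p>Search failed: #cfcatch.message#</p>
            <p>#cfcatch.detail#</p>
        </cfoutput>
    </cfcatch>
</cftry>
```

This does not fix the 1705 error; it only makes the failure easier to inspect while testing the collection.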

When I have this kind of problem getting a simple search working, I feel there is something wrong. Especially when I see people like Doug Hughes (who is not a stupid person) give up in desperation and decide it would be easier to integrate Lucene (an open source Java search engine) than to run what should be a simple batch file and a single command in the CF Admin (plus cfsearch, which I have no problems with at all for db or file based collections).

I’m also amazed how few people seem to have tried this. Doesn’t anyone need spider based search (which is fundamentally different from content item based search: each has different use cases)? There seem to be very few resources online for making this work. I’ve spent maybe 5-6 hours Googling, and while I came across a fair number of links (including some old Daemon tutorials which refer to CF5, so may be out of date, and which suggest future problems with the collections from vspider not including the URLs correctly), I didn’t find anything useful for the 1705 errors.

Am I the only CF developer who’d like spider based search, or did everyone just conclude vspider wasn’t up to the job and use a third party solution?! What is up with vspider?! I know Steve Erat used to be involved with Verity and Matt Woodward's blog says he uses it quite a bit, so maybe someone will be able to enlighten me on this. I’m sure it can’t be as difficult as it seems.

Much as I like OO programming, maybe I should put a simple way to do spider based searches for Verity through the admin on my wishlist instead of all those interfaces and nulls and the like?!

[update] Isn't it sad that I am almost considering following this advice from 2001 to get around using vspider while still using Verity? It is actually quite an intelligent approach and I got most of the way through the process before I decided that in 2006 there should be a better solution (although I'm not convinced yet that I've found one!). Congrats to Michael Barr for a great article - especially given it is 5 years old and still seems to be a less bad approach than anything else I can think of!

[update 2] I was thinking for a very short time of writing my own spider until I realized that a single threaded language with sedate performance was not exactly the development platform for a spider (it would work for small sites, but writing a real spider is pretty non-trivial). Still, for anyone who wants to try, here is a "spider writing 101" article which covers some of the very basic features you need to write your own spider in .NET. If you want to see why this is a bad idea, look at the configuration settings available for vspider and you'll get an idea of the kind of capabilities you'd have to write. Still, cool article anyway.

[update 3] Another sufferer. At least she got far enough that CFSearch returned (useless) results. I believe the Daemon article above speaks to her problem. [new update] It appears this was fixed in hotfix 3 in MX 7 according to this page (search for vspider).

[update 4] Someone ran into another issue moving from 6 to 7, with cfsearch not accepting a full path. Unfortunately I was already just putting the collection name in, so that wasn't my problem.

How do YOU provide site wide search?

Traditionally when I wanted to implement site search I used Verity, creating all of the necessary collections and writing the appropriate display mappings for all of the content items.

The problem is that as your model gets more complex it becomes a real pain to implement what (to the client) is just a "simple" text box and search field.

[More]
