All I want to do is create a simple spider-based, site-wide search. I did this for a project in about 10 minutes using a third-party hosted solution, but that’s $250/project/year and I’d rather have an in-house solution.
[LATEST UPDATE] Got it working - follow the comments to see the gotchas to look out for and how to use the utilities for testing. I'll try to wrap this craziness with some kind of simple generator, but it may be a while before I get around to it.
I have CFMX 7.0.2 so I don’t need to download the style files. I’m on Windows, so I don’t have to worry about the Linux problems. I don’t want to spend $3,000 for the standard edition of the ColdFusion Search Expansion Pack, but I can map all of my sites to http://localhost/site_name and with my architecture that works (a subdirectory is OK and I don’t have any absolute URLs or SSL pages to index), so no problems there. I can successfully create a collection using either a batch file or a command line with the extended parameters in a .txt file, using the -cmdfile syntax recommended by Adobe. I can add the collection to the CF Admin successfully and it shows 104 documents and 631KB, so that is working.
But I’m getting “There was a problem executing the CFSearch tag with the following collections” with a 1705 error code. I read that Verity can require a restart after indexing, so I’ve tried that without luck.
When I have this kind of problem getting a simple search working, I feel there is something wrong. Especially when I see people like Doug Hughes (who is not a stupid person) give up in desperation and decide it would be easier to integrate Lucene (an open source Java search engine) than to run what should be a simple batch file and a single command in the CF Admin (plus cfsearch, which I have no problems with at all for db or file based collections).
I’m also amazed how few people seem to have tried this. Doesn’t anyone need spider based search (which is fundamentally different from content item based search, and each has different use cases)? There seem to be very few resources online for making this work. I’ve spent maybe 5-6 hours Googling, and while I came across a fair number of links (including some old Daemon tutorials which refer to CF5, so may be out of date, and which suggest future problems with the collections from vspider not including the URLs correctly), I didn’t find anything useful for the 1705 errors.
Am I the only CF developer who’d like spider based search, or did everyone just conclude vspider wasn’t up to the job and use a third party solution?! What is up with vspider?! I know Steve Erat used to be involved with Verity and Matt Woodward’s blog says he uses it quite a bit, so maybe someone will be able to enlighten me on this. I’m sure it can’t be as difficult as it seems.
Much as I like OO programming, maybe I should put a simple way to do spider based searches for Verity through the admin on my wishlist instead of all those interfaces and nulls and the like?!
[update] Isn't it sad that I am almost considering following this advice from 2001 to get around using vspider while still using Verity? It is actually quite an intelligent approach and I got most of the way through the process before I decided that in 2006 there should be a better solution (although I'm not convinced yet that I've found one!). Congrats to Michael Barr for a great article - especially given it is 5 years old and still seems to be a less bad approach than anything else I can think of!
[update 2] I was thinking for a very short time of writing my own spider until I realized that a single threaded language with sedate performance was not exactly the development platform for a spider (it would work for small sites, but writing a real spider is pretty non-trivial). Still, for anyone who wants to try, here is a "spider writing 101" article which covers some of the very basic features you need to write your own spider in .NET. If you want to see why this is a bad idea, look at the configuration settings available for vspider and you'll get an idea of the kind of capabilities you'd have to write. Still, cool article anyway.
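To give a sense of what the "very basic features" amount to, here is a minimal sketch in Python (not the .NET code from that article, and nothing to do with how vspider works internally). It does a single threaded, breadth-first crawl that stays on the starting host. The names `LinkParser`, `fetch` and `crawl` are made up for illustration, and it ignores robots.txt, throttling, non-HTML content and pretty much everything else a real spider has to handle.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Download a page as text (hypothetical helper: no retries, no robots.txt)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=200):
    """Breadth-first crawl limited to the start URL's host; returns {url: html}."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except OSError:
            continue  # skip pages that fail to load
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment before comparing.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

Pointing it at something like `crawl("http://localhost/site_name/")` should hand back a dictionary of page URL to HTML for a small site. Everything in the vspider configuration list beyond that (file types, depth limits, exclusions, authentication and so on) is still missing, which is rather the point.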
[update 3] Another sufferer. At least she got far enough that CFSearch returned (useless) results. I believe the Daemon article above speaks to her problem. [new update] It appears this was fixed in hotfix 3 for MX 7 according to this page (search for vspider).
[update 4] Someone ran into another issue moving from 6 to 7, with cfsearch not accepting a full path. Unfortunately I was already just putting the collection name in, so that wasn't my problem.