By Peter Bell

How do YOU provide site-wide search?

Traditionally when I wanted to implement site search I used Verity, creating all of the necessary collections and writing the appropriate display mappings for all of the content items.

The problem is that as your model gets more complex it becomes a real pain to implement what (to the client) is just a "simple" text box and search field.

One approach to this is to write a search generator which takes metadata about each object and uses it to automatically generate the display mappings and the like, but that would be a chunk of work and I'm pretty backed up right now.
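To make the generator idea a little more concrete, here is a minimal sketch in Python (not ColdFusion, and not something I have built). Everything in it is an assumption for illustration: the `SEARCH_METADATA` table, the field names, and the URL patterns are all hypothetical. The point is only that one small block of metadata per object type can drive the display mapping, so nothing has to be hand-written per content item.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """Uniform shape that every content item is mapped into for the listing."""
    title: str
    summary: str
    url: str


# Hypothetical metadata: for each object type, which fields feed the listing
# and how to build a link to the item.
SEARCH_METADATA = {
    "product": {"title": "name", "summary": "description", "url": "/products/{id}"},
    "article": {"title": "headline", "summary": "teaser", "url": "/articles/{id}"},
}


def to_search_result(object_type, record):
    """Map a raw record (a dict) to a SearchResult using the type's metadata."""
    meta = SEARCH_METADATA[object_type]
    return SearchResult(
        title=str(record[meta["title"]]),
        summary=str(record[meta["summary"]]),
        url=meta["url"].format(**record),
    )
```

Adding a new object type to site search would then mean adding one entry to the metadata table rather than writing another display mapping by hand.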

The other approach I'm toying with is to use either a third-party service or a downloadable script that treats your site as a collection of static pages, spiders them and provides a simple Google-like listing. The strength of this approach is that it better matches what people expect to see: a way to find "pages" in your site containing the search terms, rather than a way to access content items matching the search terms (content items often lose their context when accessed through search).
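For anyone wondering what "spider the pages and list the matches" boils down to, here is a rough Python sketch using only the standard library. It is an illustration under stated assumptions, not how any of the products mentioned in this post work: `fetch` is a hypothetical callable that returns a page's HTML (or None), only root-relative links are followed, and a result is any page containing every query word, with no ranking.

```python
import re
from collections import defaultdict, deque
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects the link targets and visible text of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        self.text.append(data)


def crawl_and_index(fetch, start_url):
    """Breadth-first crawl from start_url; returns a word -> set of URLs index."""
    index = defaultdict(set)
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:  # page missing or unreachable: skip it
            continue
        parser = PageParser()
        parser.feed(html)
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index[word].add(url)
        for link in parser.links:
            # Assumption: only follow root-relative links, to stay on the site.
            if link.startswith("/") and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def search(index, query):
    """Return the sorted URLs of pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(hits)
```

With a toy site held in a dict, `crawl_and_index(site.get, "/")` builds the index and `search(index, "oak flooring")` returns the pages that mention both words. Because the results are page URLs, each hit keeps whatever navigation context the page itself has.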

Just to clarify that last point: what does the "back to category" button do on a product page if the product is in n categories and you reached it through a Verity search? That kind of contextual data creates dilemmas when you take a "search each table with Verity and collate the answers" approach to searching content items rather than just spidering the site.

Of course, there is still a place for an "employee search" or "product search" that would clearly want to use either a db query or Verity. But what do you think of just using a spidering service or product for your site-wide search? And does anyone have any good recommendations for solutions in the hundreds of dollars or below that they've implemented successfully in the past?

Comments
Google AJAX Search API (Beta ) -- http://code.google.com/apis/ajaxsearch/

Google, again, offers a great, simple solution that seamlessly can be integrated into a site.
# Posted By Mike | 9/21/06 9:55 AM
Hi Mike,

Very interesting! Experimental right now, so for safety I can't use it for this project, but I'll definitely sign up for a key.

Please let me know if you ever have any sample code snippets you'd be willing to share on implementing this (or if you find any!)!
# Posted By Peter Bell | 9/21/06 10:08 AM
Did you try vspider - the Verity spider - I've used that on the last few sites I did and had good results - this was on CF6.1... Haven't played with the new CF7 features...
# Posted By Jim | 9/21/06 10:31 AM
Hi Jim,

Interesting - hadn't really looked at it.

Here are some examples on how to do this:

Adobe has an article:
http://www.macromedia.com/devnet/coldfusion/articl...

Seth Duffey summarizes his experiences using this in CF7:
http://www.leavethatthingalone.com/blog/index.cfm/...

Here is a tech note on additional files required and how to use them:
http://www.adobe.com/cfusion/knowledgebase/index.c...

For today, I'm still probably going to use the hosted solution from http://www.freefind.com/ as it is SO easy, but I will definitely check this out some more and try to get it working for future sites ($200/yr for one site is fine - doing that for every project is a little more than I'm willing to spend!).
# Posted By Peter Bell | 9/21/06 11:09 AM
I've used Thunderstone's Webinator on some fairly large sites, and although it can be a pain to set up, it works fairly well and requires little maintenance or tweaking on an ongoing basis.

They offer a free version:
http://www.thunderstone.com/texis/site/pages/webin...
# Posted By Nick | 9/21/06 11:12 AM
I emailed some of my rough code to your whois contact. It was too much for a comment posting. The documentation and examples Google provides seemed very good also. This solution is FREE (and still beta) so you have to weigh the neg/pos. Good luck!
# Posted By Mike | 9/21/06 11:21 AM
For our corporate website, we're using Verity. Internally, we use a combination of things, but are about to roll out a new unified search that cuts across our corporate data stores using the Google Search Appliance. I've been working with it for about a year and a half now, and it's one fantastic piece of hardware/software.
# Posted By Rob Brooks-Bilson | 9/21/06 11:41 AM
My organization has Google search appliances as well, although we are in the process of ditching them for a more complete solution. This is for a customer-facing content and ecommerce site with a large domain of content and products.

Pricing is a big issue. If I were looking at something free, I would try Nutch, part of the Apache Lucene project, or I might roll my own in Java using the Lucene engine, which gives you tremendous flexibility and scales extremely well.
# Posted By Rob Munn | 9/21/06 1:33 PM
Rob,

What in Google do you find lacking? The only real neg I have right now is that I can't create "categories", but other than that, I'm finding it pretty powerful. I especially like the new OneBox features.
# Posted By Rob Brooks-Bilson | 9/21/06 1:39 PM
Rob B,

The biggest issue I have with Google is that it only stores and displays a few fields of data. That becomes problematic when you are indexing structured content and you want to return a richly formatted display of the structured data.

Other than that, Google just lacks capabilities that enterprise search systems have: automatic clustering of results (using, I think, Bayesian filtering; see clusty.com), faceted navigation (good for large catalog sites; see homedepot.com), personalization, merchandizing, and rank-rating (the ability to affect ranking of results). FAST actually plugs right into our analytics provider, Omniture, and allows you to tune search results based on conversion patterns. Very cool for an ecommerce site.

The cost difference between the Google solution and enterprise players like FAST and Autonomy is substantial, but so are the benefits. We might re-purpose our Google appliances for internal use, where they have a better fit, especially with the OneBox stuff.
# Posted By Rob Munn | 9/21/06 1:50 PM
Thanks Rob - I can see your point. For us, the searching we're using it for is much more basic. Lots and lots of documents, but no real structured data.
# Posted By Rob Brooks-Bilson | 9/21/06 4:37 PM
Peter,

Take a look at http://www.picosearch.com

They charge annually (around $250US) and provide a very customizable templating system, automatic re-indexing and good search statistics.

See http://www.mohawk-flooring.com/ for an example.

- Tom
# Posted By Tom Young | 9/21/06 9:26 PM
Thanks everyone for all of the great comments and ideas - much appreciated!
# Posted By Peter Bell | 9/22/06 7:54 AM