What is UP with vspider?
[LATEST UPDATE] Got it working - follow the comments to see the gotchas to look out for and how to use the utilities for testing. I'll try to wrap this craziness with some kind of simple generator, but it may be a while before I get around to it.
I have CFMX 7.0.2 so I don’t need to download the style files. I’m on Windows, so I don’t have to worry about the Linux problems. I don’t want to spend $3,000 for the standard edition of the ColdFusion Search Expansion Pack, but I can map all of my sites to http://localhost/site_name and with my architecture that works (subdirectory is OK and I don’t have any absolute URLs or SSL pages to index so I can do this), so no problems there. I can use either a batch file or a command line with the extended parameters in a .txt file (using the -cmdfile syntax recommended by Adobe) to create a collection successfully. I can add the collection to the CF Admin successfully and it shows 104 documents and 631KB so that is working.
But I’m getting “There was a problem executing the CFSearch tag with the following collections” with a 1705 error code. I read that Verity can require a restart after indexing, so I’ve tried that without luck.
When I have this kind of problem getting a simple search working, I feel there is something wrong. Especially when I see people like Doug Hughes (who is not a stupid person) give up in desperation and decide it would be easier to integrate Lucene (an open source Java search engine) than to run what should be a simple batch file and single command in the cf admin (plus cfsearch which I have no problems with at all for db or file based collections).
I’m also amazed how few people seem to have tried this. Doesn’t anyone need spider based search (which is fundamentally different from content item based search – each has different use cases)? There seem to be very few resources online for making this work. I’ve spent maybe 5-6 hours Googling and while I came across a fair number of links (including some old Daemon tutorials which refer to CF5, so may be out of date, and which suggest future problems with the collections from vspider not including the URLs correctly) I didn’t find anything useful for the 1705 errors.
Am I the only CF developer who’d like spider based search or did everyone just conclude vspider wasn’t up to the job and use a third party solution?! What is up with vspider?! I know Steve Erat used to be involved with Verity and Matt Woodward’s blog says he uses it quite a bit, so maybe someone will be able to enlighten me on this. I’m sure it can’t be as difficult as it seems.
Related links:
- Adobe article on vspider
- Great getting started article
- Additional files required for CFMX7 (not needed on 7.0.1 or 7.0.2)
- hints from Steve Erat
- Matt's problems (and solution) with vspider
- Daemon hints and tips (may be obsolete)
- HoF on the 1705 issue
- More 1705 goodness
Much as I like OO programming, maybe I should put a simple way to do spider based searches for Verity through the admin on my wishlist instead of all those interfaces and nulls and the like?!
[update] Isn't it sad that I am almost considering following this advice from 2001 to get around using vspider while still using Verity? It is actually quite an intelligent approach and I got most of the way through the process before I decided that in 2006 there should be a better solution (although I'm not convinced yet that I've found one!). Congrats to Michael Barr for a great article - especially given it is 5 years old and still seems to be a less bad approach than anything else I can think of!
[update 2] I was thinking for a very short time of writing my own spider until I realized that a single threaded language with sedate performance was not exactly the development platform for a spider (it would work for small sites, but writing a real spider is pretty non trivial). Still for anyone who wants to try, here is a "spider writing 101" article which covers some of the very basic features you need to write your own spider in .net. If you want to see why this is a bad idea, look at the configuration settings available for vspider and you'll get an idea of the kind of capabilities you'd have to write. Still, cool article anyway.
[update 3] Another sufferer. At least she got far enough that CFSearch returned (useless) results. Believe the Daemon article above speaks to her problem. [new update] Appears this was fixed in hotfix 3 in MX 7 according to this page (search for vspider).
[update 4] Someone ran into another issue moving 6 to 7 with the CF search not accepting a full path. Unfortunately I was already just putting the collection name in, so that wasn't my problem.



I have vspider spidering www.sheriff.org and it works perfectly. I will admit that it did take me quite some time to get working.
I think that the problem you are describing is because you created the collection BEFORE you did the spidering. The biggest gotcha with using vspider is that you must first spider the site that you want, then go into CF Administrator and create the Verity collection and point it to the already created vspider collection. If done correctly you should see the number of documents and size of the collection on the administrator page.
If you want to contact me directly so I can help you over the phone, please email me at my email address and I will send you my cell phone number.
Vspider can be very frustrating to get up and running. I really have to sit down one day and do a video tutorial on how I got it working since I've seen posts like yours before.
For years I've been using hacks to accomplish this same thing and it never works like I want it to. CF makes everything else simple and I think Adobe could help us out here too. It would also be nice for this to work in a shared hosting environment.
Actually I followed the directions and used a command line/batch file (did both - both worked equally well) to create a completely new collection. I THEN went into cfadmin and "created" a collection with that name. It found the collection and showed both a number of documents and a size, so that worked well, but then when I tried my cfsearch it gave me the 1705 error. I then restarted the CF Search service (I'm on Windows) as some posts suggested that Verity needs to be restarted after indexing a collection using vspider. Still the same 1705 error.
I also made sure not to use ANY of the cfadmin commands, and just in case there were funky K2 caching issues I wasn't aware of, I created a new collection for every single test I ran just to be sure. Any help would be appreciated, though, so I'll email you, and if you want to speak or IM some time or if there's anything I'm not thinking of I'd love any help you could provide!
Agreed 100%. If this was PHP or Ruby (no disrespect to either language intended) I'd kinda expect everything to be all command line and difficult to use. But this is from the company that brought us cffile and cfquery (and - for that matter - cfsearch!).
It took awhile but I had great success with VSpider on CFMX 6.1. I basically started the spider with as few options as possible and then added options one at a time til I got everything working... There are some logging options you can turn on I think to get more info out of it... I've switched jobs since then and haven't really messed with VSpider in CFMX 7.
I do agree though that it should be easier, and I'm surprised more people don't use it...
Jim
Good point.
I started by trying a bat file which I put into c:\cfusionmx7\verity\k2\_nti40\bin. It contained the following single line command (only pause was on the second line):
C:\CFusionMX7\verity\k2\_nti40\bin\vspider -style C:\CFusionMX7\verity\Data\stylesets\ColdFusionVspider\ -collection C:\CFusionMX7\verity\collections\SPIDERTEST1 -start http://localhost/ -cgiok -abspath -reparse -indmimeinclude text/* -indmimeexclude text/css
pause
c:\cfusionmx7\verity\k2\_nti40\bin\vspider.exe -cmdfile c:\verity.txt
c:\verity.txt had the following content:
-style c:\cfusionmx7\verity\data\stylesets\coldfusionvspider\
-collection c:\cfusionmx7\verity\collections\SPIDERTEST2
-start http://localhost/generalfinishes
-indinclude "*/generalfinishes*"
-cgiok
There is a command line tool (and I'll have to dig around to find the name) but with it you can check the collection directly vs. having to use cfsearch. A bit easier to debug that way - you can at least see if anything is in the collection. Think it's 'mkvdk' but it's been awhile.
Jim
Looks like that one creates and indexes collections. Here is a good set of links to the documentation in version 7 although the docs are a little sparse in places . . .
http://livedocs.macromedia.com/coldfusion/7/htmldo...
Jim
That was mentioned as a possible issue here:
http://software.groupbrowser.com/archive/t-206192....
However, I just created in vspider so it was englishx and then cfadmin picked up the language correctly, so that doesn't seem to be the problem.
FYI, here are the latest comments on languages and locales in Verity:
http://livedocs.macromedia.com/coldfusion/7/htmldo...
-language
Syntax: -language name
Specifies the Verity locale to use in indexing. This option is being replaced by the semantically consistent -locale option, and is still supported for backwards compatibility.
-locale
Syntax: -locale name
Specifies the Verity locale to use in indexing, such as German (deutsch) or French (français). The default is English (english). This option is identical to the -language option.
Many thanks! I just gave that a shot. For anyone who wants to follow along at home:
Go to Verity bin directory (or add it to your path)
rcvdk (starts util)
a c:\cfusionmx7\verity\collections\spidertest4 (attaches the spidertest4 collection)
s (runs basic search and shows how many documents returned - 104 in my case)
r (returns all of the documents returned in the search)
So bottom line, collection looks good, it just seems to be cfsearch is having a problem speaking to it. Hmmm.
Last time I didn't explicitly set the language in the cf admin, but it picked up the language as englishx so it all seemed to be fine. However when I deleted the collection, created a new one using vspider and then explicitly set the collection language in CF admin as english x (advanced) then it all worked fine, so Jim - you were right a couple of comments back - sorry about that!!!
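For anyone following along, here is a minimal sketch of the kind of cfsearch call that should work once the collection is registered with the right language. It assumes the collection is registered in the CF Admin as spidertest4 with the language set explicitly; the q URL parameter and the results query name are just placeholders I picked, and I'm assuming the key column holds the page URL for a vspider collection, so check that against your own output.

```cfml
<!--- Sketch only: assumes a vspider-built collection registered in the CF Admin
      as "spidertest4" with its language explicitly set to englishx. --->
<cfparam name="url.q" default="">

<cfif len(trim(url.q))>
    <cftry>
        <!--- Run the search; "results" is an arbitrary query name --->
        <cfsearch collection="spidertest4" name="results" criteria="#url.q#" maxrows="20">

        <cfoutput>#results.recordCount# result(s)<br></cfoutput>
        <cfoutput query="results">
            <!--- Assumption: key holds the spidered page URL --->
            <a href="#results.key#">#htmlEditFormat(results.title)#</a> (#results.score#)<br>
        </cfoutput>

        <cfcatch type="any">
            <!--- Surface the Verity error text (e.g. the 1705 message) instead of a blank page --->
            <cfoutput>Search failed: #htmlEditFormat(cfcatch.message)#</cfoutput>
        </cfcatch>
    </cftry>
</cfif>
```

Wrapping the call in cftry makes it easier to see the exact error text while you are creating and re-registering test collections.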
Thanks everyone for all the help, and please Adobe add this to the list for whatever the next version is after Scorpio. There has got to be an easier way to wrap this in the CF Admin - just drop me a line if you'd like input on the use cases and screens ->
Awesome to hear you got it working. I totally forgot to mention the englishx thing. Excellent call, Jim. Peter, make sure you throw something up on USENET on how you solved this problem so people in the future will have a reference.
The purpose of the blog post was so people would have a reference (I get picked up pretty well on Google). I know this is going to sound crazy, but I've not used usenet for years - I'm not sure I'd even know how (other than googling for a web based news reader).
Collection creates, I register it fine using CF Admin afterwards with the right language and it sees the number of documents and their size correctly.
I access command line using rcvdk and the collection displayed just perfectly. My collection looks good.
But when I try to access it using cfsearch (which I tested against another collection created in CF which worked fine) I got the following error:
An error occurred while accessing a Verity collection.
Could not find the ColdFusion registered information for [test3].
where test3 was the name of the collection that does exist, that I did successfully add to the cf admin using the correct language.
When I Google this the only thing I found was a couple of people ages ago with the same problem and no substantive answers.
Any ideas wildly appreciated - any thoughts on where to go next?
Is this server exactly like the other?? Have you tried copying your code from the working box to the new one? I think you can actually copy over the collection as well - I remember reading that somewhere - you could in theory build collections on one box and move them elsewhere to use.
Jim
http://groups.google.com/group/macromedia.coldfusi...
I do remember adding something to my Verity scripts to restart the search service - I'll try to dig up my batch files when I get home... I'm not sure if it helped - but was one of the things I tried...
Jim
It wouldn't solve my problem to copy collections across because of the localhost limitation, but for fun I did successfully copy a collection across. Again, I was able to add the vspider collection to the cfadmin just fine, and again I got the message (this time for collection "gf3"):
An error occurred while accessing a Verity collection.
Could not find the ColdFusion registered information for [gf3].
Any other thoughts at all?
Thanks for the link! Link suggested calling cfcollection to persuade CF the new collection was really there. That didn't work, but I called
<cfcollection action="list" name="test">
<cfoutput query="test">#Name#<br></cfoutput>
Interestingly it didn't see the new collection even though the admin sees it. I restarted search service and even restarted CF - still the same problem even though the admin sees them.
However I upgraded 6.1-7 and I'm wondering if 6.1 is handling the page requests or something. Let me see if I can get something on which version of CF is doing what . . . maybe that is the issue . . .
I had 6.1 and 7 running. When I upgraded it didn't automatically upgrade an extension I'd added (obviously) to let CF process .html files. Reason CF couldn't see the collections was they were registered with 7 but 6.1 was processing those pages.
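If you suspect the same thing, a quick check (a sketch, nothing official) is to drop something like this into a page for each extension you serve, e.g. one .cfm and one .html file. The server.coldfusion variables report which version actually handled the request, and the cfcollection list shows what that version has registered. The query name collections is arbitrary.

```cfml
<!--- Sketch: save under each extension you serve and compare the output --->
<cfoutput>
    Handled by: #server.coldfusion.productname# #server.coldfusion.productversion#<br>
</cfoutput>

<!--- List the collections this particular CF version knows about --->
<cfcollection action="list" name="collections">
<cfoutput query="collections">#collections.name#<br></cfoutput>
```

If the version or the collection list differs between the two pages, the extension mapping is pointing at the wrong install.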
All now working just fine.
Thanks for the help yet again!
:)
Glad you got it working (again)! I've never run two instances of CF at once before - interesting that it works at all! :)
Jim
Nice to know
Good stuff here. I ran into almost this exact issue about a year or so ago and was pulling my hair out just as much as you are. I finally figured it out through trial and error. Ironically the project I need this for is probably never going to see the light of day.
I recently added a couple of new PDFs to the site and ran vspider so they would be added to the index, but now vspider is suddenly choking on indexing pdf files. I've made no code or structural changes to the site. The only difference is that I believe when I did the initial development I was running CF 7.0 and now I am running 7.02.
Here is the error I get: Warn: [vspider] (ind002006) VDK: Warn E0-1514 (Drvr): TstrIOFilter:flt_kv: KV failed on filtering document: error = 17.
Here is the command that I use:
vspider -style C:\CFusionMX7\verity\Data\stylesets\ColdFusionVspider -collection C:\CFusionMX7\verity\collections\splan4 -start http://127.0.0.1/sp/ -exclude PDF=true -cgiok -indinclude *
I found a couple of people running into the same issue searching Google but it doesn't appear anyone has found a solution. I would more than welcome suggestions. This could forc
I have to agree with you that it would be nice if Adobe made a few enhancements to vspider to make it easier to use. I use Fusebox as my framework and the only effective solution using Verity is to use vspider so that pdf files get included in the index.
It will be interesting to see what happens with CF and Verity in the future since Verity was bought by Autonomy about a year ago. It looks like Autonomy is strongly committed to K2 but you don't know if that extends to their relationship with Adobe and ColdFusion.
Ray
- exclude PDF=true
See this on the Livedocs:
http://livedocs.macromedia.com/coldfusion/7/htmldo...
To specify a file, path, or URL that you want followed but not indexed, use the -indexclude option. For document types, use the -mimeexclude option instead; for example, specify
-mimeexclude application/pdf rather than -exclude *.pdf.
But that suggests you want to exclude PDFs?
I would get rid of that statement and rerun the spider and see what happens.
Good advice from Jim. Let us know what comes of it!
-exclude PDF=true excludes any urls with that variable attached to it from being indexed. There is a link on each page of the site that when clicked will render the page as a pdf (using the cfdocument tag). All of those urls have pdf=true appended to the end of them. If I indexed those urls every page on the site would be indexed twice.
It definitely indexed static pdf files in the past using the command with the exclude PDF=true syntax in it.
Ray
Did you try quotes around the exclude parameter - -exclude "PDF=true" ?
I remember something about windows version being picky about this or something. Could be completely wrong but might be worth a shot (long shot I know . . .)
I tried it with quotes and completely removing the exclude parameter and still got the same error.
The only change from when I first figured this out and it was indexing pdf's correctly are the upgrades to 7.01 and then 7.02.
Strange. I'm on 7.02 and all works fine for me, but clearly you're not the only person having problems. Let us know if you figure it out. Don't suppose you have a support contract with Adobe?!
So, eventually, I'll want to search the database, but I have a separate search for that already. All I want is to be able to create an overall site search that doesn't return vti and include files. Also there are password protected directories on the site that people can't access, but I don't want them coming up in the list.
It was suggested that VSpider would allow me to exclude certain folders from the process.
Can you tell me if this is so? Also, can someone tell me if there is a simpler way to exclude folders from an index process? I can't imagine having to go through all this just to have something ignored.
I'm coming up on a deadline of Thursday, and I can see that VSpider is not happening by then.
Help! And Thanks! WCW
That's why I ended up using Vspider...
Vspider simply follows links across your pages - just like any other spider (Google/Yahoo) so if your files are setup correctly you should have no issues.
Even with your tight deadline - I'd give yourself an hour and see if you can't get Vspider working - if it works - then you are set. If not you can always fall back on using CFSearch with a custom tag.
Unfortunately the ColdFusion exchange looks like it is down.
http://www.adobe.com/cfusion/exchange/
I can dig around tonight and see if I can find a copy of that tag in my code at home.
Good luck!
http://www.thecrumb.com/wiki/code/coldfusion/dirse...
Should be enough to get you started. This tag is better than nothing - there may be better ways to accomplish this - and I'd still give VSpider a shot if you have time.
The querystring parameters tell the main page to load include files based on the parameter values. Long story short is that I want to exclude pages with certain parameters so I tried:
-exclude http://mysite.com/basicpage.cfm?cat1=100&cat2=...*
and so on for each subparameter I wanted to exclude within my vspider cmdfile. Needless to say, this doesn't work, presumably because the exclude doesn't contain a true directory.
Any way to filter out individual pages, either through a vspider cmd file or using cfindex?