Vertical Search Engine

A Search Engine that indexes documents found on a chosen set of public web servers and fetched by a WebSpider. It could be for Intranet research use, or run as a public service. (It could also be a Personal Web Archive on steroids.)

At MedScape we intended to use the Verity spider product for this, but never got around to it. Verity was (in 1999 at least) very expensive for this if you wanted to spider more than a dozen hosts.

If you're doing this for a public site, you probably don't want to send users to your local cached copy of the content, since serving that copy would be a copyright violation; you want to point them to the original URL. That probably limits you to spidering free servers. Also, your search engine has to support some structured data (a field that associates the original URL with each index entry) and be able to use that field in the result listing (so that the HREF points to the original URL rather than to the local cache copy). For an Intranet, you might be less concerned about this (though legally it's just as much of a copyright issue).
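
The structured-data requirement could be sketched roughly like this in Python. This is a toy in-memory index, not how Verity or any real engine does it; the names `IndexEntry`, `TinyIndex`, `source_url`, `cache_path`, and `render_results` are all made up for illustration. The point is only that each index entry carries the original URL as a field, and the result listing builds its HREF from that field instead of from the cached copy.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from html import escape


@dataclass
class IndexEntry:
    source_url: str  # original location on the public server (hypothetical field name)
    cache_path: str  # local copy grabbed by the spider; never shown to users
    title: str


class TinyIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self):
        self.entries = []
        self.postings = defaultdict(set)

    def add(self, entry, text):
        # Index the cached text, but keep the original URL alongside it.
        doc_id = len(self.entries)
        self.entries.append(entry)
        for term in re.findall(r"[a-z0-9]+", text.lower()):
            self.postings[term].add(doc_id)

    def search(self, query):
        # Return entries containing every query term (simple AND search).
        terms = re.findall(r"[a-z0-9]+", query.lower())
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.entries[i] for i in sorted(ids)]


def render_results(entries):
    # The HREF comes from source_url, so users land on the original site.
    return "\n".join(
        f'<li><a href="{escape(e.source_url, quote=True)}">{escape(e.title)}</a></li>'
        for e in entries
    )
```

Usage would look something like `idx = TinyIndex()`, then `idx.add(IndexEntry(url, local_path, title), page_text)` for each spidered page, and `render_results(idx.search("some query"))` to produce the listing. The cache path stays in the entry for re-indexing, but never leaks into the output.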

