(2003-10-24) Amazon Fulltext Search

Jon Udell on Amazon's Search Inside feature, which does full-text search across its books (120k+ included so far, but they are aiming for completeness). They actually had to resort to OCR? (Oh joy, they used Off Shoring.)

Here's a longer Gary Wolf piece about it (plus a good section on Brewster Kahle, with reference to the Library Of Alexandria). An excerpt:

"How was it possible to create a publicly accessible database from material whose ownership is so tangled? Amazon's solution is audacious: The company simply denies it has built an electronic library at all. 'This is not an EBook project!' Manber says. And in a sense he is right. The archive is intentionally crippled. A search brings back not text, but pictures - pictures of pages."

This is so huge: books are no longer Invisible Content. Now we just have to integrate it with Google (hmm, maybe finally a good reason for those APIs) and layer Purple Numbers over each page (can't do that, and can't even link to an individual page!). (Actually, it looks like you can link to an individual page, but maybe that URI has a session token or something similar in it that makes the link useless...)

