(2017-04-20) Somers Torching Library Of Alexandria Google Book Search

James Somers on Torching the Modern-Day Library of Alexandria: Google Book Search. You were going to get one-click access to the full text of nearly every book that’s ever been published. It was to be the realization of a long-held dream. “The universal library has been talked about for millennia,” Richard Ovenden, the head of Oxford’s Bodleian Libraries, has said.

“This is a watershed event and can serve as a catalyst for the reinvention of education, research, and intellectual life,” one eager observer wrote at the time.

On March 22 of that year, however, the legal agreement that would have unlocked a century’s worth of books and peppered the country with access terminals to a universal library was rejected under Rule 23(e)(2) of the Federal Rules of Civil Procedure by the U.S. District Court for the Southern District of New York.

When the most significant humanities project of our time was dismantled in court, the scholars, archivists, and librarians who’d had a hand in its undoing breathed a sigh of relief, for they believed, at the time, that they had narrowly averted disaster.

Google’s secret effort to scan every book in the world, codenamed “Project Ocean,” began in earnest in 2002 when Larry Page and Marissa Mayer sat down in the office together with a 300-page book and a metronome.

Page had always wanted to digitize books.

By 2004, Google had started scanning. In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million.

The stations—which didn’t so much scan as photograph books—had been custom-built by Google from the sheet metal up. Each one could digitize books at a rate of 1,000 pages per hour.

The human operator would turn pages by hand—no machine could be as quick and gentle—and fire the cameras by pressing a foot pedal, as though playing at a strange piano.

they developed algorithms to detect illustrations and diagrams in books, to extract page numbers, to turn footnotes into real citations, and, per Brin and Page’s early research, to rank books by relevance.

In August 2010, Google put out a blog post announcing that there were 129,864,880 books in the world. The company said they were going to scan them all.

What happened was complicated but how it started was simple: Google did that thing where you ask for forgiveness rather than permission, and forgiveness was not forthcoming.

When Google started scanning, they weren’t actually setting out to build a digital library where you could read books in their entirety; that idea would come later. Their original goal was just to let you search books. For books in copyright, all they would show you were “snippets.”

There’s actually a long tradition of technology companies disregarding intellectual property rights as they invent new ways to distribute content. In the early 1900s, makers of the “piano rolls” that control player pianos ignored copyrights in sheet music and were sued by music publishers.

As Tim Wu pointed out in a 2003 law review article, what usually becomes of these battles—what happened with piano rolls, with records, with radio, and with cable—isn’t that copyright holders squash the new technology. Instead, they cut a deal and start making money from it. Often this takes the form of a “compulsory license.”

It only took a couple of years for the authors and publishers who sued Google to realize that there was enough middle ground to make everyone happy.

The basic problem with out-of-print books is that it’s unclear who owns most of them.

It’s been estimated that about half the books published between 1923 and 1963 are actually in the public domain—it’s just that no one knows which half.

What became known as the Google Books Search Amended Settlement Agreement came to 165 pages and more than a dozen appendices. It took two and a half years to hammer out the details.

The scenario he and many others feared was that the same thing that had happened to the academic journal market would happen to the Google Books database. The price would be fair at first, but once libraries and universities became dependent on the subscription, the price would rise and rise until it began to rival the usurious rates that journals were charging.

In their view, there had to be a better way to unlock all those books.

Earlier this year, a Second Circuit court ruled finally that Google’s scanning of books and display of snippets was, in fact, fair use.

Certainly Google’s competitors felt put out by the deal.

Years later, another class-action settlement that involved opt-out, “forward-looking business arrangements” very similar to the kind set up by the Google settlement was approved by another district court. That case involved the prospective exploitation of publicity rights of retired NFL players

“It is an attempt,” they wrote, “to use the class-action mechanism to implement forward-looking business arrangements that go far beyond the dispute before the Court in this litigation.”

the original case had been about whether Google could show snippets of books it had scanned, and here you had a settlement agreement that went way beyond that question to create an elaborate online marketplace

This objection got the attention of the Justice Department, in particular the Antitrust Division, which began investigating the settlement. In a statement filed with the court, the DOJ argued that the settlement would give Google a de facto monopoly on out-of-print books.

A person closely involved in the settlement said to me, “Each of the publishers would go into the Antitrust Division and say well but look, Amazon has 80 percent of the e-book market. Google has 0 percent or 1 percent. This is allowing someone else to compete in the digital books space against Amazon. And so you should be regarding this as pro-competitive, not anti-competitive. Which seemed also very sensible to me. But it was like they were talking to a brick wall. And that reaction was shameful.”

Amazon, for its part, worried that the settlement allowed Google to set up a bookstore that no one else could.

The plaintiffs, in other words, had gotten themselves into a pretty unusual situation. They didn’t want to lose their own lawsuit—but they didn’t want to win it either.

*When the presiding judge, Denny Chin, put out a call for responses to the proposed settlement, responses came in droves.*

*Those who had been at the table crafting the agreement had expected some resistance, but not the “parade of horribles,” as Sarnoff described it, that they eventually saw.*

In the end, the DOJ’s intervention likely spelled the end of the settlement agreement.

Dan Clancy, the Google engineering lead on the project who helped design the settlement, thinks that it was a particular brand of objector.

“It’s not clear to me that if the libraries and the Bob Darntons and the Pam Samuelsons of the world hadn’t been so active that the Justice Department ever would have become involved

One of Pamela Samuelson’s main objections was that Google was going to be able to sell books like hers, whereas she thought they should be made available for free. (The fact that she, like any author under the terms of the settlement, could set her own books’ price to zero was not consolation enough, because “orphan works” with un-findable authors would still be sold for a price.)

Samuelson herself even wrote, “It would be a tragedy not to try to bring this vision to fruition, now that it is so evident that the vision is realizable.”

A refrain throughout the fairness hearing was that releasing the rights of out-of-print books for mass digitization was more properly “a matter for Congress.”

Of course, nearly a decade later, nothing of the sort has actually happened.

After the settlement failed, Clancy told me that at Google “there was just this air let out of the balloon.” Despite eventually winning Authors Guild v. Google, and having the courts declare that displaying snippets of copyrighted books was fair use, the company all but shut down its scanning operation.

Edited: 2017-04-21 21:40:56.294114

Bill Seitz