Increasingly I find that if I search Google for something I need explained to me, the first result is a book. Of course, the book is in Google’s Book Search, but chances are the search term appears on a page that has been scanned and can be read without buying the book. What I’m not clear about is what this implies.
(The example came about because I found myself watching a 2001 UK quiz show on the BBC’s Entertainment Channel, which I noticed is free this month on our local cable network. As a long-term expat I find these programs compelling viewing, because they offer a window on a culture I’ve lost touch with in large chunks. So when they ask about something old, I’m fine, but if it’s a reference to EastEnders since 1987, I’m stumped. Hence the search for what ‘bank’ means on The Weakest Link.)
So back to the implications. Well, Google may be gaming the system. But it looks like a legit result to me.
I don’t really understand how this works. I always thought the links pointing to a page affected its prominence in the rankings, but I’m not complaining: I found what I was looking for. But what does this mean for books? For publishing? Do authors and publishers try to SEO their books? Or will this eat into sales? Is it worth book-ising a website so that it scores higher on Google? Is it worth putting ads into books so that when they appear in scanned form on Google Book Search, readers see the ads? Just some thoughts.
Did Google check with publishers before announcing its digital library initiative? Nature reports that it didn’t, and that publishers are irritated:
Late last year, Google, based in Mountain View, California, announced a decade-long project to scan millions of volumes at the universities of Harvard, Stanford, Michigan and Oxford, as well as the New York Public Library. The resulting archive would allow computer users worldwide to search the texts online. But some publishers complain that they weren’t consulted by Google, and that scanning library collections could be illegal.
Not everyone agrees: The story quotes Peter Kosewski, director of publications and communications at Harvard University Library, as saying the library believes that the way Google intends to handle copyright works is consistent with the law. Harvard is carrying out a pilot with Google on 40,000 titles before making a decision on digitizing its entire 15-million-volume collection. “We have a number of questions that will be answered by the pilot project, and that includes copyright issues,” he says. “We think it is a great programme Google has put together.”
Will all libraries eventually be digital?
Seems a pretty obvious question (answer: yes) but the process is surprisingly slow. I do research online and use databases like Questia but there’s still a hell of a lot that hasn’t been made available. And a lot of what is scanned has not been scanned well, unless the original material contained a lot of misspelled names.
Anyway, here’s a glimpse of what may be happening soon. The excellent OnlineJournalism.com Newsletter, the daily news weblog of the USC Annenberg Online Journalism Review, links to a report from CyberJournalist.net, which in turn “keyed in on an anonymous tip buried deep inside a Sunday New York Times feature” on Google and Microsoft: “Apparently Google plans to digitize every post-1923 [correction: this should be pre-1923, which makes more sense. Thanks, Jim] text within the Stanford University Library, creating an enormous copyright-free resource available solely to Google users. The ambitious operation is codenamed Project Ocean, according to The Times’ unnamed source.”
Wow. That’s about 18 libraries, ranging from the Art and Architecture Library to the Linear Accelerator Center Library (although that link doesn’t work, which doesn’t augur particularly well…)
This comes on top of Google Print’s blurb search and Amazon’s Search Inside the Book (both links are, shamelessly, to postings on this very site).
Here’s another wacky trick that Google have quietly introduced, adding to the impression that they are fast cementing their role as a one-stop portal: book searching. According to SearchEngineWatch (via the excellent TechDirt), Google Print is an experimental service that “indexes excerpts of popular books, blending the content from these works into regular Google search results”.
For now, these excerpts are usually the blurb. True to its apparent intention to make itself indispensable before it starts collecting cash, Google says booksellers pay nothing for links from these search results, and it doesn’t benefit if you make a purchase from one of these retailers. It’s likely that Google will eventually do what Amazon already does, namely offer full-text searches of books, although this kind of search will have to be crippled in some way to prevent users from downloading whole books.
I can’t remember where I read this, but of course all this has a wonderful side-effect for those of us looking for something in a book we already own: so long as Amazon (or, later, Google) has the book scanned, it would be quicker to do a keyword search there than to check the index or leaf through the chapter list. Voilà.