Thursday, November 30, 2006

Dead Plagiarists Society
Will Google Book Search uncover long-buried literary crimes?
By Paul Collins
Posted Tuesday, Nov. 21, 2006, at 12:22 PM ET

Amir Aczel knew just whom to blame. "It seems," the science author complained last month in an irate letter to the Washington Post, "that [Charles] Seife has submitted every sentence in my book to a Google search." Days earlier in a Post book review, Seife exposed what appeared to be embarrassing plagiarisms in Aczel's new book, The Artist and the Mathematician. But if Seife's discovery that Aczel lifted text from the Guggenheim Museum's Web site was instructive, so was the assumption behind Aczel's response. For any plagiarist living in an age of search engines, waving a loaded book in front of reviewers has become the literary equivalent of suicide by cop.

As it turns out, even authors not living in this online age are in trouble. My fellow literary sleuth Alex MacBride recently revealed to me that he'd uncovered an old crime in a new way. MacBride, a linguist employed by Google, idly ran a phrase from England Howlett's 1899 essay Sacrificial Foundations through Google Book Search, his employer's massive digitization of millions of volumes from university libraries. The search had nothing to do with his job—like the rest of us, sometimes Alex just kills time by plugging stuff into Google—and rather than go to the trouble of digging out Howlett's book by name, he'd decided to call it up with a phrase. To his surprise, he got more back than just Howlett: The search also revealed a suspiciously similar passage in Sabine Baring-Gould's 1892 book Strange Survivals. A lot of suspiciously similar passages.

Perhaps it's not too shocking that a small-time amateur like Howlett swiped from Baring-Gould, a frenetically prolific folklore scholar who published hundreds of books and articles. But, the search results revealed, this was not quite the end of the story. "Charmingly," MacBride e-mails, "Baring-Gould seems to have had sticky fingers himself." The wronged author, you see, had in turn used the unattributed quotation from a still earlier work: Benjamin Thorpe's 1851 study Northern Mythology.

We're talking about forgotten writers here: I don't think there will be too many England Howlett fan clubs grappling with disillusionment today. But MacBride's discovery is the first rumble in what may become a literary earthquake. Given the popularity of plagiarism-seeking software services for academics, it may be only a matter of time before some enterprising scholar yokes Google Book Search and plagiarism-detection software together into a massive literary dragnet, scooping out hundreds of years' worth of plagiarists—giants and forgotten hacks alike—who have all escaped detection until now.

But wait, you might ask, don't people accidentally repeat each other's sentences all the time? It seems to me that this should not be unusual. Yet try plugging that last sentence word by word into Google Book Search, and watch what happens.

It: Rejected—too many hits to count
It seems: 11,160,000 matches
It seems to: 3,050,000
It seems to me: 1,580,000
It seems to me that: 844,000
It seems to me that this: 29,700
It seems to me that this should: 237
It seems to me that this should not: 20
It seems to me that this should not be: 9
It seems to me that this should not be unusual: 0

"It seems to me that this should not be unusual" is itself ... unusual.

Google Book Search contains hundreds of millions of printed pages, and yet after just a few words, the likelihood of the sentence's replication scales down dramatically. And even before our sentence implodes into utter improbability, there's another telling phenomenon at work. The nine books that contain the penultimate "It seems to me that this should not be" are from a grab bag of subjects: a 2001 study of Freud, an 1874 collection of Methodist camp sermons, minutes from a 1973 hearing of the Senate subcommittee on transportation. So, if replicating the same sentence alone is suspicious behavior, then to also replicate it on the same subject warrants dialing 911.
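For readers who would like to try the word-by-word experiment on their own shelf of texts, here is a minimal Python sketch. It does not query Google Book Search; the function name prefix_match_counts and the four-line toy corpus are invented purely for illustration, and the matching is a simple case-insensitive phrase search, which is only an assumption about how one might approximate a quoted-phrase query.

```python
import re


def prefix_match_counts(sentence, corpus):
    """Count, for each growing word prefix of `sentence`, how many
    documents in `corpus` contain that prefix as an exact phrase.

    Hypothetical helper: a local stand-in for a quoted-phrase search,
    not a call to any real search service.
    """
    words = sentence.split()
    # Lowercase and collapse whitespace so line breaks don't hide a match.
    normalized = [" ".join(doc.lower().split()) for doc in corpus]
    results = []
    for n in range(1, len(words) + 1):
        prefix = " ".join(words[:n])
        # Word boundaries keep "it" from matching inside "item".
        pattern = re.compile(r"\b" + re.escape(prefix.lower()) + r"\b")
        hits = sum(1 for doc in normalized if pattern.search(doc))
        results.append((prefix, hits))
    return results


# Toy corpus, invented for this example.
corpus = [
    "It seems to me that this should not be allowed.",
    "It seems to me that the bill is sound.",
    "It seems unlikely.",
    "Nothing here.",
]
sentence = "It seems to me that this should not be unusual"

counts = prefix_match_counts(sentence, corpus)
for prefix, hits in counts:
    print(f"{prefix}: {hits}")
```

Because every document containing a longer prefix also contains the shorter one, the counts can only hold steady or fall as words are added, which is the same collapse the list above shows on a vastly larger scale.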

