The Googlization of Collective Memory
Google, aka the "Big Data Dredge in the Sky," in addition to sucking up all your personal information in order to make you want to buy stuff, has also scanned 5.2 million books. Ten corpora consisting of about 4 percent of every word and phrase published are now available for public download. Also included, a nifty little free tool that allows those of us with too much time on our hands to do word frequency searches in books published within the last 500 years. Results are output to a graph mapping year-by-year frequency of each term in English, French, German, Spanish, Russian and Simplified Chinese.
I have completed initial testing and can report that this is going to be a real time waster. In anticipation of my readers' interests, here are some initial findings: George Carlin's Seven Dirty Words, which may not be appropriate for TV viewers, are showing steady growth, and despite a dip during the Reagan years, have returned to pre-1810 levels of frequency, with the exception of one four-letter word beginning with F, which still has a long way to go.
In yet another half-hearted attempt to keep this blog out of the gutter, I also did a search of the names of a few well-known painters―Michelangelo, Rembrandt, Ingres, Van Gogh and Picasso―to see if I could discern any patterns, which I could not. That is until I checked my spelling and capitalization, upon which I discovered that Van Gogh can't hold a paint brush to Picasso when it comes to spilled ink in books.
"'The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books,' said Erez Lieberman Aiden, a junior fellow at Harvard’s Society of Fellows. Mr. Lieberman Aiden and Jean-Baptiste Michel, a postdoctoral fellow at Harvard, assembled the data set with Google and spearheaded a research project to demonstrate how vast digital databases can transform our understanding of language, culture and the flow of ideas." This from Patricia Cohen in the New York Times earlier this month.Continued on the next page