Most Common 3-word Phrase In All Books Ever Written Up To 2010

As I explained on the C++ page, Google offers a service that shows word and phrase usage over all books up to 2010. The raw data is available to download: each phrase, along with how many times it was used in each year. I had previously written a C++ program to parse this data, and this time I rewrote it in Java. The program adds up the usage counts for each phrase across all years, keeps a running total for every phrase, and sorts the totals at the end. Because there is so much data, however, I had to manage memory by removing low-occurring phrases as the program ran. Using Java let me automatically download, unzip, and parse all of the data from within one program. I made the program multi-threaded so that it downloads the next file while it processes the current one. It finished the whole job in just under 5 hours.
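To make the counting and pruning idea concrete, here is a minimal sketch in Java. It is not the actual program: the class name `PhraseCounter`, the `maxEntries` cap, and the `pruneBelow` threshold are all made up for illustration, and it assumes each input line is tab-separated with the phrase first, the year second, and the usage count third. Downloading, unzipping, and the multi-threaded prefetching are left out.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: sums per-year counts into one total per phrase,
// and drops rare phrases whenever the map grows past a size cap.
final class PhraseCounter {
    private final Map<String, Long> totals = new HashMap<>();
    private final int maxEntries;   // assumed memory cap (number of phrases kept)
    private final long pruneBelow;  // assumed cutoff: totals below this are dropped

    PhraseCounter(int maxEntries, long pruneBelow) {
        this.maxEntries = maxEntries;
        this.pruneBelow = pruneBelow;
    }

    // Assumed line layout: phrase <TAB> year <TAB> count [<TAB> other fields...]
    void addLine(String line) {
        String[] fields = line.split("\t");
        if (fields.length < 3) {
            return; // skip malformed lines
        }
        long count;
        try {
            count = Long.parseLong(fields[2]);
        } catch (NumberFormatException e) {
            return; // skip lines whose count is not a number
        }
        // Add this year's count to the phrase's running total.
        totals.merge(fields[0], count, Long::sum);
        if (totals.size() > maxEntries) {
            prune();
        }
    }

    // Removes every phrase whose running total is still below the cutoff.
    // This is lossy: a phrase dropped here starts again from zero if it reappears.
    private void prune() {
        totals.values().removeIf(total -> total < pruneBelow);
    }

    // Returns up to n phrases, sorted by total count, highest first.
    List<Map.Entry<String, Long>> top(int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>(totals.entrySet());
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }
}
```

Feeding it every line of every file and then calling `top(1)` would give the most common phrase under these assumptions. The pruning trades accuracy for memory, which should be harmless for the very top results as long as the cutoff stays far below the counts of the leading phrases.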
