Most common word
I made a C++ program that looks through a text file and finds the most common x-word phrase. This was interesting to run on Wikipedia articles and random text, but the really interesting results come from Google Ngrams.
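The basic idea of finding the most common x-word phrase in a text file can be sketched as a sliding window of n words with a counter per phrase. This is a minimal illustration, not the program described here: it assumes words are split on whitespace only (punctuation and capitalization are kept as-is), and `countPhrases` and `mostCommon` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>

// Counts every run of n consecutive whitespace-separated words read from `in`.
// Assumption: no normalization is done, so "The" and "the" are different words.
std::map<std::string, long long> countPhrases(std::istream& in, std::size_t n) {
    assert(n > 0 && "phrase length must be at least one word");
    std::map<std::string, long long> counts;
    std::deque<std::string> window;  // holds the last n words seen
    std::string word;
    while (in >> word) {
        window.push_back(word);
        if (window.size() > n) window.pop_front();
        if (window.size() == n) {
            // Join the window into one space-separated phrase and count it.
            std::string phrase = window[0];
            for (std::size_t i = 1; i < n; ++i) phrase += ' ' + window[i];
            ++counts[phrase];
        }
    }
    return counts;
}

// Returns the phrase with the highest count; ties go to the alphabetically first
// phrase because std::map iterates in sorted order and only a strictly larger count wins.
std::pair<std::string, long long> mostCommon(const std::map<std::string, long long>& counts) {
    std::pair<std::string, long long> best{"", 0};
    for (const auto& [phrase, count] : counts) {
        if (count > best.second) best = {phrase, count};
    }
    return best;
}
```

Any `std::istream` works as input, so the same functions can read a `std::ifstream` opened on a saved article or a `std::istringstream` holding a string. A `std::map` is used so ties break deterministically; for very large inputs a `std::unordered_map` would likely be faster.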
Google offers a service called Ngrams that records, for every word or phrase, how many times it was used in each year and in how many books, drawn from all the books ever written (to the extent that Google has them). They offer the raw data for download, so I wrote a C++ program that parses and analyzes this data to find the most common word or phrase ever written.
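The parsing step amounts to summing each phrase's yearly counts into one total and then picking the largest. Below is a rough sketch of that, not the author's program. It assumes each line of the raw files is tab-separated as `ngram`, `year`, `match_count`, `volume_count` (the layout of the 2012 Google Books Ngram export; other releases may differ), and `Totals`, `aggregateNgrams`, and `mostFrequent` are hypothetical names. Counts are `long long` because totals for common words can exceed the range of a 32-bit `int`.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <unordered_map>

struct Totals {
    long long matches = 0;  // times the ngram appears, summed over all years
    long long volumes = 0;  // books it appears in, summed over all years
};

// Assumed line layout: "ngram<TAB>year<TAB>match_count<TAB>volume_count".
std::unordered_map<std::string, Totals> aggregateNgrams(std::istream& in) {
    std::unordered_map<std::string, Totals> totals;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string ngram, year, matches, volumes;
        if (!std::getline(fields, ngram, '\t') || !std::getline(fields, year, '\t') ||
            !std::getline(fields, matches, '\t') || !std::getline(fields, volumes, '\t')) {
            continue;  // skip malformed or truncated lines
        }
        Totals& t = totals[ngram];
        t.matches += std::stoll(matches);
        t.volumes += std::stoll(volumes);
    }
    return totals;
}

// Returns the ngram with the largest total match count (ties: alphabetically first).
std::string mostFrequent(const std::unordered_map<std::string, Totals>& totals) {
    assert(!totals.empty() && "no ngrams were read");
    std::string best;
    long long bestCount = -1;
    for (const auto& [ngram, t] : totals) {
        if (t.matches > bestCount || (t.matches == bestCount && ngram < best)) {
            best = ngram;
            bestCount = t.matches;
        }
    }
    return best;
}
```

If the data is split across several files, each one can be streamed through `aggregateNgrams` and the resulting maps merged by adding their totals.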
I ran my program on all the books, looking for the most common one-word phrase ever written (the most common word), and as you may have guessed, it is
Most common 3-word phrase
After running the program with n = 3 on the Google Ngrams data, I found the most common 3-word phrase to be:
one of the
I have the full list available for download below.
You will notice some entries that are not really three-word phrases. This is because of how Google Ngrams defines a 3-word phrase: punctuation counts as a word, so “. it is” is included as a three-word phrase.
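To see how a period can end up inside a "3-word phrase", here is a hypothetical tokenizer that treats every punctuation character as a token of its own. It is a simplification for illustration only and is not Google's actual tokenization; for example, it splits "don't" into three tokens, which a real tokenizer would likely handle differently. The names `tokenize` and `nGrams` are made up.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Splits on whitespace and emits each punctuation character as its own token,
// so "works. it is" becomes "works", ".", "it", "is".
std::vector<std::string> tokenize(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    auto flush = [&] {
        if (!current.empty()) {
            tokens.push_back(current);
            current.clear();
        }
    };
    for (unsigned char c : text) {
        if (std::isspace(c)) {
            flush();
        } else if (std::ispunct(c)) {
            flush();
            tokens.push_back(std::string(1, static_cast<char>(c)));
        } else {
            current += static_cast<char>(c);
        }
    }
    flush();
    return tokens;
}

// Builds every run of n consecutive tokens as a space-separated phrase.
std::vector<std::string> nGrams(const std::vector<std::string>& tokens, std::size_t n) {
    assert(n > 0 && "phrase length must be at least one token");
    std::vector<std::string> out;
    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        std::string phrase = tokens[i];
        for (std::size_t j = 1; j < n; ++j) phrase += ' ' + tokens[i + j];
        out.push_back(phrase);
    }
    return out;
}
```

With this tokenizer, the sentence "It works. it is fine" yields the 3-grams "It works .", "works . it", ". it is" and "it is fine", which is how an entry like “. it is” appears in the list.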
The format of the file is:
x.) phrase : # of occurrences : # of books
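If you want to load the downloaded list into another program, one line in this format can be parsed roughly as follows. This sketch assumes the fields are separated by exactly " : ", that the rank is followed by ".) ", and that the counts are plain integers without thousands separators; `Entry` and `parseEntry` are hypothetical names. The separators are searched from the right so that a phrase containing " : " itself still parses.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>

struct Entry {
    long long rank = 0;
    std::string phrase;
    long long occurrences = 0;
    long long books = 0;
};

// Parses "x.) phrase : occurrences : books"; returns std::nullopt on malformed input.
std::optional<Entry> parseEntry(const std::string& line) {
    const std::string sep = " : ";
    const std::string rankMark = ".) ";
    const std::size_t rankEnd = line.find(rankMark);
    const std::size_t booksSep = line.rfind(sep);  // last separator, before the book count
    if (rankEnd == std::string::npos || booksSep == std::string::npos || booksSep == 0) {
        return std::nullopt;
    }
    const std::size_t countSep = line.rfind(sep, booksSep - 1);  // separator before the occurrence count
    const std::size_t phraseStart = rankEnd + rankMark.size();
    if (countSep == std::string::npos || countSep < phraseStart) {
        return std::nullopt;
    }
    assert(countSep < booksSep);  // rfind above only searches positions before booksSep
    const std::size_t countStart = countSep + sep.size();
    if (countStart > booksSep) {
        return std::nullopt;  // overlapping separators such as " : : "
    }
    try {
        Entry e;
        e.rank = std::stoll(line.substr(0, rankEnd));
        e.phrase = line.substr(phraseStart, countSep - phraseStart);
        e.occurrences = std::stoll(line.substr(countStart, booksSep - countStart));
        e.books = std::stoll(line.substr(booksSep + sep.size()));
        return e;
    } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range from stoll
        return std::nullopt;
    }
}
```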