On Amazon’s Statistical Analysis of My Book

Amazon.com has a pretty nifty feature that’s a spin-off from its controversial Search Inside! program. Search Inside! allows users to read portions of a book, or search for keywords inside the text of a book, before actually buying it. Many publishers and authors are up in arms about it because they fear, probably correctly in the case of academic books and textbooks as well as other specialized subjects like cookbooks, that people won’t buy the books at all if they can get the information they’re looking for free of charge by searching inside the book on Amazon. The risk to revenue is probably lower for most fiction and general non-fiction, though the capability for unlimited search is certainly a threat to both authors and publishers. It’s also a useful selling tool, though, since the complete text of a book is not accessible online, and someone who is sure they will find what they’re looking for in your book is much more likely to buy it. In any case, a side effect of Search Inside! is that Amazon compiles interesting statistics based on an analysis of the words in a book.

The stats for my history of aphorisms can be found here, where you will learn, among other things, that of all the books in all of the categories on Amazon, 63% are easier to read than my book, while 37% are harder to read. This is based on something called the Fog Index, a measure of the number of years of formal education required to read and understand a passage of text. It is not, I hope, a measure of the mist that I deliberately pump into my prose. In terms of complexity of words and sentence structure, my book is right in the middle, with more or less half of all other books listed as more complex and half as less complex. (A word is considered “complex” if it has three or more syllables.) I’m a bit wordy, though. Only 23% of all other Amazon books have more words per sentence than mine, a statistic that comes as a mild shock to me, since I’ve always considered myself a man of few words, admittedly in speech rather than in writing, but then again I’ve always held the view that there are few greater pleasures in life than a nicely constructed long sentence with plenty of dense subordinate clauses that languidly undulate from the main sentence like so many tributaries of an interesting stream of consciousness. Or maybe not. You do get pretty good value for money from my book, though, mostly due to my verbosity. Buyers of the hardback get 4,070 words for every dollar and 3,666 words for every ounce.

This is all kind of interesting but really not all that useful, except for children’s books, where measures like these would be helpful in matching a book to a child’s reading level. The really interesting statistic is the concordance, an alphabetized list of the 100 most frequently occurring words in a book, excluding common words such as “of” and “it”. As you would expect, the most frequently occurring word in my book is, you guessed it, “aphorisms”. It occurs 250 times. The next nine most frequently occurring words are:

  • life 160
  • own 123
  • man 111
  • book 105
  • things 104
  • time 97
  • thought 93
  • first 89
  • world 83
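
For the curious, here is a rough Python sketch of how two of these statistics, the Fog Index and the concordance, could be computed from a book’s text. It is not Amazon’s actual method, which I don’t know. The Fog Index here uses the commonly cited Gunning formula, 0.4 × (average words per sentence + percentage of complex words), with “complex” meaning three or more syllables as described above. The syllable counter is a crude vowel-run heuristic, and the names `STOP_WORDS`, `fog_index` and `concordance` are made up for illustration, as is the short stop-word list.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the words Amazon actually excludes are unknown.
STOP_WORDS = {"a", "an", "and", "the", "of", "it", "is", "to", "in", "that"}


def tokenize(text):
    """Return lowercased word tokens (letters, with optional inner apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def count_syllables(word):
    """Crude heuristic: one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def fog_index(text):
    """Gunning Fog Index: 0.4 * (words per sentence + percent complex words)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not sentences or not tokens:
        return 0.0
    complex_count = sum(1 for w in tokens if count_syllables(w) >= 3)
    words_per_sentence = len(tokens) / len(sentences)
    percent_complex = 100 * complex_count / len(tokens)
    return 0.4 * (words_per_sentence + percent_complex)


def concordance(text, top_n=100):
    """Alphabetized (word, count) pairs for the top_n most frequent words."""
    counts = Counter(w for w in tokenize(text) if w not in STOP_WORDS)
    return sorted(counts.most_common(top_n))


if __name__ == "__main__":
    sample = "Life is short. Life is beautiful."
    print(round(fog_index(sample), 2))
    print(concordance(sample, top_n=3))
```

Because the syllable heuristic counts silent vowels (it treats “life” as two syllables), the scores from a sketch like this will drift from any published figure; it is only meant to show the shape of the calculation.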

The concordance is by far the most interesting, and useful, set of statistics about a book. One of Spanish philosopher José Ortega y Gasset’s best aphorisms is:

Tell me to what you pay attention and I will tell you who you are.

A true variation on this is:

Tell me what words you use and I’ll tell you who you are.

I was happy to see the top ten most frequently used words in my book, because three of them—life, man and thought—can be found in the Ralph Waldo Emerson aphorism that opens my book:

Life consists of what a man is thinking of all day.

This made me happy because I believe that aphorism sums up not only my book, but my life. It was gratifying to see that I was writing like I was thinking, even if I do tend to go on and on a bit…