Saturday, November 7, 2015

Zipf's Law

I was reading Sam Kean's The Violinist's Thumb when he brought up an interesting empirical fact about natural language commonly known as Zipf's Law. The law states that the frequency of any word is inversely proportional to its rank in the frequency table. That's straight from Wikipedia, but in plain terms it means that the most popular word in English (the) appears about twice as often as the second most popular word (be), about three times as often as the third, and so on.
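To make "inversely proportional to its rank" concrete, here is a minimal Python sketch that builds a rank-frequency table from a string and computes what an idealized Zipf curve would predict for each rank. The function names rank_frequency and zipf_prediction are made up for this post, and the crude regex tokenizer is an assumption; real text would need more careful word splitting.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count)
            for rank, (word, count) in enumerate(ranked, start=1)]


def zipf_prediction(top_count, rank):
    """Count an ideal Zipf curve predicts at `rank`, given the rank-1 count."""
    return top_count / rank


if __name__ == "__main__":
    sample = "the cat and the dog and the bird saw the fish"
    table = rank_frequency(sample)
    top_count = table[0][2]
    for rank, word, count in table[:3]:
        # Compare the observed count with the idealized 1/rank prediction.
        print(rank, word, count, zipf_prediction(top_count, rank))
```

A sample this small won't look very Zipfian; the pattern only shows up on a large body of text, so point it at a whole book rather than a sentence.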

Zipf's Law has been applied to fields far beyond language. The example used in The Violinist's Thumb was DNA. In our DNA, the four bases A, C, T, and G are read in triplets within genes, so a run of ACC (opposite which, on the complementary strand, would sit TGG) is one triplet. As it turns out, the most common triplet occurs about twice as often as the second most common, about three times as often as the third, and so on.

Computer science researchers at UC Berkeley even found that web requests follow a Zipf-like distribution. Pretty interesting stuff. It'd be cool to see how often the most popular method names or data types appear in code, and whether they follow Zipf's Law as well. Maybe someday in our infinite spare time.
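For anyone who does find the spare time, a rough starting point for the code experiment might look like the sketch below. It assumes the code being measured is Python and uses the standard library's tokenize module to count every non-keyword name (variables, functions, methods, and built-in types all lumped together). The helper name identifier_counts is hypothetical, and separating method names from data types would take real parsing.

```python
import io
import keyword
import tokenize
from collections import Counter


def identifier_counts(source):
    """Count how often each non-keyword name appears in Python source text."""
    counts = Counter()
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        # NAME tokens cover identifiers and keywords; drop the keywords.
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            counts[tok.string] += 1
    return counts


if __name__ == "__main__":
    snippet = "total = 0\nfor n in range(10):\n    total = total + n\nprint(total)\n"
    # Most frequent names first, ready to be ranked and compared with 1/rank.
    for name, count in identifier_counts(snippet).most_common():
        print(name, count)
```

Running this over a large codebase and ranking the results would give the frequency table needed to check whether the counts fall off roughly as 1/rank.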
