Zipf’s Law

"Zipf's law, named after the Harvard linguistic professor George Kingsley Zipf (1902-1950), is the observation that frequency of occurrence of some event (P), as a function of the rank (i) when the rank is determined by the above frequency of occurrence, is a power-law function P_i ~ 1/i^a with the exponent a close to unity. The famous example of Zipf's law is the frequency of the income of a company."

So starts the abstract of an article called "Zipf's law in assets and income of a company" by Kon Tadashi, et al. Unfortunately I do not fully, or even partly, understand what the abstract is saying, but I was alerted to Zipf's law when I was asking a colleague about tracking web traffic to a site. I was wondering how to encourage people to look at my site. I can't remember the details of the conversation, but at one point he mentioned Zipf's law, which I had never heard of.

I duly noted it down in my DayTimer, tagged 'for further investigation', and have now started down that path. It was mainly curiosity that took me in search of more info and more understanding. I was trying to find out if it would be useful for me to know about Zipf's law in the work that I do: would it illuminate anything, or help address 'issues, challenges, and opportunities', to put things in management speak?

I don't know the answer to that yet, and I'm still looking for a beginner's explanation of Zipf's law, but I have an inkling that it would be useful, and that's probably why it came up in the website discussion. Here's another snippet that I found on it.

"An example of where Zipf's law applies is in English texts, to frequency of word occurrence. The commonality of English words follows an exponential distribution, and the nature of communication is such that it is more efficient to place emphasis on using shorter words. Hence the most common words tend to be short and appear often, following Zipf's law."

This observation is developed on yet another website, which also has a PDF of the top 50 words in 423 Time magazine articles, finding:

"'the' as the number one (appearing 15861 times), 'of' as number two (appearing 7239 times), 'to' as the number three (6331 times), etc. When the number of occurrences is plotted as the function of the rank (1, 2, 3, etc.), the functional form is a power-law function with exponent close to 1."

I'm not sure how useful that is to know but ok.
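For anyone who wants to poke at the numbers, here is a minimal Python sketch of how the "exponent close to 1" claim could be checked. It is only an illustration: the function names (word_counts, zipf_exponent) are made up for this post, the fit is a plain least-squares line through log(rank) against log(count), and with just the three counts quoted above the result is a very rough estimate, not a proper test of the law.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count lower-cased words in a text, most frequent first."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


def zipf_exponent(frequencies):
    """Estimate the exponent a in frequency ~ 1 / rank**a.

    Sorts the frequencies from largest to smallest, assigns ranks 1, 2, 3, ...
    and returns the negated least-squares slope of log(frequency) vs log(rank).
    Needs at least two frequencies, all greater than zero.
    """
    ranked = sorted(frequencies, reverse=True)
    xs = [math.log(rank) for rank in range(1, len(ranked) + 1)]
    ys = [math.log(freq) for freq in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# The three counts quoted from the Time magazine sample (only three points,
# so this is a rough estimate at best).
time_counts = {"the": 15861, "of": 7239, "to": 6331}
exponent = zipf_exponent(time_counts.values())
print(f"estimated exponent from three words: {exponent:.2f}")
```

To try it on a whole text rather than the three quoted counts, something like zipf_exponent([count for _, count in word_counts(some_text)]) should work, where some_text is any long string you supply.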

The person who posted this also appears to have been compiling a Zipf's law bibliography since 1999, and so far has collected 728 items. I wondered whether there's a name for this activity, in the same way that stamp collectors are called philatelists. What is the name for a Zipf's law collector? The bibliography is arranged by year and not by topic area, and there is no search function on that site, so if I did want to find information on Zipf's law in web access statistics and internet traffic, or in the frequency of income of a company (the two topics that appear most relevant to my interests), it would not be easy to find.

However, googling "Zipf's law in Web Access Statistics and Internet Traffic", I found a readable article published in 2007, "Zipf's Law and number of hits on the World Wide Web", that did start to give me some insight into the impact of the law on "the design and function of the Internet" and the "distribution of words according to their length and the hits they are able to generate" (in this paper's case, on Google).

So I still don't know much at all about Zipf's law, but I do know enough to know that I need to find out more about it. As I said in another post, "Whatever one learns it is never enough".