# Numbers Don’t Lie

According to Benford’s law, the first digits in many real-life data sets follow a particular pattern: 1 is the most common first digit, 2 is the next most common, then 3, and so on. For example, about 30% of the time you will see 1 as the first digit, and the likelihood keeps falling for each larger digit, down to under 5% for 9. This trend can be found in real-life data such as financial statements and birth and death rates.
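For the curious, the likelihoods for all nine digits can be worked out in a few lines. This is a minimal sketch that assumes the standard statement of Benford’s law, P(d) = log10(1 + 1/d); the function name `benford_probability` is just an illustrative choice.

```python
import math


def benford_probability(digit: int) -> float:
    """Expected share of values whose first digit is `digit` (1-9)."""
    if not 1 <= digit <= 9:
        raise ValueError("first digit must be between 1 and 9")
    # Standard form of Benford's law: P(d) = log10(1 + 1/d)
    return math.log10(1 + 1 / digit)


# Print the expected share for each possible first digit.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.1%}")
```

The shares come out at roughly 30.1% for 1 and 4.6% for 9, and the nine values add up to 100%.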

This means that a simple frequency distribution check on, for instance, accounting reports can be used to detect fraud, because fabricated numbers tend to violate the distribution.
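Such a check might look something like the sketch below. It assumes a plain chi-square goodness-of-fit comparison of observed first digits against the Benford shares, which is one common way to do it and not necessarily what any particular auditor or paper uses. The names `leading_digit` and `benford_chi_square` are made up for illustration, and the sample data is synthetic.

```python
import math
from collections import Counter

# 5% critical value of the chi-square distribution with 8 degrees of
# freedom (nine digits minus one); used here as a rough cutoff only.
CHI2_CRITICAL_8DF_5PCT = 15.507


def leading_digit(value) -> int:
    """Return the first non-zero digit of a non-zero number."""
    # Scientific notation always starts with the leading digit,
    # e.g. 4200 -> '4.200000000000000e+03'.
    return int(f"{abs(value):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits vs. Benford's law."""
    digits = [leading_digit(v) for v in values if v != 0]
    counts = Counter(digits)
    n = len(digits)
    statistic = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        statistic += (counts.get(d, 0) - expected) ** 2 / expected
    return statistic


# Synthetic examples: powers of 2 are known to track Benford's law closely,
# while the numbers 100..999 have perfectly uniform first digits.
benford_like = [2 ** k for k in range(200)]
uniform_digits = list(range(100, 1000))

for name, data in [("powers of 2", benford_like), ("100..999", uniform_digits)]:
    stat = benford_chi_square(data)
    verdict = "suspicious" if stat > CHI2_CRITICAL_8DF_5PCT else "consistent"
    print(f"{name}: chi-square = {stat:.2f} ({verdict} with Benford)")
```

A large statistic only says that the digits do not look Benford-like. Plenty of honest data sets do not follow the law to begin with, so a flag like this is a reason to look closer rather than proof of fraud.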

That’s exactly what someone did to look for anomalies in the ballot numbers from the 2009 Iranian presidential election. And guess what: according to this paper, the results indicate a possible overestimation of the winning candidate’s votes by several million! (Apparently, compared to the distribution expected under Benford’s law, there was an excess of 7s in the vote counts for Mousavi, and an excess of 2s and a lack of 1s in the vote counts for Ahmadinejad.)

P.S. Andrew Gelman, whose blog I visit often, is not convinced by the methodology, though.

# Debunking Some Third World Myths

With insightful data backed by some spectacular charts, Hans Rosling shows how some preconceived notions and grand generalizations (about Third World countries, in this case) can be in sharp contrast with the facts.

At the beginning of his talk, Rosling discusses how he accidentally “discovered” that Swedish undergraduate students knew statistically significantly less about the world than chimpanzees. This example is merely a humorous segue into the rich discussion that follows, but his comment is worth noting: “the problem is not ignorance, it’s the preconceived ideas.”

A better-quality version of the video can also be viewed on the TED website. The site is a great collection of fascinating talks given by some of the leading thinkers of our time.

While logarithmic scales can be misleading, some simplifications demand further inquiry, and the authenticity and accuracy of data from Third World countries can be questionable, the essence of this talk is quite striking. John F. Kennedy once said, “The great enemy of truth is very often not the lie – deliberate, contrived and dishonest – but the myth – persistent, persuasive and unrealistic.”

P.S. Those uber-cool graphs and motion charts in Rosling’s presentation were created with the Trendalyzer software, which was developed by the Gapminder Foundation (and has since been acquired by Google).
