The usefulness of standard deviation is not as apparent as that of the average, which appeals to common sense. To many, standard deviation is something abstract that statisticians use, with no intuitive meaning. If you’re one of them, hopefully this post will change your mind.
Consider this: the average yearly temperatures of Boston and Seattle are 51 F and 53 F, respectively (decimals are dropped for simplicity). Pretty close. Does that mean both cities have similar weather? Obviously not: Boston tends to get really chilly in winter, while Seattle has relatively mild winters. And Seattle doesn’t get as hot as Boston in summer, either.
What would be really useful, along with the averages, is to know the standard deviations (SD, henceforth) for both cities. It turns out that the SD is 16 for Boston and 9 for Seattle. This suggests that temperature fluctuates more in Boston than it does in Seattle.
That much about SD is general knowledge. We all know that SD measures fluctuation (or dispersion). But here’s a concrete fact, with a very intuitive meaning and practical use, that many are unaware of:
For a quantity that follows a roughly bell-shaped (normal) distribution, there’s about a 95 percent chance that a value will fall within 2 standard deviations of its average.
This means that about 95% of all daily temperatures in Boston would fall within the average plus or minus 2*SD. Using the numbers mentioned above for Boston, 51 - 2*(16) is 19 and 51 + 2*(16) is 83. Hence, we can say that 95% of the time, the temperature in Boston would be within the [19, 83] range. The same range for Seattle is [35, 71].
Now we have a better comparison! In Boston, on 5% of all days in a year (about 18 days, or almost three weeks) the temperature will be outside that [19, 83] range: either too cold or too hot. Seattle, on the other hand, has a narrower range: on 95% of days the temperature will be within 35 to 71.
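The arithmetic behind these ranges is simple enough to sketch in a few lines of Python. This is only an illustration using the rounded figures quoted in this post; the helper name two_sd_range is made up for the example, and the 5% figure assumes the temperatures are roughly normally distributed.

```python
def two_sd_range(mean, sd):
    """Return the interval (mean - 2*sd, mean + 2*sd)."""
    return (mean - 2 * sd, mean + 2 * sd)

# Figures quoted in the post (degrees Fahrenheit, decimals dropped).
cities = {"Boston": (51, 16), "Seattle": (53, 9)}

for name, (mean, sd) in cities.items():
    low, high = two_sd_range(mean, sd)
    print(f"{name}: [{low}, {high}]")

# About 5% of a 365-day year is expected to fall outside the range
# (an approximation that assumes a roughly normal distribution).
days_outside = 0.05 * 365
print(f"Days outside the range per year: about {days_outside:.0f}")
```

Running it prints [19, 83] for Boston, [35, 71] for Seattle, and about 18 days outside the range per year.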
Another example: if the batting average of a cricket batsman is 50 runs per match and the standard deviation is 15, then we can infer that about 95% of the time the batsman’s score would fall somewhere between 20 and 80.
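Where does the 95 percent figure come from? It is a property of the normal (bell-shaped) distribution, so it holds only approximately for real data such as temperatures or batting scores. As a quick check, here is a small sketch using Python’s standard math.erf function; the helper name prob_within is invented for this example.

```python
import math

def prob_within(k):
    """Probability that a normally distributed value lies
    within k standard deviations of its mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

For 2 standard deviations this gives about 0.9545, which is usually rounded to 95 percent. The values for 1 and 3 standard deviations (about 68 and 99.7 percent) are the other two numbers in the same rule of thumb.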
The economist Ian Ayres writes in Super Crunchers (which I read a few years ago):
[The] inability to speak to one another about dispersion hinders our ability to make decisions. If we can’t communicate the probability of worst-case scenarios, it becomes a lot harder to take the right precautions. Our inability to communicate even impacts something as basic and important as how we plan for pregnancy.
Everybody knows that a baby is due roughly nine months after conception. However, few people know that the standard deviation is fifteen days. If you’re pregnant and are planning to take off time from work or want to schedule a relative’s visit, you might want to know something about the variability of when you’ll actually give birth. Knowing these standard deviations is the best place to start. (The distribution is also skewed left – so there are more pregnancies that are three weeks early than three weeks late.)
So whether you are anticipating a pregnancy due date, comparing the batting scores of cricket players, scrutinizing the approval ratings of political candidates, or evaluating the probability of success of a medical treatment, knowing the standard deviation is massively informative.