Monday, August 9, 2010

When it comes to distributions, we aren't all normal

I love a good example of one probability distribution being mistaken for another. An article from the New Yorker is reproduced here. It has a lot of interesting examples of cases where conventional wisdom holds that a distribution is normal, when in fact it follows a power law. I will list two from the article:

  • After the Rodney King beating, LAPD officers were widely accused of using excessive force. A commission was formed to investigate how many officers had excessive force charges brought against them. It was widely believed that the number of such charges (among officers who had them) would be normally distributed. Not even close. Most officers (in the excessive force subset) had one charge brought against them, some had a few, but only a very small number had anything approaching an excessive amount. In fact, these few cops had an almost inhuman number of excessive force charges brought against them. It was found that firing 44 police officers would dramatically reduce the number of excessive force charges brought against the department.
  • Until it was measured, most researchers assumed that the length of time spent homeless was normally distributed. This was found not to be the case at all. In Philadelphia, the most common length of homelessness is 1 or 2 days. At any time, of all the homeless persons on the street, 80% will be homeless for 2 days at most. The next 10% will be homeless on an episodic schedule. These usually represent drug users who have relapsed. Finally, less than 10% of all homeless on the street will actually be homeless for long periods of time. These are the homeless most people picture when they think of homelessness: the physically or mentally disabled, the permanent drunk, etc.

The point has been made that this top 10% costs the city and state an inordinate amount of money. In fact, in Denver, the most expensive homeless residents average about $15,000 in medical care alone. An efficiency apartment in Denver would run about $4,500 a year. Some people are arguing that it would be cheaper just to provide the worst-off of the homeless with their own efficiency. Then they would not develop things like pneumonia and end up going to the hospital.
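To put rough numbers on that argument, here is a back-of-the-envelope sketch in Python. It uses only the two Denver figures quoted above. It assumes (my assumption, not something stated above) that both are annual, per-person dollar amounts, and that housing would eliminate the medical bill entirely, which is surely too optimistic. The variable names are mine.

```python
# Figures quoted in the post; treating both as annual, per-person
# dollar amounts is an assumption made for this sketch.
medical_cost = 15_000    # medical care for one of the most expensive residents
apartment_cost = 4_500   # one efficiency apartment for a year

# Upper bound on the saving: assumes housing removes the medical cost entirely.
difference = medical_cost - apartment_cost
ratio = medical_cost / apartment_cost

print(f"Medical care costs {ratio:.1f}x the apartment")
print(f"Best-case saving per person: {difference}")
```

Even if housing only cut the medical bill by a third, the apartment would roughly pay for itself under these assumptions.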

Of course, this does not go over well with most people. The idea of providing the homeless guy on the corner with free housing and care while the rest of society is forced to work... Well, it is a hard sell, even if the numbers add up.

My own opinions about homelessness are varied, and to be honest, not that well defined. I do not feel like going into them at the moment. I did not really care about the actual focus of the article (homelessness). This often happens with the New Yorker, as it is a magazine that likes to take a page to say something that could be stated in a paragraph. I often get frustrated with the New Yorker for this reason.

The interesting thing to me was that both of these examples were power-law distributions, not normal distributions. It is amazing how often we just assume a normal distribution when we think "human", when in fact a completely different distribution models the data. Just because height follows a normal distribution does not mean that all human characteristics will.
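A quick simulation makes the difference concrete. The sketch below draws one sample from a normal distribution and one from a Pareto distribution (a common power-law model), then asks what share of the total the top 10% of values account for. The parameters are made up for illustration: a mean of 170 and standard deviation of 10 as a stand-in for height in centimetres, and a Pareto shape of 1.2. None of them come from the article, and `top_share` is just a helper name I picked.

```python
import random


def top_share(values, fraction=0.10):
    """Share of the total contributed by the largest `fraction` of values."""
    ordered = sorted(values, reverse=True)
    k = max(1, int(len(ordered) * fraction))
    return sum(ordered[:k]) / sum(ordered)


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers
n = 10_000

# Illustrative parameters only; they are not taken from the article.
normal_sample = [rng.gauss(170, 10) for _ in range(n)]    # height-like data
power_sample = [rng.paretovariate(1.2) for _ in range(n)]  # heavy-tailed data

normal_share = top_share(normal_sample)
power_share = top_share(power_sample)

print(f"Top 10% share, normal:    {normal_share:.1%}")
print(f"Top 10% share, power law: {power_share:.1%}")
```

In the normal sample the top 10% hold only slightly more than 10% of the total, because nobody is very far from average. In the power-law sample a handful of extreme values carry a large part of the total, which is the pattern behind the 44 officers and the most expensive homeless residents.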

