Mean vs. median

Readers provide a quick course in statistics. Much obliged!

Published April 23, 2009 7:45PM (EDT)

HTWW's readers have convinced me that, when it comes to crunching bank lending numbers, the Wall Street Journal's reliance on the average decline in bank lending may well be a better guide to what's happening than the Treasury's preferred median decline. I learned a lot from the discussion. Thanks.

But perhaps the most informative comment came directly by e-mail, and I'd like to share it here for anyone who continues to be interested.

To answer your question about the right way to present this information: from what I can tell, I'm not sure either is a correct way to do the measure. Here's why: Treasury's use of the median average (the middle value, the point between the highest of the low and the lowest of the high) makes sense if there are one or two lenders out there that are giving a lot of money and the rest aren't. This measure, as you know, is how we measure income in the U.S. This way someone like Bill Gates doesn't distort the average income in the U.S., which his income would if we used a mean average.

The WSJ's use of the mean average (sum all the values and divide by the number of values) works if there are no really high or really low lenders out there -- i.e., a lender or two that is doing a lot or all of the lending, or conversely one or two doing no lending at all. Such extreme values (or scores) would have leverage and influence and would result in a skewed estimate. If, however, the values are normally distributed, then a mean average would be okay.
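
To make the difference concrete, here is a small sketch in Python. The seven lending changes are invented purely for illustration (they are not Treasury or WSJ figures); the point is only to show how a single extreme lender drags the mean while leaving the median where it was.

```python
from statistics import mean, median

# Hypothetical percentage changes in lending for seven made-up banks.
# Six banks trim lending a little; one cuts it drastically.
changes = [-1.0, -2.0, -2.0, -3.0, -3.0, -4.0, -41.0]

avg = mean(changes)    # sum of the values divided by how many there are
mid = median(changes)  # middle value once the list is sorted

print(f"mean change:   {avg:.1f}%")   # -8.0%
print(f"median change: {mid:.1f}%")   # -3.0%
```

With these made-up numbers, the one outlier pulls the mean down to -8 percent even though the typical bank cut lending by about 3 percent, which is what the median reports.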

Finally, and no one usually talks about it, it might be appropriate here to do two possible further things:

1) Use the mode average (the category with the highest count). In this case you'd break out the lenders into categories (say, lending a lot, lending some, lending none) and tally the total in each category. The category with the most is the mode average, and that would give us a count that could then be converted into a percentage of the total. So if we had 100 lenders and 10 were lending a lot, 60 lending a little, and 30 not lending at all, then the mode average would be "Banks that lend a little" and we could say that 60 percent (now you see why I used a total of 100...) are lending a little, 10 percent are lending a lot, and 30 percent not at all.

2) Run a chi-square analysis, which can tell us whether our values are normally distributed. In this case we'd take, say, four time points, each becoming a set of values. The first would be before the recession when, we could argue, lending would be normal. The second would be before the first bailout, the third before the second bailout, and the fourth yesterday. For each time point we'd take the amount of lending going on for all the banks. We then apply the chi-square statistic, and the result will tell us whether the values are distributed normally (what you'd expect if everything were normal) or abnormally (what we'd expect with all the craziness).
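
Both suggestions can be sketched in a few lines of Python. The mode tally uses the 100-lender example from the e-mail. The chi-square part is one possible reading of the suggestion, not necessarily what the e-mail's author had in mind: it bins a sample into four equally likely buckets under a fitted normal curve and computes Pearson's statistic. The function name `chi_square_vs_normal` and the 20 sample values are assumptions made up for illustration.

```python
from collections import Counter
from statistics import NormalDist, mean, stdev

# 1) Mode: the e-mail's example of 100 lenders in three categories.
lenders = ["a lot"] * 10 + ["a little"] * 60 + ["none"] * 30
counts = Counter(lenders)
mode_category, mode_count = counts.most_common(1)[0]
shares = {cat: 100 * n / len(lenders) for cat, n in counts.items()}
print(mode_category, shares)

# 2) Chi-square goodness of fit against a normal distribution.
def chi_square_vs_normal(values, n_bins=4):
    """Pearson chi-square statistic for binned values vs. a fitted normal."""
    fitted = NormalDist(mean(values), stdev(values))
    # Bin edges sit at the fitted normal's quantiles, so every bin
    # is expected to hold len(values) / n_bins observations.
    edges = [fitted.inv_cdf(i / n_bins) for i in range(1, n_bins)]
    observed = [0] * n_bins
    for v in values:
        observed[sum(v > e for e in edges)] += 1
    expected = len(values) / n_bins
    return sum((o - expected) ** 2 / expected for o in observed)

# Hypothetical lending changes (percent) for 20 made-up banks at one time point.
sample = [-1.0, -1.5, -2.0, -2.0, -2.5, -2.5, -3.0, -3.0, -3.0, -3.5,
          -3.5, -4.0, -4.0, -4.5, -5.0, -5.5, -6.0, -8.0, -20.0, -41.0]
stat = chi_square_vs_normal(sample)
print(f"chi-square statistic: {stat:.2f}")
```

With four bins and two parameters (mean and standard deviation) estimated from the data, there is roughly one degree of freedom left, so a statistic above about 3.84 would suggest at the 5 percent level that the values are not normally distributed. Repeating the calculation for each of the four time points would show whether lending drifted away from "normal" over time.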

By Andrew Leonard

Andrew Leonard is a staff writer at Salon. On Twitter, @koxinga21.
