You’ve Always Underestimated The Median

In about 5th grade, American children are introduced to intermediate mathematical concepts and learn more than just addition, subtraction, multiplication, and division. This is when we first learn about the basic concepts of the “measures of central tendency” known as the mean, median, and mode.

These summary statistics take a large amount of data and boil it all down to one number, trying to get as close to the “center” of the data as possible. It’s much easier to represent data using one number than to report on every data point in a large dataset. For example, your marketing team might want to show how the age of your customers differs from the competition’s, or that customer deposits tend to be higher for one product than another. To make that comparison, a single number is preferred. The most common calculation used in these cases is the average, also known as the mean.

But why is the average/mean the preferred metric? Unbeknownst to us, teaching curriculum standards created a subconscious, shared bias towards using the average even though it’s rarely the best way to represent large datasets. Without our consent, this is where the bias against using the Median began, and we, the Kasasa Analytics Research Team, are here to explain why the Median deserves another chance.

We’ll use characters to help keep the discussion from getting too abstract... or sleep-inducing.


The Mean (aka The Average)


The Mean is the cool, popular statistic that seems to receive all of the attention. It has an “exciting” mathematical equation that captures people’s interest and imagination. In words, the Mean is “the sum of all of the data divided by the number of data points.” Written mathematically:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

Let’s look at a quick example. Consider that you have 5 employees who each have customer satisfaction ratings on a scale of 1-100. Their scores are: 98, 98, 94, 90, and 30. If you were to calculate the Mean score ([98 + 98 + 94 + 90 + 30] / 5), you would see that an average score of 82 doesn’t accurately show off how well the majority of your employees are doing. In fact, you would believe that they are all doing a pretty so-so job. You might even withhold raises and praise instead of giving your high performers bonuses and firing the one slacker.
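For readers who want to check the arithmetic, here is a minimal Python sketch of the formula applied to the five scores. The function name `mean` is our own illustration, not part of any particular tool.

```python
# Customer satisfaction scores for the five employees in the example
scores = [98, 98, 94, 90, 30]

def mean(values):
    """Sum of all data points divided by the number of data points."""
    return sum(values) / len(values)

print(mean(scores))  # 82.0
```

One low score pulls the result well below what four of the five employees actually earned.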

Yet the love affair with the Mean has spread throughout business articles and industry findings, much to the chagrin of data analysts. Countless claims start with the words “on average,” and few people ever stop to question whether that average is representative. While, on the surface, the Mean seems like a natural front-runner for the “Most Likely to Succeed” title, it can actually be quite flawed, misleading, and easily swayed by outliers. It’s not so perfect after all and can even be a bit of a bully, pushing other more useful metrics out of the spotlight.

Maybe it’s time to consider our other options.


The Mode


To be honest, the Mode is an oddball. It’s not very mathematical in nature (it’s defined as “the most frequently seen value in a dataset”), and hardly anyone uses the Mode to summarize numeric data. It’s mainly used with categorical data, where an average makes no sense, to capture the most popular category. In the ice cream preference data below, for example, chocolate is the Mode because it was selected more often than any of the other flavors.

[Figure: Preferred ice cream flavor]

Using the customer satisfaction example from above, the Mode score would be 98, the most frequently seen employee score, but that greatly overestimates the performance of the employees as a whole. While it may be a great bragging point, it is just plain misleading. Sorry, Mode, but we’re just not that into you. At least not for this scenario.
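Finding the Mode amounts to counting how often each value appears. The sketch below uses Python’s `collections.Counter`; the helper name `mode` and the ice cream votes are hypothetical, invented only to illustrate the idea (the original chart’s numbers aren’t reproduced here).

```python
from collections import Counter

def mode(values):
    """Return the most frequently seen value (on a tie, the one encountered first wins)."""
    counts = Counter(values)
    return counts.most_common(1)[0][0]

scores = [98, 98, 94, 90, 30]
print(mode(scores))  # 98

# Hypothetical ice cream votes, made up for illustration only
votes = ["chocolate", "vanilla", "chocolate", "strawberry", "chocolate", "vanilla"]
print(mode(votes))  # chocolate
```

The same function works on numbers and on categories, which is why the Mode is the natural choice for categorical data like flavors.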


The (forgotten) Median


Now to the hero of our story, the Median! As one believer put it, “The Median suffers from poor marketing.” Textbooks call it simply “the middle point”: the midpoint of a set of data that has been arranged in order of magnitude. A true middle child if there ever was one, the Median never gets the same attention as the Mean, nor does it receive the sympathy offered to the Mode. It’s just stuck smack dab in the middle.

But if you focus on its real value, you’ll see that the Median is in perfect balance: half of the data falls at or below it, and half falls at or above it. Unlike the Mean, it isn’t dragged around by outlying data points; in fact, the Median best summarizes datasets that are negatively or positively skewed.
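To see how differently the two statistics react to an outlier, here is a small sketch using Python’s standard `statistics` module. The second list is hypothetical: the same team, but with the lowest score dropped even further to 0.

```python
import statistics

scores = [98, 98, 94, 90, 30]
# Hypothetical: the same team, but the lowest score is even worse
worse = [98, 98, 94, 90, 0]

print(statistics.mean(scores), statistics.median(scores))  # mean 82, median 94
print(statistics.mean(worse), statistics.median(worse))    # mean 76, median 94
```

The Mean drops six points because of one employee, while the Median doesn’t move at all.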

[Figure: Mean, median, and mode under negative skew, no skew, and positive skew]

In our customer satisfaction rating example, the Median score of employees would be 94, the middle value once the scores are sorted (30, 90, 94, 98, 98). This is much more representative of the group as a whole. It shows that there are some great employees and one who might need some support (or another job). Because the single low score of 30 stretches the tail of the data to the left, the scores make up a negatively skewed dataset, and therefore benefit from being summarized by the Median.
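“Easy to find” really does mean easy: sort the data and take the middle value. When there is an even number of data points, the usual convention is to average the two middle values. A minimal Python sketch of that rule follows; the function name `median` is our own.

```python
def median(values):
    """Middle value of the sorted data; average of the two middle values if the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([98, 98, 94, 90, 30]))  # 94
print(median([98, 98, 94, 90]))      # 96.0
```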

The more skewed the distribution, the greater the difference between the Median and the Mean tends to be, and the more valuable the Median becomes. And when the data is symmetrical, as in a normal distribution, the Mean lines up with the Median! (So does the Mode, but that guy’s weird.) The Median stays honest and true to the data, no matter how unbalanced the other two guys get.
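Comparing the Mean with the Median gives a quick, rough hint about which way the data leans. This is only a rule of thumb, not a formal skewness test, and the helper `skew_hint` is our own illustration. The deposit figures and the symmetric list below are hypothetical, made up purely to show each case.

```python
import statistics

def skew_hint(values):
    """Rule of thumb only: compare the mean with the median."""
    gap = statistics.mean(values) - statistics.median(values)
    if gap < 0:
        return "mean below median: likely negative (left) skew"
    if gap > 0:
        return "mean above median: likely positive (right) skew"
    return "mean equals median: consistent with a symmetric distribution"

print(skew_hint([98, 98, 94, 90, 30]))        # employee scores from the example
print(skew_hint([100, 120, 130, 150, 5000]))  # hypothetical deposits with one huge account
print(skew_hint([80, 85, 90, 95, 100]))       # hypothetical symmetric data
```

The bigger the gap, the more the Mean is being pulled by the tail, and the stronger the case for reporting the Median.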

This guy seems legit.


Bring Back The Median!


As data analysts, we would be remiss if we didn’t point out that summarizing data can be overly simplistic and risks leaving out important, nuanced insights (all three of our characters can be guilty of misrepresentation from time to time). But we truly believe that if you need to represent a large dataset with one simple number, the Median is the most effective way to do it. It’s easy to find: sort the data and take the middle value (or the average of the two middle values when there’s an even number of data points). It’s easy to explain: it’s the true midpoint of the data. And above all, it’s honest to the data it summarizes, without caving to the power of outliers.

We challenge you to make the Median the norm in your workplace. It may feel cumbersome to start using the Median in your presentations, especially since it may require a lengthy explanation. But we at least wanted to explain to you why data analysts trust the Median, and how it differs from other numbers you are more familiar with. You can start your campaign to bring back the Median by sharing this article with your co-workers!

We do sincerely hope we’ve convinced you not to overlook the Median. It’s got a lot to offer and is eager to help out… if you’ll just give it a chance.


Note: Icons created by Ester Barbato from the Noun Project
