Lies, Damned Lies, and Statistics

By: Sumeet Goel

The full quote — “There are three kinds of lies: Lies, Damned Lies, and Statistics” — has been attributed to Mark Twain, who himself attributed it to British Prime Minister Benjamin Disraeli, who might never have said it in the first place.

Regardless of the origin of the phrase, it is one that I hold near and dear. I’m a numbers geek. Love numbers and statistics. Always have, always will.

As a child, the last thing I would do before going to bed each night was pull out random baseball cards and just look at the numbers on the back. Always preferred Topps to Donruss or Fleer because they gave me a more complete statistical picture. Numbers relaxed me before hitting the hay.

Now, three decades later, I often end up on Baseball Reference and click on the random headshots on the home page and do the same thing.

This is all a long-winded way of getting to my point, and the point of the quote: sometimes we can take statistics too far. So far, in fact, that we give ammunition to the numbers-haters.

I thought of that this past weekend upon reading this article from ESPN:

The ultimate MLB draft: The best pick ever at every spot from 1-30.

The MLB draft starts today, and in advance of that, ESPN decided to select the single best pick from MLB draft history, in each of the top 30 slots. That is, who was the best #1 overall pick of all time? Who was the best player taken 2nd overall? And so on, down to the 30th slot — who was the best #30 overall pick amongst the over 50 players taken with the 30th pick over the past five decades?

To do this, ESPN relied solely on statistics. One particular statistic to be more specific. Wins Above Replacement — or WAR as it is commonly known among statheads.

What is WAR? It’s an attempt to capture a player’s value, regardless of position, in one simple number. As the name implies, it estimates the number of Wins that the player contributes to his team Above those of a Replacement player. A Replacement player is the next best available player that’s “out there.” Some would say the bench, some would say the best available player not on the bench, some would say the best player in the minor leagues. It really doesn’t matter. The important thing is that smart people came up with a metric, they calculate it for you, and it gives you a barometer to measure players against each other, regardless of position, team and era.

When someone has a WAR of +4.5 in a season, it means that the player contributed 4.5 wins to his team, above and beyond what a replacement player would have done. A negative WAR means the opposite: the player contributed fewer wins than a replacement player would have.

WAR has become a much more accepted statistic in recent years. It has led to things like Felix Hernandez winning the American League (AL) Cy Young award in 2010 with a middling 13-12 record but a league-leading WAR of 7.1, based on the value he provided to a bad Seattle team.

At the same time, Mike Trout has led the AL in WAR each of the last 5 years and has “just” 2 MVP awards to show for it. The most interesting battle between traditional and sabermetric (fancy stats) baseball folks was in 2012, when Miguel Cabrera won the Triple Crown, leading the league in HR, RBI and batting average, while Mike Trout was the runaway WAR winner. The traditionalists won in that case, as Miggy received 22 first place votes to Trout’s six.

Now back to that article — why did it get me, an admitted stathead, so worked up? Well, not only did ESPN use WAR, they decided to use a player’s average yearly WAR — and this is where things got ugly for us numbers guys, and where the Twain/Disraeli comment strikes a chord.

At the #1 spot they have Alex Rodriguez. I will stay away from getting on my performance enhancing drugs (PED) soapbox for now (but worth noting that 6 of the top 22 ‘winners’ in this analysis are PED users of repute). I have nothing against that pick, but I should have been ready for what was next when they listed Ken Griffey Jr. as the 5th best #1 pick of all time behind Chipper Jones, Carlos Correa and Bryce Harper.

At #2 it fell apart for me — Kris Bryant is a fine young player who has been in the majors for a little over 2 years. He has a Rookie of the Year award and an MVP. He is a 2 time all-star. And ESPN considers him the greatest #2 draft pick of all time.

The player he beat for the honor? A 14-time all-star with his own MVP award, 4 home run titles, 4 World Series rings, 2 World Series MVP awards, and 563 career home runs who was voted into the Hall of Fame on his first try.

Yes, according to ESPN, Kris Bryant is already a better #2 overall pick than Mr. October, Reggie Jackson.

What happened here is someone took a valuable statistical tool and then just went too far with it. Using it blindly and without context, you end up doing yourself a disservice. The value of numbers disappears when you don’t think about them in the bigger picture.
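For fellow numbers geeks, here is a minimal sketch of how averaging can flip a ranking. The season-by-season values below are made up purely for illustration and are not the real WAR figures of any player; the labels "Young star" and "Veteran" are likewise hypothetical. The point is only that a short, brilliant career can beat a long, great one on a per-year average while losing badly on career total.

```python
# Hypothetical season-by-season WAR values, for illustration only.
# These are NOT real figures for any actual player.
careers = {
    "Young star": [6.0, 7.5],  # two excellent seasons so far
    "Veteran": [5.0, 8.0, 6.5, 4.0, 3.0, 2.0, 1.0, 0.5, 0.0, -0.5],  # peak plus decline
}


def average_war(seasons):
    """Average WAR per season played."""
    return sum(seasons) / len(seasons)


def career_war(seasons):
    """Total WAR accumulated over the whole career."""
    return sum(seasons)


# Rank the same two careers by each metric.
best_by_average = max(careers, key=lambda name: average_war(careers[name]))
best_by_total = max(careers, key=lambda name: career_war(careers[name]))

for name, seasons in careers.items():
    print(f"{name}: average {average_war(seasons):.2f}, total {career_war(seasons):.1f}")

print("Best by yearly average:", best_by_average)
print("Best by career total:", best_by_total)
```

With these invented numbers, the two metrics crown different winners, because the veteran's decline years drag down his average even though each of them added to his career total.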

Just today, General Electric (GE) announced that Jeff Immelt was stepping down as CEO.

The lead message in today’s articles covering Immelt’s departure is that during his 16 years at the helm, GE stock dropped by 30%. That’s pretty bad, right? But what gives it more context is that the S&P 500 almost doubled during that period. Now we’ve gone from bad to really bad.
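To put a rough number on "really bad," here is a small sketch of the relative-performance arithmetic. It uses the 30% drop quoted above and, as an assumption, rounds "almost doubled" to a full +100% for the index; the function name `relative_return` is just an illustrative helper, not a standard finance API.

```python
def relative_return(asset_return, benchmark_return):
    """Return of an asset relative to a benchmark over the same period.

    Both inputs are fractional returns: -0.30 means a 30% drop,
    1.00 means the value doubled.
    """
    return (1 + asset_return) / (1 + benchmark_return) - 1


ge_return = -0.30     # the 30% drop cited in the coverage
index_return = 1.00   # assumption: "almost doubled" rounded to +100%

rel = relative_return(ge_return, index_return)
print(f"GE relative to the index: {rel:.0%}")
```

Under that rounding assumption, a dollar in GE ended up worth about 65% less than a dollar in the index over the same stretch, which is the "relative to what?" version of the headline number.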

Separate from that message, and perhaps equally important, is understanding the context of the data. Much like my issue with Reggie Jackson and Kris Bryant, I find it a bit simplistic to look only at the performance of GE’s stock even though, like WAR, it tends to be a commonly accepted, one-size-fits-all metric to gauge performance.

However, did you know…

  • Immelt took over as CEO a few days before 9/11?
  • GE Capital was roughly half of GE’s net income when he joined and became a liability during the 2008 financial crisis?
  • Immelt had to rebuild $10 billion in loss reserves in the company’s reinsurance business, an early hole that forced him to curtail capital spending and reinvestment in growth early in his tenure?
  • Immelt completed the sale of $260 billion in assets for GE Capital over the past two years?


Look, there is no getting around a 30% drop during an extended bull market for one of the true blue-chip stocks in the U.S., but the context gives you a fuller picture and saves you from jumping to knee-jerk reactions to clickbait items, as we are prone to do these days.

Whether it’s the ESPN ranking or Immelt’s retirement, whenever I see data-driven claims, I try to think about two things before coming to any conclusions. Number 1: relative to what? Number 2: what’s in those numbers?

Relative to What?

Mary is paid $100,000 a year. John is paid $87,000 and has the same title at the same company. Is (a) Mary overpaid, (b) John underpaid, or (c) both correctly paid? The answer is (d) I don’t know. What are their jobs and what is the market for those jobs? Do we need to adjust for geography? Do they get flexibility to work from home? Does one person have more tenure? Are they at the top and bottom of their company salary band? Does John get other benefits that aren’t counted in that compensation?

You should never look at data in a vacuum. Always understand the context. Several years ago, a few friends decided to have a weight loss contest. The heaviest of us immediately suggested that biggest loser wins. Yours truly, the lightest one at the start, made sure we judged based on percentage of weight lost. It’s all relative.
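The contest rules are easy to sketch in code. The starting weights and pounds lost below are invented for illustration (the real figures are not given here), and the contestant labels are hypothetical; the sketch only shows how the choice of metric can change who wins.

```python
# Hypothetical contestants: (starting weight in lb, pounds lost).
# Invented numbers for illustration only.
contestants = {
    "Heaviest friend": (260.0, 20.0),
    "Lightest friend": (150.0, 15.0),
}


def pounds_lost(entry):
    """Absolute weight lost, in pounds."""
    _start, lost = entry
    return lost


def percent_lost(entry):
    """Weight lost as a percentage of starting weight."""
    start, lost = entry
    return 100.0 * lost / start


winner_absolute = max(contestants, key=lambda name: pounds_lost(contestants[name]))
winner_relative = max(contestants, key=lambda name: percent_lost(contestants[name]))

for name, entry in contestants.items():
    print(f"{name}: {pounds_lost(entry):.0f} lb lost, {percent_lost(entry):.1f}% of starting weight")

print("Winner by pounds lost:", winner_absolute)
print("Winner by percentage lost:", winner_relative)
```

Same data, two scoring rules, two different winners, which is why it pays to settle the "relative to what?" question before the contest starts.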

What’s in Those Numbers?

Or put another way, what’s NOT in those numbers? Statistics are wonderful, but they are easily manipulated. When you don’t know what goes into them, and therefore what is left out, you can end up with erroneous conclusions and bad decision-making.

Big companies will exclude one-time events. Small and rapidly growing companies will focus on revenue run rates to show how big they are. Depending on what makes it look the best, any given company may refer to its size by revenue, which is a universally understood metric for size, but also by number of employees (companies that are labor-intensive), retail revenue (wholesalers), gross sales (commission based), number of locations (significant physical presence), net income (high margins), number of customers (low price), units shipped (low margin), downloads (free or freemium) — you name it.

There is no doubt that Immelt’s shareholder value performance was horrendous compared to Jack Welch’s, the market during his tenure, and the expectations for a true blue-chip stock. Now it may be that 10 years from now we’ll look at Immelt as the guy who positioned the company for success with his moves, but for now, both the data itself and the context (relative to others and regardless of how you slice it) do not put his tenure in the most attractive light.

We try to balance the data conundrum here at HighPoint Associates as well. Whether internally focusing on our business, or externally on our client engagements, we always want to “Ask the Why” before jumping in.

As an example, we are always looking to maximize our client development efforts. So over the years we have done lots (and lots) of analysis to help figure out where we should be focusing our conversations — the types of companies, the industries, the size of companies, types of projects, etc. What has proven to be just as important as what our data tells us about our performance is what is not in the data, and how we interpret the data and use it to inform our strategic decisions going forward. Doing so has led to the fine-tuning of our business model (we call it HPA 3.0), one that focuses on partnering with our consultants and clients to ensure long-lasting client success in the targeted areas.

Any poli-sci major will tell you that numbers can pretty easily be manipulated to tell whatever story fits your purposes; however, if you want to get the real story behind the numbers, you have to keep it all in context — take into account all of the numbers, not just those that are convenient. Otherwise, you’ll have a disgruntled audience questioning your judgment and wondering how the heck you managed to rank Kris Bryant above Reggie Jackson. Not a pretty picture.