Numbers in Context, the Sanity Check

[Image: screenshot of the New York Times chart discussed below]

This chart from the New York Times is not new, but it’s so rife with issues that I’m dedicating a post to it. The data shows that Bill Clinton lies less than any other presidential politician of our era. Huh? I don’t want to get political, but the man was impeached for perjury. The conclusion doesn’t pass my sanity check, and I want to unpack why.

So I dug into the story a bit deeper and discovered an astonishing issue for a data visualization published in the New York Times. First question: is this a comprehensive data set, or just a sample? (It’s a sample.)

OK, so how was the sample of statements defined? According to the article:

“We don’t check absolutely everything a candidate says, but focus on what catches our eye as significant, newsworthy or potentially influential.”

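Not great, guys. Never sample based on what catches your eye. Statements that strike a fact-checker as significant or newsworthy are not a random draw from everything a politician says, so the share of them rated false tells us as much about the checkers’ attention as about the politician. That biases the data and makes it unsuitable for drawing the conclusions the author presents to us.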
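To see how much damage eye-catching selection can do, here is a minimal simulation sketch. Everything in it is made up for illustration: two hypothetical speakers who both make false statements 20% of the time, and checking probabilities I picked out of thin air. None of these numbers come from the article or the chart. The only difference between the two speakers is how the checker picks statements to review.

```python
import random


def measured_false_share(true_false_rate, p_check_if_false, p_check_if_true,
                         n=100_000, seed=0):
    """Share of *checked* statements that are false.

    All parameters are hypothetical. true_false_rate is the speaker's real
    rate of false statements; the p_check_* values are the chances that a
    false or a true statement gets picked for fact-checking.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    checked = 0
    checked_false = 0
    for _ in range(n):
        is_false = rng.random() < true_false_rate
        p_check = p_check_if_false if is_false else p_check_if_true
        if rng.random() < p_check:
            checked += 1
            checked_false += is_false  # True counts as 1
    return checked_false / checked


# Speaker A: false statements "catch the eye" five times as often as true ones.
speaker_a = measured_false_share(0.20, p_check_if_false=0.50, p_check_if_true=0.10)

# Speaker B: statements are checked at random, regardless of truth.
speaker_b = measured_false_share(0.20, p_check_if_false=0.20, p_check_if_true=0.20)

print(f"Speaker A, measured false share: {speaker_a:.2f}")
print(f"Speaker B, measured false share: {speaker_b:.2f}")
```

Under these assumptions the arithmetic works out to an expected measured share of 0.10 / 0.18, or about 0.56, for speaker A, against 0.20 for speaker B, even though both are equally truthful by construction. The gap comes entirely from how statements were selected, which is exactly why a chart built on this kind of sample can’t rank anyone.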

Fact checking is good, but we should not count things that were never designed to be counted.