This is a reprint of a post from 2020 entitled “The Worst Statistical Analysis I Have Seen This Year”, motivated by Mr. Bruce Hall’s urging that we look at a Judith Curry link.
(And I have seen a lot of terrible analysis) [Update 8/14/2020: the author has taken down the post, but here is an archived 8/13/2020 version of the webpage]
Now Ms. Colleen Huber, NMD*** comes to this conclusion thusly:
As of this writing, 32 weeks have elapsed in 2020. However, for each previous year, 52 weeks have already elapsed. How then can we compare deaths from all causes in 2020 to previous years?
I divided the total number of deaths for each year by the number of weeks. That is 52 weeks for all years, except for 2020, in which 32 weeks have elapsed as of this past Saturday, August 8, 2020, which is the most recently updated week in the CDC data cited. This gives us the average number of deaths per week for each of those years, and allows a meaningful comparison between 2020 and prior years.
She then generates the following table:
It seems that there is no pandemic in 2020 of COVID-19 or of anything else, at least not in the United States.
It’s great that Ms. Huber tells us there are 52 weeks in a year. She divides 2020 data by the 32 weeks that have elapsed and have been recorded by CDC (despite the fact that recent weeks are very incomplete in terms of reporting).
This would be a sensible approach (calculating a per-week fatality rate) if there were no seasonality in the data. However, deaths are seasonal in the US, as can easily be discerned in the CDC data she was analyzing.
Figure 1: CDC data accessed 7 August 2020.
As we enter the latter part of the year, deaths typically rise (with flu, etc.). Hence, using 32 weeks for 2020, and all 52 weeks for previous years, will typically yield a nonsensical comparison. (There is a standard approach, used in many economics releases — year-to-date counts. I.e., Ms. Huber could’ve compared deaths in the first 32 weeks of each of the preceding 20 years against those in the first 32 weeks of this year.)
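To make the point concrete, here is a minimal sketch in Python using made-up numbers rather than CDC data. The baseline, the size of the seasonal swing, the cosine shape, and the 10% excess are all assumptions chosen for illustration; the names `weekly_deaths`, `naive_same`, `naive_excess`, and `ytd_excess` are hypothetical. It compares a partial-year weekly average against a full-year weekly average, and then against the same weeks of the prior year.

```python
import math

WEEKS_ELAPSED = 32  # weeks of the current year observed so far


def weekly_deaths(week, baseline=55_000.0, amplitude=5_000.0):
    """Hypothetical weekly deaths: flat baseline plus a winter-peaking swing.

    These values are invented for illustration and are not CDC figures.
    """
    return baseline + amplitude * math.cos(2 * math.pi * (week - 1) / 52)


def mean(values):
    return sum(values) / len(values)


# A full prior year of 52 weeks.
prior_year = [weekly_deaths(w) for w in range(1, 53)]
prior_ytd = prior_year[:WEEKS_ELAPSED]

# Case 1: the current year matches the prior year week for week.
# Dividing 32 weeks by 32 and 52 weeks by 52 still shows a "decline",
# because the missing weeks are not average weeks.
naive_same = mean(prior_ytd) / mean(prior_year)

# Case 2: every observed week runs 10% above the prior year (assumed).
excess_ytd = [1.10 * d for d in prior_ytd]
naive_excess = mean(excess_ytd) / mean(prior_year)  # partial vs. full year
ytd_excess = sum(excess_ytd) / sum(prior_ytd)       # same 32 weeks vs. same 32 weeks

print(f"identical year, naive ratio:   {naive_same:.4f}")
print(f"10% excess, naive ratio:       {naive_excess:.4f}")
print(f"10% excess, year-to-date ratio: {ytd_excess:.4f}")
```

In this toy setup the partial-year average makes an identical year look like a decline and understates the assumed excess, while the year-to-date ratio recovers it. The size of the distortion depends entirely on the assumed seasonal pattern, so the actual CDC data would give different magnitudes.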
Once again, the most embarrassingly stupid data analysis I have seen this year (maybe this decade, although the competition is tough).
My investigation using CDC estimates of expected deaths, here.
*** “NMD” means “naturopathic medical doctor”
Addendum 9/26: I am ever grateful to Mr. Hall for providing this example of egregiously bad reasoning. I cite it each and every time I teach PA819 (ungated 2020 website). I am also grateful to Craig Eyermann (aka Ironman at PoliticalCalculations), Steven Kopits, and particularly CoRev for many other examples.