The Golden Rule: Understand…your…data
[Updated to incorporate 6/9/2020 data] A commenter on Twitter writes:
Here’s the real data straight from the WI DHS website. Trend is down no matter how much you want it to be going higher. https://dhs.wisconsin.gov/covid-19/county.htm
Is this assertion true? Are deaths really falling drastically if we look at deaths by date of occurrence? Unfortunately, an important caution in the notes was ignored:
This chart is called a mortality curve. It is used to track the number of deaths over time and see when peaks occur. This figure is showing data by when a person died. Date of death is more meaningful than using the date when the person’s death was reported to public health.
When presenting data by the date of death, any downward trends that are seen during the most recent two weeks are usually not true decreases in deaths and need to be interpreted with caution. This downward trend usually represents the data lag time; thus, data during the most recent two weeks are highlighted as preliminary data. It takes time for patient deaths to be reported to public health and to be included in death counts.
There are two ways to see the hazards of ignoring this warning. First, examine the revisions; second, compare the two cumulative series, “death-by-date” and “death-by-date-of-report”.
Figure 1: [Updated] Wisconsin Covid-19 deaths-by-date, according to June 2 vintage (blue bar), according to June 3 vintage (tan bar), according to June 4 vintage (green bar), according to June 5 vintage (red bar), according to June 6 vintage (gray bar), according to June 7 vintage (purple bar), according to June 8 vintage (brown bar) and according to June 9 vintage (chartreuse bar). Source: Wisconsin Department of Health Services.
Figure 2: [Updated] Wisconsin Covid-19 cumulative fatalities by date reported (blue), and by date of death (tan). Light pink shaded area denotes data designated “preliminary” by WI DHS as of 9 June 2020. Source: Department of Health Services and author’s calculations.
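For readers who want to run the first check themselves, here is a minimal sketch of a vintage comparison. The counts are invented for illustration and are not the Wisconsin DHS figures; the names `vintage_june_2`, `vintage_june_9` and `revisions` are hypothetical.

```python
from datetime import date

# Hypothetical deaths by date of death, as published on two different days
# ("vintages"). Invented numbers, NOT the Wisconsin DHS data.
vintage_june_2 = {
    date(2020, 5, 27): 9,
    date(2020, 5, 28): 7,
    date(2020, 5, 29): 5,
    date(2020, 5, 30): 3,
    date(2020, 5, 31): 1,
}
vintage_june_9 = {
    date(2020, 5, 27): 10,
    date(2020, 5, 28): 9,
    date(2020, 5, 29): 8,
    date(2020, 5, 30): 8,
    date(2020, 5, 31): 7,
}


def revisions(earlier, later):
    """Return later-minus-earlier counts for each date in the earlier vintage."""
    return {day: later.get(day, 0) - count for day, count in earlier.items()}


revision_by_day = revisions(vintage_june_2, vintage_june_9)

# Print date, early count, later count, and the revision between them.
for day in sorted(revision_by_day):
    print(day.isoformat(), vintage_june_2[day], vintage_june_9[day],
          revision_by_day[day])
```

In this made-up example the early vintage appears to fall from 9 to 1, while the later vintage falls only from 10 to 7, and the revisions are largest for the most recent dates.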
Notice that deaths-by-date are revised (upward) as new data come in, hence the caveat on the DHS website (twice!). Figure 2 shows that cumulative deaths-by-date eventually catch up with cumulative deaths by date reported, but generally lag; as Figure 1 shows, the lag is largest for the most recent dates in any given vintage. Hence, to repeat for those who are unable to read:
When presenting data by the date of death, any downward trends that are seen during the most recent two weeks are usually not true decreases in deaths and need to be interpreted with caution.
Now, it might be the case that in the past week, Covid-19 fatalities by date have declined — but I would be hard pressed to honestly make such a judgment.
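One way to act on that caution is to set the preliminary window aside before judging any trend. The sketch below assumes a 14-day window, taken from the DHS note about "the most recent two weeks"; the function `split_preliminary`, the constant `PRELIMINARY_DAYS`, and the series itself are hypothetical, and whether the boundary day counts as preliminary is my assumption.

```python
from datetime import date, timedelta

# Assumption: 14 days, following the DHS note that the most recent two weeks
# are preliminary. Other data sources may need a different window.
PRELIMINARY_DAYS = 14


def split_preliminary(deaths_by_date, as_of, window_days=PRELIMINARY_DAYS):
    """Split a {date: count} series into (settled, preliminary) parts.

    Dates within the last `window_days` days up to and including `as_of`
    are treated as preliminary; everything earlier is treated as settled.
    """
    cutoff = as_of - timedelta(days=window_days)
    settled = {d: n for d, n in deaths_by_date.items() if d <= cutoff}
    preliminary = {d: n for d, n in deaths_by_date.items() if d > cutoff}
    return settled, preliminary


# Hypothetical series: flat at 8 per day, then an apparent decline in the
# most recent two weeks. Invented numbers, NOT the Wisconsin DHS data.
as_of = date(2020, 6, 9)
series = {
    as_of - timedelta(days=k): (8 if k >= 14 else k // 2)
    for k in range(28)
}

settled, preliminary = split_preliminary(series, as_of)
print("settled days:", len(settled), "last settled date:", max(settled))
print("preliminary days:", len(preliminary))
```

Here the entire apparent decline sits inside the preliminary window, so the settled part of the series gives no evidence of a fall.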
In general, those who wish to do data analysis in real time need to resist the temptation to rush and download the data, and need to read the data description. That’s the first rule I give my students in my stats-for-policy course. Easy enough to remember and implement in the old days when you had to transcribe the data by hand from hard copy to type into the computer. Hard these days when you can click and download…
(For other disasters in data analysis in real time using subject-to-revision administrative data, see here.)