An underexplored aspect of Covid-19 data is the impact of testing. As it turns out, testing may be shaping our perception of how Covid-19 is spreading, often in ways that may not match reality. The first chart to look at is the population-adjusted (per 100K) number of Covid-19 cases by state.
From the interactive map above, a cursory glance makes it clear that New York, New Jersey, and Louisiana are among the hardest-hit states. Many states, such as California, Texas, and Ohio, look relatively unscathed by comparison. New York has an astronomically high 390 cases per 100K, while California is at only 19 cases per 100K. So California is doing much better than New York, right?
Well, maybe not. The next map shows the number of tests per 100K people in each state. New York leads the country, with 1,055 tests per 100K people. In other words, roughly 1.1% of New York's population has been tested.
Contrast that with California, where a mere 73 tests per 100K have been conducted, or roughly 0.07% of the state's population. New York has therefore conducted about 14 times as many tests per capita as California. This opens the door to the possibility that some states with low case counts are simply under-testing.
Before we dig deeper into that, however, let's look at one more measure: the share of tests that come back positive. This may also help us better understand the spread of the disease.
This map reveals several interesting things. First, while New York is doing far more testing than other states, it also has an extremely high positive rate, with nearly 37% of tests coming back positive. If anything, this suggests that while NY is testing more than other states, it is still under-testing. This is not a comforting thought, but it is the reality.
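The positive rate is simply confirmed cases divided by tests. A small sketch, assuming every confirmed case came from a counted test and that pending results are excluded (both simplifications; the helper name is hypothetical):

```python
def positive_rate(cases_per_100k: float, tests_per_100k: float) -> float:
    """Share of tests that came back positive (cases / tests)."""
    if tests_per_100k <= 0:
        raise ValueError("tests_per_100k must be positive")
    return cases_per_100k / tests_per_100k


# New York's figures from the maps above: 390 cases and 1,055 tests per 100K
ny_rate = positive_rate(390, 1_055)
print(f"NY positive rate: {ny_rate:.1%}")
```

Plugging in New York's numbers gives just under 37%, consistent with the map.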
Beyond New York, however, we see some unexpected insights. With a 31% positive rate, Oklahoma suddenly looks to be in much worse shape than the population-adjusted case map suggested, especially since New York's testing rate is 23 times higher!
Michigan also stands out like a sore thumb, with a 39% positive rate while conducting less than one-fifth as much testing as New York. Indeed, we can start to see evidence here that Michigan may be in worse shape than NY.
The golden boy, California, doesn't look so golden either, with a 26% positive rate on a small fraction of NY's testing. We can also see hotspots in Georgia, Mississippi, and South Carolina.
So how do we compare this apples-and-oranges data? Unfortunately, there's no definitive answer. However, we can create metrics that examine the hypothetical impact of under-testing. I built two models that do just this. The first asks: if every state tested as much per capita as New York (our testing gold standard) and kept its current positive rate across all of that testing, how many cases would there be? To be sure, this scenario is unlikely, since the states that are under-testing are probably testing mostly people with a higher probability of having Covid-19. Nevertheless, I include this hypothetical scenario when you hover over states in the map below (see "Adjusted Cases 100%").
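The first model can be sketched in a few lines. This is an illustrative reconstruction from the description above, not the code behind the map: the function and constant names are hypothetical, and it simply scales each state's current positive rate up to New York's 1,055 tests per 100K.

```python
# Hypothetical sketch of Model 1 ("Adjusted Cases 100%"); names are illustrative.
NY_TESTS_PER_100K = 1_055  # New York's testing rate, used as the benchmark


def adjusted_cases_100(cases_per_100k: float, tests_per_100k: float,
                       benchmark_tests: float = NY_TESTS_PER_100K) -> float:
    """Cases per 100K if the state ran benchmark_tests per 100K
    and kept its current positive rate on every test."""
    current_rate = cases_per_100k / tests_per_100k
    return benchmark_tests * current_rate


print(f"California: {adjusted_cases_100(19, 73):.0f} per 100K")
print(f"New York:   {adjusted_cases_100(390, 1_055):.0f} per 100K")
```

New York maps back onto its own 390 cases per 100K because it is the benchmark. The formula is only meaningful for states testing less than New York, which, per the testing map, is every other state.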
I came up with a second metric as well. This model also assumes every state tests at the same per-capita rate as New York. It starts with each state's confirmed cases, then adds the tests needed to reach NY's testing rate. On those additional tests, however, it assumes a positive rate of only 75% of the state's current rate. Is this realistic? It's tough to say, and it could differ from state to state.
Nevertheless, this is the most reasonable model IMO, though every model has obvious flaws. We simply don't know how many of the people who haven't been tested would turn out to be positive cases. And even New York is likely still significantly under-testing and thus significantly under-counting cases. Yet this hypothetical model at least illustrates how under-testing could be skewing the data, and it creates something closer to an apples-to-apples comparison. The results are in the map below, labeled "Adjusted Cases 75%".
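The second model adds a discount to the extra tests. Below is a minimal sketch following the description in the text: confirmed cases, plus the additional tests needed to reach New York's rate, multiplied by 75% of the state's current positive rate. The names are hypothetical, and the `max(..., 0)` guard (no extra tests for a state already at or above the benchmark) is my own assumption.

```python
# Hypothetical sketch of Model 2 ("Adjusted Cases 75%"); names are illustrative.
NY_TESTS_PER_100K = 1_055  # benchmark testing rate (New York)
DISCOUNT = 0.75            # positive rate on extra tests, relative to the current rate


def adjusted_cases_75(cases_per_100k: float, tests_per_100k: float,
                      benchmark_tests: float = NY_TESTS_PER_100K,
                      discount: float = DISCOUNT) -> float:
    """Confirmed cases plus discounted positives from the tests needed
    to reach the benchmark testing rate."""
    extra_tests = max(benchmark_tests - tests_per_100k, 0.0)  # assumed guard
    current_rate = cases_per_100k / tests_per_100k
    return cases_per_100k + extra_tests * current_rate * discount


ca = adjusted_cases_75(19, 73)
print(f"California: {ca:.0f} per 100K ({ca / 19:.1f}x the confirmed count)")
# prints "California: 211 per 100K (11.1x the confirmed count)"
```

With California's figures, the adjusted count lands at roughly eleven times the confirmed count, while New York (the benchmark) is unchanged at 390.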
How does this change our perception? Under this projection, Michigan is now in the same ballpark as New York in terms of population-adjusted cases, and the same goes for New Jersey. Meanwhile, California's population-adjusted case count jumps roughly elevenfold; while it is still lower than New York's, it is now at least comparable.
The problems in Oklahoma, Georgia, Mississippi, and Colorado become more evident as well. There's also a broader theme: nearly every state with at least one large city has surging Covid-19 cases, while states that haven't been hit hard (e.g., New Mexico, West Virginia, North Dakota) are more rural or small-town than average.
None of my projections are meant to represent the "true reality", which will remain unknown as long as we're under-testing across the US. But they do show how under-testing is skewing our perception, leading us to believe New York is crisis central (with some compelling evidence supporting that belief) while overlooking other hotspots such as California and Georgia.
The narrative surrounding Louisiana also changes. Instead of Louisiana being an outlier, we see the same problems across most of the Southeastern US.
I’m happy we have some good data out there on Covid-19, but be aware that significant adjustments are needed to fully understand it.
Author: Jake Huneycutt, Aardvark Data
Last Updated: 1 April 2020, 10:00 am EDT
Sources: Covid Tracking Project