Why is coronavirus data so damn difficult to communicate?

Errors, lag, and perplexing charts—trying to understand COVID-19 data has become a major headache for many Georgians. Here's why.

A drive-thru COVID-19 testing site in Los Angeles. Virus test results, collected in Georgia at sites like this one, have recently been mixed with antibody test results on the state’s data dashboard, leading to more confusion among Georgians looking for information about COVID-19.

Photograph by Kevin Winter/Getty Images

TJ Muehleman is from Texas, but he has family in Louisiana, lived in Georgia for years, and now resides in Seattle. In the early days of the COVID-19 pandemic, he began to notice big differences in the ways data was presented by each of the states he’s called home. And he noticed something else about Georgia’s data in particular: it was a mess.

As co-founder of Standard Co, a company that builds public health data visualization platforms, Muehleman spends his working hours finding areas for improvement in charts and graphs. But the problems with the Georgia Department of Public Health's data dashboard seemed particularly dangerous, he says, because of the life-and-death decisions business owners and others were making based on the data presented.

As a side project, Muehleman created the Covid Mapping Project, a website that displays simple charts describing the same key metrics from each state. And as he’s watched each of Georgia’s widely reported data fiascos play out, he has had increasing opportunity to reflect on the cause. “I don’t think anything nefarious is happening,” he says. “I just don’t think they know what they’re doing.”

That was his reaction last week, when a bewildering chart displaying county-level data in nonchronological order elicited widespread jeering. The chart in question showed case totals arranged from highest to lowest; Muehleman speculates that "somebody, somewhere, said, I want to know what have been the worst days in the past month" and that graphic was the result. The chart was quickly revised to reflect chronological order, and a spokesperson for Governor Brian Kemp acknowledged the visual was problematic and apologized. But the state's website was pilloried again on Wednesday after the Columbus Ledger-Enquirer reported that the total testing numbers it displayed mixed viral tests with antibody tests, artificially lowering Georgia's positive test rate. Combining the two types of tests in official figures has been an issue in several states—even the CDC has followed the practice, although the agency recently announced it would stop doing so.
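The arithmetic behind that criticism is straightforward. Here's a minimal sketch, using made-up numbers rather than Georgia's actual counts, of how adding antibody tests to the denominator pushes the positive rate down:

```python
# Made-up numbers (not Georgia's actual counts) showing how folding
# antibody tests into the totals dilutes the positive test rate.

viral_tests = 100_000        # diagnostic tests for active infection
viral_positives = 8_000      # positives among those tests

antibody_tests = 40_000      # serology tests for past infection
antibody_positives = 1_000   # positives among those tests

viral_only_rate = viral_positives / viral_tests
mixed_rate = (viral_positives + antibody_positives) / (viral_tests + antibody_tests)

print(f"Viral-only positive rate: {viral_only_rate:.1%}")  # 8.0%
print(f"Mixed positive rate:      {mixed_rate:.1%}")       # 6.4%
```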

Communicating clear, easy-to-read data to the public has been one of the most pervasive problems of the COVID-19 pandemic. "You have a lot of very smart people producing really neat visualizations that other professionals might understand," Muehleman says, "but the general public is like, What the hell am I looking at? I just want to know if it's getting better or worse."

But it's not that easy to tell—at least, not in real time. As governments worldwide wrangle data in an effort to thoughtfully reopen economies, they all must reckon with an intrinsic quality of the COVID-19 outbreak: because it takes so long for a single person to go from getting infected to being counted by the state where they live, there's no great way to know the pandemic's immediate trajectory. But data scientists say there are good and bad ways to forecast the pandemic's path, to share the data they use, and, as that much-maligned chart demonstrated, to clearly communicate how they use information to make critical decisions.

Not all of Georgia’s data problems are man-made; no matter who is making public health policy, a number of days will inevitably elapse between the time a person becomes infected with the coronavirus and the time their infection turns up on a health department’s website. To start, there’s the period between the moment a viral particle enters someone’s body and the moment they first start feeling sick, which usually takes around five days but can take as long as two weeks. Once a person starts feeling sick, it could take days before they are tested for COVID-19, then additional days for the test to actually be processed at a laboratory. The laboratory must then communicate the result to the state (more days) before the state can include that person’s infection in its counts.

When the state is finally notified about a COVID-19 infection, it adds the case to the numbers it displays on the dashboard using the date the infected person first felt sick. That means "new" cases usually show up as occurring several weeks in the past, before the state was notified. (There are exceptions: if the state receives a case report that's missing the date of symptom onset, it substitutes the test date or the date it received the result.) So today's case counts are the lowest point of a rolling wave that won't crest until it's well out of reach, which could give the public a false sense of security.
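A rough sketch of that date-picking logic, as described above (an illustration, not the health department's actual code), shows why the most recent days always look artificially low:

```python
from datetime import date

def chart_date(onset_date, test_date, report_date):
    """Pick the date a confirmed case is plotted under: symptom onset if
    known, otherwise the test date, otherwise the date the report arrived."""
    return onset_date or test_date or report_date

# A case the state learns about on May 27 can land on the chart weeks earlier.
print(chart_date(date(2020, 5, 12), date(2020, 5, 18), date(2020, 5, 27)))
# 2020-05-12
```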

Georgia signals that this lagging data is an unreliable basis for decisions by displaying newer data as dots rather than a line on its chart of daily case counts. But that's probably not enough to prevent observers from becoming confused, says Muehleman: "until we shorten that time lag, onset is a problematic indicator to the general public," even if it is a useful one for health officials, he says.

While the data will never be perfect and the data lag cannot be completely eliminated, there are better ways to communicate uncertainty, convey what we do know, and do most other things the cluttered and confusing Georgia dashboard attempts to do, Muehleman says. He points to Louisiana’s data dashboard, where a chart tracking new cases only displays data more than 12 days old. He also likes Alabama’s, which lists positive cases by the day their test result was confirmed, shortening its data lag.
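As a sketch of the Louisiana approach, assuming a hypothetical daily case series, the rule amounts to trimming the still-incomplete trailing window before charting anything:

```python
import pandas as pd

# Hypothetical daily case counts indexed by onset date.
daily_cases = pd.Series(
    range(1, 31),
    index=pd.date_range(end=pd.Timestamp.today().normalize(), periods=30),
)

# Only chart days more than 12 days old, so the lag-depressed recent
# counts never show up as a false downturn.
cutoff = pd.Timestamp.today().normalize() - pd.Timedelta(days=12)
chartable = daily_cases[daily_cases.index < cutoff]
print(chartable.tail())
```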

J.C. Bradbury, an economist and data analyst at Kennesaw State University, sees additional problems with Georgia’s dashboard. Among his frustrations is that the most useful data isn’t all reported in the same place. “The governor keeps mentioning the number of people currently hospitalized” with COVID-19, he says, a metric that to him seems like a promising indicator of current disease prevalence. (That number recently went below 1,000 for the first time since the first surge of cases.) But that figure is published by the Georgia Emergency Management Agency (GEMA) in a PDF document under the “Situation Report” section of the agency’s coronavirus site—not on the health department’s dashboard. That metric should be front and center among the data the health department shares, says Bradbury.

Another key change Bradbury and Muehleman would both like to see is more information about each case in the downloadable raw data set that they and other scientists use to understand pandemic dynamics in Georgia: the date each person who tested positive is suspected to have been exposed, the date each sample was collected, and the date each test was run. They also wish the state would prominently display the number of newly confirmed cases each day, much the way the Atlanta Journal-Constitution's data dashboard does; both men calculate this figure independently from the state's raw data because it is subject to less lag than the metrics currently displayed.
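That calculation is simple once per-case report dates are available. Here's a minimal sketch, assuming a hypothetical export file and column name ("report_date") rather than the dashboard's actual schema:

```python
import pandas as pd

# Count newly confirmed cases by the date the state reported them.
cases = pd.read_csv("georgia_cases.csv", parse_dates=["report_date"])
new_by_day = cases.groupby(cases["report_date"].dt.date).size()

# A seven-day rolling average smooths out day-of-week reporting swings.
print(new_by_day.rolling(7).mean().tail())
```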

In addition to running his own calculations, Muehleman also makes daily visits to the Covid Tracking Project, a volunteer-run website created by The Atlantic magazine.

There are other data sources that may emerge as useful indicators of real-time COVID-19 prevalence in the future—such as where fevers are reported (measured by smart thermometers) or flu-like symptoms (reported by doctors)—but none of these have yet been established as reliable indicators of whether the pandemic is getting better or worse at any given moment.

For now, Bradbury says, people making daily decisions about what they can safely do should look at their county’s daily growth in newly confirmed cases and new deaths—metrics he calculates himself and tweets out each day. (Data reported by each county’s respective health department can also be found here: Fulton; DeKalb; Cobb/Douglas; Gwinnett/Newton/Rockdale; Clayton.) Changes in these numbers give him some sense of whether transmission is getting worse or better in his general area. On Thursday, he tweeted an ominous caption alongside his usual graphics: “Fulton had so many deaths, I had to add another category.”
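A sketch of that day-over-day growth calculation, using made-up cumulative totals for a single county rather than any real figures:

```python
# Hypothetical cumulative confirmed-case totals for one county, by day.
cumulative_cases = [410, 425, 444, 470, 501]

# Daily growth is the percentage change in the cumulative count
# from one day to the next.
for prev, curr in zip(cumulative_cases, cumulative_cases[1:]):
    new = curr - prev
    growth = new / prev * 100
    print(f"{new} new cases, {growth:.1f}% daily growth")
```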
