INTRODUCTION
COVID-19 is not the first global pandemic – but it is the first global pandemic with readily available, real-time data. With minimal effort, anyone can find data on cases, deaths and their locations from as recently as yesterday. Because COVID-19 is so new and its data so immediately available, many data visualizations focus on today’s cases. But as we near the one-year anniversary of the virus, I believe it is time to take a step back and look at the shifts in numbers over time. Perhaps by better understanding the fluctuations in cases, future surges can be predicted or even prevented.
THE DATA
The New York Times API includes a link to their COVID-19 raw data, shared via GitHub. Daily case and death totals are available at both state and county levels. The raw data is continually updated. Wanting to capture the changes in case totals over time, I opted to look at the state-level data.
I first brought the raw data (13,600 rows) into Excel. For each date, starting with January 21, 2020, any state with a positive COVID-19 case and/or death was listed as an entry. The January entries, from the earliest days of the pandemic in the United States, listed only four states: Arizona, California, Illinois, and Washington. By March all states were experiencing outbreaks.
Although the data was exceptionally clean, I did have to do some analytical work to help make sense of it. I used a VLOOKUP function to link an additional dataset of state populations, so that I could view case numbers in a more meaningful context. The Times data, rather than listing the new cases and deaths associated with each date, presented cumulative totals. After some trial and error with formulas and pivot tables, I was eventually able to extract the number of new cases and deaths for each date.
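The conversion from cumulative totals to per-date counts can also be sketched outside of Excel. The Python below is only an illustration of the idea, not the formulas I actually used: the function name, the row layout and the sample numbers are all made up for the example. For each state it subtracts the previous date's cumulative total from the current one.

```python
from collections import defaultdict


def daily_from_cumulative(rows):
    """Turn cumulative (date, state, cases, deaths) rows into per-date new counts.

    Dates are assumed to be ISO strings (YYYY-MM-DD), so sorting them as text
    also sorts them chronologically.
    """
    previous = defaultdict(lambda: (0, 0))  # last cumulative totals seen per state
    daily = []
    for date, state, cases, deaths in sorted(rows, key=lambda r: (r[1], r[0])):
        prev_cases, prev_deaths = previous[state]
        daily.append((date, state, cases - prev_cases, deaths - prev_deaths))
        previous[state] = (cases, deaths)
    return daily


# Made-up sample rows, purely illustrative (not taken from the Times data).
sample = [
    ("2020-03-01", "Washington", 10, 1),
    ("2020-03-02", "Washington", 14, 1),
    ("2020-03-03", "Washington", 21, 3),
    ("2020-03-02", "Illinois", 2, 0),
    ("2020-03-03", "Illinois", 5, 0),
]

for row in daily_from_cumulative(sample):
    print(row)
```

One caveat with this approach: if a cumulative total is ever revised downward, the difference for that date comes out negative, so the results are worth a quick check before mapping.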
INSPIRATION
I find myself continually referencing COVID-19 data via The New York Times iPhone app. The interface is exceptionally easy to use. The map view provides a quick snapshot of current cases, while the interactive table is more detailed. The user is able to adjust the level of information: state or county, cases or deaths, totals or recent averages.
What The Times’ COVID-19 map does not readily capture and present are the changes in cases and deaths over a period of time. (This data is presented as small multiple bar charts.) Unsure if time series data should be combined with geospatial data (and if so, how?), I decided to explore this idea in my lab.
MAPPING SOFTWARE & PROCESS
I used Carto, a free software platform that combines geospatial data and analytics, to generate my map. I was able to download shapefiles for the 50 states directly from Carto. Once I created my United States map, I added a layer with my COVID-19 data (imported as a CSV, as saved from my Excel file).
Wanting to view overarching changes in numbers, I realized that working with daily totals was too granular. Returning to Excel, I reworked my data to reflect monthly totals of cases and deaths. I then further limited my data to six months (May through October), as Carto’s Widget only displayed six options to click on. Initially I had been viewing cases and deaths as a percentage of population, but those figures looked underwhelmingly small. Taking inspiration from The Times, I reworked my formulas to present the cases and deaths per 100,000 people of each state’s population.
Carto allowed me to color the states by number of cases per 100,000. The states with higher totals skewed red, while states with lower totals skewed yellow. My colors were also inspired by The Times. I did attempt some color variations, but none were as successful. More divergent color palettes seemed to falsely imply that some states were in the clear of the pandemic’s threat.
Carto defaults the map to showing October’s data, but the Widget allows the viewer to compare numbers from different months. The pop-up feature provides the case total behind each state’s coloring, as well as the deaths per 100,000 people.
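The monthly roll-up and the per-100,000 rate can be sketched in a few lines of Python. This is a rough illustration rather than my Excel workbook: the function name, the state "Examplia" and all of the numbers are hypothetical. Daily new cases are summed by state and month, and each total is then multiplied by 100,000 and divided by the state's population.

```python
from collections import defaultdict


def monthly_per_100k(daily_rows, populations):
    """Sum daily new cases by (state, month) and express them per 100,000 people.

    daily_rows: (date, state, new_cases) tuples with ISO dates (YYYY-MM-DD).
    populations: mapping of state name to population.
    """
    totals = defaultdict(int)
    for date, state, new_cases in daily_rows:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(state, month)] += new_cases
    return {
        (state, month): round(count * 100_000 / populations[state], 1)
        for (state, month), count in totals.items()
    }


# Hypothetical state and numbers, for illustration only.
populations = {"Examplia": 2_000_000}
daily_rows = [
    ("2020-05-01", "Examplia", 300),
    ("2020-05-15", "Examplia", 500),
    ("2020-06-02", "Examplia", 1000),
]

rates = monthly_per_100k(daily_rows, populations)
print(rates)
```

In this made-up example, 800 cases in a state of 2,000,000 people is 0.04% of the population but 40 cases per 100,000. The two figures describe the same thing, but the second is much easier to read and compare.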
NEXT STEPS
Ultimately, I felt limited by Carto’s design options and was dissatisfied with my map. I did, however, find it extremely informative to look at COVID-19 data by monthly totals. Because technology allows for real-time updates, the visualizations I see most often show daily COVID-19 totals. Daily totals alone are unable to capture the full story of the pandemic. I would like to further explore both US and global data, possibly using Tableau, where I can create a variety of visualizations, including (but not limited to) maps.