The COVID-19 pandemic hit us in December 2019, and we are still trying to combat the virus. A simple Google search brings up thousands of datasets and visualizations. Many of these visualizations, however, fill entire counties with a single color based on the number of cases. I found this a little misleading, as some counties have really small populations and hence can’t have as many cases as big metropolitan areas.
After an initial analysis, I wanted to answer the following questions:
- What is the current population of every county and how can I highlight them on my map?
- How can we analyze the COVID risk in every county using county population and total number of cases?
- Which states are at high risk of having a second wave of COVID cases?
During my initial research, I found the New York Times COVID map really fascinating, as it used average cases per 100K people. But the map had one shortcoming: it didn’t highlight the population of each county.
Although the color scheme is really effective at highlighting the various counties, the lack of state borders makes it hard to distinguish and locate specific counties.
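The per-100K normalization mentioned above is simple arithmetic, and a small sketch shows why raw counts can mislead. The function name and the two counties below are made up for illustration; they are not taken from the New York Times map or from my datasets.

```python
def cases_per_100k(total_cases: int, population: int) -> float:
    """Return cases per 100,000 residents (0.0 if the population is unknown)."""
    if population <= 0:
        return 0.0
    return total_cases * 100_000 / population


# Two hypothetical counties with the same raw case count
# but very different populations.
small_county = cases_per_100k(total_cases=500, population=20_000)
large_county = cases_per_100k(total_cases=500, population=2_000_000)

print(small_county)  # 2500.0
print(large_county)  # 25.0
```

The same 500 cases translate to a rate one hundred times higher in the smaller county, which is the kind of difference a map colored by raw counts hides.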
In order to make the idea of my visualization possible, I had to collect and combine quite a few different datasets.
I used four different datasets for my lab activity. The first dataset contained the total number of cases in each county, starting from 01/22, when the first case was detected in the US. The second dataset contained the population of each county. The third dataset was the shapefile for every US county, and the fourth and final dataset was the shapefile for every US state. I have linked the datasets here:
Total number of COVID cases each day starting 01/22 for every county in US
Total population classified by county
These datasets were originally sourced from USA Facts and can be found here.
For this lab activity I used Carto, a leading location intelligence platform that lets users harness the power of spatial data analysis. It helps companies with efficient delivery routes, better behavioral marketing, strategic store placements, and much more. I really wanted to understand and use Carto to its full potential, so I restricted myself to using only Carto for everything, including data cleaning.
Once in Carto, I started my analysis by adding the different data layers. At first I had trouble running the analysis on my datasets, because Carto had automatically assigned the wrong data types to some columns. After fixing this minor issue, I joined the total COVID cases column to my county shapefile. After running the analysis, I styled and colored the layer by the total number of cases. To give a better sense of where each county sits, I highlighted the state borders using the state shapefile.
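I did all of this inside Carto, but the data type fix and the join can be sketched in plain Python to show the idea. This is a minimal sketch, not what Carto runs internally: the `fips` and `total_cases` field names, the sample rows, and both helper functions are assumptions made for illustration.

```python
def normalize_fips(value) -> str:
    """Coerce a county code to a 5-character, zero-padded string.

    Codes read as numbers lose their leading zero (1001 instead of
    "01001"), which silently breaks a join on that column.
    """
    return str(int(float(value))).zfill(5)


def join_cases_to_counties(county_rows, case_rows):
    """Left-join case totals onto county rows by normalized county code."""
    cases_by_fips = {
        normalize_fips(row["fips"]): int(row["total_cases"]) for row in case_rows
    }
    joined = []
    for county in county_rows:
        key = normalize_fips(county["fips"])
        # Counties with no matching case row get None rather than being dropped.
        joined.append({**county, "fips": key, "total_cases": cases_by_fips.get(key)})
    return joined


# Hypothetical rows: the same codes arrive as text, integer, and float-like text.
counties = [
    {"fips": "01001", "name": "County A"},
    {"fips": "06037", "name": "County B"},
    {"fips": "50001", "name": "County C"},
]
cases = [
    {"fips": 1001, "total_cases": "1500"},
    {"fips": "6037.0", "total_cases": 250000},
]

for row in join_cases_to_counties(counties, cases):
    print(row)
```

Keeping the county code as text on both sides is the design choice that matters here; once both columns share a type and format, the join itself is routine.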
Although this map was technically accurate, it still did not show the population of each county alongside its number of cases. To highlight the population of each county, I added another data layer and changed the county geometry from polygons to points, sized by population. The final map is as follows.
You can also explore the interactive map by clicking the following button.
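Carto handled the polygon-to-point conversion and the sizing for me, so the following is only a sketch of one common way to do it by hand: take the centroid of each county polygon and scale the point radius with the square root of the population, so that the circle's area tracks population. The function names, the radius limits, and the sample values are all assumptions, and the centroid formula treats coordinates as planar.

```python
import math


def polygon_centroid(ring):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area (shoelace formula)
    cx = cy = 0.0
    n = len(ring)
    for i in range(n):
        x0, y0 = ring[i]
        x1, y1 = ring[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return cx / (3 * area2), cy / (3 * area2)


def point_radius(population, max_population, max_radius=30.0, min_radius=2.0):
    """Radius grows with sqrt(population), clamped to a visible minimum."""
    if max_population <= 0:
        return min_radius
    scaled = max_radius * math.sqrt(max(population, 0) / max_population)
    return max(min_radius, scaled)


# A hypothetical square "county" and two hypothetical populations.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(polygon_centroid(square))         # (1.0, 1.0)
print(point_radius(250_000, 1_000_000)) # 15.0
print(point_radius(100, 1_000_000))     # 2.0
```

Square-root scaling keeps very large counties from swallowing their neighbors, while the minimum radius keeps the smallest counties visible.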
The final spatial analysis does a better job of highlighting the total number of COVID cases along with the population of each county. This brings forward information that was previously hidden. The map also shows that counties with larger populations tend to have higher infection rates as well, and it highlights which states are at more risk than others. Some counties in the northeastern states (like Maine and Vermont) have large populations but considerably lower infection rates.
In this age of technology, where data is readily available, fighting a pandemic is no longer just the duty of health care professionals. While working on this assignment, I realized the importance of accurate data visualizations. Working with Carto was also fun, but some of the features, like the legend placement, were pretty limited.
An idea that crossed my mind was that even with just two factors, the population and the number of cases in a particular county, one could build a model to predict future cases. This could help predict the second wave. Diving deeper, one could use socio-economic factors like race, ethnicity, gender, age, average income, and the number of hospitals in the area to build such a model.
Further analysis could benefit from more varied point sizes, as the current map uses only a few size buckets, each covering a wide range. This model could also be expanded to a global scale to highlight the countries at most risk. It could also benefit from individual daily cases instead of a summation of all the cases found.
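Both of these follow-ups are easy to prototype. The sketch below, under assumed names and made-up numbers, turns a cumulative case series into daily counts and splits populations into equal-count (quantile) buckets, which is one possible way to get more varied point sizes; it is not how Carto builds its buckets.

```python
from bisect import bisect_right


def daily_new_cases(cumulative):
    """Difference a cumulative series into per-day counts."""
    daily = []
    previous = 0
    for total in cumulative:
        # Clamp at zero so a downward data correction is not shown as negative cases.
        daily.append(max(total - previous, 0))
        previous = total
    return daily


def quantile_breaks(values, buckets):
    """Upper break points that split the values into equal-count buckets."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // buckets] for k in range(1, buckets)]


def bucket_index(value, breaks):
    """Index of the bucket a value falls into (0 is the smallest)."""
    return bisect_right(breaks, value)


# Hypothetical cumulative totals for one county.
print(daily_new_cases([0, 2, 5, 5, 9]))  # [0, 2, 3, 0, 4]

# Hypothetical county populations, split into four size classes.
populations = [1_000, 5_000, 20_000, 50_000, 200_000, 900_000, 4_000_000, 10_000_000]
breaks = quantile_breaks(populations, 4)
print(breaks)                            # [20000, 200000, 4000000]
print([bucket_index(p, breaks) for p in populations])
```

Quantile buckets put the same number of counties in each size class, so a handful of very large counties no longer forces everything else into the smallest point size.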