COVID-19 has taken almost 227,000 lives in the US, and the recent outbreak at the White House further underscores how dangerous it is. I wanted to explore this data using network analysis to better understand the central nodes as well as the allies becoming vulnerable.
Inspiration and References
I was initially inspired to visualize this data by the New York Times article Tracking the White House Coronavirus Outbreak. I think that a network visualization works well for analyzing contact tracing data.
During my research for the dataset, I found an article working on the very same information and, to my surprise, using Gephi. It really helped me find the data that I needed and informed my visualization decisions. However, the article primarily focused on cleaning the data and not on how to visualize it in Gephi. The visualization they produced was also not that intuitive. I went ahead and ran different statistics on the data, which I explain in the next sections.
Although the visualization does a great job of highlighting communities and patterns, the color scheme can be a little confusing for some people. The labels are also not legible, which can further confuse the user.
Methodology and Tools
I used Gephi to visualize the data as a network. Gephi is open source software for network analysis and visualization. It lets you import a nodes table and an edges table and compiles them into a network. After that, you have the option to run different statistics on your network.
The dataset used by the article was originally sourced from Kaggle. I tried to clean and compile the data using OpenRefine and Excel, but it proved to be more difficult than I anticipated. I then followed the code from the article to clean and compile the dataset in pandas. It was really interesting, as I had to understand the code before implementing it in order to know what was going on.
After I had cleaned the data, I exported it as a CSV file so that I could pull it directly into Gephi and start my analysis.
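As a rough illustration of the export step, here is a minimal sketch of writing node and edge tables in a shape Gephi's spreadsheet importer can read. It uses Python's built-in csv module rather than the pandas code from the article. The function name, the "status" column and the toy records are all hypothetical; only the Id, Label, Source, Target, Type and Weight headers are column names Gephi recognizes.

```python
import csv
import io


def to_gephi_csv(nodes, edges):
    """Return (nodes_csv, edges_csv) as text, using Gephi's column names.

    nodes: dict mapping node id -> (label, status)   # hypothetical shape
    edges: list of (source_id, target_id, weight)    # hypothetical shape
    """
    node_buf = io.StringIO()
    writer = csv.writer(node_buf, lineterminator="\n")
    # "Id" and "Label" are recognized by Gephi; "status" is an extra attribute.
    writer.writerow(["Id", "Label", "status"])
    for node_id, (label, status) in sorted(nodes.items()):
        writer.writerow([node_id, label, status])

    edge_buf = io.StringIO()
    writer = csv.writer(edge_buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, "Directed", weight])

    return node_buf.getvalue(), edge_buf.getvalue()


# Toy placeholder data, not taken from the real dataset.
nodes = {1: ("Person A", "positive"), 2: ("Person B", "negative")}
edges = [(1, 2, 3)]

nodes_csv, edges_csv = to_gephi_csv(nodes, edges)
print(nodes_csv)
print(edges_csv)
```

Saving each string to its own .csv file gives one table for the nodes and one for the edges, ready to import.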
Process and Rationale
The dataset consisted of 60 nodes and 594 weighted, directed edges. After importing the data into Gephi, I ran some statistics. Here are my findings:
Average Degree: 9.9
Graph Density: 16.8%
Network Diameter: 4 (the longest shortest path between any pair of nodes is 4 steps)
Modularity: 28.3% with 6 communities
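The first two figures can be checked by hand from the node and edge counts. The sketch below assumes the usual directed-graph formulas (edges divided by nodes for average degree, edges divided by n(n-1) for density), which reproduce the values reported above. It also includes a small breadth-first-search diameter function, run on a made-up toy graph, to show that diameter means the longest shortest path. The function names are my own, not Gephi's.

```python
from collections import deque


def average_degree(n_nodes, n_edges):
    # Directed graph: each edge adds one out-degree, so the mean is E / N.
    return n_edges / n_nodes


def directed_density(n_nodes, n_edges):
    # Share of the N * (N - 1) possible directed edges that actually exist.
    return n_edges / (n_nodes * (n_nodes - 1))


def diameter(adjacency):
    """Longest shortest-path length over all reachable ordered pairs."""
    longest = 0
    for start in adjacency:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency.get(node, ()):
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        longest = max(longest, max(dist.values()))
    return longest


avg = round(average_degree(60, 594), 1)
density_pct = round(100 * directed_density(60, 594), 1)
print(avg)          # 9.9
print(density_pct)  # 16.8

# Toy directed cycle a -> b -> c -> a (not the real network).
toy = {"a": ["b"], "b": ["c"], "c": ["a"]}
toy_diameter = diameter(toy)
print(toy_diameter)  # 2
```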
I started my visualization journey by trying out different layouts that could suit my data. I finally settled on ForceAtlas 2, as my data had two modes. In order to visualize the data better, I colored the nodes according to whether that person tested positive (red) or negative (green).
I also sized my nodes according to their overall degree. This revealed that 5 people were at the centre of the outbreak, with President Trump being the most central node with the highest degree.
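Gephi computes degree for you, but the idea behind sizing nodes this way is simple to sketch: count every edge once for its source and once for its target, then rank the nodes. The edge list and node names below are invented placeholders, and total_degree is a hypothetical helper name.

```python
from collections import Counter


def total_degree(edges):
    """Count in-degree plus out-degree for every node in a directed edge list."""
    degree = Counter()
    for source, target, _weight in edges:
        degree[source] += 1  # outgoing edge
        degree[target] += 1  # incoming edge
    return degree


# Toy placeholder edges: (source, target, weight).
edges = [("A", "B", 1), ("A", "C", 2), ("D", "A", 1), ("B", "C", 1)]

degree = total_degree(edges)
most_central = degree.most_common(1)
print(most_central)  # [('A', 3)]
```

The node with the largest count would be drawn largest, which is how the most central people stand out in the graph.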
After running all the statistics and trying out different layouts in the Overview tab, I moved on to look at the final visualization in the Preview tab. I manually adjusted some of the nodes and styled the labels to increase the overall legibility of the graph.
The clusters formed around the two major events President Trump attended prior to his positive COVID test. One event was the first presidential debate in Cleveland, and the other was the nomination event in the Rose Garden with Amy Coney Barrett.
When I look back at this entire process, I believe the most challenging part was cleaning and compiling the data. It was a different task altogether, because none of the tools I am familiar with proved feasible and I had to learn a completely new tool. That being said, I really enjoyed building visualizations in Gephi. It takes some time and effort to understand all of its functions, but once you do, it becomes really easy to interact with.
Going further, we can collect more data about the second outbreak at the White House involving close aides of Vice President Mike Pence. Tracing and isolating contacts is of utmost importance during the COVID-19 pandemic, and I think ego networks can really help us identify and isolate cases faster.
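To make the ego network idea concrete, here is a minimal sketch that pulls out one person's immediate contacts and the edges among them. It assumes a depth of 1 and treats contacts in either direction as neighbors. The function name and the toy contact list are hypothetical.

```python
def ego_network(edges, ego):
    """Return (members, member_edges) for the depth-1 ego network of `ego`.

    edges: list of (source, target) pairs; direction is ignored when
    deciding who counts as a neighbor.
    """
    members = {ego}
    for source, target in edges:
        if source == ego:
            members.add(target)
        elif target == ego:
            members.add(source)
    # Keep only the edges whose endpoints are both inside the ego network.
    member_edges = [(s, t) for s, t in edges if s in members and t in members]
    return members, member_edges


# Toy placeholder contacts, not the real dataset.
contacts = [("A", "B"), ("C", "A"), ("B", "C"), ("C", "D"), ("D", "E")]

members, member_edges = ego_network(contacts, "A")
print(sorted(members))  # ['A', 'B', 'C']
print(member_edges)     # [('A', 'B'), ('C', 'A'), ('B', 'C')]
```

Starting from a person who tested positive, the members set is the list of people to trace first.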