In the early months of 2020, Covid-19 changed the world dramatically. When the virus reached the United States, nowhere was hit harder than New York City. With deaths peaking at several hundred per day, the city went from business as usual to a state of emergency in a matter of weeks. Now, many months after the first case arrived in New York City, the data reveals some interesting patterns in how the virus spread around the city. Subsequent analysis has shown that socio-economic status was strongly correlated with the impact of the virus on different areas.
As unsettling as this is, it did not come as a surprise. Many of the risk factors for Covid-19 are linked to circumstances common in communities and families of lower socio-economic status. Perhaps the most heartbreaking example is people whose jobs cannot be done from home. These essential workers, such as grocery store clerks, public transit employees, and delivery workers, are forced into an extremely high volume of face-to-face contact as part of their normal workday. As we now know, the inability to stay away from others is the primary risk factor in the spread of the virus.
For this project, I sought to demonstrate this idea: that Covid-19 spread most rapidly through the communities in New York City with lower economic means. To do this, I first created a map of New York City showing the infection rate in each zip code. I then selected two variables that I believe to be close indicators of economic status: high school graduation rate and evictions per capita. By mapping the data for these variables in each zip code over the original Covid-19 map, I was able to illustrate how each was related to the spread of the virus in each area.
Visually, this project was inspired by the New York Times coronavirus tracking maps. The design, pictured below, is a simple but informative choropleth map that shows the rate of spread in different areas using a light-yellow to dark-red shading scheme.
I was drawn to this design for its simplicity, and yet its ability to convey a large amount of information to the user instantly. At a glance, the user can tell which areas are “hot spots” and which have the infection rate more under control. I believed that using this type of map for New York City would be interesting and would also serve as the perfect base layer for plotting other variables on top.
I created my visualization using Carto – a free web-based mapping tool.
I used four datasets for this project, all of which were pulled from the NYC Open Data website. The first was a shapefile of NYC zip codes, which served as the base for the rest of my datasets and for the entire project. After I mapped the zip codes, I found a dataset with up-to-date Covid-19 data, a dataset with eviction data from 2017 to the present, and one more with data from all NYC public high schools in 2019.
Twice during the project, I had to reshape a dataset to get the information that I needed (this is explained further in the next section). To do this, I used Microsoft Excel to transform the columns and rows into the format that I needed before importing the data into Carto and mapping it.
The design of this project began with the first of my two UX studies. I had previously created a map of New York City with Covid-19 data for each zip code that I wanted to expand upon with other variables that I thought might be related. To identify which factors I should be looking at, I reached out to a friend who recently graduated with a Master’s in Public Health to ask which socio-economic or demographic factors she thought would likely be associated with the spread of the virus. The two factors she thought would be most closely related to infection rate were median income and median educational attainment. I looked for both of these in American Community Survey datasets, but couldn’t find any that were broken down by zip code. Ultimately, I found two datasets with zip code-level data that I felt approximated the information I was trying to acquire: public high school graduation rates and evictions. With these two datasets, I was able to move forward with the project.
Once I began creating the visualization, I quickly realized that trying to display all of the data I had gathered on a single map would create a visual nightmare for the user. To remedy this, I decided to create three separate maps, each sharing some common components for consistency. This allowed me to put much less information on each map and convey it more effectively to the end user.
The first of the three is a simple choropleth map of New York City zip codes, showing the rate of Covid-19 spread in each. I had previously created a point map with the same dataset, which placed a point at the center of each zip code that was sized based on the infection rate. However, to achieve the choropleth design I was hoping for, I had to find a shapefile of NYC zip codes. After importing the shapefile into Carto, I was able to join it with my original Covid-19 dataset using the “Add Columns from Second Dataset” analysis tool. This allowed me to apply a gradient color scheme to an entire zip code based on the infection rate in that area. I chose a color scheme similar to that of the NYT coronavirus map, as darker red colors seem to naturally convey a higher severity of a given variable. The result is pictured below.
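Carto’s “Add Columns from Second Dataset” tool behaves, in effect, like a join on a shared key. As a rough sketch of that step and of the yellow-to-red shading, the Python below performs a left join of zip code shapes with Covid-19 rates and then assigns each rate to one of five equal-interval color classes. The field names (`zipcode`, `geometry`, `rate_per_100k`), the placeholder values, and the equal-interval classing are assumptions for illustration, not Carto’s internals or the actual NYC Open Data schema; the hex colors are ColorBrewer’s five-class YlOrRd palette.

```python
# Sketch: join zip code shapes to Covid-19 rates, then pick a choropleth color.
# Field names and values are hypothetical placeholders, not real NYC data.

# ColorBrewer 5-class YlOrRd palette, light yellow to dark red.
RAMP = ["#ffffb2", "#fecc5c", "#fd8d3c", "#f03b20", "#bd0026"]


def left_join(left, right, key):
    """Copy each row of `left`, adding the columns of the `right` row with the same key.

    Rows in `left` with no match keep only their own columns (a left join).
    """
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        merged = dict(row)
        for col, value in lookup.get(row[key], {}).items():
            if col != key:
                merged[col] = value
        joined.append(merged)
    return joined


def color_for(value, lo, hi, ramp=RAMP):
    """Equal-interval classification of `value` within [lo, hi] onto `ramp`."""
    if hi == lo:
        return ramp[-1]
    idx = int((value - lo) / (hi - lo) * len(ramp))
    return ramp[min(idx, len(ramp) - 1)]  # the maximum lands in the last class


zip_shapes = [
    {"zipcode": "10002", "geometry": "<polygon 10002>"},
    {"zipcode": "10451", "geometry": "<polygon 10451>"},
    {"zipcode": "10282", "geometry": "<polygon 10282>"},  # no rate below: stays unjoined
]
covid_rates = [  # placeholder rates per 100,000 residents
    {"zipcode": "10002", "rate_per_100k": 100.0},
    {"zipcode": "10451", "rate_per_100k": 500.0},
]

joined = left_join(zip_shapes, covid_rates, "zipcode")
rates = [row["rate_per_100k"] for row in joined if "rate_per_100k" in row]
lo, hi = min(rates), max(rates)
for row in joined:
    if "rate_per_100k" in row:
        row["fill"] = color_for(row["rate_per_100k"], lo, hi)
```

Equal intervals are used here only because they are the simplest to show; mapping tools commonly offer other classification methods, such as quantiles, which can spread skewed data more evenly across the colors.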
For more detailed information, I added a pop-up window that appears when the user hovers over a zip code and shows specific data about it. The fields I felt were most pertinent were the borough name, the neighborhood name, the zip code, and the infection rate per 100,000 residents. Pictured below is the pop-up window for zip code 10002.
After creating the choropleth map, I created a second map using data from a survey of public high schools in New York City. I kept the choropleth map as a base layer and added the high school dataset as a second layer. The dataset contained a wide range of information that every New York City high school is required to report each year, including its graduation rate. For the purposes of the survey, graduation rate is defined as the percentage of students who graduate within four years of starting 9th grade. After initially importing this dataset into Carto, I realized that the data needed to be transformed. Rather than plotting each individual high school on the map, I wanted the average graduation rate of all high schools in a given zip code. To achieve this, I created a pivot table in Excel with a single row for each zip code and that zip code’s average graduation rate in the adjacent column. I then imported the pivoted dataset into Carto so that I could show the average graduation rate for each zip code as a dot at the center point of the area. I made the dots a cool color so that they would stand out clearly over the warm-colored choropleth map underneath. The end result is pictured below.
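The pivot-table step, one row per zip code holding the average graduation rate of its schools, can be sketched in plain Python as a group-and-average. The field names (`school`, `zipcode`, `grad_rate`) and the school records are hypothetical placeholders; missing rates are skipped, roughly as a pivot table’s average ignores blank cells.

```python
from collections import defaultdict
from statistics import mean


def average_by_zip(rows, key="zipcode", value="grad_rate"):
    """Group rows by zip code and return the mean value for each zip code.

    Rows whose value is missing (None) are skipped, like blank cells.
    """
    groups = defaultdict(list)
    for row in rows:
        if row.get(value) is not None:
            groups[row[key]].append(row[value])
    return {zipcode: mean(vals) for zipcode, vals in groups.items()}


# Placeholder school records; graduation rate as a percentage.
schools = [
    {"school": "School A", "zipcode": "10282", "grad_rate": 90.0},
    {"school": "School B", "zipcode": "10282", "grad_rate": 80.0},
    {"school": "School C", "zipcode": "10451", "grad_rate": 60.0},
    {"school": "School D", "zipcode": "10451", "grad_rate": None},
]

avg_grad = average_by_zip(schools)
print(avg_grad)  # {'10282': 85.0, '10451': 60.0}
```

This is an unweighted mean, so a small school counts as much as a large one; weighting each school by its cohort size would be an alternative if enrollment figures were available.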
Since the original choropleth map featured a pop-up information window when the user hovered over a zip code, I kept that feature on this map for consistency. When the user hovers over a dot on this map, they see all of the information from the previous map, with the area’s average graduation rate in place of the infection rate. I liked the concept of having separate maps for separate sets of information, so the user can pick the map that fits the information they need. Additionally, the color scheme of the base map is fairly self-explanatory, so it gives context to the data on top even without displaying the specific infection rate. Pictured below is the pop-up window for zip code 10282, which has one of the highest graduation rates in the entire city.
I also included a widget with this map that allows the user to filter zip codes by graduation rate. For example, the user can view only the areas with the highest (or lowest) graduation rates to see where they are located. This is a particularly useful tool for analysis, and one that I demonstrate further in the following section.
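Carto’s widget performs this filtering interactively. The selection it makes can be approximated in a few lines of Python; the function names and the placeholder graduation rates below are hypothetical, and the sketch does not describe how Carto implements its widgets.

```python
def filter_by_range(values_by_zip, low, high):
    """Keep only zip codes whose value lies in [low, high], like dragging a range widget."""
    return {z: v for z, v in values_by_zip.items() if low <= v <= high}


def lowest(values_by_zip, n):
    """Return the n zip codes with the smallest values, smallest first."""
    return sorted(values_by_zip, key=values_by_zip.get)[:n]


def highest(values_by_zip, n):
    """Return the n zip codes with the largest values, largest first."""
    return sorted(values_by_zip, key=values_by_zip.get, reverse=True)[:n]


avg_grad = {"10282": 95.0, "10002": 70.0, "10451": 60.0}  # placeholder values
print(filter_by_range(avg_grad, 0, 70))  # {'10002': 70.0, '10451': 60.0}
print(lowest(avg_grad, 1))               # ['10451']
print(highest(avg_grad, 1))              # ['10282']
```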
For the third and final map, I used a public dataset of all evictions in New York City over the past three years. As with graduation rate, I felt that eviction data could closely approximate the socio-economic status of a given area. Reshaping this dataset and getting it into Carto proved to be the most complex part of the project and required multiple steps. I began the same way as with the previous dataset: by opening the CSV file in Excel and creating a pivot table. Since the dataset was at the record level (each eviction had its own row with the date, address, etc.), I had to create a new table with one row per zip code and the total number of evictions in that zip code in the adjacent column. I imported this table into Carto and plotted the points on the map in much the same way that I did with the graduation data.
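Collapsing one row per eviction into one count per zip code is a counting pivot. A minimal sketch with Python’s `collections.Counter` follows; the field names (`executed_date`, `address`, `zipcode`) and the records are hypothetical placeholders, not the actual columns of the NYC Open Data eviction dataset.

```python
from collections import Counter


def count_by_zip(rows, key="zipcode"):
    """Collapse one-row-per-eviction records into one count per zip code.

    Similar to a pivot table with zip code as the row field and a count summary.
    Rows with a blank zip code are skipped.
    """
    return Counter(row[key] for row in rows if row.get(key))


evictions = [  # placeholder records, one per eviction
    {"executed_date": "2019-01-15", "address": "1 Example St", "zipcode": "10451"},
    {"executed_date": "2019-03-02", "address": "2 Example St", "zipcode": "10451"},
    {"executed_date": "2020-01-20", "address": "3 Example St", "zipcode": "10282"},
    {"executed_date": "2018-07-09", "address": "4 Example St", "zipcode": ""},
]

eviction_counts = count_by_zip(evictions)
print(dict(eviction_counts))  # {'10451': 2, '10282': 1}
```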
However, I quickly realized the logical flaw in using raw counts for each zip code rather than a rate. Since population can vary drastically between zip codes, a per capita figure is needed to compare them meaningfully. To calculate one, I used the “Add Columns from Second Dataset” analysis tool. My original Covid-19 dataset included the population of each zip code, so I was able to add that as a column to the eviction dataset, using the zip code column as the key. I then exported this new dataset, which now included zip code, total evictions, and total population, back into an Excel file. In Excel, I added one additional column that divided total evictions by total population, giving me the evictions per capita in each zip code. Finally, I re-imported this into Carto and mapped the eviction rate in each zip code. I used the same color scheme for the dots as in the previous map for consistency. Pictured below is the final map.
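The join-then-divide step can be sketched as a lookup of each zip code’s population followed by a division. The tables below are hypothetical placeholders. The optional `per` parameter is an illustrative addition: `per=1` gives evictions per capita as described above, while a larger value such as 1,000 can make small rates easier to read, much as the Covid-19 data is reported per 100,000 residents.

```python
def per_capita(counts_by_zip, population_by_zip, per=1):
    """Divide each zip code's count by its population, scaled to `per` residents.

    Zip codes missing from the population table, or with zero population,
    are left out rather than causing a division error.
    """
    rates = {}
    for zipcode, count in counts_by_zip.items():
        population = population_by_zip.get(zipcode)
        if population:
            rates[zipcode] = count / population * per
    return rates


eviction_counts = {"10451": 50, "10282": 1, "10002": 30}  # placeholder counts
population = {"10451": 50_000, "10002": 75_000}            # placeholder populations

rates = per_capita(eviction_counts, population, per=1_000)  # evictions per 1,000 residents
```

Dropping zip codes without a population figure, rather than assigning them zero, keeps them from appearing as misleadingly low-eviction areas on the map.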
I once again included a pop-up information window similar to those on the previous two maps. When the user hovers over a blue dot on this map, they see the eviction rate for the given zip code in place of the graduation rate or Covid-19 infection rate shown on the previous maps. See the pop-up window below for zip code 10451.
To be consistent with the previous map, I also included a widget that allows the user to view only areas within a specific range of eviction rates. This is an interesting tool for analysis, as it shows which areas have the highest or lowest rates and how those areas line up with the Covid-19 map beneath.
View the full, interactive maps using the links below:
Using the widgets on the graduation rate map and the eviction map, some rough trends start to emerge. For example, when filtering for the zip codes with the lowest graduation rates, you can see that they tend to fall in the “hot zones” for Covid-19, with only a few outliers. See the map below of the zip codes with the lowest average graduation rates.
Conversely, if we shift the filter to the areas with the highest graduation rates, the dots fall mostly in central and lower Manhattan, north Brooklyn, and northeast Queens, areas with comparatively low Covid-19 infection rates. See the map below.
The eviction map shows similar trends. Again using the widget, you can see that the zip codes with the highest evictions per capita tend to fall in highly infected areas of the Bronx, east Brooklyn, Far Rockaway, and Hell’s Kitchen. See the map below of zip codes with high eviction rates.
At the other end of the spectrum, zip codes with low eviction rates tend to have lower infection rates, with only a few outliers. See the map below of zip codes with the lowest eviction rates.
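The filtered maps show these trends by eye. One way to put a number on them, offered as a suggestion rather than something done in this project, is the Pearson correlation coefficient between an indicator and the infection rate across zip codes. The sketch below assumes both tables are keyed by zip code; the keys and values are placeholders chosen only to exercise the code. A value near -1 would match the graduation-rate pattern and a value near +1 the eviction pattern. Note that r measures only linear association, so a few outliers such as Hell’s Kitchen can shift it noticeably.

```python
from math import sqrt


def paired_values(a_by_zip, b_by_zip):
    """Line up two zip-code-keyed tables, keeping only zip codes present in both."""
    shared = sorted(set(a_by_zip) & set(b_by_zip))
    return [a_by_zip[z] for z in shared], [b_by_zip[z] for z in shared]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length lists with at least two values")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / (sx * sy)


# Placeholder values keyed by hypothetical zip labels.
grad_rate = {"zip1": 60.0, "zip2": 75.0, "zip3": 90.0, "zip4": 95.0}
infection_rate = {"zip1": 400.0, "zip2": 300.0, "zip3": 200.0}  # zip4 has no rate

xs, ys = paired_values(grad_rate, infection_rate)
r = pearson_r(xs, ys)  # negative: higher graduation rate pairs with lower infection rate
```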
The three maps that make up the project begin to shed some light on how certain socio-economic indicators are correlated with the infection rate of Covid-19. Further analysis can and should be done, but I believe this can serve as a basis for future research. While the visualization shows some rough trends, it also reveals some outliers that would be worth examining more closely in a future project. For example, the unusually high infection rate in the Hell’s Kitchen/Midtown West area seems very out of place compared with the infection rates of the areas around it. Perhaps another factor, such as median age or an unusually high concentration of nursing homes, is affecting the infection rate there. Overall, the visualization serves to spark conversation about how economic status can affect public health.
Recommendations for Revisions
When the visualization was complete, I had my final UX participant review the project and provide feedback. The participant was a co-worker who is currently pursuing a degree in UX Design. Her only recommendation was that I reverse the sizing scheme on the graduation rate map, i.e., show higher graduation rates with smaller dots and lower graduation rates with larger dots. She suggested this because it was confusing to compare the inverse relationship between graduation rate and infection rate with the positive relationship between eviction rate and infection rate. Flipping the sizing scheme would make the two maps read more consistently. While I thought this was great feedback, I was unfortunately unable to do this in Carto, and could not come up with a good way to transform the data in Excel to make it happen. I believe that with more time and experience I could eventually have made this work.
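One possible way to flip the sizing, sketched under the assumption that the dot size is driven by a single numeric column, is to reflect each value within the observed range: the new value is max + min - rate. The highest graduation rate then gets the smallest value and vice versa, while the range stays the same. In Excel this would be a helper column such as `=MAX($B$2:$B$200)+MIN($B$2:$B$200)-B2`, where the cell range is hypothetical. A simpler alternative is 100 minus the graduation rate, i.e., the percentage of students who did not graduate within four years. The function names and values below are placeholders.

```python
def reflect(values_by_zip):
    """Flip values within their observed range so the largest becomes the smallest.

    Each value v becomes (max + min - v); the minimum and maximum stay the same.
    """
    lo, hi = min(values_by_zip.values()), max(values_by_zip.values())
    return {z: hi + lo - v for z, v in values_by_zip.items()}


def not_graduated(values_by_zip):
    """Alternative: percent of students who did not graduate within four years."""
    return {z: 100.0 - v for z, v in values_by_zip.items()}


avg_grad = {"10282": 95.0, "10002": 70.0, "10451": 60.0}  # placeholder values
print(reflect(avg_grad))        # {'10282': 60.0, '10002': 85.0, '10451': 95.0}
print(not_graduated(avg_grad))  # {'10282': 5.0, '10002': 30.0, '10451': 40.0}
```

Either derived column could then drive the dot size in place of the raw graduation rate, while the pop-up continues to show the original rate.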
A final revision I would make to this project is to reconsider my decision to base the analysis on zip codes. This proved to be incredibly problematic when trying to find data. I pored over American Community Survey data for a cumulative 12 hours trying to pull median income data broken down by zip code, but was unsuccessful. I was, however, able to find plenty of income and other economic indicator data broken down by neighborhood, congressional district, city, state, etc. Unfortunately, I couldn’t use this data with my Covid-19 dataset, as plotting these points over a zip code-based choropleth map would be confusing and likely wouldn’t make sense. If I were to develop this project further, I would reconsider the geographic units on which I based my analysis, as a different choice would likely open up several other variables that I could use in the study.