NYC HIV Spatial Map


2021 HIV/AIDS Infographic from NYC Health + Hospitals

Introduction

Given the recent monkeypox outbreak and the ongoing COVID crisis, transmissible diseases have unfortunately been top of mind. I was curious about what data might exist around STI/STD transmission in NYC and whether any correlations could be drawn between public health efforts and transmission or diagnoses. As an LGBTQ+ man living in the NYC area, the HIV/AIDS crisis is a phenomenon that I do not have the luxury of ignoring. It meaningfully impacts my community and how we all need to navigate the world around us. I decided that for this lab, I would perform a spatial analysis of HIV-related data for the city of New York.

Inspiration

I have been very anxious about Monkeypox lately, given the high rate of transmission among people who share my social identity as a queer man. As such, I’ve been closely tracking its transmission and efforts to make vaccines available in the NYC area. There is an excellent visualization tool from ourworldindata.org that provides easily digestible Monkeypox case data at the global and country level in a variety of formats. (The data was sourced from global.health.) I found the trend charts as well as the choropleth pictured below to be particularly valuable as I attempted to understand how this disease was moving globally. Seeing how effectively the choropleth visualized medical case data, I hoped to do something similar with my NYC spatial map.

Monkeypox 7-day rolling average, global choropleth with country breakouts

Methodology

Datasets

In order to build a spatial map of NYC, I needed to obtain both a spatial boundary map for the geographic unit that I wanted to analyze, as well as data surrounding HIV case counts. Through NYC Open Data, I was able to find:

  • A spatial file containing NYC Zip Code Boundaries. This file provided the basis of my visualization as it generated the underlying NYC map divided by zip code.
  • A .csv file containing HIV Testing Locations throughout NYC
  • A .csv file containing Condom Availability Programs in NYC by Zip Code

While the NYC Open Data regarding STI/STD/HIV transmission was stale (latest data was from 2014), I found a .csv dataset online through AIDSVU (a non-profit that seeks to make HIV-related data widely available) that provided 5-year rolling average (2016-2020) statistics regarding HIV/AIDS diagnoses and treatment patterns in NYC by zip code. These four datasets were combined to create my dashboard.

Tooling

I used Tableau Public to create my spatial dashboard given the relatively low learning curve (compared to other GIS tools) and the ability to make additional non-spatial visualizations within the same dashboard. All four of the datasets shared a common field of ‘Zip Code’ which was used to join the datasets in Tableau prior to building my visualizations.

Joining of three HIV-related datasets to spatial Zip Code file in Tableau Public
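
For readers working outside Tableau, the join on ‘Zip Code’ described above can be sketched in plain Python. This is an illustration of a left join keyed on zip code, not the Tableau workflow itself. The helper name left_join and the field names (zip_code, case_count, testing_locations) are hypothetical; the few values shown are the ones quoted in the Results section, and a third lookup (such as the condom availability counts) would be passed in the same way.

```python
def left_join(base_rows, key, *lookups):
    """Left-join each lookup table onto base_rows using a shared key.

    Every base row is kept; fields with no match in a lookup are set to
    None, similar to the nulls an unmatched zip code gets in a left join.
    """
    joined = []
    for row in base_rows:
        merged = dict(row)
        for lookup, fields in lookups:
            match = lookup.get(row[key], {})
            for field in fields:
                merged[field] = match.get(field)
        joined.append(merged)
    return joined


# Zip codes are kept as strings so type mismatches (10451 vs "10451")
# do not silently break the match.
boundaries = [{"zip_code": "10451"}, {"zip_code": "10456"}, {"zip_code": "11201"}]

# Hypothetical field names; values are those quoted in the Results section.
diagnoses = {"10456": {"case_count": 213}}
testing = {"10451": {"testing_locations": 19}, "11201": {"testing_locations": 17}}

rows = left_join(
    boundaries,
    "zip_code",
    (diagnoses, ["case_count"]),
    (testing, ["testing_locations"]),
)
for row in rows:
    print(row)
```

Using the boundary file as the base table keeps every zip code on the map even when one of the other datasets has no row for it.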

Building the Spatial Map

Generating the zip code boundaries

To start, I used the Zip Code spatial file to generate a basic map of NYC broken out by zip code boundaries. I used the ‘Geometry’ field to generate the base map and added ‘Zip Code’ to Detail so that individual zip code boundaries could be highlighted on hover.

Base NYC Zip Code Map with individual boundaries

Creating the Choropleth Map

I chose to create a choropleth map as I thought it was an effective way of visualizing case counts, similar to the Monkeypox dashboard referenced in the Inspiration section. The AIDSVU dataset contained a field providing a 5-year rolling average of HIV diagnoses by zip code, which I used to generate the choropleth shading. I chose a red gradient given our cultural association of red with a dire situation (such as infectious disease transmission). A darker shade of red indicated a higher number of HIV diagnoses in a particular zip code.

Choropleth map of NYC by zip code using color to indicate cumulative HIV cases
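
To make the shading rule concrete, the sketch below maps a zip code's case count to a shade of red by linear interpolation between a light and a dark endpoint. Tableau applies its own sequential palette, so the endpoint colors and the shade_for helper here are assumptions chosen purely for illustration.

```python
LIGHT_RED = (254, 229, 217)  # assumed color for the lowest value
DARK_RED = (165, 15, 21)     # assumed color for the highest value


def shade_for(value, vmin, vmax):
    """Return a hex color between LIGHT_RED and DARK_RED for value."""
    if vmax == vmin:
        t = 0.0
    else:
        # Clamp to [0, 1] so out-of-range values still get a valid color.
        t = min(1.0, max(0.0, (value - vmin) / (vmax - vmin)))
    channels = (round(lo + t * (hi - lo)) for lo, hi in zip(LIGHT_RED, DARK_RED))
    return "#{:02x}{:02x}{:02x}".format(*channels)


# 213 is the highest zip-code case count reported in the Results section.
print(shade_for(0, 0, 213))
print(shade_for(213, 0, 213))
```

A linear scale is the simplest choice; a skewed distribution of case counts might read better with a stepped or quantile-based scale.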

Adding Symbols to the Map

To add meaning to the visualization beyond the basic choropleth map, I used the HIV Testing Location dataset to add a mark for each zip code based on the number of testing locations available. This was done by adding a second layer to the map using the ‘Marks’ feature in Tableau. I chose to keep the color consistent across marks so that it wouldn’t distract from the choropleth’s color information. (Specifically, I used a high opacity blue to contrast with the red-based choropleth in an unobtrusive manner.) The size of each mark indicated how many testing locations were available within a particular zip code, with a hover-over detailing the exact count for users.

Choropleth Symbol Map of NYC HIV Diagnoses and Testing Locations by Zip Code

Enriching the Dashboard

I added some additional data points as detail to my choropleth symbol map, as I felt they gave a more complete picture of the HIV/AIDS spatial analysis. Count of Condom Availability Programs in NYC was added as a detail on the HIV Testing Location marks, given the importance of testing and condom usage in reducing the spread of sexually transmitted infections. On the choropleth, I added Count of HIV Risk cases, which was a projected data point from the AIDSVU dataset based on the historical HIV diagnoses and census data for a particular zip code. These data points were also made available in supplemental dashboards in case users were interested in focusing on them without the distraction of the map boundaries. A county filter was also added so that users could narrow their analysis if desired, which was particularly helpful for the bar charts.

Complete enriched NYC HIV Spatial Dashboard

Results

The dashboard was intended to be exploratory in nature – I didn’t go in with a specific hypothesis but rather wanted to understand what trends or correlations might exist. Some interesting findings:

  • Overall, there were 8,905 cases tracked in the 5-year rolling average for NYC from 2016-2020
  • Bronx County contained both the zip code with the highest case count and the zip code with the highest number of testing locations
    • Zip Code 10456 had the highest case count with 213 cases
    • Zip Code 10451 had the highest number of HIV testing locations, with 19 locations
  • Kings County had the highest case count of any county, with 2,673 cases. Within Kings County:
    • Zip Code 11207 had the highest case count with 203 cases
    • Zip Code 11201 had the highest number of HIV testing locations, with 17 locations
  • We can observe a general trend that where there are higher rates of new diagnoses, there are more testing locations.
    • It’s important to note that this is merely correlation and not causation. We are unable to say that there are more testing centers in a location because there are higher rates of HIV diagnoses.
    • Testing centers may well have been established in these hotspots in response to high diagnosis rates. Alternatively, diagnoses may be higher there because testing is more accessible: underlying infection rates could be similar across zip codes, with diagnosis counts swayed by testing availability.
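
The trend noted above was observed visually on the dashboard. One way to put a number on it would be a Pearson correlation between diagnoses and testing locations per zip code, sketched below. The input values are made-up placeholders, not figures from the dashboard, and a coefficient near 1 would still say nothing about which way the causation runs.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return cov / math.sqrt(var_x * var_y)


# Placeholder values for illustration only, NOT taken from the dashboard:
# one entry per zip code.
diagnoses = [10, 20, 30, 40]
testing_locations = [1, 3, 2, 5]

print(round(pearson_r(diagnoses, testing_locations), 3))
```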

Reflections

I struggled a lot with joining the datasets together in Tableau Public. The learning curve for the basic spatial join on a matching ID wasn’t too tricky, but making my measures display as expected was a challenge. Specifically for my dataset from AIDSVU around new HIV diagnoses, I found myself needing to manually adjust column attributes in Tableau as some of the headers included the phrase ‘zip code.’ This caused them to be treated as geographic values when in reality they were basic measures with numerical data. If I were to approach this project again, I would strip down my datasets in Excel to just the columns I was interested in manipulating in Tableau and rename the headers more intuitively. I had a lot of “junk” columns that made it more distracting to build and test my visualizations.
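
The cleanup step I describe doing in Excel could also be scripted. The sketch below keeps only the columns of interest and renames them so that no measure has ‘zip code’ in its header. The header names, the mapping, and the sample row layout are hypothetical stand-ins for the real AIDSVU file.

```python
import csv
import io

# Hypothetical mapping from original header -> clearer name. Only columns
# listed here are kept; everything else is dropped as "junk".
KEEP_AND_RENAME = {
    "Zip Code": "zip_code",
    "New Diagnoses Zip Code Cases": "new_diagnoses",
}


def slim_csv(source, mapping):
    """Read CSV text and return rows with only the mapped, renamed columns."""
    reader = csv.DictReader(source)
    return [{new: row[old] for old, new in mapping.items()} for row in reader]


# Tiny stand-in for the real file; the extra column and header names are made up.
raw = io.StringIO(
    "Zip Code,New Diagnoses Zip Code Cases,Unused Column\n"
    "10456,213,x\n"
)
print(slim_csv(raw, KEEP_AND_RENAME))
```

Values stay as strings here; the zip code in particular is better left as text so it is not mistaken for a measure.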

If I were to expand this project, it would be really cool to obtain (or create) a spatial file with the locations of the different testing centers. I would love to see how these are spaced out on the map and understand whether they’re evenly distributed within zip codes or concentrated in certain areas. Additionally, this sort of analysis would be richer with additional demographic data to contextualize the findings. The AIDSVU dataset did provide diagnosis stats broken out by race/ethnicity, age and gender, though I felt unable to accurately interpret this data without understanding the broader NYC demographic breakdown in those particular areas. For example, a spike in diagnosis rates among Black New Yorkers would be a more compelling finding if that specific zip code had a relatively low percentage of Black New Yorkers overall. At an even higher level, when interpreting data about case counts, it would be valuable to know the population of each zip code to understand whether trends are simply driven by population size (i.e. more people -> more diagnoses). HIV/AIDS diagnosis is a complex issue, and researchers can benefit from more robust data and analyses to aid in their efforts to better understand the disease and reduce transmission.
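
The population adjustment mentioned above amounts to converting raw counts into a rate per 100,000 residents. A minimal sketch follows, assuming a population figure is available for each zip code; the function name and the numbers in the example are hypothetical and not drawn from any of the datasets.

```python
def rate_per_100k(diagnoses, population):
    """Diagnoses per 100,000 residents; None when population is missing or zero."""
    if not population:
        return None
    return diagnoses * 100_000 / population


# Hypothetical example: two zip codes with the same case count but
# different (made-up) populations end up with very different rates.
print(rate_per_100k(200, 100_000))
print(rate_per_100k(200, 40_000))
```

Shading the choropleth by a rate like this, rather than by raw counts, would separate "more people" from "more diagnoses per person".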