Breaking the Rules of Social Distancing: How Actionable Are 311 Complaints in Predicting COVID Cases?



Introduction

As the world slowly moved out of the initial wave of the COVID-19 pandemic, cities across the world developed reopening plans that would gradually lift the restrictions on public life. New York City, one of the worst-hit cities in the world, is following the strategy of phased reopening and lockdowns in micro-clusters. This means that there are different levels of restrictions on different types of activities and businesses across different locations at any given time. The City of New York helpline at 311 has been receiving complaints regarding violations of the phased reopening restrictions across the city. This study attempts to analyze these complaints in comparison with the COVID-19 positivity rate in the subsequent weeks in the locality of the complaint. The goal here is to ascertain whether the complaints can be considered a reliable method for predicting whether positivity rates would go up in the future. Furthermore, this analysis can also shed light on how effective the phased reopening restrictions are for reducing positivity rates.

Background & Review

For this analysis, I took inspiration from the many different works of visualization done on the 311 complaints dataset. In particular, I was inspired by this analysis by Erin Murphy which maps out fireworks complaints in Brooklyn in 2020. The analysis drops pointers on a map of Brooklyn to show the location of where each complaint originated and then uses a timeline to show how rapidly the complaints increased during the months of June and July 2020.

On the other hand, I have also been regularly following the visualizations of COVID “hotspots” in NYC provided by the administration. Starting in August, the city administration introduced its own zones for different levels of phased reopening restrictions. These zones were independent of any existing demarcations in the city. Recently, however, the city has once again started to release ZIP code-wise data on COVID cases. This gives a familiar view of the city, with a high-level picture of where the disease is spreading the fastest, as shown below. I saw this as a good opportunity to map the 311 complaints regarding phased reopening violations in different ZIP codes and compare them with the spread of COVID in those locations.

ZIP code-wise COVID positivity rates as provided by the official NYC.gov website.

Process

Tools & Datasets

I used the spatial analysis tool Carto.com for my analysis. The data was sourced from the following three locations:

  1. 311 Service Requests from 2010 to Present from NYC Open Data
  2. NYC ZIP codes shape file from NYC.gov
  3. ZIP code-wise COVID data for November 3rd till November 9th from NYC.gov

Assumptions

It is important to consider the delay between the time a person gets infected and the time they start to develop symptoms and are likely to get tested for COVID-19. As per the CDC, symptoms can appear at any point from 2 to 14 days after exposure. Considering this, it would be futile to compare the 311 complaints data with the COVID positivity-rate data for the same date range. For this reason, I used positivity-rate data for the date range November 3rd till November 9th and 311 complaints data starting from October 1st, 2020.
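The date arithmetic behind this assumption can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it uses just the 14-day upper bound of the incubation range to check that the complaints window reaches back far enough to cover infections that could show up in the positivity window.

```python
from datetime import date, timedelta

# Assumption: 14 days is the upper bound of the incubation range quoted above.
MAX_INCUBATION_DAYS = 14


def earliest_exposure(test_window_start, max_incubation_days=MAX_INCUBATION_DAYS):
    """Earliest date on which someone counted in the positivity window
    could plausibly have been infected (hypothetical helper)."""
    return test_window_start - timedelta(days=max_incubation_days)


positivity_start = date(2020, 11, 3)  # first day of the positivity-rate window
complaints_start = date(2020, 10, 1)  # first day of the 311 complaints kept

exposure_start = earliest_exposure(positivity_start)
print(exposure_start)                      # 2020-10-20
print(complaints_start <= exposure_start)  # True: complaints cover the exposure period
```

Since people are tested some time after symptoms begin, the real lag is likely longer than the incubation period alone, so starting the complaints data on October 1st leaves some extra margin.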

Data Preparation

I completed the following steps to prepare the data for each dataset:

311 service requests data: The original dataset is too large to handle and analyze effectively in Carto. I used the instructions given here to filter the dataset down to around 12,000 rows. This was done by filtering on the created_date field to include only entries created on or after October 1st, 2020, and on the complaint_type field to include only entries of type “NonCompliance with Phased Reopening” or “Mass Gathering Complaint”.
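For readers who prefer to do this filtering locally, the same two conditions can be sketched in Python. This is not the procedure used above, which followed the linked instructions. It assumes a CSV export with created_date and complaint_type columns and ISO-formatted timestamps; the incident_zip column and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

COMPLAINT_TYPES = {
    "NonCompliance with Phased Reopening",
    "Mass Gathering Complaint",
}
START = datetime(2020, 10, 1)


def filter_complaints(rows, start=START, types=COMPLAINT_TYPES):
    """Yield rows created on or after `start` whose complaint_type is in `types`."""
    for row in rows:
        # Assumption: timestamps are ISO formatted, e.g. 2020-10-02T14:05:00
        created = datetime.fromisoformat(row["created_date"])
        if created >= start and row["complaint_type"] in types:
            yield row


# Tiny made-up sample standing in for the real export.
sample = io.StringIO(
    "created_date,complaint_type,incident_zip\n"
    "2020-09-30T23:10:00,Mass Gathering Complaint,11215\n"
    "2020-10-02T14:05:00,NonCompliance with Phased Reopening,10002\n"
    "2020-10-03T09:30:00,Noise - Residential,11226\n"
    "2020-10-04T20:45:00,Mass Gathering Complaint,11230\n"
)

kept = list(filter_complaints(csv.DictReader(sample)))
print([row["incident_zip"] for row in kept])  # ['10002', '11230']
```

The first sample row is dropped because it predates October 1st, and the third because its complaint type is not one of the two of interest.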

ZIP code shape file: the zipped file was uploaded directly to Carto.

ZIP code-wise COVID data: I added a new field for the labels by concatenating the positivity rate, the number of people tested and the date range. Lastly, the zipcode field had to be converted to the String datatype so that it would match the ZIP code field in the shape file.
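The two changes to the COVID data can be illustrated with a short Python sketch. Apart from zipcode, the field names (positivity_rate, people_tested, label) are assumptions, as is the label wording, and the values are placeholders rather than real figures.

```python
def prepare_covid_row(row, date_range="Nov 3 - Nov 9, 2020"):
    """Return a copy of `row` with a string zipcode and a display label.

    Hypothetical helper: field names other than `zipcode` are assumed.
    """
    # Convert the numeric ZIP code to a 5-character string so it can be
    # matched against the string ZIP code field of the shape file.
    zipcode = str(row["zipcode"]).zfill(5)
    # Concatenate positivity rate, number tested and date range into one label.
    label = (
        f"{row['positivity_rate']}% positive, "
        f"{row['people_tested']} tested ({date_range})"
    )
    return {**row, "zipcode": zipcode, "label": label}


# Placeholder values, not real figures.
prepared = prepare_covid_row(
    {"zipcode": 10314, "positivity_rate": 4.5, "people_tested": 1200}
)
print(prepared["zipcode"])  # 10314 (now a string)
print(prepared["label"])    # 4.5% positive, 1200 tested (Nov 3 - Nov 9, 2020)
```

Matching on a string key matters because a join between a numeric column and a text column will typically fail or silently return no matches.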

Design Decisions

To differentiate the two types of complaints, I chose red as the colour of the pointer for “NonCompliance with Phased Reopening” and blue for “Mass Gathering Complaint”. For the ZIP code positivity rates, I chose a shade of red to represent the highest positivity rates and, since none of the areas had a positivity rate of zero, yellow to represent the lower ones. Lastly, I purposefully added a widget on the map for selecting the date range of the 311 complaints. This way, the viewer can select a range of their choosing to see how the density of the complaints changes over time.

Click here to access the complete interactive visualization

Results

This juxtaposition of the two datasets produces some interesting results. When viewed at a high level, the high frequency of complaint calls from most of the ZIP codes makes it difficult to interpret the data. One outlier that immediately stands out is the borough of Staten Island. While the frequency of complaints here is significantly lower than in the other boroughs, the positivity rates are the highest. There is no clear explanation for this, but it may have more to do with politics and the aftereffects of the US Presidential Elections.

The overview of the visualization. The high frequency of the complaints in Brooklyn and Manhattan makes the map difficult to interpret when seen at this zoom level.

The analysis produces more value if the viewer zooms into a deeper level and if we compare different ZIP codes in the same region of a particular borough. Generally, the areas in Brooklyn and Queens which had the highest number of complaints also had the highest positivity rates in the given date range. For instance, the areas south of Prospect Park in Brooklyn have had a high frequency of complaints and also have the highest positivity rates.

The overlap in the high number of complaints and the high positivity rates in south Brooklyn is quite evident.

Another interesting exception is Downtown Manhattan. While the density of complaints here is perhaps the highest in the city, these ZIP codes have significantly lower positivity rates. This might be due to the high concentration of businesses (the most likely violators of reopening rules) and offices in this area. A lot of people might be getting infected in these areas when they commute here for work, but their cases get associated with the ZIP code of the neighborhood where they live instead.

The high density of complaints originating in downtown Manhattan does not correspond to high positivity rates; these areas have low positivity rates.

Reflection

There seems to be a correlation between the incidence of complaints on the 311 helpline and the positivity rates in the same area in the following weeks. However, this analysis does not account for whether any action is taken on these complaints, or whether checks and penalties are applied equally across the different areas of the city. Apart from that, there is a range of other factors, such as household income, common occupations and public gathering events, that affect both the incidence of complaints and the positivity rates and are not incorporated in this analysis.
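The correlation described here was judged visually from the map. One way it could be quantified in a follow-up is to count complaints per ZIP code and compute a Pearson correlation coefficient against the positivity rates. The sketch below shows the idea only: the ZIP labels and numbers are placeholders, not results from this analysis, and the helper function is hypothetical.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder data: one entry per complaint, and a rate per ZIP code.
complaint_zips = ["ZIP_A", "ZIP_A", "ZIP_A", "ZIP_B", "ZIP_B", "ZIP_C"]
positivity = {"ZIP_A": 2.0, "ZIP_B": 3.0, "ZIP_C": 1.0}

counts = Counter(complaint_zips)  # complaints per ZIP code
zips = sorted(positivity)
r = pearson([counts[z] for z in zips], [positivity[z] for z in zips])
print(round(r, 2))  # 0.5 for this placeholder data
```

Raw complaint counts favour dense commercial areas such as Downtown Manhattan, so a per-capita complaint rate would probably be a fairer input than the raw count.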