Which Subway Stations Had the Most Rat Sightings in March?


Visualization

In the fall of 2023, the Transit app added a feature that lets users report rat sightings at subway stations. I have been interested in working with this dataset since learning about it. Because the data collection is fairly new, there are not many existing visualizations that use it.

Inspiration

Rats of New York, Ivan Lokhov

This is one of two maps I could find that use the Transit rat reporting data. I like the annotations that call out specific stations, but the color scheme would be easier to read if it were grouped into categories instead of continuous. The map also combines the percentages of “one or two” and “so many” reports into a single overall chance of encountering a rat. I’m not sure that this is the most statistically sound approach.

NYC Rat Observations Oct 2023

This map allows you to view the percentage of rat sightings for each category of sighting. I like the diverging color scheme and the general design of the map, although the pop-ups could use some cleaning up.

Methodology

Although the Transit app makes the data publicly available, there are several issues: (1) you can only access the past 30 days at a time; (2) the subway stations do not have any geographic attributes; and (3) the rat sightings are recorded only in three broad categories (“none,” “one or two,” and “so many”), so each report gives a rough category rather than an exact number of rats. Even with all of these issues, I was still interested in working with the dataset and trying to visualize it.

In order to map the rat sighting data, I needed to assign geographic coordinates to the stations. I did this by creating a join with an MTA Subway dataset in Tableau.

Rat sighting dataset
MTA Subway Dataset

Ideally, one would use the “station ID” column to join the datasets, but frustratingly the MTA Subway dataset and the Transit app use different station IDs. Instead, I created a join using the station names, but this raised several more issues, since some subway stations share the same name. I tried two things to solve this: I created an additional join on subway lines, and I tried appending the subway lines to the station names in OpenRefine. Neither fixed the problem, because the subway lines do not match between the datasets. I ended up using the facet tool in OpenRefine to manually find duplicate station names and append the subway lines to distinguish them, and then manually finding the same stations in the MTA spreadsheet and repeating the process. This was not an ideal solution, and it would have been impractical with a larger dataset. Additionally, to work with this dataset long-term (for example, to create a map or visualization each month with updated data), you would have to repeat this process or fix the station IDs.
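The name-based join described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the OpenRefine and Tableau steps I actually used: the column names ("station", "lines", "so_many", "lat", "lon"), the station names, and every value in the sample rows are made up. It also shows why appending the lines was not enough on its own: when the line labels differ between the two datasets, the disambiguated names no longer match.

```python
from collections import Counter


def disambiguate(rows):
    """Append the subway lines to any station name that appears more than once."""
    name_counts = Counter(row["station"] for row in rows)
    result = []
    for row in rows:
        name = row["station"]
        if name_counts[name] > 1:
            name = f"{name} ({row['lines']})"
        result.append({**row, "station": name})
    return result


def join_on_name(rat_rows, mta_rows):
    """Attach coordinates by station name and collect names with no match."""
    coords = {row["station"]: (row["lat"], row["lon"]) for row in mta_rows}
    joined, unmatched = [], []
    for row in rat_rows:
        if row["station"] in coords:
            lat, lon = coords[row["station"]]
            joined.append({**row, "lat": lat, "lon": lon})
        else:
            unmatched.append(row["station"])
    return joined, unmatched


# Made-up sample rows: names, lines, counts, and coordinates are placeholders.
rat_rows = [
    {"station": "Example St", "lines": "A", "so_many": 12},
    {"station": "Example St", "lines": "B C", "so_many": 5},
    {"station": "Sample Av", "lines": "D", "so_many": 3},
]
mta_rows = [
    {"station": "Example St", "lines": "A", "lat": 40.70, "lon": -73.90},
    {"station": "Example St", "lines": "B", "lat": 40.71, "lon": -73.91},
    {"station": "Sample Av", "lines": "D", "lat": 40.72, "lon": -73.92},
]

joined, unmatched = join_on_name(disambiguate(rat_rows), disambiguate(mta_rows))
print([row["station"] for row in joined])  # stations that received coordinates
print(unmatched)                           # stations that still need manual matching
```

In this sample, "Example St (B C)" is left unmatched because the second dataset labels the same station "B", which is the kind of mismatch that forced the manual cleanup.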

Despite manually assigning unique station names to the dataset, I still ended up with some duplicate stations in Tableau. I then switched to ArcGIS where I had more success joining the two datasets, and did not encounter any duplicates. 

I then had to tackle the other issue of the categories of rat sighting counts (“one or two,” “so many”). Combining the two categories would be tricky, since “so many” (a.k.a. 3 or more rats spotted) would need to be weighted more than “one or two.” I decided to focus on the “so many” counts. After all, it would be safe to assume that most subway stations are home to one or two rats…what would be most interesting to show would be the stations rated most rat-infested. 
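To illustrate why combining the two categories is tricky, here is a small sketch with made-up numbers. The weight applied to “so many” is a hypothetical parameter, not something the data provides, and the station ranking changes depending on which weight you pick.

```python
def weighted_score(one_or_two, so_many, so_many_weight):
    """Combine the two report counts, counting each 'so many' report more heavily."""
    return one_or_two + so_many_weight * so_many


# Made-up report counts: (one_or_two, so_many) for two hypothetical stations.
stations = {
    "Station A": (10, 2),
    "Station B": (2, 5),
}


def rank(so_many_weight):
    """Return station names ordered from highest to lowest combined score."""
    return sorted(
        stations,
        key=lambda name: weighted_score(*stations[name], so_many_weight),
        reverse=True,
    )


print(rank(1))  # equal weighting: A scores 12, B scores 7
print(rank(3))  # 'so many' counted three times: A scores 16, B scores 17
```

Since there is no obvious correct weight, mapping only the “so many” counts avoids having to choose one.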

In ArcGIS, I styled the points using color (yellow to red) to represent the level of rattiness. I broke the distribution of counts into four categories, with 75 being the highest count reported at any subway station in March.
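As a rough sketch of how counts can be sorted into four color classes, the snippet below uses equal-interval breaks between 0 and the March maximum of 75. Equal intervals are an assumption made for illustration (ArcGIS offers several classification methods, and the breaks on my map may differ), and the color names are placeholders for the yellow-to-red ramp.

```python
# Placeholder names for a four-step yellow-to-red ramp.
COLOR_RAMP = ["yellow", "orange", "orangered", "red"]


def classify(count, max_count=75, n_classes=4):
    """Return the 0-based class index for a count, using equal-width classes."""
    if count < 0:
        raise ValueError("count must be non-negative")
    width = max_count / n_classes  # 18.75 when max_count=75 and n_classes=4
    # The maximum value would land in a fifth class, so clamp it to the last one.
    return min(int(count // width), n_classes - 1)


for count in (0, 18, 19, 56, 57, 75):
    print(count, COLOR_RAMP[classify(count)])
```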

Once I had exported my map to ArcGIS Online, I decided to take the data I had produced in ArcGIS and put it back in Tableau, so I could continue working on the first version of the map without the duplication issues. This lab has served as a useful comparison between mapping in ArcGIS and Tableau. There are pros and cons to each: I think ArcGIS offers more powerful and intuitive mapping capabilities, while I like Tableau’s annotation tool and the ability to integrate charts and graphs (you can also add charts in ArcGIS Online, but they are not as pretty). User testing helped clarify some of the pros and cons of each map.

User Testing

I tested drafts of both maps with one person outside of Pratt (a software engineer) and three people from Pratt’s GIS certificate program.

Feedback from the non-Pratt user on this first draft of the Tableau map included: 

The color schemes are confusing: initially you assume that the charts correspond to the map, since they use the same colors. The title “abundant rats” is confusing. Navigating maps in Tableau is confusing, especially because it is difficult to find out how to pan the map (click and drag). The user was initially confused by having charts below the map, but stated that they would rather have the charts included than omitted (as in the first ArcGIS draft).

Feedback for the first draft of the ArcGIS map included:

Navigation around the map is better and more intuitive. The user disliked that there is no hover-over function, that you have to click on each point to view a pop-up, and that there is a lag between click and response. The lack of charts is also a negative.

When I tested the second draft of the maps with Pratt GIS students, they all preferred the ArcGIS map (likely because that is the software taught in GIS courses). One thing that ArcGIS doesn’t offer is annotations, and the students suggested adding subtitles or text boxes to the maps to clarify the rat reporting options.

Final Drafts

https://public.tableau.com/app/profile/gabriella.evergreen/viz/RatsontheSubway/Dashboard1?publish=yes

https://experience.arcgis.com/experience/e9c8a27f2ca44067aba9147eaca78a20

Reflection


One of the major issues with the dataset is the lack of geographic attributes and the disconnect between the station names in the rat sighting dataset and the MTA Subway dataset. If you wanted to replicate the process of making this map each month with updated data, you would need to either manually match the stations each month or come up with your own unique station IDs.
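One way to avoid repeating the manual matching would be to record it once in a crosswalk table that maps each Transit station ID to its MTA counterpart, then reuse that table every month. The sketch below assumes such a table exists as a CSV; the column names ("transit_id", "mta_id") and all of the IDs are hypothetical.

```python
import csv
import io

# Hypothetical crosswalk, built once by hand and saved for reuse.
CROSSWALK_CSV = """transit_id,mta_id
T-001,M-101
T-002,M-102
"""


def load_crosswalk(text):
    """Read the crosswalk CSV text into a {transit_id: mta_id} dictionary."""
    return {row["transit_id"]: row["mta_id"] for row in csv.DictReader(io.StringIO(text))}


def attach_mta_ids(rows, crosswalk):
    """Add the MTA ID to each row and list any Transit IDs missing from the crosswalk."""
    matched, missing = [], []
    for row in rows:
        mta_id = crosswalk.get(row["transit_id"])
        if mta_id is None:
            missing.append(row["transit_id"])
        else:
            matched.append({**row, "mta_id": mta_id})
    return matched, missing


# Made-up monthly rows; T-003 stands in for a station not yet in the crosswalk.
monthly_rows = [
    {"transit_id": "T-001", "so_many": 12},
    {"transit_id": "T-003", "so_many": 4},
]
matched, missing = attach_mta_ids(monthly_rows, load_crosswalk(CROSSWALK_CSV))
print(matched)
print(missing)
```

Any ID that shows up in the missing list would still need to be matched by hand, but only once.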

One of the limitations of working with any citizen science or self-reported dataset is that the data is more likely to reflect the people collecting it than to accurately capture the thing being measured. In this case, the rat sighting data is more likely to reflect the behavior of commuters and Transit app users than the actual number of rats in subway stations. In my opinion, this is the major weakness of the map. Even so, I think citizen science data collection is interesting and important.

Future directions: In addition to visualizing rat counts by borough and type of subway structure, I would like to be able to show which subway lines have the highest percentage of rat sightings. This proved to be tricky as the data categorizes different combinations of subway lines at each station. 
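A possible starting point for the per-line view is to split each station's line combination into individual lines and add up the reports for each one. The sketch below assumes the combination is stored as a space-separated string and that each row has a "so_many" count and a "total" report count; those column names and all of the numbers are made up. Note that a station serving several lines contributes its reports to every one of them.

```python
from collections import defaultdict


def so_many_share_by_line(rows):
    """Return, for each line, the share of its reports that were 'so many'."""
    totals = defaultdict(lambda: [0, 0])  # line -> [so_many reports, all reports]
    for row in rows:
        for line in row["lines"].split():
            totals[line][0] += row["so_many"]
            totals[line][1] += row["total"]
    # Skip lines with no reports to avoid dividing by zero.
    return {line: so_many / total for line, (so_many, total) in totals.items() if total}


# Made-up rows for two hypothetical stations that share the C line.
rows = [
    {"lines": "A C", "so_many": 5, "total": 20},
    {"lines": "C E", "so_many": 15, "total": 20},
]
shares = so_many_share_by_line(rows)
for line in sorted(shares, key=shares.get, reverse=True):
    print(line, f"{shares[line]:.0%}")
```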

I would like to produce a map using data from several months at a time, and figure out a clearer way of showing the rat counts. Another idea to explore would be to offer three different views that show the count categories “none,” “one or two,” and “so many” and allow the user to toggle them on and off. If I continue working with the ArcGIS map, I would like to try Instant Apps with different functionalities (there is one Instant App that may allow hover-over pop-ups and another that allows you to switch layer views).