NYPD Arrests for the period of Jan – Mar 2020



For this assignment, I used statistical visualizations to explore NYPD arrest data for New York City. I wanted to understand the nature of the NYPD’s law enforcement: how the department classifies perpetrators and how it describes the crimes committed. While working with the dataset, I narrowed my focus to the racial demographics and age ranges of perpetrators. I also wanted to find out which NYC borough had the most arrests.


I ran several searches on New York City’s Open Data website before landing on the dataset titled NYPD Arrest Data (Year to Date). I was intrigued by the data and decided to download the CSV file. The dataset had over 44,000 arrest records, with detailed information on the perpetrators and crimes committed from January 2020 to March 2020.

NYPD Arrest Data (Year to Date) CSV file

I also downloaded OpenRefine, a free, open-source tool for working with messy data. To create my graphs and charts, I used Tableau Public, a free data visualization tool that lets users publish their visualizations to the web.


First, I manually formatted the Arrest Date column in the CSV file to YYYY-MM-DD to prevent any data problems.

Reformatted Arrest Date column
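The same reformatting could also be scripted rather than done by hand. The following is a minimal sketch, assuming the downloaded file stores dates as MM/DD/YYYY; the function name and the default format string are illustrative assumptions, not details taken from the dataset's documentation.

```python
from datetime import datetime


def to_iso_date(raw: str, source_format: str = "%m/%d/%Y") -> str:
    """Convert a date string to YYYY-MM-DD.

    The default source_format (MM/DD/YYYY) is an assumption about how
    the downloaded CSV stores its dates; pass a different format if needed.
    """
    parsed = datetime.strptime(raw.strip(), source_format)
    return parsed.strftime("%Y-%m-%d")


print(to_iso_date("03/15/2020"))  # prints 2020-03-15
```

A value that does not match the expected format raises a ValueError, which makes badly formed dates easy to spot.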

Then I imported the CSV file into OpenRefine, looked through all the rows and columns, and did not notice any errors or issues with the data.

CSV file in OpenRefine
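A visual scan like this can be backed up with a quick programmatic check. The sketch below counts blank cells per column; it is not what OpenRefine does internally, and the column names and sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_blank_cells(rows):
    """Return a Counter mapping column name -> number of blank cells."""
    blanks = Counter()
    for row in rows:
        for column, value in row.items():
            if column is None:
                # csv.DictReader stores surplus fields under the key None; skip them.
                continue
            if value is None or not value.strip():
                blanks[column] += 1
    return blanks


# Tiny made-up sample standing in for the real file (hypothetical column names).
sample = io.StringIO(
    "ARREST_DATE,AGE_GROUP,ARREST_BORO\n"
    "2020-01-02,25-44,K\n"
    "2020-01-03,,B\n"
)
blank_counts = count_blank_cells(csv.DictReader(sample))
print(dict(blank_counts))
```

Columns that never appear in the result had no blank cells in the rows that were checked.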

Next, I uploaded my reformatted CSV file to Tableau Public and began to experiment with various visual representations of the data. Under each image, I provided a link to Tableau Public so that readers can view the visualization in more detail.

From the beginning of the project, I was curious about the ethnicities of the perpetrators and which ethnic groups were arrested the most. I was surprised to see that the NYPD uses the classifications “Black Hispanic” and “White Hispanic.” I wondered about the reasoning behind these separate identifiers for Hispanic people. I looked over the dataset’s documentation on the NYC Open Data website and could not find any explanation of why such a granular categorization of Hispanics was needed.

Perpetrator Race by Number of Arrests
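The counts behind a chart like this can be reproduced outside Tableau. Here is a minimal sketch, assuming the race field is a column named PERP_RACE; the sample rows are made up and are not real figures from the dataset.

```python
from collections import Counter


def arrests_by_category(rows, column):
    """Count rows per value of `column`, largest group first."""
    counts = Counter(row[column] for row in rows)
    return counts.most_common()


# Made-up rows for illustration only; PERP_RACE is an assumed column name.
sample_rows = [
    {"PERP_RACE": "BLACK"},
    {"PERP_RACE": "WHITE HISPANIC"},
    {"PERP_RACE": "BLACK"},
    {"PERP_RACE": "BLACK HISPANIC"},
]
for race, count in arrests_by_category(sample_rows, "PERP_RACE"):
    share = count / len(sample_rows)
    print(f"{race}: {count} ({share:.0%})")
```

Because the column name is a parameter, the same function could count arrests by any other categorical column.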

Next, I wanted to see how the number of arrest records varied by perpetrator age. I was looking for an age group that would surprise me, and I found one: people aged 45-64 were arrested more often than those aged 18-24. I wanted to explore which specific crimes each age group, particularly the 45-64 group, was being arrested for, but this proved difficult to visualize in Tableau because the “Offense” column contained too many distinct crimes.

Number of Arrests by Age Group
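One possible way around the many distinct offenses is to keep only the most frequent ones within each age group before charting. The sketch below shows the idea; the column names AGE_GROUP and OFNS_DESC are assumptions, and the offense labels are placeholders rather than real records.

```python
from collections import Counter, defaultdict


def top_offenses_by_age_group(rows, n=3,
                              age_column="AGE_GROUP",
                              offense_column="OFNS_DESC"):
    """Return {age group: [(offense, count), ...]} with the n most common offenses."""
    per_group = defaultdict(Counter)
    for row in rows:
        per_group[row[age_column]][row[offense_column]] += 1
    return {group: counts.most_common(n) for group, counts in per_group.items()}


# Placeholder rows; the offense labels are invented for illustration.
sample_rows = [
    {"AGE_GROUP": "45-64", "OFNS_DESC": "OFFENSE A"},
    {"AGE_GROUP": "45-64", "OFNS_DESC": "OFFENSE A"},
    {"AGE_GROUP": "45-64", "OFNS_DESC": "OFFENSE B"},
    {"AGE_GROUP": "18-24", "OFNS_DESC": "OFFENSE C"},
]
top = top_offenses_by_age_group(sample_rows, n=1)
print(top)
```

Limiting each group to a handful of offenses keeps the resulting chart readable, at the cost of hiding the long tail of rarer crimes.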

I then wanted to see which borough had the most arrests. The following graph shows that Brooklyn had the most, and I found it interesting that Brooklyn is represented by the key “K” whereas the Bronx is “B”. I also found it interesting how far Staten Island (S) trailed the other boroughs in arrest records.

Number of Arrests by Borough
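To make the single-letter keys easier to read in a chart, they could be translated to full borough names first. This is a small sketch: the codes K, B, and S come from the graph described above, while M and Q are assumed by analogy and should be checked against the dataset's documentation.

```python
# Borough codes. K, B, and S are described in the text above;
# M and Q are assumptions and should be verified against the data.
BOROUGH_NAMES = {
    "B": "Bronx",
    "K": "Brooklyn",
    "S": "Staten Island",
    "M": "Manhattan",  # assumed
    "Q": "Queens",     # assumed
}


def borough_name(code: str) -> str:
    """Translate a one-letter borough code, flagging anything unrecognized."""
    return BOROUGH_NAMES.get(code.strip().upper(), f"Unknown ({code})")


print(borough_name("K"))  # prints Brooklyn
```

Unrecognized codes are returned as "Unknown (...)" instead of raising an error, so unexpected values stay visible in the output.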

Finally, I wanted to see the number of arrests made each month. The following graph shows a steady decline in arrests, which made me curious about how much influence the COVID-19 pandemic had on the reduction in March. I am also curious to see what updated data would show for the remainder of the year, after the COVID-19 restrictions lift.

Number of Arrests by Month
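Once the arrest dates are in YYYY-MM-DD form, a monthly tally is straightforward to compute. The following is a minimal sketch with made-up dates; the function name is illustrative.

```python
from collections import Counter


def arrests_per_month(iso_dates):
    """Count dates per calendar month, keyed by 'YYYY-MM' in chronological order."""
    counts = Counter(date[:7] for date in iso_dates)
    return dict(sorted(counts.items()))


# Made-up dates for illustration only.
sample_dates = ["2020-01-05", "2020-03-01", "2020-01-20", "2020-02-11"]
print(arrests_per_month(sample_dates))
```

Slicing the first seven characters works only because the dates were already normalized to YYYY-MM-DD; other formats would need to be parsed first.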


The following dashboard brings together all of my data visualizations. I also included a geographical map that plots the arrests as dots.


This lab proved to be quite interesting, and I learned a lot about the NYPD and the nature of its law enforcement. I discovered that the racial categorizations of perpetrators are distinct and granular, especially for Hispanics, who are further divided by skin color. I also saw that the 25-44 age group had the highest number of arrest records of any age group, although the 45-64 group was not far behind. I wish I had been able to visualize the offenses that each age group was arrested for, because it might have shown whether certain crimes were especially frequent within particular age groups.

I also wondered how much of a role the COVID-19 pandemic played in the reduction of arrests in March. It would be interesting to see updated arrest data from March to June and check whether arrest records dropped significantly or instead rose as protests against police brutality spread throughout New York City and the nation. In the future, I would like to explore this project further and see what I can glean from an updated dataset, particularly for the summer months of 2020. It would be interesting to see what offenses the NYPD recorded when arresting police brutality protesters and people cited for not social distancing or not wearing a mask during the COVID-19 pandemic.


  1. NYC Open Data
  2. OpenRefine
  3. Tableau Public