Introduction and Inspiration
Is New York City dangerous? Maybe. Portrayed in numerous movies over the decades, the Big Apple can seem like a city of crime. In David Fincher’s renowned suspense film Seven, for example, the city is painted as a wet, grim urban environment ridden with horrible crimes. On the other hand, as of 2017, crime rates in NYC were among the lowest of major cities in the United States, according to the FBI Uniform Crime Report. Although it is statistically safer than other large cities, it can still feel dangerous, and at times it worries me when I don’t know my way around, take the subway, or walk home late at night. To better understand what is happening in different neighborhoods of New York City, I used the most up-to-date New York Police Department (NYPD) data for 2019 to create a map that visualizes several types of crime data, including complaints, arrests and shootings.
This topic brought me back to my experience before I first moved to New York. I wanted to live in a safer area, so crime rate became a key factor when I was deciding where to look for apartments. While researching different neighborhoods, I found that real estate and rental websites that provide this layer of information, such as Trulia and Neighborhood Scout, were very helpful in leading me to my final choice of Roosevelt Island.
While looking for further inspiration, I also came across the NYC Crime Map created by the NYPD. This interactive map lets users see crime data by specific neighborhood, choose the type of map, and filter by crime type, date range, etc., so that they can get as granular as they like.
Materials and Methods
- NYPD Complaint Data Current (Year to Date): This dataset includes all valid felony, misdemeanor, and violation crimes reported to the NYPD for all complete quarters so far this year (2019)
- NYPD Arrest Data (Year to Date): This is a breakdown of every arrest effected in NYC by the NYPD during the current year
- NYPD Shooting Incident Data (Year to Date): This is a breakdown of every shooting incident that occurred in NYC during the current year
- Neighborhood Tabulation Areas: This file contains the boundaries of Neighborhood Tabulation Areas (NTA) as created by the NYC Department of City Planning using whole census tracts from the 2010 Census as building blocks (NTA boundaries and their associated names may not definitively represent neighborhoods)
- New York City Population By Neighborhood Tabulation Areas: This dataset shows change in population numbers from 2000 to 2010 for each NTA
- Carto: Visualization software used to create location-based maps
To see which areas have more reported police complaints, I first chose a choropleth map to visualize NYPD complaints by neighborhood, with graduated color intensity denoting magnitude. After importing all the datasets into Carto, I started with the Neighborhood Tabulation Areas (NTA) file and brought in the NYPD complaint data. To avoid mapping absolute numbers, I used a normalized value: the rate of complaints relative to each neighborhood’s population. To achieve this, I manually added three columns (“population”, “complaint_count”, “complaint_rate”, Figure 1) using the following SQL commands. Since the NTA population file covers both 2000 and 2010 and only the most recent census was needed, the first command extracts the values for 2010.
UPDATE nynta_complaint
SET population = (
  SELECT nyc_population_by_neighborhoods.population FROM nyc_population_by_neighborhoods
  WHERE nynta_complaint.ntacode = nyc_population_by_neighborhoods.nta_code AND nyc_population_by_neighborhoods.year = '2010')
SET complaint_count = (
  SELECT count(*) FROM nypd_complaint_data_current_year_to_date_
SET complaint_rate = (complaint_count/population)
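The three steps above (look up the 2010 population, count complaints per area, divide) can be sketched end to end on toy data. This is only an illustration using Python's built-in sqlite3 module rather than Carto: the `complaints` table, its `nta_code` column, and every value are assumptions made up for the example. In particular, the real complaint file stores point locations and has no NTA code column, so the counting step in Carto would need a spatial match instead.

```python
import sqlite3

# Toy data only: table and column names mirror the post, values are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nynta_complaint (
    ntacode TEXT PRIMARY KEY,
    population INTEGER,
    complaint_count INTEGER,
    complaint_rate REAL
);
CREATE TABLE nyc_population_by_neighborhoods (
    nta_code TEXT, year TEXT, population INTEGER
);
-- Assumption: each complaint already carries an NTA code (the real file does not).
CREATE TABLE complaints (nta_code TEXT);

INSERT INTO nynta_complaint (ntacode) VALUES ('A'), ('B');
INSERT INTO nyc_population_by_neighborhoods VALUES
    ('A', '2000', 900), ('A', '2010', 1000),
    ('B', '2000', 450), ('B', '2010', 500);
INSERT INTO complaints VALUES ('A'), ('A'), ('A'), ('A'), ('A'), ('B'), ('B');
""")

# Step 1: keep only the 2010 census population for each area.
conn.execute("""
UPDATE nynta_complaint SET population = (
    SELECT p.population FROM nyc_population_by_neighborhoods AS p
    WHERE p.nta_code = nynta_complaint.ntacode AND p.year = '2010')
""")

# Step 2: count the complaints that belong to each area.
conn.execute("""
UPDATE nynta_complaint SET complaint_count = (
    SELECT COUNT(*) FROM complaints AS c
    WHERE c.nta_code = nynta_complaint.ntacode)
""")

# Step 3: multiply by 1.0 so that dividing two integer columns
# does not truncate the rate to 0.
conn.execute(
    "UPDATE nynta_complaint SET complaint_rate = complaint_count * 1.0 / population"
)

rates = dict(conn.execute("SELECT ntacode, complaint_rate FROM nynta_complaint"))
print(rates)
```

The `* 1.0` is worth noting: if both columns are stored as integers, plain division in many SQL engines returns an integer, which would turn every rate below 1 into 0.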
Similarly, a second choropleth layer shows NYPD arrests. Instead of normalizing by neighborhood population, I used the ratio of arrests to complaints, since I thought this metric might be more indicative of police enforcement activity. I therefore added another column (“arrest_complaint”, Figure 2) using the following command.
UPDATE nynta_arrest
SET arrest_complaint = (arrest_count/complaint_count)
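One detail the ratio raises is what to do with an area that has no complaints at all, since dividing by zero would fail. A minimal sketch of one way to handle it is below; the function name and the counts are hypothetical and not taken from the NYPD data.

```python
from typing import Dict, Optional, Tuple


def arrest_complaint_ratio(arrest_count: int, complaint_count: int) -> Optional[float]:
    """Arrests per complaint, or None when there are no complaints to divide by."""
    if complaint_count == 0:
        return None
    return arrest_count / complaint_count


# Invented counts: area name -> (arrest_count, complaint_count)
example: Dict[str, Tuple[int, int]] = {"X": (30, 40), "Y": (5, 0)}
ratios = {name: arrest_complaint_ratio(a, c) for name, (a, c) in example.items()}
print(ratios)
```

Returning None rather than 0 keeps "no complaints recorded" visually distinct from "no arrests" when the column is styled on the map.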
These two layers were linked using the “Link a Second Layer” analysis tool so that they are controlled by the same filters. In this case I applied two widgets, which let users filter by either neighborhood or borough (Figure 3). A pop-up appears when users click on an area to show the corresponding information (Figure 4).
In addition, I wanted to show more specifically which areas might have more arrests than others. So, similar to the NYC Crime Map, I created a heat map from the original NYPD Arrest Data file to map the locations where the incidents happened. For this layer, I added a widget for offense type (Figure 5). Users can click on each bar or search for keywords to see where each type of crime is concentrated.
Finally, I created a dot map to show the distribution of shooting incidents. A darker color flags incidents in which the victim died. For shooting incidents, I focused the pop-ups on the neighborhood, the time and date of occurrence, and the location description if available (Figure 6). The dataset only specifies the borough in which each incident occurred, so I also joined the neighborhood names from the NTA dataset to this file using the following command:
UPDATE nypd_shooting_incident_data_year_to_date_
SET ntaname = (
  SELECT ntaname FROM nynta_complaint
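Attaching a neighborhood name to each shooting point is a point-in-polygon problem: for each point, find the area whose boundary contains it. In Carto this kind of match is normally done with a PostGIS spatial function such as ST_Contains, though the command above does not show which condition was used. The sketch below illustrates the underlying idea in plain Python with a ray-casting test. The square "neighborhoods" and the coordinates are invented, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude)
Polygon = List[Point]        # vertices in order, without repeating the first


def point_in_polygon(pt: Point, poly: Polygon) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from pt."""
    x, y = pt
    inside = False
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        # Only edges that straddle the horizontal line through pt can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # an odd number of crossings means inside
    return inside


def assign_neighborhood(pt: Point, areas: Dict[str, Polygon]) -> Optional[str]:
    """Return the name of the first area containing pt, or None if none does."""
    for name, poly in areas.items():
        if point_in_polygon(pt, poly):
            return name
    return None


# Invented example: two unit squares side by side and three incident points.
areas = {
    "West": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "East": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
shootings = [(0.5, 0.5), (1.5, 0.25), (3.0, 3.0)]
labels = [assign_neighborhood(p, areas) for p in shootings]
print(labels)
```

A point that falls outside every boundary gets None, which is a useful check for incidents with missing or mis-geocoded coordinates.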
Results and Interpretation
The result is an interactive map consisting of four layers. It provides information both at a larger scale (e.g. boroughs) and for specific regions divided by commonly recognized neighborhoods, so users can explore different facets of the crimes that occurred in New York City. LINK TO INTERACTIVE MAP
From Figure 7, we learn that so far in 2019, lower Manhattan, the Bronx and east Brooklyn have seen more complaints reported to the NYPD than other areas. In Manhattan (Figure 8), Lenox Hill-Roosevelt Island has the lowest complaint rate relative to neighborhood population (0.005), and interestingly, Central Park and Randall’s Island have the highest within the borough (0.136). As I looked at other boroughs, similar results stood out: higher complaint rates appear around parks, and the major parks and cemeteries in each borough are treated as one neighborhood named “park-cemetery-etc-borough”. Although the population figures are aggregated across these areas, they are still significantly lower than those of other neighborhoods because few people live there. This might explain why these areas tend to have higher rates overall.
A higher arrest-complaint ratio might indirectly suggest more active police enforcement, although this remains to be determined because situations vary, and complaints and arrests have no strict cause-and-effect relationship. The choropleth in Figure 9 indicates that arrests are not necessarily proportional to complaints. Some areas are consistent in both, such as Lenox Hill-Roosevelt Island and Stuyvesant Town-Cooper Village, which have relatively low complaints and arrests. But most neighborhoods show some discrepancy. For example, Midtown South, along with Clinton and Hudson Yards-Chelsea-Flatiron-Union Square (below Central Park), is high in complaints but comparatively low in arrests. Conversely, Flushing in Queens has a complaint rate of only 0.012, yet its arrest/complaint ratio reaches 0.747, well above the average (Figure 10).
Figure 11 shows an example of the NYPD arrest heat map layered on the choropleth map. The “Offense Type” widget is applied to show arrests related to “Dangerous Drugs”. The density indicates that upper Manhattan and the Bronx have the highest concentration of drug-related arrests, which are spread throughout those areas. Midtown South and the Lower East Side in Manhattan, and Crown Heights, Stuyvesant Heights and Brownsville in Brooklyn, also have noticeable clusters.
The shooting incidents layer in Figure 12 shows a clear distinction between boroughs, with the majority of incidents occurring in Harlem in upper Manhattan, the Bronx, and Brooklyn from Fort Greene across to East New York. When this layer is combined with the Arrest choropleth map, it’s not hard to see that the shooting incidents largely correspond to the areas with higher arrest ratios. When I examined the incidents in which the shooting resulted in the victim’s death, I found that they mostly occurred at night or in the early morning.
The most challenging part of this project for me was understanding how to get the information I needed out of the data I had. The two approaches mentioned in lecture were a little vague and abstract to me at first, and I was confused about where to start. After some trial and error, it became clearer to me how the data worked and how I should work with it, yet I was still making changes along the way. As important as data is, no dataset is perfect or has everything I need, so I feel it’s important to draw a picture and clarify the logic first before jumping all over the place.
For example, I knew I wanted to use a normalized metric, the complaint rate. But I later realized three things: one, the NTA file had no population data, so I needed to find some; two, the complaints came in a different type of file, which stores locations as points and shares no column with the NTA file; and three, I had to work out how to connect three different files to get the one metric I wanted. So the truth was, I had been skipping all the concrete steps.
As for the data itself, I think it would have helped if I had looked at the data and examined the visualization more closely at earlier stages, so that possible outliers that could skew the results, such as the “park-cemetery-etc-” areas, could have been treated differently.
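One possible way to treat such areas differently is to set them aside before computing rates, so a tiny population does not inflate the value. The sketch below shows the idea only; the cut-off of 1,000 residents, the function name, and all counts are hypothetical and were not used in the project.

```python
from typing import Dict, List, Tuple

MIN_POPULATION = 1000  # hypothetical cut-off, chosen only for illustration


def split_by_population(
    areas: Dict[str, Tuple[int, int]], min_population: int = MIN_POPULATION
) -> Tuple[Dict[str, float], List[str]]:
    """areas maps name -> (complaint_count, population).

    Returns (rates for areas with enough residents, names of flagged areas).
    """
    rates: Dict[str, float] = {}
    flagged: List[str] = []
    for name, (complaints, population) in areas.items():
        if population < min_population:
            flagged.append(name)  # too few residents for a meaningful rate
        else:
            rates[name] = complaints / population
    return rates, flagged


# Invented counts: the park area has fewer complaints but a far smaller population.
toy = {"Residential area": (200, 40000), "park-cemetery-etc": (60, 400)}
rates, flagged = split_by_population(toy)
print(rates, flagged)
```

With these invented numbers the park area would otherwise get a rate of 0.15, thirty times the residential one, even though it has fewer complaints. Flagged areas could then be styled separately or mapped with raw counts.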
I also struggled a little with the Carto analysis tool, such as the “Base” and “Target” layers and which one to apply the filter to in order to control both. I also wish Carto would not override previous style settings when I make changes back and forth. For example, if I had styled the visualization using aggregation by hexbins, switched to the square style, which didn’t turn out as well, and then decided to revert to hexbins, I would have to redo all the settings. This could be a little frustrating.
After testing and critique during class, I identified some changes that could improve my visualizations:
- The two choropleth maps are so similar that they might be hard to understand at first. It is not immediately apparent that the Arrest layer uses a different metric, so users may think both layers represent the same type of information, and working out the difference adds cognitive load. Try to reduce this complexity by using only the one choropleth map that is more relevant. For the final project, perhaps keep the Arrest data, since it is more factual and reliable, while complaints could be more subjective
- Due to the limitations of Carto, the “Link a Second Layer” tool only let me control two layers at most with the same widgets. This confused users, because their mental model is that every layer changes accordingly when they use a filter
- The colors of the heat map are too close to those of the choropleth map. This is fine for the layer on its own but makes it a little hard to see when combined with other layers. Find a different color to increase the contrast