Before studying in New York City, I visited the city three times as a tourist. I can still remember how, each time, I was overwhelmed by the intensity and diversity of everything going on in the city. Even after living here for almost two years, safety still concerns me, especially when I don’t know my way around very well, take the subway, or walk home late at night. When I was first looking for a place to stay, the crime rates of different areas became a key factor in determining which neighborhoods seemed more appealing. I found that real estate and rental websites that provided this layer of information, such as Trulia (Figure 1.), were very helpful in leading me to a confident choice. Out of curiosity, I did some research on other people’s perceptions of crime and safety in New York. It turned out that New York City was second only to Chicago in being rated “very unsafe” or “fairly unsafe”, according to a 2014 YouGov poll that asked people to assess the relative safety of the 10 largest cities in the U.S.
On the other hand, as of 2017, crime rates in NYC were in fact among the lowest of major cities in the United States, according to the FBI Uniform Crime Report. The New York Police Department (NYPD) also reported that overall citywide crime continued to decline in May 2018 compared to the same period in previous years (Figure 1). So the question lingers: Is New York City safe, and is it safer than before? Is there a gap between perception and reality? What about serious crimes such as those involving dangerous weapons? Which areas of the city tend to have more safety concerns, and which are safer? These are the questions I try to answer in this final project.
This NYC Crime Map created by the NYPD inspired the design of my previous map lab report. The interactive map allows users to see crime data by specific neighborhood, choose the type of map, and filter by crime type, date range, etc., so that they can get as granular as they like. I like the choropleth map view for showing direct comparisons between precincts (Figure 3.) and the location map for detailing crimes using proportional symbols, but the heatmap might not be as helpful because the distribution is all over the place. Using Carto, I was able to visualize 2019 year-to-date (YTD) rates of police complaints and arrests, as well as the locations of arrests and shooting incidents. This final project is an expansion of those previous experiments. My goal was to continue exploring this topic and, in addition to maps, to delve deeper into the historical data and present different facets of crime in New York City.
Methods and Process
Datasets and Tools
- NYPD Arrests Data (Historic): This is a breakdown of every arrest effected in NYC by the NYPD going back to 2006 through the end of 2018
- NYPD Shooting Incident Data (Historic): This is a breakdown of every shooting incident that occurred in NYC going back to 2006 through the end of 2018
- Neighborhood Tabulation Areas (Shapefile): This shapefile contains the boundaries of Neighborhood Tabulation Areas (NTA) as created by the NYC Department of City Planning using whole census tracts from the 2010 Census as building blocks (NTA boundaries and their associated names may not definitively represent neighborhoods)
- New York City Population By Neighborhood Tabulation Areas: This dataset shows change in population numbers from 2000 to 2010 for each NTA
- Carto: Open-source visualization software used to create interactive location-based maps
- Tableau Public 2019.2: Free software to create interactive visualizations that enables online sharing
The target audience for this visualization is people who plan to live or currently live in New York City, whether as residents or for a temporary stay. I specifically chose to use neighborhoods as geographic boundaries rather than the police precincts that the NYPD uses as its standard index. Because neighborhoods are identified by names (e.g. “Lower East Side”, “West Village”, although the list is not entirely inclusive), they fit better into the mental model an average viewer uses to recognize or refer to different areas of the city, whereas police precincts are numbers, which might require more effort from viewers and thus would not support preattentive processing.
Rather than the YTD data used for my map project, I chose to use the historical dataset. Because only data from the first two quarters of 2019 was available at the time of this project, it would have been more difficult to represent, while the historic data allowed me to explore entire years and examine long-term trends.
Based on the user feedback from my last lab, presenting both NYPD Complaints and Arrests data might confuse users. For this project, I decided to keep the latter, as arrests are a more factual and reliable documentation of crimes (although not necessarily so) compared to complaints, and would therefore be more relevant in this case. Although Carto is especially powerful for maps, it has limitations when it comes to presenting the wealth of information provided by the NYPD data file, such as the date, the type of crime, the time of occurrence, etc. Tableau is a better alternative for integrating different chart types and connecting pieces of information, thus creating better consistency for users.
Visualization and Design Process
The subjects of this final project are divided into two parts: maps used to visualize police arrests and shooting incidents in the most recent full calendar year (2018), and the overall crime analysis in the past five years from 2014 to 2018.
1) Compiling and Preparing Data for Map Analysis
To see which areas have a higher rate of arrest, I started with a choropleth map of NYPD Arrests by neighborhood. Because graduated color intensity denotes magnitude, it is the most straightforward way to compare different regions. To avoid mapping absolute numbers, I used a normalized value: the number of arrests in each neighborhood divided by that neighborhood’s population.
Since the neighborhood area boundaries, neighborhood population, and arrest records were three separate data sources, I extracted the information needed from each of them and joined them together. The NYPD data file, however, does not have a shared ID that references the shapes in the NTA shapefile, so I needed to match the location of each arrest, given as longitude and latitude, with the corresponding neighborhood shape. Although I was using Tableau for the final visualizations, Carto was a helpful tool for all of this because it lets users perform SQL operations once the data files are uploaded to its database.
After importing all the datasets into Carto, I started with the NTA file and brought in the population and number of arrests for each area. Three additional columns were added using the following commands (since the NTA population file contains data spanning 10 years and only the most recent census was needed, the first command below extracts the values from the most recent year).
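-- Population from the 2010 census for each NTA
UPDATE new_nynta
SET ntapopulation = (
  SELECT nyc_population_by_neighborhoods.population FROM nyc_population_by_neighborhoods
  WHERE new_nynta.ntacode = nyc_population_by_neighborhoods.nta_code AND nyc_population_by_neighborhoods.year = '2010');
-- Number of 2018 arrests whose location falls inside each NTA polygon
UPDATE new_nynta
SET arrest_count = (
  SELECT count(*) FROM Arrest_18
  WHERE ST_Intersects(Arrest_18.the_geom, new_nynta.the_geom));
-- Arrests per resident; the cast avoids integer division and NULLIF avoids dividing by zero
UPDATE new_nynta
SET arrest_rate = arrest_count::numeric / NULLIF(ntapopulation, 0);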
For shooting incidents, I applied a similar analysis in Carto to join the number of incidents from the shooting incident data file to the neighborhood spatial file.
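The join-and-normalize idea behind these commands can also be sketched in plain Python. This is only an illustration under stated assumptions, not the Carto workflow used for the project: the function names, the toy polygons, and the population figures are all hypothetical, coordinates are treated as flat x/y values, and the ray-casting test does not handle points lying exactly on a boundary the way ST_Intersects does.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (longitude, latitude), treated as planar x/y
Polygon = List[Point]         # ring of vertices; the last edge closes back to the first


def point_in_polygon(point: Point, polygon: Polygon) -> bool:
    """Ray-casting test: toggle 'inside' each time a ray to the right crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def arrest_rates(areas: Dict[str, Polygon],
                 population: Dict[str, int],
                 arrests: List[Point]) -> Dict[str, float]:
    """Count arrests per area, then divide by population (areas with no residents are skipped)."""
    counts = {code: 0 for code in areas}
    for p in arrests:
        for code, polygon in areas.items():
            if point_in_polygon(p, polygon):
                counts[code] += 1
                break  # assign each arrest to at most one area
    return {code: counts[code] / population[code]
            for code in areas if population.get(code, 0) > 0}


# Hypothetical toy data: two unit-square "neighborhoods" side by side.
areas = {"A": [(0, 0), (1, 0), (1, 1), (0, 1)],
         "B": [(1, 0), (2, 0), (2, 1), (1, 1)]}
population = {"A": 100, "B": 50}
arrests = [(0.5, 0.5), (0.2, 0.8), (1.5, 0.5)]
rates = arrest_rates(areas, population, arrests)
```

Skipping areas with zero population mirrors the reason for normalizing in the first place: a rate is only meaningful where there are residents to divide by.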
Overall, compiling and preparing the data was the most challenging part of this stage of the visualization. Initially, I wanted to use the historic data for the map as well, with “Years” as a date filter so users could see how the rate of arrests and the number of shooting incidents within the same area had changed over time. I was able to achieve this with the shooting data since the file size was significantly smaller. However, I had a hard time dealing with the arrests information. As the dataset went back to 2006 and contained over 4.8M rows, it far exceeded the storage limits of both OpenRefine and Carto. I was able to cut the data down to the past five years (2014-2018) using the filters in Tableau, which reduced the size, but it was still too big to be processed by any other tool. Eventually, I decided to map only the most recent full year.
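One possible way around this kind of size limit, which was not tried in this project, is to stream the export row by row and keep only the years of interest before uploading anything. The sketch below is a hedged illustration: the column name ARREST_DATE, the MM/DD/YYYY date format, and the sample rows are assumptions made for the example, and the in-memory strings stand in for real files.

```python
import csv
import io
from typing import Iterable, Iterator, TextIO


def rows_for_years(source: TextIO, date_column: str, years: Iterable[int]) -> Iterator[dict]:
    """Yield only the rows whose date falls in the requested years, one row at a time."""
    wanted = {str(y) for y in years}
    for row in csv.DictReader(source):
        # Assumes MM/DD/YYYY dates, so the year is the last four characters.
        if row[date_column][-4:] in wanted:
            yield row


def write_subset(source: TextIO, target: TextIO, date_column: str, years: Iterable[int]) -> int:
    """Stream matching rows from source to target and return how many were kept."""
    writer = None
    kept = 0
    for row in rows_for_years(source, date_column, years):
        if writer is None:
            # Create the writer lazily so the header matches the source columns.
            writer = csv.DictWriter(target, fieldnames=list(row.keys()))
            writer.writeheader()
        writer.writerow(row)
        kept += 1
    return kept


# Hypothetical sample standing in for the multi-million-row arrests export.
sample = io.StringIO(
    "ARREST_KEY,ARREST_DATE,OFNS_DESC\n"
    "1,03/14/2006,DANGEROUS DRUGS\n"
    "2,07/02/2018,ASSAULT 3\n"
    "3,11/30/2018,PETIT LARCENY\n"
)
subset = io.StringIO()
kept = write_subset(sample, subset, "ARREST_DATE", [2018])
```

Because only one row is held in memory at a time, the size of the source file matters far less than it does when a tool tries to load the whole table at once.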
2) Visualizing in Tableau
From my previous lab report, I noticed that some of the highest rates occur around park areas. Although major parks and cemeteries were treated as one neighborhood under “park-cemetery-etc-borough” and their populations were aggregated, the totals were still significantly lower than those of other neighborhoods because few people live there. This caused the rates to be much higher and might have skewed the overall results. So for this project, I filtered out these areas across all five boroughs in the visualizations (Figure 4.).
Using a dual-axis map in Tableau, I was able to layer two pieces of geographical information on top of one another: the rate of arrest by neighborhood and the categorical variable “Borough”. In the first choropleth map, the gradient scale represents the level of variability, and neighborhoods with higher rates are in darker shades. I used color to define borough boundaries to make it easier to look up areas that had relatively more or fewer crimes within a larger region. For each borough, we can therefore see which neighborhoods have the highest rates of police arrests (Figure 5.).
The second visualization was a point map plotted on top of a filled map. Data were joined from two geographic data sources: the NTA shapefile with incident counts, and the NYPD shooting data file with each incident location (Figure 6.). This was a combination of Tableau-generated geometry fields and custom latitude and longitude fields. I chose the point map because I wanted to see the distribution and density of where the incidents happened. Colors were used to differentiate categorical values: a darker red flags an incident of murder, and a lighter orange indicates that the incident did not result in the victim’s death. These colors were chosen because they are often associated with different levels of severity. Their transparency was adjusted to avoid occlusion. Where many incidents happened in the same area, the neighborhood boundaries might be blocked by overlapping points, so I also used an underlying filled map to help viewers quickly distinguish the heavily affected areas (Figure 7.); hovering over each shape brings up a pop-up with the exact number of incidents.
3) The Historical Analysis
The first analysis I made with the historical data was a line graph showing the trend of crimes in each of New York’s five boroughs over the five-year period from 2014 to 2018. The axis label interval was set to six months. Framing the data by half-years allows viewers to easily identify patterns from year to year as well as within a year. I also used a time-series line chart to analyze shooting incidents over the same period. The colors used for each borough were kept consistent with the coding used in the first choropleth map.
The third visualization showed the most common types of crime. I used a bar chart to compare the categorical data and sorted the bars by magnitude in descending order, so it is easy to see which type of offense had the largest share of total crimes. A filter was also added to show the breakdown by borough.
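The calculation behind such a sorted bar chart is a count per offense divided by the total, optionally restricted to one borough. The sketch below shows that arithmetic in Python; it is an assumption-laden illustration rather than what Tableau does internally, and the function name and sample records are hypothetical, not real NYPD figures.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

Record = Tuple[str, str]  # (borough, offense description)


def offense_shares(records: Iterable[Record],
                   borough: Optional[str] = None) -> List[Tuple[str, float]]:
    """Return (offense, percent of arrests) pairs, largest share first."""
    counts = Counter(offense for b, offense in records
                     if borough is None or b == borough)
    total = sum(counts.values())
    if total == 0:
        return []
    # most_common() sorts by count in descending order, like the sorted bars
    return [(offense, 100.0 * n / total) for offense, n in counts.most_common()]


# Hypothetical sample records, not real NYPD figures.
sample = [
    ("Bronx", "DANGEROUS DRUGS"), ("Bronx", "DANGEROUS DRUGS"),
    ("Bronx", "DANGEROUS DRUGS"), ("Bronx", "ASSAULT 3"),
    ("Queens", "ASSAULT 3"), ("Queens", "ASSAULT 3"),
    ("Queens", "DANGEROUS DRUGS"), ("Queens", "PETIT LARCENY"),
]
citywide = offense_shares(sample)
queens = offense_shares(sample, borough="Queens")
```

Passing a borough plays the role of the chart's filter: the shares are recomputed against that borough's total rather than the citywide total.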
For the fourth visualization, I was interested in knowing whether there was a pattern in when shooting incidents were more likely to occur. Taking a different approach from the previous map analysis, I joined the corresponding NTA name and NTA code to each shooting record using the following SQL operation in Carto (Figure 8.).
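UPDATE new_shooting_hist
SET ntaname = (
  SELECT new_nynta.ntaname FROM new_nynta
  WHERE ST_Intersects(new_shooting_hist.the_geom, new_nynta.the_geom)
  LIMIT 1);  -- a point on a shared boundary could otherwise match two NTAs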
I then aggregated the number of records from the 20 neighborhoods with the highest counts during the past five years and visualized them on a 24-hour continuum. I used a line chart for this visualization as well, given the continuous, time-series nature of the data.
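The aggregation described above (pick the neighborhoods with the most records, then count their incidents per hour) can be sketched as follows. This is a minimal illustration, assuming each record carries a neighborhood name and a time string in HH:MM:SS form; the function name, the input layout, and the sample rows are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Record = Tuple[str, str]  # (neighborhood name, time of day as "HH:MM:SS")


def hourly_counts_for_top_areas(records: List[Record], top_n: int) -> Dict[str, List[int]]:
    """For the top_n neighborhoods by total incidents, count incidents in each hour 0-23."""
    totals = Counter(name for name, _ in records)
    top = [name for name, _ in totals.most_common(top_n)]
    by_hour = {name: [0] * 24 for name in top}
    for name, time_str in records:
        if name in by_hour:
            hour = int(time_str.split(":")[0])  # assumes a 24-hour clock
            by_hour[name][hour] += 1
    return by_hour


# Hypothetical sample rows, not real incident data.
sample = [
    ("Brownsville", "23:15:00"), ("Brownsville", "23:50:00"),
    ("Brownsville", "01:05:00"), ("East New York", "22:30:00"),
    ("East New York", "14:00:00"), ("Midtown", "09:00:00"),
]
result = hourly_counts_for_top_areas(sample, top_n=2)
```

Each list in the result has 24 entries, one per hour, which is the shape a line chart on a 24-hour axis needs.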
I had planned to integrate the maps and charts within the same space using a Tableau dashboard, so that the different pieces of information would be communicated more consistently, both visually and mentally. However, several data files were too big for my device to load properly, and the lab PCs did not seem to support either copy/paste or import/export from multiple workbooks and data sources onto one dashboard, so I had to divide the product into two maps and two separate dashboards.
To ensure that the visualizations are usable and understandable, I invited three international students or graduates of graduate schools in New York City to take part in user tests, each involving a think-aloud exercise, some task completions, and interview questions. During the think-aloud portion, users were asked to explore the visualizations while verbalizing all of their thoughts and feelings. This helped me understand how they made sense of the information, their expectations, and their overall impressions of the product, as well as identify anything that was unclear or confusing. The tasks centered on finding two pieces of information, with the goal of determining whether the information was well structured and easy to find. Following the tasks, users were asked a few questions about potential concerns and suggestions now that they were more familiar with the visualizations. The tests were conducted informally, in a casual environment, using participants’ own devices to reflect actual use cases. Each session was structured as follows.
- Please take a few minutes to explore the visualizations and let me know what you learn from each graph (Follow-up questions depend on users’ reaction, ask “why”)
a) Which neighborhoods do you consider the most dangerous and the safest?
b) What’s the most common type of crime in Brooklyn?
a) What would you imagine using these visualizations for?
b) If you were looking for a place to live in New York, where would you choose? Why?
c) How easy is each visualization to understand (1 = very difficult, 5 = very easy)?
d) If you had a magic wand, how would you improve it?
Final Visualizations and Findings
Perceptions haven’t caught up to continuous decline in crimes?
Citywide overall crime continued to fall. The first thing I noticed was a steep dip at the end of 2014, from October to December, followed by a quick rebound at the start of 2015. Apart from this striking V-shaped pattern, we can see that the number of arrests tends to decline toward the end of the year as well as around mid-year and to rise again in between, but the overall trend has been downward. By the end of 2018, crime had hit its lowest point of the past five years across all boroughs. The fluctuations in each borough are mostly congruent with one another. Although Staten Island’s line is relatively flat due to its significantly smaller population, we can still recognize similar patterns, which indicates an overall consistency in police enforcement activity. Statistically speaking, then, with fewer and fewer crimes in recent years, New York has become safer than before.
Drug related crimes were prevalent
Citywide, arrests for Dangerous Drugs exceed those for any other type of offense by 6 percent, followed by Assault 3 (assault in the third degree, causing physical injury), Petit Larceny, Theft-related Offenses, Vehicle-related Offenses, and Felony Assault (Figure 11.). Looking deeper into each borough, we can see that the drug issue was particularly severe in the Bronx and Staten Island, where drug offenses account for over 21 percent of all crimes. The only exception is Queens, where Assault 3 was the most common crime, slightly outnumbering drug offenses, which also account for about 13 percent. Citywide, arrests for Dangerous Weapons made up 3.97 percent and ranked among the top ten most common crimes. In this category, the Bronx has the highest share at almost 5 percent, whereas Manhattan has the lowest at 2.91 percent.
Arrests are mostly homogeneous, except for a few
Most neighborhoods in New York had similar rates of arrest in 2018, as the level of variability within each region is visually low. Midtown South in Manhattan stands out with the highest crime rate. Others, such as DUMBO/Downtown in Brooklyn and Hunts Point in the Bronx, also stand out to me as some of the tougher neighborhoods.
But shooting incidents rose and fell
While the crime statistics, proxied by arrests data, show a year-over-year drop throughout the past five years apart from the plunge in 2014, the same analysis of shooting incidents does not mirror that trend. This chart exhibits greater fluctuations: the number of incidents tends to grow substantially during the first half of the year, peak around June to August, and then rapidly decrease toward the end of the year. NYC experienced its most serious year of gun violence in 2015, with Brooklyn hitting a steep spike of 84 incidents in August alone. The waves of shootings are also particularly noticeable in Brooklyn in general. Yearly shootings seemed to decline over the next two years, but 2018 appeared to bring another small upsurge. Overall, this chart shows a trend that runs counter to the overall decline in crimes (Figure 1).
What happens at night and in the early morning
The number of shooting incidents is strongly associated with the time of day. Aggregating the historic data, we can tell what goes on over 24 hours. The frequency of incidents throughout the day can be roughly divided into four phases, as exhibited by the 20 neighborhoods with the highest number of shooting records. From daybreak around 5 am to noon, these neighborhoods are relatively peaceful; from the afternoon through dusk, shootings begin to rise gradually; from around 6 to 7 pm, the number increases substantially and eventually peaks between around 10 pm and 2 am. After that, incidents drop drastically and return to where they started.
Neighborhoods in the Bronx and Brooklyn were most affected by gun violence
The shooting incidents map in Figure 15 shows two distinct clusters that reveal a significant difference between boroughs. From the density of data points and the underlying graduated color, we can see that gun crimes are most prevalent in Brooklyn, from Bushwick and Bedford all the way across to East New York, and in particular in Crown Heights, Stuyvesant Heights, and Brownsville. A large share of incidents also occurred in the Bronx. Except for Harlem, Manhattan in general has the least gunfire concern. This is also consistent with the findings on arrests by crime type.
User Testing Insights
From the user tests, I received valuable and insightful feedback from participants, which helped me identify the following UX/usability issues with my current visualizations.
- Because the crime rates for most neighborhoods fall within a similar range, the difference in color was not noticeable enough (Figure 12). The first thing users noticed was that each color represents a borough, but it was not intuitive for them to differentiate rates using a gray scale (the same applies to the legend). It also became more difficult when users wanted to compare neighborhoods across boroughs, since they were in different colors. This issue was common to all participants. Using color for the borough category might not be ideal here, and using a single color might be more straightforward
- Since I had been working on a larger screen while creating all the visualizations, I hadn’t realized how they would scale down on a typical laptop screen. As a result, participants were confused when some legend titles could not be displayed in full (e.g. Figure 15., where the word “Death” from “Resulted in the Victim’s Death” was hidden). Instances like this need to be refined
- The colors used in the shooting incidents map were too close. The red, orange, and yellow were in the same color range, and when they overlapped it became more difficult to tell them apart, especially yellow against orange and orange against red (Figure 15). The contrast needs to be boosted a little. From a cognitive perspective, one user mentioned that whether or not the shooting resulted in the victim’s death seemed less significant. Because a shooting incident is severe enough in itself, and whether it counts as a murder depends on many variables, the distinction might not be as meaningful. So it remains to be determined whether this extra layer of information is necessary
- All users were surprised by and curious about what happened in 2014 (Figure 10.) and even started Googling it right away. It would be interesting to do some research on this and offer some hypotheses to explain this phenomenon
- For the bar graph (Figure 11.), consider using a color scale for data that fall within a certain range rather than just one color (e.g. top and bottom percent). One user stated that the type of crime would have little influence on deciding where to stay, because if an area had a high crime rate in general, he/she would already be less likely to consider it.
- The shooting by time of day chart (Figure 14.) had so many colors and categories that it could be overwhelming (the same applies to the legend). It would be helpful to reduce the number of categories, or to color-code them by borough and assign different shades to each borough’s neighborhoods
- All users were able to complete the tasks and use the filters.
- All users brought up rental or housing use, for either individuals or agents/brokers, during the interview before I asked the next leading question
- For the second interview question, Upper East Side and East Village were common answers. One user responded that safety was not the only concern; he/she would also take the neighborhood’s household income or educational level into consideration
Recommendations and Future Thoughts
To continue improving this project, I would first refine my work according to the user feedback. Although some visualizations could not be fully carried out as planned due to data sizes and technological constraints, accomplishing them in the future would be a big step forward. For example, the way users interact with, process, and interpret information would be different if all the visualizations were arranged in the same space. Normalizing the data by population for the arrest line chart would also yield a more accurate and valid representation of the trend.
With the current outcome, I was able to have some of my questions answered, yet more research and study need to be done in order to answer them with confidence. It would be interesting to see the crime trends within each neighborhood over time. Another example would be measuring calls and police behavior to look at police enforcement activity in each neighborhood. In addition, combining crime statistics with demographic data such as average household income and educational level might yield informative results. In order to know where we are, it is important to know where others are, so another direction would be to compare data across other big cities along with their public perceptions of crime and safety; then maybe we will better understand why there is a large gap between perception and reality when it comes to New York City.