Introduction
Over the years, the NYPD has received a good deal of well-deserved flak for its reluctance to “open” its data, along with its blatant disregard for FOIA requests. The agency’s covert practices have been heavily criticized by transparency advocates, members of the civic tech community, and even public officials. In an April 2016 interview with Gotham Gazette, City Council Member and Chair of the Committee on Technology James Vacca explained how vague criteria regarding the quality and timely release of data continue to be a problem for city agencies (Sanborn-Hum). Moreover, Vacca says that the city’s data policy is overly dependent on agency cooperation and offers no guidelines for ensuring accountability (Sanborn-Hum).
Civic data blogger Ben Wellington shares Vacca’s concerns, and has openly criticized the NYPD’s habit of releasing “statistics reports” – which typically aggregate NYPD data into a series of graphs and charts – in lieu of raw, downloadable data (Wellington). Is it possible, however, that the tide is finally turning? The city recently staged a release of NYPD crime data covering individual incident reports that span nearly a decade, a massive data dump that allows data scientists to analyze, among other things, the types and amounts of crime in each neighborhood. While this crime data release has been framed as a triumphant moment for transparency advocates, there are still few data sources for details such as settlement claims or civilian complaints. Much of the information available on those topics is either aggregated and entombed in a PDF report or nonexistent. Which brings me to the topic of my CartoDB lab.
Instead of taking a crime statistics angle, I wanted to look at data pertaining to police misconduct, and analyze whether misconduct disproportionately affects certain precincts and neighborhoods. I initially intended to analyze ClaimStat’s “Policy Injury Personal Action Claims,” with data taken from a Google Fusion table containing latitude and longitude values. My idea took root after reading about a surge in NYPD settlement claims over the past 5 years, and I was hoping to map the amount of money paid in settlements for unlawful arrests in each precinct. Unfortunately, the contents of the table were rather sparse and there were no settlement dollar amounts to be found – only the claim number, date, and geocoordinates.
After googling around, I was able to find a dataset of Civilian Complaint Review Board (CCRB) complaints against the NYPD from 2008-2013. The dataset is housed on Beta NYC, although it is absent from NYC OpenData. While the data is not particularly robust, it contains enough information for a surface-level analysis of policing throughout the boroughs, including:
- Complaint Date
- Precinct
- Officer ID
- Disciplinary recommendation
- Type of complaint, which is broken down by FADO (Use of Force, Abuse of Authority, Discourtesy, Offensive Language), then further described based on its FADO classification.
Inspiration
Before embarking on my analysis, I wanted to see if any other projects had been completed with this particular dataset. It turns out that WNYC analyzed the data available for 2012 and created a simple visualization that unpacks the charges recommended by the CCRB, then weighs those charges against the charges the NYPD actually pursued.
I also found that NY Daily News had generated a choropleth map displaying the density of CCRB complaints in 2012, by precinct. Because maps are not ideal for visualizing data over time, I decided to pursue a project in a similar vein that instead maps cumulative substantiated CCRB complaints by precinct.
The Data
I found that the data provided by Beta NYC did not require a ton of cleaning; however, I made some small adjustments to ready the dataset for future analyses beyond this lab. First, I renamed the “Command” column to “Precinct” and prepared the values for a column join with the NYC Police Precinct shapefile. Certain precincts were trickier to standardize than others because the dataset’s naming conventions were slightly different from those found in the shapefile. With a little bit of research, however, the situation was rectified.
Next, I removed all instances of “Substantiated” from the “Recommendation” column – making sure to leave behind the text within parentheses – because this particular data sheet only contains substantiated complaints. I also did a text-to-columns split, which separated each FADO value from its specific complaint. For example:
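As a rough illustration of these two cleaning steps, here is a minimal Python sketch. It is not the spreadsheet procedure used for this lab: the cell values, the ":" delimiter, and the function names are all assumptions made up for the example, and the first function keeps only the text inside the parentheses.

```python
import re


def clean_recommendation(value):
    """Drop the word 'Substantiated' and keep the text inside the parentheses."""
    match = re.search(r"\(([^)]*)\)", value)
    if match:
        return match.group(1).strip()
    # No parentheses found: just remove the word itself.
    return value.replace("Substantiated", "").strip()


def split_allegation(value, delimiter=":"):
    """Split one cell into (FADO category, specific complaint).

    The delimiter is an assumption; the real data may use a different one.
    """
    category, _, detail = value.partition(delimiter)
    return category.strip(), detail.strip()


# Hypothetical cell values, for illustration only.
print(clean_recommendation("Substantiated (Charges)"))
print(split_allegation("Force: Nightstick as club"))
```

A cell with no delimiter comes back with an empty second element, which makes rows that need manual attention easy to spot.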
Lastly, I created a pivot table to count both the total number of substantiated complaints per precinct and the number of each FADO type per precinct. I thought it would be interesting to display a FADO breakdown of complaints in the map’s tooltip, which lends itself to richer annotation and sheds some light on the types of policing issues in certain precincts. From this pivot table, I was able to visually determine that “Stop” is the most prevalent Abuse of Authority complaint, and that “Nightstick as Club” is the most prevalent Use of Force complaint.
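The pivot table described above can be approximated in a few lines of Python. This is only a sketch under stated assumptions: the input is assumed to be (precinct, FADO category) pairs, the sample rows are invented, and `pivot_complaints` is a hypothetical helper name. The “Grand Total” key mirrors the column name used on the map.

```python
from collections import Counter, defaultdict


def pivot_complaints(rows):
    """Count complaints per precinct, broken down by FADO category.

    rows: iterable of (precinct, fado_category) pairs.
    Returns {precinct: Counter}, where each Counter also carries a
    'Grand Total' entry for that precinct.
    """
    table = defaultdict(Counter)
    for precinct, fado in rows:
        table[precinct][fado] += 1
        table[precinct]["Grand Total"] += 1
    return table


# Invented sample rows, not taken from the CCRB dataset.
sample_rows = [
    (46, "Abuse of Authority"),
    (46, "Force"),
    (46, "Abuse of Authority"),
    (73, "Force"),
]

for precinct, counts in sorted(pivot_complaints(sample_rows).items()):
    print(precinct, dict(counts))
```

Keeping the breakdown and the total in one structure per precinct matches what the tooltip needs: one row per precinct with a column for each FADO type.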
I also noticed that there seems to be a FADO category that falls outside of authority, force, discourtesy, and offensive language – it is simply labeled “E,” and seems to reference misconduct attributed to discrimination. Because I am unsure of the technical categorization, I have labeled the complaints as “unknown” on my CartoDB map.
With my data cleaned and ready to go, I imported the .csv into CartoDB and did a column join with the NYC Police Precinct shapefile I downloaded from Bytes of the Big Apple.
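Conceptually, the column join is a left join from the precinct polygons to the complaint totals. The sketch below shows that idea in plain Python rather than in CartoDB itself; the precinct numbers, totals, and function names are hypothetical and exist only for illustration.

```python
def column_join(precinct_ids, totals_by_precinct):
    """Left join: one row per polygon, with None where no total exists."""
    return [
        {"precinct": pid, "grand_total": totals_by_precinct.get(pid)}
        for pid in precinct_ids
    ]


def unmatched(joined_rows):
    """Precincts that exist in the shapefile but have no complaint total."""
    return [row["precinct"] for row in joined_rows if row["grand_total"] is None]


# Hypothetical identifiers and totals, not real figures.
shapefile_precincts = [1, 5, 6, 7]
complaint_totals = {1: 12, 5: 4, 7: 9}

joined = column_join(shapefile_precincts, complaint_totals)
print(unmatched(joined))  # [6]
```

Listing the unmatched precincts right after the join is a quick way to catch naming mismatches between the two sources before anything is drawn.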
Visualizations
After merging my datasets, I created a choropleth map showing the “Grand Total” of CCRB complaints in each precinct from 2008-2013. I then uploaded a “Neighborhood Tabulation Area” shapefile and created another layer to show faint neighborhood boundaries – which are far more recognizable than precinct boundaries. I decided that I would show two versions of the map – one zoomed out to provide a broader picture of policing within the 5 boroughs, and another zoomed in with neighborhood labels activated so that viewers can see how neighborhoods overlap with precincts. The neighborhood labels – and the stigmas attached to certain neighborhoods – provide greater context than simply showing precincts.
On both maps, I created a hover tooltip to show the Precinct #, Grand Total Complaints, and a breakdown of complaints by FADO category. Because the choropleth legend struck me as unclear, I used HTML to customize a legend that provides a number range to go along with each color on the map.
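The number ranges in a legend like this follow directly from the choropleth’s class breaks. Here is a minimal Python sketch of that mapping, assuming the breaks are the inclusive upper bound of each color bin. The break values and function names are hypothetical, not the ones CartoDB generated for these maps.

```python
from bisect import bisect_left


def legend_labels(breaks, minimum=0):
    """Turn ascending upper bounds into 'low-high' labels, one per color."""
    labels = []
    low = minimum
    for upper in breaks:
        labels.append(f"{low}-{upper}")
        low = upper + 1
    return labels


def color_index(value, breaks):
    """Index of the first bin whose upper bound is >= value.

    Values above the last break fall into the last bin.
    """
    return min(bisect_left(breaks, value), len(breaks) - 1)


# Hypothetical class breaks, for illustration only.
breaks = [10, 25, 50, 109]
print(legend_labels(breaks))
print(color_index(11, breaks))
```

Generating the labels from the same breaks that drive the colors keeps the legend from drifting out of sync with the map.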
I then moused over each precinct to check the accuracy of the data. Much to my surprise, I learned that precincts in the Park Slope/Gowanus/Prospect Heights/Cobble Hill area were not covered in this CCRB dataset. It seems as though the precincts simply did not provide data or that there weren’t any logged complaints in these areas; however, I have no real way of knowing why these areas are unaccounted for. Similarly, I found that a large area of Staten Island was also devoid of data. Upon further inspection, I learned that the area – called PCT 103 – was created in 2013, and because my dataset only covers the years 2008-2013, it did not include any complaints in this newly formed precinct. I added both findings as annotations on my maps.
To make the maps a bit more informative, I annotated PCT 46, which has more substantiated complaints against the NYPD than any other area of NYC. PCT 46, which covers parts of the Fordham South and Mount Hope neighborhoods of the Bronx, has a grand total of 109 complaints. I also annotated PCT 73, which covers parts of Ocean Hill and Brownsville, and has more “Use of Force” complaints than any other precinct in the city.
Zooming out, it is visually apparent from my maps that the Bronx and parts of East Brooklyn have the most logged civilian complaints against the NYPD. This leads me to acknowledge a limitation of my map: it shows raw counts rather than rates, so the areas with more complaints are likely areas with more police activity to begin with. Given more time and more data, I would weigh “Total Substantiated Complaints” against total police encounters in each precinct from 2008-2013. Even so, the visualization does succeed in suggesting that more heavily policed areas are likely to log a higher number of complaints: the more policing, the more misconduct.
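Weighing complaints against encounters amounts to computing a rate instead of a count. The sketch below shows one way that could look in Python. The figure of 50,000 encounters is invented, since this dataset contains no encounter counts, and the function name is hypothetical.

```python
def complaints_per_1000_encounters(complaints, encounters):
    """Substantiated complaints per 1,000 police encounters.

    Returns None when the encounter count is missing or zero,
    so the precinct can be left unshaded rather than misreported.
    """
    if not encounters or encounters <= 0:
        return None
    return 1000 * complaints / encounters


# 109 is PCT 46's grand total; the encounter figure is hypothetical.
rate = complaints_per_1000_encounters(109, 50_000)
print(round(rate, 2))
```

Mapping this rate rather than the grand total would let two precincts with very different levels of police activity be compared on the same scale.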
https://kmeiznerpratt.cartodb.com/viz/033097fa-ebbd-4e5b-9c3e-dd0803c34db7/public_map
https://kmeiznerpratt.cartodb.com/viz/c7df1e00-3ca7-11e6-bd23-0ea31932ec1d/public_map
Future Directions
Building on this dataset and concept, I would really like to take WNYC’s project a step further and analyze CCRB recommendations in comparison to NYPD dispositions. From what I’ve gathered, the CCRB is only capable of recommending a mode of discipline, and only in very rare circumstances has the jurisdiction to bring a complaint to court. The NYPD is supposed to accept the CCRB’s recommendation; however, the WNYC data visualization showed that the NYPD very rarely follows through on pressing charges or exacting command discipline. I would like to look at these figures on a map to see which precinct is least likely to follow through on pressing charges against an officer accused of misconduct.
Sources
Cunningham, Jennifer. “Complaints Against NYPD Cops Mirror Stop and Frisk Numbers: A Study”. NY Daily News, July 2013. http://www.nydailynews.com/new-york/brooklyn/complaints-cops-mirror-stop-and-frisk-numbers-article-1.1388735
NYC OpenData. The City of New York, 2015. Web. 10 Nov. 2015. https://nycopendata.socrata.com/
Sanborn-Hum, Kaela. Gotham Gazette, April 11, 2016. http://www.gothamgazette.com/index.php/city/6272-new-york-city-s-evolving-approach-to-open-data
http://www.villagevoice.com/news/the-nypds-records-of-its-own-misbehavior-have-mysteriously-vanished-8639201