Introduction
As a New York resident, I have become increasingly interested in how crime and incarceration rates have changed over time in the city. I came across an interactive visualization called the NYC JailViz 2.0 Application, which further sparked my interest in this area. This visualization allows users to explore the number of inmates in custody on a given date in history. It prompted me to dive deeper, learn more about the demographics of New York City inmates, and analyze trends over the past few years. This project explores various trends in inmates from the years 2016-2022 and aims to draw conclusions about the data from the resulting visualizations.
Method
Dataset
I started by searching for a dataset that I found interesting and that included a variety of quantitative and categorical data. I browsed sites such as Statista, Kaggle, and NYC Open Data, all of which host a plethora of public datasets on various topics. I made sure to select a dataset that contained at least 1,000 records and had time-oriented data. Ultimately, I chose a dataset that represents the daily inmates in custody in New York City.
This dataset is based on data from the Department of Correction. It was created on March 23, 2016 and is automatically updated daily with new data. The dataset contains information such as the age, gender, race, and custody level of each inmate in New York City up to the present day. A description of each column can be seen below.
Column Name | Description | Data Type |
---|---|---|
Inmate ID | This is the primary key column that uniquely identifies each inmate record. | Number |
Admitted Date | This field has the admitted date and time of the incident. | Date & Time |
Discharged Date | This field has the inmate discharged date and time. | Date & Time |
Custody Level | This has the level of custody provided for the inmate. Values are MIN,MED,MAX custody levels. | Text |
BradH | BRADH has values Y or N. The inmate is under mental observation. | Text |
Race | Race of the inmate. | Text |
Gender | Gender of the inmate (Male or female). | Text |
Age | Calculated Age of the inmate. | Number |
Inmate Status Code | Provides the inmate status, for example whether an inmate is a detainee. | Text |
Sealed | Sealed=Y implies that the inmate information is not to be shown in public. | Text |
Srg Flag | SRG_FLG=Y means that the inmate has an approved gang affiliation. | Text |
Top Charge | Top charge for the inmate. | Number |
Infraction | Indicates whether the inmate has infraction. | Text |
Although this dataset contains data from 2012 to 2022, I noticed that the data before 2017 was very sparse. To ensure that the visualizations were not skewed, I filtered out data from before 2017 in many of the visualizations.
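The filter itself was applied in Tableau, but the same idea can be sketched in Python. This is a minimal sketch under stated assumptions: the column name `Admitted Date` follows the table above, while the timestamp layout in `DATE_FORMAT`, the helper names, and the sample rows are hypothetical and would need to be adjusted to the actual export.

```python
from datetime import datetime

# Assumed timestamp layout of the exported CSV; adjust if the file differs.
DATE_FORMAT = "%m/%d/%Y %I:%M:%S %p"


def admitted_year(record, date_key="Admitted Date"):
    """Return the admission year of one record (a dict keyed by column name)."""
    return datetime.strptime(record[date_key], DATE_FORMAT).year


def filter_from_year(records, first_year=2017):
    """Keep only the records admitted in first_year or later."""
    return [r for r in records if admitted_year(r) >= first_year]


# Hypothetical sample rows, not taken from the real dataset.
sample = [
    {"Inmate ID": "1", "Admitted Date": "06/01/2014 09:30:00 AM"},
    {"Inmate ID": "2", "Admitted Date": "01/15/2017 02:05:00 PM"},
    {"Inmate ID": "3", "Admitted Date": "11/20/2021 11:45:00 PM"},
]
kept = filter_from_year(sample)
print([r["Inmate ID"] for r in kept])
```

Passing the cutoff as a parameter makes it easy to check how sensitive a chart is to the choice of 2017 as the first year.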
Software
To create visualizations from this dataset, I used Tableau Public. Tableau is software used primarily for building a variety of complex visualizations from datasets, and Tableau Public is its free offering.
Process
I started by downloading a CSV version of the data from the NYC Open Data website and importing it into my Tableau Public project. The data was fairly clean and ready to use. The only alteration I made was to change the values in the Race column. Initially each value in this column was a single character (e.g., A, B, W), but I thought it would be more useful to show the full word. Tableau Public made this edit very easy, since it offers the option to change the alias of each value.
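For readers working outside Tableau, the aliasing step amounts to a lookup table. The sketch below is illustrative only: the full names for A, B, and W match the groups used later in this write-up, the mapping of O to "Other" is an assumption, and `RACE_ALIASES` and `race_label` are hypothetical names. Codes without a known alias are passed through unchanged rather than guessed.

```python
# Assumed aliases; only codes whose meaning is known are mapped.
RACE_ALIASES = {"A": "Asian", "B": "Black", "W": "White", "O": "Other"}


def race_label(code):
    """Return the display alias for a race code, or the code itself if unmapped."""
    return RACE_ALIASES.get(code, code)


for code in ["A", "B", "W", "U"]:
    print(code, "->", race_label(code))
```

As with a Tableau alias, this changes only the label that is displayed, not the underlying value.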
Once the data was imported, I first created a visualization that breaks the data down by gender, shown in Figure 1. I pulled in Age to the Y-axis and the Count of Inmates in Custody to the X-axis. Since this chart does not differentiate by year, it represents the total number of male and female inmates in New York City between 2016 and 2022. I chose a line graph because two distinct lines make it easy to see the difference between males and females, as well as the change in admittance across ages. I intentionally assigned the male trend line the color blue and the female trend line the color pink since these colors are typically associated with the two genders. The distinction between genders is clear even in the absence of a legend.
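The aggregation behind a chart like Figure 1 is a count of records per age and gender. The following is a rough sketch of that tally, assuming each record is a dict with `Gender` and `Age` keys as in the column table; the gender codes "M" and "F", the function names, and the sample rows are hypothetical.

```python
from collections import Counter


def count_by_age_and_gender(records):
    """Tally the records for every (gender, age) pair."""
    counts = Counter()
    for r in records:
        counts[(r["Gender"], int(r["Age"]))] += 1
    return counts


def line_series(counts, gender):
    """Return sorted (age, count) points for one gender, i.e. one line on the chart."""
    return sorted((age, n) for (g, age), n in counts.items() if g == gender)


# Hypothetical sample rows, not taken from the real dataset.
records = [
    {"Gender": "M", "Age": "25"},
    {"Gender": "M", "Age": "25"},
    {"Gender": "F", "Age": "25"},
    {"Gender": "M", "Age": "41"},
]
counts = count_by_age_and_gender(records)
print(line_series(counts, "M"))
print(line_series(counts, "F"))
```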
Next, I was interested in exploring the BradH column, which is available for each inmate. This field indicates whether or not an inmate is under mental observation. Since mental health and its connection to the prison system have been topics of relevance lately, I wanted to see how this number has changed over time. I decided to use a stacked bar graph for this visualization, shown in Figure 2. The years run along the x-axis and the number of inmates runs along the y-axis. I set the color of each stacked segment to pull from the BradH column. Since each BradH value is either “Yes” or “No”, each bar contains only two colors. I thought a stacked bar graph was an effective visualization in this context since it represents the BradH field in a part-to-whole manner. It’s also interesting to see how these numbers change over time.
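The part-to-whole relationship in each stacked bar can be made explicit as a share. Below is a minimal sketch, assuming each row has already been reduced to a (year, BradH flag) pair, with the year derived from the admitted date and the flag being "Y" or "N" as in the column table. The function names and the sample rows are hypothetical.

```python
from collections import defaultdict


def bradh_by_year(rows):
    """Count the Y and N flags per year: the two segments of each stacked bar."""
    totals = defaultdict(lambda: {"Y": 0, "N": 0})
    for year, flag in rows:
        totals[year][flag] += 1
    return dict(totals)


def share_under_observation(segments):
    """Fraction of one bar made up of inmates under mental observation."""
    total = segments["Y"] + segments["N"]
    return segments["Y"] / total if total else 0.0


# Hypothetical sample rows, not taken from the real dataset.
rows = [
    (2020, "Y"), (2020, "Y"), (2020, "N"),
    (2022, "Y"), (2022, "N"), (2022, "N"), (2022, "N"),
]
bars = bradh_by_year(rows)
for year in sorted(bars):
    print(year, bars[year], round(share_under_observation(bars[year]), 2))
```

Comparing shares rather than raw counts keeps the year-to-year comparison meaningful even when the total number of records per year varies widely.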
The third visualization, shown in Figure 3, represents how another data field, custody level, changes over time. I decided to use a pie chart to depict the breakdown of custody level per year. Again, this visualization is effective at representing a part-to-whole relationship in an easy-to-understand way. The size of each pie chart is based on the total number of inmates that year, and the size and color of each slice are based on the number of inmates per custody level. Since there are only 3 custody levels (min, med, max), I decided to use the traffic light color palette, since it contains 3 contrasting colors that suggest a severity level.
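To illustrate how slice sizes follow from the counts, the sketch below converts the custody levels recorded for one year into pie-slice angles. It assumes the MIN, MED, and MAX values listed in the column table; `slice_angles` and the sample list are hypothetical, and Tableau performs the equivalent calculation internally.

```python
from collections import Counter

LEVELS = ("MIN", "MED", "MAX")


def slice_angles(levels):
    """Return the angle in degrees of each custody-level slice for one year's pie."""
    counts = Counter(levels)
    total = sum(counts[level] for level in LEVELS)
    if total == 0:
        return {level: 0.0 for level in LEVELS}
    return {level: 360.0 * counts[level] / total for level in LEVELS}


# Hypothetical custody levels for a single year.
angles = slice_angles(["MIN", "MIN", "MED", "MAX"])
print(angles)
```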
For the last visualization, I was interested in understanding how race plays into the number of inmates over the years. For the visualization shown in Figure 4, I used an area chart since it makes it easy to see the change in the number of inmates of each race every year. To make this chart more readable, I decided to group the “I”, “O”, and “U” race groups together, since I was unable to find which races “I” and “U” mapped to in the description of this dataset. These categories were also relatively small and did not change the “Other” group by a significant amount. Additionally, I decided to include the labels for each race group directly on the chart, since there were only four different groups and this increases the readability of the visualization. Ultimately, an area chart worked well for this data since it effectively displays the comparison between the race groups.
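The grouping described above can be sketched as a small mapping step followed by a count per year and group. This is an illustrative sketch, not the Tableau grouping itself: it assumes rows reduced to (year, race code) pairs, and it sends any unrecognized code to "Other", which is an assumption. All names and sample rows are hypothetical.

```python
from collections import Counter

GROUPED_AS_OTHER = {"I", "O", "U"}
NAMED_GROUPS = {"A": "Asian", "B": "Black", "W": "White"}


def race_group(code):
    """Map a race code to one of the four groups used in the area chart."""
    if code in GROUPED_AS_OTHER:
        return "Other"
    # Assumption: any code not listed above is also counted as "Other".
    return NAMED_GROUPS.get(code, "Other")


def counts_per_year_and_group(rows):
    """Tally inmates for every (year, race group) pair."""
    return Counter((year, race_group(code)) for year, code in rows)


# Hypothetical sample rows, not taken from the real dataset.
rows = [(2019, "A"), (2019, "I"), (2019, "O"), (2019, "U"), (2019, "B"), (2020, "W")]
area_counts = counts_per_year_and_group(rows)
print(area_counts[(2019, "Other")])
```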
Results and Reflection
The resulting visualizations can be seen on this Tableau Public Dashboard.
Based on the first visualization, shown in Figure 1, it is clear that inmates in New York City are far more likely to be male than female. For almost every age represented, there is a significantly larger number of male inmates. This observation aligns with the statistic that 93.2% of federal inmates are men and only 6.8% are women. It reinforces the fact that there is a large gender gap in prisons, one that is very much apparent in New York City as well. Figure 1 also shows that most inmates are between 20 and 40 years old. After age 40 there is a significant drop-off in the number of inmates. It’s interesting to note that this holds true for both males and females.
Figure 2 shows the change in the number of inmates under mental observation per year, based on the BradH status. This status is assigned by mental health professionals of the Department of Correction and indicates that the inmate displays at least one symptom of mental illness. It is worth noting that this dataset is most likely not comprehensive for the years 2017-2019, since the total number of inmates for these years does not seem realistic. However, the ratio of inmates under mental observation to those who are not can still be analyzed from this chart. It can be seen that each year more and more inmates who require mental observation are admitted. However, according to this chart, 2022 was the first year in which inmates who were not under mental observation outnumbered those who were. This is interesting since one of the aftereffects of the COVID-19 pandemic was an increase in mental health issues in New York City. For this reason, I was surprised to see that a majority of the inmates in 2022 were not under mental observation.
Lastly, the visualization shown in Figure 4 represents the change in the number of inmates admitted of each race. The area chart shows a breakdown of 4 different groups: Asian, Black, White, and Other. For this visualization I found the interactive capabilities of Tableau to be very useful. I was able to isolate each race on the graph by clicking the corresponding label in the legend. This made it easier to see how the number of inmates of each race changed over time, as well as how much of the total each race accounted for. It was interesting to see that Asian inmates were only present from 2019 onwards. This seems highly unlikely and indicates that the data may again not be comprehensive.
I also thought it was interesting that the trends seen among New York City inmates hold true for the United States as a whole. People of color are incarcerated in higher numbers than white people both in the United States overall and in New York City specifically. Figure 5 shows incarceration trends by race in the United States, and Figure 4 shows the breakdown of inmates by race for New York City. Similar trends can be seen in both figures.
Future Considerations
If given the opportunity to continue working with this dataset, there are a few things I would like to explore. First, I would like to understand more about the Top Charge column. According to the metadata of this dataset, this column contains the top charge of the inmate, but I was not able to determine how the numbers in this column map to specific charges. With more time I would like to research this part of the data further. I would also like to connect this dataset with another dataset that contains information about the duration of stay of an inmate. I believe this could provide valuable insight into trends in how long inmates remain incarcerated.
Overall, this dataset was really interesting to work with and provided opportunities to visualize various aspects of the data. Tableau Public was also instrumental in creating these visualizations easily and effectively.
References
- https://www.statista.com/chart/11573/gender-of-inmates-in-us-federal-prisons-and-general-population/
- https://greaterjusticeny.vera.org/nycjail/
- https://trends.vera.org/