The “Friendly Neighborhood” of New York City


Map distribution of New York City Crime in 2022 Year to Date

Introduction

New York City has long been known as one of the “not completely safe” cities in the United States. Numerous crimes are committed in the city every day, and the New York City Police Department (NYPD) makes major efforts to reduce the crime rate. As technology and the internet become more accessible to everyone, it is easy to obtain crime data like this, study its trends, and look for interesting findings. In this study, we use New York City crime data to create visualizations that help us better understand crime trends in New York City.

Software

The software used in this study is Tableau. Tableau is a powerful data visualization tool that lets users drag and drop attributes of the data to build graphs and charts. This was my first time using Tableau, and at first the interface looked a little overwhelming because of all the information it presents. Once the data is imported, however, it is fairly straightforward to use: you create worksheets and drag the columns you want into each chart.

Figure 1: A snapshot of using Tableau Software

Datasets

The dataset used in this study was obtained from the NYC Open Data website. This website provides all sorts of open data related to New York City, and this particular dataset contains all valid felony, misdemeanor, and violation crimes reported to the NYPD. I chose the version that only covers the Current Year To Date because the versions of this dataset that cover longer time ranges were too large for my computer to handle.

The dataset is available here: https://data.cityofnewyork.us/Public-Safety/NYPD-Complaint-Data-Current-Year-To-Date-/5uac-w243

Process, Methods & Results

Data Cleaning/Processing

Some cleaning was necessary because many values were (null). There were also values marked “UNKNOWN”, which cannot be cleaned because the dataset alone does not tell us what the actual value is.

Since we do not know the actual values of the (null) cells either, they are replaced with UNKNOWN to match the existing UNKNOWN cells.
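For anyone repeating this step outside Tableau, here is a minimal Python sketch of the same replacement. It assumes the export has been loaded as a list of dictionaries and that nulls appear as Python None, an empty string, or the literal text "(null)"; the column names are hypothetical placeholders, not the dataset's actual headers.

```python
# Values treated as null. This set is an assumption about how the export
# encodes missing cells; adjust it to match the real file.
NULL_MARKERS = {None, "", "(null)"}


def fill_unknown(rows, columns):
    """Return copies of rows where null-like values in `columns` become "UNKNOWN"."""
    cleaned = []
    for row in rows:
        new_row = dict(row)  # copy so the caller's rows are not modified
        for col in columns:
            if new_row.get(col) in NULL_MARKERS:
                new_row[col] = "UNKNOWN"
        cleaned.append(new_row)
    return cleaned


# Hypothetical column names and made-up rows, for illustration only.
rows = [
    {"suspect_sex": "M", "suspect_race": "(null)"},
    {"suspect_sex": None, "suspect_race": "UNKNOWN"},
]
print(fill_unknown(rows, ["suspect_sex", "suspect_race"]))
# [{'suspect_sex': 'M', 'suspect_race': 'UNKNOWN'}, {'suspect_sex': 'UNKNOWN', 'suspect_race': 'UNKNOWN'}]
```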

Design Methods & Goals

We would like to find trends in this crime report dataset, so columns such as suspect’s race, victim’s race, and suspect’s gender are good candidates for the final charts. It would also be interesting to see which kinds of crimes were committed most often in the first two quarters of the year, so columns like Offense Description are also useful.

To count the records, we can apply a COUNT aggregation to the Complaint number column, which gives the number of complaints for each value of the target columns.
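The counting step can be sketched in plain Python as a rough equivalent of a COUNT aggregation: for each value of a chosen column, it counts the rows whose complaint number is not null. The field names (`complaint_num`, `offense`) and the rows are hypothetical.

```python
from collections import Counter


def count_by(rows, group_col, id_col="complaint_num"):
    """Count rows with a non-null id for each value of group_col.

    Rows whose id is missing are skipped, similar to how a COUNT
    aggregation ignores nulls. Column names are hypothetical.
    """
    counts = Counter()
    for row in rows:
        if row.get(id_col) is not None:
            counts[row.get(group_col, "UNKNOWN")] += 1
    return counts


# Made-up rows; the last one has no complaint number and is not counted.
rows = [
    {"complaint_num": 101, "offense": "PETIT LARCENY"},
    {"complaint_num": 102, "offense": "FELONY ASSAULT"},
    {"complaint_num": 103, "offense": "PETIT LARCENY"},
    {"complaint_num": None, "offense": "PETIT LARCENY"},
]
print(count_by(rows, "offense").most_common())
# [('PETIT LARCENY', 2), ('FELONY ASSAULT', 1)]
```

`most_common()` returns categories from most to least frequent, which is the ordering one would usually want for a bar chart of offense types.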

Results

First, let’s start with a simple graph that shows the number of complaints throughout the first six months of 2022. Figure 2 shows the data day by day, and we can see that the daily number of complaints never drops below 800.
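A check such as "never drops below 800" can be sketched as follows. The function names, the `MM/DD/YYYY` date format, and the sample dates are assumptions for illustration. One subtlety is that a day with no complaints would simply be missing from the counts, so this sketch takes the minimum over every calendar day in the range and treats missing days as 0.

```python
from collections import Counter
from datetime import date, datetime, timedelta


def daily_counts(date_strings, fmt="%m/%d/%Y"):
    """Count complaints per calendar day (the date format is an assumption)."""
    return Counter(datetime.strptime(s, fmt).date() for s in date_strings)


def min_daily_count(per_day, start, end):
    """Smallest daily count from start to end inclusive; absent days count as 0."""
    n_days = (end - start).days + 1
    return min(per_day.get(start + timedelta(days=i), 0) for i in range(n_days))


# Made-up report dates, for illustration only.
dates = ["01/01/2022", "01/01/2022", "01/02/2022", "01/03/2022"]
per_day = daily_counts(dates)
print(min_daily_count(per_day, date(2022, 1, 1), date(2022, 1, 3)))  # 1
print(min_daily_count(per_day, date(2022, 1, 1), date(2022, 1, 4)))  # 0
```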

Figure 3: Number of Complaints in 2022 by Month

If we aggregate the same data by month, we can see that the number of crimes increases starting in February and March.
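Rolling daily counts up to months is a small grouping step. When comparing months, keep in mind that they have different lengths (February 2022 has 28 days) and that the last month of a year-to-date export may be partial. For that reason, this hypothetical sketch reports both the monthly total and the average per observed day. The input is a dictionary of made-up daily counts.

```python
from collections import Counter
from datetime import date


def monthly_summary(per_day):
    """Map (year, month) -> (total complaints, average per observed day).

    Assumes every day present in per_day is a day with data; days that
    are absent are not counted in the average.
    """
    totals = Counter()
    days_seen = Counter()
    for day, n in per_day.items():
        key = (day.year, day.month)
        totals[key] += n
        days_seen[key] += 1
    return {key: (totals[key], totals[key] / days_seen[key]) for key in sorted(totals)}


# Made-up daily counts, for illustration only.
per_day = {date(2022, 1, 1): 3, date(2022, 1, 2): 5, date(2022, 2, 1): 6}
print(monthly_summary(per_day))  # {(2022, 1): (8, 4.0), (2022, 2): (6, 6.0)}
```

In this toy input, February's total is lower than January's even though its per-day average is higher. A chart of monthly totals alone can hide that kind of difference.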

Next, let’s look at the distribution of race in these crime reports.

These charts reveal a problem with the dataset. While UNKNOWN might seem like data that should be filtered out, the number of rows with “UNKNOWN” values is actually higher than that of any other category. Therefore, it is not a good idea to draw conclusions about race-based trends, because the unknown rows could belong to any of the other race categories.
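The argument about UNKNOWN values can be made concrete with simple bounds. Suppose a category has k known rows, U rows are unknown, and there are N rows in total. Its true share can then be anywhere from k/N (none of the unknowns belong to it) to (k + U)/N (all of them do). The sketch below computes these bounds. The category labels and counts are made up and do not come from the dataset.

```python
def share_bounds(counts, category, unknown_label="UNKNOWN"):
    """Return (lowest, highest) possible share of `category` among all rows.

    Lowest: none of the unknown rows belong to the category.
    Highest: every unknown row belongs to it.
    """
    total = sum(counts.values())
    known = counts.get(category, 0)
    unknown = counts.get(unknown_label, 0)
    return known / total, (known + unknown) / total


# Made-up counts with neutral labels, for illustration only.
counts = {"GROUP A": 30, "GROUP B": 25, "UNKNOWN": 45}
low, high = share_bounds(counts, "GROUP A")
print(f"{low:.0%} to {high:.0%}")  # 30% to 75%
```

When 45% of the rows are unknown, a category that appears to hold 30% could actually hold as much as 75%. That range is too wide to support any trend.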

The same problem appears in the number of crimes by gender. There were many (null) and unknown values, and they had to be grouped together first to make the pie chart more organized. However, the chart cannot support any real conclusion because the “Unknown” portion is so large. If the actual values of the unknown rows were known, the percentage split between male and female could change substantially.
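Before computing the pie chart percentages, the (null) and unknown labels can be merged into a single slice, as in this sketch. The set of labels treated as missing and the sample values are assumptions, because the real export may encode a missing gender differently.

```python
from collections import Counter

# Labels treated as missing; an assumption about how the export encodes gaps.
MISSING = {None, "", "(null)", "UNKNOWN"}


def share_by_label(values):
    """Return {label: share}, with every missing-like label merged into UNKNOWN."""
    counts = Counter("UNKNOWN" if v in MISSING else v for v in values)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.most_common()}


values = ["M", "F", "(null)", "UNKNOWN", "M", None]  # made-up sample
print({label: round(share, 2) for label, share in share_by_label(values).items()})
# {'UNKNOWN': 0.5, 'M': 0.33, 'F': 0.17}
```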

Apart from the problem with the dataset, the bar charts were originally all a single color. After creating the pie chart, however, I realized that giving each race or gender its own color makes the sections easier to distinguish and the charts easier to read.

Finally, we have a map of all the crimes, plotted at their corresponding latitude and longitude.

Each data point on the map represents one row of the dataset. Red was chosen for the points because it stands out to the eye, and with the opacity turned down, we can see that some areas are less dense. However, most of the NYC map is covered in thick red, which suggests that the number of crimes is roughly the same across most of the city.
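Overlapping translucent marks add up quickly, which is one reason the map saturates into solid red. Assume standard "over" alpha compositing of identical marks; exactly how Tableau blends overlapping marks has not been checked here. Under that assumption, n marks that each have opacity a produce a combined opacity of 1 - (1 - a)^n. The sketch below evaluates this formula for a few values of n.

```python
def stacked_opacity(alpha, n):
    """Combined opacity of n overlapping marks, each with opacity alpha.

    Assumes standard "over" alpha compositing of identical marks.
    """
    return 1 - (1 - alpha) ** n


# How quickly 10%-opacity points saturate as they pile up.
for n in (1, 5, 20, 50):
    print(n, round(stacked_opacity(0.1, n), 3))
```

At 10% opacity, about 20 overlapping points already come close to 90% coverage. As a result, any dense area looks nearly solid no matter how many more crimes it actually contains.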

Future Direction

If I were to do this project again, I would focus on more of the other columns and make more complex graphs with more dimensions. For example, each crime complaint has a time range (complaint open date and close date). These columns could be useful for a visualization of how long complaints usually last, or in other words, how long it takes the NYPD to resolve the problems people report.

Furthermore, I believe the styling and design of these graphs can be improved as well. In the map distribution graph, the data points are very close to each other and the area being covered is too large. It would be better to group the data into smaller sections of New York City and plot it as something similar to a heat map.
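The heat-map idea could start with a simple grid-binning step: snap each latitude/longitude pair to a cell, count the rows in each cell, and then color the cells by count. Below is one possible sketch. The 0.01-degree cell size (about 1.1 km north-south, somewhat less east-west at New York's latitude) and the sample coordinates are arbitrary illustrative choices, not part of the original analysis.

```python
import math
from collections import Counter


def grid_counts(points, cell_deg=0.01):
    """Count (lat, lon) points per grid cell that is cell_deg degrees on a side.

    Points with a missing coordinate are skipped.
    """
    counts = Counter()
    for lat, lon in points:
        if lat is None or lon is None:
            continue
        # Integer cell indices; floor also handles negative longitudes correctly.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        counts[cell] += 1
    return counts


def cell_center(cell, cell_deg=0.01):
    """Approximate (lat, lon) center of a grid cell."""
    row, col = cell
    return ((row + 0.5) * cell_deg, (col + 0.5) * cell_deg)


# Made-up sample points; the last one has missing coordinates.
points = [(40.7512, -73.9851), (40.7518, -73.9857), (40.6782, -73.9442), (None, None)]
for cell, n in grid_counts(points).most_common():
    print(cell_center(cell), n)
```

Because degree-based cells are not square on the ground, a projected coordinate system or hexagonal bins would be reasonable refinements.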