New York Crime From 2008 – 2015

Observation of the crime data

December 11, 2017

Xiaxin Chen


LIS-658-01 Final Project


The goal of the final project is to build a visualization from data of our own choosing, using what we learned in this class. I chose two datasets: one records criminal cases in New York City, and the other records inmates held in custody in the jails around New York City from 2008 to 2015. I found that crime type could link the two datasets. Because they contain too many categories, I grouped some of them and kept only the four most influential groups: Assault, Sex Assault, Weapons, and Violence. Assault covers all assault crimes except those related to sexual assault, such as Assault 1, 2, and 3; Sex Assault covers rape, sexual assault, and other sex crimes; Weapons covers crimes related to firearms; Violence covers kidnapping, robbery, and so on. I chose these four groups because I believe they are the most sensitive topics and the crimes that most directly affect victims. I hope the research can reveal issues in this area or help with related data collection in the future. The intended users are government officials and police officers.

My data comes from the website. In addition, I chose Tableau to build the visualizations, exported them as static images, and assembled them into a poster. I expected that a poster would be easier to circulate among my intended users, and I also wanted to practice designing and presenting data. The final visualization is shown in Fig.0.



Example 1 & Example 2

First, I should introduce the examples that inspired me. Example 1 is also a long poster, but for easier description I show only part of it here. As Fig.1 shows, I like the way it separates its sections and uses a different typeface to make key numbers stand out. I imitated this by adding subtitle bars to separate the sections of my poster. I made each bar a little more than half the width rather than full width because all my content is left-justified, so most users will read the poster top to bottom and left to right. If I used a full-width subtitle bar with a centered title, the user would sometimes have to start reading from the middle. This is also why the end line of each section sits on the right side: the user should finish reading on the right side of the poster. The end line has a different size from the start bar because I think the start and the end should look different to help viewers recognize them. As for the example itself, I know its colors are meaningful, but I disagree with using different colors on the axis of a diagram. Moreover, lines in many different colors will only confuse viewers. Still, drawing the lines yourself to redesign how the data is annotated is a good idea.


Next, Fig.2 is a great example of using symbols and color to describe the data and tell the story. Its design also shows the hierarchy among the data well. It is impressive that the example uses very little text, yet the author tells the story well through images and symbols.


For the text descriptions, however, I finally chose to follow the idea of Example 2. As Fig.3 shows, this is also a poster about crime data, but it uses more text to explain the author’s ideas, and it uses different colors to signal that parts of the text have different levels of importance.


Fig.4 and Fig.5 show the complete posters of Example 1 and Example 2.

(Fig.4) (Fig.5)

Design Choices

After collecting the data and preparing to create the visualization, what I considered most was the relationship between the two datasets. One focuses on the location of criminal cases, including where each case happened, what kind of scene it was, and so on. The other is about the inmates, including their age, gender, and security level. Both datasets also record crime categories. The inmate data records the most serious crime type for each inmate, and the location data assigns each case to one crime type. I therefore decided to organize the visualization around the type of crime. The second problem was that there are too many categories and the categories in the two datasets are not exactly the same. I solved this by narrowing my focus and grouping the categories. As mentioned in the introduction, I chose the four groups that I think are the most sensitive topics and the most influential crimes as the focus of my research. Then, following the examples mentioned above, I decided to cover how the number of crimes in those four groups changed over time, how they are distributed across New York City, and how the number of inmates related to those four groups changed over time.
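The grouping step described above could be sketched in Python as follows. This is only a minimal illustration, not the way the grouping was actually done for the poster: the keyword rules and the sample labels are hypothetical, and the real category names in the two datasets may be spelled differently.

```python
from collections import Counter

# Hypothetical keyword rules (assumption): the real labels may differ.
# Sex Assault is checked first so that "SEXUAL ASSAULT" is not counted
# as plain Assault.
GROUP_RULES = [
    ("Sex Assault", ("RAPE", "SEX")),
    ("Assault", ("ASSAULT",)),
    ("Weapons", ("WEAPON", "FIREARM")),
    ("Violence", ("KIDNAP", "ROBBERY")),
]


def crime_group(label):
    """Return one of the four groups, or None if the label is filtered out."""
    text = label.upper()
    for group, keywords in GROUP_RULES:
        if any(keyword in text for keyword in keywords):
            return group
    return None


def count_by_group(labels):
    """Count records per group, dropping labels outside the four groups."""
    groups = (crime_group(label) for label in labels)
    return Counter(group for group in groups if group is not None)


# Made-up sample labels, for illustration only.
sample = [
    "ASSAULT 3",
    "RAPE 1",
    "SEXUAL ASSAULT",
    "CRIMINAL POSSESSION OF A WEAPON",
    "ROBBERY 2",
    "PETIT LARCENY",
]
counts = count_by_group(sample)
```

Because the same function can be applied to the category column of both datasets, it gives the two files a shared set of four labels even though their original categories are not identical.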

For the visualization itself, my main question was whether I needed to edit the diagrams created by Tableau. The answer is yes, because the diagrams Tableau creates are interactive ones, and their axes can be hard to fit into a poster design. So I decided to keep only the shapes and colors of each diagram and to redraw the data labels and axes in a way that is more consistent with the infographic.


Discussion & User Test

I shared and discussed my work with my roommates and friends. Some of them major in public relations, so I thought they could help me understand the intended users: government officials or police officers. The discussion ended up focusing on some details of my work. As Fig.6 shows, the first thing my friends pointed out was that part of the title is red, which made them feel that those letters were separated from the rest, so that they sometimes could not read the title as one sentence.


Another issue they pointed out is shown in Fig.7. They sometimes found it hard to tell where one section ends and the next begins. I used a big white subtitle bar to start a section and a thin white line to end it, but not everyone understood this, and some expected to find content between the thin line and the next title. In fact there is nothing there, and this confused them.


So, in the next version, I moved the thin white line closer to the next white title bar. Their actual suggestion was to put the thin line at the same height as the next title bar. After several attempts, however, I settled on the design shown in Fig.8, because when the thin line overlaps the title bar it does separate the sections, but the break is too harsh.

As for the data visualization, they could understand the meaning of the numbers and the diagrams, so I think this part works.


Datasets and Design Process

First, I look at how the number of crimes changes. According to the source’s description, the data contains records of known crime cases handled by the NYPD from 2008 to 2015. It is obvious that criminal cases rise sharply at the end of 2013. Remarkably, if we look inside the data, the records for 2013 alone show the same pattern as the records from 2013 to 2015. The contradiction is that the inmate data not only never shows any dramatic change, but also shows that the total number of people held for crimes in New York decreased year by year from 2008 to 2015. This does not make sense: it is like saying that there are fewer and fewer criminals, yet they commit more and more crimes. The data shows almost 6 times as many crimes in 2014 as in 2013, while the number of inmates in 2013 is similar to the number in 2014. Moreover, as I understand both datasets, only criminals who have been arrested are recorded. This means that the inmates active in 2014 would have had to work 5 times harder to produce this result. This makes me question whether the crime data before 2014 is correct.
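The year-to-year comparison behind this observation can be sketched as below. It is a rough illustration only: the function name is my own, and the counts are placeholder numbers chosen to mimic the pattern described, not values taken from either dataset.

```python
def year_over_year_ratio(counts):
    """Map each year to count[year] / count[year - 1].

    Years whose previous year is missing or zero are skipped.
    """
    ratios = {}
    for year in sorted(counts):
        previous = counts.get(year - 1)
        if previous:
            ratios[year] = counts[year] / previous
    return ratios


# Placeholder numbers (assumption), NOT the real counts from the datasets.
crime_counts = {2013: 100, 2014: 600}
inmate_counts = {2013: 50, 2014: 50}

crime_ratio = year_over_year_ratio(crime_counts)
inmate_ratio = year_over_year_ratio(inmate_counts)
```

With these placeholders the crime ratio for 2014 is 6.0 while the inmate ratio is 1.0. A gap of that size between the two series is the kind of mismatch that suggests checking how the earlier years were recorded before drawing conclusions.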

That is the first interesting thing I found in these two datasets. The second comes from the location file, which describes how the incidents are distributed. The crime data shows that Sex Assault accounts for a share of the records. But a close look at the map data shows that only two of those records include the exact location of the incident. What’s worse, both describe the same place. It is hard not to want to urge victims of sexual assault to come forward and report it.
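A check like this one can be sketched in a few lines. The field names `group`, `latitude`, and `longitude` are assumptions for illustration, and the records are made up; the real location file may use different columns.

```python
from collections import defaultdict


def location_coverage(records):
    """Per group: total records, records with coordinates, distinct places."""
    totals = defaultdict(int)
    located = defaultdict(int)
    places = defaultdict(set)
    for record in records:
        group = record["group"]
        totals[group] += 1
        lat, lon = record.get("latitude"), record.get("longitude")
        if lat is not None and lon is not None:
            located[group] += 1
            places[group].add((lat, lon))
    return {
        group: {
            "total": totals[group],
            "located": located[group],
            "distinct_places": len(places[group]),
        }
        for group in totals
    }


# Made-up records with hypothetical field names, for illustration only.
records = [
    {"group": "Sex Assault", "latitude": 40.0, "longitude": -73.0},
    {"group": "Sex Assault", "latitude": 40.0, "longitude": -73.0},
    {"group": "Sex Assault", "latitude": None, "longitude": None},
    {"group": "Assault", "latitude": 41.0, "longitude": -74.0},
]
coverage = location_coverage(records)
```

Comparing `located` and `distinct_places` against `total` for each group makes it easy to see when a group that is well represented in the counts barely appears on the map.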

This is what I read from the data, and I hope my visualization also helps viewers notice these contradictions and issues. In addition, I removed many crime types that do not belong to the four groups defined in this project. This may lead to misunderstandings on some issues, and I hope that in the future I will be able to add more crime categories to the analysis.



Fig.9 shows the full poster of the infographic I made for the final project. It has a top-to-bottom hierarchy with 4 sections. The content of the sections is as follows: first, the change in the number of crimes; next, the incident location map; then some supporting material, which shows, for example, that it is really hard to stop a crime, since most crimes are completed before the criminal is arrested.


Future Work and Limitations

I have to admit that I did not go deep enough into the data. There should be more relationships to find, and even with crime type alone I think I could create more kinds of visualization than the flow map, bar chart, and symbol map. Moreover, the information in the bar chart is very important, but after finishing the visualization I felt that I had not put the bar chart in a good place.

If I can continue this project in the future, I will make the connections between sections stronger and try to tell the story with more visualization and less text. I should also do some research in the field of crime to have better guidance when I read crime data. Looking up history and old news may also help support the ideas and assumptions I have mentioned in this project, and that research may help me find unexpected connections between the parameters in my data.

Overall, even with many regrets and shortcomings, this was still an interesting project, and I hope the data visualization can help reveal more stories behind the data.