Exploring NYC Shooting Incident Networks Using Gephi



After my initial exploration of NYC Open Data’s record of shootings from 2006-2020, I wanted to take a closer look at the victim and perpetrator demographics to better understand who was involved in these incidents. Using the same data set, I focused on the ages, races and genders recorded for both the victims and the perpetrators. My purpose was to determine what networks, if any, existed between the different groups within those three categories. For my investigation I used Gephi, a free, open-source application that lets you import data, identify networks within it and generate visualizations based on those networks.

Gephi is a free, open-source software for investigating data networks


Initially I uploaded the entire dataset as is. As shown in the image below, the result was a network visualization that was essentially a massive blob, since it simply charted every single shooting incident. I realized I needed to filter the dataset and create groups that could be more easily compared.

The results of my initial attempt to create a network visualization using Gephi

With the assistance of my professor, I uploaded my data into a Google Sheet and combined the Age, Gender and Race columns into a single column for the perpetrators and another for the victims. Using the count function, I created a new column listing the number of times each combination of the variables occurred. I used another function to divide each group’s instances of perpetration by its number of victimizations, and created a new column to display those ratios.
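For readers who prefer a script to a spreadsheet, the same steps can be sketched in Python. This is only an illustration of the idea, not the Google Sheets workflow I actually used. The column names (`PERP_AGE_GROUP`, `VIC_RACE` and so on), the helper names and the sample rows are all assumptions made for the example.

```python
import csv
import io
from collections import Counter

FIELDS = ("AGE_GROUP", "SEX", "RACE")  # assumed column suffixes


def make_row(perp, vic):
    """Build one illustrative incident row from two (age, sex, race) tuples."""
    row = {f"PERP_{key}": value for key, value in zip(FIELDS, perp)}
    row.update({f"VIC_{key}": value for key, value in zip(FIELDS, vic)})
    return row


# Made-up sample incidents, for demonstration only.
SAMPLE_ROWS = [
    make_row(("18-24", "M", "BLACK"), ("25-44", "M", "BLACK")),
    make_row(("18-24", "M", "BLACK"), ("25-44", "M", "BLACK")),
    make_row(("25-44", "M", "BLACK"), ("18-24", "M", "BLACK")),
    make_row(("18-24", "M", "BLACK"), ("18-24", "M", "BLACK")),
]


def group_label(row, prefix):
    """Join age, sex and race into one label such as '18-24 | M | BLACK'."""
    return " | ".join(row[f"{prefix}_{key}"] for key in FIELDS)


def build_edges(rows):
    """Count how often each (perpetrator group, victim group) pair occurs."""
    edges = Counter()
    for row in rows:
        edges[(group_label(row, "PERP"), group_label(row, "VIC"))] += 1
    return edges


def perp_victim_ratio(edges):
    """Divide each group's perpetrations by its victimizations."""
    perps, victims = Counter(), Counter()
    for (source, target), weight in edges.items():
        perps[source] += weight
        victims[target] += weight
    # Groups that were never victims are skipped to avoid dividing by zero.
    return {group: perps[group] / victims[group] for group in perps if victims[group]}


def to_edge_csv(edges):
    """Render the counts as a Source,Target,Weight edge list."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])
    return buffer.getvalue()
```

Calling `to_edge_csv(build_edges(SAMPLE_ROWS))` produces text with `Source`, `Target` and `Weight` columns, which is the edge-list layout Gephi's spreadsheet importer accepts.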

After loading this updated data set into Gephi, I could immediately see that the network was more legible than in my first attempt. However, the resulting visualization, as shown below, still seemed crowded, with too many groups to form a meaningful network. To limit the number of groups, I merged White Hispanic with Black Hispanic to create one group for all Hispanics. I also removed any instances where one of the three categories (age, gender, race) was unknown. These filters reduced the number of nodes and also ensured that each group’s data was accurately represented by the subsequent network visualization.

My second attempt to create a Gephi network visualization was better, but was still overcrowded with too many groups
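The two filters described above, merging the Hispanic categories and dropping records with unknown values, could be expressed as a small cleaning function. This is a rough sketch rather than what I ran: the exact label spellings (`WHITE HISPANIC`, `BLACK HISPANIC`, `UNKNOWN`, `U`) are assumptions about how the data set encodes those values, and `clean_group` is a hypothetical helper name.

```python
# Assumed spellings of the categories being merged or dropped.
HISPANIC_LABELS = {"WHITE HISPANIC", "BLACK HISPANIC"}
UNKNOWN_LABELS = {"", "U", "UNKNOWN"}


def clean_group(age, sex, race):
    """Return a cleaned (age, sex, race) tuple, or None if any field is unknown."""
    fields = [value.strip().upper() for value in (age, sex, race)]
    if any(value in UNKNOWN_LABELS for value in fields):
        return None  # drop incidents with a missing demographic
    age, sex, race = fields
    if race in HISPANIC_LABELS:
        race = "HISPANIC"  # collapse both Hispanic categories into one group
    return (age, sex, race)


def clean_pairs(pairs):
    """Keep only (perpetrator, victim) pairs where both sides are fully known."""
    cleaned = []
    for perp, victim in pairs:
        perp_group, victim_group = clean_group(*perp), clean_group(*victim)
        if perp_group is not None and victim_group is not None:
            cleaned.append((perp_group, victim_group))
    return cleaned
```

Dropping a pair when either side is unknown matters here because every edge needs both a source and a target group.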


Once I had made the final changes to the dataset and imported it into Gephi, I was able to examine the data more closely and generate a network visualization that actually provided some insight. One of the great features of Gephi is that it makes it easy to run statistics on your data. After creating the network, I ran some statistics and found that the average degree was 9.659, the network diameter was 4 and the graph density was 0.225.

Statistics about my data as generated by Gephi
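To give a sense of what these three numbers measure, here is a minimal sketch that computes them for a small directed graph. It uses common textbook definitions: average degree as edges per node, density as edges divided by n(n-1), and diameter as the longest shortest path between reachable nodes. Treating Gephi's calculations as identical is an assumption, particularly in how self-loops are counted. The function name and the toy graph are invented for illustration.

```python
from collections import deque


def graph_stats(edges):
    """Compute basic statistics for a non-empty directed graph.

    edges: iterable of (source, target) pairs.
    """
    edges = set(edges)
    nodes = {node for edge in edges for node in edge}
    n, m = len(nodes), len(edges)

    successors = {node: set() for node in nodes}
    for source, target in edges:
        successors[source].add(target)

    def longest_shortest_path(start):
        """Breadth-first search: distance to the farthest reachable node."""
        distance = {start: 0}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbour in successors[current]:
                if neighbour not in distance:
                    distance[neighbour] = distance[current] + 1
                    queue.append(neighbour)
        return max(distance.values())

    return {
        # Each directed edge contributes one out-degree, so this is m / n.
        "average_degree": m / n,
        # Share of the n * (n - 1) possible directed edges that exist.
        "density": m / (n * (n - 1)) if n > 1 else 0.0,
        # Unreachable pairs are ignored rather than counted as infinite.
        "diameter": max(longest_shortest_path(node) for node in nodes),
    }


# Toy example: a three-node cycle A -> B -> C -> A.
toy_stats = graph_stats([("A", "B"), ("B", "C"), ("C", "A")])
```

In the toy cycle every node has one outgoing edge, half of the six possible directed edges exist, and the farthest any node sits from another is two steps.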

After running these stats, I turned to formatting the visualization so that the information would be easy for viewers to read. I chose a red gradient against a black background to emphasize the groups that had the highest numbers. Including labels on each node made identifying the various groups easier. I was also able to move each node in order to separate the different race categories and, within each one, place the age groups together by gender. My goal was to organize the findings as much as possible for analysis by future users. After consulting with my peer reviewer, Staci, I also included arrows on the edges between the nodes to illustrate the relationships between the groups involved in the shootings.

My final network visualization created in Gephi


From my final network visualization it became clear that the majority of shooting incidents involved four main groups, each shown as a large, dark red node. These groups were Black and Hispanic males in the 18-24 and 25-44 age ranges. The larger arrows and darker edges show that most shooting incidents took place between these four groups. The curved edges that start and end at the same node indicate incidents where both the perpetrator and the victim came from that group. There was a high occurrence of incidents where the victim and perpetrator were of the same race, even when they were not in the same age group. Considering how NYC neighborhoods tend to be divided by race, these findings made sense from a geographic standpoint. Overall, within each race category, males were more often involved than females, and most incidents involved the 18-24 and 25-44 age groups. Further research might map the incidents and look at the socio-economic backgrounds of the perpetrators and victims to get a fuller understanding of who is involved in these shootings.


Overall I found Gephi relatively easy to use, but there were a few drawbacks. The first is that it is a little temperamental and tends to quit or stop working if there is not enough memory available. After the tragic loss of a lot of my progress because of this issue, I learned the value of saving my work frequently. It is also unfortunate that there is no “undo” button in Gephi, since it becomes tricky to revert to your original network after making unwanted changes. My last issue is that it is difficult to navigate within the network when working on it in the Overview screen. Zooming in and out without being able to grab the screen to move around was quite frustrating when trying to sort through and arrange the various nodes. While the software works well for creating networks and generating visualizations, improvements in these areas would make it more user-friendly and make the data easier to explore.