Social networks have become increasingly popular over the past decade. Mobile applications and websites such as Instagram, Twitter, and Facebook are becoming a primary means of communication for individuals and businesses alike. But what can these social networks tell us about our communication patterns and social structures? By analyzing network data, we can see how different users (or “nodes”) are connected. A connection between two nodes might be a “friendship,” a mention, a comment, or a private message. We call these connections edges, and they can provide valuable insight into how information passes through society and between different social groups.
In this lab, I looked at data gathered from Twitter. The nodes represent distinct Twitter users, and the edges represent interactions between users (i.e., mentions, retweets, or follows). By analyzing the data with Gephi's tools, I was able to identify a few distinct social groups. The visualization revealed that certain groups seem to be more connected to each other than to the greater Twitter universe. Using this visualization, we can start to make predictions about how information is shared and/or consumed within these groups, and how an “information echo chamber” can come to exist.
Before starting this project, I watched a YouTube tutorial on how to visualize communities. The video helped me understand how to use some of the tools available in Gephi and gave me some ideas for presenting my data (a picture of the example visualization from the video is below). Perhaps the main takeaway from the video was to make good color choices for the different communities, or social circles, represented in the visualization. Gephi generates a random color scheme when you partition the data by modularity class, but a few adjustments to the palette can make the data more visually appealing.
I created my visualization using Gephi, a network analysis and visualization program that can be downloaded for free. The data I used was gathered from Twitter and made publicly available by Stanford University.
I began by downloading my dataset and saving it in CSV format. I briefly looked over the data in a plain text editor to make sure it was normalized and formatted properly. After confirming that the data was in good shape, I imported it into Gephi to start the visualization.
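The same sanity check can also be scripted. The sketch below assumes the file is an edge list with a header row naming Source and Target columns. These are the column names Gephi's spreadsheet importer looks for in an edges table. The function name check_edge_list is hypothetical. Its three checks (missing columns, empty node IDs, duplicate edges) are only examples of what "formatted properly" might mean for this data.

```python
import csv
import io

def check_edge_list(text):
    """Return a list of problems found in a Source,Target edge-list CSV.

    Assumption: the first row is a header containing "Source" and "Target".
    """
    reader = csv.DictReader(io.StringIO(text))
    fields = reader.fieldnames or []
    problems = [f"missing column: {c}" for c in ("Source", "Target") if c not in fields]
    if problems:
        return problems
    seen = set()
    for row in reader:
        line_no = reader.line_num  # line number of the row just read (header is line 1)
        # Short rows yield None for missing fields, so fall back to "".
        source = (row["Source"] or "").strip()
        target = (row["Target"] or "").strip()
        if not source or not target:
            problems.append(f"line {line_no}: empty node id")
            continue
        if (source, target) in seen:
            problems.append(f"line {line_no}: duplicate edge {source}->{target}")
        seen.add((source, target))
    return problems
```

An empty list means none of these problems were found. A real file could be checked by passing it the file's text, for example the contents of a hypothetical edges.csv.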
When the data is first imported, the visualization is a hairball of tangled nodes and edges. My first step toward visual clarity was to resize the nodes based on degree (i.e., how many connections each node has). Using the tools on the left-hand side of the screen, I made nodes with higher degrees appear larger and nodes with lower degrees appear smaller. Pictured below are the tool and settings that I used to achieve this.
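Resizing by degree amounts to counting each node's connections and mapping that count onto a size range. The Python sketch below rests on two assumptions. First, degree is counted undirected (every edge touching a node counts). Second, degree maps linearly onto the range between a hypothetical minimum and maximum size. Gephi's own ranking settings may differ, and the function name degree_sizes is illustrative.

```python
from collections import Counter

def degree_sizes(edges, min_size=10.0, max_size=50.0):
    """Map each node's degree linearly onto [min_size, max_size].

    Degree counts every edge touching a node, ignoring direction.
    min_size and max_size are illustrative defaults, not Gephi settings.
    """
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    if not degree:
        return {}
    lo, hi = min(degree.values()), max(degree.values())
    span = hi - lo
    sizes = {}
    for node, d in degree.items():
        # If every node has the same degree, all of them get the minimum size.
        fraction = (d - lo) / span if span else 0.0
        sizes[node] = min_size + fraction * (max_size - min_size)
    return sizes
```

A linear mapping keeps size differences proportional to degree differences. With highly skewed social data, a handful of hubs will dominate while most nodes stay near the minimum size.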
My next step was to check whether certain nodes were more closely connected to each other than to the rest of the network. In other words, I wanted to see whether there were distinct social circles within my network that communicated with each other more than with other nodes. To do this, I opened the Statistics tab on the right-hand side of the screen and ran Modularity (pictured below), which found a handful of Twitter communities within my network.
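Gephi's Modularity statistic searches for a partition with a high modularity score Q; its documentation attributes the algorithm to the Louvain method of Blondel et al. The sketch below does not perform that search. It only scores a partition you already have, treating the graph as undirected and unweighted. This makes concrete the quantity being maximized: Q = Σ_c [L_c/m − (d_c/2m)²]. Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of c's nodes. The function name is illustrative.

```python
from collections import defaultdict

def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping node -> community id.
    Q = sum over communities c of  L_c / m - (d_c / (2m))**2
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both endpoints in c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

Consider two triangles joined by a single edge. Splitting the graph into the two triangles gives Q = 5/14 ≈ 0.36, while putting every node in one community gives Q = 0. A positive score means more edges fall inside communities than a random rewiring with the same degrees would produce.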
After identifying my Twitter communities, I wanted to separate them visually by color. To do this, I used the partitioning feature (pictured below), partitioning the nodes by modularity class. When I ran this, Gephi randomly assigned a color to each module. I then adjusted a few of the colors so that the communities would stand out more distinctly in the visualization.
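Coloring by modularity class is essentially a lookup from community to color. The sketch below replaces Gephi's random palette with a hypothetical hand-picked one. It also gives the largest communities the first colors in the list, so the biggest groups get the most distinct colors. The palette values and the function name color_by_community are illustrative assumptions, not Gephi's behavior.

```python
from collections import Counter

# A hypothetical hand-picked palette of distinct colors (hex RGB).
PALETTE = ["#e41a1c", "#377eb8", "#4daf4a", "#984ea3", "#ff7f00", "#a65628"]

def color_by_community(community, palette=PALETTE):
    """Give every node the color of its community.

    Larger communities get earlier palette entries; colors repeat
    if there are more communities than palette entries.
    """
    sizes = Counter(community.values())
    ranked = [c for c, _ in sizes.most_common()]  # largest community first
    community_color = {c: palette[i % len(palette)] for i, c in enumerate(ranked)}
    return {node: community_color[c] for node, c in community.items()}
```

Fixing the palette in code makes the colors reproducible across runs. With Gephi's random palette, the colors have to be adjusted by hand each time.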
Next, I removed nodes with low connectivity to “clean up” the visualization and show only the more highly connected nodes. I did this with the filter on the right-hand side of the screen, selecting the In-Degree Range option in the Topology category. As pictured below, I found that displaying only nodes with an in-degree of 40 or more provided the most useful information with the least visual clutter.
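The In-Degree Range filter keeps nodes whose number of incoming edges falls within a range. The sketch below shows the same idea and assumes a directed edge list of (source, target) pairs. It computes in-degree on the full, unfiltered graph and keeps nodes at or above the threshold (40, as in the lab). It then drops any edge that loses an endpoint. The function name is hypothetical, and this is not Gephi's implementation.

```python
from collections import Counter

def filter_by_in_degree(edges, min_in_degree=40):
    """Keep nodes with in-degree >= min_in_degree and the edges between them.

    edges: directed (source, target) pairs. In-degree is computed on the
    full, unfiltered graph.
    """
    in_degree = Counter(target for _, target in edges)
    nodes = {n for edge in edges for n in edge}
    # Counter returns 0 for nodes that never appear as a target.
    kept = {n for n in nodes if in_degree[n] >= min_in_degree}
    kept_edges = [(s, t) for s, t in edges if s in kept and t in kept]
    return kept, kept_edges
```

Raising the threshold trades completeness for legibility. Nodes just below the cutoff disappear entirely, along with all of their edges.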
Finally, I ran the ForceAtlas2 layout on my visualization using mostly default settings. I then made a handful of adjustments to the visual aspects of the image, such as labeling the nodes and adjusting the opacity of both nodes and edges, before saving it as a PDF. Pictured below is the final product.
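ForceAtlas2 is a force-directed layout: connected nodes attract each other, all nodes repel, and gravity keeps disconnected pieces from drifting away. The Python sketch below is a deliberately simplified version, not Gephi's implementation. It assumes the force shapes described in the ForceAtlas2 paper by Jacomy et al.:

- attraction proportional to distance
- repulsion k_r(deg+1)(deg+1)/distance
- gravity k_g(deg+1)

It departs from ForceAtlas2 in two ways. It uses a fixed step with a capped displacement instead of adaptive speed, and an O(n²) repulsion loop instead of the Barnes-Hut approximation. The function name, parameter names, and defaults are hypothetical.

```python
import math
import random

def force_atlas_sketch(edges, iterations=200, k_r=2.0, k_g=1.0,
                       step=0.01, max_move=1.0, seed=0):
    """Simplified ForceAtlas2-style layout for an undirected edge list.

    Returns a dict mapping each node to an (x, y) tuple.
    """
    rng = random.Random(seed)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    nodes = list(degree)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion: k_r * (deg_a + 1) * (deg_b + 1) / distance, pushing a and b apart.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = k_r * (degree[a] + 1) * (degree[b] + 1) / dist
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        # Attraction: magnitude equals the distance, pulling edge endpoints together.
        for u, v in edges:
            dx, dy = pos[v][0] - pos[u][0], pos[v][1] - pos[u][1]
            force[u][0] += dx
            force[u][1] += dy
            force[v][0] -= dx
            force[v][1] -= dy
        # Gravity: magnitude k_g * (deg + 1), directed toward the origin.
        for n in nodes:
            x, y = pos[n]
            r = math.hypot(x, y) or 1e-9
            g = k_g * (degree[n] + 1)
            force[n][0] -= g * x / r
            force[n][1] -= g * y / r
        # Move each node along its net force, capping the displacement at max_move.
        for n in nodes:
            fx, fy = force[n]
            magnitude = math.hypot(fx, fy)
            scale = step if step * magnitude <= max_move else max_move / magnitude
            pos[n][0] += fx * scale
            pos[n][1] += fy * scale
    return {n: tuple(p) for n, p in pos.items()}
```

The random starting positions come from a seeded generator, so repeated calls on the same edge list return the same layout. The O(n²) repulsion loop makes this practical only for small graphs, which is why ForceAtlas2 itself approximates repulsion.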
The biggest challenge in this lab was learning the software. Gephi is an extremely powerful program with lots of features that I would like to explore further. Many of the features in Gephi lend themselves to different types of network datasets. Given that social media data represents only one type of network, I would really like to try more visualizations in the future with different data to see the full extent of what Gephi is capable of.