When a Facebook page for a business, organization, or public figure ‘likes’ another page, it establishes a connection that all Facebook users can see. A mutual like between two organizational pages is a sign of support between the organizations, and it also benefits both by extending each page’s reach to the ‘Fans’ of the connected organization. With this understanding, I set out to explore the social network of Facebook Page-to-Page connections for my first network analysis and visualization project.

Process & Materials

To begin, I sourced a social network dataset from SNAP, the Stanford Network Analysis Project. The Facebook Large Page-Page Network dataset is an undirected network of 22,470 nodes and 171,002 edges representing page-to-page mutual likes as of November 2017. It is restricted to four categories of pages as defined by Facebook: politicians, governmental organizations, television shows, and companies. The .CSV files included in the download came ready for analysis: one spreadsheet of edges and one spreadsheet of nodes that included each page’s name and type.
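For readers who want to inspect the same kind of files outside Gephi, here is a minimal sketch of reading an edge list and a node table with Python's standard library. The column names (id_1, id_2, id, page_name, page_type) and the four sample rows are assumptions made for illustration; check them against the headers of the actual download before relying on them.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for the two CSV files. The column names are assumed
# for illustration and the rows are invented sample data.
EDGES_CSV = """id_1,id_2
0,1
0,2
1,2
2,3
"""
NODES_CSV = """id,page_name,page_type
0,Page A,government
1,Page B,politician
2,Page C,government
3,Page D,tvshow
"""


def load_graph(edges_file, nodes_file):
    """Return (adjacency, attributes) for an undirected page-page network."""
    adjacency = defaultdict(set)
    for row in csv.DictReader(edges_file):
        a, b = int(row["id_1"]), int(row["id_2"])
        if a != b:  # this sketch ignores self-loops
            adjacency[a].add(b)
            adjacency[b].add(a)
    attributes = {
        int(row["id"]): {"name": row["page_name"], "type": row["page_type"]}
        for row in csv.DictReader(nodes_file)
    }
    return adjacency, attributes


adjacency, attributes = load_graph(io.StringIO(EDGES_CSV), io.StringIO(NODES_CSV))
edge_count = sum(len(nbrs) for nbrs in adjacency.values()) // 2
print(len(attributes), "nodes,", edge_count, "edges")
```

To use real files, pass open file handles in place of the io.StringIO objects.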

With limited experience using Gephi or conducting exploratory analysis with a large social media dataset, I relied heavily on Martin Grandjean’s Introduction to Network Analysis and Visualization Guide. I also explored examples linked on his site, such as these visualizations from Twitter or the League of Nations, which offered both inspiration and rationale for shaping my graph.

Drawing upon these examples and our course readings, I began my exploratory analysis of the Facebook Page-to-Page Network dataset asking the following questions of the data:

  • What pages, or nodes, have the most connections?
  • What clusters, or communities, exist in the dataset and how, if at all, are these communities related to the category of Facebook page?
  • What pages are the most central?

Guided by these questions, I worked to untangle the dense ‘hairball.’ First, I ranked the size of the nodes by degree, which ranged from 1 to 709, then applied the Fruchterman-Reingold layout algorithm to spread out the dense mesh and start identifying communities within the data. When I could not make out any structure in the hairball, I decided to filter out all nodes with fewer than 70 connections. I then applied the force-directed layouts ‘Force Atlas’ and ‘Force Atlas 2,’ which made the network’s clusters and structural holes visible.
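The degree ranking and the degree filter described above can be sketched in a few lines of Python. This is an illustration of the idea, not Gephi's implementation: the toy graph and the function name filter_by_degree are hypothetical, and the sketch assumes the threshold is applied once against degrees in the full graph rather than repeatedly after each removal.

```python
def filter_by_degree(adjacency, min_degree):
    """Keep nodes whose degree in the full graph is at least min_degree."""
    keep = {n for n, nbrs in adjacency.items() if len(nbrs) >= min_degree}
    # Drop edges that pointed at removed nodes.
    return {n: adjacency[n] & keep for n in keep}


# Hypothetical toy graph: a hub (0) linked to nodes 1-4, plus an edge 1-2.
toy = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}

# Rank nodes from highest to lowest degree, as when sizing nodes by degree.
ranked = sorted(toy, key=lambda n: len(toy[n]), reverse=True)

# The lab used a threshold of 70; the toy graph needs a smaller one.
filtered = filter_by_degree(toy, min_degree=2)
print(ranked[0], sorted(filtered))
```

Because the filter removes edges along with nodes, degrees in the filtered graph are lower than in the original, which is worth keeping in mind when reading node sizes afterwards.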

Applying the concept of the strength of weak ties, or the power of key nodes to connect communities otherwise separated by structural holes, I ran the Network Diameter and Modularity statistics to highlight the key players and the communities they connect. By ranking the node sizes by ‘Betweenness Centrality’ and partitioning node color by ‘Modularity Class,’ I was able to see that the nodes with the highest centrality sit at the periphery of their communities and serve as bridges between neighboring clusters (Figure 1).
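To make the two statistics more concrete, the sketch below computes unnormalized betweenness centrality with Brandes' algorithm and the modularity score Q of a partition that is supplied by hand. It does not reproduce Gephi's community detection; it only scores a given grouping. The toy graph (two triangles joined through a single bridge node) and all names are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        dist = {s: 0}
        sigma = {v: 0 for v in adj}
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in reverse BFS order.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in score.items()}


def modularity(adj, communities):
    """Modularity Q of a given partition (a list of disjoint node sets)."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # twice the edge count
    q = 0.0
    for members in communities:
        internal = sum(len(adj[v] & members) for v in members)  # twice internal edges
        degree_sum = sum(len(adj[v]) for v in members)
        q += internal / two_m - (degree_sum / two_m) ** 2
    return q


# Hypothetical toy graph: triangles {0,1,2} and {4,5,6} joined through node 3.
toy = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4},
    4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5},
}
bc = betweenness(toy)
bridge = max(bc, key=bc.get)
q = modularity(toy, [{0, 1, 2, 3}, {4, 5, 6}])
print(bridge, bc[bridge], round(q, 4))
```

In the toy graph the bridge node has a low degree but the highest betweenness, which mirrors the pattern in Figure 1: bridging position, not connection count, drives this measure.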

Next, to explore how page category interacts with the communities identified by the modularity report, I partitioned the node colors by ‘attribute,’ illustrating the pattern of clustering by page type (Figure 2). I then added labels to the nodes and set out to study the points of polarization and centrality in more detail in order to determine which color partition to apply in the final version.
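One way to check the visual impression of clustering by page type is to tabulate page types within each modularity class. The sketch below assumes a node table that already carries both values, as Gephi's Data Laboratory would after the Modularity statistic has run; the six rows and the function name are invented for illustration.

```python
from collections import Counter

# Hypothetical node table: node id -> (modularity class, page type).
nodes = {
    0: (0, "government"), 1: (0, "government"), 2: (0, "politician"),
    3: (1, "tvshow"), 4: (1, "tvshow"), 5: (1, "company"),
}


def type_mix_by_community(nodes):
    """Count page types inside each community."""
    mix = {}
    for community, page_type in nodes.values():
        mix.setdefault(community, Counter())[page_type] += 1
    return mix


mix = type_mix_by_community(nodes)
for community, counts in sorted(mix.items()):
    dominant, n = counts.most_common(1)[0]
    share = n / sum(counts.values())
    print(community, dominant, round(share, 2))
```

A community dominated by a single page type suggests that category and community overlap there, while an even mix suggests something else, such as region, is holding the cluster together.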

Scanning the graph in detail, I found that the points of polarization are separated by region, with the exception of the outlying cluster of TV Show nodes at the top left. The most densely knit community, on the lower left, consists of governmental pages for the branches of the American military, tightly connected to pages for individual army corps. Outside this big cluster, the majority of clusters can be grouped by region, including Australia, Brazil, Germany, and Canada. Each is linked to the larger network by a central node, which is most often a governmental organization or politician.

I therefore concluded that the roles of these central nodes are the most important to emphasize in my network visualization. To illustrate this point, I used the Gephi ‘Brush’ feature to subtly highlight the ‘Neighbors,’ or first-degree connections, of the Barack Obama node. This node serves as a crucial bridge in the network graph, as indicated by its high centrality and the structural holes that surround it (Figure 3).
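The idea of a node's first-degree neighbors, and of the structural holes around it, can be sketched numerically. The second function below is a rough proxy chosen for this illustration, not a Gephi statistic: it reports the share of a node's neighbor pairs that are not directly connected to each other, which is one minus the local clustering coefficient. The toy graph is hypothetical.

```python
from itertools import combinations


def neighbors(adj, node):
    """First-degree connections of a node."""
    return set(adj[node])


def open_pair_share(adj, node):
    """Share of neighbor pairs with no edge between them."""
    pairs = list(combinations(sorted(adj[node]), 2))
    if not pairs:
        return 0.0
    open_pairs = sum(1 for a, b in pairs if b not in adj[a])
    return open_pairs / len(pairs)


# Hypothetical toy graph: hub 0 linked to nodes 1-4, plus one edge 1-2.
toy = {0: {1, 2, 3, 4}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {0}}

hub_neighbors = neighbors(toy, 0)
hub_openness = open_pair_share(toy, 0)
print(sorted(hub_neighbors), round(hub_openness, 2))
```

A value near 1 means most of the node's neighbors reach one another only through that node, which is the situation the highlighted bridge node illustrates.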

Results & Next Steps

With the first-degree connections of the Barack Obama node highlighted in yellow, Barack Obama’s popularity and reach among politicians and governmental organizations within the larger network is easy to see. One ‘neighbor’ with high centrality is the governmental page for “The Obama White House,” while the page for “The White House” keeps its purple hue, signifying the lack of a mutual connection (Figure 3).

When this data was pulled from Facebook in November of 2017, former President Barack Obama had been succeeded by the divisive and polarizing presidency of Donald J. Trump. However, Obama remains very popular, especially abroad, so it is easy to understand why Facebook pages of governmental agencies and politicians were still connected to the pages for Barack Obama and his administration in 2017. Furthermore, a Republican administration controlling the Facebook page of the White House could explain the lack of mutual connection. Curiously enough, however, upon searching the dataset after finishing my visualization, I could not find a node for Donald J. Trump. I am not sure how this happened: whether Donald J. Trump is missing for lack of any ‘mutual likes,’ or whether his profile was still defined as ‘entertainment’ rather than ‘politician’ in 2017. Despite the missing data point, I think the visualization effectively highlights the key role of a central node in bridging communities in a large social network, just as a U.S. President would bridge governments across the globe.

Although I am satisfied overall with my decision to keep the node color partition by attribute instead of community, I still think the visualization could benefit from visual cues for identifying the communities beyond their clusters. I was disappointed that I was unable to partition the color of the edges according to community while coloring the nodes by page type. However, a potential solution for helping identify communities could be annotation, a feature that is not yet offered in Gephi. If I were able to expand on this project, I would include a title, legend, and caption built out in Adobe Suite in lieu of the features missing from Gephi.

Explore the final version of the Facebook Page-to-Page Network visualization below (Figure 4):