Introduction
For this lab, I wanted to use Gephi to create a visualization of a social network. To do this, I used a dataset from a study of the social relationships in a class of German schoolboys from 1880 to 1881. This socio-relational data was originally collected by the boys’ teacher, Johannes Delitsch, and is considered one of the earliest studies of social networks, if not the earliest (Heidler, Gamper, Herz, & Eßer, 2014). The data contains directional information about the social relationships between 53 students. In network terms, it is a directed graph with 53 nodes and 179 edges.
Inspiration
Originally, I was unsure what kind of network I wanted to create, but after seeing the following visualization of the relationships and affairs of Zeus, I became interested in creating a social relationship visualization of my own. The inner circle is supposed to represent Zeus and his lovers, while the colored lines connect the lovers to their children. The colors represent the classical sources that wrote about those familial links.
I really liked this visualization both for its aesthetics and for the variety of information it provides. Not only can we learn more about the complex relationships depicted in Greek myths, but we can also see which authors wrote about whom, and when. This gives us an idea of the overall topic trends across time and author.
Materials
- Gephi Wiki: I used this to explore different kinds of network data. Ultimately, I chose to use the “Class of 1880/81” social network dataset.
- Relationships in the 19th century (article): This article by Heidler, Gamper, Herz, & Eßer (2014) helped me to better understand the labels used in the dataset, gave me more context about how and why the data was collected, and suggested some ways to interpret the results.
- Gephi: I used this open-source network visualization software to create my social network visuals for this lab.
Methods
To start, I opened the GEXF file of the dataset in Gephi and ran several statistical analyses: average degree, network diameter, graph density, modularity, average clustering coefficient, and average path length. Then I explored the layouts and appearance features that I could apply to create my visualizations. Below, I go into more detail about how I created each visualization, starting with the first:
To create this visualization, I used the Fruchterman Reingold layout and then ran the contraction layout three times to condense the graph. Node size reflects degree (the overall number of connections): larger circles have a higher degree and smaller circles a lower one. I set the size range to 1 to 40, which was the widest range I could use without the circles becoming overly large and crowding out the direction lines and each other. The colors reflect the modular classes (or communities) detected by Gephi.
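To illustrate the sizing step, here is a minimal Python sketch of how a degree could be mapped onto the 1 to 40 size range. It assumes a simple linear mapping; the function name `scale_sizes` and the example degrees are hypothetical and are not taken from Gephi or from the class dataset.

```python
def scale_sizes(degrees, min_size=1.0, max_size=40.0):
    """Linearly map each node's degree onto [min_size, max_size].

    Assumption: a plain linear interpolation between the lowest and
    highest degree in the graph.
    """
    lo = min(degrees.values())
    hi = max(degrees.values())
    if hi == lo:
        # Every node has the same degree, so give them all the midpoint size.
        mid = (min_size + max_size) / 2
        return {node: mid for node in degrees}
    span = hi - lo
    return {
        node: min_size + (deg - lo) / span * (max_size - min_size)
        for node, deg in degrees.items()
    }


# Hypothetical degrees for four made-up nodes (not the real students).
example_degrees = {"A": 0, "B": 5, "C": 10, "D": 20}
sizes = scale_sizes(example_degrees)
for node, size in sizes.items():
    print(node, size)
```

With this kind of mapping, the least connected node always gets the minimum size and the most connected node the maximum, which is why a very wide range makes the largest circles crowd out everything else.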
Next, I created this visualization:
For this visualization, I used the circle layout and ran the expansion layout three times to spread out the nodes. As in the previous visualization, node size depends on degree and color corresponds to modular class. Additionally, the nodes are ordered counter-clockwise by modular class around the edge of the circle.
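The ordering idea behind this layout can be sketched in a few lines of Python. This is only an illustration of placing nodes evenly on a circle, grouped by class, and is not Gephi's own implementation; the function name `circle_layout`, the radius, and the example nodes are all hypothetical.

```python
import math


def circle_layout(node_classes, radius=100.0):
    """Place nodes evenly on a circle, grouped by class.

    Nodes are sorted by class (then by name) and laid out with
    increasing angle, which is counter-clockwise when the y axis
    points up. On a screen where y points down it would look clockwise.
    """
    ordered = sorted(node_classes, key=lambda n: (node_classes[n], n))
    if not ordered:
        return {}
    step = 2 * math.pi / len(ordered)
    return {
        node: (radius * math.cos(i * step), radius * math.sin(i * step))
        for i, node in enumerate(ordered)
    }


# Hypothetical nodes with made-up class numbers.
example_classes = {"A": 1, "B": 0, "C": 1, "D": 0}
positions = circle_layout(example_classes)
for node, (x, y) in positions.items():
    print(node, round(x, 1), round(y, 1))
```

Because nodes of the same class sit next to each other, edges inside a class stay near the rim while edges between classes cut across the circle, which makes cross-group ties easy to spot.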
Results
The statistics I ran at the start produced the following values:
- Average Degree: 3.377
- Network Diameter: 8
- Graph Density: 0.065
- Modularity: 0.287
- Average Clustering Coefficient: 0.147
- Average Path Length: 3.382
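Two of these values can be checked by hand from the node and edge counts alone. The short Python sketch below assumes that, for a directed graph, the average degree is the number of edges divided by the number of nodes, and that density is the number of edges divided by the number of possible ordered pairs. Both assumptions reproduce the figures reported above.

```python
nodes = 53   # students in the class
edges = 179  # directed relationships

# Assumption: each directed edge is counted once per node on average.
average_degree = edges / nodes

# A directed graph without self-loops has nodes * (nodes - 1) possible edges.
density = edges / (nodes * (nodes - 1))

print(round(average_degree, 3))  # 3.377
print(round(density, 3))         # 0.065
```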
Overall, this dataset represents a fairly well-connected network. The average path length is relatively short (about 3.4 steps), which tells us that most students are closely connected, even if only indirectly, to the other students. It is also apparent that there are 5 distinct modular classes (or cliques) present in this group of schoolboys. The first thing that stands out in these visualizations is the reciprocity of the students’ relationships. That is, if student A indicates that he is friends with student B, student B will often indicate that he is friends with student A. Only a few of the outlying students (best seen in visualization 1) seem to have a purely one-directional relationship, and only 4 students have no relationships indicated at all.
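The reciprocity described above could also be expressed as a number rather than judged by eye. The sketch below is one simple way to do that in Python: it reports the share of directed edges whose reverse edge also exists, and lists nodes with no edges at all. The function names and the toy edge list are hypothetical and do not come from the class dataset.

```python
def reciprocity(edges):
    """Share of directed edges (u, v) for which (v, u) also exists.

    Self-loops are ignored, and duplicate edges are counted once.
    """
    edge_set = {(u, v) for u, v in edges if u != v}
    if not edge_set:
        return 0.0
    mutual = sum(1 for (u, v) in edge_set if (v, u) in edge_set)
    return mutual / len(edge_set)


def isolates(nodes, edges):
    """Nodes that appear in no edge, as source or as target."""
    touched = {n for edge in edges for n in edge}
    return sorted(n for n in nodes if n not in touched)


# Toy data with made-up names: A and B name each other, as do B and C;
# D names A without being named back; E has no relationships.
toy_nodes = ["A", "B", "C", "D", "E"]
toy_edges = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "B"), ("D", "A")]

toy_reciprocity = reciprocity(toy_edges)
toy_isolates = isolates(toy_nodes, toy_edges)
print(toy_reciprocity)
print(toy_isolates)
```

In the toy data, four of the five edges are returned, so the reciprocity is 0.8, and E is the only node without any relationship.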
The other interesting point that I came across while creating these visualizations is that despite having the highest degree (or most connections), Pfeil wasn’t part of the largest modular class (or friend clique). This is most apparent when viewing visualization 2. Upon further review, it appears that Pfeil falls into his particular clique because not only is he well connected towards the other members (i.e., he indicates that he is their friend), but the members are also connected towards him (i.e., they indicate that they are his friend as well). This suggests that in some cases reciprocity is more influential than degree in determining which clique a student belongs to.
Reflection
To be honest, I did not enjoy using Gephi for this lab. It is difficult to work with, not only because there is no undo feature (which is insane), but also because I was often confused about what the different layouts meant. The major setback of a power loss, which cost me most of my previous work on the visualizations, was very frustrating. Despite having the file for the visualization below, I couldn’t figure out the steps I had taken to get there, because there was no undo option and I had lost the work I had done prior to this visual. So, despite having this more interesting visual, I couldn’t use it as part of my methods and results report. Next time I’ll be more vigilant about taking notes when working with Gephi.
Reference:
Heidler, R., Gamper, M., Herz, A., & Eßer, F. (2014). Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited. Social Networks, 37, 1-13.