Friendship Network in a German boys’ school class from 1880 to 1881, Directed Network Graph



Introduction

For this lab, I selected a dataset linked in the Gephi Wiki and hosted on Zenodo (Heidler et al., 2014a): a friendship network from a German boys’ school, collected during the 1880-1881 school year by schoolteacher Johannes Delitsch. I chose it mainly because I wanted to do a social network analysis for this lab, and when I looked into the sources for the data, I was intrigued by how early an example of social network research it was.

This dataset is discussed in depth by Heidler, Gamper, Herz, and Eßer in their article “Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited” (Heidler et al., 2014b). Working with material that has been studied at length gave me the opportunity to inform my design choices by reviewing earlier visualizations and the authors’ analysis of the data.

Described in the article as one of the earliest, if not the earliest, social network analyses, Delitsch’s study predates much of the theoretical groundwork of network analysis. Delitsch collected data from his fourth-grade class at a time when Western countries were beginning to build the infrastructure for public education. Previously an in-home instructor, Delitsch was new to teaching in such a large classroom (his class had 53 pupils), and he took the opportunity to study the social relationships among his students. He gathered data using “observation, interviewing pupils and parents, and analyzing school essays during one school year (1880-1881)” (Heidler et al., 2014b). The result was a network matrix, shown below and originally published in 1900, whose rich data on directed relationships remain useful for contemporary network analysis.

Original network matrix, created by Johannes Delitsch. (Heidler et al., 2014b)

Inspirations

The article accompanying the data included a visualization of this network made by its authors. That visualization employed several tactics: no color beyond grayscale; little, if any, size differentiation between nodes to show each pupil’s influence in the network; and gentler gravity, which spread the pupils’ nodes out so viewers could more easily read labels and track interactions.

Visualization included in article Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited. (Heidler et al., 2014b)

I felt that some of these tactics worked well, such as the negative space between the nodes, which made the network easier for viewers to comprehend, but others were less effective at a glance. Using a legend to indicate statuses, rather than direct labeling or other direct visual cues, meant that viewers needed more time to pick out the most influential members of the network.

Going into this lab, I also read Andrew Disney’s Cambridge Intelligence article “Force-directed graph layouts explained” (Disney, 2021), which gave me an idea of how I wanted my visualization to look. My goal was to use color and scale to create a more visually striking, graphic approach to my visualization.

Example of a force-directed visualization from Cambridge Intelligence. (Disney, 2021)

Materials

Since my dataset was downloaded as a .gephi file, I did not need to manipulate the data before working with it. To create my visualizations, I used the open-source program Gephi (https://gephi.org/), which is available as a free download. Originally developed as a prototype in 2006 under the name Graphiltre, the program was created to fill a gap in sociology research at the time, when most network visualization programs were proprietary and expensive. Creator Mathieu Jacomy envisioned the program as a “graph-dedicated Photoshop” (Heymann, 2010) that did not rely on programming scripts.

Methodology and Process

Before starting work with Gephi, I first researched what layouts might be most appropriate to utilize for this visualization. Looking through Gephi’s tutorial for layouts (Tutorial Layouts, n.d.), I identified two layouts I wanted to work with: Yifan Hu and ForceAtlas 2. Both seemed like good beginner-level layouts to create force-directed visualizations. 

Once I loaded my dataset into Gephi, I ran a few of its statistics calculations to better understand the data. The Average Degree was 3.377, the Network Diameter was 8 (the longest of the shortest paths between pupils is 8 steps), and the Graph Density was .065. Because this is a directed graph rather than an undirected one, a density of .065 means that only about 6.5% of all possible directed ties between pupils are present, which seemed low to me compared with typical figures for other social networks.
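For readers who want to check these figures outside Gephi, here is a minimal Python sketch of the three statistics for a directed graph. It assumes that Gephi defines average degree for a directed graph as edges divided by nodes and density as edges divided by n(n - 1); the diameter here is the longest shortest directed path, skipping pairs that cannot reach each other (also an assumption). The function name `directed_graph_stats` and the toy graph are hypothetical. Under these definitions the reported numbers agree with each other: assuming all 53 pupils are nodes in the file, an average degree of 3.377 implies about 179 directed edges, and 179 / (53 × 52) ≈ 0.065.

```python
from collections import deque


def directed_graph_stats(nodes, edges):
    """Average degree, density, and diameter of a directed graph.

    nodes: list of node ids; edges: list of (source, target) pairs.
    Assumed definitions for a directed graph:
    average degree = m / n, density = m / (n * (n - 1)).
    """
    n, m = len(nodes), len(edges)
    adjacency = {v: [] for v in nodes}
    for source, target in edges:
        adjacency[source].append(target)

    # Diameter: longest shortest directed path, found by a BFS from every node.
    # Unreachable pairs are skipped rather than counted as infinite.
    diameter = 0
    for start in nodes:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for w in adjacency[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        diameter = max(diameter, max(dist.values()))

    return {
        "average_degree": m / n,
        "density": m / (n * (n - 1)),
        "diameter": diameter,
    }


# Toy example with hypothetical nodes, not the Delitsch data.
stats = directed_graph_stats(
    ["a", "b", "c", "d"],
    [("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")],
)
print(stats)
# → {'average_degree': 1.0, 'density': 0.3333333333333333, 'diameter': 3}
```

In the toy graph the diameter of 3 comes from the path b → c → a → d; node d has no outgoing edges, so paths starting at d are simply skipped.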

Having read the article that studied the dataset, I knew that there were four key influencers within the network: Pfeil, Vetter, Schnabel, and R. Schubert. Many of the other pupils had similar standing within the network, but I knew that these four and their connections would stand out, and I wanted to capitalize on this to allow viewers to get that takeaway at a glance. 

To do so, I sized the nodes by degree, the total number of edges (incoming and outgoing) connected to each node. To make the size differences dramatic, I set the size range to 1-30 or 1-40, depending on the layout. I also applied a color ranking, in this case a Gephi preset using varying shades of purple, to further highlight the most influential nodes/pupils: the deepest, most saturated nodes had the most connections, while the lighter, less saturated nodes had fewer.
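Ranking nodes by degree amounts to interpolating between a minimum and a maximum value according to where each node’s degree falls in the network’s degree range. Below is a minimal Python sketch that assumes straight linear interpolation for both size and color; the function names, the degrees, and the two purple hex values are hypothetical stand-ins, not Gephi’s actual preset.

```python
def scale_by_degree(degree, min_degree, max_degree, low, high):
    """Linearly map a degree onto the range [low, high]."""
    if max_degree == min_degree:
        return low
    t = (degree - min_degree) / (max_degree - min_degree)
    return low + t * (high - low)


def interpolate_hex(light, dark, t):
    """Blend two '#rrggbb' colors; t=0 gives light, t=1 gives dark."""
    a = [int(light[i:i + 2], 16) for i in (1, 3, 5)]
    b = [int(dark[i:i + 2], 16) for i in (1, 3, 5)]
    mixed = [round(x + t * (y - x)) for x, y in zip(a, b)]
    return "#{:02x}{:02x}{:02x}".format(*mixed)


# Hypothetical degrees and purple endpoints, for illustration only.
degrees = {"a": 1, "b": 5, "c": 9}
lo, hi = min(degrees.values()), max(degrees.values())
LIGHT, DARK = "#e5d4f5", "#3f0071"
for node, deg in degrees.items():
    size = scale_by_degree(deg, lo, hi, 1, 30)  # node size in the 1-30 range
    color = interpolate_hex(LIGHT, DARK, scale_by_degree(deg, lo, hi, 0, 1))
    print(node, size, color)
```

With a 1-30 range, the lowest-degree node gets size 1, the highest gets 30, and node "b", halfway through the degree range, gets 15.5; widening the range to 1-40 exaggerates the contrast further.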

Some pupils in the network had no connections/edges, so I used Gephi’s filter function to remove them from my visualization. To do so, I added a Range (Degree) filter and set the range to 1-26, which excluded any node with zero connected edges.
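Gephi applies this filter through its interface, but the underlying logic is simple. Here is a minimal plain-Python sketch of an equivalent degree-range filter, assuming degree counts both incoming and outgoing edges; `filter_by_degree` and the example nodes are hypothetical. Edges that touch a removed node are dropped along with it.

```python
def filter_by_degree(nodes, edges, low, high):
    """Keep nodes whose total degree (in + out) lies in [low, high].

    Edges touching a removed node are dropped as well.
    """
    degree = {v: 0 for v in nodes}
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    kept = [v for v in nodes if low <= degree[v] <= high]
    kept_set = set(kept)
    kept_edges = [(s, t) for s, t in edges if s in kept_set and t in kept_set]
    return kept, kept_edges


# Hypothetical example: "d" has no edges, so a 1-26 range removes it.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("b", "a"), ("c", "a")]
result = filter_by_degree(nodes, edges, 1, 26)
print(result)
# → (['a', 'b', 'c'], [('a', 'b'), ('b', 'a'), ('c', 'a')])
```

Setting the lower bound to 1 is what removes the isolated pupils; the upper bound only matters if it is set below the highest degree in the network.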

To continue refining my visualization, I then moved to the Preview window. First, I set the edges to be straight rather than curved: my network is directed, and the arrows indicating direction only appear when the edges are straight. I increased the arrow size slightly and set the edge color to match the source (parent) node, to further emphasize the direction of each connection. I created versions with a few different aesthetic choices to determine which felt most successful. In the end, scaling the node labels and setting their color to match the parent node produced the most satisfying result for me. Because that is a more graphic approach, I’ve also included renderings where the labels are unscaled and where the labels are a single consistent color.

I repeated these steps for both the Yifan Hu and ForceAtlas 2 layouts. My favorite ended up being a ForceAtlas 2 version, but I feel both layouts communicate the network successfully. To spread the nodes out, I decreased gravity and increased scaling in ForceAtlas 2, and increased the optimal distance and step size in Yifan Hu. Without these adjustments, the graphs were much too dense to make out many of the connections; once the nodes were better distributed, the labels and edges were easier to discern.
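To build intuition for why lowering gravity and raising scaling spreads the graph out, here is a deliberately simplified Python sketch of the three forces in ForceAtlas 2’s published force model, as I understand it: repulsion between every pair of nodes proportional to scaling × (deg + 1)(deg + 1) / distance, attraction along each edge proportional to distance, and a pull toward the center of strength gravity × (deg + 1). It is not Gephi’s implementation: it uses a fixed step size instead of adaptive speed, skips the Barnes-Hut approximation, and does not cover Yifan Hu at all. The names `forceatlas2_sketch` and `mean_spread` and the toy edge list are hypothetical.

```python
import math
import random


def forceatlas2_sketch(edges, scaling=2.0, gravity=1.0, iterations=1000,
                       step=0.02, max_move=1.0, seed=0):
    """Simplified ForceAtlas2-style layout; returns {node: (x, y)}.

    Edge direction is ignored when computing forces. Fixed step size,
    no adaptive speed, no Barnes-Hut approximation: for intuition only.
    """
    nodes = sorted({n for edge in edges for n in edge})
    deg = {n: 0 for n in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair: scaling * (deg_a + 1)(deg_b + 1) / d
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = scaling * (deg[a] + 1) * (deg[b] + 1) / d
                force[a][0] += f * dx / d
                force[a][1] += f * dy / d
                force[b][0] -= f * dx / d
                force[b][1] -= f * dy / d
        # Attraction along each edge, proportional to the distance d
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            force[u][0] -= dx
            force[u][1] -= dy
            force[v][0] += dx
            force[v][1] += dy
        # Gravity toward the origin with strength gravity * (deg + 1)
        for n in nodes:
            x, y = pos[n]
            d = math.hypot(x, y) or 1e-9
            g = gravity * (deg[n] + 1)
            force[n][0] -= g * x / d
            force[n][1] -= g * y / d
        # Move each node a small step along its net force, capped for stability
        for n in nodes:
            fx, fy = force[n]
            magnitude = math.hypot(fx, fy)
            factor = step if step * magnitude <= max_move else max_move / magnitude
            pos[n][0] += factor * fx
            pos[n][1] += factor * fy

    return {n: (x, y) for n, (x, y) in pos.items()}


def mean_spread(pos):
    """Average distance of the nodes from their centroid."""
    cx = sum(x for x, _ in pos.values()) / len(pos)
    cy = sum(y for _, y in pos.values()) / len(pos)
    return sum(math.hypot(x - cx, y - cy) for x, y in pos.values()) / len(pos)


edges = [("A", "B"), ("B", "C"), ("C", "A"), ("C", "D"), ("D", "E")]
tight = forceatlas2_sketch(edges, scaling=1.0, gravity=1.0)
loose = forceatlas2_sketch(edges, scaling=10.0, gravity=0.1)
print(round(mean_spread(tight), 2), round(mean_spread(loose), 2))
```

Because repulsion grows with the scaling value while attraction does not, raising scaling pushes connected nodes further apart before the forces balance, and lowering gravity weakens the pull that keeps everything bunched around the center. Together these produce the more open layout described above.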

Results and Reflection

Versions 1 and 2 of my visualization using the ForceAtlas 2 layout. On the left, the labels are all black and are not set to scale to the size of the node. On the right, the labels are set to reflect the color of the parent node, and are scaled to degree of the node.
Versions 1 and 2 of my visualization using the Yifan Hu layout. On the left, the labels are all black and are not set to scale to the size of the node. On the right, the labels are set to reflect the color of the parent node, and are scaled to degree of the node. I found this layout harder to read because of its dense central cluster and outlying nodes, so the ForceAtlas 2 version felt more successful overall.

My final result was a visualization depicting the degrees of the nodes/pupils within this network, highlighting the most influential peers through scale and color. I chose not to include some of the filters used in the article’s visualizations that aimed to explain why some pupils had fewer or more connections. Namely, I did not mark the pupils identified as having disabilities or illnesses, and I did not highlight the pupil identified as “bribing” peers with sweets. If I work with this dataset more in the future, it would be interesting to see how further groupings of the pupils change the look of the visualization. For this lab, however, I wanted to rely on the network statistics rather than the qualitative observations Delitsch collected, since I felt those observations require more context and are better suited to textual interpretation.

I was happy with my result in creating a force-directed visualization of a directed graph of these social relationships. Because this dataset was fairly straightforward, small in scale, and limited in the data collected given the date of the original study, working with it made me interested in tackling larger and more complex datasets in the future, ones better suited to further filtering and grouping, which could flex more of Gephi’s power as a visualization tool.

My preferred final version of the visualization.

Sources Used

Disney, A. (2021, February 24). Force-directed graph layouts explained. Cambridge Intelligence. https://cambridge-intelligence.com/keylines-faq-force-directed-layouts/

Heidler, R., Gamper, M., Herz, A., & Eßer, F. (2014a). Dataset for the article “Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited” by Heidler, Gamper, Herz & Eßer (https://doi.org/10.1016/j.socnet.2013.11.001) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4612153

Heidler, R., Gamper, M., Herz, A., & Eßer, F. (2014b). Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited. Social Networks, 37, 1–13. https://doi.org/10.1016/j.socnet.2013.11.001

Heymann, S. (2010, February 1). Gephi initiator interview: How “Semiotics matter.” Gephi Blog. https://gephi.wordpress.com/2010/02/01/gephi-initiator-interview-how-semiotics-matter/

Tutorial Layouts. (n.d.). Retrieved April 7, 2022, from https://gephi.org/users/tutorial-layouts/