How to Win Friends and Influence People: 1880-1881 German Schoolboys Edition


Networks, Visualization

One of the earliest studies in social network analysis dates to the 1880-1881 school year, when the German teacher Johannes Delitsch collected socio-relational data on a class of 53 students. This study is particularly significant for providing “insight into the process of friendship formation in school classes during an interesting time where western [civilization was] establishing [its] public education sector” and “how the school class can be described in regard to transitivity, reciprocity, clique behavior, and stratification” (Heidler et al., 2014). In the 19th century, schooling became a governmental concern and educational objectives were directed at creating a national consciousness. Against this historical backdrop, patterns of reciprocity and asymmetry in the relationships among Western schoolchildren, including Delitsch’s pupils, are especially worth examining.


OBJECTIVE

Highlight aspects of the data that may explain variation in students’ popularity (or lack thereof).

INSPIRATION

I had originally planned on creating a network visualization that color-coded nodes based on certain attributes with the person’s name as the node label (see visualizations below).

In addition to relationships and names, the German boys’ relationship dataset also identifies the direction of relationships (reciprocal and non-reciprocal relationships between students), based on Delitsch’s deductions from observation, interviewing pupils and parents, and analyzing school essays during the 1880-81 school year.

MATERIALS USED

Dataset: From Gephi’s list of sample datasets, Class of 1880/81

Software: Gephi 0.9.2
Although Gephi is still in beta and is known to glitch and occasionally crash, I didn’t have any problems working with this dataset. Before this dataset, however, I tried working with the Marvel Social Network (which is substantially larger) and ran into consistent glitches and long loading times. You can also download that dataset from the Gephi GitHub.

PROCESS

My goal was to find and analyze the varying degrees of relationships across the German boys’ class based on the attributes recorded by Delitsch (i.e. handicapped, class ranking, sweetsgiver, and repeater).

For clarification, “sweetsgiver” refers to a student who ‘buys into’ friendships with sweets and money. “Repeater” refers to a student who is repeating the class/grade.

Nodes
I chose to scale the nodes by how many relationships Delitsch identified for each student. Figure 1 shows the visualization without any other attributes. Originally, I used the total number of relationships as the unit for scale, but I also created visualizations using “in-degree relationships” for scale (in-degree meaning the number of friendships pointing toward a student, i.e. how many other students see him as their friend). After I noticed that the nodes in the in-degree version were better dispersed yet more clearly clustered, I switched to the in-degree values for sizing the nodes, since this revealed the dynamics of the relationships more clearly.

This made sense, especially because the in-degree value is the number of incoming relationships a student has, so it is a closer proxy for popularity than the total number of relationships. A student could have a large number of relationships, but if only one of them is incoming (another student considers him a friend), the total does not adequately signify popularity. Notice the huge flip in node sizing when I scale the nodes using only the number of outgoing (out-degree) relationships (Figure 3).
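The difference between total degree, in-degree, and out-degree can be sketched in a few lines of Python. This is only a minimal illustration and not how Gephi computes these values internally; the edge list and the labels A to D are made up for the example and are not taken from Delitsch's data.

```python
from collections import Counter

# Hypothetical directed edges (source, target): "source names target as a friend".
# The labels are placeholders, not students from the 1880/81 class.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "C"), ("C", "A")]


def degree_counts(edge_list):
    """Return (in_degree, out_degree, total_degree) Counters for a directed edge list."""
    in_deg, out_deg = Counter(), Counter()
    for source, target in edge_list:
        out_deg[source] += 1  # friendships the student claims
        in_deg[target] += 1   # friendships claimed toward the student
    total = in_deg + out_deg
    return in_deg, out_deg, total


in_deg, out_deg, total = degree_counts(edges)
for student in sorted(total):
    print(student, "in:", in_deg[student], "out:", out_deg[student], "total:", total[student])
```

In this toy network A and C have the same total degree, but C receives three nominations while A receives only one. Sizing by total degree would draw them identically; sizing by in-degree would not.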

To focus on distinct attributes, I decided to make small multiples of the network. I used node color to convey the four recorded attributes and node size as a rough reference for the number of relationships each student had.

Edges
The edges are a mid-gray so as not to distract from the varying node colors; however, this also made the directional arrows difficult to see. The arrows are supposed to clarify the direction of each relationship (one-sided or mutual). This flaw is partly offset by the proportional node sizing.
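When the arrows are hard to read, the one-sided versus mutual distinction can also be checked directly on the edge list. The following is a rough sketch under stated assumptions: the edges are invented placeholders, the function name is hypothetical, and the network is assumed to contain no self-loops.

```python
def split_by_reciprocity(edge_list):
    """Split directed edges into mutual pairs and one-sided ties.

    Assumes there are no self-loops (no student names himself).
    """
    edge_set = set(edge_list)
    mutual = set()
    one_sided = []
    for source, target in edge_set:
        if (target, source) in edge_set:
            # Store each mutual pair once, regardless of direction.
            mutual.add(frozenset((source, target)))
        else:
            one_sided.append((source, target))
    return mutual, sorted(one_sided)


# Hypothetical edges: A and B name each other; A and D both name C without return.
edges = [("A", "B"), ("B", "A"), ("A", "C"), ("D", "C")]
mutual, one_sided = split_by_reciprocity(edges)
print("mutual:", [sorted(pair) for pair in mutual])
print("one-sided:", one_sided)
```

A list like this could be used to style mutual and one-sided edges differently instead of relying on arrowheads alone.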

INTERPRETATION OF RESULTS

Popularity
When observing the in-degree networks, we can see that Pfeil and Vetter have the largest nodes, with several other students (Schnabel, R. Schubert, Lasch, Schlegel, and Wolf) trailing behind.

Pfeil and Vetter are both repeaters, but neither is the highest ranked in the class. Schlegel, by contrast, is ranked number 1, yet his node is noticeably smaller than theirs, than those of the other repeaters, and than Lasch’s. The majority of highly ranked students have few in-degree relationships, with the exception of Schlegel and Wolf. It would be interesting to dig further into why these two are outliers. Lasch’s node is quite large despite his being neither highly ranked nor a repeater. Figure 6 suggests that the probable cause is his sweets-giving habit, which is mentioned in Delitsch’s study: “Delitsch labels him the ‘sweets giver’, because he generously ‘buys into’ friendships with sweet and money from his grandmother, who sells sweets at fairs” (Heidler et al., 2014).

Unfortunately, many of the handicapped students have extremely small nodes, and one of them has 0 in-degree relationships (he can be found, with much difficulty, at the very bottom left of the visualization, connected to Meinhold). It is also interesting to see the stark contrast in these students’ node sizes between in-degree (Figure 2) and out-degree (Figure 3) sizing.

Cliques
The proximity of nodes to one another helps identify which groups of students tend to gravitate towards each other. Figures 4 and 5 reveal that highly ranked students tend to mingle together, as do the repeaters amongst themselves.

REFLECTION

As I mentioned under Edges, the arrows were difficult to see, and I’m curious whether a radial diagram would have better articulated the connections between students. I imagine reciprocity might be harder to discern, but that wasn’t something I focused on, and the connections could be easier to see if they were less obscured by overlapping nodes and labels. There was also the option to give each edge the same color as its parent node; however, the varying colors became distracting and made it hard to extract information. This was a result of how the nodes and edges were arranged, whereas in a radial diagram all the nodes would sit on the perimeter.

I realized at the end of my project that I could have added a column to the dataset combining the handicapped, sweetsgiver, and repeater attributes, since they do not apply to all 53 students and never overlap. This would have removed the need for a series of small multiples.
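A minimal sketch of that combined column, assuming the three attributes are stored as true/false values per student. The column names, the `special_attribute` name, and the sample rows are hypothetical and would need to match the actual node table. Because the idea relies on the attributes never overlapping, the sketch raises an error if they do rather than silently picking one.

```python
def combine_attributes(row, flags=("handicapped", "sweetsgiver", "repeater")):
    """Collapse mutually exclusive true/false columns into one categorical label."""
    active = [name for name in flags if row.get(name)]
    if len(active) > 1:
        # The write-up assumes these attributes never overlap; fail loudly if they do.
        raise ValueError(f"attributes overlap for {row.get('id')}: {active}")
    return active[0] if active else "none"


# Hypothetical rows standing in for the node table; values are booleans.
rows = [
    {"id": "s1", "handicapped": False, "sweetsgiver": False, "repeater": True},
    {"id": "s2", "handicapped": False, "sweetsgiver": True, "repeater": False},
    {"id": "s3", "handicapped": False, "sweetsgiver": False, "repeater": False},
]

for row in rows:
    row["special_attribute"] = combine_attributes(row)

print([row["special_attribute"] for row in rows])
```

A single categorical column like this could then drive node color in one visualization. If the source file stores the flags as text such as "0" and "1", they would need converting to booleans first, since a non-empty string counts as true in Python.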

Additionally, the colors I chose for denoting class ranking can be confusing. The idea was to go from bright blue to dark blue to gray. I chose gray for the lowest ranked students so their nodes wouldn’t show up as clearly as those of the higher ranked students; however, the bright blue doesn’t automatically make me think those nodes are of higher rank than the darker blue or blue-gray nodes. The varying node size also makes it difficult to distinguish certain colors from one another (or to see them at all, in the case of the handicapped visualization). You have to zoom into the visualizations to evaluate the colors properly. This could potentially be alleviated by including a legend on the ranking visualization.
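For reference, a bright blue to dark blue to gray ramp can be expressed as a piecewise-linear interpolation between three color stops. This is a sketch of the general technique, not the settings used in Gephi; the RGB values and function names are assumptions chosen for illustration, and rank 1 is taken to be the top of the class.

```python
def lerp(a, b, t):
    """Linearly interpolate between two channel values and round to an integer."""
    return round(a + (b - a) * t)


def rank_color(rank, n_ranks, stops):
    """Map rank 1..n_ranks onto a piecewise-linear color ramp given as RGB stops."""
    if n_ranks < 2:
        return stops[0]
    pos = (rank - 1) / (n_ranks - 1) * (len(stops) - 1)  # position along the ramp
    i = min(int(pos), len(stops) - 2)                    # index of the left-hand stop
    t = pos - i                                          # fraction of the way to the next stop
    return tuple(lerp(c0, c1, t) for c0, c1 in zip(stops[i], stops[i + 1]))


# Assumed stops: bright blue -> dark blue -> gray (not the exact colors in the figures).
STOPS = [(0, 150, 255), (0, 40, 120), (160, 160, 160)]

for rank in (1, 27, 53):
    print(rank, rank_color(rank, 53, STOPS))
```

Printing a handful of ranks next to their colors like this is also a quick way to build the legend mentioned above.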


BIBLIOGRAPHY

Heidler, R., Gamper, M., Herz, A., & Eßer, F. (2014). Relationship patterns in the 19th century: The friendship network in a German boys’ school class from 1880 to 1881 revisited. Social Networks, 37, 1–13. https://doi.org/10.1016/j.socnet.2013.11.001