Harry Potter and the order of the characters 🪄

Introduction

I first watched Harry Potter in 2004, and it changed my life forever. Although a long time has passed since then, I still love watching films and reading books. Books often have a dense plot that can be difficult for a reader to follow from start to finish. One way to untangle it is to explore the social networks of characters in cultural works such as books and movies, so I wanted to try something different and bring the characters of Harry Potter into the context of data analysis. Many of the traits of complex networks, such as skewed degree distribution and community structure, are present in these character networks, even though they tend to have relatively few nodes and a large number of edges. As a Potterhead who has read the books and discussed them with friends many times, I thought it would be an intriguing chance to conduct a Harry Potter character network analysis using Gephi to find out which central characters emerge and how they interact with everyone else.

Inspiration

I started my search by looking for materials on network data visualizations, and one of the sources I found was Beveridge and Shan, who used network techniques to create a social network from A Storm of Swords, the third novel in George R.R. Martin’s A Song of Ice and Fire series. PageRank, closeness, betweenness centrality, and modularity gave them an empirical way to evaluate communities and major actors inside the network. A dynamic network for The Lord of the Rings has also been examined; it included data such as character species, which made the resulting network extremely fascinating.

Materials

Tools –

Gephi (Windows version 0.9.2): an open-source network visualization and analysis program. I used it to produce and examine my network data visualization.
Microsoft Excel: a spreadsheet program developed by Microsoft and included in the MS Office suite. I used it to inspect and understand the dataset CSV files before importing them into Gephi.

Dataset –

The dataset was built by linking two files: a node table listing the characters and an edge table listing the interactions between them. The edge weights are proportional to the number of interactions between each pair of characters. The remaining columns record each character’s alignment, that is, whether they are good or bad, or simply whether they are on Team Harry or Team Voldemort. There are also columns for gender and assigned house. There were a few spots where the data was missing or looked incorrect to me; however, because I know the characters very well, it was fairly easy for me to fill in those values.

Process

Feeding the dataset into Gephi and running some statistics

The first step in my process was to understand the two CSV files that made up the node table and the edge table. Once the dataset made sense to me, I imported both tables into Gephi. The program created a visualization by default, but it was unreadable because all of the nodes were jumbled together. To make it usable, I needed to change both the layout and the visual styling.
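As a rough illustration of what the two tables look like, the sketch below parses a tiny, made-up node table and edge table with Python's csv module. Gephi's spreadsheet importer recognizes Id and Label columns for nodes and Source, Target, and Weight columns for edges; the House column name and every row here are assumptions for illustration, not the actual dataset.

```python
import csv
import io

# Hypothetical miniature node table (the real file has more rows and columns).
NODES_CSV = """Id,Label,House
0,Harry Potter,Gryffindor
1,Ron Weasley,Gryffindor
2,Draco Malfoy,Slytherin
"""

# Hypothetical miniature edge table; Weight counts interactions between two characters.
EDGES_CSV = """Source,Target,Weight
0,1,12
0,2,5
"""


def load_tables(nodes_text, edges_text):
    """Return (nodes keyed by Id, list of (source, target, weight)) from two CSV strings."""
    nodes = {row["Id"]: row for row in csv.DictReader(io.StringIO(nodes_text))}
    edges = []
    for row in csv.DictReader(io.StringIO(edges_text)):
        # Every edge must point at characters that exist in the node table.
        if row["Source"] not in nodes or row["Target"] not in nodes:
            raise ValueError(f"edge refers to an unknown node: {row}")
        edges.append((row["Source"], row["Target"], int(row["Weight"])))
    return nodes, edges


nodes, edges = load_tables(NODES_CSV, EDGES_CSV)
for source, target, weight in edges:
    print(nodes[source]["Label"], "->", nodes[target]["Label"], "weight", weight)
```

Checking that every edge refers to a known node is the kind of sanity check that is easy to do in a spreadsheet for a small dataset, but worth automating for a larger one.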
The round bubbles of varying sizes are referred to as nodes, while the lines that connect them are referred to as edges, arcs, links, ties, or relations. Edges are the connections between nodes. They can also be weighted, making them heavier or lighter based on the parameters you define for the data. Gephi also calculates the centrality of nodes, and nodes can be sized so that the larger ones are the most central to the network.
I calculated the average degree, network diameter, graph density, and modularity. The average degree tells me how many edges are connected to a node on average. The network diameter is the longest of the shortest paths between any two nodes in the graph. Graph density tells me how close the graph is to being complete, that is, to having an edge between every pair of nodes. Finally, modularity is a community detection measure that reveals the various communities that exist in the data.
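To make these statistics concrete, here is a minimal sketch that computes the average degree, density, and diameter of a small undirected graph in plain Python. The edge list is an invented toy example, not the Harry Potter dataset, and the code assumes the graph is undirected and connected. Modularity is left out because community detection is considerably more involved.

```python
from collections import deque

# Hypothetical undirected toy graph, not the actual Harry Potter data.
EDGES = [
    ("Harry", "Ron"),
    ("Harry", "Hermione"),
    ("Ron", "Hermione"),
    ("Harry", "Draco"),
    ("Draco", "Snape"),
]


def build_adjacency(edges):
    """Map each node to the set of its neighbours."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def average_degree(adjacency):
    """Mean number of edges attached to a node."""
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)


def density(adjacency):
    """Fraction of all possible node pairs that are actually connected."""
    n = len(adjacency)
    edge_count = sum(len(nbrs) for nbrs in adjacency.values()) / 2
    return edge_count / (n * (n - 1) / 2)


def shortest_path_lengths(adjacency, start):
    """Breadth-first search: hop count from start to every reachable node."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def diameter(adjacency):
    """Longest shortest path over all node pairs (assumes a connected graph)."""
    return max(
        max(shortest_path_lengths(adjacency, node).values()) for node in adjacency
    )


graph = build_adjacency(EDGES)
print("average degree:", average_degree(graph))
print("density:", density(graph))
print("diameter:", diameter(graph))
```

In this toy graph the diameter is set by the path from Ron (or Hermione) to Snape, which has to pass through Harry and Draco.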

In this network, each node represents a character, and each color represents the house to which they belong. The edges represent who talks to whom, with heavier edges between characters who have talked to each other the most times.
Initially, I used Gephi’s ForceAtlas 2 layout and spread the nodes apart to make it simpler to understand who was talking to whom. It was clear that Harry, Hermione, Ron, and Voldemort are the most prominent characters, talking with practically everyone, whereas most other characters do not speak with individuals from other houses. These are the four characters that connect all the different houses together. It is also telling to see how many heavily weighted edges are directed toward Harry compared to the edges directed out from his node: he is talked to more than he talks to others. This could add to a character analysis of Harry if a scholar were to go back to the text and analyze those conversations. The network would also be interesting with self-loops, in which an edge coming out of Harry Potter, for example, would loop back in on itself. Since Harry talks to himself often, it would be telling to see these kinds of edges as well. This would require changing the dataset to include conversations that a character has with himself or herself.

Improving the layout of the visualization

I created a Gephi network using indirect conversations as the edges and characters as the nodes, organized by house. I used the Yifan Hu Proportional and Fruchterman Reingold layouts because I thought these force-directed models would produce a particularly interesting graph while reducing its visual complexity. The Ravenclaw and Hufflepuff characters are clearly vastly outnumbered by the Gryffindor and Slytherin characters. I would also note that the sizes of Harry’s, Ron’s, and Hermione’s nodes are fairly similar, which is interesting considering that Harry is much more of a central character, being the one Voldemort wants to take revenge upon. This might also mean that they act more as pawns in the conflict between the characters.

At this point, I needed to look up how to zoom into the layout and whether there was a way to move the nodes that had ended up extremely far out. Because I had been warned that the program has a tendency to crash, I made sure to save every 10 minutes and kept all other software closed.

Playing around with colors and weights

I created a Gephi network with nodes representing characters, directed edges indicating direct conversations, and colors signifying houses (Gryffindor, Hufflepuff, Ravenclaw, Slytherin). For the characters that couldn’t be assigned to one of the Hogwarts houses, like those at Beauxbatons and Durmstrang, I picked their school colors to represent them. The remaining characters, whose background is unclear or who are Muggles, house-elves, or animals, have not been given a distinct color. Although the circular layout algorithm made the layout a lot better, there was still plenty of room for improvement, as the visualization was still quite difficult to understand. To remedy this, I ran the Expansion algorithm several times, with a scale factor of 2x each time, until the nodes were nicely spread out without making the whole visualization too large. Finally, I also ran the Label Adjust algorithm to make sure there was enough room for each node’s label without any overlap.
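The effect of the repeated expansion can be pictured with a short sketch. It assumes that expanding a layout simply scales every node's position away from the centre of the layout by the given factor. This is an illustration of the idea, not Gephi's actual implementation, and the coordinates are made up.

```python
def expand(positions, factor=2.0):
    """Scale node positions away from their centroid by `factor`."""
    count = len(positions)
    centre_x = sum(x for x, _ in positions.values()) / count
    centre_y = sum(y for _, y in positions.values()) / count
    return {
        name: (centre_x + (x - centre_x) * factor, centre_y + (y - centre_y) * factor)
        for name, (x, y) in positions.items()
    }


# Hypothetical starting coordinates for three nodes.
layout = {"Harry": (0.0, 0.0), "Ron": (2.0, 0.0), "Hermione": (1.0, 3.0)}

once = expand(layout)           # distances between nodes double
twice = expand(expand(layout))  # running it again doubles them once more

print(once)
print(twice)
```

Each pass doubles the distance between every pair of nodes while keeping the layout centred in the same place, which is why a few passes are enough and too many make the picture unwieldy.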

Results & Analysis

Gephi networks cannot give us definitive conclusions, but they can help raise more questions about the text that we are using. Unexpected results can provoke a new question about the text that we would never have asked before: for instance, is Sirius more of a central character in Harry Potter than a reader would originally think? If so, how is that done, and how is it hidden in the text and revealed in the network visualization? I analyzed the degree centrality of each node to explore who the most integral characters were, and to help recognize which roles led the major characters to obtain greater levels of centrality. As centrality is used to establish how significant specific nodes are, it was important to figure out what exactly is meant by impact in this case; degree centrality measures how many direct connections a character has. Based on the data, the characters with the highest degree centrality were the main protagonists of the narrative. Harry Potter had the highest centrality, followed by his best friends Ron Weasley and Hermione Granger, then Albus Dumbledore, the headmaster, and finally He Who Must Not Be Named: Voldemort.
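For readers who want to see how such a ranking is produced, the sketch below computes degree centrality (the share of other characters a node is directly connected to) and weighted degree (the total number of interactions) in plain Python. The edge list and its weights are invented purely for illustration; they are not the values from my dataset, and Gephi's own statistics may be normalized differently.

```python
from collections import defaultdict

# Hypothetical weighted edges: (character, character, number of interactions).
EDGES = [
    ("Harry", "Ron", 10),
    ("Harry", "Hermione", 9),
    ("Harry", "Dumbledore", 4),
    ("Harry", "Voldemort", 3),
    ("Ron", "Hermione", 8),
    ("Dumbledore", "Voldemort", 1),
]


def degree_centrality(edges):
    """Number of distinct neighbours divided by the number of other nodes."""
    neighbours = defaultdict(set)
    for a, b, _ in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    others = len(neighbours) - 1
    return {node: len(nbrs) / others for node, nbrs in neighbours.items()}


def weighted_degree(edges):
    """Sum of the weights of all edges touching each node."""
    totals = defaultdict(int)
    for a, b, weight in edges:
        totals[a] += weight
        totals[b] += weight
    return dict(totals)


centrality = degree_centrality(EDGES)
weights = weighted_degree(EDGES)

# Rank by weighted degree, highest first.
ranking = sorted(weights, key=lambda name: -weights[name])
for name in ranking:
    print(f"{name}: centrality {centrality[name]:.2f}, weighted degree {weights[name]}")
```

In this toy example several characters tie on plain degree centrality, which is one reason the weighted degree can be a more informative way to size or rank nodes.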

Reflection

There were times when I wanted the infographics to just magically appear from waving a wand and casting a spell. Though transforming information such as character details, locations, plot points, and other interesting tidbits into numerical data for Gephi is time-consuming and tedious, it means that we have control over exactly what data we are collecting and what we really want to see in the visualization. And though the networks cannot give us definitive answers, since the text has been reduced in this way, they make it easier to see what we cannot see just by reading a text. I found it far more engaging to work with a dataset that was concentrated on a subject I was quite knowledgeable about. My network enabled me to notice connections beyond the obvious ones depicted in the books and movies.

Potential next steps for this investigation would be to compare the Harry Potter story to another well-known story to examine the parallels and differences. There is also a distinction between how the characters are depicted in the novels and how they are shown in the movies. I believe it would be incredibly intriguing to investigate that distinction and even see how the network progresses through the books and movies, which would go a step further than this effort to gain an overall picture of the characters’ connections.

Information Visualization

Student work at the School of Information, Pratt Institute