Digital humanities has brought on a wave of network analyses of popular books and movies. When applied to fiction, these analyses can show fans how the characters they love connect and cluster together. I decided to visualize characters in The Lord of the Rings, one of my favorite bodies of work, using data from the books assembled by José Calvo Tello. Being more familiar with the movies than the books, I was curious to see how the character network aligned with my expectations.
I first explored other analyses of characters in works of fiction, including Marvel superheroes and Star Wars characters. The analysis of Star Wars by Evelina Gabasova was helpful in showing the value of a network analysis for a story with a manageable number of characters, although I found myself wishing for more labeling of the minor characters in the network visualization:
I then looked at this analysis of Lord of the Rings by Alon Cohen. The data represents links in the Lord of the Rings wiki (each edge is a link from one character’s wiki page to another):
While that analysis is valuable, I was more interested in how the characters connect in the source text. I then found the Lord of the Rings analysis by José Calvo Tello, which used text analysis to create a table of characters (nodes) and a table of connections (edges), where an edge is recorded when two characters appear in the same paragraph. The nodes table also includes data such as character species, which makes the resulting network particularly interesting:
To visualize the data, I used Gephi, a popular and powerful tool for network analysis and visualization. The data was provided as a nodes table and an edges table, and after a minor edit to the column headers, it imported into Gephi easily.
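To make the idea of paragraph co-occurrence concrete, here is a minimal sketch of how an edges table of this kind could be derived from raw text. It is not the original author's pipeline: the alias table, the sample text, and the function names are all invented for illustration, and the `Source`, `Target`, and `Weight` headers are my assumption about what a Gephi-ready edges file looks like.

```python
import csv
import io
import re
from collections import Counter
from itertools import combinations

# Hypothetical alias table: each surface form maps to one canonical name.
ALIASES = {
    "Frodo": "Frodo",
    "Sam": "Sam",
    "Samwise": "Sam",
    "Gandalf": "Gandalf",
    "Mithrandir": "Gandalf",
}


def cooccurrence_edges(text, aliases):
    """Count, for each pair of characters, the paragraphs that name both."""
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, aliases)) + r")\b")
    weights = Counter()
    # Treat blank lines as paragraph boundaries.
    for paragraph in re.split(r"\n\s*\n", text):
        present = {aliases[name] for name in pattern.findall(paragraph)}
        # Sorting gives each undirected pair a single, stable key.
        for a, b in combinations(sorted(present), 2):
            weights[(a, b)] += 1
    return weights


def edges_to_csv(weights):
    """Serialize the edge weights as CSV text with assumed Gephi-style headers."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (a, b), weight in sorted(weights.items()):
        writer.writerow([a, b, weight])
    return buf.getvalue()


# Invented sample text, three paragraphs long.
sample = (
    "Frodo looked at Sam. Gandalf said nothing.\n\n"
    "Samwise followed Frodo up the stair.\n\n"
    "Mithrandir rode alone."
)
edges = cooccurrence_edges(sample, ALIASES)
print(edges_to_csv(edges))
```

Note that a paragraph naming only one character contributes no edge, and that a character referred to only by a pronoun or epithet is invisible to this approach.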
First, I added labels:
This showed that the nodes included more than just characters. I then filtered to only nodes with type “person”, omitting groups and places:
I then experimented with layout, and found that I needed to adjust the repulsion and attraction strength:
I eventually settled on the ForceAtlas layout, with repulsion and attraction set to space the characters in a readable format:
I then experimented with size, opting to size by the frequency of that character’s appearances, with a minimum size so the smallest nodes would be visible. Finally, I applied color, experimenting with modularity and the attributes in the nodes table, and added a thematic font for the labels.
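The filtering and sizing steps were done through Gephi's interface, but the logic behind them can be sketched in a few lines. This is only an illustration under assumptions: the field names `type` and `freq`, the frequency values, and the size range are all made up, and Gephi's own ranking may interpolate differently.

```python
# Hypothetical node records; field names and frequencies are invented.
nodes = [
    {"id": "Frodo", "type": "person", "freq": 2000},
    {"id": "Galadriel", "type": "person", "freq": 80},
    {"id": "Elves", "type": "group", "freq": 500},
    {"id": "Mordor", "type": "place", "freq": 300},
]


def keep_persons(nodes):
    """Drop groups and places, keeping only nodes typed as 'person'."""
    return [n for n in nodes if n["type"] == "person"]


def scale_sizes(nodes, min_size=10.0, max_size=60.0):
    """Linearly map each node's frequency onto [min_size, max_size]."""
    freqs = [n["freq"] for n in nodes]
    lo, hi = min(freqs), max(freqs)
    span = (hi - lo) or 1  # avoid dividing by zero when all frequencies match
    return {
        n["id"]: min_size + (n["freq"] - lo) / span * (max_size - min_size)
        for n in nodes
    }


persons = keep_persons(nodes)
sizes = scale_sizes(persons)
print(sizes)
```

The minimum size acts as a floor, so the least frequent character is still drawn large enough to see and label.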
This resulted in three graphs. First, the character network using a single color:
This clearly shows the main characters as the largest and most central nodes, with Frodo as the largest. The nine characters of the Fellowship (Frodo, Sam, Merry, Pippin, Gandalf, Aragorn, Boromir, Legolas, Gimli) are generally the largest and most central, although others like Bilbo, Elrond, Sauron, and Saruman are also near the center.
The average degree is 20.9 (meaning each character is connected to about 21 other characters on average). Frodo, Gandalf, and Aragorn have the highest degree (38 each) and the highest closeness centrality (0.91). Frodo has by far the highest betweenness centrality (84.7), followed by Aragorn (44.8).
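For readers unfamiliar with these measures, the sketch below computes them on a tiny invented graph using only the Python standard library. The edges are made up and do not come from the dataset, and Gephi's normalization options may produce values on a different scale. Degree counts a node's neighbors, closeness here is the number of reachable nodes divided by the sum of shortest-path distances to them, and betweenness counts how many shortest paths between other pairs pass through a node (computed with Brandes' algorithm, unnormalized).

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def average_degree(adj):
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


def closeness(adj, source):
    """Reachable node count divided by total shortest-path distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies from the farthest nodes back toward s.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints.
    return {v: c / 2 for v, c in score.items()}


# Invented toy edges, not taken from the dataset.
toy_edges = [
    ("Frodo", "Sam"),
    ("Frodo", "Gandalf"),
    ("Frodo", "Aragorn"),
    ("Gandalf", "Aragorn"),
    ("Aragorn", "Boromir"),
]
adj = build_adjacency(toy_edges)
bc = betweenness(adj)
print(average_degree(adj), closeness(adj, "Frodo"), bc)
```

In this toy graph Frodo and Aragorn act as bridges to Sam and Boromir respectively, which is why they are the only nodes with nonzero betweenness.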
We also see minor characters clustered together as we would expect, such as the dwarven leaders in the bottom right and the leaders of Rohan in the top right.
When we color by species, as the original author did, we see that characters generally connect most closely with characters of the same species, aligning with how Middle Earth society is structured:
The hobbits (purple) are at the center, while the humans (green) and dwarves (orange) form distinct communities and the elves (blue) are more intermingled. The closest relationship between a dwarf and an elf is that between Gimli and Legolas, as we would expect. We also see that only one orc character is included (Gorbag).
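The visual impression that characters connect mostly within their own species could also be checked numerically. One simple way, sketched below with invented data, is to compute the share of edges that join two characters with the same attribute value. The species assignments and edges here are a toy example rather than the dataset, and the function name is hypothetical.

```python
def same_attribute_share(edges, attribute):
    """Fraction of edges whose two endpoints share the same attribute value."""
    same = sum(1 for a, b in edges if attribute[a] == attribute[b])
    return same / len(edges)


# Invented toy data for illustration only.
species = {
    "Frodo": "hobbit",
    "Sam": "hobbit",
    "Legolas": "elf",
    "Elrond": "elf",
    "Gimli": "dwarf",
    "Gloin": "dwarf",
}
species_edges = [
    ("Frodo", "Sam"),
    ("Legolas", "Elrond"),
    ("Gimli", "Gloin"),
    ("Legolas", "Gimli"),
    ("Frodo", "Legolas"),
]
share = same_attribute_share(species_edges, species)
print(share)
```

A high share on its own is only suggestive, since a species with many characters would produce many same-species edges even if connections were random. Comparing against a shuffled assignment of species would give a fairer baseline.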
Coloring by gender also provides insights:
We see how few female (green) characters there are, as we would expect, and that they are generally minor characters. Node size does not always correspond to a character’s importance in the world: Galadriel, one of the most powerful beings in Middle Earth, is one of the smallest nodes.
These graphs generally align with what one might guess based on knowledge of the movies or books. As someone more familiar with the movies, I found no major surprises, but the graphs do show how many characters were cut in the film adaptation.
One major limitation is that the data only includes co-appearances of character names in the same paragraph. While the author has accounted for different versions of names, there is no accounting for pronouns or other cases where a character is referred to without their name (e.g. “the wizard” instead of “Gandalf”). This could bias the data against the most frequently mentioned characters, who might be more likely to be referenced this way. Another limitation is using paragraphs as the unit of analysis, since paragraphs vary in length and two characters named in the same paragraph do not necessarily interact. An interesting direction for future analysis would be to use the movies’ scripts to see how often characters reference one another.