A Musical Network Analysis of Les Mis



Les Miserables is a novel about a series of characters reckoning with the meaning of law and justice in 19th-century France. It was later adapted into a highly acclaimed musical that holds the distinction of being the second-longest-running show in London's West End. The work is meaningful to me: when I was 18, I played Gang Member #3 in a community production of Les Miserables. In addition to being part of the cast, I assisted with the musical direction of the production and became intimately familiar with the score.

A unique characteristic of both the Les Mis novel and the musical is the large cast and the high level of interaction among characters. The work spans multiple settings, time periods and plot lines that result in a complex, interconnected story. In an effort to become comfortable with network analysis, I decided to conduct a two-pronged network analysis of the characters of Les Mis to understand how they interact both in the book and musically.


After scouring multiple network analytics reports, I was inspired by the MAA's 'Network of Thrones' social network analysis of Game of Thrones (GoT). The authors focus on the book A Storm of Swords by George R.R. Martin because its characters are scattered throughout the world and, given the storytelling style, fall into distinct clusters. Nonetheless, the analysis still gives a full picture of a magical universe and how its character relationships shape the story.

Additionally, a former classmate's Les Miserables network analysis inspired me: they used a very similar dataset and made a compelling network visualization of all of the characters in the book, focused on character co-appearance. It did not capture any of the musical aspects of the work, which I hope to represent in my analysis.



Dataset Preparation

After searching through a variety of network datasets (including multiple Les Mis-specific datasets), I settled on one from GitHub that was organized in a tidy format and easy to download. I downloaded two files: a complete node file with all of the character information, and an edge file listing the character relationships and their weights (strength of interaction).

Complete LesMis node and edge file from GitHub

Before pulling my source files into Gephi, I added a column to my nodes .csv file: voice part. Using the Les Mis Casting Breakdown document, I tagged the characters with the appropriate voice part based on the indicated range. (For the purpose of this exercise, I used five basic choral voice parts: Soprano, Alto, Tenor, Baritone and Bass.)

Added Voice Part column to node file in Microsoft Excel

Voice part served two purposes. First, it identified the type of voice each Les Miserables character possessed, an additional piece of metadata for analysis. Second, it identified the characters for which no voice part was listed or observed (155 out of 181), which suggests that they were either not included in the musical or did not have a singing role. (Fun Fact: Based on this dataset, only 14% of Les Miserables characters from the novel have featured singing roles within the musical.) With the addition of one very simple column, the Les Mis character network gained a musical overlay that enabled an additional lens of network analysis.
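The tagging step and the 14% figure can be reproduced with a short script. This is a minimal sketch, assuming a node file with `Id` and `Label` columns and a hand-built `VOICE_PARTS` dictionary; both the column names and the dictionary entries are hypothetical placeholders, not the real file layout or the full casting breakdown.

```python
import csv
import io

# Illustrative placeholder mapping (hypothetical, not copied from the casting
# breakdown): character label -> voice part. Characters missing from the map
# are treated as having no singing role.
VOICE_PARTS = {
    "Valjean": "Tenor",
    "Javert": "Baritone",
    "Fantine": "Soprano",
    "Eponine": "Alto",
}


def tag_voice_parts(node_csv: str) -> list[dict]:
    """Read node rows and add a 'VoicePart' column ('' means no singing role)."""
    rows = list(csv.DictReader(io.StringIO(node_csv)))
    for row in rows:
        row["VoicePart"] = VOICE_PARTS.get(row["Label"], "")
    return rows


def singing_share(rows: list[dict]) -> float:
    """Fraction of characters that received a voice part."""
    singing = sum(1 for row in rows if row["VoicePart"])
    return singing / len(rows)
```

With the real node file, 26 tagged characters out of 181 gives 26 / 181 ≈ 0.144, which rounds to the 14% quoted above.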


Gephi Import

Once the dataset was fully prepared, I imported it into Gephi. This was a relatively simple process, as both my node and edge files were cleanly formatted .csv documents, which Gephi can import natively. When importing the edge file, I marked the edges as undirected, since direction is not relevant to character relationships. Although the import was straightforward, the resulting visualization was effectively unusable and required significant design uplift.

Auto-generated Gephi Network Visualization of Les Mis Dataset
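Gephi's spreadsheet importer recognizes column names such as `Id` and `Label` for nodes and `Source`, `Target`, `Type` and `Weight` for edges, so one option is to record the undirected choice in the edge file itself rather than only in the import dialog. A minimal sketch, assuming the relationships are available as `(source, target, weight)` tuples; the `EDGES` sample and its weights are hypothetical.

```python
import csv
import io

# Hypothetical sample of weighted relationships: (source, target, weight).
# Source/Target values must match the Id column of the node file.
EDGES = [
    ("Valjean", "Javert", 17),
    ("Valjean", "Cosette", 31),
    ("Marius", "Cosette", 21),
]


def gephi_edge_csv(edges) -> str:
    """Serialize edges with Gephi's edge-table column names, all marked undirected."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Type", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, "Undirected", weight])
    return buffer.getvalue()


if __name__ == "__main__":
    print(gephi_edge_csv(EDGES), end="")
```

Writing the returned string to a `.csv` file gives a file that can be loaded through Gephi's spreadsheet import.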

Design Uplift

I made multiple changes to the network visualization in order to make it both aesthetically pleasing and visually informative.

  • Layout: For my primary network graph, I selected a Fruchterman-Reingold layout, as it made very clear which characters were closely related to each other (strong weight) as well as which were central to the story (or musical).
  • Color: Given my focus on music as the key variable of interest, I used color to indicate the five voice parts. Black indicated a null value for voice part, meaning that the character (or node) did not have a singing role in the musical.
  • Sizing: I adjusted both the label sizes and the overall network size to make the visual legible while still conveying the vastness of the network.
  • Filters / Partitions: I created filters for each of the voice parts as well as a singing / non-singing filter in order to look at different variations of the network based on my musical variable.
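The voice-part and singing / non-singing filters amount to taking an induced subgraph: keep the nodes that pass an attribute test and every edge whose two endpoints both survive. Gephi does this through its Filters panel; the sketch below shows the same idea in plain Python, assuming a hypothetical dictionary-based graph whose attributes and weights are illustrative placeholders.

```python
# Hypothetical node attributes: character -> voice part ("" = no singing role).
VOICE = {
    "Valjean": "Tenor",
    "Javert": "Baritone",
    "Cosette": "Soprano",
    "Fauchelevent": "",
    "Gillenormand": "",
}

# Hypothetical undirected, weighted edges keyed by (u, v).
EDGES = {
    ("Valjean", "Javert"): 17,
    ("Valjean", "Cosette"): 31,
    ("Valjean", "Fauchelevent"): 8,
    ("Fauchelevent", "Cosette"): 2,
    ("Gillenormand", "Cosette"): 3,
}


def induced_subgraph(voice, edges, keep):
    """Keep nodes whose voice part passes `keep`, plus edges between kept nodes."""
    nodes = {node for node, part in voice.items() if keep(part)}
    kept = {(u, v): w for (u, v), w in edges.items() if u in nodes and v in nodes}
    return nodes, kept


singing = induced_subgraph(VOICE, EDGES, lambda part: part != "")
non_singing = induced_subgraph(VOICE, EDGES, lambda part: part == "")
tenors = induced_subgraph(VOICE, EDGES, lambda part: part == "Tenor")
```

Edges that cross the filter boundary (for example singing to non-singing) disappear from both subgraphs, which is why a filtered view can look much sparser than the full network.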


After building my network and running some network analytics, I identified several interesting musical findings.

  1. Jean Valjean is unequivocally the most important node in the network, in both the book and the musical. As the maps below show, Jean Valjean consistently has the highest degree and the strongest relationships in the network. He is the protagonist of the story and often the narrative thread that links together the work's multiple plot lines.

  2. From a musical perspective, Les Mis is a male-dominated show, in both the number of nodes and their degree. Only 19% (5 / 26) of the singing characters are women (Soprano / Alto). The average degree for characters with male voice parts is about 3.6x that of characters with female voice parts (average degree for Soprano / Alto = 2.4; average degree for Baritone / Bass / Tenor = 8.6).

  3. Tenors and Basses tend to hold more important roles (higher degree and more central positions) than Baritones. Although Baritone is the most common voice part, these roles tend to be less important and to cluster together. In the musical, the Baritones are mostly ensemble members who are either revolutionaries (the left blue cluster pictured below) or gang members (the bottom blue cluster pictured below).

  4. The musical characters literally run the show: filtering them out of the Les Mis network leaves a hole at its center. Statistically, the non-musical network has a diameter of 5 and a density of .01, while the network including the musical characters has a diameter of 7 and a density of .03.
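The statistics behind findings 2 and 4 follow standard definitions: for an undirected graph with n nodes and m edges, density is 2m / (n(n - 1)), and diameter is the longest shortest-path distance between any two reachable nodes. The sketch below computes them with breadth-first search on a hypothetical toy network (letters stand in for characters). Degree here is the unweighted neighbour count; Gephi also offers a weighted degree, and which one produced the averages above is not stated, so treat this as an illustration of the definitions rather than a reproduction of the numbers.

```python
from collections import deque

# Hypothetical toy network (letters stand in for characters).
VOICE = {"A": "Tenor", "B": "Soprano", "C": "Bass", "D": "", "E": ""}
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]

MALE_PARTS = {"Tenor", "Baritone", "Bass"}
FEMALE_PARTS = {"Soprano", "Alto"}


def voice_group(part):
    """Map a voice part to the male / female / none grouping used in the findings."""
    if part in MALE_PARTS:
        return "male"
    if part in FEMALE_PARTS:
        return "female"
    return "none"


def adjacency(nodes, edges):
    """Undirected adjacency sets; edges touching nodes outside `nodes` are dropped."""
    adj = {node: set() for node in nodes}
    for u, v in edges:
        if u in adj and v in adj:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def density(adj):
    """Undirected density: 2m / (n(n - 1))."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 2 * m / (n * (n - 1)) if n > 1 else 0.0


def eccentricity(adj, source):
    """Largest BFS distance from `source` to any node it can reach."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return max(dist.values())


def diameter(adj):
    """Longest shortest path over reachable pairs; unreachable pairs are skipped."""
    return max((eccentricity(adj, node) for node in adj), default=0)


def average_degree_by_group(adj, group_of):
    """Average number of neighbours per group label returned by `group_of(node)`."""
    totals, counts = {}, {}
    for node, nbrs in adj.items():
        group = group_of(node)
        totals[group] = totals.get(group, 0) + len(nbrs)
        counts[group] = counts.get(group, 0) + 1
    return {group: totals[group] / counts[group] for group in totals}


full = adjacency(VOICE, EDGES)
non_musical = adjacency([node for node, part in VOICE.items() if part == ""], EDGES)
```

As a check on the ratio quoted in finding 2, 8.6 / 2.4 ≈ 3.58, which rounds to 3.6x.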


There are additional musical and non-musical data points that could be captured in the dataset to support a more complete network analysis. Character gender would be a valuable addition, since there are ~4x as many singing roles with male voice parts (tenor / baritone / bass) as with female voice parts (alto / soprano), which suggests that, musically, Les Mis is a male-dominated show. If we broke down all 181 characters in the novel by gender, I am curious whether we would see the same male dominance in the text. This data point would also let us see whether network clusters tend to be mixed-gender or single-gender.

Additionally, it would be interesting to create a musical-specific Les Mis edge file based on stage co-appearance by scene or on co-singing. The musical network map might mirror that of the novel or differ based on character prominence, particularly in edge weight. A great example is the pair of Gavroche and Thenardier. In the book, Gavroche is the oldest son of Thenardier, and they have multiple interactions amid a strained relationship. In the musical, however, Gavroche's parentage goes unexplored; he is a street urchin who roams the slums of Paris, and the two have very little (if any) meaningful stage interaction. With a music-based edge file, the weight of their edge would diminish significantly, if not disappear outright.
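A scene-based edge file like the one proposed above could be generated by counting, for each pair of characters, how many scenes they share. A minimal sketch, assuming a hypothetical `SCENES` mapping of scene name to characters on stage; the scene lists are illustrative placeholders, not a transcription of the score, and the counts would serve as edge weights.

```python
from collections import Counter
from itertools import combinations

# Illustrative placeholder data: scene -> characters on stage (not from the score).
SCENES = {
    "Scene A": ["Valjean", "Javert", "Thenardier"],
    "Scene B": ["Gavroche", "Enjolras", "Marius"],
    "Scene C": ["Valjean", "Marius", "Enjolras"],
}


def co_appearance_weights(scenes):
    """Count the scenes shared by each unordered pair of characters."""
    weights = Counter()
    for cast in scenes.values():
        # Sorting gives each pair one canonical (u, v) key; set() ignores duplicates.
        for u, v in combinations(sorted(set(cast)), 2):
            weights[(u, v)] += 1
    return weights


weights = co_appearance_weights(SCENES)
```

Pairs that never share a scene get no entry at all, which is how an edge such as Gavroche / Thenardier could vanish from a musical-based edge file.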

From a more tactical perspective, if I were to repeat this sort of network analysis, I would pay more attention to my use of visual attributes (color / label / thickness) as well as to my dataset size. Another classmate had chosen a color scheme based on the musical's poster, a clever nod to the origins of the source content. I mimicked this for my pie chart but likely should have used a similar approach throughout the graph. For the voice parts specifically, giving the male and female voice parts distinct hues, with varying shades for the subgroups (e.g., soprano / alto), would have made the chart more informative. In terms of dataset size, had I removed some of the low-degree characters, I might have ended up with a map that was more visually digestible and made clusters more apparent. While this large dataset was the most authentic to the text, it may have inflated the degree of certain nodes and led me to focus on clusters that were not as interesting as I initially perceived. All in all, this was a fascinating exercise that further expanded my appreciation for the great work that is Les Mis.
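The idea of removing low-degree characters could be tested before touching the visualization. A minimal sketch, assuming a plain edge list and a hypothetical `min_degree` cutoff; it is a single-pass degree threshold (repeating it until nothing changes would give the k-core instead), and Gephi's Degree Range filter serves a similar purpose inside the tool.

```python
from collections import Counter


def prune_low_degree(edges, min_degree):
    """Drop nodes whose degree in the full graph is below `min_degree` (one pass).

    Nodes with degree 0 never appear in an edge list, so they are dropped implicitly.
    """
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    keep = {node for node, d in degree.items() if d >= min_degree}
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Hypothetical toy edge list.
EDGES = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
pruned = prune_low_degree(EDGES, min_degree=2)
```

In the toy data, D passes the cutoff with degree 2 but keeps only one edge once E is removed, which is where a one-pass threshold and a k-core would disagree.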