While browsing datasets for this assignment, I knew that I wanted to focus on a social network and investigate the relationships between people. On the Gephi wiki, I found a dataset of the characters of the novel Les Miserables and their co-appearances. It contains 77 nodes (here: characters) and 254 edges (relations between them). The topic interested me: I like French literature, but I had never read this particular novel. I thought that, while working on this assignment, I could combine pleasure with the opportunity to learn something new. I wanted to investigate the general relationships between the characters, with a focus on Fantine and Cosette (mother and daughter).


Because the Les Miserables dataset is very popular, I knew that I would not be the first person to work on it. I particularly liked the work of another Pratt student. In that visualization, the node groups are nicely separated and it is easy to see how they are interconnected. I recreated this graph and ran into the same issue: the nodes of characters who appear less frequently are barely visible.
I also looked at Mithun Sridharan's work, Social Network Analysis of Les Miserables, but I found the chosen layout difficult to read: the groups were not clearly separated and the edges were hard to follow. I would also like to briefly mention the work of Mike Bostock, which is both visually appealing and informative.


For my work, I used a sample dataset provided with Gephi. It contains the same nodes and edges as the dataset that I had downloaded from the Gephi wiki and initially planned to use. There was only one difference, and it influenced my choice: in the sample dataset, the network is undirected. I analyzed the data in Gephi 0.9.2.


Before creating the visualizations, I adjusted the data in the Data Laboratory by separating title prefixes from the last names (for example, Mme in Mme Thenardier). Afterward, I ran all the statistics and applied the ForceAtlas 2 layout with the Prevent Overlap option enabled.
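For readers who want to repeat the prefix cleanup outside Gephi, here is a minimal Python sketch. It assumes that the raw labels glue the prefix to the surname (for example `MmeThenardier`) and that the prefix list `PREFIXES` is complete; both are assumptions, and the actual edit in this project was made in Gephi's Data Laboratory, not with this code.

```python
import re

# Hypothetical list of title prefixes; extend it if the labels use others.
PREFIXES = ("Mme", "Mlle", "Lt")

# A known prefix at the start of the label, directly followed by an
# uppercase letter (the start of the surname), e.g. "MmeThenardier".
_PREFIX_RE = re.compile(r"^(%s)(?=[A-Z])" % "|".join(map(re.escape, PREFIXES)))


def split_prefix(label: str) -> str:
    """Insert a space between a title prefix and the surname, if present."""
    return _PREFIX_RE.sub(r"\1 ", label)


if __name__ == "__main__":
    for raw in ["MmeThenardier", "MlleGillenormand", "Valjean"]:
        print(raw, "->", split_prefix(raw))
    # MmeThenardier -> Mme Thenardier
    # MlleGillenormand -> Mlle Gillenormand
    # Valjean -> Valjean
```

Labels without a known prefix, or labels that already contain the space, are returned unchanged.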

Visualization 1: Character groups in Les Miserables. ForceAtlas 2 layout.

In this graph, I used the ForceAtlas 2 layout. Additionally, I ranked the node size by degree to show which characters (nodes) co-appear with the largest number of other characters in the book. The characters were divided into 6 groups, and I used a different color for each group. I did not adjust the palette, because the colors only need to distinguish the groups and carry no further meaning. In the preview, I set the node opacity to 50%, which kept the labels visible. I also reduced the thickness of the edges, which had previously dominated the image and made it hard to read.
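Ranking node size by degree can be sketched as follows: count each node's edges, then map the counts linearly onto a size range. This is an illustration under assumptions, not Gephi's code. The linear mapping, the size bounds `min_size` and `max_size`, and the toy edge list (placeholder nodes A to D, not the Les Miserables data) are all hypothetical.

```python
from collections import Counter


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


def rank_sizes(deg, min_size=10.0, max_size=50.0):
    """Map each node's degree linearly onto [min_size, max_size]."""
    lo, hi = min(deg.values()), max(deg.values())
    span = hi - lo
    return {
        # If every node has the same degree, give them all the minimum size.
        node: min_size if span == 0 else min_size + (d - lo) / span * (max_size - min_size)
        for node, d in deg.items()
    }


if __name__ == "__main__":
    # Placeholder edges, not taken from the dataset.
    toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C")]
    print(rank_sizes(degrees(toy_edges)))
    # {'A': 50.0, 'B': 30.0, 'C': 30.0, 'D': 10.0}
```

In a simple undirected graph without duplicate edges, this degree equals the number of distinct characters a node co-appears with.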


In this visualization, I displayed 6 groups. The biggest one has Valjean (the main character of the novel) at its center and contains 32.47% of the nodes (25 of the 77 characters). The next biggest group is gathered around Gavroche (a child who is important to the revolutionary movement) and contains 22.08% of the nodes (17 characters). It is interesting that Marius links these 2 groups. Marius is the love interest of Cosette (Valjean's adopted daughter) and takes part in the failed revolution. Without him, Valjean would have left France with Cosette and would not have tried to fight the government. In that case, the Valjean and Gavroche groups would have stayed separate. In future research, it would be interesting to investigate these groups further.
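The two percentages above are consistent with each group's share of the 77 nodes: 25/77 ≈ 32.47% and 17/77 ≈ 22.08%. The sketch below reproduces that arithmetic. Only the group sizes (25 and 17 out of 77) come from the text; the node ids, the group names, and the size of the remaining "other" bucket are placeholders.

```python
from collections import Counter


def group_shares(partition):
    """Return each group's share of all nodes as a percentage (2 decimals).

    partition: dict mapping node -> group label (e.g. a modularity class).
    """
    counts = Counter(partition.values())
    total = len(partition)
    return {group: round(100 * n / total, 2) for group, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical partition of 77 placeholder nodes: 25 + 17 + 35.
    partition = {
        f"n{i}": "Valjean group" if i < 25 else "Gavroche group" if i < 42 else "other"
        for i in range(77)
    }
    shares = group_shares(partition)
    print(shares["Valjean group"], shares["Gavroche group"])  # 32.47 22.08
```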

Visualization 2: Character groups in Les Miserables. Fruchterman Reingold layout.

Using the same data as in the previous visualization, I applied the Fruchterman Reingold layout and, in the preview, increased the font size to make the labels more visible.


This visualization emphasizes the relationships between the groups rather than their size. The “green group”, which brings together the characters connected to the revolution, clearly has the strongest internal ties, and its members appear together most often. There are strong links between Valjean, his adopted daughter Cosette, her future husband Marius, Javert (the police inspector), Thenardier (Cosette's caretaker) and Fantine (Cosette's mother). Surprisingly, however, Cosette and Javert seldom appear together. Javert and Marius never appear together, even though they are two of the most important characters in the book. There is only a very weak link between Cosette and Eponine, even though they were raised by the same family. It also seems that Fantine never meets her daughter. There is no link between Gavroche and Javert. It would be extremely interesting to compare the book with its adaptations. For example, in the 2012 film adaptation, the death of Gavroche is one of the leading causes of Javert's suicide.

Visualization 3: Relationships between female characters.


I used the data from Visualization 1. In the Data Laboratory, in the Nodes table, I added a Gender column (F or M) based on my knowledge of the book and of French names. I used the ForceAtlas 2 layout and two filters: Edge Weight (greater than 1) and Partition (Gender): F.
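The two filters can be expressed as a single pass over the edge list: drop edges with weight 1 or less, then keep only edges whose two endpoints are both marked F. This minimal Python sketch assumes edges are `(source, target, weight)` tuples and that the Gender column is available as a dict; the toy nodes and weights are placeholders, not the Les Miserables data. Unlike Gephi, which may still show women left without any edge, the sketch reports only nodes that keep at least one connection.

```python
def female_subgraph(edges, gender, min_weight=1):
    """Keep edges heavier than min_weight whose two endpoints are both "F".

    edges: iterable of (source, target, weight) for an undirected graph.
    gender: dict mapping node -> "F" or "M" (the hand-added Gender column).
    Returns (connected nodes, kept edges).
    """
    kept = [
        (a, b, w)
        for a, b, w in edges
        if w > min_weight and gender.get(a) == "F" and gender.get(b) == "F"
    ]
    nodes = {n for a, b, _ in kept for n in (a, b)}
    return nodes, kept


if __name__ == "__main__":
    # Placeholder data, not taken from the dataset.
    edges = [("A", "B", 2), ("A", "C", 1), ("B", "D", 3), ("C", "E", 4)]
    gender = {"A": "F", "B": "F", "C": "F", "D": "M", "E": "F"}
    nodes, kept = female_subgraph(edges, gender)
    print(sorted(nodes))  # ['A', 'B', 'C', 'E']
    print(kept)           # [('A', 'B', 2), ('C', 'E', 4)]
```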


After filtering, I obtained a very simplified visualization with connections between only 10 nodes. This means that only 10 women appear together with another woman more than once in the novel. Mme Thenardier, who is responsible for Cosette (Fantine's daughter), links 3 different groups of nodes. This visualization shows no link between Cosette and Fantine, or between Eponine and Cosette, even though such links would seem natural (mother and daughter; foster sisters).

Visualization 4: Character groups in Les Miserables with the strongest edges. ForceAtlas 2 layout.


I filtered the data from Visualization 1 by edge weight, keeping only the edges with weights from 7.51 to 31.
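A range filter like this one keeps an edge only if its weight falls between the two bounds. Assuming the weights are whole-number co-appearance counts, a lower bound of 7.51 effectively keeps edges with weight 8 or more. The sketch below treats both bounds as inclusive, which is an assumption about Gephi's slider, and adds a count of strong edges per node, the kind of check that reveals a central character left without strong ties. The toy nodes and weights are placeholders.

```python
from collections import Counter


def weight_range(edges, low=7.51, high=31):
    """Keep (source, target, weight) edges with low <= weight <= high."""
    return [(a, b, w) for a, b, w in edges if low <= w <= high]


def strong_degree(kept):
    """Count how many of the kept (strong) edges touch each node."""
    counts = Counter()
    for a, b, _ in kept:
        counts[a] += 1
        counts[b] += 1
    return counts


if __name__ == "__main__":
    # Placeholder nodes and weights, not the Les Miserables data.
    edges = [("A", "B", 8), ("A", "C", 7), ("B", "C", 31), ("C", "D", 32)]
    kept = weight_range(edges)
    print(kept)                             # [('A', 'B', 8), ('B', 'C', 31)]
    print(strong_degree(kept).get("D", 0))  # 0: D has no strong edge left
```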


Strikingly, this visualization shows no strong connection between Gavroche and Valjean, even though each of them is at the center of his group. Moreover, Gavroche has no strong connection with the other members of his group; he just “knows” them all. Javert's pursuit of Valjean throughout his life is visible in the network. Javert changes Valjean's life, which affects the whole network, because Valjean has the largest number of connections to other characters. However, this influence is not visualized, and it looks as if the Javert node could be removed without harming the visualization. For future research, I would assign weights to the connections that reflect how important each relationship is to the plot, not only how often the characters appear together, to avoid this problem.

Limitations and future research

The Les Miserables dataset is rather small, as it contains only 77 nodes. However, I believe that it is very interesting. I could have placed more emphasis on the relationships between a few of the most important nodes. In the future, I would add further details to the dataset, which would allow me to filter the data in greater detail.