Inspiration
I went through several datasets online, including a jazz music network, a flight path network, and a character network of movies from IMDB. After reading through and experimenting with the “character network of movies from IMDB” dataset, I was able to join two of its tables into a single network table that lists pairs of characters who appear together in a movie, along with the movie’s rating score (Figure 2).
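The join described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the sample rows, the column layout, and the helper name `build_edge_table` are hypothetical, since the report does not show the actual IMDB tables.

```python
from itertools import combinations

# Hypothetical sample rows; the real IMDB tables are not reproduced in this report.
cast = [  # (movie_id, character)
    ("m1", "Alice"), ("m1", "Bob"), ("m1", "Carol"),
    ("m2", "Alice"), ("m2", "Bob"),
]
ratings = {"m1": 7.5, "m2": 6.0}  # movie_id -> rating score


def build_edge_table(cast_rows, rating_by_movie):
    """Join cast and ratings, emitting one row per character pair per movie."""
    characters_by_movie = {}
    for movie_id, character in cast_rows:
        characters_by_movie.setdefault(movie_id, set()).add(character)

    edges = []
    for movie_id, characters in sorted(characters_by_movie.items()):
        if movie_id not in rating_by_movie:
            continue  # inner join: skip movies that have no rating
        # Every unordered pair of characters in the same movie becomes an edge.
        for source, target in combinations(sorted(characters), 2):
            edges.append((source, target, movie_id, rating_by_movie[movie_id]))
    return edges


edge_table = build_edge_table(cast, ratings)
for row in edge_table:
    print(row)
```

Each resulting row has a source, a target, and the rating, which is the shape an edge table needs when it is saved as CSV and imported into a tool such as Gephi.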
After saving the joined graph as a CSV file and importing it into Gephi, I ran both the ForceAtlas2 and Fruchterman Reingold layouts to see how such a large dataset would look. The graph contains 1114 nodes and 11339 weighted, undirected edges connecting characters, with rating scores attached. I set the size and color of the nodes, and it took Gephi a while to process such a large network. However, no matter how I adjusted the degree range filter, the network remained too dense for a human reader to comprehend. In addition, in the “Preview” tab, Gephi displayed only the nodes rather than the full network. I suspect this was because the dataset is too large, but I was not able to find a solution to the issue in Gephi. In the end, I went back to RStudio and used some visualization code to create the network graph shown at the beginning of the report (Figure 1).
Hence, I decided to pick a smaller dataset that still forms a meaningful network and stays within the field of characters in movies or other artworks. After narrowing down the topic and the dataset of nodes and edges, I explored some existing graphs of Les Misérables, of which there are many. I found several that were interesting and that inspired me to create a simple but visually meaningful graph (Figure 4).
Introduction
This lab project explored the network of characters in Victor Hugo’s novel Les Misérables (1862). Hugo’s novel follows the lives and interactions of numerous characters in France leading up to the 1832 June Rebellion in Paris. Over the years, the novel has been adapted into various plays, dramas, musicals and more. I have watched the film “Les Mis” and the musical many times because I really like this masterpiece, so I am interested in understanding the overall social network of the characters and their connectivity. By analysing the Les Misérables dataset, the goal of this project was to clearly distinguish groups of characters based on their connectivity. This helps us understand the character relationships and identify the main characters, and it also provides a network that can be manipulated and explored further.
Materials
For this lab project, I downloaded the “Les Misérables” dataset, sourced from the Gephi GitHub open-source database. It comprises the character network and the weights of the connections between characters. It contains 77 nodes and 254 weighted, undirected edges.
The software used to create visualizations from this dataset was the network-mapping program Gephi. Gephi is an open-source visualization platform for “networks, complex systems, dynamic and hierarchical graphs, layout and metrics”. It uses a 3D render engine to display graphs in real time and speed up exploration. Working with it requires understanding the basic terminology and concepts of nodes and edges, source and target, and undirected and directed graphs. In creating my visualization I referenced the Gephi tutorials and the class network-mapping lecture. I also used the Adobe color wheel, found on the course LMS site, to help select the color palette of the visualization.
Working in Gephi
The “Les Misérables” dataset was downloaded as a GML file. As the dataset was clean and did not require any adjustments, I was able to load it into Gephi easily. Once in Gephi, I used “ForceAtlas2”, a standard force-directed layout, to give my network an arrangement and shape. I then ran several of Gephi’s Statistics measures (Average Degree, Network Diameter, Graph Density, Modularity, and Average Path Length) on my data so that the program could analyze the social network. One part worth mentioning is the degree range, which runs from 14 to 76 in this dataset. By dragging the slider to narrow the range, I can control the complexity and size of the displayed network. Another interesting element is Modularity, which divides the network into clusters and so indicates its larger and smaller communities, that is, the size of its neighborhoods. I first set the resolution to 1.0 and then tried 1.25 and 0.75 to compare the resulting clusters. After comparing the differences, I decided that the 0.75 resolution made more sense visually (Figure 6).
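To make these measures more concrete, here is a minimal sketch of how average degree, graph density, a degree range filter, and a modularity score can be computed for an undirected graph. The function names, the toy graph, and the `gamma` resolution parameter are assumptions made for illustration. The modularity formula is one common textbook formulation, and Gephi’s own resolution setting may be defined differently, so the values are not directly comparable with Gephi’s output.

```python
from collections import defaultdict


def degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def average_degree(n_nodes, n_edges):
    # Each undirected edge contributes to the degree of two nodes.
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    # Fraction of all possible undirected node pairs that are connected.
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def filter_by_degree(edges, low, high):
    """Keep only edges whose two endpoints both have a degree in [low, high]."""
    deg = degrees(edges)
    keep = {node for node, d in deg.items() if low <= d <= high}
    return [(u, v) for u, v in edges if u in keep and v in keep]


def modularity(edges, community_of, gamma=1.0):
    """Q = sum over communities of (L_c / m) - gamma * (d_c / 2m)^2 (assumed form)."""
    m = len(edges)
    internal = defaultdict(int)      # L_c: edges with both ends in community c
    total_degree = defaultdict(int)  # d_c: sum of degrees in community c
    for u, v in edges:
        if community_of[u] == community_of[v]:
            internal[community_of[u]] += 1
    for node, d in degrees(edges).items():
        total_degree[community_of[node]] += d
    return sum(internal[c] / m - gamma * (total_degree[c] / (2 * m)) ** 2
               for c in total_degree)


# Figures reported for the Les Misérables dataset: 77 nodes, 254 edges.
print(round(average_degree(77, 254), 3))  # 6.597
print(round(density(77, 254), 3))         # 0.087

# Toy graph (not from the dataset): two triangles joined by one edge.
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"),
             ("c", "d"), ("d", "e"), ("e", "f"), ("d", "f")]
toy_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(modularity(toy_edges, toy_groups))
print(filter_by_degree(toy_edges, 3, 3))
```

With 77 nodes and 254 edges, each character is connected to about 6.6 others on average, and roughly 9% of all possible character pairs are linked.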
Next, I adjusted the tuning, behavior and performance settings of the layout to give it its current shape. I then changed the appearance of my network’s nodes and edges by altering their colors and sizes. Within the Preview settings, I also adjusted the nodes’ label size and the thickness and curvature of the edges. The weight (thickness) of the edge between two nodes indicates how often those characters co-appear throughout the novel. Therefore, a thicker edge tells us that those characters appear together more often than characters joined by a thinner edge.
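As a rough illustration of how such edge weights can arise, the sketch below counts how many times each pair of characters appears in the same scene. The scene lists are invented for the example and are not taken from the novel or the dataset, and `coappearance_weights` is a hypothetical helper name.

```python
from collections import Counter
from itertools import combinations

# Invented scenes for illustration only; not taken from the dataset.
scenes = [
    ["Valjean", "Cosette"],
    ["Valjean", "Javert"],
    ["Valjean", "Cosette", "Marius"],
]


def coappearance_weights(scene_list):
    """Return a Counter mapping each character pair to its co-appearance count."""
    weights = Counter()
    for characters in scene_list:
        # Sort so that each undirected pair is always stored in the same order.
        for pair in combinations(sorted(set(characters)), 2):
            weights[pair] += 1
    return weights


weights = coappearance_weights(scenes)
for (source, target), weight in weights.most_common():
    print(source, target, weight)
```

The pair with the highest count would be drawn with the thickest edge.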
Creating the Network
After implementing the color palette and refining the graph further, the results were as follows (Figure ). There are 77 nodes and 254 edges.
Results
As seen in the graph, each node represents a character that appears in the novel. Each edge represents an association between two characters. The size of each node and of its character label is directly related to the number of connections the character has.
From the graph, it can be concluded that the major characters in this novel are Valjean, Javert, Fantine, Cosette, Marius, Gavroche and Myriel. It is clear that Jean Valjean, the main character, has the greatest number of connections, which suggests that he is the most dominant. Valjean and Cosette, Valjean and Marius, Valjean and Javert, and Cosette and Marius are the four co-appearances that occur most often within the novel. Meanwhile, Fantine, Gavroche, Enjolras and Thenardier also take up a large proportion of the novel. I realized that I had never noticed these co-appearance relationships when watching the movie or the musical of Les Misérables. It is interesting that Javert and Marius never directly appear together, and that Javert and Cosette rarely appear together.
A greater number of connections is not, on its own, conclusive evidence that a character is the most dominant. To make the result clearer and more persuasive, I also generated a Fruchterman Reingold layout, which takes into account the forces between nodes or, in this context, the characters. This algorithm works well for large social networks and uses an iterative process to adjust the placement of the vertices in order to minimize the “energy” of the system.
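The iterative process can be sketched as follows. This is a simplified version of the Fruchterman Reingold idea written for illustration, not Gephi’s implementation: all nodes repel each other, connected nodes attract each other, and a cooling “temperature” limits how far a node may move in each round. The parameter names and default values are assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Return {node: (x, y)} positions inside a width x height frame."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, height)] for n in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between nodes
    temperature = width / 10                    # maximum step per iteration
    cooling = temperature / (iterations + 1)

    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}

        # Repulsion between every pair of nodes, stronger when they are close.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[u][0] - pos[v][0]
                dy = pos[u][1] - pos[v][1]
                dist = max(math.hypot(dx, dy), 1e-9)
                force = k * k / dist
                disp[u][0] += dx / dist * force
                disp[u][1] += dy / dist * force
                disp[v][0] -= dx / dist * force
                disp[v][1] -= dy / dist * force

        # Attraction along each edge, stronger when the endpoints are far apart.
        for u, v in edges:
            dx = pos[u][0] - pos[v][0]
            dy = pos[u][1] - pos[v][1]
            dist = max(math.hypot(dx, dy), 1e-9)
            force = dist * dist / k
            disp[u][0] -= dx / dist * force
            disp[u][1] -= dy / dist * force
            disp[v][0] += dx / dist * force
            disp[v][1] += dy / dist * force

        # Move each node, capped by the temperature and clamped to the frame.
        for n in nodes:
            dx, dy = disp[n]
            length = max(math.hypot(dx, dy), 1e-9)
            step = min(length, temperature)
            pos[n][0] = min(width, max(0.0, pos[n][0] + dx / length * step))
            pos[n][1] = min(height, max(0.0, pos[n][1] + dy / length * step))

        temperature -= cooling  # smaller moves as the layout settles

    return {n: (p[0], p[1]) for n, p in pos.items()}


# Toy graph (not from the dataset): a triangle with one extra node attached.
toy_nodes = ["a", "b", "c", "d"]
toy_edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
layout = fruchterman_reingold(toy_nodes, toy_edges)
for name, (x, y) in layout.items():
    print(name, round(x, 3), round(y, 3))
```

Because the temperature shrinks every round, the nodes make large moves at first and only small adjustments near the end, which is how the layout settles into a low-energy arrangement.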
Future Plan & Reflection
While one could continue to analyze each character’s connectivity in Les Misérables, it would also be interesting to explore, in a broader scope, the connections between this masterpiece and other works created during a similar time period. A future project could also add additional data to this network, such as the locations and emotions of each character. By including location as a filter on the network, we would be able to determine in which locations characters most frequently co-appear throughout the novel. I also came across a character map made by an artist, which sparked my interest in combining artwork with data visualization.
Furthermore, following my failed exploration of the “character network of movies from IMDB” mentioned earlier, I would like to learn more about the statistical concepts and how they affect the visualization, as I need to understand the software and algorithms in more detail. I would also like to ask the professor and industry professionals which software and languages students should learn and practice in order to generate great, aesthetic visualizations that also align with real-world industry scenarios.