Introduction
Hwang Sok-yong is a celebrated South Korean novelist born in what was then Manchukuo (in present-day Changchun, China) who returned to Korea in 1945 after the end of Japanese colonization. He embarked on a literary career after publishing the short story “The Pagoda” in 1970. The story draws on his experiences as a Marine in the Vietnam War, where he was responsible for “clean-up,” or erasing proof of civilian massacres by burying the dead.
His novel “Princess Bari” (published in English translation in 2015) is named after the protagonist of the famous Korean myth who’s abandoned by her father for being a girl and later resurrects her dead parents with the flower of life. However, the overlap basically begins and ends with the character’s name, as this book follows a Korean woman named Bari in more recent times (starting in the ’80s). This modern Bari is born and raised in Chongjin, North Korea, where she lives with her grandmother, father, mother, her sisters Sook, Hyun, Jin, Sun, Mi, and Jung, and her dogs Hindungi and Chilsung. There’s also the dear family friend Uncle Salamander, who keeps appearing throughout the book. Bari’s coworkers Zhou, Xiang, and Chen, who are less prominent characters, appear toward the end of the book.
I wanted to visualize the network of the characters’ locations throughout the book: which characters end up in or travel through the same cities, and which never overlap at all.
Inspiration
I was inspired by data scientist Dimitris Manolidis’s network analysis of the major characters in chapters 1 through 30 of the 14th-century Chinese novel “Romance of the Three Kingdoms.” His visualization analyzes the interactions between the characters and uses centrality measures to determine which characters are the most influential. Because the novel has over a thousand characters, he focused on just 70 of them, chosen by how frequently they appeared in the book, how often they appeared on websites and in articles about the novel, and whether they appeared in the series.
Materials Used
I first flipped through the book and created an edge table in CSV format listing each character and the cities they travel through and/or end up in. This file ended up being 67 rows long.
Then, I used the data cleaning and transformation software OpenRefine to reshape my data so that each column header would be a city name, with the characters who’d lived in or passed through that city listed in that column.
I did this so that I could transform the data once more using the programming language R in RStudio, an open-source development environment. The goal was to create a weighted edge list by aggregating rows with matching source and target values and summing their edges.
The weighted edge list was written to a new CSV file with the column headings “source,” “target,” and “type,” which the network visualization software Gephi requires for processing. Two character names in the same row indicate that those two characters passed through or lived in the same city. The column headed “x” holds the aggregated weight assigned to each row. This new spreadsheet had 115 rows.
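The aggregation step described above can be sketched in Python. This is not the R script used for the project, only a minimal illustration of the same idea: every pair of characters who share a city gets an edge, and the weight counts how many cities they share. The city labels and the placement of characters in them are invented placeholders, and the function name `weighted_edge_list` is hypothetical.

```python
from collections import Counter
from itertools import combinations

# Invented placeholder data: city label -> characters who lived in or passed
# through it. These placements are NOT taken from the book or the project's CSV.
city_to_characters = {
    "city_1": ["Bari", "Grandmother", "Chilsung"],
    "city_2": ["Bari", "Grandmother"],
    "city_3": ["Bari"],
}


def weighted_edge_list(city_to_characters):
    """Return (source, target, type, weight) rows, one per character pair."""
    weights = Counter()
    for characters in city_to_characters.values():
        # Sorting makes (A, B) and (B, A) collapse into a single pair.
        for source, target in combinations(sorted(set(characters)), 2):
            weights[(source, target)] += 1  # one more shared city
    return [
        (source, target, "Undirected", weight)
        for (source, target), weight in sorted(weights.items())
    ]


edges = weighted_edge_list(city_to_characters)
for row in edges:
    print(row)
```

In this toy input, Bari and Grandmother share two cities, so their edge has weight 2, while a city with only one character contributes no edges.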
Methods Used to Create Visualizations and Results
Then, I imported this newest spreadsheet into Gephi, turned on the labels to show each character’s name, adjusted the text size so the labels would be clear without looking overbearing, and chose the ForceAtlas 2 layout to take a first look.
For my first visualization, I looked at the network by degree, and because there are only 17 main characters, I chose a sequential color scale in a single hue. This way, the viewer can see that lighter green marks characters whose cities overlap less with other characters’, while darker green marks characters whose cities overlap more. My initial hypothesis before seeing the visualization was that at least Chilsung, Uncle Salamander, and Bari would be among the highest-degree nodes, as they move through the greatest number of cities, and this turned out to be true. By contrast, Great-Great-Grandmother, Grandfather, and Hindungi had passed away long before many of the other characters, so I was not surprised to find them with few connections and at the lighter end of the color scale. The less-traveled characters Chen, Jin, and Sook also sit on the outskirts of the visualization with fewer connections.
Looking into the statistics, the degree report showed an average degree of 6.706.
The network diameter (the longest shortest path between nodes within the graph) was calculated to be 3 with a radius of 2 and an average path length of 1.77.
And the network density, which measures how close the network is to complete (with a complete graph having all possible edges and density equal to 1), was calculated at 0.419.
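The three statistics above follow from simple definitions, which the sketch below spells out in Python for a small made-up graph. It is an illustration only, not how Gephi computes its reports, and the function names and toy edges are assumptions. The last lines check that the reported figures agree with each other: for a simple undirected graph, 17 nodes and an average degree of 6.706 imply 57 edges (a number inferred here, not stated in the reports), and 57 out of 136 possible edges gives the reported density.

```python
from collections import deque


def build_adjacency(edges):
    """Build an undirected adjacency map from (source, target) pairs."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def average_degree(adjacency):
    return sum(len(neighbors) for neighbors in adjacency.values()) / len(adjacency)


def density(adjacency):
    """Share of all possible undirected edges that are actually present."""
    n = len(adjacency)
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) / 2
    return edge_count / (n * (n - 1) / 2)


def eccentricity(adjacency, start):
    """Longest shortest-path distance from start (assumes a connected graph)."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in distances:
                distances[neighbor] = distances[node] + 1
                queue.append(neighbor)
    return max(distances.values())


def diameter_and_radius(adjacency):
    eccentricities = [eccentricity(adjacency, node) for node in adjacency]
    return max(eccentricities), min(eccentricities)


# Made-up toy graph, unrelated to the book's characters.
toy = build_adjacency([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
print(average_degree(toy), density(toy), diameter_and_radius(toy))

# Consistency check on the reported figures (17 nodes, average degree 6.706).
node_count = 17
implied_edges = round(6.706 * node_count / 2)
implied_density = implied_edges / (node_count * (node_count - 1) / 2)
print(implied_edges, round(implied_density, 3))
```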
For the modularity report, which runs a community-detection algorithm, I turned on the edge weights. The modularity was calculated as 0.166, and the number of communities was calculated to be 2.
This can also be visualized by changing the graph’s appearance according to the nodes’ modularity class:
In the book, Great-Great-Grandmother, Grandfather, Hindungi, and Grandmother live and pass away much earlier than the others. With Grandmother living the longest and traveling the most of that group, it makes sense that she’s the connecting point between the two communities. Among the characters in the pink community, it makes sense that Father, Mother, Hyun, Mi, Bari, Jung, Chilsung, Jin, and Uncle Salamander are connected, as they are family and often find themselves in the same cities. Xiang and Zhou have many connections because they travel to several cities with Bari. And it’s unsurprising that Chen and Sook are a bit isolated, as they move little throughout the book.
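To give a sense of what a modularity score measures, here is a minimal Python sketch that scores a given split of a weighted, undirected graph into communities. It only evaluates a partition that is handed to it; it does not search for the best partition the way Gephi’s report does, and it is not Gephi’s implementation. The function name `modularity` and the toy graph are assumptions for illustration. For each community, the score compares the share of edge weight that falls inside it with the share expected if edges were placed at random.

```python
def modularity(weighted_edges, community_of):
    """Score a partition of an undirected weighted graph without self-loops.

    weighted_edges: iterable of (source, target, weight) tuples
    community_of:   dict mapping each node to a community label
    """
    weighted_edges = list(weighted_edges)
    total_weight = sum(weight for _, _, weight in weighted_edges)

    internal_weight = {}   # edge weight with both ends in the community
    community_degree = {}  # summed weighted degree of the community's nodes
    for source, target, weight in weighted_edges:
        for node in (source, target):
            label = community_of[node]
            community_degree[label] = community_degree.get(label, 0) + weight
        if community_of[source] == community_of[target]:
            label = community_of[source]
            internal_weight[label] = internal_weight.get(label, 0) + weight

    return sum(
        internal_weight.get(label, 0) / total_weight
        - (degree / (2 * total_weight)) ** 2
        for label, degree in community_degree.items()
    )


# Made-up toy graph: two triangles joined by a single edge, all weights 1.
toy_edges = [
    ("A", "B", 1), ("B", "C", 1), ("A", "C", 1),
    ("D", "E", 1), ("E", "F", 1), ("D", "F", 1),
    ("C", "D", 1),
]
two_groups = {"A": 0, "B": 0, "C": 0, "D": 1, "E": 1, "F": 1}
one_group = {node: 0 for node in two_groups}

print(modularity(toy_edges, two_groups))  # clearly positive
print(modularity(toy_edges, one_group))   # zero: no structure is captured
```

A score near zero means the split explains little beyond chance, so a value like 0.166 suggests the two communities are only loosely separated, which fits a network where a single character links both groups.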
Reflection
While in Gephi, I did look at each of the node partition options: Betweenness Centrality, Closeness Centrality, Degree, Eccentricity, Harmonic Closeness Centrality, and Weighted Degree. But because I only have 17 characters, I felt that showing these visualizations would confuse my viewer, as the graph basically becomes rainbow-colored, making it difficult to discern differences and patterns. For this project, I think using a monochrome sequential color scale for the degree visualization and a qualitative color scale for the modularity visualization served the right purposes. I also tried filtering by degree range, but my range was only 1 to 12, and even when I narrowed it to 2 to 12, I found I wanted to see the characters I’d just filtered out. So I decided not to use the filter at all.
Next, I would be interested in working with a much larger dataset, both to use the filter in a meaningful way and see what happens to my visualization, and to work with a much larger range of colors.
I also feel that I need to study the software more in order to understand exactly what each statistic is trying to explain. Gephi is also a bit difficult to work with, since you can’t undo changes you apply to the graph’s appearance. To cope with this, I saved my project at a given step only when I was certain I’d be willing to start again from that version of the graph.
Future Directions
If I were to continue on with this project, instead of cities the characters lived in or traveled through, I might want to look into each character’s outcome, such as whether they died, went missing, survived, etc.