

Dream of the Red Chamber, written in 18th-century China by Cao Xueqin, remains one of the four great classical novels of Chinese literature. Its standing in China is comparable to that of The Tale of Genji in Japan or In Search of Lost Time in France. The book chronicles the decline of four families: the Jia, Shi, Wang, and Xue. The novel is regarded as an encyclopedia of life in traditional China, and it is especially known for its meticulous depiction of human nature and its criticism of feudalism. Unfortunately, only the first 80 of its 120 chapters are attributed to Cao Xueqin, the original author. The remaining chapters were completed by Gao E and Cheng Weiyuan while they were preparing the printed editions of the book. The open question of how Cao meant the story to end adds another layer of charm to the work. The book has attracted so many readers that a field of study, “Redology”, is devoted to it. Today, examining the book quantitatively may offer new insights to that field.


Francesco Cauteruccio's article on his favorite novel, One Hundred Years of Solitude, describes how he analyzed both the text of the book and the network of its characters. For the network visualization, he used Gephi, matplotlib, and Tableau. Beyond drawing the graph of all characters with their relationships and name labels, he computed the network's diameter, degree centrality, and betweenness centrality. From this analysis, he concluded that the most important character in the novel is Úrsula, and that the main thread of the story runs through Colonel Aureliano Buendía and Aureliano Segundo.
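Two of the measures Cauteruccio reports can be computed without any graph library. The following is a minimal pure-Python sketch, assuming an undirected, connected graph stored as a dict of adjacency sets; the node names A to E are placeholders, not characters from either novel. Degree centrality is normalized by n - 1 (the convention networkx uses), and the diameter is taken as the largest breadth-first-search eccentricity. Betweenness centrality (usually computed with Brandes' algorithm) is left out to keep the example short.

```python
from collections import deque


def degree_centrality(adj):
    """Fraction of the other n - 1 nodes each node is directly linked to."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def eccentricity(adj, source):
    """Longest shortest-path distance from source, found by breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    if len(dist) < len(adj):
        raise ValueError("graph is disconnected; diameter is undefined")
    return max(dist.values())


def diameter(adj):
    """Largest eccentricity over all nodes."""
    return max(eccentricity(adj, v) for v in adj)


# Toy undirected graph with placeholder names (not data from either novel).
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("B", "E")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(degree_centrality(adj)["B"])  # 0.75
print(diameter(adj))                # 3
```

Running a BFS from every node costs O(n·(n + m)), which is comfortably fast for a cast the size of a novel.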

The project inspired me to work on my own favorite book, Dream of the Red Chamber. While reading it, I was amazed by the number of characters who appear and by how the events interlock like a web. After reading Cauteruccio's article, I felt that visualizing the network of characters and events could help me better sort out the plot.


The dataset used for this lab was found on CSDN, a Chinese website for developers similar to Stack Overflow. It can be downloaded here (password: wgbr). The directed network graph for this lab was generated with Gephi 0.9.2, an open-source tool that lets users explore, analyze, spatialize, filter, clusterize, manipulate, and export network graphs.


The dataset was already clean and stored as CSV files, so I imported it into Gephi directly. Two tables need to be imported: a node table and an edge table. The node table contains three categories of nodes: people, places, and events. The edge table records the relationships between these nodes. Next, to make the graph more readable, I partitioned it by modularity. However, it was still hard to read because the nodes were piled on top of one another.
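To make the node-table/edge-table split and the modularity step concrete, here is a minimal sketch using only the Python standard library. The column names (Id, Label, Type; Source, Target, Weight) are an assumption based on the layout Gephi's spreadsheet importer expects, not the dataset's actual headers, and the six rows are invented placeholders. The `modularity` function only scores a partition you hand it, treating edges as undirected; Gephi's Modularity statistic instead searches for a high-scoring partition itself (it is based on the Louvain method).

```python
import csv
import io
from collections import defaultdict

# Hypothetical toy tables: Gephi's importer looks for "Id" in the node table
# and "Source"/"Target" (optionally "Weight") in the edge table.
NODES_CSV = """Id,Label,Type
1,Person 1,person
2,Person 2,person
3,Place 3,place
4,Person 4,person
5,Event 5,event
6,Person 6,person
"""

EDGES_CSV = """Source,Target,Weight
1,2,1.0
2,3,1.0
1,3,1.0
4,5,1.0
5,6,1.0
4,6,1.0
3,4,1.0
"""


def load_tables(nodes_text, edges_text):
    """Parse the node table into a dict keyed by Id and the edge table into weighted tuples."""
    nodes = {row["Id"]: row for row in csv.DictReader(io.StringIO(nodes_text))}
    edges = [
        (row["Source"], row["Target"], float(row["Weight"]))
        for row in csv.DictReader(io.StringIO(edges_text))
    ]
    return nodes, edges


def modularity(edges, community):
    """Newman-Girvan modularity of a partition, with edges treated as undirected.

    Q = sum over communities c of w_in[c] / m - (d[c] / (2m)) ** 2, where
    w_in[c] is the weight of edges inside c, d[c] is the summed weighted
    degree of c's nodes, and m is the total edge weight.
    """
    m = sum(w for _, _, w in edges)
    w_in = defaultdict(float)
    d = defaultdict(float)
    for u, v, w in edges:
        d[community[u]] += w
        d[community[v]] += w
        if community[u] == community[v]:
            w_in[community[u]] += w
    return sum(w_in[c] / m - (d[c] / (2 * m)) ** 2 for c in d)


nodes, edges = load_tables(NODES_CSV, EDGES_CSV)
# A candidate split into two triangles joined by the 3-4 edge.
partition = {"1": 0, "2": 0, "3": 0, "4": 1, "5": 1, "6": 1}
print(len(nodes), len(edges))                  # 6 7
print(round(modularity(edges, partition), 4))  # 0.3571
```

A positive Q means the chosen groups have more internal edges than a random graph with the same degrees would; coloring nodes by such groups is what makes the partitioned graph easier to scan.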

To untangle the nodes, I tried the different layouts built into Gephi and chose ForceAtlas 2. I also noticed that the edge from Zijuan to Daiyu has a weight of 2.0, while every other edge has a weight of 1.0. Based on this, and on my knowledge of the novel, I treated the 2.0 as a typo and corrected it to 1.0. Finally, I colored the nodes according to each character's personality. For example, red marks the hero, since the author mentions that the hero likes red the most.
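The weight check described above can be automated so that similar anomalies are caught before running a layout. This is a hedged sketch under stated assumptions: edges are (source, target, weight) tuples, the most common weight is taken as the expected one, and only the Zijuan-to-Daiyu row with its 2.0 weight comes from the text, while the X, Y, Z rows are placeholders. It also lists repeated (source, target) pairs, because a duplicated row is one possible cause worth ruling out: Gephi can merge parallel edges on import by summing their weights.

```python
from collections import Counter


def find_weight_outliers(edges):
    """Return the most common weight and every edge whose weight differs from it."""
    counts = Counter(w for _, _, w in edges)
    typical = counts.most_common(1)[0][0]
    return typical, [e for e in edges if e[2] != typical]


def duplicate_pairs(edges):
    """(source, target) pairs that occur more than once in the edge list."""
    pairs = Counter((u, v) for u, v, _ in edges)
    return [p for p, n in pairs.items() if n > 1]


def set_weight(edges, pair, new_weight):
    """Return a copy of the edge list with one (source, target) pair re-weighted."""
    return [(u, v, new_weight if (u, v) == pair else w) for u, v, w in edges]


# Only the Zijuan -> Daiyu row is from the text; X, Y, Z are placeholders.
edges = [("Zijuan", "Daiyu", 2.0), ("X", "Y", 1.0), ("Y", "Z", 1.0), ("Z", "X", 1.0)]

typical, outliers = find_weight_outliers(edges)
print(typical, outliers)      # 1.0 [('Zijuan', 'Daiyu', 2.0)]
print(duplicate_pairs(edges)) # []

fixed = set_weight(edges, ("Zijuan", "Daiyu"), typical)
print(find_weight_outliers(fixed)[1])  # []
```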


By filtering on in-degree and out-degree in Gephi, I found that the three most connected characters are Jia Baoyu, Lin Daiyu, and Wang Xifeng. Meanwhile, the three locations connected to the most people are Jin Ling, Jing Cheng, and the Rong Guo House, the home of all three of these characters. The three events attended by the most people are the Begonia Poetry Club, the uproar at the family school, and the raid on the Grand View Garden, the scene depicting the final downfall of the Jia family.
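In Gephi this ranking came from degree filters, but the counting underneath is simple. A minimal sketch, assuming a directed list of (source, target) pairs and a hypothetical node-to-type mapping (person, place, event) standing in for the node table; the P, L, and E names are placeholders, not data from the novel. Characters are ranked by in-degree plus out-degree, and places and events by the number of distinct people linked to them in either direction, which is one reasonable reading of "connected to the most people".

```python
from collections import Counter, defaultdict


def degrees(edges):
    """In-degree and out-degree of every node in a directed edge list."""
    in_deg = Counter(t for _, t in edges)
    out_deg = Counter(s for s, _ in edges)
    return in_deg, out_deg


def people_linked(edges, types, kind):
    """For each node of the given kind, count the distinct person nodes it shares an edge with."""
    linked = defaultdict(set)
    for s, t in edges:
        if types[s] == kind and types[t] == "person":
            linked[s].add(t)
        if types[t] == kind and types[s] == "person":
            linked[t].add(s)
    return {n: len(p) for n, p in linked.items()}


def rank(scores, k):
    """Top-k keys by score; ties are broken alphabetically so the output is deterministic."""
    return sorted(scores, key=lambda n: (-scores[n], n))[:k]


# Placeholder node types and edges, standing in for the node and edge tables.
types = {"P1": "person", "P2": "person", "P3": "person",
         "L1": "place", "L2": "place", "E1": "event"}
edges = [("P1", "P2"), ("P2", "P1"), ("P1", "P3"), ("P3", "L1"), ("P1", "L1"),
         ("P2", "L1"), ("P1", "L2"), ("P1", "E1"), ("P2", "E1")]

in_deg, out_deg = degrees(edges)
character_scores = {n: in_deg[n] + out_deg[n] for n, t in types.items() if t == "person"}
print(rank(character_scores, 2))                      # ['P1', 'P2']
print(rank(people_linked(edges, types, "place"), 1))  # ['L1']
print(rank(people_linked(edges, types, "event"), 1))  # ['E1']
```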

[Figure: Relationship Between Characters]


Although the graph makes the relationships much clearer, it is still hard to interpret for anyone who has never read or even heard of the book, especially where the locations and events are concerned. Its most meaningful use is to give devoted readers a clearer picture of how the book's characters relate to one another. The book may be the most important novel in Chinese history, and some researchers in China, known as redologists, study it specifically. This graph could be a helpful tool for them, as well as for other scholars studying the relationships between characters in books. In the future, I may take the time to translate the network chart to broaden its audience.