Charles Dickens first published David Copperfield serially in 1849–50 and as a book in 1850. It is his most autobiographical novel, narrated by an adult David Copperfield looking back on his life and development.
As a former student of English Literature, I was excited to work with a topic that was familiar to me. I was curious to see how the language of a book that is nearly 200 years old would appear when visualized, and whether any themes would emerge from the data that could give someone otherwise unfamiliar with the novel a basic understanding of it based on the network of words.
My first thought about network visualizations was that they reminded me of word clouds, where scale and color inform how the viewer perceives the hierarchy and frequency of different words.
Additionally, after I started experimenting, I looked at a couple of other student examples from Alvina Lai and Anna Size, who used the same dataset, to make sure that I was on the right track.
The software used for this project is Gephi, which is free, open-source, and available for Windows, Mac, and Linux. It can calculate network statistics, detect clusters, and filter, style, and label graphs. Plugins from open-source developers provide additional features. Finalized visualizations can be exported as PDF, PNG, or SVG.
The dataset I worked with was found on the Gephi GitHub page. The file was in Graph Modeling Language (GML) format, which I was unfamiliar with, but I opened it in a text editor and it seemed to be consistently formatted, so I decided to try importing it.
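For readers who have not seen GML before, it is plain text built from nested `key [ ... ]` blocks. The sketch below is a minimal illustration rather than a full GML parser. The three-word sample is invented for this example and is not an excerpt from the actual file, and the function name `read_simple_gml` is hypothetical. It assumes a flat layout in which each node block holds an `id` followed by a `label`, and each edge block holds a `source` followed by a `target`.

```python
import re

# Hypothetical toy sample in GML layout; not an excerpt from the real dataset.
SAMPLE_GML = '''
graph
[
  node
  [
    id 0
    label "little"
  ]
  node
  [
    id 1
    label "boy"
  ]
  node
  [
    id 2
    label "old"
  ]
  edge
  [
    source 0
    target 1
  ]
  edge
  [
    source 2
    target 1
  ]
]
'''


def read_simple_gml(text):
    """Return (nodes, edges) from a flat GML string.

    nodes maps id -> label; edges is a list of (source, target) id pairs.
    Assumes the simple layout described above, with no nested attributes.
    """
    nodes = {
        int(node_id): label
        for node_id, label in re.findall(
            r'node\s*\[\s*id\s+(\d+)\s+label\s+"([^"]*)"', text
        )
    }
    edges = [
        (int(src), int(dst))
        for src, dst in re.findall(
            r'edge\s*\[\s*source\s+(\d+)\s+target\s+(\d+)', text
        )
    ]
    return nodes, edges


nodes, edges = read_simple_gml(SAMPLE_GML)
print(len(nodes), "nodes,", len(edges), "edges")
```

Counting nodes and edges this way is a quick sanity check to compare against the totals Gephi reports after import.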
Methods & Process
My first step was to open the GML file in Gephi and review the data import. There were no errors, so I was able to move forward. The file contained 112 nodes and 425 edges. I also added a Word Type column so that I could style nouns and adjectives differently.
Next, I ran the following statistics and checked that the data tables updated with the calculated values:
- Average degree = 3.795
- Diameter = 5
- Density = 0.068
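These statistics can be checked by hand from the node and edge counts alone. The sketch below uses the standard textbook formulas, not Gephi's internal code, and the function names are my own. The 3.795 figure equals 425 / 112, which is the value obtained when each edge is counted once (the directed convention). Counting each edge at both of its endpoints (the undirected convention) gives about 7.589. Whether the graph was imported as directed or undirected is an assumption here, so both cases are shown.

```python
# Node and edge counts as reported for the imported GML file.
NODES = 112
EDGES = 425


def average_degree(nodes, edges, directed):
    """Mean number of edges per node.

    In an undirected graph every edge touches two nodes, so it counts twice.
    """
    return edges / nodes if directed else 2 * edges / nodes


def density(nodes, edges, directed):
    """Fraction of possible node pairs that are actually connected."""
    possible = nodes * (nodes - 1)  # ordered pairs
    if not directed:
        possible //= 2  # unordered pairs
    return edges / possible


for directed in (True, False):
    kind = "directed" if directed else "undirected"
    print(
        f"{kind:>10}: "
        f"average degree = {average_degree(NODES, EDGES, directed):.3f}, "
        f"density = {density(NODES, EDGES, directed):.3f}"
    )
```

For 112 nodes and 425 edges, the density works out to roughly 0.034 under the directed convention and roughly 0.068 under the undirected one. In other words, only a small fraction of all possible word pairs actually appear next to each other in the text.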
With the data ready to go, I started experimenting with layout and styling on the Overview tab, then previewing and exporting some samples.
Results & Analysis
After a few changes to the labels (font and color), the node sizes, and the edge style, I decided that the version below is the clearest; it shows which nouns and adjectives are most common and have the most connections. Since the words ‘little’, ‘old’, and ‘good’ stand out most, it goes some way toward my initial goal of conveying the broad themes of David Copperfield.
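To make "most connections" concrete: a word's degree is the number of edges that touch it, and sizing nodes by degree is what lets the best-connected words stand out. The sketch below shows the idea on a handful of adjective-noun pairs. The pairs are invented for illustration and are not taken from the dataset, and `degree_ranking` is a hypothetical helper, not part of Gephi.

```python
from collections import Counter

# Hypothetical adjective-noun pairs, invented for illustration only.
TOY_EDGES = [
    ("little", "boy"),
    ("little", "room"),
    ("little", "house"),
    ("old", "house"),
    ("old", "man"),
    ("good", "man"),
]


def degree_ranking(edges):
    """Count how many edges touch each word, most connected first."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    return degree.most_common()


for word, degree in degree_ranking(TOY_EDGES):
    print(f"{word:<8}{degree}")
```

In this toy example ‘little’ comes first with three connections. In the real network, Gephi's own degree calculation supplies the values used for sizing.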
Reflection & Next Steps
Overall, I was nervous about working with Gephi at first, since I had no previous experience with graph theory or networks. However, once the data imported cleanly and I started experimenting with the styling, I became more comfortable. Seeing how the data and layout adjustments affected the visualization helped me understand the statistics better. This dataset was relatively small and simple, so I would be interested in trying something with more complexity or modularity in the future.
Dickens, Charles. David Copperfield. http://www.gutenberg.org/files/766/766-h/766-h.htm#link2HCH0001
Newman, M. E. J. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 74, 036104 (2006).