Interpreting Political Opinions through a Network of Books



Introduction

Politics is shaped by people, and people’s opinions are often shaped by what they see and read. For writers, writing a political book is inherently a political act, much like choosing whom to vote for. In that sense, books can instigate change. Books with extreme titles can sway views through hatred rather than debate. In a presidential election year, patterns in which books are frequently bought together can reflect the political landscape.

The dataset I used is a network of political books published around the time of the 2004 presidential election and sold online by amazon.com. The colors indicate whether a book presents liberal, neutral, or conservative opinions. An edge between two books represents frequent co-purchasing by the same buyers, and the size of a node represents its degree.

Inspiration

During my initial research into different types of networks, I came across a Facebook data collection and photo network visualization. The data shows people connected through a friend list and was collected with getNetwork.

Figure 2 also demonstrates a Gephi plugin called Image Preview, released by the Yale Computer Graphics Group, which allows pictures to be used in Gephi network visualizations.

Although I liked how the network appeared in figure 2, my inspiration for this project came from figure 1, because I ran into problems while getting the plugin.

After a lot of research across websites such as GitHub, CASOS, and SNAP, I finally found the data I worked with here. The data was compiled by Valdis Krebs.

Valdis Krebs created a network visualization of the same data, which helped me understand the two major clusters in the data and was the starting point for the visualization I created for this project.

Visualization by Valdis Krebs

Methodology

To create the visualization, I followed the steps listed below:

Step 1: Finding the Dataset

For this project, I collected the data from the page of Mark Newman, a professor at the University of Michigan, where he provides a collection of network datasets from various sources.

Dataset Source

Step 2: Gephi

After finalizing the dataset, I imported it directly into Gephi and ran some statistics, including average degree, network diameter, graph density, and modularity. In this dataset, nodes are given the values “l”, “n”, or “c” to indicate whether a book is “liberal”, “neutral”, or “conservative”, respectively.
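
To make these four statistics more concrete, here is a minimal Python sketch that computes them by hand on a tiny made-up graph. It is only an illustration, not Gephi's implementation: the toy edges and the `group` assignment are hypothetical, and Gephi searches for the communities itself before reporting modularity, whereas here the grouping is simply given.

```python
from collections import deque

# Hypothetical toy graph: two triangles joined by one bridge edge.
edges = [("a", "b"), ("a", "c"), ("b", "c"),   # group 0
         ("d", "e"), ("d", "f"), ("e", "f"),   # group 1
         ("c", "d")]                           # bridge between the groups
group = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

# Build an undirected adjacency map from the edge list.
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

n, m = len(adj), len(edges)

def average_degree():
    # Every edge adds one to the degree of each of its two endpoints.
    return 2 * m / n

def density():
    # Share of all possible undirected node pairs that are connected.
    return 2 * m / (n * (n - 1))

def eccentricity(start):
    # Breadth-first search: longest shortest path starting at `start`.
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adj[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())

def diameter():
    # Longest shortest path over all start nodes (graph must be connected).
    return max(eccentricity(v) for v in adj)

def modularity(group):
    # Q = sum over groups of (edges inside / m) - (degree sum / 2m)^2.
    q = 0.0
    for c in set(group.values()):
        inside = sum(1 for u, v in edges if group[u] == c and group[v] == c)
        degree_sum = sum(len(adj[v]) for v in adj if group[v] == c)
        q += inside / m - (degree_sum / (2 * m)) ** 2
    return q

print(average_degree(), density(), diameter(), modularity(group))
```

A modularity well above zero, as in this toy graph, means that most edges stay inside the groups rather than crossing between them.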

Visual Structure

Inspired by the network created by Valdis Krebs, I initially created a visualization using the Circular layout, but it had many flaws, including unstructured labels and edges that could have become very confusing. After trying other layouts such as Force Atlas and Force Atlas 2, I decided to work with Fruchterman Reingold. Because that layout compressed the data, I used the Expansion tool in the layout section to give the network a more definite structure.

Circular Layout
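
The idea behind an expansion step can be sketched in a few lines of Python: every node is pushed away from the center of the layout by the same factor, so the overall shape is kept while the nodes gain breathing room. This is a sketch of the general idea under my own assumptions, not Gephi's code; the function name `expand`, the sample positions, and the factor are all hypothetical.

```python
def expand(positions, factor=1.2):
    """Scale every node away from the layout's centroid by `factor`."""
    # Centroid of the current layout (assumed to be the expansion center).
    cx = sum(x for x, _ in positions.values()) / len(positions)
    cy = sum(y for _, y in positions.values()) / len(positions)
    # Move each node along the line from the centroid through the node.
    return {
        node: (cx + (x - cx) * factor, cy + (y - cy) * factor)
        for node, (x, y) in positions.items()
    }

# Hypothetical positions for three nodes after a compressed layout.
positions = {"a": (0.0, 0.0), "b": (2.0, 0.0), "c": (1.0, 3.0)}
expanded = expand(positions, factor=2.0)
print(expanded)
```

Because every distance from the centroid is multiplied by the same factor, the relative arrangement of the clusters does not change.
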

Text and Color Palette

The text size of each label is derived from the node's degree, which ranges from 2 to 25 in this data. The font used in this visualization is Sans Serif.
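
One simple way to derive a text size from degree is a linear scale, sketched below in Python. The degree range of 2 to 25 comes from the data, but the function name `scale_size` and the minimum and maximum font sizes are assumptions chosen for illustration, not the values used in Gephi.

```python
def scale_size(degree, min_degree=2, max_degree=25, min_size=8.0, max_size=32.0):
    """Map a degree linearly onto a font size (the size bounds are assumed)."""
    # Position of this degree within the degree range, from 0.0 to 1.0.
    t = (degree - min_degree) / (max_degree - min_degree)
    return min_size + t * (max_size - min_size)

# The least and most connected books get the smallest and largest labels.
print(scale_size(2), scale_size(25))
```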

Initially, the colors in this visualization were inspired by global political colors, where blue is used for conservatism and yellow is most commonly associated with liberalism.

Initial visualization

Since the data in this project comes from books on US politics, I changed the color palette to one inspired by the political colors used in the USA. A unified color scheme (blue for Democrats, red for Republicans) began to be implemented with the 1996 presidential election. Unlike the global color scheme, the USA uses blue for liberalism and red for conservatism.

Visualization in Gephi

Step 3: Figma

I used Figma to add the legend and captions to the visualization.

Results & Discussion

This visualization consists of 105 nodes and 441 edges in total. This network of political books, based on purchase patterns from amazon.com, shows how books are connected through frequent co-purchasing by the same buyers, as indicated by the “customers who bought this book also bought these other books” feature on Amazon. Books like “Bushwhacked” and “American Dynasty” represent the liberal side of politics, while books like “A National Party No More” and “Off with Their Heads” represent the conservative side.

Final visualization with labels
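
The node and edge counts alone already determine two of the statistics mentioned earlier. As a quick check, this short Python sketch derives the average degree and graph density from the 105 nodes and 441 edges, assuming the network is undirected; the variable names are my own.

```python
nodes, edges = 105, 441

# Each undirected edge contributes to the degree of two books.
average_degree = 2 * edges / nodes

# Fraction of all possible book pairs that are actually connected.
possible_pairs = nodes * (nodes - 1) / 2
density = edges / possible_pairs

print(round(average_degree, 2), round(density, 4))
```

In other words, a typical book is co-purchased with roughly eight others, and only about 8% of all possible book pairs are linked.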

Through this project, I tried to raise curiosity about the following questions:

  1. What does this information suggest?
  2. What can you do with this information during the 2004 campaigns?
  3. Could it be that our book readers are key opinion leaders in their communities?
  4. In a year of presidential election, is this the new arms race?

Reflection

I had a rough start on this project, as I struggled to find a dataset I could work with. I had certain topics in mind but couldn’t find a compatible dataset for them. For example, I wanted to work with a network dataset for the TV show The Office, but since the data was not organised and clear, I had to drop it. Gephi was also new to me, so I didn’t understand what kind of data would work well with it. While first exploring it, I couldn’t even see the data in the Data Laboratory, which I later realised takes only two clicks to bring up if it’s not already visible. After going back and forth as the software crashed a few times, I realised that I had to find a dataset with fewer nodes and edges.

The final dataset that I used for this project got me thinking about the network of books in relation to the network of individuals who read them. There appear to be two very clear divisions in political opinion. Some political experts may read books from both sides, but most people choose books whose opinions match their own.

In the future, I would like to understand Gephi in more detail and use it to provide better avenues and explanations for my projects. I feel this was a good starting point, but I will definitely have to explore more datasets with Gephi to fully understand the software.