Amazon Politics Book Network: The Relationship Between Books Purchased by the Same Buyer


Lab Reports, Networks, Visualization

INTRODUCTION & RESEARCH

For the past few months, I have been looking for topics I am passionate about and would want to visualize. It is not always easy to find relevant data on a given topic. Amazon has more than 8 million books on sale on its platform. With that many books, there must be a large amount of related data, such as reviews, prices, titles, and categories. Then we learned Gephi, a very good network visualization tool, and I had an idea: if the book data is too big to understand on its own, visualizing the relationships between books with Gephi might be a good way in.

During my research, I found various types of datasets on the CASOS data sources page. I looked for a dataset that was related to books and small enough to explore comfortably with Gephi. I found one that represents US politics books sold by Amazon. It records which books are frequently bought together by the same buyers, so we can see the relationships between books purchased by the same buyer.

Dataset & Software

Gephi Examples

While researching Gephi examples, I found Example 1 (Verified Twitter Accounts of Brazil), which uses colors to indicate different categories of accounts, such as sports accounts and politics and media accounts. In this visualization, the colors make the relationships within each category easy to understand. Although it is hard to compare node sizes and read the connections between categories, it is definitely a very aesthetic visualization. Example 2 (Great Lakes Cisco Activity) shows the network for each type of activity much more clearly, because the lines between groups are distinct and well separated. It helped me understand the connections in this account network: much of the activity came from Great Lakes Scientists and Influential Followers, while less came from Anglers & Outdoor Enthusiasts.

What I learned from both examples and can use in my Gephi experiment:

  • Show different categories with colors
  • Clear lines and nodes show more detail about the connections
Fig 1 Guide: Analyzing Twitter Networks with Gephi 0.9.1
Fig 2 Great Lakes Cisco Activity

DESIGN ITERATION & METHODOLOGY

In the beginning, I used OpenRefine to clean the data. I found it very confusing to clean the XML file, and some of the important values went missing. After a few tries, I found that it is very important to select the right sections of the XML before creating the file in OpenRefine. I then combined the two CSV files to match each ID number with its book name, so that the nodes in the visualization could be labeled with book names.
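The ID-to-name matching can also be sketched in a few lines of Python. This is only a minimal illustration, not the OpenRefine steps used here: the sample rows are made up, and the column names (`Id`, `Label`, `Source`, `Target`) are an assumption based on the headers Gephi conventionally recognizes for node and edge tables. The real files may be laid out differently.

```python
import csv
import io

# Hypothetical sample data; the real files and column names may differ.
NODES_CSV = """Id,Label
0,Book A
1,Book B
2,Book C
"""

EDGES_CSV = """Source,Target
0,1
0,2
"""


def load_labels(nodes_text):
    """Map each node ID to its book title."""
    reader = csv.DictReader(io.StringIO(nodes_text))
    return {row["Id"]: row["Label"] for row in reader}


def label_edges(edges_text, labels):
    """Replace the IDs on each edge with book titles.

    An ID without a matching title is kept as-is, so no edge is silently dropped.
    """
    reader = csv.DictReader(io.StringIO(edges_text))
    return [
        (labels.get(row["Source"], row["Source"]),
         labels.get(row["Target"], row["Target"]))
        for row in reader
    ]


labels = load_labels(NODES_CSV)
named_edges = label_edges(EDGES_CSV, labels)
for source, target in named_edges:
    print(f"{source} -- {target}")
```

Keeping unmatched IDs instead of dropping them makes missing values easy to spot, which matters when values can go missing during cleaning.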

Next, I opened the CSV file in Gephi and edited the appearance by changing the colors of the nodes and the lines (PO BOOK – 1). I tried different layouts to showcase the relationships, such as Force Atlas (PO BOOK – 2), and finally decided that the Fruchterman Reingold layout presents the network best.

After deciding on the layout, I added a text label to each node (PO BOOK – 3). Many labels overlapped each other, which made them hard to read. I tried scaling the label size by node size in the label settings, but that did not work well. What worked was selecting the “Hide non-selected” option, so that a label does not appear until the mouse hovers over its node, which improved the readability of the visualization. Node size is also one of the best ways to represent weights, so I changed the node settings to show the values as different node sizes, and ended up with a visualization with nodes of several sizes!
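To make the idea of sizing nodes by weight concrete, here is a minimal Python sketch. It assumes the weight is a node's degree (the number of co-purchase links it has), which is only one possible choice; the edge list, the book names, and the 10 to 50 size range are all hypothetical and not taken from the dataset or from Gephi's settings.

```python
from collections import Counter


def node_degrees(edges):
    """Count how many edges touch each node in an undirected edge list."""
    degrees = Counter()
    for source, target in edges:
        degrees[source] += 1
        degrees[target] += 1
    return degrees


def scale_sizes(degrees, min_size=10.0, max_size=50.0):
    """Linearly map degrees onto a node-size range (the range is an assumption)."""
    if not degrees:
        return {}
    low, high = min(degrees.values()), max(degrees.values())
    if high == low:
        # All nodes have the same degree, so they all get the same size.
        return {node: min_size for node in degrees}
    span = high - low
    return {
        node: min_size + (deg - low) / span * (max_size - min_size)
        for node, deg in degrees.items()
    }


# Hypothetical co-purchase edges between made-up titles.
edges = [
    ("Book A", "Book B"),
    ("Book A", "Book C"),
    ("Book A", "Book D"),
    ("Book B", "Book C"),
]
degrees = node_degrees(edges)
sizes = scale_sizes(degrees)
for book in sorted(sizes):
    print(book, degrees[book], sizes[book])
```

The book with the most links gets the largest node, and the one with the fewest gets the smallest, so the better-connected books stand out at a glance.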

PO BOOK – 1
PO BOOK – 2
PO BOOK – 3
PO BOOK – 4

FINAL OUTCOME & FINDING

PO BOOK – 5
PO BOOK – 6
PO BOOK – 7

The Gephi experiment was a lot of fun. The final visualization divides the books into different colors and node sizes, which show the types of politics books and how frequently buyers of one book also bought other books. The book “A National Party No More” is clearly co-purchased more often than the books shown as smaller nodes.

What I learned and the next step

Gephi is a very effective tool for showing the relationships in a network, although I had a problem previewing the image and do not know why. I also found it hard to understand the values just by looking at the numbers in the data, and it was difficult to create a more advanced appearance, with more nodes and lines, when there is not much data to show. My next step is to learn the meaning of each category in the data files, so I can judge whether the selected data is enough to explore. Additionally, my ultimate goal for this network visualization is to merge political books with books from other categories to show more of the complexity of Amazon’s book network.

Reference

Fig. 1. Luca Hammer; “Guide: Analyzing Twitter Networks with Gephi 0.9.1”; Web, Sep 6 2016, https://medium.com/@Luca/guide-analyzing-twitter-networks-with-gephi-0-9-1-2e0220d9097d; April 18 2019

Fig. 2. Great Lakes Cisco; “Great Lakes Cisco Activity”; Web, 15 Mar 2019, https://medium.com/@Luca/guide-analyzing-twitter-networks-with-gephi-0-9-1-2e0220d9097d; April 18 2019

Felicia, Tu. “PO BOOK-1 – PO BOOK-7 .” 2019. PNG file.