How Similar Are Your Favorite Anime?


Lab Reports, Networks, Visualization
Anime Figurines
Photo by Melvin Chavez on Unsplash

Introduction

Anime is a popular animation style that originated in Japan. You can find anime on most streaming services nowadays, and there are even anime-centered streaming services such as Crunchyroll and Funimation. Crunchyroll boasts the largest anime catalog of any streaming service, with over 1000 different titles. For this lab, I was interested in seeing the relationships between some of the most popular anime of all time.

Inspiration

Originally, I wanted to visualize a network of the relationships between characters in an anime, but finding enough data on any one anime proved to be tough. Nevertheless, my visualization was still inspired by similar character relationship networks. I found one for the Harry Potter books and movies, and I felt that it perfectly matched the type of visualization I wanted to make. Building off of this, I made my own network, replacing the character names with anime titles and using the edges to represent the similarity between nodes (the various anime).

Harry Potter Character Network by hzjken

Materials

The first step in creating my visualization was finding the data. I found a dataset on Kaggle that provided an enormous amount of information about seemingly every anime from the beginning of time to 2016. The spreadsheet had over 1500 unique titles! Attached to this original spreadsheet was another that provided calculations of how related each pair of anime was based on their descriptions. This spreadsheet was produced by a Python calculation that was somewhat complicated to understand, but it did follow a standard procedure.
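
The dataset's actual Python calculation is not reproduced here, so the following is only a guess at what a "standard procedure" for description-based similarity might look like: TF-IDF word weighting followed by cosine similarity. The function names and the three sample descriptions are made up for illustration and do not come from the Kaggle data.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def tfidf_vectors(descriptions):
    """Return one {word: tf-idf weight} dict per description."""
    docs = [Counter(tokenize(d)) for d in descriptions]
    n_docs = len(docs)
    # Document frequency: in how many descriptions each word appears.
    df = Counter(word for doc in docs for word in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # Smoothed idf, so words shared by every description keep a small weight.
            word: (count / total) * (math.log((1 + n_docs) / (1 + df[word])) + 1)
            for word, count in doc.items()
        })
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse {word: weight} vectors."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def similarity_matrix(descriptions):
    """Pairwise similarity of every description with every other one."""
    vectors = tfidf_vectors(descriptions)
    return [[cosine(a, b) for b in vectors] for a in vectors]

# Placeholder descriptions (assumptions, not real dataset entries).
descriptions = [
    "a young ninja trains to become the village leader",
    "a young pirate sails to become king of the pirates",
    "two friends open a quiet bakery",
]
sim = similarity_matrix(descriptions)
```

In this sketch, `sim[i][j]` is the similarity between description `i` and description `j`. The matrix is symmetric with ones on the diagonal, which is the same shape as the similarity spreadsheet described above.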

I filtered this data down to the top 15 anime to make it easier to work with. To find these 15, I used MyAnimeList.com, which is where the Kaggle dataset originated. MyAnimeList provides rankings for all known anime, primarily through crowdsourcing: its users provide the ratings. To clean my data I used Microsoft Excel, which enabled me to convert the matrix to an edge list. Next, I used the network visualization software Gephi to convert quantitative network data (a list of nodes, edges, and weights) into a visualization based on various formulas and layouts. Lastly, I did some final touch-ups in Adobe Illustrator to make the key and move around the graph’s labels.

Methodology

  1. Clean Data in Excel
Final Excel Sheet

This was the hardest step of this lab due to the size of the original spreadsheet. To import the data into Gephi, I knew I wanted an edge list that contained a starting node, an ending node, an edge weight, and whether the edge is undirected or directed. To start, the similarity matrix dataset used “Anime Id” numbers to represent each anime instead of the titles, so I had to use an additional spreadsheet that mapped Anime Id to titles. Once I had the titles, I extracted the relevant rows and columns of the matrix to make a smaller matrix that only contained the top 15 anime. Lastly, I copied the data from the matrix to a final Excel sheet that would be my edge list. It is important to note that my data was in the form of a CSV file. There is probably an easier way to extract this data, but this was the way I went about it.
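
For anyone who would rather script this step than do it by hand in Excel, here is a minimal Python sketch of the same matrix-to-edge-list conversion. I did not use this code for the lab. The tiny matrix, the IDs, and the "Anime A" style titles are placeholders, and I am assuming the matrix CSV has anime IDs in its first row and first column. The `Source`, `Target`, `Weight`, and `Type` headers are the column names Gephi looks for when importing an edge list.

```python
import csv
import io
from itertools import combinations

def matrix_to_edge_list(matrix_rows, id_to_title, keep_ids):
    """Turn a square similarity matrix into undirected Gephi-style edges.

    matrix_rows: rows from csv.reader; first row and first column hold anime IDs.
    id_to_title: dict mapping anime ID -> title.
    keep_ids:    the IDs to keep (for example, the top 15).
    """
    header = matrix_rows[0][1:]
    col_index = {anime_id: i for i, anime_id in enumerate(header)}
    row_by_id = {row[0]: row[1:] for row in matrix_rows[1:]}
    edges = []
    # combinations() yields each unordered pair once: no self-loops, no duplicates.
    for a, b in combinations(keep_ids, 2):
        weight = float(row_by_id[a][col_index[b]])
        edges.append((id_to_title[a], id_to_title[b], weight, "Undirected"))
    return edges

def write_edge_csv(edges, out_file):
    """Write the edges with the headers Gephi expects for an edge list."""
    writer = csv.writer(out_file)
    writer.writerow(["Source", "Target", "Weight", "Type"])
    writer.writerows(edges)

# Placeholder data standing in for the real similarity matrix and ID mapping.
matrix_csv = """id,1,2,3,4
1,1.0,0.8,0.3,0.5
2,0.8,1.0,0.6,0.1
3,0.3,0.6,1.0,0.2
4,0.5,0.1,0.2,1.0
"""
titles = {"1": "Anime A", "2": "Anime B", "3": "Anime C", "4": "Anime D"}

rows = list(csv.reader(io.StringIO(matrix_csv)))
edges = matrix_to_edge_list(rows, titles, keep_ids=["1", "2", "4"])

out = io.StringIO()
write_edge_csv(edges, out)
print(out.getvalue())
```

Because every pair is kept exactly once, a fully connected network of 15 anime produces 15 × 14 / 2 = 105 undirected edges.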

  2. Set Up Gephi

Now that my data had been cleaned up, it was time to import it into Gephi. I imported my CSV file as an edge list and copied the name of each anime over to the “label” attribute in the Data Laboratory tab, and I added a new column to hold the number of episodes for each anime. Next, I downloaded some additional layouts from the Tools > Plugins screen. On the Overview tab, I used the following layouts and options to create my network:

  • Circular Layout
    • Fixed diameter at 750, ordered by type (episode count), displayed counterclockwise, with prevent node overlap checked
  • Expansion (as needed)
  • Contraction (as needed)
  • Rotate (as needed)
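
To give a sense of what the circular layout settings do geometrically, here is a rough Python sketch. This is not Gephi's plugin code, only my approximation of the idea: sort the nodes by an attribute, then space them evenly around a circle of fixed diameter. The episode counts are placeholder numbers, the sketch assumes a y-axis that points up (so increasing angles run counterclockwise), and it does not model the "prevent node overlap" option.

```python
import math

def circular_layout(nodes, diameter=750.0, key=None, clockwise=False):
    """Place nodes evenly on a circle, ordered by `key`, starting at angle 0."""
    ordered = sorted(nodes, key=key)
    if not ordered:
        return {}
    radius = diameter / 2
    step = 2 * math.pi / len(ordered)
    positions = {}
    for i, node in enumerate(ordered):
        # Positive angles go counterclockwise when the y-axis points up.
        angle = -i * step if clockwise else i * step
        positions[node] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions

# Placeholder episode counts (assumptions, not real data).
episodes = {"Anime A": 24, "Anime B": 12, "Anime C": 64, "Anime D": 26}
positions = circular_layout(episodes, diameter=750.0, key=episodes.get)

for title, (x, y) in positions.items():
    print(f"{title}: ({x:.1f}, {y:.1f})")
```

The anime with the fewest episodes lands at angle 0, and the rest follow around the circle in increasing order of episode count.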

Next, for the nodes, I ranked the size of each node by type (episode count), using the y = x^2-like curve for the spline. Then, for the edges, I ranked the color of each edge by weight (similarity) and used one of the preset options that would provide the most contrast between the high and low values, also with the y = x^2 spline.
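
As I understand it, ranking with a y = x^2 style spline means each value is first normalized to the range 0 to 1, squared, and then stretched to the output range. The Python sketch below shows that idea. It is my own approximation rather than Gephi's internal code, and the minimum and maximum sizes (10 and 60) are made-up numbers.

```python
def rank_value(value, lo, hi, out_min, out_max):
    """Map value from [lo, hi] to [out_min, out_max] through y = x^2."""
    if hi == lo:
        return out_min
    x = (value - lo) / (hi - lo)  # normalize to 0..1
    y = x ** 2                    # quadratic curve: low values stay small longer
    return out_min + y * (out_max - out_min)

# Hypothetical size range; the halfway value gets only a quarter of the range.
smallest = rank_value(0, 0, 100, 10, 60)
middle = rank_value(50, 0, 100, 10, 60)
largest = rank_value(100, 0, 100, 10, 60)
print(smallest, middle, largest)
```

Compared with a straight-line mapping, the squared curve keeps small and middling values close together and lets only the largest values stand out. The same mapping applies to edge color, with the similarity weight as the input and a position along the color gradient as the output.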

Lastly, on the Preview tab, I kept the edges straight because I thought curved edges were too busy on this already busy graph, as pictured below. I also made sure that the edge thickness was proportional to weight, with a slightly lower opacity. The final step was to export the graph as an SVG file.

Example of a previous curved version
  3. Final Touches in Adobe Illustrator

In Illustrator, I mainly cleaned up the node labels and added a key to provide some context for the graph. This was the easiest step.

Results

My Visualization: How Similar Are Your Favorite Anime? Network

Initially, I was slightly surprised at how related all the top anime titles were, but it does make sense, since most of these are action/adventure anime with male protagonists who are typically underdogs of sorts. I think it’s also neat to see the anime that are apparently not as related to the rest (Fullmetal, Naruto, etc.). I am curious about other ways episode count could’ve been included in this.

Reflection

This lab was hard for me, primarily due to external factors, but also due to how much data was in the initial spreadsheet. My laptop could barely handle working with an Excel sheet of this size, and it was a very slow process to extract the information about the top 15 anime I wanted to visualize. I also had a hard time because my graph was fully connected, and I was nervous it wouldn’t look interesting enough. The circular layout seemed to be the best for visualizing a fully connected network, but I wonder if there are better ways to represent this data.