Introduction
Comic books are consumed internationally, and audiences of all ages enjoy the medium's visual and storytelling aspects. English-language comics are among the most popular in the world, though some countries and titles far exceed others in publication volume.
The charts and graphs illustrate the publication of ongoing English-language comics: which countries publish the most, how publication has changed over time, and which comics have the most issues.
Materials
I used a dataset named “Statistics for Comics in English” provided by the Grand Comics Database.
I used OpenRefine, a free application, to clean the data. I also used Microsoft Excel to resolve some duplications.
I used Tableau, also a free application, to create the graphs and charts.
Methods
The Grand Comics Database (GCD) is a “nonprofit, internet-based organization of international volunteers dedicated to building an open database covering all printed comics throughout the world.” Because the information is provided by volunteers, I was concerned about its accuracy and consistency.
OpenRefine eased some of that concern. I used the text facet to resolve typos, trimmed leading and trailing whitespace, and used the cluster feature to merge some rows. I also limited the dataset to ongoing comics.
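The cleaning steps above were done in OpenRefine. For readers who prefer a script, a rough Python equivalent might look like the sketch below. It is only an illustration: the column names "name" and "is_current" and the sample rows are assumptions, not taken from the GCD export, and the fingerprint function is a simplified stand-in for OpenRefine's fingerprint clustering rather than a reproduction of it.

```python
import string

# Hypothetical sample rows. The column names "name" and "is_current"
# are assumptions, not confirmed fields of the GCD dataset.
rows = [
    {"name": "  Action Comics ", "is_current": "1"},
    {"name": "Action  comics", "is_current": "1"},
    {"name": "Old Series", "is_current": "0"},
]


def fingerprint(value):
    """Build a simplified clustering key: lowercase, strip punctuation,
    then sort and de-duplicate the whitespace-separated tokens."""
    cleaned = value.strip().lower()
    cleaned = cleaned.translate(str.maketrans("", "", string.punctuation))
    tokens = sorted(set(cleaned.split()))
    return " ".join(tokens)


def clean(rows):
    # Trim leading and trailing whitespace from every field.
    trimmed = [{k: v.strip() for k, v in row.items()} for row in rows]
    # Keep only ongoing comics (assumed to be flagged with "1").
    ongoing = [row for row in trimmed if row["is_current"] == "1"]
    # Group rows whose names share a fingerprint so they can be reviewed
    # and merged by hand, as one would in a cluster dialog.
    clusters = {}
    for row in ongoing:
        clusters.setdefault(fingerprint(row["name"]), []).append(row)
    return clusters


clusters = clean(rows)
for key, members in clusters.items():
    print(key, [member["name"] for member in members])
```

The sketch only groups candidate rows; deciding which spelling to keep is still a manual judgment.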
One column in the data contained numbers, each representing a country. I looked at the schema the organization provided to find out what the numbers meant. It stated:
“The code column is the ISO 3166 code for the country, and should arguably be used as the foreign key, but due to the very rushed development timeline of the December 2009 release (as the prior server was no longer viable), we went with the standard numeric id column for reasons which seemed like a good idea at the time, but involved a great deal of sleep deprivation 🙂 We’ll probably revisit this eventually.”
Because I wanted country information as a category, I sorted the data to see which numbers appeared most often. Then I looked up some of the comics from the filtered data and saw that comics with the same number (for example, 225) consistently came from the same country. I used this to identify the names of the top five countries with ongoing comics (225 represented the U.S.).
I created the charts and graphs in Tableau from the dates comics were published, their countries of origin, and their issue counts. I initially chose red for every value that was the “most.” However, this meant that the comic with the most issues and the country with the most publications (the U.S.) were both shown in red, despite not being directly related. To avoid confusion, charts in which the United States is the “most” (most comics, or most published over time) show the U.S. in red and other countries in blue. Charts with data not specific to the U.S. use green.
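The frequency check described above can be sketched in Python as follows. This is an illustration under stated assumptions: the column name "country_id" and the sample rows are hypothetical, and the IDs 900 and 901 are made-up placeholders. Only the mapping of 225 to the United States comes from the project itself.

```python
from collections import Counter

# Only this mapping is confirmed by the project; any other ID would
# need to be resolved by looking up sample comics by hand.
KNOWN_COUNTRIES = {225: "United States"}

# Hypothetical rows. "country_id" is an assumed column name, and
# 900 and 901 are placeholder IDs, not real GCD country codes.
series = [
    {"name": "Series A", "country_id": 225},
    {"name": "Series B", "country_id": 225},
    {"name": "Series C", "country_id": 225},
    {"name": "Series D", "country_id": 900},
    {"name": "Series E", "country_id": 900},
    {"name": "Series F", "country_id": 901},
]


def top_countries(rows, n=5):
    """Return (id, label, count) for the n most frequent country IDs."""
    counts = Counter(row["country_id"] for row in rows)
    return [
        (cid, KNOWN_COUNTRIES.get(cid, "unresolved"), count)
        for cid, count in counts.most_common(n)
    ]


for cid, label, count in top_countries(series):
    print(cid, label, count)
```

IDs labeled "unresolved" are the ones that would still require searching a few comics to infer the country.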
At some point, I noticed duplications. I resolved them in Excel and refreshed my data in Tableau.
In creating the dashboard, I struggled with prioritizing the charts and graphs. I grouped them by type and color to give the dashboard some structure.
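The duplicates were resolved in Excel, but the same step could be scripted. The sketch below is one possible approach, not the method used in this project. The key fields "name", "country_id", and "year_began" are assumptions about which columns identify a comic, and the sample rows are hypothetical.

```python
# Hypothetical rows; the field names are assumed, not taken from the dataset.
records = [
    {"name": "Series A", "country_id": 225, "year_began": 2010},
    {"name": "Series A", "country_id": 225, "year_began": 2010},
    {"name": "Series B", "country_id": 225, "year_began": 2012},
]


def drop_duplicates(rows, key_fields=("name", "country_id", "year_began")):
    """Keep the first row for each combination of key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


unique_records = drop_duplicates(records)
print(len(records), "rows before,", len(unique_records), "rows after")
```

Which columns count as the key is a design choice: too few fields can merge distinct comics that share a title, while too many can leave near-duplicates in place.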
Results
The charts and graphs can be viewed on the Tableau dashboard. There are four charts.
The top left chart illustrates how many new comics are published per year. There is a slow increase in the early 2000s, followed by a massive increase in the 2010s.
The top right chart illustrates how many new comics are published per year, by country. The United States leads in publication.
The bottom left graph illustrates the top 10 comics with the most issues. Some ongoing comics began publication after World War II, and their long runs are reflected in this data.
The bottom right graph illustrates the number of comics that are published by country. As in the top right chart, the United States leads in publication.
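The counts behind the two top charts (new comics per year, overall and by country) were aggregated in Tableau. A minimal Python sketch of the same aggregation is shown here for illustration. The column names "year_began" and "country_id" are assumptions, the rows are hypothetical sample data rather than results from the dataset, and 900 is a placeholder ID.

```python
from collections import Counter

# Hypothetical sample rows, not actual figures from the dataset.
comics = [
    {"year_began": 2001, "country_id": 225},
    {"year_began": 2010, "country_id": 225},
    {"year_began": 2010, "country_id": 225},
    {"year_began": 2010, "country_id": 900},  # 900 is a placeholder ID
]

# New comics per year (top left chart).
per_year = Counter(row["year_began"] for row in comics)

# New comics per year, split by country (top right chart).
per_year_country = Counter(
    (row["year_began"], row["country_id"]) for row in comics
)

for year in sorted(per_year):
    print(year, per_year[year])
```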
Reflections
It is very possible that the data is influenced by volunteer effort, which may have increased with recent cultural interest in comics and in comic characters in films. That would skew the information, and it may also explain the dramatic difference between American comics and comics from other countries. The information most likely depends on what volunteers can access and where they live. It was frustrating that the countries were coded by number and that the schema was not helpful.
The next step would be finding ways to organize the dashboard so that chart information is prioritized.
It would also be helpful to have another place to look for datasets that may have more accurate information. However, due to the proprietary nature of pop culture material, other datasets with similar products would most likely be crowdsourced like the dataset used for this project.