Introduction
For this R and RStudio lab, I once again looked for a data set I found interesting. Hoping to find something that fit the theme of my other lab reports, I came across a data set of video game sales on Kaggle. With my data ready, I set out to see how I could represent the information in an interesting and digestible format.
Materials
As with previous labs, my materials were Kaggle, for finding the data set, and R and RStudio, for processing that data into visualizations. After combing through the data retrieved from Kaggle, I set up RStudio. This meant starting R, loading the additional packages (such as the “tidyverse”), and importing the folder containing my .csv file.
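A minimal sketch of this setup step is shown here. It is not the exact code from the lab: the column names (Rank, Name, Platform, Year, Global_Sales), the made-up rows, and the idea that missing years are written as "N/A" are all assumptions, and a temporary file stands in for the Kaggle .csv so the example runs on its own.

```r
library(readr)  # part of the tidyverse: reading .csv files
library(dplyr)  # part of the tidyverse: data manipulation

# Hypothetical stand-in for the Kaggle file: a few made-up rows
# written to a temporary .csv (column names are assumptions).
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Rank,Name,Platform,Year,Global_Sales",
  "1,Game A,Wii,2006,10.5",
  "2,Game B,DS,2006,4.0",
  "3,Game C,Wii,N/A,2.5"
), csv_path)

# Read the file, treating "N/A" as a missing value so Year stays numeric.
games <- read_csv(csv_path, na = c("", "NA", "N/A"), show_col_types = FALSE)

# Quick look at the column names and types.
glimpse(games)
```

With the real file, csv_path would simply be the path to the downloaded .csv inside the project folder.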
Process
This was arguably the hardest part for me. I began by trying to recreate the environment we worked in during class, making sure to load the right packages, name things the right way, and so on. I was stuck for a while on why df <- filter() wasn’t working until I realized I had forgotten to define df. I retraced my steps through the code we had in class and got to a point where I could separate my data, going from a full data set that included Rank, Name, Year, Platform, NA Sales, EU Sales, JP Sales, Other Sales, and Global Sales to a few smaller ones better suited to visualization: one with the Year and the highest Global Sales for a single game, one with the Year and total Global Sales, and one with sales by year per console, which I plotted with facet_wrap. I landed on these three because, with my limited understanding of RStudio, I could not think of what others might make sense here.
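The steps above could look roughly like the following sketch. The toy data, the column names (Year, Platform, Global_Sales), and the names of the three summary tables are hypothetical and only illustrate the idea. The key point is that df has to be assigned before filter() can work on it.

```r
library(dplyr)

# Made-up rows standing in for the full data set (column names are assumptions).
games <- tibble(
  Name         = c("Game A", "Game B", "Game C", "Game D"),
  Platform     = c("Wii", "DS", "Wii", "DS"),
  Year         = c(2006, 2006, 2007, NA),
  Global_Sales = c(10.5, 4.0, 2.5, 1.0)
)

# df must be defined first; only then can it be filtered.
df <- games
df <- filter(df, !is.na(Year))  # drop rows with no release year

# Highest Global Sales for a single game in each year.
top_by_year <- df %>%
  group_by(Year) %>%
  summarise(top_sales = max(Global_Sales))

# Total Global Sales in each year.
total_by_year <- df %>%
  group_by(Year) %>%
  summarise(total_sales = sum(Global_Sales))

# Total sales per console per year (the input for a facet_wrap plot).
platform_by_year <- df %>%
  group_by(Platform, Year) %>%
  summarise(total_sales = sum(Global_Sales), .groups = "drop")
```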
Result and Critique
In Figure 1, we can see total sales by year. I used an area graph to show the trajectory of overall game sales from 1980 to 2016 (the limits of the data set). It is interesting to see the rise and fall until the 2000s, when sales skyrocketed. This is most likely due to the Nintendo Wii bringing games to a more casual audience, something that can be seen in Figure 2 as well. The decline, however, is harder to explain. I worry that the data set is too limited: there is most likely raw data out there for video game sales through at least 2020, which would help in exploring this trend.
Similar to Figure 1, Figure 2 tracks the highest sales for a single game in each year. The highest point here is Wii Sports, released in 2006, which sold about 82 million copies. I tried for a while to add labels at the top, perhaps highlighting which game was at the highest peak and just how many units it sold, but I could not accomplish this.
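One possible way to add such a label is sketched below. I have not tested it on the real data. The data are made up, and the table name top_by_year and its columns (Year, Name, top_sales) are assumptions. The idea is to pull the peak row out into its own small table and pass only that row to geom_text.

```r
library(dplyr)
library(ggplot2)

# Made-up yearly best-sellers (names and numbers are hypothetical).
top_by_year <- tibble(
  Year      = c(2005, 2006, 2007, 2008),
  Name      = c("Game A", "Game B", "Game C", "Game D"),
  top_sales = c(12, 40, 20, 25)
)

# The single row with the highest sales.
peak <- slice_max(top_by_year, top_sales, n = 1, with_ties = FALSE)

p <- ggplot(top_by_year, aes(x = Year, y = top_sales)) +
  geom_area(fill = "steelblue", alpha = 0.4) +
  geom_line(linewidth = 1) +
  # Label only the peak: game name plus units sold, placed just above the point.
  geom_text(
    data = peak,
    aes(label = paste0(Name, " (", top_sales, "M)")),
    vjust = -0.5
  ) +
  # Extra headroom at the top so the label is not clipped.
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(x = "Year", y = "Top-selling game (millions of units)")

p
```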
Figure 3 is where I decided to get more experimental. Using the facet_wrap function, I created one small graph for each console and tracked its total number of units sold per year. The visual aspect left something to be desired, as I found the graphs to be a bit too small. Most of the feedback I got from a colleague I spoke to was about this visualization, as it was originally a black area graph with no distinguishable line. I felt that adding a thicker line and giving the area a lighter color and some opacity would help, but I think it could have used more. It is interesting to note that some of the axes do not make sense, because every panel uses the same scale. The Game Gear (GG), for example, does not show any visible data on the graph because it sold so few units. Overall, this visualization is interesting but could be expanded.
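A sketch of one way to address the scale problem follows. It is not the code behind Figure 3: the two console names, the numbers, and the table name platform_by_year are made up for illustration. Passing scales = "free_y" to facet_wrap gives each panel its own y axis, so a console with very low sales still shows a visible shape.

```r
library(dplyr)
library(ggplot2)

# Made-up data: one high-selling and one low-selling console (hypothetical).
platform_by_year <- tibble(
  Platform    = rep(c("Console A", "Console B"), each = 3),
  Year        = rep(c(1992, 1993, 1994), times = 2),
  total_sales = c(150, 200, 170, 0.02, 0.04, 0.01)
)

p <- ggplot(platform_by_year, aes(x = Year, y = total_sales)) +
  geom_area(fill = "skyblue", alpha = 0.5) +  # lighter, semi-transparent area
  geom_line(colour = "navy", linewidth = 1) + # thicker line on top of the area
  # One panel per console; each panel gets its own y-axis range.
  facet_wrap(~ Platform, scales = "free_y", ncol = 2) +
  labs(x = "Year", y = "Units sold (millions)")

p
```

The trade-off is that panels can no longer be compared by height, since each has its own axis. For the small panel size, saving the plot with ggsave() and larger width and height values may also help.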
RStudio is a very complicated piece of software that requires a good amount of prior understanding before something visually appealing can be created, but the tools are expansive. Getting the hang of facet_wrap so that the result was at least understandable felt very gratifying. The information displayed, mostly focused on sales, year, and console, is interesting because you can track the ebb and flow in a couple of different ways. There are a few interesting trends one can point out, such as the way most consoles reach an apex and then never return to it, or the way that overall video game sales seem to have reached a peak that has yet to be matched. I could have tried different graphs for different variables, such as Genre and Global Sales, or Global Sales per region. This will have to wait for a later R project.
References:
Ali, A. (2020, October 8). Sales of video games. Kaggle. Retrieved October 26, 2022, from https://www.kaggle.com/datasets/arslanali4343/sales-of-video-games