Introduction
By now, most of us can guess which artists and songs will be crowding the top charts. But when we look at the top charts and trending songs, what traits do they have in common? What makes a song so popular that it captures global attention? Many different factors shape a song, whether it’s the tempo, the length, or the musical key. These measurable characteristics could give us some key insight into what makes a song a top global hit.
In this second lab, we were asked to create a visualization using OpenRefine and Tableau Public. Through visualizations, we will explore key characteristics of songs to find the most popular combinations and when these characteristics are most popular. The dataset used for this lab includes all of the weekly top 200 songs from 2020-2021 according to Spotify. It is important to point out that this lab does not focus on specific songs and genres, but rather on the characteristics of each song.
Materials and Method
There were three main steps to getting the final visualizations: finding a dataset in a public database, cleaning the data with OpenRefine, and creating the visualizations with Tableau Public.
The initial step in this process was finding a reliable dataset that could be used to create the visualization. After sifting through many unfinished datasets and ones with questionable credibility, I came across a dataset on Kaggle.com that featured all of Spotify’s weekly top 200 songs from 2020-2021. In addition to the song title, artist, and rank on the chart, the dataset also included measures such as danceability, popularity, energy, tempo, and valence, just to name a few. According to the dataset’s creator, the songs were scraped from spotifycharts.com using BeautifulSoup, and the additional song features were retrieved using Spotipy, a Python library for the Spotify Web API.
OpenRefine is an open-source program, used through a web browser, that allows users to clean and manipulate datasets. The Spotify dataset was downloaded as a CSV file and loaded into OpenRefine. After cleaning the dataset and getting rid of irrelevant information, the dataset was ready to be uploaded to Tableau Public. From there, various combinations of measures from the dataset were explored, and some interesting trends were found regarding the top songs from 2020-2021.
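The cleaning for this lab was done in the OpenRefine interface, not in code. As a rough illustration of the same idea, the sketch below drops an irrelevant column, trims stray whitespace, and removes incomplete rows using only Python's standard library. The sample rows and column names are made up for the example and are not the actual headers or values of the Kaggle file.

```python
import csv
import io

# Hypothetical three-row sample; the column names and values are
# assumptions for illustration, not taken from the real dataset.
RAW_CSV = """Song Name,Artist,Highest Charting Position,Danceability,Song ID
 Song A ,Artist X,1,0.71,abc123
Song B,Artist Y,5,,def456
Song C,Artist Z,12,0.64,ghi789
"""

# Columns to keep; everything else (here, "Song ID") is discarded.
KEEP_COLUMNS = ["Song Name", "Artist", "Highest Charting Position", "Danceability"]


def clean_rows(csv_text):
    """Keep only the chosen columns, trim whitespace, and drop incomplete rows."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        trimmed = {col: row[col].strip() for col in KEEP_COLUMNS}
        # Skip rows where any kept column is blank.
        if all(trimmed.values()):
            cleaned.append(trimmed)
    return cleaned


cleaned_rows = clean_rows(RAW_CSV)
for row in cleaned_rows:
    print(row)
```

In this sample, the row for "Song B" is dropped because its danceability value is missing, and the remaining rows no longer carry the "Song ID" column.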
Results and Visualizations
The most interesting results within this dataset were found when comparing the highest charting position with various song features. After reviewing the definitions of the song measures given by the author of the dataset, four measures stood out as the most important variables: song length, danceability, tempo, and valence. Song length was measured in minutes, tempo referred to the song’s average beats per minute, danceability referred to how easy the song is to dance to on a scale of 0 to 1 (1 being the most danceable), and valence measured the song’s overall tone on a scale of 0 to 1 (1 being the happiest). Since each variable was plotted against the highest charting position, each value shown is the average across all songs that share the same highest charting position.
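Tableau performs this averaging automatically when a measure is aggregated as an average. For readers who want to see the calculation spelled out, here is a minimal sketch in Python. The records are hypothetical values chosen for the example, and the function name is my own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (highest charting position, danceability, valence).
songs = [
    (1, 0.80, 0.60),
    (1, 0.60, 0.40),
    (2, 0.70, 0.50),
    (2, 0.50, 0.30),
    (2, 0.60, 0.70),
]


def average_by_position(records):
    """Average each measure over all songs sharing a highest charting position."""
    grouped = defaultdict(list)
    for position, danceability, valence in records:
        grouped[position].append((danceability, valence))

    averages = {}
    for position, values in sorted(grouped.items()):
        averages[position] = {
            "danceability": round(mean(v[0] for v in values), 3),
            "valence": round(mean(v[1] for v in values), 3),
        }
    return averages


averages = average_by_position(songs)
for position, measures in averages.items():
    print(position, measures)
```

Each chart position ends up with one value per measure, which is what a single point on the graph represents.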
[Interactive Tableau Public visualization: tempo, valence, song length, and danceability by highest charting position]
From the above graph, you can see each variable’s sweet spot for the top 200 songs. These songs all had similar tempos, valences, durations, and danceability scores. Taken together, these four measures represent many aspects of the songs, even though the dataset includes many other song features. You can view this interactive graph here.
Some Additional Visualizations
While exploring Tableau’s features, I ran into some other interesting visualizations that give insight into listening behaviors and song trends.
[Interactive Tableau Public visualization: danceability of the top songs over time]
In this graph, the largest circle represents the week in which people were listening to the most danceable songs. Coincidentally, this was the week when people celebrated the New Year. The two other red circles on the graph also represent time periods within the month of January 2020. You can view this interactive graph here.
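One way such a peak could be located outside of Tableau is to average danceability per chart week and pick the week with the highest average. The sketch below assumes that circle size encodes average danceability per week, which is my reading of the graph and not something stated by the dataset. The week labels and values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (chart week, danceability) pairs; not real chart data.
chart_entries = [
    ("week 1", 0.78),
    ("week 1", 0.82),
    ("week 2", 0.74),
    ("week 2", 0.70),
    ("week 3", 0.69),
]


def most_danceable_week(entries):
    """Return the week with the highest average danceability and that average."""
    by_week = defaultdict(list)
    for week, danceability in entries:
        by_week[week].append(danceability)

    weekly_average = {week: mean(values) for week, values in by_week.items()}
    best_week = max(weekly_average, key=weekly_average.get)
    return best_week, weekly_average[best_week]


best_week, best_average = most_danceable_week(chart_entries)
print(best_week, round(best_average, 2))
```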
[Interactive Tableau Public visualization: main song chord by highest charting position, colored by valence]
In this graph, each song’s main chord is plotted against its highest charting position on the Spotify charts. The color represents the mood of the song, with red being the happiest (highest valence) and black having the lowest valence. Note that a lower number for the highest charting position is better, since it is closer to the number one position. You can view this interactive graph here.
Behind the Visualizations
Since these visualizations involve many different variables, including measures on a scale of 0 to 1, it was important to choose a high-contrast color scheme to improve visibility. The color scheme chosen for these graphs was a red-black diverging theme.
Reflection
Despite having difficulty with the visualizations, the lab itself was very entertaining since we were able to explore topics relevant to our interests. If I were to do this again, I would probably choose a dataset with fewer measures to use for my first official visualization project. There are definitely far more interesting things one could uncover with this dataset, but due to my lack of experience with the separate platforms I ended up running into different problems that could have been avoided with more practice. Additionally, there were certain song attributes that I thought would have strong correlations but ended up having little to no relationship. This affected the final results and final visualizations created for this lab. I was still able to find some interesting connections within the dataset, and give insight into a few of the most popular characteristics that make a song a top-charter.
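For anyone repeating this exercise, a quick way to check whether two song attributes are related before building a visualization is to compute a Pearson correlation coefficient. The sketch below is not part of the original lab workflow. It uses made-up numbers and a helper function I named `pearson`, with values near 0 suggesting little to no linear relationship and values near 1 or -1 suggesting a strong one.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")

    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)


# Hypothetical attribute values for four songs; not taken from the dataset.
attribute_a = [1, 2, 3, 4]
attribute_b = [2, 1, 2, 1]

r = pearson(attribute_a, attribute_b)
print(round(r, 3))
```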
References
Pillai, Sashank. 2019. “Spotify Top 200 Charts (2020-2021).” Kaggle. https://www.kaggle.com/datasets/sashankpillai/spotify-top-200-charts-20202021.