SENTIMENT ANALYSIS OF SONGS


[Visualization: Cassette tapes]

BACKGROUND

Sentiment analysis, or opinion mining, refers to the application of natural language processing, computational linguistics, and text analytics to identify and extract subjective information from source materials (Hovy, 2014). It is a fascinating and important area of study, and I’ve always wanted to learn more about it because it is very useful for music producers, artists, and marketing teams who want to understand how their target audience perceives their music.

In this lab report, I used the “Datasong” dataset from Kaggle (Thapliyal, 2021). This dataset is a collection of audio features extracted from over 26,000 songs across various genres. Each song is described by 18 different audio features, such as tempo, loudness, and energy, which can be used for music analysis and machine learning applications. This report analyzes the song lyrics in the dataset.

In this exploratory sentiment analysis, I set out to answer the following research questions:

  1. What are the 10 most common words used in the songs?
  2. How do the sentiments identified by the NRC and Bing lexicons differ?

MATERIALS

Dataset: Datasong (Thapliyal, 2021), Kaggle

Tools: RStudio (data analysis and data visualization)

METHODS

[Screenshot of the R code used for the analysis]

I began by analyzing the words most commonly used in songs, since I expected this to provide valuable insight into the themes and emotions that songs convey. Because the dataset is very large, I limited this step to the 10 most common words, which could still help uncover patterns and underlying trends in popular music.

I first tokenized the lyrics in datasong.csv into individual words and filtered out stop words. I then counted the remaining words and sorted them in descending order of frequency to identify the 10 most common. Finally, I created a bar plot using ggplot, with the words on the x-axis and their frequencies on the y-axis. The bars are colored red and labeled with each word’s frequency, giving a visual representation of the most frequently occurring words in the song lyrics dataset.
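The counting step can be sketched in Python as follows. This is not the report’s R code (which survives only as a screenshot); it is a minimal illustration that assumes a tiny hypothetical STOP_WORDS set and made-up lyric snippets in place of a full stop-word lexicon and the Datasong lyrics. The helper names tokenize and top_words are also hypothetical. Plotting is omitted: the function only returns the (word, count) pairs that the bar plot would display.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; a real analysis
# would use a full stop-word lexicon.
STOP_WORDS = {"i", "you", "the", "a", "and", "to", "me", "my", "is", "it", "in", "of"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def top_words(lyrics: list[str], n: int = 10) -> list[tuple[str, int]]:
    """Return the n most common non-stop-words, sorted by descending count."""
    counts = Counter(
        word
        for song in lyrics
        for word in tokenize(song)
        if word not in STOP_WORDS
    )
    return counts.most_common(n)


if __name__ == "__main__":
    # Made-up lyric snippets, NOT taken from the Datasong dataset.
    sample = [
        "I love you and you love me",
        "Baby, baby, my love is gone",
        "Hold me tonight, baby",
    ]
    for word, count in top_words(sample, n=3):
        print(word, count)
```

Counter.most_common returns words in descending order of count and keeps ties in the order the words were first seen, which matches the “sort in descending order” step described above.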

Next, to compare the sentiments identified by the NRC and Bing lexicons, I carried out a simple sentiment analysis. First, I defined a vector of undesirable words and removed them from the text data using the filter function. I then filtered out words with fewer than 3 characters and removed common stop words. Next, I created sentiment datasets from the two lexicons (NRC and Bing) by joining each lexicon with the song data. I then visualized the sentiment datasets with ggplot, producing bar plots of the word count for each sentiment category.
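The cleaning and lexicon-joining steps above can be illustrated with a small Python sketch. Everything in it is hypothetical: UNDESIRABLE, STOP_WORDS, TOY_NRC, and TOY_BING are tiny made-up stand-ins, not the report’s filter vector or the real NRC and Bing lexicons. The one structural feature it keeps is that NRC can assign several categories to a single word (emotions such as joy or anger, plus positive or negative), whereas Bing gives each word a single positive or negative label. Both joins behave like an inner join: words missing from a lexicon are dropped.

```python
from collections import Counter

# Hypothetical filler words to remove (the report's actual vector is not shown).
UNDESIRABLE = {"chorus", "repeat", "yeah", "oh"}
# Hypothetical stop-word list; a real analysis would use a full stop-word lexicon.
STOP_WORDS = {"the", "and", "you", "for", "with"}

# Toy lexicons made up for illustration: NOT the real NRC or Bing entries.
# NRC-style: one word can map to several categories.
TOY_NRC = {
    "love": ["joy", "positive"],
    "sweet": ["positive"],
    "lie": ["anger", "negative"],
    "cry": ["negative", "sadness"],
}
# Bing-style: each word maps to exactly one label.
TOY_BING = {
    "love": "positive",
    "sweet": "positive",
    "broken": "negative",
    "hurt": "negative",
    "lie": "negative",
    "cry": "negative",
}


def clean(tokens: list[str]) -> list[str]:
    """Drop undesirable words, words with fewer than 3 characters, and stop words."""
    return [
        t for t in tokens
        if t not in UNDESIRABLE and len(t) >= 3 and t not in STOP_WORDS
    ]


def count_nrc(tokens: list[str]) -> Counter:
    """Join tokens with the multi-label lexicon and count each category."""
    counts: Counter = Counter()
    for t in tokens:
        # Words absent from the lexicon contribute nothing (inner-join behaviour).
        for category in TOY_NRC.get(t, []):
            counts[category] += 1
    return counts


def count_bing(tokens: list[str]) -> Counter:
    """Join tokens with the single-label lexicon and count each label."""
    return Counter(TOY_BING[t] for t in tokens if t in TOY_BING)


if __name__ == "__main__":
    # Made-up tokens, not drawn from the Datasong lyrics.
    tokens = ["oh", "yeah", "i", "love", "you", "sweet", "love",
              "broken", "hurt", "lie", "cry", "the"]
    words = clean(tokens)
    print("NRC :", dict(count_nrc(words)))
    print("Bing:", dict(count_bing(words)))
```

With these made-up tokens, the toy NRC counts come out more positive than negative while the toy Bing counts come out more negative, only because the two toy lexicons cover different words. The numbers are illustrative and say nothing about the real dataset.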

Finally, I used the comparison.cloud function to create a comparison cloud, a type of word cloud that highlights the most common positive and negative words in the data.
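The comparison.cloud function from the R wordcloud package takes a matrix with one row per word and one column per group (here, Bing’s positive and negative sentiments). As I understand its documentation, each word’s rate within each group is compared with that word’s average rate across groups; the word is drawn in the group where it is most over-represented, sized by that maximum deviation. The Python sketch below reproduces only this weighting, not the drawing, and the function name comparison_cloud_weights and the example counts are hypothetical.

```python
def comparison_cloud_weights(
    counts: dict[str, dict[str, int]],
) -> dict[str, tuple[str, float]]:
    """Assign each word a group and a size weight, comparison-cloud style.

    counts maps word -> {group: count}, e.g. {"love": {"positive": 6}}.
    Returns word -> (group where the word is most over-represented,
    how far its rate in that group exceeds its mean rate across groups).
    """
    # Only groups that actually contain words (avoids dividing by zero).
    groups = sorted({g for per_word in counts.values()
                     for g, n in per_word.items() if n > 0})
    # Column totals: the number of word occurrences in each group.
    totals = {g: sum(per_word.get(g, 0) for per_word in counts.values())
              for g in groups}

    weights = {}
    for word, per_word in counts.items():
        # Rate of this word within each group (column-normalised count).
        rates = {g: per_word.get(g, 0) / totals[g] for g in groups}
        mean_rate = sum(rates.values()) / len(groups)
        # The group with the largest deviation above the mean wins the word.
        best = max(groups, key=lambda g: rates[g] - mean_rate)
        weights[word] = (best, rates[best] - mean_rate)
    return weights


if __name__ == "__main__":
    # Made-up Bing-style counts; each word has a single sentiment.
    example = {
        "love": {"positive": 6},
        "sweet": {"positive": 2},
        "cry": {"negative": 3},
        "lie": {"negative": 1},
    }
    for word, (group, weight) in comparison_cloud_weights(example).items():
        print(f"{word:>6}  {group:<8}  {weight:.3f}")
```

Because every Bing word belongs to a single sentiment, its weight works out to half of its rate within that sentiment (for example, 6 of 8 positive occurrences gives 0.75, and a weight of 0.375). So, under this reading, larger words in the cloud correspond to words that are more frequent within their sentiment.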

RESULTS

Top 10 most common words 

The visualization displays the 10 most frequently occurring words in the song lyrics dataset. The horizontal bars represent the frequency of each word; each word is labeled in white text, with its frequency count shown in parentheses, also in white. The plot is titled “Top 10 most common words”, and its axes are labeled “Word” and “Word Count”. “Love” is the most frequent word, which suggests that love is a prominent theme in the lyrics. Word frequency alone, however, cannot show that romantic songs are the most popular among artists.

[Figures: “Songdata NRC Sentiment” and “Songdata Bing Sentiment” bar plots]

Two bar plots were created using ggplot2 to visualize the sentiment of the song data: the first uses the NRC sentiment lexicon and the second uses the Bing sentiment lexicon. The NRC plot shows more positive than negative sentiment words, whereas the Bing plot shows more negative than positive sentiment words, so the two lexicons disagree about the overall balance. One likely reason is that the two lexicons contain different word lists: NRC also tags words with emotions such as joy and anger, while Bing labels words only as positive or negative, so the same lyrics can yield a different positive/negative balance under each.

Bing comparison cloud 

The comparison.cloud function from the wordcloud package was used to create a comparison cloud of the most frequently occurring positive and negative words according to the Bing lexicon, with larger font sizes indicating greater frequency. The results suggest that the most common positive words relate to love and relationships, while the most common negative words relate to betrayal and heartbreak.

REFLECTION

I have prior experience working with RStudio, so I enjoyed exploring it further. The one minor issue I regularly run into with RStudio is loading packages, which can be hit or miss; a few times I had to restart the project because packages were not loading properly. My lab partner also gave me helpful feedback, suggesting that I explain the x- and y-axes more clearly so that readers can understand the results more easily. I found this feedback extremely helpful, and it was great to have a fresh pair of eyes on my work.

Overall, I think this lab report addressed my research questions about the song dataset: the 10 most common words used in datasong.csv, and whether sentiments differ between the NRC and Bing lexicons. The disagreement between the two lexicons raised further questions that I would like to explore in future projects, where a more comprehensive analysis involving bigrams and negation words may be needed to fully understand this disparity.


REFERENCES

Hovy, E. H. (2014). What are sentiment, affect, and emotion? Applying the methodology of Michael Zock to sentiment analysis. In Language Production, Cognition, and the Lexicon (pp. 13–24). Springer. https://doi.org/10.1007/978-3-319-08043-7_2

Mixtape. (2012). [Photograph].

RStudio. (n.d.). Download the RStudio IDE. Retrieved March 20, 2023, from https://support--rstudio-com.netlify.app/products/rstudio/download/

Thapliyal, A. (2021, November 20). Datasong. Kaggle. Retrieved March 18, 2023, from https://www.kaggle.com/datasets/anshthapliyal/datasong