Background
Emojis have become an important mechanism for conveying human emotion in writing. Because they span languages and express sentiment in a more universal way, they can be useful tools for analyzing public sentiment expressed on online channels. For this lab, I wanted to explore public sentiment during an international online social movement by analyzing only the emojis.
I found two publicly available emoji datasets to help me do this. The first dataset provided a general sentiment analysis of 751 emojis, while the second dataset included a collection of 28,629 tweets (and the emojis they contain) from the #MeToo hashtag.
Data Collection and Cleaning Process
The first dataset I found was the Emoji Sentiment Ranking data available on Kaggle. This dataset was created in 2015 and comprises 751 emoji characters and their assigned sentiment. Sentiment for these emojis was calculated from 70,000 tweets in 13 European languages that were labeled by human annotators. Sentiment is broken down into three classes: positive, negative and neutral. Each emoji is assigned values based on the number of times annotators found that it expressed positive, negative and neutral sentiment, and the authors provided instructions on how to calculate an overall sentiment score. This dataset included records for:
- The emoji symbol
- The emoji’s UNICODE name
- The emoji’s positive, negative and neutral score given by annotators
The second dataset, Tweets With Emojis – #MeToo, was downloaded from data.world and included a sample of tweets from the #MeToo hashtag. The sample contains 28,629 English-language tweets pulled on Oct. 16, 2017. This dataset included records for:
- The tweet text and URL
- Number of emojis used (if any) in the tweet
- The emoji symbol
- The emoji UNICODE name
I wanted to join the two datasets so that each emoji in the Tweets With Emojis – #MeToo data would be matched to its sentiment scores from the Emoji Sentiment Ranking data. To do this, I needed the emoji UNICODE names in both datasets to match so that I could use the name as the key for joining the two tables. I did the data cleansing in OpenRefine, using find and replace to correct any UNICODE names that didn’t match those in the Emoji Sentiment Ranking data. Because the sentiment ranking dataset was from 2015 and the #MeToo dataset was from 2017, some of the emojis used in 2017 didn’t exist in 2015; records for those emojis, which had no sentiment scores, were removed. After joining the two tables, I deleted any records for tweets that didn’t contain emojis. Some tweets had more than one emoji, so I used the split multi-valued cells function on the ‘Emoji’ column to create a new record for each emoji.
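The cleaning itself was done in OpenRefine, but the same split-and-join logic can be sketched in Python for readers who prefer code. This is only an illustration: the field names (`text`, `emoji_names`), the `;` separator, and every value in the sample rows are made up, not taken from either dataset.

```python
def normalize(name):
    """Collapse whitespace and upper-case a Unicode name so both tables agree."""
    return " ".join(name.split()).upper()


def split_and_join(tweets, sentiment, separator=";"):
    """Return one record per (tweet, emoji) pair that has a sentiment entry."""
    lookup = {normalize(name): scores for name, scores in sentiment.items()}
    joined = []
    for tweet in tweets:
        # Split the multi-valued cell into one emoji name per record.
        for raw_name in tweet["emoji_names"].split(separator):
            name = normalize(raw_name)
            if not name:
                continue  # tweet without an emoji
            if name not in lookup:
                continue  # emoji missing from the sentiment table
            joined.append({"text": tweet["text"], "emoji_name": name, **lookup[name]})
    return joined


# Hypothetical sample data (illustrative counts only).
sample_tweets = [
    {"text": "tweet A", "emoji_names": "HEAVY BLACK HEART; BROKEN HEART"},
    {"text": "tweet B", "emoji_names": "raised fist "},
    {"text": "tweet C", "emoji_names": ""},
    {"text": "tweet D", "emoji_names": "SHUSHING FACE"},
]
sample_sentiment = {
    "HEAVY BLACK HEART": {"positive": 6, "neutral": 3, "negative": 1},
    "BROKEN HEART": {"positive": 2, "neutral": 3, "negative": 5},
    "RAISED FIST": {"positive": 5, "neutral": 4, "negative": 1},
}

joined_rows = split_and_join(sample_tweets, sample_sentiment)
```

In this sample, tweet C is dropped because it has no emoji and tweet D is dropped because its emoji has no entry in the sentiment table, mirroring the two removal steps described above.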
After data cleansing, the total number of tweets included in the analysis was 1,218, with 1,633 emojis. The final step in the process was done in Excel to calculate the sentiment score based on the calculation outlined by Kralj Novak et al. (2015).
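The Excel formula itself is not shown here, so the following is only a rough Python sketch of the idea. It assumes the score is the positive share minus the negative share of an emoji’s annotated occurrences, with no smoothing of the counts; the function name and the example counts are made up for illustration.

```python
def sentiment_score(positive, neutral, negative):
    """Score in [-1, +1]: share of positive labels minus share of negative labels.

    Assumption: raw proportions are used; any smoothing applied in the
    original calculation is not reproduced here.
    """
    total = positive + neutral + negative
    if total == 0:
        raise ValueError("emoji has no annotated occurrences")
    return (positive - negative) / total


# Illustrative counts only, not values from the dataset.
mostly_positive = sentiment_score(positive=6, neutral=3, negative=1)
balanced = sentiment_score(positive=4, neutral=2, negative=4)
```

Under this assumption, an emoji labeled positive every time scores +1, one labeled negative every time scores -1, and neutral labels pull the score toward 0.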
Visual Inspiration
I knew that I wanted to visualize a part-to-whole relationship to show the breakdown of each emoji’s positive, negative and neutral sentiment, so I drew inspiration from a New York Times article that showed part-to-whole percentages for the tendencies of politicians to lie or tell the truth. I liked how they highlighted the two opposing values in contrasting colors while keeping the neutral value grey.
I was also inspired by other sentiment analyses that used scatter plots to show how data points fall along a spectrum of emotion, while also using emotionally charged colors like green for pleasantness and blue for unpleasantness.
Data Analysis and Visualization Process
I wanted to discover which emojis were most frequently used in the collection of #MeToo tweets so I could narrow down the visuals to a more targeted set of data points. For all graphs except the scatter plot, I filtered by the top 20 most frequently occurring emojis in the #MeToo dataset.
To contrast positive and negative sentiment, I used opposing colors and also tried to incorporate some color psychology by using green for positive sentiment and red for negative sentiment. I desaturated the colors to reduce eye fatigue and ran them through a color blindness simulator to make sure they passed red/green color blindness tests. For other graphs I chose to incorporate the shade of pink used by metoomvmt.org to pay homage to Tarana Burke, the founder of the Me Too movement, and her organization.
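The top-20 filter amounts to a frequency count over the one-record-per-emoji table. A minimal Python sketch, assuming each record carries an `emoji_name` field (a hypothetical name) and using made-up sample records:

```python
from collections import Counter


def top_emojis(records, n=20):
    """Return the n most frequent emoji names as (name, count) pairs."""
    counts = Counter(record["emoji_name"] for record in records)
    return counts.most_common(n)


# Hypothetical records, one per emoji occurrence.
sample_records = [
    {"emoji_name": "HEAVY BLACK HEART"},
    {"emoji_name": "RAISED FIST"},
    {"emoji_name": "HEAVY BLACK HEART"},
    {"emoji_name": "BROKEN HEART"},
    {"emoji_name": "HEAVY BLACK HEART"},
    {"emoji_name": "RAISED FIST"},
]

top_two = top_emojis(sample_records, n=2)
top_names = {name for name, _ in top_two}
# Keep only records whose emoji is in the top set, as was done for the charts.
filtered = [r for r in sample_records if r["emoji_name"] in top_names]
```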
The titles for each graph posed the question I was trying to answer in that visual, and I added brief captions to each visual to provide supplemental instructions when looking at the data.
Results
Of the top 20 most frequently used emojis in the #MeToo tweets, heart emojis were the most common (8), followed by faces (6) and hand gestures (6). ❤️ was used more than twice as much as any of the others (249 occurrences). The overall sentiment of the top 20 emojis was moderately positive with an average sentiment score of 0.37 on a scale of -1 (completely negative) to +1 (completely positive). ❤️ had the most positive sentiment score, while 😕 had the most negative sentiment score. Hearts and hand gestures had the most positive sentiment scores, while facial expressions had the most negative sentiment scores.
When looking at the sentiment breakdown, none of the top 20 emojis had sentiment scores made up of entirely negative or entirely positive sentiments. While this is to be expected as sentiment falls on a spectrum, it’s still an important reminder that emojis cannot always provide an accurate depiction of intended sentiment in the absence of text. It’s also interesting to see that some emojis, such as 😡, have very little neutral sentiment.
Lastly, we see a bigger picture of the overall shape of sentiment in the scatter plot below. Every emoji used in the #MeToo tweet collection is plotted on the graph, and I annotated selected tweets that used emojis falling into the positive, negative and neutral sentiment ranges to show examples of the accompanying text. While these annotated tweets seem to generally express the same sentiment as their emojis, I’m sure there are tweets whose text doesn’t match the sentiment of their emojis.
Limitations and Future Work
While this lab was an exploration into using only emojis to gauge sentiment, this method has its limitations, as emoji sentiment can differ from overall textual sentiment. The #MeToo dataset is also limited in that it only captures tweets from one day. Still, it’s interesting to see how people might interpret sentiment from a tweet based solely on emojis, particularly people who don’t speak the same language. In a global online social movement like the #MeToo movement, I imagine emojis may have helped people gauge general sentiment when language barriers were present. Potential future work could investigate differences in how emojis are viewed by different cultures and how this would affect using emojis to gauge sentiment on a global scale.
References
Kralj Novak P, Smailović J, Sluban B, Mozetič I (2015) Sentiment of Emojis. PLoS ONE 10(12): e0144296. doi:10.1371/journal.pone.0144296