Connecting emojis



Background

This lab report is a follow-up to Exploring Sentiment Through Emojis, a sentiment analysis of #metoo-related tweets. Using the same dataset as the previous report, this network analysis examines the co-occurrences of the emojis tweeted with #metoo in an effort to gain insight into the relationships between these emojis and their sentiment.

Data Preparation

Creating the Edge List

In the ‘Exploring Sentiment Through Emojis’ dataset, I had split multi-valued cells for any tweets that contained more than one emoji and created a new row for each emoji. For this network analysis, the first step in the data preparation process was using OpenRefine to re-join those co-occurring emojis into a single cell. I removed the emoji symbols and kept only the Unicode names, as the visualization tool I used for this network analysis, Gephi, could not render emoji symbols. Next, I split the multi-valued cells so that each Unicode name appeared in a new column (instead of a new row), which let me eventually prepare a two-column edge list of sources and targets.

I had originally used RStudio to create the edge list, but I ran into issues getting the rbindlist function to work (possibly related to errors that macOS users have reported when installing the data.table package). The dataset was composed mostly of tweets with two co-occurring emojis (which already form a two-column undirected edge list), so for any tweets with three or more emojis I manually added the additional source and target pairs to the edge list.
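The pairing step can also be sketched in Python. This is a minimal illustration rather than the workflow used for this report: the tweets below are hypothetical, and pairing every emoji in a tweet with every other emoji in that tweet is an assumption about how the extra sources and targets should be filled in.

```python
from itertools import combinations


def tweet_to_edges(emoji_names):
    """Return every undirected pair of emoji names that co-occur in one tweet."""
    # Sorting each pair gives undirected edges one canonical spelling.
    # A tweet that repeats an emoji yields a self-loop (assumed behavior).
    return [tuple(sorted(pair)) for pair in combinations(emoji_names, 2)]


# Hypothetical tweets, each already reduced to its Unicode emoji names.
tweets = [
    ["RED HEART", "BROKEN HEART"],
    ["RED HEART", "CRYING FACE", "PENSIVE FACE"],
    ["CLAPPING HANDS", "CLAPPING HANDS"],
]

edge_list = [edge for names in tweets for edge in tweet_to_edges(names)]

# Print a two-column edge list with the Source/Target header Gephi expects.
print("Source,Target")
for source, target in edge_list:
    print(f"{source},{target}")
```

A tweet with two emojis produces one edge and a tweet with three produces three, so the manual step grows quickly for longer tweets.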

Creating the Node List

While Gephi could automatically extract a node list from my uploaded edge list, I decided to create my own node list so that I could examine isolated nodes and associate a sentiment score with each node. Using OpenRefine, I removed any duplicate Unicode names from my original ‘Exploring Sentiment Through Emojis’ dataset and kept the column containing their calculated sentiment scores.
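The same de-duplication can be sketched in Python as an alternative to OpenRefine. The rows and sentiment scores below are made up for illustration, and the column names `name` and `sentiment` are assumptions; `Id` and `Label` are the node columns Gephi's spreadsheet import looks for.

```python
import csv
import io

# Hypothetical input: one row per emoji occurrence, with an invented score.
rows_csv = """name,sentiment
RED HEART,0.75
BROKEN HEART,-0.12
RED HEART,0.75
RAISED FIST,0.40
"""


def build_node_list(rows):
    """Keep the first row seen for each emoji name, along with its score."""
    nodes = {}
    for row in rows:
        # setdefault ignores later duplicates of a name already stored.
        nodes.setdefault(row["name"], float(row["sentiment"]))
    return [
        {"Id": name, "Label": name, "sentiment": score}
        for name, score in nodes.items()
    ]


node_list = build_node_list(csv.DictReader(io.StringIO(rows_csv)))
for node in node_list:
    print(node)
```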

Visualization Process

Inspiration

I took inspiration from a similar sentiment analysis performed on a subreddit called ‘skeptic’ (created by Siobhán Grayson and available on Wikimedia Commons). I liked how the edges (Reddit posts) were colored according to the sentiment expressed by the node (Reddit user) they came from.

I drew additional inspiration from a visual included in Seyednezhad’s (2019) A Network-Driven Approach for Characterizing Emoji Usage in Social Media. While creating a chord diagram was out of scope for Gephi, I liked that they displayed emojis as node labels.

Gephi

Once I fed the data into Gephi, weights were computed for the edges in my edge list (based on the sum of any repeated edges) and I ran statistics for the average degree, density and modularity. In total, this network had 141 nodes and 230 edges. The average degree was 3.26 and the network density was very low (0.02) indicating a loosely connected graph. I increased node sizes for higher degrees to draw attention to these highly connected nodes.
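As a check on these figures, the standard formulas for an undirected graph reproduce the reported values from the node and edge counts alone. This is a minimal sketch: the function names are illustrative, and Gephi's own handling of self-loops and parallel edges may differ from these textbook definitions.

```python
from collections import Counter


def weight_edges(edge_list):
    """Sum repeated undirected edges into a weight per unique pair."""
    return Counter(tuple(sorted(edge)) for edge in edge_list)


def average_degree(n_nodes, n_edges):
    """Each undirected edge contributes to the degree of two nodes."""
    return 2 * n_edges / n_nodes


def density(n_nodes, n_edges):
    """Share of all possible undirected node pairs that are connected."""
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


# Toy edge list: A-B appears twice (in either order), so its weight is 2.
weights = weight_edges([("A", "B"), ("B", "A"), ("A", "C")])
print(weights[("A", "B")])

# Node and edge counts reported for this network.
print(round(average_degree(141, 230), 2))
print(round(density(141, 230), 2))
```

With 141 nodes and 230 edges, the average degree works out to 460 / 141 and the density to 460 / 19,740, which round to the 3.26 and 0.02 reported above.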

I added color to nodes based on their sentiment score. Gephi’s ranked color selection palette was limited, but I opted for a green to brown sliding scale – green to represent positivity in nodes with high sentiment scores and brown to show negativity in nodes with low sentiment scores.
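A ranked color scale of this kind amounts to linear interpolation between two endpoint colors. The sketch below is an assumption about how such a ramp can be computed, not Gephi's implementation, and the RGB values chosen for brown and green are hypothetical.

```python
# Hypothetical endpoint colors for the ramp (not taken from Gephi).
BROWN = (140, 81, 10)
GREEN = (27, 120, 55)


def sentiment_to_rgb(score, low, high):
    """Map a sentiment score onto a brown-to-green ramp between low and high."""
    if high == low:
        t = 0.5  # degenerate range: fall back to the middle of the ramp
    else:
        t = (score - low) / (high - low)
    t = max(0.0, min(1.0, t))  # clamp scores that fall outside the range
    return tuple(round(b + (g - b) * t) for b, g in zip(BROWN, GREEN))


print(sentiment_to_rgb(-1.0, -1.0, 1.0))  # lowest score maps to brown
print(sentiment_to_rgb(1.0, -1.0, 1.0))   # highest score maps to green
```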

I used the ForceAtlas 2 layout with the option to prevent node overlap enabled. I set scaling to 50 to disperse the network for better visual clarity. Further visual refinement was done in Adobe InDesign, where I manually added emoji characters to the network graph. Only emojis with a degree of 4 or higher were added to the graph.
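The list of nodes that qualify for an emoji label can be derived from the edge list. This is a sketch under assumptions: the edges below are hypothetical, each edge is listed once, and a self-loop counts twice toward its node's degree, which may not match Gephi's convention.

```python
from collections import Counter


def degrees(edges):
    """Count how many edge endpoints touch each node."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1  # a self-loop therefore adds 2 to its node
    return degree


def nodes_to_label(edges, min_degree=4):
    """Return the nodes whose degree meets the labeling threshold."""
    return sorted(n for n, d in degrees(edges).items() if d >= min_degree)


# Hypothetical de-duplicated edge list.
edges = [
    ("RED HEART", "BROKEN HEART"),
    ("RED HEART", "CRYING FACE"),
    ("RED HEART", "PENSIVE FACE"),
    ("RED HEART", "RAISED FIST"),
    ("CRYING FACE", "PENSIVE FACE"),
]

print(nodes_to_label(edges))
```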

Results

There was a clear visual divide between positive and negative sentiment, with negative sentiment largely clustered on the left and positive sentiment clustered on the right. The ❤️ was the most frequently occurring emoji in the #metoo tweets and was also the most connected emoji in this network, with a degree of 31. While ❤️ was closely connected with other positive emojis, it was also closely connected with the emojis that had the most negative sentiment scores (💔, 😢, 😔). The emojis with the highest degree fell predominantly at the positive or negative end of the sentiment scale, whereas few high-degree nodes had more neutral sentiment (grayer in color).

As I examined the network, I noticed that all of the self-looping nodes were emojis for hand gestures. It’s interesting to see that hand gestures were so commonly self-reinforced. Though I did not cluster by Gephi-defined communities, there appear to be some communities forming among the emojis for hearts, facial expressions and gestures, as the emojis within each group were closely connected to one another.
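Self-looping nodes can also be pulled straight from the edge list by finding edges whose source and target match. The edges below are hypothetical and only illustrate the check.

```python
def self_loop_nodes(edges):
    """Return the nodes that have at least one edge pointing back to themselves."""
    return sorted({source for source, target in edges if source == target})


# Hypothetical edge list containing two self-loops.
edges = [
    ("CLAPPING HANDS", "CLAPPING HANDS"),
    ("RAISED FIST", "RAISED FIST"),
    ("RED HEART", "BROKEN HEART"),
    ("CLAPPING HANDS", "RED HEART"),
]

print(self_loop_nodes(edges))
```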

Limitations/Future Work

While this network visualization provides a structural overview of the network with an emphasis on sentiment, I think there is more opportunity to dive deeper into the relationships among these emojis and move away from sentiment. Specifically, it would be worth seeing how certain categories of emojis are connected, e.g., how connected emojis for facial expressions are with emojis for animals. I also found myself wanting to experiment with network graphs that weren’t available in Gephi (e.g., chord and arc diagrams), and I felt a bit limited by the amount of online support I could find in the Gephi community when I needed help troubleshooting.