Say it with an emoji 🤭



Introduction

‘Say it with an emoji’ is an interactive data visualization project that explores how we use emojis to communicate sentiment. Emojis have become an important mechanism for conveying human emotion in writing. Because they span languages and express sentiment in a more universal way, they can be useful tools for analyzing public sentiment expressed via online channels. Inspired by online social justice movements, this project looks at how emojis might help us explore public sentiment during one of the heights of the #MeToo movement on Twitter.

While emojis are commonly used as a supplement to written text, here they’ve been brought to the forefront. Built on a network graph, this data viz lets users click through emojis to discover each emoji’s connections with other emojis during the #MeToo movement and learn how it was used in the dataset. Clicking on an emoji shows its related metadata: how many times it occurred in the dataset, its average sentiment score, its associated tweets, and the other emojis that it’s connected with. The two key goals of this visualization are to:

  1. Be discernible to a variety of audiences
    1. From those with no familiarity with network graphs to those who are well-versed, and from those more interested in the #MeToo movement to those more interested in emojis.
  2. Be easy to navigate and explore for differing purposes
    1. Users should be able to explore the data visual based on their information needs from several different facets: users can explore by the emoji itself, its sentiment score, degree centrality, and/or related connections.

Data

Data collection

Data for this project was collected from two openly available sources. The first dataset was the Emoji Sentiment Ranking data made available on Kaggle. This dataset was created in 2015 and comprises 751 emoji characters and their assigned sentiment. Sentiment for these emojis was calculated based on 70,000 tweets in 13 European languages that were labeled by human annotators. Sentiment is broken down into 3 classes: positive, negative and neutral. Each emoji is assigned values based on the number of times annotators found that it expressed positive, negative and neutral sentiment, and the authors provided instructions on how to calculate an overall sentiment score. This dataset included fields for:

  • The emoji symbol
  • The emoji’s Unicode name
  • The emoji’s positive, negative and neutral score given by annotators
  • Average sentiment score – this was calculated manually following the calculation outlined by Kralj et al. (2020)
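As a rough illustration of the score calculation mentioned in the last bullet, the sketch below assumes the score is the difference between the positive and negative annotation counts divided by the total number of occurrences. That formula, the function name and the sample counts are assumptions made for illustration; they should be checked against the dataset authors' instructions.

```python
def sentiment_score(positive: int, neutral: int, negative: int) -> float:
    """Return a score in [-1, 1]: (positive - negative) / total occurrences.

    Assumed formula; verify it against the dataset authors' instructions.
    """
    total = positive + neutral + negative
    if total == 0:
        raise ValueError("emoji has no annotated occurrences")
    return (positive - negative) / total


# Hypothetical annotation counts for a single emoji (not real data).
example = sentiment_score(positive=60, neutral=30, negative=10)
print(round(example, 3))
```

Under this assumption, an emoji labeled mostly positive lands near 1, one labeled mostly negative lands near -1, and a high neutral count pulls the score toward 0.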

The second dataset, Tweets With Emojis – #MeToo, was downloaded from Data.World and covers the height of the #MeToo hashtag. It is a sample of 28,629 English-language tweets from Oct. 16, 2017. This dataset included fields for:

  • The tweet text and URL
  • Number of emojis used (if any) in the tweet
  • The emoji symbol
  • The emoji Unicode name

Data cleansing & transformation

Using OpenRefine, the two datasets were joined into one table based on their matching Unicode names. Because the sentiment ranking dataset was from 2015 and the #MeToo dataset was from 2017, some of the emojis used in 2017 weren’t around in 2015, so emojis without a sentiment score were removed as records. I then deleted any records that didn’t contain emojis in the tweet. This resulted in a table with 1,218 tweets containing a total of 1,633 emojis, of which 141 were unique. This table formed the foundation for the node and edge lists that were manually prepared for the network graph. The final node list included fields for:

  • The emoji symbol
  • Unicode name
  • Avg. Sentiment score
  • Tweet text
  • Number of occurrences
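The join and filtering were done in OpenRefine; purely as an illustration, the same logic could be sketched in Python as below. The field names, the helper `build_nodes` and every row of data are hypothetical and are not taken from either dataset.

```python
# Hypothetical sentiment table keyed by Unicode name (scores are made up).
sentiment_by_name = {
    "HEAVY BLACK HEART": 0.75,
    "FACE WITH TEARS OF JOY": 0.22,
}

# Hypothetical tweet rows, one per emoji occurrence. emoji_name is None when
# the tweet contains no emoji.
tweet_rows = [
    {"tweet": "tweet A", "emoji": "❤", "emoji_name": "HEAVY BLACK HEART"},
    {"tweet": "tweet A", "emoji": "😂", "emoji_name": "FACE WITH TEARS OF JOY"},
    {"tweet": "tweet B", "emoji": "❤", "emoji_name": "HEAVY BLACK HEART"},
    {"tweet": "tweet C", "emoji": "🤭", "emoji_name": "NAME MISSING FROM RANKING"},
    {"tweet": "tweet D", "emoji": None, "emoji_name": None},
]


def build_nodes(rows, scores):
    """Join rows to scores on Unicode name and aggregate one node per emoji."""
    nodes = {}
    for row in rows:
        name = row["emoji_name"]
        # Skip tweets without emojis and emojis that have no sentiment score.
        if name is None or name not in scores:
            continue
        node = nodes.setdefault(name, {
            "emoji": row["emoji"],
            "unicode_name": name,
            "avg_sentiment": scores[name],
            "tweets": [],
            "occurrences": 0,
        })
        node["tweets"].append(row["tweet"])
        node["occurrences"] += 1
    return nodes


nodes = build_nodes(tweet_rows, sentiment_by_name)
for node in nodes.values():
    print(node["emoji"], node["occurrences"], node["avg_sentiment"])
```

Keeping only names present in both tables mirrors the two removal steps described above: emojis without a sentiment score and tweets without emojis both drop out.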

The node and edge lists were then imported into Gephi to create the network graph. Edge weight and node degree were computed, and the Force Atlas 2 algorithm was run. Node size was then ranked by degree, edge thickness by weight, and color by average sentiment score. Once the network graph was rendered the way I wanted in Gephi, it was exported as a sigma.js file using the Oxford Internet Institute plugin. The plugin also exported a collection of HTML, CSS, JavaScript and JSON files to build an interactive version of the Gephi-created network graph. The JSON file containing the node and edge data was further cleaned in OpenRefine to ensure that node attributes (i.e., tweets, sentiment score, etc.) appeared in chronological order when shown on screen.
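Gephi computed edge weight and degree for this project. To make those two measures concrete, here is a minimal sketch that assumes two emojis are connected when they appear in the same tweet, with the edge weight counting how many tweets contain both. That edge definition, the function names and the sample data are assumptions; the write-up does not spell out how the edge list was built.

```python
from collections import Counter
from itertools import combinations

# Hypothetical input: the set of emojis found in each tweet (made-up data,
# short names used in place of emoji characters for readability).
emojis_per_tweet = [
    {"heart", "joy"},
    {"heart", "joy", "pray"},
    {"heart"},            # a lone emoji creates no edge
    {"joy", "pray"},
    {"fire", "heart"},
]


def cooccurrence_edges(tweets):
    """Count, for each unordered emoji pair, the tweets that contain both."""
    weights = Counter()
    for emojis in tweets:
        # Sorting makes (a, b) and (b, a) collapse into one undirected edge.
        for pair in combinations(sorted(emojis), 2):
            weights[pair] += 1
    return weights


def node_degree(weights):
    """Degree = number of distinct emojis each emoji is connected to."""
    degree = Counter()
    for source, target in weights:
        degree[source] += 1
        degree[target] += 1
    return degree


edges = cooccurrence_edges(emojis_per_tweet)
degrees = node_degree(edges)
for (source, target), weight in sorted(edges.items()):
    print(source, target, weight)
print(degrees.most_common())
```

In this reading, degree drives node size (how many different emojis an emoji is paired with) while weight drives edge thickness (how often a given pair appears together).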


Design

Research process

The design for this data viz followed an iterative process that was grounded in user research. Three user testing sessions were conducted throughout the design process with participants selected based on a defined target audience. The target audience for this visualization is intended to be broad but can be defined more specifically by role: social media analysts, social researchers and journalists, as well as members of the general public who have an interest in the #MeToo movement and/or emojis. For this user research, two of the testing participants were from this ‘general public’ group and had an interest in both #MeToo and emojis, and one participant was a social media analyst. Due to the exploratory nature of the visual, the user testing was conducted as an open observational inquiry rather than a more controlled task-based testing session. Participants were shown the viz and instructed to explore at their leisure while ‘thinking out loud’ throughout the process. Interview questions were unstructured: they were formulated on the spot based solely on each user’s behavior.

Inspiration

The design for this visual was based on a previous network graph I had made for another class lab that used the same dataset (see Figure 1). This network graph was also created in Gephi but was exported as a static image. Emojis had to be manually added to the graph in Adobe InDesign as Gephi wouldn’t render them as labels. For this project, I wanted to build on this network graph by providing context to each emoji and the only way to do this was to create an interactive web visualization.

Figure 1. Network Lab Graph

I drew inspiration from a project called ‘Movie Galaxies’ that had taken a Gephi-rendered network graph and made it interactive on the web (see Figure 2). I liked that they customized the output of the Oxford Internet Institute plugin by changing the labels, hover, and side panel design, which inspired me to add my own custom touches to my interactive network graph.

Figure 2. Movie Galaxies Interactive Network Graph

Stage 1

The first iteration of the design, seen in Figure 3, provided a skeleton on which the visual would later be built. The colors of the nodes on the network graph were selected based on aspects of color psychology: green to represent positivity in nodes with high sentiment scores and a rust/brown to show negativity in nodes with low sentiment scores.

Figure 3. First Design Iteration
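The sentiment-to-color ranking was done in Gephi. As a hedged illustration of the idea, the sketch below linearly interpolates between a brown and a green based on where a score falls between the lowest and highest scores. The RGB values, the function names and the linear scale are assumptions and do not reproduce the palette actually used.

```python
def lerp_color(low_rgb, high_rgb, t):
    """Blend two RGB tuples; t=0 gives low_rgb and t=1 gives high_rgb."""
    t = min(1.0, max(0.0, t))  # clamp so out-of-range scores stay valid
    return tuple(round(lo + (hi - lo) * t) for lo, hi in zip(low_rgb, high_rgb))


def sentiment_to_hex(score, min_score, max_score,
                     low_rgb=(139, 69, 19), high_rgb=(0, 128, 0)):
    """Map a sentiment score to a hex color between brown (low) and green (high).

    The default colors are placeholders, not the project's palette.
    """
    if max_score == min_score:
        t = 0.5  # every node has the same score, so use the midpoint
    else:
        t = (score - min_score) / (max_score - min_score)
    r, g, b = lerp_color(low_rgb, high_rgb, t)
    return f"#{r:02x}{g:02x}{b:02x}"


# Hypothetical scores spanning the full range.
for score in (-1.0, 0.0, 1.0):
    print(score, sentiment_to_hex(score, min_score=-1.0, max_score=1.0))
```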

After rendering this first iteration, I decided to restructure the JSON data so that labels would show up as emoji characters rather than Unicode names. I also returned to Gephi and rendered a new version of the network graph, re-running the Force Atlas 2 algorithm with scaling set to 30 and node overlap prevented to create more room for the nodes (see Figure 4). I also set gravity to 50 to bring the outlier nodes closer to the center of the network so that they would appear on the screen.

Figure 4. Second Design Iteration
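The JSON restructuring described above, swapping Unicode-name labels for emoji characters, could be sketched as follows. The structure shown (a top-level `nodes` list whose items carry a `label` and an `attributes` object) and the key names `emoji` and `unicode_name` are assumptions about the exported file, not a description of the plugin's actual output.

```python
import json

# Hypothetical excerpt of an exported graph; real key names may differ.
graph = {
    "nodes": [
        {"id": "n1", "label": "HEAVY BLACK HEART",
         "attributes": {"emoji": "❤", "occurrences": 2}},
        {"id": "n2", "label": "NO SYMBOL STORED", "attributes": {}},
    ],
    "edges": [],
}


def relabel_with_emoji(graph, emoji_key="emoji", name_key="unicode_name"):
    """Use the emoji as each node's label and keep the old label as an attribute."""
    for node in graph["nodes"]:
        attributes = node.setdefault("attributes", {})
        symbol = attributes.get(emoji_key)
        if symbol:  # nodes without a stored symbol are left untouched
            attributes[name_key] = node["label"]
            node["label"] = symbol
    return graph


relabelled = relabel_with_emoji(graph)
# ensure_ascii=False writes the emoji itself rather than a \uXXXX escape.
print(json.dumps(relabelled, ensure_ascii=False, indent=2))
```

Keeping the Unicode name as an attribute rather than discarding it means the graph can still be searched by name after the labels change.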

Stage 2

With the second iteration of the visual developed, I wanted to put it in front of a user to do some rapid testing of my new design. I ran this first testing session with the same minimalist legend from the first iteration to understand how well a user with no knowledge of network graphs could comprehend the graph. I gave them a brief overview of the dataset and the underlying concept of a network graph so they had an idea of what they would be looking at, and then gave them free rein to explore.

Key findings

  • Language was too jargon-y. Even after I gave the user a preliminary explanation of a network graph, the terms on the legend (i.e., nodes & edges) were still too technical for the average user to easily intuit. However, the small icons next to the legend (i.e., circle & line) made the concept easier for them to understand and gave me the idea to make the legend more visually narrative.

With my first user testing session completed, I turned my attention to crafting a new legend (see Figure 5) and made three changes. (1) I added more text description and visual cues, removing any references to ‘Nodes’ and ‘Edges’ and referring to them as circles and lines instead. (2) I wrote an overview of the dataset so that users could situate themselves without needing my verbal narrative to understand what they’re looking at. This written overview took the form of a pop-up modal, as I didn’t want it to take up too much space on the legend. (3) I expanded the search section to include a link to all the emoji Unicode names so that users had the tools to search effectively.

Figure 5. Third Design Iteration

Stage 3

After developing the new legend, I returned to user research once more with two new user testing participants. This time I provided no narrative upfront: I only gave them a link to the visual and told them to explore while thinking out loud.

Key findings

  • Users skipped over the dataset overview. My goal was for them to read this narrative overview first as it provides crucial information about the visual that users would need to know upfront. Yet, both users skipped over it and immediately started reading the ‘How to Read’ section.

Based on this finding, I (1) moved the ‘dataset overview’ link to the ‘How to Read’ section and included instructions to ‘Start here’. Additional feedback from a classroom workshop session led me to (2) add a subtitle that better associates this project with the #MeToo movement and (3) rename the ‘Information Pane’ to something a little less monotonous (see Figure 6).

Figure 6. Fourth Design Iteration

Results

Overall, I was pleased with how exploratory users were with the data visualization. In fact, my user testing sessions went on longer than I had expected because users wanted to continue exploring the different emojis. I was surprised that all my users were first drawn to the outlier nodes, with one user saying they were more interested in learning about what they consider to be the “not-so basic” emojis. The users were also able to grasp the concept of a network graph easily and understood the idea of an ‘outlier’ node without me providing any explanation.

As users were exploring the outlier nodes, we realized that some ‘trolling’ tweets had made it into the dataset. Some tweets were hateful towards the #MeToo movement, while others only used the #MeToo hashtag to promote their content on a trending topic (with the tweet having no relation to the movement). I decided to keep this data in the dataset as it felt representative of the types of content that come along with any online social movement.

It was interesting to see how each user interacted with the visual differently. One user would click on an emoji randomly and navigate the visual through that emoji’s connections. Another user only looked for specific emojis that they found interesting and did not navigate via connections. A third user was more interested in the extreme ends of the sentiment scale and looked for emojis in dark green or dark brown circles. The two participants from the ‘general public’ user group navigated the visual with less of a purpose in mind, while the ‘social media analyst’ user navigated it more purposefully. The social media analyst was much more interested in an emoji’s connections, saying that it’s difficult to get emoji combinations “right” and that they were interested in how other people used emojis together to express sentiment. They were also more engaged in reading the tweets associated with each emoji and seeing how the emoji matched their interpretation of what the tweet was expressing.