Introduction
For more than a decade, Twitter has been a place for people to express their most visceral reactions and opinions on topics they find important. As the platform has grown, more and more users have turned to it as a resource for information and a sense of community. In 2009, Twitter recorded roughly 30 million monthly users; by 2019, that number had grown to nearly 330 million.
In that time, the far ends of the U.S. political spectrum, dubbed the “far-left” and “far-right,” have gained support from more center-aligned constituents. When Twitter launched, groups like Black Lives Matter, the Proud Boys, the Democratic Socialists of America, and the Oath Keepers were nowhere near as prominent as they are today, if they existed at all. In 2021, these groups, once considered political fringe movements, make up a large portion of the political landscape that drives the two-party system.
The enormous volume of users on the platform means an ever-growing stream of posted content, and that information can be a treasure trove for data analysts to unlock. However, analyzing this data is a massive undertaking and requires a great deal of care and attention to ensure the information is not misinterpreted. In this report, I cover the methodology I used to collect Twitter data and visualize the engagement around political groups in the U.S., as well as the research outcomes of the project. To view the final version of the dashboard, follow the link in the references below.
Inspiration
When this project was pitched, it focused specifically on Black Lives Matter as an organization and a topic. I was very interested to see how activity surrounding the Black Lives Matter movement had changed over a longer period of time, hoping that in doing so I could identify which events caused the peaks and dips in engagement.
An issue I ran into in my early exploration was the limited access levels for the Twitter API. I had applied for the Academic Research level of access in order to be able to collect data that spans a longer period than the 7 days basic users are allotted. I was not approved, which meant I’d only be able to collect a smaller subset of data; however, this opened up other ideas for what I could do with the smaller timeframe.
As I was brainstorming ideas, I came across a topic that was trending on Twitter: the Oath Keepers. The Oath Keepers are a far-right movement whose main set of principles revolves around the idea that the U.S. government cannot be trusted and is actively plotting against the American people. Because Black Lives Matter and the Oath Keepers sit on opposite ends of the political spectrum in terms of policy goals, I figured it would be insightful to see how the engagement around these groups and others like them compares on Twitter. My research then shifted to discovering which of these groups draw the highest engagement traffic, with the same goal of identifying the specific events that drive that attention.
Methodology and Process
Tools
For this project, I decided to use a combination of Python and Tableau for my analysis and visualization. For the data collection, I experimented with using the Twitter API service, which allowed me to build my own dataset with specific queries related to the topics I was researching. This combination of tools proved to be incredibly useful, and made the process of analyzing my data a seamless one.
As for the usability testing portion of this project, I posted my dashboard on Tableau Public and created a remote user test on Userbrain.com. The platform allows researchers to upload their work and recruit participants to evaluate its usability.
Data Collection
In order to dive into the massive amounts of Twitter data, the first step was to become familiar with the Twitter API documentation. Twitter allows most users to collect information at a surface level: those with Essential access can collect up to 500,000 tweets per month for research or other purposes. Collecting the data requires the researcher to authenticate using tokens that Twitter verifies. To simplify this process, I used the tweepy library in Python, which handles requests to the API without the need for other libraries.
Once my profile was authenticated, I studied both the Twitter API and tweepy documentation to refine the query I would pass. Because I wanted a variety of results containing several different fields, refining the query was an important step in making sure my data was properly filtered. The query I ended up passing to the API endpoint was as follows:
q = '(("Black Lives Matter" -is:retweet) OR (#BlackLivesMatter -is:retweet) OR
     ("Proud Boys" -is:retweet -"us proud") OR (#ProudBoys -is:retweet) OR
     ("Democratic Socialism" -is:retweet) OR (@DemSocialists -is:retweet) OR
     ("Oath Keepers" -is:retweet) OR (#OathKeepers -is:retweet)) lang:en'
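Since the query is just a long OR-joined string of per-group clauses, it could also be assembled programmatically, which makes it easier to add or remove groups later. Below is a minimal sketch of that idea; the term lists and the `build_query` helper are illustrative assumptions, not a copy of my actual collection script:

```python
# Illustrative sketch: build the OR-joined search query from a dict of
# group names to search terms, excluding retweets from every clause.
GROUP_TERMS = {
    "Black Lives Matter": ['"Black Lives Matter"', "#BlackLivesMatter"],
    "Proud Boys": ['"Proud Boys"', "#ProudBoys"],
    "Democratic Socialists": ['"Democratic Socialism"', "@DemSocialists"],
    "Oath Keepers": ['"Oath Keepers"', "#OathKeepers"],
}

def build_query(group_terms, language="en"):
    """Join every term into one OR clause, filtering out retweets."""
    clauses = [
        f"({term} -is:retweet)"
        for terms in group_terms.values()
        for term in terms
    ]
    return f"({' OR '.join(clauses)}) lang:{language}"

query = build_query(GROUP_TERMS)
```

The generated string follows the same operator syntax as the hand-written query above, so it could be passed to the search endpoint unchanged.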
After testing this query on smaller requests, I was ready to launch the full collection run, which required a good bit of critical thinking and coding in Python. Because users are limited to 100 results per request and 180 requests within a 15-minute window, it was important for this chunk of code to run smoothly so as not to waste time or credits on bad requests. The screenshot below shows the process for the data collection, which resulted in 33,756 records. The records were then inserted into a DataFrame using Python's pandas library, which helps manage the data for easier analysis.
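The arithmetic behind pacing a run like this can be sketched in a few lines. The helper below is a hypothetical planning function, not my actual collection loop; it only estimates how many requests and rate-limit windows a run needs under the limits described above:

```python
import math

# Sketch of the pacing math: 100 results per request, 180 requests
# per 15-minute rate-limit window.
RESULTS_PER_REQUEST = 100
REQUESTS_PER_WINDOW = 180
WINDOW_SECONDS = 15 * 60

def plan_collection(total_tweets):
    """Estimate requests, windows, and total wait time for a run."""
    requests_needed = math.ceil(total_tweets / RESULTS_PER_REQUEST)
    windows_needed = math.ceil(requests_needed / REQUESTS_PER_WINDOW)
    # Wait between full windows only; no wait after the last one.
    wait_seconds = (windows_needed - 1) * WINDOW_SECONDS
    return requests_needed, windows_needed, wait_seconds

# The 33,756-record run works out to 338 requests spread over 2 windows.
print(plan_collection(33756))
```

In other words, a run of this size cannot finish inside a single 15-minute window, which is why the collection code had to pause and resume cleanly.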
Analysis
After collecting the data, I went a step further and analyzed the tweet content of each record, calculating a sentiment score using VADER, an open-source Python tool that uses natural language processing to score the words used in each tweet. The compound scores range from -1.0 to 1.0, with scores between -0.05 and 0.05 treated as neutral. After generating the scores, I added them to the dataframe I created for an added layer of insight. With the analysis finished, I exported my dataset to a CSV and continued the analysis in Tableau.
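The neutral-band rule above can be made concrete with a small labeling function. This is only a sketch of the thresholding logic; the compound scores themselves come from VADER's `SentimentIntensityAnalyzer`, which is not reproduced here:

```python
# Map a VADER compound score (in [-1.0, 1.0]) to a sentiment label.
# Scores inside the +/-0.05 band are treated as neutral, matching the
# convention described in the text.
def sentiment_label(compound, threshold=0.05):
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

A labeled column like this is what drives the green/grey/red color encoding used in the bar chart described below.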
While Tableau is often seen as the last step in an analysis project, I found it to be useful in examining the data as well. My first thought was to insert the data in a bar chart to determine which groups simply had the highest volume of mentions during the weeklong time period. The results showed that Black Lives Matter was talked about the most, with all other groups lagging behind significantly.
Once I understood the distribution, I wanted to also see how the sentiments around the topics compared, so I added a color filter to show the sentiments in the bar chart as well, with green representing positive, grey as neutral, and red as negative:
The visualization above became the inspiration for the direction I took with the rest of my charts on the final dashboard I created. Using several different visualization methods combined into one dashboard helped to give me perspective on the more interesting insights I could gather from the data, and the process of refining my work was interactive as I discovered more through experimenting with different visualizations. After settling on four types of visualizations to use in my dashboard, I sought feedback through participants in my usability test.
UX Research
With a working dashboard, I provided a link to the visualization on Userbrain.com and created a set of questions for two participants in an unmoderated user testing session, in which users attempt a set of assigned tasks. In the session, I asked users to explore the dashboard and think aloud as they described their thoughts and opinions on the information displayed. The first task was simply to describe what they were seeing at a glance. Both participants were able to understand the nature of the dashboard and the insights they were gaining by exploring the information. However, both noted that they were confused by the spacing and organization of the dashboard.
The next task was to explore the bar chart using the slider to filter through dates and comment on what they could learn from the visualization. This task proved insightful, as both participants were initially unable to locate the slider in the upper-right corner of the dashboard. I implemented this feedback by increasing the size of the slider for better visibility. As for the chart itself, participants found it insightful and were intrigued by the distribution of positive and negative sentiments.
The last task was to examine the line chart at the bottom, which both participants found to be the most visually engaging and approachable part of the dashboard. Because the chart showed an anomaly where the Proud Boys and Oath Keepers were mentioned significantly more at 5 pm on 12/14/2021, both participants were curious to learn more about why this may be the case.
For general feedback, both participants suggested that the dashboard could be organized more cleanly. The first user mentioned that the bar chart had too much whitespace and that he would have liked to use the dashboard without scrolling. Coincidentally, my peer reviewer for this project gave the same feedback, so it was clear the suggestion should be taken into account. I also tightened the spacing by following my peer reviewer's suggestion to cap the maximum value on my line graph at 500, which differentiates the lines more clearly. Overall, however, I received great feedback on the information displayed and only had to tweak a few design choices.
Results and Reflection
The process of collecting, organizing, analyzing, and visualizing my data was a great experience. Through my research, I accomplished my initial goal of visualizing the volume of engagement around these far-leaning political groups on Twitter. What stood out in my findings was that while Black Lives Matter was the most talked-about of the four groups I decided to highlight, the average sentiment expressed in the tweets I collected was generally negative. It was also interesting to see that the far-right groups were often discussed in a more positive light, which may hint at how the constituents of each group view one another.
Another interesting insight I gained by visualizing the data was the anomaly that occurred on Dec. 14, 2021. The data shows that tweets mentioning both the Proud Boys and Oath Keepers far outweighed those mentioning other groups at that same hour. Upon further inspection of the tweet content, I discovered an event that spurred the attention. At that time, several Twitter users were engaged in discussions about the news that D.C. Attorney General Karl Racine would be suing the Proud Boys and the Oath Keepers in relation to their involvement in the Jan. 6, 2021 insurrection attempt. Being able to highlight the anomaly and trace the higher volume of tweets to a specific event was encouraging, and having a dashboard like the one I created may be helpful to identify other events like this one in the future.
Reflecting on the process, I certainly faced some challenges that I would love to rectify given the opportunity to continue work on this dashboard. The main challenge, as I mentioned before, was that I only had Essential-level access to the Twitter API, limiting my search results to the past 7 days. Without the announcement of A.G. Racine's lawsuit, I might not have been able to identify any anomalies in the collected data. I hope to be approved for Academic Research access to Twitter's API in the future to paint a larger picture of the volume of engagement for each group.
Another issue I encountered was that some results could not be matched to keywords related to the groups I was researching. Though I refined my search query to the best of my ability, some tweets went unclassified and were therefore delegated to a null category. Furthermore, because of the API request limits, I had to search for mentions of all of the groups in one query, making it harder to classify exactly how many tweets mentioned each group. I ended up tagging each tweet with every group name it mentioned (e.g., “Black Lives Matter, Proud Boys”). In hindsight, it may have been better to query the results for each group separately and combine the datasets afterward, giving a cleaner representation of each individual group's metrics. However, by the time I realized this, I had already neared Twitter's monthly cap for requests and could not query further.
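The multi-label tagging described above can be sketched as a simple keyword match. The keyword lists and the `classify_groups` helper below are illustrative assumptions rather than my actual classification code; tweets matching no keywords land in the null category:

```python
# Illustrative sketch: tag each tweet with every group whose keywords
# appear in its (lowercased) text, joined into one comma-separated
# string; return None for tweets that match nothing.
GROUP_KEYWORDS = {
    "Black Lives Matter": ["black lives matter", "#blacklivesmatter"],
    "Proud Boys": ["proud boys", "#proudboys"],
    "Democratic Socialists": ["democratic socialism", "@demsocialists"],
    "Oath Keepers": ["oath keepers", "#oathkeepers"],
}

def classify_groups(text):
    """Return a comma-separated list of matched groups, or None."""
    text = text.lower()
    matches = [
        group
        for group, keywords in GROUP_KEYWORDS.items()
        if any(kw in text for kw in keywords)
    ]
    return ", ".join(matches) if matches else None
```

A per-group query would sidestep this entirely, but given a combined dataset, a pass like this is one way to recover per-group counts after the fact.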
This project was very thought-provoking and fun to work on, and I found that I was engaged throughout the whole process. I would definitely like to refine my work further in the future and hopefully expand the scope of the data for even deeper insights.
References
— https://public.tableau.com/app/profile/kailen.santos3222/viz/Politically-DrivenMovements/Dashboard1?publish=yes
— https://blacklivesmatter.com/about/
— https://www.splcenter.org/fighting-hate/extremist-files/group/oath-keepers
— https://www.splcenter.org/fighting-hate/extremist-files/group/proud-boys
— https://www.python.org/
— https://public.tableau.com/en-us/s/
— https://developer.twitter.com/en/docs/twitter-api
— https://www.userbrain.com/en/
— https://docs.tweepy.org/en/stable/
— https://pandas.pydata.org/