The Hong Kong National Security Law is legislation that establishes the crimes of secession, subversion, collusion, and terrorism against the People’s Republic of China. It is controversial and widely viewed as a response to the 2019 and 2020 pro-democracy protests in Hong Kong. The law was passed and came into effect on June 30, 2020.
This visualization project analyses social media reactions in the wake of the law coming into effect. The data set is a collection of tweets posted in the week following the law’s approval. Rather than analysing the sentiment of the tweets, the project focuses on the reach of the news and the demographics of the users sharing it.
This project was primarily inspired by two longitudinal public opinion polls conducted by the University of Hong Kong. The first poll asked respondents “Which of the following channels are your main source of news?” In 2000, when the poll was first conducted, television (81.1%) and newspapers (70.9%) were the most commonly reported main sources of news. By 2019, the most recent year with data available, the internet (69.4%) had overtaken all other channels as the most commonly reported main source of news.
The second poll asked respondents “When there is a conflict of information, which of the following news channels do you trust most?” In this poll, television remained the most trusted news source throughout, but family members and the internet made sizable gains over the 25 years for which data have been reported.
Based on these two polls, I initially wanted to study how trust in Hong Kong media has changed in recent years in response to political conflict in the city. However, the underlying data for neither poll were available, so I searched for a more robust data set on a similar theme.
The data set used for this project is the Hong Kong – National Security Law 2020 data set available on Kaggle. It was pre-processed to contain a random sample of 7,000 tweets published from July 5 to July 12, 2020. Because of this pre-processing, only the columns for tweet text, date, number of retweets, verified account status, and country were retained. Note also that the sample includes only English-language tweets, so the data may be skewed toward English-speaking countries.
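For readers who want to draw a comparable sample from raw tweets, the filtering and sampling described above can be sketched in Python. This is a minimal sketch, not the Kaggle curator’s actual procedure: the `lang` and `date` field names, the ISO date strings, and the fixed seed are all assumptions.

```python
import random


def draw_sample(tweets, n=7000, start="2020-07-05", end="2020-07-12", seed=42):
    """Keep English-language tweets dated start..end (inclusive), then draw up to n at random.

    Assumes hypothetical "lang" and "date" keys, with dates stored as ISO strings
    ("YYYY-MM-DD ..."), so plain string comparison orders them chronologically.
    """
    in_scope = [
        t for t in tweets
        if t["lang"] == "en" and start <= t["date"][:10] <= end
    ]
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    # Sample without replacement; never ask for more rows than are available.
    return rng.sample(in_scope, min(n, len(in_scope)))
```

Fixing the seed is a design choice for reproducibility: rerunning the script yields the same 7,000 rows, which makes a published dashboard easier to audit.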
To create the dashboard, I used Tableau Public, a free platform for uploading or connecting to data, building visualizations, and publishing them on Tableau’s public site. I uploaded my data set as a CSV file, and Tableau parsed it mostly correctly; I only needed to adjust the date formatting.
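Outside Tableau, the same date-refinement step amounts to parsing the date column with an explicit format. The Python sketch below assumes hypothetical column names (`text`, `date`, `retweets`, `verified`, `country`) and a `YYYY-MM-DD HH:MM:SS` date format; the real export may differ, and the two rows are placeholders rather than data from the set.

```python
import csv
import io
from datetime import datetime

# Placeholder CSV excerpt; column names and date format are assumptions.
RAW_CSV = """text,date,retweets,verified,country
"Example tweet A",2020-07-06 14:03:22,3,True,United Kingdom
"Example tweet B",2020-07-07 09:15:00,0,False,
"""


def load_tweets(raw, date_format="%Y-%m-%d %H:%M:%S"):
    """Parse CSV text into dicts, converting each column to a usable type."""
    rows = []
    for row in csv.DictReader(io.StringIO(raw)):
        row["date"] = datetime.strptime(row["date"], date_format)  # text -> datetime
        row["retweets"] = int(row["retweets"])                      # text -> int
        row["verified"] = row["verified"] == "True"                 # text -> bool
        rows.append(row)  # an empty country stays "" and is filtered later
    return rows
```

Passing the format explicitly, rather than letting a tool guess it, avoids ambiguity such as day/month order.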
For the first visualization, I wanted to see where users engaging with news of the law’s approval were tweeting from. I filtered out rows with no country listed and plotted the remaining data on a map.
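The filter-then-map step boils down to counting tweets per country after discarding rows with no country. Below is a minimal Python sketch, assuming a hypothetical `country` field; the map then shows one mark per remaining country.

```python
from collections import Counter


def tweets_per_country(tweets):
    """Count tweets per country, skipping rows where the country is missing or blank.

    Assumes a hypothetical "country" key; missing keys and empty strings are
    both treated as "country unavailable" and filtered out, as in the map view.
    """
    countries = (t.get("country") or "" for t in tweets)
    return Counter(c.strip() for c in countries if c.strip())


sample = [
    {"country": "United States"},
    {"country": ""},
    {"country": "United Kingdom"},
    {"country": "United States"},
    {},
]
print(tweets_per_country(sample).most_common())
# [('United States', 2), ('United Kingdom', 1)]
```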
For the next visualization, I used a line chart to show activity on each day of the week following the announcement. I also plotted tweets and retweets as separate lines to show the distribution of original versus retweeted content.
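One plausible reading of this chart, given the columns listed earlier, is a per-day count of tweets alongside a per-day sum of the retweet-count column. The sketch below assumes that reading and hypothetical field names, and also picks out the peak day of each series. The toy values are invented for illustration, not drawn from the data set.

```python
from collections import defaultdict
from datetime import datetime


def daily_activity(tweets):
    """Per calendar day: number of tweets in the sample and total retweets they received.

    Assumes each tweet has a datetime "date" and an integer "retweets" field
    (hypothetical names for the date and retweet-count columns).
    """
    tweet_counts = defaultdict(int)
    retweet_totals = defaultdict(int)
    for t in tweets:
        day = t["date"].date()  # drop the time of day
        tweet_counts[day] += 1
        retweet_totals[day] += t["retweets"]
    return dict(tweet_counts), dict(retweet_totals)


def peak_day(series):
    """Day with the highest value; ties resolve to the earliest day."""
    return min(series, key=lambda d: (-series[d], d))


sample = [
    {"date": datetime(2020, 7, 7, 9, 30), "retweets": 1},
    {"date": datetime(2020, 7, 7, 18, 5), "retweets": 0},
    {"date": datetime(2020, 7, 11, 12, 0), "retweets": 25},
]
tweets_by_day, retweets_by_day = daily_activity(sample)
print(peak_day(tweets_by_day), peak_day(retweets_by_day))  # 2020-07-07 2020-07-11
```

Keeping the two series separate is what lets their peaks fall on different days.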
For the last visualization, I used a bar chart to show how many original tweets were published by verified accounts. Verified accounts are more likely to belong to journalists, politicians, or organizations, and I wanted to see how many of these accounts were engaging with the topic.
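The verified-account comparison can be sketched the same way: group by verified status, then count tweets and total their retweets. This assumes hypothetical `verified` and `retweets` fields and is not necessarily how the Tableau bar chart was built; the sample values are invented.

```python
def verified_breakdown(tweets):
    """Split tweet counts and retweet totals by verified status.

    Assumes hypothetical "verified" (already parsed to bool; the raw CSV string
    "False" would be truthy) and "retweets" (int) keys.
    """
    summary = {
        True: {"tweets": 0, "retweets": 0},
        False: {"tweets": 0, "retweets": 0},
    }
    for t in tweets:
        group = summary[t["verified"]]
        group["tweets"] += 1
        group["retweets"] += t["retweets"]
    return summary


def verified_retweet_share(summary):
    """Fraction of all retweets that went to tweets from verified accounts."""
    total = summary[True]["retweets"] + summary[False]["retweets"]
    return summary[True]["retweets"] / total if total else 0.0


sample = [
    {"verified": True, "retweets": 40},
    {"verified": False, "retweets": 5},
    {"verified": False, "retweets": 0},
]
summary = verified_breakdown(sample)
print(summary[True]["tweets"], summary[False]["tweets"])  # 1 2
print(round(verified_retweet_share(summary), 3))  # 0.889
```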
Following feedback from a peer review, I updated the axis titles, added a tooltip, and standardized the colors across the dashboard to create a more cohesive product.
Dashboard & Interpretation
The dashboard is available to view on Tableau Public.
From the visualizations, I drew three main findings:
- Twitter activity was heightened in countries that commonly receive immigrants from Hong Kong (the US, UK, Canada, and Australia)
- The Twitter response was strongest on the first full day after the announcement of the security law’s approval
- Users were more likely to retweet tweets from verified accounts
When I sent my Tableau dashboard to a peer for feedback, I specifically asked whether the above findings were clearly visible in the charts. The feedback made me realize that I needed to account for bias, both my own confirmation bias and bias in the data. For example, regarding the first finding, it is true that Twitter responses were most prevalent in the US, but Twitter is an American company and its largest national user base is in the US to begin with. Additionally, while the publishing of original tweets peaked on July 7, 2020, retweets of relevant tweets peaked four days later, on July 11, 2020. I felt optimistic when I first viewed the bar chart showing that the majority of retweets were of tweets from verified accounts, but it is possible that users retweeted these tweets only to rebut them or voice opposition.
Another reflection from this project is my surprise at how difficult it was to find public data sets. Originally, I wanted to research social media friendships and friendship networks in more depth, but I was unable to find data sets with enough rows. With more time, I would have liked to pull my own data set using Twitter’s developer tools so that I could work with fully raw data.