A little background
Post-truth, as explored in depth in Lee McIntyre’s book “Post-Truth”, is a concept that populations have wrestled with ever since the first published papers circulated among literate audiences. According to McIntyre, “The phenomenon of ‘post-truth’ rocketed to public attention in November 2016, when Oxford Dictionaries named it 2016’s word of the year”. Personally, it’s a growing concern because of how cumbersome it has become to discern fact from fiction or satire. Sometimes it feels like all news sources are biased, each with its own agenda, especially when a person in power tells you not to believe reliable sources like doctors or other experts. When I came across a dataset put together by Clément Bisaillon, which captures fake and real news coverage from 2015 to 2018, my interest was piqued. After looking further into the dataset, I discovered some surprising gaps in the data and some surprising trends. The driving questions behind this inquiry were: Which news sources were producing the most fake vs. real news? How did the creation of fake news change over time? Did the creation of true news change in response?
Process and Tooling
Dataset used: the fake and real news dataset compiled by Clément Bisaillon, hosted on Kaggle.
Tools:
- Kaggle: A website that hosts a large library of open datasets.
- OpenRefine: A software application used to cleanse and transform data.
- Tableau: A software application used to create data visualizations and dashboards.
Kaggle
I found Kaggle particularly easy to use for identifying a dataset, thanks to its usability score (which hints at data cleansing needs) and its filtering based on:
- File type
- File size
- Open Database qualification
This part took more time than I anticipated because I needed a dataset that captured data over a period of time rather than at a single point in time.
OpenRefine
Once I had selected the dataset, I loaded it into OpenRefine to clean and combine the two files it provides. I found a quick way to do this: opening both files at once.
I used the “Split into several columns…” and “Rename this column” features the most to clean up data elements such as the date formatting. Because I opened both files at the same time and they share the same headers, OpenRefine created a column identifying the source file of each row, “Fake.csv” or “True.csv”. With the “Split into several columns…” feature, I could easily remove the unnecessary text (i.e., “.csv”) in a few clicks.
Videos that helped me through these strategies can be found on OpenRefine’s home page.
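For readers who would rather script this step, the same merge can be approximated in Python with pandas. This is only a minimal sketch, not the workflow I used: the column names (`title`, `subject`, `date`), the sample rows, and the helper `combine_labeled` are all assumptions made for illustration, and the in-memory strings stand in for the real “Fake.csv” and “True.csv” files.

```python
import io
import os

import pandas as pd

# Tiny in-memory stand-ins for the two files (illustrative rows, not real data).
FAKE_CSV = "title,subject,date\nExample fake headline,News,2016-03-01\n"
TRUE_CSV = (
    "title,subject,date\n"
    "Example true headline one,politics_news,2016-04-01\n"
    "Example true headline two,World,2017-02-01\n"
)


def combine_labeled(files):
    """Stack files that share headers and record each row's source file.

    `files` maps a file name to a path or file-like object. The label is the
    file name without its extension, e.g. "Fake.csv" becomes "Fake".
    """
    frames = []
    for name, handle in files.items():
        frame = pd.read_csv(handle)
        frame["label"] = os.path.splitext(name)[0]
        frames.append(frame)
    return pd.concat(frames, ignore_index=True)


combined = combine_labeled(
    {"Fake.csv": io.StringIO(FAKE_CSV), "True.csv": io.StringIO(TRUE_CSV)}
)
print(combined[["title", "label"]])
```

With the real files, the dictionary values would simply be the file paths. Stripping the extension in code plays the same role as splitting the file-name column in OpenRefine.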
The “Text facet” feature helped to identify the categories and to modify or combine duplicates as needed. The editor kept this effort minimal. I was able to combine the “politics” and “politics_news” segments and create more consistent-sounding categories as a result. “General” was originally “News”, which felt redundant since these were all inherently news sources, whether geo-specific like “US_News”, “World”, and “Middle-East”, or topic-specific like “Government”.
Lastly, “Text filter” was very helpful in identifying and removing unwanted null data.
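The facet and filter steps above could be approximated in pandas as well. Again, this is a hedged sketch rather than what I actually ran: the column name `subject`, the target name “Politics” for the merged segment, the sample rows, and the helper `clean_categories` are assumptions for illustration.

```python
import pandas as pd

# Hypothetical mapping from raw category names to the tidied ones.
CATEGORY_MAP = {
    "politics": "Politics",
    "politics_news": "Politics",
    "News": "General",
}


def clean_categories(frame, column, mapping):
    """Merge duplicate categories, then drop rows whose category is missing."""
    cleaned = frame.copy()
    # Treat empty or whitespace-only strings as missing values.
    is_blank = cleaned[column].astype(str).str.strip() == ""
    cleaned[column] = cleaned[column].mask(is_blank).replace(mapping)
    return cleaned.dropna(subset=[column]).reset_index(drop=True)


sample = pd.DataFrame(
    {
        "title": ["a", "b", "c", "d", "e", "f"],
        "subject": ["politics", "politics_news", "News", None, "  ", "World"],
    }
)
tidy = clean_categories(sample, "subject", CATEGORY_MAP)
print(tidy["subject"].value_counts())
```

Keeping the mapping in one dictionary makes the renaming decisions easy to review, much like scanning the facet list in OpenRefine.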
Tableau
Once I had exported my file from OpenRefine as a .csv, I imported it into Tableau as a text file to visualize the data as a dashboard. The tool is fairly helpful in guiding the user toward the visualizations that are possible given the variables selected from the dataset. I chose the packed bubbles and horizontal bar visualizations to better understand where the fake news was coming from and how abundant it was. Helpful features like “Tooltips” allowed me to generate simplified hover-over text.
I found that the sheet view offered more editing options than the dashboard view, such as removing headers and other elements. From the sheet view I was also able to separate the variable colors using the group feature, which let me assign distinct colors to the True and Fake news.
I chose blue and orange from the color-blind palette, which I felt conveyed the right emotion for each topic area: blue to signify the calm of easily finding the truth, in contrast to the alarming orange of fake news. Orange has been said to evoke anger, which is exactly how I feel about the creation of fake news.
I only showcased the data from 2016 and 2017 because those years offered the most meaningful comparison of fake and true news. Unfortunately, the data for 2015 only recorded fake news, so including it would have skewed the comparison, since no true news was recorded for that year. Similarly, too little data was collected for 2018, which would have skewed the view of both fake and true news. In truth, this dataset includes a lot of text intended for training machine learning algorithms to detect fake news.
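The year filter described above can be expressed as a short pandas sketch. It is an illustration under assumptions, not the Tableau filter itself: the `date` and `label` column names, the ISO date format, the sample rows, and the helper `counts_by_year` are all hypothetical.

```python
import pandas as pd


def counts_by_year(frame, years):
    """Count rows per year and label, keeping only the requested years."""
    # Unparseable dates become NaT, so they fall out of the year filter.
    year = pd.to_datetime(frame["date"], errors="coerce").dt.year
    kept = frame.assign(year=year)
    kept = kept[kept["year"].isin(years)]
    kept = kept.assign(year=kept["year"].astype(int))
    return kept.groupby(["year", "label"]).size().unstack(fill_value=0)


# Illustrative rows only: 2015 has no true news and 2018 is sparse.
sample = pd.DataFrame(
    {
        "date": [
            "2015-06-01", "2016-03-01", "2016-04-01", "2017-02-01",
            "2017-03-01", "2017-09-01", "2018-01-05", "not a date",
        ],
        "label": ["Fake", "Fake", "True", "True", "True", "Fake", "True", "Fake"],
    }
)
summary = counts_by_year(sample, years=[2016, 2017])
print(summary)
```

Running the same function without the year restriction is a quick way to spot lopsided years, such as a year with only one label, before deciding what to plot.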
Results
The packed bubbles enable the user to quickly understand which news sources are generating the most fake news. Additionally, the horizontal bar graph helps the user see the trend in content creation across 2016 and 2017, the period spanning the 2016 election and the first year of Donald Trump’s presidency. This isn’t to put down the President, but it is helpful to visualize how news sources responded to the rise in fake news. It could be inferred from these views that there was an intentional increase in true news content to combat the spread of fake news. It is also interesting to note that the amount of fake news decreased between 2016 and 2017.
User Testing
Through user testing, I was able to validate that my choice of visualizations represented the data effectively. Testers found that the packed bubbles provided a helpful visual for understanding the categories of news sources, and that the growth trends in content generated over time were easy to follow. I was also able to generate hover-over text that was simplified and clear. The only thing they would have liked to see was the smaller bubble categories spelled out somewhere. However, as far as I could tell, the limitations of the tool meant there was no way to show that lettering other than on hover. I tried experimenting, but when the text ran over the bubble width, it didn’t look clean. They liked the color choices and felt they represented the variables well.
Next Steps
This visualization offers a helpful view of how bad fake news became not too long ago. If I had more time, I would do more due diligence to validate the actual amounts of fake and true news over a longer span of time (preferably through 2021) by accessing more datasets. It would also be interesting to see the demographics of who consumes the most information from each of the sources. This could help determine how widespread misinformation is among different populations and inform strategies for giving those audiences easier access to true news.
References
McIntyre, L. (2018). Post-Truth. The MIT Press.