LexisNexis – "Fake News"

LexisNexis is a database of traditional news articles. It aggregates the full text and metadata for each article it collects.

For our project, we downloaded both the full text and the metadata of each article from November 7, 2016, to March 5, 2017 (11,834 articles in total). The full text was downloaded as a plain-text file, which we then split by day using Google Sheets and OpenRefine. Once the text was separated by day, we processed it in the programming language R and analyzed it in Tableau. In R, we calculated term frequencies per day from the snippets, after removing stopwords and stemming terms.
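The per-day term-frequency step can be illustrated with a short Python sketch. This is not the R code used in the project. It assumes the articles have already been split into (date, text) pairs, uses a tiny made-up stopword list, and swaps in a crude suffix-stripping function (the hypothetical `stem`) where a real stemmer such as Porter or Snowball would normally be used.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list; a real run would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "are", "was",
             "on", "for", "that", "it"}

# Crude suffix stripping, standing in for a real stemmer (e.g. Porter/Snowball).
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def daily_term_frequencies(articles):
    """Map each date to a Counter of stemmed terms with stopwords removed.

    `articles` is an iterable of (date, text) pairs, one per article.
    """
    by_day = defaultdict(Counter)
    for date, text in articles:
        tokens = re.findall(r"[a-z]+", text.lower())
        by_day[date].update(stem(t) for t in tokens if t not in STOPWORDS)
    return dict(by_day)


# Made-up sample articles for illustration only.
articles = [
    ("2016-11-07", "Fake news stories spread on social media."),
    ("2016-11-07", "Readers shared the stories."),
    ("2016-11-08", "Fact checking fake news"),
]
freqs = daily_term_frequencies(articles)
print(freqs["2016-11-07"].most_common(1))  # prints [('stori', 2)]
```

The result holds one `Counter` per date, which could then be flattened into a date/term/count table for loading into Tableau.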

We initially downloaded all of the available metadata for each article, which included:

Byline – author or journalist
Date – range of dates
Headline – article title
Length – number of words
Publication – the outlet that published the article
Company – names of companies
Geographic – combines country, state, and city
Organization – non-company organizations
Person – people mentioned
Subject – search by topic
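If the plain-text export marks these fields with lines such as `BYLINE: ...` and `LENGTH: 812 words`, a record's metadata could be pulled out roughly as below. Both that layout and the label spellings in the hypothetical `KNOWN_LABELS` set are assumptions to check against an actual download. Date, Headline, and Publication are assumed to appear as unlabeled lines and are not handled here, and values that wrap onto several lines are not joined.

```python
import re

# Assumed labels for the labelled metadata lines; check them against a real export.
KNOWN_LABELS = {"BYLINE", "LENGTH", "COMPANY", "GEOGRAPHIC",
                "ORGANIZATION", "PERSON", "SUBJECT"}

LABEL_LINE = re.compile(r"^([A-Z-]+):\s*(.*)$")


def parse_metadata(record: str) -> dict:
    """Collect 'LABEL: value' lines from one article record into a dict.

    Assumes each value sits on a single line; wrapped values are not joined.
    """
    meta = {}
    for line in record.splitlines():
        match = LABEL_LINE.match(line.strip())
        if match and match.group(1) in KNOWN_LABELS:
            meta[match.group(1)] = match.group(2)
    # "812 words" -> 812, so Length can be treated as a number
    words = re.match(r"\d+", meta.get("LENGTH", ""))
    if words:
        meta["LENGTH"] = int(words.group())
    return meta


# Made-up record for illustration only.
record = """Example Gazette
November 7, 2016 Monday
BYLINE: Jane Doe
LENGTH: 812 words
SUBJECT: ELECTIONS (92%); MISINFORMATION (88%)
"""
print(parse_metadata(record)["LENGTH"])  # prints 812
```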

We then used OpenRefine to pare this metadata down to the fields our research required. We kept:

Byline
Date
Headline
Length
Publication
Type
Topic
Weight

Type, Topic, and Weight were derived from the “Company,” “Geographic,” “Person,” “Subject,” and “Organization” categories and their accompanying percentages.
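One way to derive Type, Topic, and Weight is to split each category field into its individual entries and read the percentage off each one, producing one row per term. The Python sketch below assumes entries take the form `ELECTIONS (92%)` and are separated by semicolons; that format and the function name `to_type_topic_weight` are illustrative assumptions, not the project's actual OpenRefine steps.

```python
import re

# Assumed entry format: "TERM (NN%)", entries separated by semicolons.
ENTRY = re.compile(r"\s*(.+?)\s*\((\d+)%\)\s*")

CATEGORIES = ("COMPANY", "GEOGRAPHIC", "PERSON", "SUBJECT", "ORGANIZATION")


def to_type_topic_weight(meta: dict) -> list:
    """Turn category fields into (type, topic, weight) rows, one row per term."""
    rows = []
    for category in CATEGORIES:
        for entry in meta.get(category, "").split(";"):
            match = ENTRY.fullmatch(entry)
            if match:  # skip empty or malformed entries
                rows.append((category.title(), match.group(1), int(match.group(2))))
    return rows


# Made-up category values for illustration only.
meta = {
    "SUBJECT": "ELECTIONS (92%); MISINFORMATION (88%)",
    "PERSON": "JOHN SMITH (79%)",
}
for row in to_type_topic_weight(meta):
    print(row)
```

Keeping one row per term (a "long" layout) makes it straightforward to filter or aggregate by Type or Topic in a tool such as Tableau.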

For definitions of these metadata fields, which LexisNexis calls “document sections,” see “Developing a Search” from LexisNexis.