{"id":91,"date":"2017-05-10T21:54:45","date_gmt":"2017-05-10T21:54:45","guid":{"rendered":"http:\/\/research.prattsils.org\/fakenews\/?page_id=91"},"modified":"2017-05-10T21:54:45","modified_gmt":"2017-05-10T21:54:45","slug":"lexisnexis","status":"publish","type":"page","link":"https:\/\/studentwork.prattsi.org\/fakenews\/data\/lexisnexis\/","title":{"rendered":"LexisNexis"},"content":{"rendered":"<p>LexisNexis is a database of traditional news articles. It aggregates the full text and metadata for each article it collects.<\/p>\n<p>For our project, we downloaded the full text and metadata of each article from November 7, 2016, to March 5, 2017 (11,834 articles in total). The full text was downloaded as a plain-text file, which we then split by day using a Google Sheet and <a href=\"http:\/\/openrefine.org\">OpenRefine<\/a>. Once the text was separated by day, we processed it in <a href=\"https:\/\/www.r-project.org\/\">R<\/a> and analyzed it in Tableau. Term frequencies per day were calculated from the snippets in R, with <a href=\"https:\/\/github.com\/arc12\/Text-Mining-Weak-Signals\/wiki\/Standard-set-of-english-stopwords\">stopwords<\/a> removed and terms <a href=\"http:\/\/snowball.tartarus.org\/algorithms\/porter\/stemmer.html\">stemmed<\/a>.<\/p>\n<p>We originally downloaded all of the available metadata for each article. 
This included:<\/p>\n<ul>\n<li><span style=\"font-weight: 400\">Byline \u2013 author or journalist<\/span><\/li>\n<li><span style=\"font-weight: 400\">Date \u2013 range of dates<\/span><\/li>\n<li><span style=\"font-weight: 400\">Headline \u2013 article title<\/span><\/li>\n<li><span style=\"font-weight: 400\">Length \u2013 word count<\/span><\/li>\n<li><span style=\"font-weight: 400\">Publication \u2013 publishing source<\/span><\/li>\n<li><span style=\"font-weight: 400\">Company \u2013 company name<\/span><\/li>\n<li><span style=\"font-weight: 400\">Geographic \u2013 combines country, state, and city<\/span><\/li>\n<li><span style=\"font-weight: 400\">Organization \u2013 non-company organizations<\/span><\/li>\n<li><span style=\"font-weight: 400\">Person \u2013 people mentioned<\/span><\/li>\n<li><span style=\"font-weight: 400\">Subject \u2013 topic<\/span><\/li>\n<\/ul>\n<p>After downloading all of this metadata, we used OpenRefine to pare it down to the fields our research required. We kept:<\/p>\n<ul>\n<li>Byline<\/li>\n<li>Date<\/li>\n<li>Headline<\/li>\n<li>Length<\/li>\n<li>Publication<\/li>\n<li>Type<\/li>\n<li>Topic<\/li>\n<li>Weight<\/li>\n<\/ul>\n<p>Type, Topic, and Weight were derived from the &#8220;Company,&#8221; &#8220;Geographic,&#8221; &#8220;Person,&#8221; &#8220;Subject,&#8221; and &#8220;Organization&#8221; fields and their accompanying percentages.<\/p>\n<h6>See the LexisNexis guide &#8220;<a href=\"https:\/\/www.lexisnexis.com\/bis-user-information\/docs\/developingasearch.pdf\">Developing a Search<\/a>&#8221; for definitions of these metadata fields, which LexisNexis calls &#8220;document sections.&#8221;<\/h6>\n","protected":false},"excerpt":{"rendered":"<p>LexisNexis is a database of traditional news articles. It aggregates the full text and metadata for\u00a0each article it collects. 
For our project, we downloaded both the full text of each article from November 7, 2016, to March 5, 2017 (for&#8230;<\/p>\n","protected":false},"author":2,"featured_media":0,"parent":67,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"template-fullwidth.php","meta":{"footnotes":""},"class_list":["post-91","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/pages\/91","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/comments?post=91"}],"version-history":[{"count":0,"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/pages\/91\/revisions"}],"up":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/pages\/67"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/fakenews\/wp-json\/wp\/v2\/media?parent=91"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}