{"id":5974,"date":"2016-12-11T23:40:45","date_gmt":"2016-12-12T04:40:45","guid":{"rendered":"http:\/\/research.prattsils.org\/?p=5974"},"modified":"2016-12-11T23:40:45","modified_gmt":"2016-12-12T04:40:45","slug":"visualizing-news-coverage-puerto-rico-2006-2016-new-york-times","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/visualization\/visualizing-news-coverage-puerto-rico-2006-2016-new-york-times\/","title":{"rendered":"Visualizing News Coverage of Puerto Rico From 2006-2016 in The New York Times"},"content":{"rendered":"<p style=\"text-align: justify\"><strong>Introduction<\/strong><\/p>\n<p style=\"text-align: justify\">This analysis focuses on <i>The New York Times<\/i> coverage of Puerto Rico from 2006 to 2016. I thought of this idea while reading <i>The New York Times<\/i> article \u201cZika Cases in Puerto Rico are Skyrocketing.\u201d In the article the author described PR as an island \u201cin chaos\u201d where the \u201cwar against the Aedes aegypti mosquito&#8230; is sputtering out in failure.\u201d I found that coverage of PR tended to sway to the negative. However, my opinion is subjective and I wanted to quantify this hypothesis.<\/p>\n<p style=\"text-align: justify\">To achieve this objective, I needed to create a dataset from scratch. This process involved aggregating the top ten articles about Puerto Rico from 2006 to 2016 from the <i>NYT\u2019s<\/i> online archives. I excluded any articles from AP\/Reuters and articles that only mentioned the island in passing. I chose the <i>NYT<\/i> because it is a well-respected outlet that holds an esteemed position among American newspapers.<\/p>\n<p style=\"text-align: justify\">I compiled the following information on the selected 110 articles: dates, headlines, URLs, section where the article had been published, and authors. 
I wanted to see whether I could detect any patterns or trends in these variables.<\/p>\n<p style=\"text-align: justify\"><strong>Designing the Visualizations<\/strong><\/p>\n<p style=\"text-align: justify\">The purpose of my analysis was twofold: to analyze the selected articles\u2019 metadata and to detect patterns in the words used to describe the island, its people, and its issues. I used Google Sheets to create the dataset. After I had finished compiling the data, I uploaded the sheet to Tableau Public.<\/p>\n<p style=\"text-align: justify\">For my first visualization, I copied and pasted the content of the dataset\u2019s articles from 2006, 2011, and 2016 into a Word document. I successfully extracted the text from 30 different articles. I then uploaded all the text into Voyant Tools by year. I set rules to define the words that the tool should ignore when performing the analysis. The \u201cstopwords\u201d were: Puerto, Rico, Rico\u2019s, Rican, Mr, Said, Like, and It\u2019s. In the Word document, I replaced all instances of \u201cSan Juan\u201d and \u201cUnited States\u201d with \u201cSanJuan\u201d and \u201cUnitedStates,\u201d so Voyant would recognize each as one term. Once Voyant analyzed the text, I exported the results for each year as a text file. I then uploaded these three files into Tableau. For these three bar graphs, I selected \u201cTerms\u201d as the dimension and the sum of the records as the measure, and sorted the information in descending order. I filtered the information so the visualization would only include the top ten words in each year. I wanted users to view these three graphs side-by-side, but each graph had a different range on the y-axis. To solve this issue, I amended the 2011 and 2016 graphs so their y-axes would match the one from 2006. 
I then created a Dashboard using the three bar graphs and titled it \u201cTop Ten Words Used in Articles About Puerto Rico by <em>The New York Times<\/em>.\u201d Each individual graph was labeled with its corresponding year in the sub-header.<\/p>\n<p style=\"text-align: justify\">For my second visualization, I used Voyant Tools to analyze the text in the 110 headlines that had been selected from 2006 through 2016. I set rules to define the words that the tool should ignore when performing the analysis. The \u201cstopwords\u201d were: Puerto, Rico, Rico\u2019s, and Rican. Once again, I replaced all instances of \u201cSan Juan\u201d and \u201cUnited States\u201d with \u201cSanJuan\u201d and \u201cUnitedStates.\u201d Once Voyant analyzed the text, I exported the information as a text file. I then uploaded this file into Tableau. I selected \u201cTerms\u201d as the dimension and the sum of the records as the measure. I chose to use a bar graph and sorted the information in descending order. I filtered the information so the bar graph would only show words that had been used three times or more during the ten-year period.<\/p>\n<p style=\"text-align: justify\">My third visualization involved analyzing the metadata of the selected 110 articles from 2006 through 2016. I chose the \u201cSections\u201d and \u201cAuthors\u201d variables as the dimensions and the sum of the records as the measure. I settled on a bar graph with a dual x-axis. I clustered the bars by the section where the articles were published. Then, I sorted the sections from most to fewest articles published. I also filtered the information so the graph would only show the top ten sections with the most articles. I realized that time would be a useful variable for context, so I added it as a marker set to discrete. Each bar now had a color that corresponded with the year when the article was published. 
Since the bars were colored in a gradient scale that was a little hard to see, I organized the bars in each section in descending order by the year the articles were published.<\/p>\n<p style=\"text-align: justify\"><strong>U\/X Testing<\/strong><\/p>\n<p style=\"text-align: justify\">I set up my laptop with the three visualizations at a coffee shop around the Dekalb stop in Bushwick. I politely asked three individuals who were sitting at the coffee shop whether they would mind performing user testing of these three visualizations for a class project. They were all generally friendly and consented. User 1 and User 3 were a little bit more guarded than User 2, and were self-conscious about saying anything \u201cstupid.\u201d I tried to be even more cheerful when they supplied me with answers and emphasized that it was my project being tested, not them.<\/p>\n<p style=\"text-align: justify\">Users first performed a <span style=\"text-decoration: underline\">think-aloud<\/span> of the visualizations followed by <span style=\"text-decoration: underline\">task completions<\/span>. During the testing I supplied users with Figure 3 first, then Figure 2, and then Figure 1. Figure 3 proved to be the hardest to understand. My title was not clear enough and I had not labeled the axes correctly. Users missed the section headers in the top x-axis and did not understand what the names in the bottom x-axis meant. I had chosen a green gradient to mark the time span, but it was not easy for users to distinguish between the shades. User 1 commented that Figure 3 was \u201ca little overwhelming.\u201d User 2 commented that the section titled \u201cUnited States\u201d was confusing, since she did not know that was the name of a section in the newspaper. She suggested labeling this visualization better. User 3 seemed upset by Figure 3. He strongly suggested labeling the visualization better, providing users with more context, and making the \u201cSection\u201d column more noticeable. 
He hated the original title of Figure 3 and stressed that it should be more specific.<\/p>\n<p style=\"text-align: justify\">After the think-aloud, the participants used the visualizations to answer questions such as &#8220;Who wrote an article in the Travel section during 2006?&#8221;, &#8220;What section did Ben Ratliff write for?&#8221;, and &#8220;What was the most popular term used in articles during 2011?&#8221; I was surprised to learn that users liked Figure 3&#8217;s color gradient, which I thought might be confusing. They also understood that under each section the bars were organized by year in descending order. However, they used Tableau\u2019s hover feature to help them figure out the dates, which would not be possible on WordPress. User 2 recommended that I use something more dramatic than a\u00a0one-color gradient to mark time for Figure 3. User testing made me realize Figure 3 required a lot more work.<\/p>\n<p style=\"text-align: justify\">Users easily understood Figure 2 and the testing went smoothly. The users\u2019 favorite visualization was Figure 1. They all commented that they enjoyed this one the most. User 2 commented that it was a good idea to place the three graphs side-by-side. The three users laughed at the graph for 2006, which has \u201cYankee\u201d and \u201cDaddy\u201d among the top terms. What I did not expect was that users would create their own stories with the data after seeing Figure 1. User 2 remarked, \u201c2006 was a good year for reggaet\u00f3n. In 2011 there was a mild interesting in Puerto Rico. 
And 2016 was a [expletive].\u201d<\/p>\n<p style=\"text-align: justify\"><strong>U\/X Amends<\/strong><\/p>\n<p style=\"text-align: justify\">Using the information I gathered from the testing, I decided it would be best to organize the visualizations in the following order: \u201cTop Ten Words Used in Articles About Puerto Rico by <em>The New York Times<\/em>\u201d first, followed by \u201cMost Popular Words Used In Headlines About Puerto Rico By <em>The New York Times<\/em> From 2006-2016,\u201d and the densest visualization, \u201cTop Ten <em>New York Times<\/em> Sections Containing Articles About Puerto Rico and Their Corresponding Authors From 2006-2016,\u201d last.<\/p>\n<p style=\"text-align: justify\">With regard to color, I decided to keep the default blue for all the visualizations related to words (Figures 1 and 2). Users liked the color and it is similar to the blue used by the Pew Research Center in their visualizations. Figure 3, however, was trickier. Because color is used to present 11 different years, the initial green\u00a0gradient was hard to read. However, using a palette of multiple colors made the visualization even more overwhelming. I settled on a \u201cRed to Gold\u201d gradient and decided to link Figure 3 to Tableau&#8217;s interactive graph, so users could use the hover tool outside of WordPress. I chose the &#8220;Red to Gold&#8221; colors\u00a0based on User 2\u2019s recommendation\u00a0and because red is the first color in the hierarchy of color coding.<\/p>\n<p style=\"text-align: justify\">I decided to improve the titles and axis labels on all the visualizations, especially Figure 3. In Figure 3, I changed \u201cNumber of Records\u201d in the y-axis to \u201cNumber of Articles.\u201d I increased the size of the text from 9 to 11 in the x- and y-axes to improve user readability. I wish Tableau Public allowed me to type \u201cAuthors\u201d below the x-axis, but I was unable to do this. 
I also made the line along the \u201cSections\u201d heavier, so users would notice that each topic was self-contained.<\/p>\n<p style=\"text-align: justify\"><strong>Analysis<\/strong><\/p>\n<div id=\"attachment_5980\" style=\"width: 864px\" class=\"wp-caption aligncenter\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5980\" class=\"wp-image-5980 size-full\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2016\/12\/Articles.png?resize=840%2C831\" alt=\"Figure 1: Top Ten Words Used in Articles About Puerto Rico by The New York Times\" width=\"840\" height=\"831\" \/><p id=\"caption-attachment-5980\" class=\"wp-caption-text\">Figure 1: Top Ten Words Used in Articles About Puerto Rico by The New York Times<\/p><\/div>\n<p style=\"text-align: justify\">As observed by the three users, Figure 1 does a good job of documenting major events in PR and how they are perceived in the US at different points in time. In 2006, prior to the crisis, it seems that music and other cultural topics were the island\u2019s main draw. During 2006, the <i>NYT<\/i>\u2019s magazine did a profile piece on reggaet\u00f3n artist Daddy Yankee, which is why both words figure so prominently. The author referred to the artist solely as Yankee at times throughout the piece, which is why I did not join the words on Voyant. Interestingly enough, the economic crisis was already brewing at the time. Section 936, which had previously allowed American companies to operate on the island without paying taxes, expired that year. Multinational corporations left PR in droves, ushering in a wave of unemployment on the island. However, this news did not factor into the top ten retrieved articles. During 2006 reggaet\u00f3n was approaching its zenith and was probably blasting in every corner of New York City. 
The top ten words from the selected articles of 2006 reflect a tropical place with a music-centered vibe.<\/p>\n<p style=\"text-align: justify\">Meanwhile, 2011 displays more politically charged terms, and personal names figure among the top 10 words used in the selected articles. That year President Obama made a rare presidential visit to Puerto Rico, becoming the first president to visit the island officially since John F. Kennedy. The amount of attention paid to this visit can be perceived from the 2011 graph; the first three words allude to or directly reflect his visit. The <i>NYT<\/i> also ran articles on the New York Giants\u2019 Victor Cruz, who is half Puerto Rican, and on writer Esmeralda Santiago, who moved to New York during her teenage years and had just released the novel \u201cConquistadora.\u201d Ana, the name of the novel\u2019s main character, figured among the ten most popular words. Six of the ten terms correlate with direct ties to New York or the US\u2019s relationship with PR. Only two words, \u201cStudents\u201d and \u201cPipeline,\u201d allude to situations with no direct correlation to the mainland: the student protests of 2011 in anticipation of a university tuition hike and the protests against the proposed gas pipeline during the Fortu\u00f1o administration.<\/p>\n<p style=\"text-align: justify\">Unsurprisingly, the bar graph from 2016 contains the bleakest terms. They all allude to the financial crisis or the Zika outbreak. \u201cPower\u201d refers to Puerto Rico\u2019s electric power authority, which is in debt and caused an island-wide blackout this year. The <i>NYT<\/i> also covered the Zika situation in PR incessantly; the term appeared 30 times in the ten selected articles.<\/p>\n<p style=\"text-align: justify\">There is a tonal shift in the words used in the articles to describe PR over the decade described in the visualization. It is interesting that the term \u201cisland\u201d becomes more popular as the economic situation worsens. 
However, there is not enough data to know whether there is a correlation between the timing and the term. Based on Figure 1, while the pending economic crisis was already brewing in 2006, it went unperceived by the newspaper until very recently.<\/p>\n<p style=\"text-align: justify\">Interestingly, the terms from Figure 1 allude to topics that mostly have direct relevance to New York or the US. PR\u2019s situation, people, and culture seem to be worth reporting only when they have a direct impact on the US. While this is not surprising, since the <i>NYT<\/i> writes for its English-speaking American audience, it underscores the importance of having a strong Puerto Rican press. Only we can cover our own issues, culture, and people in a way that directly relates to our reality. We cannot rely on others to tell our stories and write our history.<\/p>\n<div id=\"attachment_5982\" style=\"width: 1117px\" class=\"wp-caption aligncenter\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5982\" class=\"wp-image-5982 size-full\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2016\/12\/Headlines.png?resize=840%2C863\" alt=\"Most Popular Words Used in Headlines About Puerto Rico by The New York Times from 2006-2016\" width=\"840\" height=\"863\" \/><p id=\"caption-attachment-5982\" class=\"wp-caption-text\">Figure 2: Most Popular Words Used in Headlines About Puerto Rico by The New York Times from 2006-2016<\/p><\/div>\n<p style=\"text-align: justify\">Figure 2\u2019s findings are similar to those from Figure 1. \u201cDebt\u201d and \u201cGovernor\u201d top the list as the two most popular words used in the selected 110 headlines. \u201cGovernor,\u201d \u201cObama,\u201d \u201cPolice,\u201d \u201cInquiry,\u201d and \u201cVisit\u201d were the political terms used the most in headlines. 
\u201cDebt,\u201d \u201cUtility,\u201d \u201cFiscal,\u201d and \u201cPower\u201d were the most popular economic terms used in the headlines. \u201cUS\u201d was the most popular geographic term followed by \u201cSan Juan.\u201d \u201cSalsa\u201d and \u201cBaseball\u201d were also on the list, which User 3 commented was almost stereotypical. Nevertheless, our sports figures and music still figure as a draw for international coverage. \u201cDeath\u201d also figures among the most prominent terms, which made me reflect on the old adage, \u201cgood news doesn\u2019t sell.\u201d<\/p>\n<div id=\"attachment_5984\" style=\"width: 2190px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/public.tableau.com\/views\/NYTimesPR\/Sheet1?:embed=y&amp;:display_count=yes\" target=\"_blank\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-5984\" class=\"wp-image-5984 size-full\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2016\/12\/Metadata-NYT-PR.png?resize=840%2C444\" alt=\"Please click\" width=\"840\" height=\"444\" \/><\/a><p id=\"caption-attachment-5984\" class=\"wp-caption-text\">Figure 3: Top Ten New York Times Sections Containing Articles About Puerto Rico and Their Corresponding Authors from 2006-2016<\/p><\/div>\n<p style=\"text-align: justify\">Meanwhile, Figure 3 provides a decent overview of the 110 selected articles\u2019 metadata. One can see, for example, how Travel section articles have waned from 2006. There is an uptick of Health articles written in 2016 after the Zika outbreak, an uptick in Business section articles after the global financial crisis of 2008, and an uptick in the Real Estate section in 2014 after the PR government announced tax cuts for individuals willing to relocate businesses there. Users can also see that the most frequent contributor, especially during the last two years, is Mary Williams Walsh who covers business-related topics. 
Coming in second is the Miami-based Lizette Alvarez. More research is needed on each individual author, but as a preliminary hypothesis based on last names, most of the authors reporting on the island do not seem to be Hispanic. When I read sentences where the <a href=\"http:\/\/www.nytimes.com\/2016\/09\/18\/health\/a-mosquito-killer-unwelcome-to-many.html\">forced sterilization of unsuspecting Puerto Rican women is dismissed<\/a> or a pregnant woman in PR\u2019s hot and tropical climate is described as wearing <a href=\"http:\/\/www.nytimes.com\/2016\/07\/31\/health\/zika-virus-puerto-rico.html\">\u201cthe skimpiest of maternity dresses,\u201d<\/a> I wonder if better and more equitable reporting could be achieved by having a more diverse staff writing on these issues.<\/p>\n<p style=\"text-align: justify\"><strong>Further Research and Conclusion<\/strong><\/p>\n<p style=\"text-align: justify\">The three visualizations provide users with information on how PR is perceived by US media at different points in time, along with who writes this news and how it is framed. By selecting a sample of 110 articles, I was able to extract popular terms in these articles and headlines, along with their metadata, to provide a snapshot of\u00a0how PR\u00a0was covered by the <em>NYT<\/em> from 2006 through 2016.<\/p>\n<p style=\"text-align: justify\">I was pleased that the users were able to use the data to come to their own conclusions, especially for Figure 1. I would like to keep on refining Figure 3, since I believe it is not as user-friendly as it could be.\u00a0Supplying people with too much information leads to frustration, but even though Figure 3 is dense and requires more patience than the other two visualizations, I think it delivers important information. Even User 3, who disliked Figure 3,\u00a0was able to make interesting comments, like noticing that none of the prominent <em>NYT<\/em> writers appeared. 
Listening to\u00a0users and having patience even when they express negative feelings is important for creating a better product.<\/p>\n<p style=\"text-align: justify\">Further research could focus on collecting information on\u00a0the number of articles written about PR from 2006-2016 and\u00a0analyzing any patterns. Aggregating articles published by Puerto Rican newspapers, along with other state-side newspapers,\u00a0and analyzing the terms used to describe the same news would also be an interesting project. This model could also be used to research how other subjects are covered, from Cuba to rural states. As\u00a0news outlets fall under scrutiny for failing to provide thorough coverage beyond the East\/West coasts and fake news websites proliferate, it is important now more than ever to be judicious about the content of what we read, who writes it, and how it is meant to be interpreted. These types of analyses\u00a0provide us with a useful and quantitative look at our news.<\/p>\n<hr \/>\n<p style=\"text-align: justify\"><a href=\"https:\/\/docs.google.com\/spreadsheets\/d\/1dGSX93RiaXFsyrnL79GxUskuDACat1YgmoyK6I_57Ag\/edit?usp=sharing\">Link to dataset<\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This analysis focuses on The New York Times coverage of Puerto Rico from 2006 to 2016. 
I thought of this idea while reading The New York Times article \u201cZika Cases in Puerto Rico are Skyrocketing.\u201d In the article the author described PR as an island \u201cin chaos\u201d where the \u201cwar against the Aedes aegypti&hellip;<\/p>\n","protected":false},"author":92,"featured_media":5980,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[],"coauthors":[],"class_list":["post-5974","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-visualization"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-1ym","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/5974","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/92"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=5974"}],"version-history":[{"count":0,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/5974\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media?parent=5974"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=597
4"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=5974"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=5974"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}