{"id":435,"date":"2014-12-08T12:56:17","date_gmt":"2014-12-08T16:56:17","guid":{"rendered":"http:\/\/dh.prattsils.org\/?p=435"},"modified":"2014-12-08T12:56:17","modified_gmt":"2014-12-08T16:56:17","slug":"research-without-borders-big-open-data-columbia-university-december-4th-2014","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/dh\/2014\/12\/08\/research-without-borders-big-open-data-columbia-university-december-4th-2014\/","title":{"rendered":"&#8220;Research Without Borders: Big Open Data&#8221; (Columbia University, December 4, 2014)"},"content":{"rendered":"<p class=\"p1\">At Columbia\u2019s <a href=\"http:\/\/scholcomm.columbia.edu\/2014\/11\/13\/research-without-borders-panel-to-discuss-big-open-data-on-december-4th\/\" target=\"_blank\">\u201cResearch Without Borders\u201d<\/a> program last week, panel-goers were exposed to the \u201cBig Open Data\u201d phenomenon from three distinct vantage points.<\/p>\n<p class=\"p1\">The first panelist was <a href=\"https:\/\/twitter.com\/djwrisley\" target=\"_blank\">David Wrisley<\/a>, an English professor at the American University of Beirut and a Medieval Fellow at Fordham\u2019s Center for Medieval Studies. His research focuses on medieval comparative literature and the digital humanities. Wrisley\u2019s approach to data-driven research was the most familiar from a DH standpoint. He focused on how, with a large dataset, we move from consuming texts to making things out of them; with these data assets \u201cin your hands\u201d, you can make new things out of them, manipulate them, or study them in an entirely new way. Wrisley\u2019s method is similar to the idea of \u201cdistant reading\u201d that we studied in Franco Moretti\u2019s <i>Graphs, Maps, Trees<\/i>. 
In terms of medieval comparative literature, the hypertext capabilities of digitized text are particularly useful for analysis.<\/p>\n<p class=\"p1\">Linking texts to other texts through human markup is a well-established method in comparative literature, but Wrisley was more interested in discussing automatic annotations and topic modeling, as opposed to human-created ontologies and the lack of collaboration that comes from one person\u2019s hand producing the markup. He also touched on the spatial humanities, or creating maps of places both real and fictional, whether in the ancient world or in literature. This was also reminiscent of Moretti\u2019s text. Finally, Wrisley brought up another method that is particularly pertinent to medieval documents: multispectral imaging. Spectral imaging allows image data to be captured from frequencies beyond the visible light range, which makes it possible to extract information that the human eye fails to capture. In this way, there is a granularity in big data, and it allows for a super-close reading, far closer than the human eye can manage. Showing traces of old documents, Wrisley stated, allows for a radical materiality of the humanities; DH methods that harness big open data are remarkable not just in their scope, or \u201cdistant readings\u201d, but also in their ability to conduct super-close readings. I hadn&#8217;t thought of this connection; it is true that both far away and extremely close perspectives are offered through a technological processing of information.<\/p>\n<p class=\"p1\">The next panelist was <a href=\"https:\/\/twitter.com\/jonathanstray\" target=\"_blank\">Jonathan Stray<\/a>, a computational journalist who teaches at Columbia University. Stray is also the lead on the development of the Overview Project, an open-source document archive analysis system for journalists. He defined computational journalism as journalism that either uses, or is simply about, computation. 
Applying computational methods to journalism means that journalists no longer have to read through huge datasets by hand. Visualization methods such as word clouds can take these huge text datasets and reveal unusual and unexpected trends. Besides word count visualizations, Stray also discussed the usefulness of topic modeling\/subject sorting and network analysis.<\/p>\n<p class=\"p1\">I was specifically interested in how computational journalism uses DH methods for political ends; network analysis in a political context, for instance, allows you to paste in the names of people and companies and look for connections that shouldn\u2019t be there. Stray gave the example of organized crime, money laundering, and fraud. More specifically, he discussed how this method allowed WikiLeaks reporters to gain a broader view of what happened regarding the private security contractors in Iraq. I could definitely see how networks provide algorithmic assistance for investigative journalism, for the computer automatically points to what is happening beneath the surface or between the lines of huge masses of bureaucratic paperwork.<\/p>\n<p class=\"p1\">After discussing the academic revelations and democratic freedoms one can amass through \u201cBig Open Data\u201d, I was wondering when the conversation would hit a darker chord. Next up was <a href=\"https:\/\/twitter.com\/alicetiara\" target=\"_blank\">Alice Marwick<\/a>, an Assistant Professor at Fordham University. Her work investigates online identity and consumerism \u201cthrough the lenses of privacy, surveillance, and consumption\u201d. Marwick provided the more dystopian approach to big open data. The focus of her presentation was how the &#8220;little data&#8221; that we all generate, including both our online and offline activities, are aggregated, bought, and sold to social media agencies. Thus, our closed data becomes open data. She first discussed a major offender, Facebook\u2019s Atlas ad program. 
Atlas\u2019 website advertises that it \u201chelps you reach your business objectives\u201d and that it is cross-device, online to offline, with real, proven results that allow you to \u201cilluminate and understand customer journeys\u201d. To put it simply, if you browse the web, whether you\u2019re on your computer, tablet, or phone, while logged in to Facebook, Atlas can see all of your activities. This program has access to your facial recognition data, your closest friends, political beliefs, games played, and the music you have listened to. This has allowed Atlas to form very precise and targeted consumer audiences and identities.<\/p>\n<p class=\"p1\">Facebook sells information to data brokers, and over seven hundred million consumers have personal files. Data brokers have a variety of sources, such as public records, magazine subscriptions, online and offline shopping, education and salary, and DMV and voting records. What is different about Facebook is that it sells private closed information that isn\u2019t a part of the public record. For this reason, individuals reveal much that they do not intend to.<\/p>\n<p class=\"p1\">Sensitive information, such as being a smoker, being overweight, having divorced parents, and sexual orientation, can be inferred through correlations in closed data. Data brokers then sort the population in a digital dossier, into 71 segments and 19 categories. We are not allowed to know what these categories are or how we\u2019ve been categorized. This information can allow the most vulnerable members of the population to be targeted, and it is also widely available for purchase. Obama\u2019s campaign pioneered micro-targeted advertising; the government is not legally allowed to collect this information, but it is allowed to purchase it. Data brokers have even sold this information to criminals by mistake!<\/p>\n<p class=\"p1\">Alice Marwick\u2019s presentation made me seriously regret a lot of my online naivet\u00e9. 
Clearly, despite the advantages of unprecedented access to large amounts of data, there is a dark side to algorithmic data retrieval and analysis. When it comes down to people being denied health insurance or a mortgage because of unclear, invisible surveillance, you can really begin to feel watched from all sides. I am certainly interested in Marwick\u2019s research and plan to eventually read her book. Still, from a DH\/research perspective, Jonathan Stray&#8217;s presentation was inspiring and a bit more hopeful. I am really interested in DH applications for political change, or the way that using DH research methods on big open data can serve emancipatory functions. Whether these future projects aim towards critiquing the state or the government, or towards allowing citizens to access information that is bureaucratically buried or hidden in a large mass of &#8220;open documents&#8221;, I hope that this area of DH grows a lot in the upcoming years.<\/p>\n<p>&nbsp;<\/p>\n<p><strong>Further Reading<\/strong><\/p>\n<p>Moretti, F. (2007). <i>Graphs, maps, trees: Abstract models for literary history<\/i>. New York: Verso.<\/p>\n","protected":false},"excerpt":{"rendered":"<p class=\"lead\">At Columbia\u2019s \u201cResearch Without Borders\u201d program last week, panel-goers were exposed to the \u201cBig Open Data\u201d phenomenon from three distinct vantage points. The first panelist was David Wrisley, an English professor at the American University of Beirut and a Medieval Fellow at Fordham\u2019s Center for Medieval Studies. His research focuses on medieval comparative literature and the digital humanities. 
Wrisley\u2019s approach&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"btn btn-danger\" href=\"https:\/\/studentwork.prattsi.org\/dh\/2014\/12\/08\/research-without-borders-big-open-data-columbia-university-december-4th-2014\/\">Read more &rarr;<\/a><\/p>\n","protected":false},"author":183,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[8],"tags":[],"class_list":["post-435","post","type-post","status-publish","format-standard","hentry","category-event-reviews"],"_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts\/435","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/users\/183"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/comments?post=435"}],"version-history":[{"count":0,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts\/435\/revisions"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/media?parent=435"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/categories?post=435"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/tags?post=435"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}