“The Lexicon of DH” was a workshop held at CUNY on October 27, 2015 to help newcomers to digital humanities (DH) build a resource list of tools and an understanding of the vocabulary and methods of the field. It was presented by Mary Catherine Kinniburgh, a GC Digital Fellow, with assistance from Patrick Sweeney, also a Digital Fellow at the CUNY Graduate Center. GC Digital Fellows operate out of the Digital Initiative at CUNY, continuing the kind of innovative approaches to technology in education discussed by Brier in his snapshot of digital education at that university (Brier, 2012). The presentation slides were shared with participants at http://tinyurl.com/lexiconofdh, which was handy given the many hyperlinks involved in the discussion. In fact, the first tool shared was Zotero, which became a convenient way to store links to the tools and datasets discussed in the slides. Zotero is a bibliographic tool, but in this use it can also function as an online bookmark collector: once installed in the browser, it lets users access their links from any computer they happen to be working on.
Though the workshop centered on a working vocabulary of terms describing the practice of DH and the pragmatics of tools that can be used to execute a DH project (rather than a discussion of the theoretical frameworks or purposes of DH), the organizers still got across some of the underpinnings of interdisciplinary scholarship and collaboration that are central to DH (Burdick et al., 2012). At the beginning of the workshop, Mary Catherine Kinniburgh had participants interview someone seated close to them and then introduce that person to the group. Just seeing the diversity of DH experience and of prospective projects in the room reinforced that DH can be practiced across many disciplines and that it encourages collaboration and cooperation between practitioners of different skill levels to troubleshoot how to accomplish a project (Spiro, 2012). While some participants were relatively new to using tech tools in their disciplines, others had already built websites or collected Twitter data and were just looking for analysis methods. Some participants came from art history, some from literature, and some from the social sciences. We were encouraged to discuss what we were learning about the tools with our “partner” throughout the workshop.
The workshop was constructed around tools DHers might want to apply to collect, process, analyze, or visualize data. Several databases and digital collections were named as places where users could locate data already gathered by others, including the Around DH in 80 Days project, which collects 80 DH projects with a deliberate focus on work outside the US. The site can serve as inspiration, but many of its datasets are also open source or Creative Commons licensed, so a researcher who sees a project that overlaps with their own area can include the data in their analysis, or remix the raw data to answer a different question than the original researchers asked, as DH encourages (Presner, 2009). Links were also given to tools that let researchers capture their own data, such as TANDEM, a tool designed at CUNY that combines image extraction, natural language processing, and OCR, and web-scraping tools such as Beautiful Soup and Scrapy, as sketched below.
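As a rough illustration of the kind of capture these scraping libraries make possible, here is a minimal Python sketch using requests and Beautiful Soup. The URL and the elements collected are placeholder assumptions; a real project would tune both to its target site.

```python
# A minimal capture sketch: fetch one page and pull out its paragraphs
# and links. The URL is a placeholder; any real project would need its
# parameters tuned to the target site (and the site's terms checked).
import requests
from bs4 import BeautifulSoup

url = "https://example.org/exhibit"  # hypothetical page to scrape
response = requests.get(url, timeout=10)
response.raise_for_status()  # stop early if the request failed

soup = BeautifulSoup(response.text, "html.parser")

# The parameters we set shape the data: here we keep only paragraph
# text and outbound link targets, discarding everything else.
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]
links = [a["href"] for a in soup.find_all("a", href=True)]

print(f"Captured {len(paragraphs)} paragraphs and {len(links)} links")
```

Scrapy addresses the same task at larger scale, adding crawling across many pages, request throttling, and export pipelines.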
Using these examples, we examined some of the tools with an eye toward both their possibilities and their limitations. Both instructors cautioned that though these capture tools would get us data, the parameters we set would control what kind of data we received, and the data would still have to be cleaned and interpreted. One illustration they gave of what a web-scraping tool can do was No Homophobes, a project designed to gather instances of homophobic tweets on Twitter. However, the instructors pointed out, since the data-collecting mechanism behind the project looks for particular word patterns without necessarily being able to assess context, some of the tweets caught in the net are written by people within the gay community poking fun at homophobic attitudes by using the same phrases and hashtags. The limit of the capture tool in this case is one of recognizing intention, leaving a margin of error that the researcher has to decide how to deal with.
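A toy sketch makes the problem concrete: pattern matching flags any text containing a watched phrase, whether the usage is hostile or reclaimed. The watch list and tweets below are invented for illustration.

```python
# Invented examples of the context problem: a naive filter flags any
# tweet containing a watched phrase, regardless of the writer's intent.
watch_list = ["no homo"]  # hypothetical pattern being tracked

tweets = [
    "that haircut is awful, no homo",                  # hostile usage
    "off to pride with my crew... 'no homo', right?",  # in-community joke
]

# Both tweets match: the filter sees word patterns, not intention, so
# the researcher must decide how to handle this margin of error.
flagged = [t for t in tweets if any(p in t.lower() for p in watch_list)]
for t in flagged:
    print("flagged:", t)
```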
Once the data is obtained, if it is text, it can be further processed by adding markup such as TEI (Text Encoding Initiative). It can also be analyzed with tools such as Voyant, AntConc (which can also generate word concordances), the R programming language, MALLET, and others collected in the TAPoR portal. For mapping projects that need geocoding before they can be analyzed, QGIS, an open-source alternative to ArcGIS, was recommended, as was OpenStreetMap, which may not have all the bells and whistles of the ever-present Google but has the benefit of being a community-sourced platform. Google, being proprietary, can change its data in ways that destroy aspects of your project without your knowledge or permission.
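To give a sense of the concordance view that tools like AntConc provide, here is a bare-bones keyword-in-context (KWIC) sketch in Python; the sample sentence is invented, and the real tools layer sorting, statistics, and corpus management on top of this basic idea.

```python
# A bare-bones keyword-in-context (KWIC) display, the view AntConc and
# Voyant offer with far more polish. The sample sentence is invented.
def kwic(text, keyword, window=4):
    """Print each occurrence of keyword with `window` words of context."""
    words = text.lower().split()
    for i, word in enumerate(words):
        if word.strip(".,;!?") == keyword:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            print(f"{left:>35} [{keyword}] {right}")

sample = ("The archive holds letters about the archive itself, "
          "and the archive grows as scholars deposit new letters.")
kwic(sample, "archive")
```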
With the discussion of tools for displaying and analyzing the gathered data came a reminder that the parameters you set for how your data is displayed will color the point you are making and will influence your audience, so care should be taken that the visualization aligns with the data. Gephi and D3 were mentioned as good visualization tools, with a link to D3's helpful gallery section, which lets users see both the images a particular piece of code can make from your data and the code itself. Mapping is another way to visualize data, and fortunately QGIS not only analyzes data but can also visualize it as a map; we were linked to the QGIS support material for that possibility. OpenLayers and Leaflet can be used with QGIS to display data in web browsers, as in the sketch below. Exhibition platforms such as Omeka and Scalar offer additional ways to present the results of a project to readers.
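As a minimal sketch of browser-based mapping from Python, the example below uses folium, a Python wrapper around the Leaflet library mentioned above, on OpenStreetMap tiles. The coordinates (roughly the CUNY Graduate Center) are included only as an example point.

```python
# A minimal web-mapping sketch using folium, which generates Leaflet
# maps from Python. The coordinates are an example point, not data.
import folium

m = folium.Map(location=[40.7486, -73.9840], zoom_start=15,
               tiles="OpenStreetMap")  # community-sourced base map
folium.Marker([40.7486, -73.9840],
              popup="CUNY Graduate Center").add_to(m)
m.save("workshop_map.html")  # open this HTML file in any web browser
```

The saved file is self-contained Leaflet HTML, so it can be served as a static page alongside a project site.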
The workshop ended with a reminder that it was one of several run by CUNY's Digital Fellows, along with information on when the Digital Fellows are available to provide assistance. In this way it was presented as the beginning of each participant's journey into DH rather than as a self-contained instructional session.
References
Brier, S. (2012). Where's the pedagogy? The role of teaching and learning in the digital humanities. In M. K. Gold (Ed.), Debates in the digital humanities. University of Minnesota Press. Retrieved from http://dhdebates.gc.cuny.edu/debates/text/8
Burdick, A., Drucker, J., Lunenfeld, P., Presner, T., & Schnapp, J. (2012). Digital_Humanities. MIT Press. Retrieved from http://mitpress.mit.edu/books/digitalhumanities-0
Davidson, C. N. (2008). Humanities 2.0: Promise, perils, predictions. Publications of the Modern Language Association, 123(3).
Presner, T. (2009). Digital humanities manifesto 2.0. Retrieved from http://www.humanitiesblast.com/manifesto/Manifesto_V2.pdf
Spiro, L. (2012). “This is why we fight”: Defining the values of the digital humanities. In M. K. Gold (Ed.), Debates in the digital humanities. University of Minnesota Press. Retrieved from http://dhdebates.gc.cuny.edu/debates/text/13