In her chapter in Digital Humanities in the Library: Challenges and Opportunities for Subject Specialists, Caro Pinto discusses the shift from the traditionally solitary work of humanities scholars to the collaborative nature of most digital humanities projects. Pinto cites consortiums such as the Tri-Co Digital Humanities Initiative and Five Colleges DH as particularly successful examples of cross-institution collaboration. Similarly, NYCDH Week is a good example of how digital humanists around New York share their resources and knowledge with their colleagues.
The NYCDH Week’s Social Media Scraping workshop was led by Sarah Demott at New York University’s Bobst Library on Tuesday, February 14, 2017. The workshop revolved around the qualitative data analysis software NVivo and its associated browser extension, NCapture. This software makes it easy for researchers to access user data from three major social networking sites: Twitter, Facebook, and YouTube. NCapture provides a few options for capturing data from each of these sites, and NVivo offers a number of tools to help researchers analyze the results. The pairing is especially useful for researchers who have little to no coding experience but would still like to compile a dataset of social media information.
While NCapture is free to add as a Google Chrome browser extension, it saves each dataset in the proprietary .nvcx file format, which can only be opened in NVivo. This means that users will need to purchase a subscription to NVivo in order to open and manipulate the datasets that they download using NCapture. Once we had captured our desired social media information as datasets, Sarah walked us through how to create a new project and import the NCapture files into NVivo.
Once the files have been imported, they display in a spreadsheet format. For example, with YouTube comment data, the columns include “Comment ID”, “Comment Username”, “Comment”, “Reply ID”, “Reply by Username”, and “Reply”. NVivo has tab options on the side of the spreadsheet, including a map function that plots the data points based on geography. Sarah also demonstrated how to query different aspects of our dataset, such as term frequency. The term frequency query reports the length and frequency of each term, and it allows the user to add stop words, which helps to reduce the number of irrelevant words included in the results. The query also offers a number of visual aids, such as a word tree and a word cloud.
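For readers curious about what a term frequency query with stop words does, here is a minimal sketch in Python. It is not how NVivo implements the feature, and it does not read .nvcx files. It assumes the text of the “Comment” column has already been gathered into a plain list of strings. The stop-word list, the min_length parameter, the function name, and the sample comments are all invented for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list for illustration only; NVivo ships its own list.
STOP_WORDS = {"the", "a", "an", "and", "is", "it", "this", "to", "of", "i"}


def term_frequency(comments, stop_words=STOP_WORDS, min_length=3):
    """Count how often each term appears across a list of comment strings.

    Terms shorter than min_length or found in stop_words are skipped.
    Returns (term, count) pairs, most frequent first.
    """
    counts = Counter()
    for text in comments:
        # Lowercase the text and split it into runs of letters and apostrophes.
        for word in re.findall(r"[a-z']+", text.lower()):
            if len(word) >= min_length and word not in stop_words:
                counts[word] += 1
    return counts.most_common()


# Made-up comments standing in for the "Comment" column of a capture.
sample_comments = [
    "This video is great",
    "Great explanation of the archive",
    "The archive is a great resource",
]

# Print each term with its length and frequency, like the query described above.
for term, count in term_frequency(sample_comments):
    print(term, len(term), count)
```

Adding a word to the stop-word set removes it from the results, which is the same idea as excluding irrelevant words from the query in the workshop.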
Ultimately, NVivo and NCapture seem like very useful tools for extracting social media data when the user is unfamiliar with coding. However, these tools do have some significant limitations. The most apparent of these is cost. While NCapture is freely available, NVivo is required to view NCapture data, and student subscriptions start at $75 a year. The second issue with the NCapture/NVivo package is that it can only capture real-time results from Twitter. NCapture would be unable to meet the needs of a project like ours in Digital Humanities II, where we are hoping to look at historical uses of a term and its associated hashtag. If NVivo were able to provide access to historical Twitter data, it might be worth purchasing for our project. Despite this shortcoming, the package does seem to be useful for real-time social media data collection, and it provides a nice set of tools for researchers who are unfamiliar with coding and new to data analysis.