Data


Selecting the Dataset

Our project interrogates the foundations upon which data analysis and scholarly conclusions rest. Although our research questions are not specific to the subject of the data, our search for and selection of a dataset to work with still had parameters and goals.

First, working within the time constraints of a single semester, we knew we needed a pre-existing dataset. Second, we were looking for a dataset that could be considered agnostic. Although the neutrality of any dataset is debatable, we focused our search on material whose subject matter would neither distract from nor explicitly support the issues driving our experiments, i.e., feminized labor, data science, and literacy. Finally, we wanted our working dataset to have already been used in humanities research, as a means to promote and comment on reproducibility in Digital Humanities and in data science in general.

We searched by source and by subject, surveying a wide range of publicly available humanities data. Our sources included NEH-funded DH projects, projects published in DH Quarterly and other established journals, institutional and academic Digital Humanities centers, humanitiesdata.com, the DH Awards, repositories such as Kaggle, and an array of other data and digital project sources such as Statista, GitHub, and Wikidata. In the end, we selected a dataset found via Humanities Commons, a network of humanities researchers that hosts a repository of open-access materials. There, we found Desert Island Disks (Finnegan et al., 2021), which builds on Desert Island Discs 1942-2020 (Gustar, 2020), which in turn builds on BBC Desert Island Discs Dataset v 1.0 (Morgan, 2018). The authors of each iteration published their dataset, some degree of analysis, and documentation of their digital process. Their commitment to transparency is what allows our work to be in conversation with theirs, particularly in highlighting the labor and results of their data collection and processing.

BBC Desert Island Discs logo

BBC’s Desert Island Discs

Desert Island Discs is a long-running BBC radio program. The “first Desert Island Discs was recorded in the BBC’s bomb-damaged Maida Vale studio on 27th January 1942 and aired in the Forces Programme at 8pm two days later. It was introduced to the listening public as ‘a programme in which a well-known person is asked the question, if you were to be cast away alone on a desert island, which eight gramophone records would you choose to have with you, assuming of course, that you had a gramophone and an inexhaustible supply of needles.’” (BBC Radio 4 – Desert Island Discs – The History of Desert Island Discs, n.d.)

Screenshot of Desert Island Disks dataset

What is the Desert Island Disks Dataset?

Desert Island Disks (Finnegan et al., 2021) is an Excel spreadsheet of three sheets with information about every episode of BBC Radio’s Desert Island Discs. Sheet 1 includes the episode number and air date, the name of the castaway(s)/guest(s), their country of citizenship, country of birth, date of birth, profession, favorite track, etc. Sheet 2 has the full selection of desert island discs by episode. Sheet 3 has data from Spotify on the track selections, including genres, “danceability”, “energy”, “loudness”, “speechiness”, and more.
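
To make the three-sheet structure concrete, the following is a minimal Python sketch of how the sheets might relate to one another. It is illustrative only: the column names (`episode`, `castaway`, `track`), the idea that Sheet 3 can be looked up by track name, and every value shown are our own placeholders, not the dataset's actual headers or contents.

```python
from collections import defaultdict

# Placeholder rows standing in for the three sheets.
# All names, keys, and numbers here are invented for illustration.
castaways = [  # Sheet 1: one row per episode
    {"episode": 1, "castaway": "Guest A", "profession": "comedian"},
    {"episode": 2, "castaway": "Guest B", "profession": "actor"},
]
selections = [  # Sheet 2: one row per selected disc, keyed by episode
    {"episode": 1, "track": "Track X"},
    {"episode": 1, "track": "Track Y"},
    {"episode": 2, "track": "Track X"},
]
audio_features = {  # Sheet 3: Spotify-style measures per track (assumed key)
    "Track X": {"danceability": 0.40, "energy": 0.55},
    "Track Y": {"danceability": 0.70, "energy": 0.80},
}


def tracks_by_episode(rows):
    """Group the selected tracks under their episode number."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["episode"]].append(row["track"])
    return dict(grouped)


def mean_feature(people, rows, features, name):
    """Average one audio feature over each castaway's selected tracks."""
    grouped = tracks_by_episode(rows)
    result = {}
    for person in people:
        values = [
            features[track][name]
            for track in grouped.get(person["episode"], [])
            if track in features  # skip tracks with no feature data
        ]
        if values:
            result[person["castaway"]] = sum(values) / len(values)
    return result


if __name__ == "__main__":
    print(mean_feature(castaways, selections, audio_features, "danceability"))
```

The sketch links Sheet 1 to Sheet 2 through the episode number, and Sheet 2 to Sheet 3 through the track. Each of those links is a point where tracks without a match silently drop out of the result.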