Linked Jazz Meets Carnegie Hall

April 12, 2018 - All

Background: This project was inspired by our involvement with Linked Jazz, a research project based at Pratt Institute that investigates the application of Linked Open Data technologies to digital cultural heritage materials. The work is built on a collection of 50+ jazz oral history transcripts from which relationships between jazz musicians are derived. When the person being interviewed mentions another person, an RDF triple is generated stating that Person A knows of Person B. One goal in working with the Linked Jazz data, and a goal of linked open data in general, is to link the data with external sources. This project is an exploratory step toward that goal, and serves as a use case and iterative research effort that we hope to expand on in the future.

Linked Jazz Meets Carnegie Hall: Working with performance history data from Carnegie Hall, our work compares the Linked Jazz network of relationships to the network of performers involved in jazz events at Carnegie Hall from 1912 to May 1955.

Our first step was to define the subset of people who appear in both datasets. Using a series of Python scripts and the Python Human Name Parser module, we used string matching to find people with the same first and last names. From a total of 2,005 people in the Linked Jazz dataset and 19,000+ performers from Carnegie Hall, this method identified 264 name matches.

In addition to the name-parser string matching, we compared Uniform Resource Identifiers (URIs) across the two datasets, using the Python RDFLib package to parse the RDF. Many of the performers in the Carnegie Hall (CH) name directory do not have URIs that could be matched to URIs in the Linked Jazz (LJ) data, which are primarily from DBpedia.
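The two matching steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: it uses only the Python standard library, so a small regular-expression tokenizer stands in for the Human Name Parser module and plain URI strings stand in for RDF parsed with RDFLib. The function names (`name_key`, `match_by_name`, `match_by_uri`) and the list of ignored titles are hypothetical.

```python
import re
from collections import defaultdict

# Titles and suffixes to ignore. A hypothetical, deliberately short list.
IGNORED = {"mr", "mrs", "ms", "dr", "jr", "sr", "ii", "iii", "iv"}

# A name token: letters, optionally joined by apostrophes or hyphens (e.g. O'Day).
TOKEN = re.compile(r"[^\W\d_]+(?:['\-][^\W\d_]+)*")


def _tokens(text):
    """Lowercase the text and return its name tokens, minus titles and suffixes."""
    return [t for t in TOKEN.findall(text.lower()) if t not in IGNORED]


def name_key(full_name):
    """Return a lowercase (first, last) key, or None if fewer than two name parts remain."""
    head, _, tail = full_name.partition(",")
    tail_tokens = _tokens(tail)
    if tail_tokens:
        # "Last, First" order: move the part after the comma to the front.
        tokens = tail_tokens + _tokens(head)
    else:
        # No comma, or only a suffix after it ("Sammy Davis, Jr.").
        tokens = _tokens(head)
    if len(tokens) < 2:
        return None
    return (tokens[0], tokens[-1])


def _index(names):
    """Group the original name strings by their (first, last) key."""
    index = defaultdict(list)
    for name in names:
        key = name_key(name)
        if key is not None:
            index[key].append(name)
    return index


def match_by_name(linked_jazz_names, carnegie_hall_names):
    """Map each shared (first, last) key to the (LJ names, CH names) that produced it."""
    lj = _index(linked_jazz_names)
    ch = _index(carnegie_hall_names)
    return {key: (lj[key], ch[key]) for key in lj.keys() & ch.keys()}


def match_by_uri(linked_jazz_uris, carnegie_hall_uris):
    """Return the URIs that appear in both collections (exact string comparison)."""
    return {u.strip() for u in linked_jazz_uris} & {u.strip() for u in carnegie_hall_uris}
```

For example, `match_by_name(["Duke Ellington"], ["Ellington, Duke"])` pairs the two records under the key `("duke", "ellington")`. Because the key keeps only first and last names, two different people who share both would also be paired, so matches found this way are best treated as candidates for review.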

tags: culture / programming / python