“HiPSTAS, What?: Information Retrieval, Machine Learning, and Visualizations with Sound” with Tanya Clement (CUNY Graduate Center, March 5, 2014)

Discussions, analyses, and archives of audio recordings tend to measure a recording’s significance by its textual output and the availability of a transcription. However, recordings hold a great deal of non-textual information that falls outside traditional research methodologies. Considering audio within the context of visualization and pattern recognition offers the potential to discover new types of data that enrich existing research and open new directions in audio analysis for scholars. A digital humanities approach, using new tools and technologies that assist with sound analysis, may derive new data and information from archived audio recordings.

Tanya Clement, an assistant professor at the University of Texas School of Information, visited the CUNY Graduate Center on March 5th as part of the CUNY digital humanities initiative to discuss her NEH grant-funded project, HiPSTAS (High Performance Sound Technologies for Access and Scholarship). Although the project is still in its very early stages, Clement shared examples of some preliminary findings and used the forum to voice concern about current practices of audio preservation. By framing the academic study of audio in a historical context through a survey of prior technologies and advances, Clement sought to raise awareness of how audio has been used in the past, its untapped potential for digital humanities research, and the need to improve archival preservation practices.

In her talk, entitled “HiPSTAS, What?: Information Retrieval, Machine Learning, and Visualizations with Sound,” Clement spoke extensively about how many audio recordings remain inaccessible to researchers because they are stored in outdated formats; in many cases archivists are not even aware of what information these recordings contain. She also stressed that there is a tendency to make audio available only once transcriptions exist to accompany it, even though recordings contain information that cannot be transcribed. Clement is interested in shifting the focus away from the time-consuming alignment of text and audio that transcription requires and toward revealing patterns in sound that current modes of analysis would not make apparent. She added that archivists need more extensive training in working with audio files, as she fears sound recordings are in danger of disappearing when lack of access keeps them from being used for research.

Thus far the project has focused on recordings of poetry, folklore, speeches from the LBJ Presidential Library, Native American storytelling traditions, oral histories, and various field recordings. A prototype called ARLO, currently being tested, allows users to automate metadata description for unprocessed sound collections. The software is open source and outputs a visualization that uses color to mark up audio files. It can recognize different readers of the same text, and even the same text read by the same individual at a different time. The visualization makes clear the discrepancies between these recordings and illustrates variances that may not be apparent from simply listening or from viewing the waveforms of a traditional audio output.

As Burdick et al. (2012) assert, “The next generation of DH work will make contributions to theory only if it can show how to think in digital methods, not just with digital tools” (92). This statement rings true for the type of work that Clement’s team is pursuing. The team is interested in changing how audio is considered and evaluated, moving away from the manner in which it is typically assessed. Although the potential uses of the project remain mostly hypothetical, Clement touched on the ways in which the technology could open up many new research channels and lead to a higher level of audio preservation. She suggested that this tool could provide a level of processing not currently available, and ultimately free archivists from the daunting, time-consuming task of listening to hours of audio files. Since archives are reluctant to release materials before at least some processing has occurred, she suggested that computer programs could do some of this work by performing an initial exploration of the audio materials. Digital tools can not only explore audio in new ways, but also compare multiple files from various repositories and call out similarities that an unassisted listener may overlook. The tools thus provide new insight into under-utilized materials and offer the potential to enhance scholarship in ways not typically considered in traditional humanities research.

Audio has the potential to enhance and provide insights into many cultural heritage collections in ways that paper and transcription materials cannot. Liu (2012) argues that there needs to be a greater convergence of the humanities and the sciences, and this project seems to be achieving just that. He states that “the greatest service that the digital humanities can contribute to the humanities is to practice instrumentalism in a way that demonstrates the necessity of breaking down the artificial divide of the ‘two cultures’ to show that the humanities are needed alongside the sciences to solve the intricately interwoven natural, technological, economic, social, political, and cultural problems of the global age.” To fully realize the potential of audio materials, humanists, researchers, and archivists must think beyond current research methods and re-imagine the ways in which audio is analyzed and interpreted. Audio is not simply about the words that are transcribed or the sounds that we hear, but rather about the ways in which these sounds are considered and applied to scholarship. A tone, pause, enunciation, or background noise may provide insight in ways that what is literally being said, sung, or played cannot. Just as text analysis and literary criticism attempt to read between the lines and beyond the words, audio analysis should consider factors that simply listening may not reveal. With the assistance of computers and audio analysis tools, many untapped and unprocessed audio archives may bring new meaning and understanding to current research projects.

Clement and her HiPSTAS project remain on the cusp of these possibilities. Her project, as it currently stands, provides a forum to reconsider the ways we use audio, pose new questions, and alter the way audio is preserved. This under-utilized medium offers a wealth of options for preserving the past and exploring sound in new ways. The project also suggests new processing techniques to archives wishing to make audio materials available and searchable. The focus thus moves away from transcription and toward computer-aided analysis and exploration. Although the practical process by which these goals and ambitions will be achieved remains vague, the notion that it is possible should encourage new discussions and research projects that more fully incorporate audio into humanities scholarship and cultural heritage preservation, while forging new understandings of concepts traditionally grounded in textual methods.


Burdick, A., Drucker, J., Lunenfeld, P., Presner, T., and Schnapp, J. (2012). “The Social Life of Digital Humanities” and “Provocations,” in Digital_Humanities. Cambridge, Mass.: MIT Press, 73–120.

Liu, A. (2012). Where is the Cultural Criticism in Digital Humanities. Debates in the Digital Humanities. Retrieved from:

