Concluding a year-long web archiving fellowship at The Frick Collection, this project weighs the value of high-fidelity web archives against institutional budget and staffing concerns. Drawing on survey results from US web archiving programs and on the challenges and solutions I encountered during my fellowship, I propose a new approach to web archiving: slow, selective capture that promotes financial and environmental sustainability.
NYC Web Archives
The Art Reference Library at The Frick Collection is a member of the New York Art Resources Consortium (NYARC), which also includes the libraries and research centers of The Brooklyn Museum and the Museum of Modern Art (MoMA). The consortium is dedicated to preserving art resources that live on the web. Its repository of 11 collections holds more than a decade of web captures from sites relating to galleries, auction houses, catalogues raisonnés, and NYARC institutional sites.
During the course of this fellowship, I worked under Sumitra Duncan, web archiving lead at The Frick Collection, primarily performing quality assurance work on artist websites, local galleries, and The Frick Collection’s institutional site.
NYARC uses Archive-It, a subscription service of the Internet Archive, to crawl, store, and preserve its web archive for playback on the Wayback Machine, a user-facing interface where the public can search archived websites by capture date.
QA: The Price for Fidelity
The crawl technology of web archives is not an exact science. Defining the scope of a capture helps limit the crawl to the desired content, but the process is, overall, messy and technically complex. Most initial crawls do not return high-fidelity captures; that is, they do not exactly replicate all the content, features, or style of the live site. Because of crawler error, proprietary software on the site, or dynamic content, many initial crawls require patching in missing documents. This patching was the bulk of my work as a fellow, yet few institutions have the staff capacity to dedicate to the tedious work of quality assurance. QA is a luxury, but it is often necessary to achieve fidelity in web archives.
However, I posit that a more selective approach can make web archiving more financially and environmentally sustainable, especially for smaller institutions that lack the resources for major web archiving projects. Rather than capturing every site of interest, as the current framework does, institutions would capture only the sites deemed, through user-based surveys and research, to be most instrumental to the overall study of the field.