Fidelity in Web Archives: The Luxury of Quality Assurance

Concluding a year-long web archiving fellowship at The Frick Collection, this project weighs the value of high-fidelity web archives against institutional budget and staffing concerns. Drawing on survey results from US web archiving programs and on the challenges and solutions I encountered during my fellowship, I propose a new approach to web archiving: slow, selective capture that promotes financial and environmental sustainability.

Screenshot of the Wayback Machine capture of the Frick’s home page, April 2024

NYC Web Archives

The Art Reference Library at The Frick Collection is a member of the New York Art Resources Consortium (NYARC), which also includes the libraries and research centers at The Brooklyn Museum and the Museum of Modern Art (MoMA). NYARC is dedicated to preserving art resources that live on the web. Its repository of 11 collections holds more than a decade of web captures from sites relating to galleries, auction houses, catalogues raisonnés, and NYARC institutional sites.

During the course of this fellowship, I worked under Sumitra Duncan, web archiving lead at The Frick Collection, primarily performing quality assurance work on artist websites, local galleries, and The Frick Collection’s institutional site.

NYARC uses Archive-It, a subscription service of the Internet Archive, to crawl, store, and preserve its web archive for playback on the Wayback Machine, a user-facing interface where the public can search archived websites by capture date.

screenshot of NYARC’s home page in Archive-It, captured April 2024

QA: The Price for Fidelity

The crawl technology behind web archives is not an exact science. Carefully scoping a crawl helps limit capture to the desired content, but the process is, overall, messy and digitally complex. Most initial crawls do not return high-fidelity captures, meaning they do not exactly replicate all the content, features, or style of the live site. Because of crawler error, proprietary software on the site, or dynamic content, many initial crawls require patching in missing documents. While this was the bulk of my work as a fellow, few institutions have the staff capacity to dedicate to the tedious work of quality assurance. QA is a luxury, but it is often necessary to achieve fidelity in web archives.
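To make "patching in missing documents" concrete, here is a minimal Python sketch of one way to spot what a capture lacks: list the images, scripts, and linked files a live page references, then subtract the URLs the crawl actually captured. This illustrates the idea only; it is not how Archive-It performs QA. The names `ResourceCollector`, `list_resources`, and `missing_resources` are hypothetical, and the list of captured URLs is assumed to come from whatever crawl report is at hand.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs treated as page resources in this sketch (an assumption;
# real pages also load content through CSS and JavaScript, which this ignores).
RESOURCE_ATTRS = {"img": "src", "script": "src", "link": "href"}


class ResourceCollector(HTMLParser):
    """Collects absolute URLs referenced by img, script, and link tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.resources = set()

    def handle_starttag(self, tag, attrs):
        wanted = RESOURCE_ATTRS.get(tag)
        if wanted is None:
            return
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative references against the page URL.
                self.resources.add(urljoin(self.base_url, value))


def list_resources(html, base_url):
    """Return the set of resource URLs referenced in an HTML string."""
    collector = ResourceCollector(base_url)
    collector.feed(html)
    return collector.resources


def missing_resources(live_html, captured_urls, base_url):
    """Return live-page resources absent from the capture, sorted."""
    return sorted(list_resources(live_html, base_url) - set(captured_urls))
```

Each URL returned by `missing_resources` is a candidate for a patch crawl. Deciding whether a missing file actually matters to how the page looks or behaves is still the manual part of QA.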

However, I posit that a more selective approach can make web archiving more financially and environmentally sustainable, especially for smaller institutions that lack the resources for major web archiving projects. Rather than capturing all sites of interest, as the current framework does, institutions would capture only the sites deemed, through user-based surveys and research, most instrumental to the overall study of the field.

screenshot of the initial capture of https://www.frick.org/library/digitalarthistory, before extensive patch crawling

screenshot of https://www.frick.org/library/digitalarthistory live site, captured April 2024


Francesca Strathern

Pratt Institute, MSLIS. The Frick Collection Web Archiving Fellow, 2023-2024

Fidelity in Web Archives: The Luxury of Quality Assurance - April 30, 2024
