{"id":9383,"date":"2024-04-30T15:29:45","date_gmt":"2024-04-30T15:29:45","guid":{"rendered":"https:\/\/studentwork.prattsi.org\/infoshow\/?p=9383"},"modified":"2024-05-08T19:10:26","modified_gmt":"2024-05-08T19:10:26","slug":"fidelity-in-web-archives-the-luxury-of-quality-assurance","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infoshow\/2024\/fidelity-in-web-archives-the-luxury-of-quality-assurance","title":{"rendered":"Fidelity in Web Archives: The Luxury of Quality Assurance"},"content":{"rendered":"<p>Concluding a year-long web archiving fellowship at The Frick Collection, this project weighs the value of high-fidelity web archives against institutional budget and staffing concerns. Using survey results from US web archiving programs and a discussion of the challenges and solutions I discovered through my fellowship, I propose a new approach to web archiving: slow and selective capture that promotes financial and environmental sustainability.<\/p>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"286\" src=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/wayback-frick-home.png\" alt=\"\" class=\"wp-image-9672\" srcset=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/wayback-frick-home.png 512w, https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/wayback-frick-home-300x168.png 300w, https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/wayback-frick-home-508x284.png 508w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><figcaption class=\"wp-element-caption\">screenshot of the <a href=\"https:\/\/wayback.archive-it.org\/4269\/20240502230238\/http:\/\/www.frick.org\/\" data-type=\"URL\" data-id=\"https:\/\/wayback.archive-it.org\/4269\/20240502230238\/http:\/\/www.frick.org\/\">Wayback capture of the Frick&#8217;s home page<\/a> 
captured April 2024<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">NYC Web Archives<\/h2>\n\n\n\n<p>The Art Reference Library at The Frick Collection is a member of the New York Art Resources Consortium (NYARC), which also includes the libraries and research centers at The Brooklyn Museum and the Museum of Modern Art (MoMA). The consortium is dedicated to preserving art resources that live on the web. Its repository of 11 collections holds over a decade of web captures from sites relating to galleries, auction houses, catalogues raisonn&eacute;s, and NYARC institutional sites.<\/p>\n\n\n\n<p>During this fellowship, I worked under Sumitra Duncan, web archiving lead at The Frick Collection, primarily performing quality assurance (QA) work on artist websites, local galleries, and The Frick Collection&#8217;s institutional site.<\/p>\n\n\n\n<p>NYARC uses Archive-It, a subscription service of the Internet Archive, to crawl, store, and preserve its web archive for playback on the Wayback Machine, a user-facing interface where the public can search archived websites by capture date. 
<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"286\" src=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/data-budget.png\" alt=\"\" class=\"wp-image-9674\" srcset=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/data-budget.png 512w, https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/data-budget-300x168.png 300w, https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/data-budget-508x284.png 508w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><figcaption class=\"wp-element-caption\">screenshot of NYARC&#8217;s home page in Archive-It, captured April 2024<\/figcaption><\/figure>\n<\/div>\n\n\n<h2 class=\"wp-block-heading\">QA: The Price for Fidelity<\/h2>\n\n\n\n<p>Crawling the web for an archive is not an exact science. Defining the scope of a capture helps limit a crawl to the desired content, but the process is, overall, messy and technically complex. Most initial crawls do not return high-fidelity captures; that is, they do not exactly replicate all the content, features, or style of the live site. Because of crawler error, proprietary software on the site, or dynamic content, many initial crawls require patching in missing documents. While this patching was the bulk of my work as a fellow, few institutions have the staff capacity to dedicate to the tedious work of quality assurance. 
QA is a luxury, but it is often necessary to achieve fidelity in web archives.<\/p>\n\n\n\n<p>However, I posit that a more selective approach can make web archiving more financially and environmentally sustainable, especially for smaller institutions that lack the resources for major web archiving projects. In contrast to the current framework of capturing all sites of interest, this approach would capture only the sites deemed, through user-based surveys and research, to be most instrumental for the overall study of the field.<\/p>\n\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"512\" height=\"340\" src=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-crawled-2.png\" alt=\"\" class=\"wp-image-9678\" srcset=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-crawled-2.png 512w, https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-crawled-2-300x199.png 300w, https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-crawled-2-508x337.png 508w\" sizes=\"auto, (max-width: 512px) 100vw, 512px\" \/><figcaption class=\"wp-element-caption\">screenshot of the initial capture of <a href=\"https:\/\/www.frick.org\/library\/digitalarthistory\">https:\/\/www.frick.org\/library\/digitalarthistory<\/a>, before extensive patch crawling<\/figcaption><\/figure>\n<\/div>\n\n<div class=\"wp-block-image\">\n<figure class=\"aligncenter size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"469\" height=\"512\" src=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-live-site-1.png\" alt=\"\" class=\"wp-image-9679\" srcset=\"https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-live-site-1.png 469w, 
https:\/\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/digital-art-history-live-site-1-275x300.png 275w\" sizes=\"auto, (max-width: 469px) 100vw, 469px\" \/><figcaption class=\"wp-element-caption\">screenshot of <a href=\"https:\/\/www.frick.org\/library\/digitalarthistory\">https:\/\/www.frick.org\/library\/digitalarthistory<\/a> live site, captured April 2024<\/figcaption><\/figure>\n<\/div>\n\n\n<div style=\"height:100px\" aria-hidden=\"true\" class=\"wp-block-spacer\"><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Concluding a year-long web archiving fellowship at The Frick Collection, this project investigates the value of high fidelity web archives against institutional budget and staffing concerns. Using survey&#8230;<\/p>\n","protected":false},"author":4169,"featured_media":9669,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1150],"tags":[1265,127,197],"coauthors":[1176],"class_list":["post-9383","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-1150","tag-fellowship2024","tag-web-archive","tag-web-archiving"],"_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/posts\/9383","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/users\/4169"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/comments?post=9383"}],"version-history":[{"count":5,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/posts\/9383\/revisions"}],"predecessor-version":[{"id":9915,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/posts\/9383\/rev
isions\/9915"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/media\/9669"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/media?parent=9383"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/categories?post=9383"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/tags?post=9383"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infoshow\/wp-json\/wp\/v2\/coauthors?post=9383"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}