Digital archivists often have to contend with the difficulties of processing messy collections. They may face thousands of files spread across different media, and processing them may require more resources than the archive can afford to expend. For this reason, many collections receive only minimal processing.
Computers are commonly used to detect and deaccession duplicate files, but I believe we can go further. If software could automatically detect the connections between files and thereby identify edited versions of the same image or drafts of the same document, then that information could be invaluable to the archivist in discovering, navigating, and describing the contents of a digital collection.
I am developing software, codenamed “Eltrovo”, that can identify similarity between files to determine whether they may represent versions of the same work. This automatic discovery process will give archivists a greater understanding of their collections and enable them to describe those collections more effectively.
The slide deck for this presentation is available online at stjo.hn/infoshow24