Quipu: Chinese Exclusion Act
April 12, 2018 - All
This Python 3.4 script is an attempt to build a simple web page of primary sources for a single subject. It queries resources in digital archives and special collections to retrieve metadata. It also downloads the thumbnails and URLS for the resources. This code is written to specifically find materials on the Chinese Exclusion Act. The goal is to test possible methods for creating a subject-based tool for researchers and also to test the reliability of the metadata obtained in using digital repository APIs and by scraping web sites. The metadata is relatively reliable, but not perfect, especially dates. RESOURCES QUERIED The Digital Public Library of America (DPLA) Calisphere (images only) The California State Parks Museum Collections PRIMARY FILES PRODUCED A JSON dictionary with normalized metadata for each object; A raw webpage (HTML) sorted by date to troubleshoot the proper download of metadata and thumbnails; A file of subject terms collected including a count of the objects where the term appears. SECONDARY FILE PRODUCED A running file that dumps metadata for an object once processed. This is to assist in troubleshooting problem areas when running the script. WHAT YOU NEED TO RUN THIS CODE A Python interpreter (Python 3.4) An API key for DPLA All the Python modules in the import line (except ‘my_file’) A folder in the same directory as the script called “thumbnails” The resulting page with contextualizing information and additional style formatting can be viewed at the following URL: http://pfch.nyc/quipu/project.html HOW THIS CODE CAN BE REPURPOSED MEANINGFULLY This code is specifically written for the subject Chinese Exclusion Act. It may work for another subject, however, if you would like to use this script to begin finding resources and collecting metadata/thumbnails for your own subject, you will definitely need to change certain parts of this script: Delete or comment out any lines that pertain to the California State Parks website. For DPLA, you will need to insert your API key and change the API query to reflect your subject instead of ‘Chinese Exclusion Act’ in the following lines, as indicated: {‘q’: ‘your+subject+phrase+here’, ‘page_size’: 10000, ‘api_key’: ‘your_API_key_here’ } dpla = requests.get(‘http://api.dp.la/v2/items’, params=payload).