Organizing Image Collections
February 1, 2019
Blend Images is a commercial stock photography collection of approximately 100,000 images produced by more than 200 photographers. This project explores how Python can be used to create sub-collections by searching text attributes, to generate separate metadata for each sub-collection, and to move or copy JPEG files from the original directory into new folders.

Starting from one directory containing all of the images and one CSV file of metadata, I wrote Python scripts that identify specific subjects within the collection by searching for particular strings in the captions and keywords. For each search, only the matching lines of metadata were written to a new CSV. A separate script then reads each of these CSVs and copies the listed JPEG files into their own folder.

Each image's metadata also records its location as a text string. These strings were converted to latitude-longitude coordinates using the Google Geocoding API, and the geographic locations of each sub-collection were visualized in Tableau. Mapping the images offers a further way to explore and understand the range of assets the Blend collection holds on each subject.

Common practical uses for these scripts include separating out the images from a specific photographer, credit name(s), or shoot location, and moving assets given only a list of filenames.
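The original scripts are not reproduced here, but the search step can be sketched roughly as follows. This is a minimal sketch that assumes the metadata CSV has a header row with columns named `Filename`, `Caption`, and `Keywords`; those column names, and the function name `filter_metadata`, are hypothetical and would need to match the real export.

```python
import csv

# Hypothetical column names; the real Blend metadata CSV may use different headers.
SEARCH_FIELDS = ("Caption", "Keywords")


def filter_metadata(src_csv, dest_csv, terms, fields=SEARCH_FIELDS):
    """Write every row whose search fields contain any term (case-insensitive).

    Returns the number of matching rows written to dest_csv.
    """
    terms = [t.lower() for t in terms]
    matches = 0
    with open(src_csv, newline="", encoding="utf-8") as src, \
         open(dest_csv, "w", newline="", encoding="utf-8") as dest:
        reader = csv.DictReader(src)
        # Keep the original header so the subset CSV has the same layout.
        writer = csv.DictWriter(dest, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Join the searchable columns into one lower-cased string per row;
            # missing or empty cells are treated as "".
            text = " ".join((row.get(f) or "") for f in fields).lower()
            if any(term in text for term in terms):
                writer.writerow(row)
                matches += 1
    return matches
```

For example, `filter_metadata("blend_metadata.csv", "beach_subset.csv", ["beach", "surf"])` (hypothetical filenames) would produce a CSV containing only the rows that mention either term. Plain substring matching is the simplest option; it will also match "beaches" or "surfer", which may or may not be wanted.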
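The copy step, which also covers moving assets from nothing more than a list of filenames, might look something like this sketch. It assumes the CSV has a hypothetical `Filename` column holding bare JPEG filenames that exist in the source directory; the function name `copy_listed_images` is likewise illustrative.

```python
import csv
import os
import shutil


def copy_listed_images(metadata_csv, src_dir, dest_dir,
                       filename_field="Filename", move=False):
    """Copy (or move) each file named in metadata_csv from src_dir to dest_dir.

    Returns the filenames that were listed but not found in src_dir.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # copy2 keeps file timestamps; move removes the file from src_dir.
    transfer = shutil.move if move else shutil.copy2
    missing = []
    with open(metadata_csv, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = (row.get(filename_field) or "").strip()
            if not name:
                continue  # skip blank filename cells
            src = os.path.join(src_dir, name)
            if os.path.isfile(src):
                transfer(src, os.path.join(dest_dir, name))
            else:
                missing.append(name)
    return missing
```

Returning the missing filenames, rather than stopping at the first one, makes it easy to spot gaps between the metadata and the image directory after a large batch.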
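The geocoding step can be approximated with the standard library alone. This sketch calls the Google Geocoding API's JSON endpoint with an `address` and a `key` parameter and reads the coordinates from `results[0].geometry.location`. A valid API key is required, and the helper names here are hypothetical; the request-building and response-parsing parts are kept separate so they can be checked without a network call.

```python
import json
import urllib.parse
import urllib.request

GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"


def build_geocode_url(address, api_key):
    """Return a Geocoding API request URL for a free-text location string."""
    query = urllib.parse.urlencode({"address": address, "key": api_key})
    return f"{GEOCODE_URL}?{query}"


def parse_geocode_response(payload):
    """Extract (lat, lng) from the first result, or None if nothing matched."""
    if payload.get("status") != "OK" or not payload.get("results"):
        return None
    location = payload["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


def geocode(address, api_key, timeout=10):
    """Look up one location string over the network (needs a valid API key)."""
    url = build_geocode_url(address, api_key)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_geocode_response(json.load(resp))
```

Because many images share the same location string, looking up each distinct string once and reusing the result (for example with a dictionary keyed on the string) keeps the number of API requests down before the coordinates are written back to the CSV for Tableau.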