Scraping for Library Jobs
February 3, 2019 - All
For this project, I used Python to attempt to scrape job listings from the websites of three popular professional associations:

- Society of American Archivists (SAA)
- American Library Association (ALA)
- Association for Information Science and Technology (ASIS&T)

I first scraped each website for links to detailed job postings and stored the URLs in a JSON file. Then I scraped each detailed job page (with different code to suit each website’s format) and saved this data in a JSON file. I successfully scraped the ASIS&T and ALA websites but couldn’t scrape the SAA website, so my dataset only includes jobs from the last two professional associations. Lastly, I compiled the ASIS&T and ALA jobs into one JSON file (master_job_list.json).

For this project I used four modules:

- Requests (an HTTP library that provides a shortcut for making HTTP requests)
- Beautiful Soup (helps with searching and pulling data from documents structured with HTML or XML)
- JSON (used to read and write JSON data)
- Time (used to add a pause between executions of the code)
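The workflow above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the function names, the `a.job-title` selector, the `h1` and `employer` fields, and the one-second delay are all assumptions chosen for the example, and each real site would need its own selectors.

```python
import json
import time
from urllib.parse import urljoin

import requests
from bs4 import BeautifulSoup


def extract_job_links(listing_html, base_url, link_selector="a.job-title"):
    """Stage 1: return absolute URLs for links matching a (hypothetical) selector."""
    soup = BeautifulSoup(listing_html, "html.parser")
    return [
        urljoin(base_url, a["href"])
        for a in soup.select(link_selector)
        if a.get("href")
    ]


def parse_job_page(detail_html, source):
    """Stage 2: pull a few fields from one detail page (placeholder tags)."""
    soup = BeautifulSoup(detail_html, "html.parser")
    title = soup.find("h1")
    employer = soup.find(class_="employer")
    return {
        "source": source,
        "title": title.get_text(strip=True) if title else None,
        "employer": employer.get_text(strip=True) if employer else None,
    }


def scrape_site(listing_url, source, delay_seconds=1.0):
    """Fetch a listing page, then fetch and parse each linked job page."""
    listing = requests.get(listing_url, timeout=30)
    listing.raise_for_status()
    jobs = []
    for url in extract_job_links(listing.text, listing_url):
        time.sleep(delay_seconds)  # pause between requests
        page = requests.get(url, timeout=30)
        page.raise_for_status()
        job = parse_job_page(page.text, source)
        job["url"] = url
        jobs.append(job)
    return jobs


def merge_job_lists(*job_lists):
    """Combine per-site job lists into one list, preserving order."""
    merged = []
    for jobs in job_lists:
        merged.extend(jobs)
    return merged


def save_json(data, path):
    """Write data to a JSON file, e.g. master_job_list.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2, ensure_ascii=False)
```

With real listing URLs filled in, the last step would look something like `save_json(merge_job_lists(asist_jobs, ala_jobs), "master_job_list.json")`. Keeping the parsing functions separate from the network calls makes it easier to swap in different parsing code per site, which is what the differing page formats required.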