Peshawar Scrapin’ is an exercise in rapid subject tagging of poorly described textual material. Using automatic and human-curated methods, I scraped 7,000+ PDF documents on the Soviet-Afghan War from the CIA’s website and expanded the CIA’s deficient metadata with the names of relevant persons, factions, places, and concepts.
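One common way to automate part of this kind of tagging is dictionary (gazetteer) matching: a hand-curated vocabulary of terms, grouped by category, is searched for in each document's extracted text. The sketch below is only an illustration of that general technique, not the project's actual pipeline. It assumes the PDF text has already been extracted to a string. `VOCABULARY`, `compile_patterns`, and `tag_document` are hypothetical names, and the sample terms are illustrative rather than taken from the project's tag lists.

```python
import re
from collections import defaultdict

# Hypothetical hand-curated vocabulary: category -> terms to look for.
# Illustrative entries only; not taken from the project.
VOCABULARY = {
    "persons": ["Najibullah", "Massoud"],
    "factions": ["Hezb-e Islami", "Jamiat-e Islami"],
    "places": ["Peshawar", "Kabul", "Panjshir"],
    "concepts": ["refugees", "arms supply"],
}


def compile_patterns(vocabulary):
    """Build one case-insensitive, word-bounded regex per vocabulary term."""
    patterns = []
    for category, terms in vocabulary.items():
        for term in terms:
            # re.escape keeps hyphens and spaces in multi-word terms literal.
            pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
            patterns.append((category, term, pattern))
    return patterns


def tag_document(text, patterns):
    """Return {category: sorted matched terms} for one document's extracted text."""
    tags = defaultdict(set)
    for category, term, pattern in patterns:
        if pattern.search(text):
            tags[category].add(term)
    return {category: sorted(terms) for category, terms in tags.items()}


patterns = compile_patterns(VOCABULARY)
sample = "Refugees crossing into Peshawar reported fighting near Kabul."
print(tag_document(sample, patterns))
# {'places': ['Kabul', 'Peshawar'], 'concepts': ['refugees']}
```

Matching like this is fast but only finds terms someone has already listed, which is why a human-curated step is still needed to review the results and grow the vocabulary.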

Slides: https://docs.google.com/presentation/d/1ND-sEmw5zBjerO3t3x68xGrJHtk9Ar9rhu0GOTHgxzE/edit?usp=sharing