Peshawar Scrapin’: Producing a better index to CIA documents on the Soviet occupation of Afghanistan, 1979-1989

May 11, 2018 - All

Peshawar Scrapin’ is an exercise in rapid subject tagging of poorly described textual material. Using automatic and human-curated methods, I scraped 7,000+ PDF documents on the Soviet-Afghan War from the CIA’s website and expanded the CIA’s deficient metadata with the names of relevant persons, factions, places, and concepts.
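To make the tagging idea concrete, here is a minimal sketch of how the automatic half of such a workflow might look: text already extracted from a PDF is matched against a human-curated vocabulary of persons, factions, and places. The vocabulary entries, the `tag_document` function, and the variant spellings are all illustrative assumptions, not the project's actual code or term list.

```python
import re
from collections import defaultdict

# Hypothetical curated vocabulary (an assumption for illustration):
# category -> {canonical tag: [variant spellings seen in documents]}
VOCABULARY = {
    "persons": {"Ahmad Shah Massoud": ["Massoud", "Masood"]},
    "factions": {"Hezb-e Islami": ["Hezb-e Islami", "Hizbi Islami"]},
    "places": {
        "Peshawar": ["Peshawar"],
        "Panjshir Valley": ["Panjshir", "Panjsher"],
    },
}


def tag_document(text, vocabulary=VOCABULARY):
    """Return {category: sorted canonical tags whose variants occur in text}."""
    tags = defaultdict(set)
    for category, entries in vocabulary.items():
        for canonical, variants in entries.items():
            for variant in variants:
                # Whole-word, case-insensitive match on each variant spelling.
                pattern = r"\b" + re.escape(variant) + r"\b"
                if re.search(pattern, text, re.IGNORECASE):
                    tags[category].add(canonical)
                    break  # one matching variant is enough for this tag
    return {category: sorted(found) for category, found in tags.items()}


sample = "Insurgents loyal to Masood moved supplies from Peshawar into the Panjsher."
print(tag_document(sample))
```

Mapping variant spellings to one canonical tag is what lets a human curator improve the index over time: adding a new transliteration to the vocabulary retags every document without touching the matching code.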

Slides: https://docs.google.com/presentation/d/1ND-sEmw5zBjerO3t3x68xGrJHtk9Ar9rhu0GOTHgxzE/edit?usp=sharing

Author information

evolow

The post Peshawar Scrapin’: Producing a better index to CIA documents on the Soviet occupation of Afghanistan, 1979-1989 appeared first on #infoshow.