Time(s) Splitter
April 12, 2018 - All
Time(s) Splitter by Richard Goldstein, 2017

The Programming for Cultural Heritage course provided me with my first Python encounter. While I was learning its basic syntax and functions, William Burroughs kept creeping into my mind, with Python as a means to scrape, recompose, and clarify or give new meaning to text. Using the word “text” is perhaps a bit out of place given programming’s context of “data,” but seeing data as text is what sparked the inquiry of my Python experiment.

With Burroughs as inspiration, I chose to query the New York Times API as my resource, much in line with Burroughs’s cut-up material of choice, the newspaper. I wanted to cast as wide a net as I could and filter the paper’s archive only by Burroughs’s birth and death dates, allowing him to literally frame the project field, and then cull the leading text “snippets” of these articles. The implementation required two Python scripts: one that queried the latest articles, moving backward from his death, and one that queried the earliest articles, moving forward from his birth.

Having acquired the text, I then needed a way to atomize and structure these strings, which the CSV format supported. My rule for splitting text strings was a three-part if/then statement. I felt that splitting texts on their preexisting conjunctions and pauses would lend a smoothness leading up to the ensuing juxtaposing concatenations. The system follows:

if the “snippet” of text has a comma, split on the first comma
if there is no comma, split on “and”
if there is no “and,” split on either “in,” “on,” “at,” or “for”

which in Python looks like:

    import re  # needed for re.split below

    # a_doc is one article record returned by the API query
    if ',' in a_doc['snippet']:
        words = a_doc['snippet'].split(',')
    elif ' and ' in a_doc['snippet']:
        words = a_doc['snippet'].split('and')
    else:
        words = re.split("in|on|at|for", a_doc['snippet'])

This logic parsed each snippet before writing the pieces out to discrete cells of two CSVs, one for the oldest strings and one for the newest.
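For readers who want to try the rule on their own text, here is a minimal, self-contained sketch of the three-part rule as it is worded in prose. It is not the project's script. The names `split_snippet`, `write_fragments`, and `PREPOSITIONS` are hypothetical, and the sample sentences are made up for illustration. Two choices differ from the snippet above, which splits on every comma and matches the letters "in", "on", "at", and "for" anywhere, including inside longer words: this variant splits only on the first comma and only on whole words set off by spaces.

```python
import csv
import io
import re

# Assumption: the prepositions should match only as whole words
# surrounded by whitespace, not as letters inside longer words.
PREPOSITIONS = re.compile(r"\s+(?:in|on|at|for)\s+")


def split_snippet(snippet):
    """Cut a snippet into fragments following the three-part rule."""
    if "," in snippet:
        # Rule 1: split once, on the first comma only.
        parts = snippet.split(",", 1)
    elif " and " in snippet:
        # Rule 2: no comma, so split on the word "and".
        parts = snippet.split(" and ")
    else:
        # Rule 3: no "and", so split on in / on / at / for.
        parts = PREPOSITIONS.split(snippet)
    # Trim whitespace and drop empty fragments.
    return [part.strip() for part in parts if part.strip()]


def write_fragments(snippets, handle):
    """Write one CSV row per snippet, one fragment per cell."""
    writer = csv.writer(handle)
    for snippet in snippets:
        writer.writerow(split_snippet(snippet))


if __name__ == "__main__":
    # Made-up sample snippets, not taken from the newspaper archive.
    samples = [
        "The mayor spoke, and the crowd left",
        "Rain and wind closed the bridge",
        "Trains stopped at noon",
    ]
    buffer = io.StringIO()  # stands in for an open CSV file
    write_fragments(samples, buffer)
    print(buffer.getvalue())
```

Writing to an in-memory buffer keeps the example free of file paths; passing an open file handle instead would produce a CSV on disk. Whether to keep or discard the conjunction itself is a design choice, and here the matched word is dropped from the fragments.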