{"id":1551,"date":"2019-05-13T19:18:29","date_gmt":"2019-05-13T23:18:29","guid":{"rendered":"http:\/\/studentwork.prattsi.org\/dh\/?p=1551"},"modified":"2019-05-13T19:18:32","modified_gmt":"2019-05-13T23:18:32","slug":"getting-data-for-digital-humanities-with-apis","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/dh\/2019\/05\/13\/getting-data-for-digital-humanities-with-apis\/","title":{"rendered":"Getting Data for Digital Humanities with APIs: A Gentle Introduction"},"content":{"rendered":"\n<p>Heard about APIs, but still not really sure what they are or how to use them? This post will take you through:<br><\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>the basics of what an API is<\/li><li>when you might want to use one<\/li><li>what kinds of institutions or collections might have APIs of potential interest to DH practitioners<\/li><li>where to look on these sites for access to APIs<\/li><li>how to construct queries, and<\/li><li>what to do once you\u2019ve received some data!<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>What\u2019s an API?<\/strong><br><\/h3>\n\n\n\n<p>An <strong>API<\/strong>, or Application Programming Interface, is, in general, a defined way for different parts of software systems to talk to each other. A software system is made of many individual parts that must work together to respond to commands or requests. When humans need to talk to each other, we try to use a language that all parties involved can understand, and that language has rules and guidelines that govern how we form words and sentences with particular meanings. Similarly, APIs are built with rules that must be followed for successful interactions.<br><br><\/p>\n\n\n\n<p>Often computer programs will communicate with each other using APIs in a way that is not visible to humans. However, humans can learn to directly create requests and receive information in a readable format using APIs, and this can be a powerful way to access data. 
If you want to use one to gather data for a digital humanities project, this will be done through interactions with web APIs.<br><br><\/p>\n\n\n\n<p>You may see frequent mention of <strong>REST <\/strong>(Representational State Transfer) or <strong>RESTful<\/strong> APIs, which are the main focus of this tutorial, though other, more complicated approaches to building web APIs exist. RESTful APIs make use of a familiar way of browsing the web\u2014Hypertext Transfer Protocol, or <strong>HTTP<\/strong> methods. You can use these kinds of APIs to get, post, or delete data online, but this tutorial will be focused mostly on creating GET requests and receiving responses, something you probably already inadvertently do every day.<br><br><\/p>\n\n\n\n<p>A request is made with a <strong>URL<\/strong>, or Uniform Resource Locator, and sent to a web server using HTTP with the expectation of getting a response back in the form of human-readable text or data, or simply a web page. The URL supplies the web server with everything it needs to create and return a correct response. See the \u201cHow do I construct a query?\u201d section below to read more about how to piece together a URL in order to return data that fits particular parameters.<br><br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>When would I use an API in the context of DH?<\/strong><br><\/h3>\n\n\n\n<p>APIs can provide a way for you to request and receive specific bits of data that interest you from large corpuses of information from cultural heritage organizations and other data creators. 
That data can come to you in a predictable and machine-readable format, such as JSON or RDF, and will be broken up into metadata categories chosen by the institution that created the data.<br><br><\/p>\n\n\n\n<p>APIs can be of use if you want to download some data\u2014not everything available on a website, and not an individual record or two\u2014and you are at least somewhat familiar with the available information and the metadata used to describe it, and know what you think you are looking for. In many ways it\u2019s like using an advanced search function, except instead of a results page with a user interface to scroll through, you can get neat chunks of data that you can then further refine and manipulate.<br><br><\/p>\n\n\n\n<p>APIs can also enable the possibility of continuous retrieval if data that fits your parameters is frequently updating. This is generally how developers use APIs\u2014it\u2019s a way to get to the data repeatedly and pull it into apps or programs without having to go through a user interface. Instead of going back to a site and performing the same search every week or every day in hopes of finding new information, you can create a script to programmatically query an API at a certain rate, automatically returning any data that fits your parameters.<br><br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Why are APIs being embraced by libraries, archives, and museums?<\/strong><br><\/h3>\n\n\n\n<p>They\u2019re a way for institutions to make their collections more accessible for things like DH projects by making data about them machine-readable and queryable. Making semi-structured data available provides flexibility, and many institutions hope this will encourage innovative or unexpected uses of their collection information.<br><br><\/p>\n\n\n\n<p>Organizations can also leverage APIs, either their own or other organizations\u2019, to more automatically provide or enhance access on their websites. 
For example, libraries that use the open-source library system Koha can use an <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/openlibrary.org\/developers\/api\" target=\"_blank\">Open Library API<\/a> (Open Library is a project of the Internet Archive) to find and display images of book covers in their catalog, without having to individually search for each book cover and then download and upload the picture.<br><br><\/p>\n\n\n\n<p>APIs are at the heart of the <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/collection.cooperhewitt.org\/api\/\" target=\"_blank\">Cooper Hewitt Smithsonian Design Museum<\/a>\u2019s website, and can return everything from collection information to descriptions of temporal periods to a list of relationship terms they use to connect people associated with the collection. Even their cafe hours can be returned by querying an API! Not only does this mean their website can display the accurate hours, it also means sites like Google Maps can use that API to pull in information about their hours at regular intervals. If the cafe were to be closed for a holiday, that change would be reflected in the data available via the API.<br><br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Where can I find an API?<\/strong><br><\/h3>\n\n\n\n<p>Certain types of cultural heritage organizations tend to make their data available via APIs, namely larger libraries, museums, and archives with vast amounts and varied types of data to share, and a budget with which to provide access. Large digital libraries and other sites that aggregate digital resources across collections are also good candidates. National and local governments, social media sites and online comments sections, and ongoing creators of primary sources like newspaper companies can also provide a wealth of information for digital humanists through APIs. 
Having a large amount of data might make a bulk download too unwieldy, and for many of these institutions, their users are more likely to be interested in a small portion of the data rather than everything available, making a public API a worthwhile service.<br><br><\/p>\n\n\n\n<p>In addition to size, it can be helpful to look for sites or institutions with stated commitments to openness or open access. Information sharing on some level is legally required for many government institutions (and projects that receive government funding), plus they have a lot of data, which is why they\u2019re a reliable source of APIs (remember national museums and libraries fall into this category!). On the other end of the openness scale, rich, giant resources like proprietary databases are unlikely to give you access to their data through an API. The information and its organization are their product, and their owners fear that you could harvest it all through an API and then remake and redistribute their database. Companies may be willing to work with you if you need access, but it could cost a significant amount of money or come with restrictive terms of use or distribution.<br><br><\/p>\n\n\n\n<p>Individuals or creators of smaller projects can also construct APIs if their amount of data justifies it or if they want to provide access to frequently updated data. 
The open access publication <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"http:\/\/dhdebates.gc.cuny.edu\/apis\" target=\"_blank\">Debates in the Digital Humanities<\/a>, for example, makes available APIs that can return all keywords or all sentences marked as \u201cimportant\u201d by users.<br><br><\/p>\n\n\n\n<p>However, APIs can be costly and labor-intensive to create and maintain, so less well-funded individuals or organizations often provide machine-actionable data in other ways, if at all: typically as a data dump, a file containing bulk information in CSV, TSV, or another format that you can download. The <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/github.com\/cmoa\/collection\" target=\"_blank\">Carnegie Museum of Art collection data<\/a> is available on GitHub, as both a CSV and a JSON file. While you do have to download information about the entire collection, you can then \u201cquery\u201d the data by loading it into a program like <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"http:\/\/openrefine.org\/\" target=\"_blank\">OpenRefine<\/a> and searching for and retaining data of interest. Some organizations will provide an API and a bulk download option, to cater to different access needs.<br><br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>No really, where can I find them?<\/strong><br><\/h3>\n\n\n\n<p>While many websites provide access to data using APIs, the APIs often aren\u2019t easy to find. If you suspect a website may have an API available for you to use, try looking through the site\u2019s navigation menu for words like \u201cDeveloper\u201d, \u201cPro\u201d, or \u201cTools\u201d. 
Don\u2019t be put off by the terminology: if the APIs are publicly accessible, they are for anyone to use.<br><br><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"395\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.44.42-PM-1024x395.png\" alt=\"A screenshot pointing to the &quot;Pro&quot; link on the DPLA website.\" class=\"wp-image-1555\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.44.42-PM-1024x395.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.44.42-PM-300x116.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.44.42-PM-768x296.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"593\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.46.23-PM-1024x593.png\" alt=\"A screenshot pointing to the &quot;Developer Center&quot; link on the OpenLibrary website.\" class=\"wp-image-1556\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.46.23-PM-1024x593.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.46.23-PM-300x174.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.46.23-PM-768x445.png 768w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.46.23-PM.png 1336w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" 
\/><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"222\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.49.23-PM-1-1024x222.png\" alt=\"A screenshot pointing to the API link for the 311 dataset from NYC open data.\" class=\"wp-image-1559\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.49.23-PM-1-1024x222.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.49.23-PM-1-300x65.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-9.49.23-PM-1-768x166.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Once you drill down to a particular dataset, the availability of an API may be more explicit. <\/figcaption><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>How do I construct a query?<\/strong><br><\/h3>\n\n\n\n<p>Once you\u2019ve found a website with a collection you\u2019d like to further explore and an API with which to explore it, your next step is to <strong>check the documentation<\/strong>. Many of the APIs created for collections of interest to humanities scholars have robust documentation, often with beginner and more advanced level instructions. Even after you\u2019ve become a confident user of a particular API, it\u2019s a good idea to always start with the documentation page, as the design of an API can change over time. <br><br><\/p>\n\n\n\n<p>To start, not always, but often, you will need to <strong>request a key<\/strong> or form of identification or authentication. While some APIs are completely open, free, and available for the public to use or test as they please, most have some kind of barrier to access in place. 
Often you\u2019ll have to provide an email address or sign up for an account, and then you can put in a request for a key, which is simply a string of characters assigned to you that will be used to authenticate the requests you make. The access granted by your key will probably come with rate limits that will prevent you from making a large number of requests in a short period of time. Some APIs require multiple keys, Secrets, or other forms of authentication in order to make requests.<br><br><\/p>\n\n\n\n<p>Why put these barriers in place? Institutions need some control over who is accessing their data and how, to guard against malicious users and people who want to overload their systems with requests. APIs can be used to post or even delete data from a site, and an institution may use a single API for internal and external users. In these cases, the keys safeguard against users being able to access all functions of the API.<br><br><\/p>\n\n\n\n<p>Once you\u2019re signed up and the necessary authentication is received, you can start to <strong>build a query<\/strong>! Again, the API documentation will cover the exact structure and syntax that the request needs to adhere to, but most REST queries follow the same general pattern:<br><br><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>protocol:\/\/base_url\/resource_path?parameter_key=value&amp;api_key=&lt;your_api_key&gt;<\/strong><br><br><\/pre>\n\n\n\n<p>What does this look like in practice? Below is a simple example from the <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/pro.dp.la\/developers\/api-codex\" target=\"_blank\">Digital Public Library of America\u2019s (DPLA) search API<\/a>:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\"><strong>https:\/\/api.dp.la\/v2\/items?q=fruit+AND+banana&amp;api_key=<\/strong><br><br><\/pre>\n\n\n\n<p>Protocol: HTTPS, or Hypertext Transfer Protocol Secure, which encrypts the traffic between your machine and the web server. 
Remember that all REST APIs operate using HTTP.<br><br><\/p>\n\n\n\n<p>Base URL: The domain name plus any fixed path segments specified in the documentation, which will always form the base of your query. Often this will include several terms separated by forward slashes; here it is api.dp.la\/v2. The \u201cv2\u201d here refers to the API\u2019s version, in this case version two.<br><br><\/p>\n\n\n\n<p>Resource Path: The resource path is the first part of your specific query, and is followed by a question mark. The DPLA search API has two resource paths available: items, used here, which will return data about individual resources, and collections, which will return data about groups of items.<br><br><\/p>\n\n\n\n<p>Parameter\/key-value pair: The key here is \u201cq\u201d, which in this API indicates we want to search all text fields. Boolean operators such as AND can sometimes be used as shown, here to search for items that match both \u201cfruit\u201d and \u201cbanana\u201d. Multiple parameters are separated by the \u201c&amp;\u201d symbol. Depending on the API, you may be able to specify format here as well (format=json, for example). In this case, no specification was necessary as this API only returns JSON.<br><br><\/p>\n\n\n\n<p>API Key: Here you will enter your API key if one is required, in the format specified.<br><br><\/p>\n\n\n\n<p>If you type that URL into your browser (with your own key, of course), the API will return a JSON file with&#8230;a lot of metadata\u2014almost 200 lines per item if the JSON is <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/jsonformatter.curiousconcept.com\/\" target=\"_blank\">pretty printed<\/a>, or more nicely formatted for human reading. 
It\u2019s a bit overwhelming, but if you start to scroll through, you can begin to note the key-value pairs (also a part of JSON files), and pick out certain categories of data that might be of use in your research, such as the \u201ctitle\u201d or \u201csubject\u201d of each item.<br><br><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"294\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-07-at-10.14.55-PM-1024x294.png\" alt=\"A screenshot of the JSON returned from the DPLA query.\" class=\"wp-image-1563\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-07-at-10.14.55-PM-1024x294.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-07-at-10.14.55-PM-300x86.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-07-at-10.14.55-PM-768x220.png 768w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-07-at-10.14.55-PM.png 1520w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>Side note: <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/pro.dp.la\/developers\/api-codex\" target=\"_blank\">DPLA\u2019s API documentation<\/a>, where this example query originated, is thorough and very beginner friendly\u2014it\u2019s a great place to start exploring.<br><br><\/p>\n\n\n\n<p>No matter how complex or well constructed, your request will only return data that\u2019s as good as the metadata that\u2019s been put in, and will reflect those categories that the institution has decided upon. The API will also only return information about what has been deemed worthy to collect and digitize or digitally describe by the institution in the first place. 
Furthermore, APIs are constructed by people based on what they think you want to retrieve and what they\u2019re willing to provide access to. Institutions that offer APIs at all often have several, and each operates a little differently, much as different languages use different syntax and grammar. <br><br><\/p>\n\n\n\n<p>At the beginning of a project, you may find the highly defined modes of access with an API to be limiting, as the categories relating to your research ideas may not be directly queryable. Still, it can be worthwhile to start broad and see what you get \u2014 though they\u2019re designed for highly specified access, APIs can be really great ways to explore the back end of a collection! Of course, if you know exactly what you need and have an API that can provide that information, it\u2019s only a matter of constructing a query that reflects your specific needs. While you might initially be constrained by the predetermined organization of the data, once you have some data in hand you can isolate certain categories, and add more if you plan to create additional data or connect it with other sources.<br><br><\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Constructing queries with Chronicling America<\/strong><br><\/h3>\n\n\n\n<p>Let\u2019s construct some queries using the API for a corpus of newspapers, as if we were in the brainstorming stages of a DH project. <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/chroniclingamerica.loc.gov\/about\/api\/\" target=\"_blank\">Chronicling America<\/a> is a resource created by the National Digital Newspaper Program (NDNP), which is jointly sponsored by the Library of Congress and the National Endowment for the Humanities. It provides access to metadata about American newspapers from 1690-present, and newspaper pages with OCR\u2019d text dating from 1789-1963. Their API is built for general use by the public, with no key or sign-up needed. 
It is essentially just a more direct way to interact with their advanced search function, plus it can give you access to metadata in a more usable form.<br><br><\/p>\n\n\n\n<p>Say you\u2019re interested in looking through Alaska newspapers for discussions of suffrage that were happening there in the early 1900s. You think you might want to do some kind of text or sentiment analysis of the content you find, but you\u2019re not sure what the final form of the project will be, whether a visualization or an interactive website or some kind of digital narrative.<br><br><\/p>\n\n\n\n<p>To look through the digitized pages, we can add search\/pages\/results\/? to the protocol and base URL, and then build out our query with parameters to eventually get something like the URL below, which will give us pages from newspapers published in Alaska, from the years 1912-1916, sorted by date, that include the term \u201csuffrage\u201d, all in a JSON file.<br><br><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?state=Alaska&amp;dateFilterType=yearRange&amp;date1=1912&amp;date2=1916&amp;sort=date&amp;andtext=suffrage&amp;format=json  (opens in a new tab)\" href=\"https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?state=Alaska&amp;dateFilterType=yearRange&amp;date1=1912&amp;date2=1916&amp;sort=date&amp;andtext=suffrage&amp;format=json\" target=\"_blank\"> https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?state=Alaska&amp;dateFilterType=yearRange&amp;date1=1912&amp;date2=1916&amp;sort=date&amp;andtext=suffrage&amp;format=json <\/a><br><br><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"411\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.57.04-PM-1024x411.png\" alt=\"A screenshot of the JSON returned from the query.\" 
class=\"wp-image-1568\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.57.04-PM-1024x411.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.57.04-PM-300x120.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.57.04-PM-768x308.png 768w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.57.04-PM.png 1484w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>A snippet of the JSON returned with this query<\/figcaption><\/figure><\/div>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"618\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.58.44-PM-1024x618.png\" alt=\"A screenshot of the advanced search results page.\" class=\"wp-image-1569\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.58.44-PM-1024x618.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.58.44-PM-300x181.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.58.44-PM-768x464.png 768w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.58.44-PM.png 1974w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Results of the same query with \u201c&amp;format=json\u201d parameter removed<\/figcaption><\/figure><\/div>\n\n\n\n<p>How did I know which parameters we could add? The documentation unfortunately doesn\u2019t provide a convenient list of options. 
I puzzled through building the above query by first performing various advanced searches, and made note of the parameters listed in the resulting URL. I then modified this to build the query, essentially just deleting from the URL the superfluous parameters where I didn\u2019t specify anything, adding the \u201csort\u201d key to indicate that I wanted the results sorted by date, and specifying the format at the end.<br><br><\/p>\n\n\n\n<p>But wait\u2014there are only 20 results returned here, even though over 1200 items matched the search! <br><br><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"404\" height=\"318\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.56.44-PM.png\" alt=\"A screenshot of the JSON indicating there are 1257 results and 20 shown.\" class=\"wp-image-1571\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.56.44-PM.png 404w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-7.56.44-PM-300x236.png 300w\" sizes=\"auto, (max-width: 404px) 100vw, 404px\" \/><\/figure><\/div>\n\n\n\n<p>To get to our next batch of results, we\u2019ll need to add a key-value pair for pagination, and continue re-requesting with increasing page numbers:<br><br><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?state=Alaska&amp;dateFilterType=yearRange&amp;date1=1912&amp;date2=1916&amp;sort=date&amp;andtext=suffrage&amp;format=json&amp;page=2\" target=\"_blank\"> https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?state=Alaska&amp;dateFilterType=yearRange&amp;date1=1912&amp;date2=1916&amp;sort=date&amp;andtext=suffrage&amp;format=json&amp;page=2 <\/a><br><br><\/p>\n\n\n\n<p>You now have some 
data\u2014hooray! Because there is no one way to do research, what follows will be a lot of questions, like:<br><br><\/p>\n\n\n\n<p>Is the date range sufficient for your project, or should you expand or narrow it? <br><\/p>\n\n\n\n<p>Should you expand your search to also look for the words \u201cvote\u201d or \u201cvoting\u201d? <br><\/p>\n\n\n\n<p>Which metadata categories are relevant for your project, and are there any categories you\u2019d like to add?<br><\/p>\n\n\n\n<p>How much of the text do you intend to keep? <br><\/p>\n\n\n\n<p>Would you like to have links to the images of each page in addition to the snippets of the OCR\u2019d text? <br><\/p>\n\n\n\n<p>Are you interested particularly in white women\u2019s suffrage and\/or the suffrage of indigenous people, both of which were up for debate in Alaska around this time? <br><\/p>\n\n\n\n<p>Whose views are represented in these newspaper articles, and are these the views you\u2019d like to focus on?<br><\/p>\n\n\n\n<p>Not all of the sources discuss suffrage in Alaska; some report on campaigns for voting rights elsewhere in the country. Are these outside the scope of your project?<br><\/p>\n\n\n\n<p>Can you find newspapers from elsewhere in the country that discuss suffrage in Alaska at this time, and should those be considered? <br><\/p>\n\n\n\n<p>Do you want to search for other types of sources in other collections, such as diaries, pamphlets, or images, that also discuss this subject?<\/p>\n\n\n\n<p>These questions and more can start to guide you towards new searches, and help you narrow your topic and pinpoint exactly what it is you are looking for, assuming it exists to be accessed. <br><br><\/p>\n\n\n\n<p>Maybe you find you are more interested in the views on suffrage from a single newspaper: <em>The Thlinget<\/em>, which was published in Alaska from 1908-1912. 
Each title in this database has been assigned a Library of Congress Control Number (LCCN), which you can find in the JSON file we just received, or find by using the newspaper directory search.<br><br><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"566\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-6.03.29-PM-2-1024x566.png\" alt=\"A screenshot with an arrow pointing to the LCCN.\" class=\"wp-image-1572\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-6.03.29-PM-2-1024x566.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-6.03.29-PM-2-300x166.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-6.03.29-PM-2-768x424.png 768w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-08-at-6.03.29-PM-2.png 1282w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<p>You can then go back to using the page search to look for individual digitized pages using the corresponding LCCN, adding parameters for a date range, specific words in the text, the format type, or sorting by date.<br><br><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?lccn=sn94050023&amp;dateFilterType=yearRange&amp;date1=1908&amp;date2=1912&amp;ortext=suffrage+vote&amp;format=json\" target=\"_blank\">https:\/\/chroniclingamerica.loc.gov\/search\/pages\/results\/?lccn=sn94050023&amp;dateFilterType=yearRange&amp;date1=1908&amp;date2=1912&amp;ortext=suffrage+vote&amp;format=json<\/a><br><br><\/p>\n\n\n\n<p>Note that rather than a Boolean operator, as with the DPLA API, here a 
different key is used to indicate that we\u2019re searching for either \u201csuffrage\u201d or \u201cvote\u201d within the volumes of this particular title. <br><br><\/p>\n\n\n\n<p>In addition to making requests directly in the browser, I decided to also try another tool built for developers called <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.getpostman.com\/products\" target=\"_blank\">Postman<\/a>. You can use it to make API calls and download the data returned, and to name and save useful queries rather than having to rewrite them every time. You have the option to describe the key-value pairs of your queries, too, so that you remember exactly what each one of them does. Postman can also take those queries and generate a snippet of code, in various languages such as Python, that will perform the calls and that could be incorporated into larger scripts. If you want to eventually create and document your own API, you can also use Postman to develop and test it out.<br><br><\/p>\n\n\n\n<p>Below is the more specific request executed in Postman:<br><br><\/p>\n\n\n\n<div class=\"wp-block-image\"><figure class=\"aligncenter\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"657\" src=\"http:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-09-at-1.40.05-PM-1024x657.png\" alt=\"A screenshot of the Postman workspace.\" class=\"wp-image-1573\" srcset=\"https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-09-at-1.40.05-PM-1024x657.png 1024w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-09-at-1.40.05-PM-300x193.png 300w, https:\/\/studentwork.prattsi.org\/dh\/wp-content\/uploads\/sites\/4\/2019\/05\/Screen-Shot-2019-05-09-at-1.40.05-PM-768x493.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure><\/div>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Now 
what?<\/strong><br><\/h3>\n\n\n\n<p>Once you\u2019ve honed your research question and gathered some data, you need to figure out how to refine that data further to better explore your question. Your next steps will depend on how you want your data to be structured, or what format it needs to be in to work with various tools.<br><br><\/p>\n\n\n\n<p>Often folks creating DH projects will prepare their data in one or more CSV files, but JSON is generally too nested and complex to be converted directly to a CSV, which can only hold a flat, two-dimensional table of rows and columns. You can, however, use a command-line processing tool such as jq to manipulate and extract relevant data from JSON files, which you can then use to build a CSV. Below are a few tutorials to help you better understand arrays and the arrangement of data in JSON, and how to use jq to return certain bits of data from within them.<br><br><\/p>\n\n\n\n<p>Tutorials:<\/p>\n\n\n\n<p><a href=\"http:\/\/www.compciv.org\/recipes\/cli\/jq-for-parsing-json\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">Parsing JSON with jq<\/a><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/shapeshed.com\/jq-json\/\" target=\"_blank\">JSON on the command line with jq<\/a><br><br><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"http:\/\/openrefine.org\/\" target=\"_blank\">OpenRefine<\/a> is a tool you can use to clean your data. It records the actions you take in a JSON file that you can then export and store as part of your documentation. If you don\u2019t want to work with command line tools, OpenRefine can also be used to parse a JSON or XML file, which can then be converted to a CSV. 
This <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/programminghistorian.org\/en\/lessons\/fetch-and-parse-data-with-openrefine\" target=\"_blank\">Programming Historian tutorial<\/a> covers how to build more advanced API queries, including instructions on how to query the Chronicling America API directly in OpenRefine and then manipulate the returned JSON data. It then takes you through steps to match or separate the data into columns and remove data that is not of interest.<br><br><\/p>\n\n\n\n<p>If you find data you want online but there is no API or direct download option, you can also try web scraping, which is often done with Python and <a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.crummy.com\/software\/BeautifulSoup\/\" target=\"_blank\">Beautiful Soup<\/a>. <br><br><\/p>\n\n\n\n<p>Tutorials:<\/p>\n\n\n\n<p><a href=\"https:\/\/acrl.ala.org\/techconnect\/post\/web-scraping-creating-apis-where-there-were-none\/\" target=\"_blank\" rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\">Web Scraping: Creating APIs Where There Were None<\/a><\/p>\n\n\n\n<p><a rel=\"noreferrer noopener\" aria-label=\" (opens in a new tab)\" href=\"https:\/\/programminghistorian.org\/en\/lessons\/?topic=web-scraping\" target=\"_blank\">Programming Historian web scraping tutorials<\/a><br><br><\/p>\n\n\n\n<p>When embarking on a project, remember to harvest and reuse data responsibly. Consider not just copyright\/licensing and the load your requests place on a server, but also the ethics of what you should share and how\u2014particularly if you plan to work with social media data. 
Often we work with information that represents aspects of the lives of people\u2014it\u2019s never just a number or cell in a spreadsheet.<br><br><\/p>\n\n\n\n<p>Recording what you asked for from where, and how you came to get the particular selection of data that you received can also help you better communicate the scope of your project and make clear to your users why certain data might not appear. Your manipulation and reuse of the data is a part of its changing lifecycle. It\u2019s important to record what you do and why as you refine and add to the data to further that transparency and allow future users to then build upon your research.<br><\/p>\n\n\n\n<p><br><\/p>\n","protected":false},"excerpt":{"rendered":"<p class=\"lead\">Heard about APIs, but still not really sure what they are or how to use them? This post will take you through: the basics of what an API is when you might want to use one what kinds of institutions or collections might have APIs of potential interest to DH practitioners where to look on these sites for access to&hellip;<\/p>\n<p class=\"more-link-p\"><a class=\"btn btn-danger\" href=\"https:\/\/studentwork.prattsi.org\/dh\/2019\/05\/13\/getting-data-for-digital-humanities-with-apis\/\">Read more 
&rarr;<\/a><\/p>\n","protected":false},"author":647,"featured_media":1571,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6,7],"tags":[207,44,208,217],"class_list":["post-1551","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-skillshares","category-student","tag-api","tag-data","tag-json","tag-spring-2019"],"_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts\/1551","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/users\/647"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/comments?post=1551"}],"version-history":[{"count":10,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts\/1551\/revisions"}],"predecessor-version":[{"id":1596,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/posts\/1551\/revisions\/1596"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/media\/1571"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/media?parent=1551"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/categories?post=1551"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/dh\/wp-json\/wp\/v2\/tags?post=1551"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}