Digital Humanities
@ Pratt

Inquiries into culture, meaning, and human value meet emerging technologies and cutting-edge skills at Pratt Institute's School of Information

Category: Skillshare Tutorials

Skillshare is an open-access knowledge resource of instructional posts and tutorials covering tools, skills, and templates for digital humanists.

Basic OCR (optical character recognition)

The first step in many digital humanities projects is to digitize the corpus being studied. To use digitized text without manually transcribing it or laboriously cutting and pasting it, projects often turn to Optical Character Recognition (OCR), which converts images of printed text into machine-readable text. Using two scanned pages from Virginia Woolf’s novel The Waves, the following screencast offers basic instruction on how to OCR scanned pages and briefly evaluates a few OCR software options.

The screencast tests a free community version of a popular OCR program and a free web-based OCR service, and examines the quality of each one's output. Finally, it tests and evaluates a trial version of a moderately expensive program. By the end of the screencast, viewers should have a head start on undertaking a small scanning and OCR project.
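When comparing OCR tools, it can help to put a number on output quality. One common measure, offered here as an assumption rather than the method used in the screencast, is the character error rate: the edit (Levenshtein) distance between a hand-checked transcription and the OCR output, divided by the length of the transcription. The Python sketch below uses hypothetical function names, and the OCR output in the example is invented for illustration.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hypothesis) + 1))  # distances from an empty reference prefix
    for i, ref_char in enumerate(reference, start=1):
        cur = [i]  # turning reference[:i] into "" takes i deletions
        for j, hyp_char in enumerate(hypothesis, start=1):
            cur.append(min(
                prev[j] + 1,                            # delete ref_char
                cur[j - 1] + 1,                         # insert hyp_char
                prev[j - 1] + (ref_char != hyp_char),   # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the hand-checked reference."""
    if not reference:
        raise ValueError("reference transcription must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical example: a hand-checked line versus invented OCR output
    # containing two misrecognized characters ("h" read as "b", "." as ",").
    reference = "The sun had not yet risen."
    ocr_output = "Tbe sun had not yet risen,"
    print(f"{character_error_rate(reference, ocr_output):.1%}")  # → 7.7%
```

Running the same hand-checked page through each OCR tool and comparing the rates gives a rough, repeatable basis for choosing between them; a lower rate means less manual correction afterward.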

LIS-657 Digital Humanities – DHskillshare post – OCR demo from Lauren Spiro on Vimeo.

How to Write an HTML Page

HTML stands for HyperText Markup Language. It is a markup language used to structure the content of web pages, not a programming language for writing computer programs. HTML can be useful to digital humanists for building web pages that present digitized material and link its parts together. One starts an HTML document by writing the opening and closing tags <html> and </html>. Within these, one types <head> </head> for the head section. For the body one types <body>…
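The nesting described above (a head section and a body section, both inside the html element) can be seen in a small generated page. This is a minimal Python sketch with a hypothetical function name; it adds a DOCTYPE line and a title element, which the original steps do not mention, and escapes text so characters such as "&" and "<" display correctly instead of being read as markup.

```python
from html import escape


def minimal_page(title, paragraphs):
    """Return a bare-bones HTML page: <head> holds metadata, <body> holds content."""
    # Each paragraph of text becomes one <p> element inside the body.
    body = "\n".join(f"    <p>{escape(p)}</p>" for p in paragraphs)
    return (
        "<!DOCTYPE html>\n"
        "<html>\n"
        "  <head>\n"
        f"    <title>{escape(title)}</title>\n"
        "  </head>\n"
        "  <body>\n"
        f"{body}\n"
        "  </body>\n"
        "</html>\n"
    )


if __name__ == "__main__":
    print(minimal_page("My first page", ["Hello & welcome."]))
```

Saving the printed text to a file ending in .html and opening it in a browser shows the title in the browser tab and the paragraph in the window.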

Using XML to Code Charles Darwin

The term “greater than the sum of its parts” is a bit of a cliché, but there’s no better way to describe XML. When I was planning this Skillshare video, my in-house IT pointed out that “you can teach someone to use XML in five minutes.” And while that’s somewhat true, the concept behind using it as a markup language, and the reasons for doing so, are hard to separate out from the instruction.

So the video below explains a few things: what XML is, how it works, and why we use it at the Darwin Manuscripts Project, an ongoing project based at the American Museum of Natural History, where we transcribe Charles Darwin’s handwritten notes into plain text.

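One reason XML markup is "greater than the sum of its parts" is that a single encoded transcription can yield several different texts. The Python sketch below illustrates this on an invented sample, not a real Darwin transcription. The element names (page, line, del for struck-through words, add for inserted words) are hypothetical, loosely modeled on TEI-style conventions, and are not the Darwin Manuscripts Project's actual schema. From the same markup it produces a "reading" text with deletions dropped and a "diplomatic" text that keeps every word on the page.

```python
import xml.etree.ElementTree as ET

# Invented sample, NOT a real Darwin transcription; element names are hypothetical.
SAMPLE = """<page n="1">
<line>the plants <del>were</del><add>seemed</add> to vary</line>
<line>in a <add>slightly</add> different way</line>
</page>"""


def kept_text(elem):
    """Text inside elem, skipping anything marked as deleted."""
    if elem.tag == "del":
        return ""
    parts = [elem.text or ""]
    for child in elem:
        parts.append(kept_text(child))
        parts.append(child.tail or "")  # text after a child still belongs to elem
    return "".join(parts)


def reading_text(xml_source):
    """One clean reading line per <line>, with deletions dropped."""
    root = ET.fromstring(xml_source)
    return [kept_text(line) for line in root.iter("line")]


def diplomatic_text(xml_source):
    """Every character on each line, deletions included."""
    root = ET.fromstring(xml_source)
    return ["".join(line.itertext()) for line in root.iter("line")]


if __name__ == "__main__":
    for line in reading_text(SAMPLE):
        print(line)
```

Because the markup records what happened on the page rather than how it should look, new outputs (a list of all deleted words, say) can be added later without retranscribing anything.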

Introduction to DataCite

This screencast provides an introduction to DataCite, a not-for-profit organization formed in London in 2009. Its goals include making research data easier to access on the internet, increasing acceptance of research data as legitimate, citable contributions to the scholarly record, and supporting data archiving that allows results to be verified and repurposed for future study.

This screencast takes viewers on a tour of the DataCite website and explores the purpose of DataCite as laid out by its founding members. It describes the advantages of data citation for researchers: cited data is easier to reuse and verify, which promotes scholarship and unfettered access to information. It also discusses how to cite data, which is similar to citing a book or article. Further topics include participating member institutions, how to become a member of DataCite, and other ways to get involved, such as conferences and networking opportunities.
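The similarity between citing data and citing a book or article can be made concrete with a small formatter. This Python sketch assumes a citation pattern of Creator (PublicationYear): Title. Version. Publisher. Identifier, which is close to the format DataCite has recommended; the class and function names, the sample record, and the DOI are all hypothetical, and DataCite's own documentation should be checked for the current recommended format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DatasetRecord:
    creators: List[str]          # "Family, Given" strings, in citation order
    publication_year: int
    title: str
    publisher: str               # the repository or data center holding the data
    doi: str                     # bare DOI string (hypothetical in the example)
    version: Optional[str] = None


def format_citation(rec: DatasetRecord) -> str:
    """Creator (Year): Title. [Version.] Publisher. Identifier."""
    parts = [f"{'; '.join(rec.creators)} ({rec.publication_year}): {rec.title}."]
    if rec.version:
        parts.append(f"Version {rec.version}.")
    parts.append(f"{rec.publisher}.")
    parts.append(f"https://doi.org/{rec.doi}")  # DOI written as a resolvable URL
    return " ".join(parts)


if __name__ == "__main__":
    record = DatasetRecord(
        creators=["Doe, Jane", "Roe, Richard"],
        publication_year=2014,
        title="Sample Survey Data",
        publisher="Example Data Repository",
        doi="10.1234/example.5678",
    )
    print(format_citation(record))
```

As with a book citation, the creators, year, title, and publisher identify the work; the persistent DOI plays the role a stable call number or page reference would, letting readers locate the exact dataset.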

The screencast also outlines the services DataCite offers, such as the DataCite Metadata Store and the DOI Citation Formatter. DataCite also provides a variety of open-access resources, such as the DataCite Metadata Schema, which the screencast explores in more depth.
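To give a sense of what a metadata record following the DataCite Metadata Schema looks like, the Python sketch below builds a minimal XML record. It assumes the version 4 ("kernel-4") namespace and assumes the six properties shown (identifier, creator, title, publisher, publication year, resource type) are the mandatory set; the function names and sample values are hypothetical. A real record would also declare the schema location and should be validated against the official schema before being submitted to the Metadata Store.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for version 4 of the DataCite Metadata Schema.
NS = "http://datacite.org/schema/kernel-4"
ET.register_namespace("", NS)  # serialize it as the default namespace


def q(tag):
    """Qualify a tag name with the DataCite namespace."""
    return f"{{{NS}}}{tag}"


def minimal_datacite_xml(doi, creators, title, publisher, year, resource_type="Dataset"):
    """Build a bare record with the (assumed) mandatory properties only."""
    root = ET.Element(q("resource"))
    ET.SubElement(root, q("identifier"), identifierType="DOI").text = doi
    creators_el = ET.SubElement(root, q("creators"))
    for name in creators:
        creator = ET.SubElement(creators_el, q("creator"))
        ET.SubElement(creator, q("creatorName")).text = name
    titles = ET.SubElement(root, q("titles"))
    ET.SubElement(titles, q("title")).text = title
    ET.SubElement(root, q("publisher")).text = publisher
    ET.SubElement(root, q("publicationYear")).text = str(year)
    ET.SubElement(root, q("resourceType"), resourceTypeGeneral=resource_type).text = resource_type
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(minimal_datacite_xml("10.1234/example.5678", ["Doe, Jane"],
                               "Sample Survey Data", "Example Data Repository", 2014))
```

The same fields that make up a human-readable data citation appear here in machine-readable form, which is what allows services such as a citation formatter to generate citations automatically.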

I hope this screencast will increase awareness of DataCite and encourage researchers, scholars, and students to start or continue citing and sharing their data.
Caitlin Bronner