Digital Humanities @ Pratt

Inquiries into culture, meaning, and human value meet emerging technologies and cutting-edge skills at Pratt Institute's School of Information

Text Analysis with Historical Newspapers

Screenshot of the homepage of the website Text Analysis with Historical Newspapers

As many cultural heritage institutions rush to digitize materials in hopes of fulfilling their missions to provide widely and openly accessible collections, researchers face an odd quandary: an overwhelming abundance of certain kinds of information.

Historical newspapers are near the top of the list of materials that have been digitized like mad. It makes sense, of course: newspapers are often fragile items that benefit from preservation through digitization, and they can be a rich source of cultural and historical information. But the sheer volume of digitized newspaper collections, which is only growing, can stop some people before they even begin exploring them.

Those who take on the challenge often rely on “distant reading,” that is, computationally driven textual analysis. For the novice, however, that last sentence is probably terrifying.

For researchers who are interested in historical newspapers but aren’t quite sure how to start, I created a resource that guides beginners through the initial process. The website, Text Analysis with Historical Newspapers, provides the introductory information needed to produce a simple text analysis project. Organized around the “Five Ws” of journalism (who, what, where, when, and why), the site answers the basic questions someone new to text analysis might ask. An introduction provides grounding and defines key terms; overviews of existing projects offer research inspiration; there are collections of corpora and tools; and finally, there are suggestions for follow-up resources and policies to keep in mind when mining data.

Some considerations for the future of this resource mirror those of many digital humanities projects. For instance, it will face preservation issues. Link rot is likely: many of the DH projects offered as examples already have broken links on their own sites, so it may only be a matter of time before the project sites themselves disappear, leaving the links on my own site outdated.

Additionally, further research is needed into how best to present this information. Looking ahead, it will be useful to investigate alternative page layouts that display the information in a way that’s easier to read while still matching its conversational, accessible tone. Even in its current form, however, this resource provides a fun starting point for those who might be intimidated by text analysis, encouraging them to give it a try and perhaps add a new research tool to their kit.
