Wikidata Workshop

On Saturday, November 10, 2018, I attended the Wikidata Workshop. The event was organized by the Semantic Lab at Pratt, and the workshop was led by Megan Wacha, the Scholarly Communications Librarian at CUNY, and President of Wikimedia NYC. The purpose of the event was to learn more about Wikidata, an initiative associated with the Wikimedia Foundation, and to have an opportunity to work with Wikidata by editing and/or adding new records.

To understand Wikidata as an initiative, we first learned how it relates to the Wikimedia Foundation. Wikipedia predates the Foundation: it was established in 2001 as a free and open online reference. Today it is the 5th largest website in the world and the largest reference source on the internet, with approximately 15 billion views a month, and it is largely written by volunteer editors. Wikipedia is multilingual and international, meaning there are numerous versions of Wikipedia in different languages, which has implications for what information is available in each language. The Wikimedia Foundation was established in 2003 to field donations to maintain Wikipedia and its sister wiki-based projects. It disburses funds to those projects, which include Wikipedia, Wikimedia Commons, Wikibooks, and Wikidata. Wikidata is a free, linked database that serves as central storage for structured data across Wikimedia projects. Wikimedia NYC, of which Megan is president, is separate from but affiliated with Wikimedia, and acts as a bridge between Wikimedia and cultural institutions to improve records and increase access to information through Wikidata. With those relationships established, we were led through examples of why Wikidata is necessary and how we can contribute.

Two of the biggest issues with Wikipedia are consistency and redundancy. As I mentioned, Wikipedia can be read in multiple languages, but pages about the same subject are not necessarily consistent across languages. For example, the English Wikipedia page about Gabriel García Márquez differs from the Spanish Wikipedia page about him. Pages are not simply translated from one another; they are often written from whatever information is available to the writer in that language. This is where Wikidata can step in. On García Márquez’s Wikidata page, his dates of birth and death, nationality, and profession are entered; with the help of a bot, this information is pulled and displayed across the pages about him in different languages.
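To make the "enter once, display everywhere" idea concrete, here is a minimal sketch in Python. It is a toy model, not how Wikidata or its bots are actually implemented. The property IDs (P569, P570, P27, P106) are Wikidata's identifiers for date of birth, date of death, country of citizenship, and occupation; the record layout, the templates, and the `render_infobox` function are hypothetical names made up for illustration.

```python
# Toy model of the "enter once, display everywhere" idea.
# P569, P570, P27 and P106 are Wikidata property IDs; everything else
# (ITEM, TEMPLATES, render_infobox) is hypothetical and for illustration only.

ITEM = {
    "P569": "1927-03-06",                         # date of birth
    "P570": "2014-04-17",                         # date of death
    "P27": {"en": "Colombia", "es": "Colombia"},  # country of citizenship
    "P106": {"en": "writer", "es": "escritor"},   # occupation
}

# One display template per language edition; the facts come from ITEM.
TEMPLATES = {
    "en": "Born: {born}. Died: {died}. Country: {country}. Occupation: {occupation}.",
    "es": "Nacimiento: {born}. Fallecimiento: {died}. País: {country}. Ocupación: {occupation}.",
}


def render_infobox(item, lang):
    """Fill the template for one language from the shared record."""
    return TEMPLATES[lang].format(
        born=item["P569"],
        died=item["P570"],
        country=item["P27"][lang],
        occupation=item["P106"][lang],
    )


if __name__ == "__main__":
    for lang in ("en", "es"):
        print(lang, "->", render_infobox(ITEM, lang))
```

Because both language versions read from the same record, correcting a date in `ITEM` corrects it everywhere at once, which is the consistency benefit described above.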

After learning about the foundations of Wikidata and how it works, our next task was to edit some actual records. Collectively, we worked on editing the records for previous Lambda Literary Award winners, as Wikimedia NYC is pushing to update the data for LGBT+ people. Something especially pertinent we discussed in relation to updating entries for LGBT+ authors was the debate surrounding linking gender identity to people’s pages. In the current structure of Wikidata, there is the option to enter ‘sex or gender,’ which conflates sex and gender in a way that does not accurately represent the lived experience of many people. The category is further restrictive because few options are available within it, and because sex and gender are expressed differently across languages and cultures, there is often no good translation for the categories. Some words may be more particular in English but have no equivalent in Chinese, for example. This raises the question: should we try to classify sex and gender further in Wikidata entries for everyone, or not include them at all? But what if it is important to the person’s work? And even if classification can be made more specific, flexible, and translatable across languages, there is still the issue that, as Emily Drabinski wrote, ‘as we attempt to contain entire fields of knowledge or ways of being in accordance with universalizing systems and structures, we invariably cannot account for knowledges or ways of being that are excess to and discursively produced by those systems.’[1] Drabinski goes on to show that according to queer theory, it is not desirable to move toward an all-encompassing standardized system of knowledge organization, but rather toward an environment in which there is a more consistent critical eye toward our organization schemes.
The ‘sex or gender’ category in Wikidata is a prime example of this. It shows how material these issues of categorization can be, and thus how important it is to consider carefully how we categorize people.

Wikidata, and linked open data in general, can be a way forward for information to be more flexibly and fluidly categorized, because it makes relationships between pieces of information explicit rather than creating hierarchies of information. But it is still a knowledge organization scheme, and one that has been unevenly applied across cultural institutions or simply ignored. The example of the ‘sex or gender’ category in Wikidata shows us that it is still necessary to be critical, and that no knowledge organization scheme is going to finish the work of being critical of our classification systems.

-Taylor Baker, INFO 601-03

[1] Drabinski, Emily. “Queering the Catalog: Queer Theory and the Politics of Correction.” In The Library Quarterly: Information, Community, Policy, 94-111. Chicago: The University of Chicago Press, 2013.