From Data Collection to Prototype to Interactive Tool: Adding a Gender View to the Linked Jazz Visualization



Background

The Linked Jazz team at Pratt Institute researches new methods afforded by linked data to provide access to cultural heritage materials and gather data for research. The most popular component of our work is the network visualization tool built by developer Matt Miller, who co-directs Linked Jazz with Cristina Pattuelli. The graph visualizes a unique dataset created by Linked Jazz: a curated list of names derived from the semi-automated processing of transcribed interviews with jazz musicians. The graph is single-mode: its nodes are interviewed musicians and the people they mention, and each directed edge runs from an interviewee to a person mentioned. A user can also click into the specific ego network of any person in the graph.


The existing Linked Jazz visualization tool representing relationships in the jazz community.
Source: https://linkedjazz.org/network/

 


An example of an ego network on Linked Jazz.
Source: https://linkedjazz.org/network/?person=Ella_Fitzgerald

In either the standard or dynamic view, hovering over any node provides more information about the person from Wikipedia. In the ego network view, the user can click to see the specific interview passages that connect each pair of related nodes.

In an earlier blogpost for our Information Visualization class, I discussed my work with Linked Jazz to enrich its dataset of 2000+ names with gender information by automatically querying linked data resources. The main section of that blogpost described visualizing the results of this experiment using Gephi.


Network graphs created in Gephi illustrating the results of the linked data experiment for acquiring a gender attribute (red: women; blue: men; gray: unknown/not found).

Those static visualizations served not only as a graph of the results and as methodological proof-of-concept, but also as an intermediate prototype towards building an interactive tool. That tool should allow users to independently explore the Linked Jazz network through this gender lens.

This blogpost describes the methods used to translate the prototype into a new gender view that can be easily integrated with the existing Linked Jazz platform.

Choosing the proper development tools and refining the gender data

A Sigma.js plug-in is available for Gephi that converts Gephi network graphs into interactive graphs viewable in any standard browser. After installing the plug-in and testing it with my prototype graph, I realized that customizing its settings and editing the generated HTML to reproduce the design decisions of the prototype graph would take a considerable amount of time. Rather than invest that time in Sigma.js, I decided to learn the tool used to build the existing Linked Jazz network visualization: D3.js (Data-Driven Documents). Technically speaking, D3.js is a JavaScript library that takes advantage of the DOM, HTML, SVG, and CSS to enable highly responsive visualizations. It loads data, binds it to page elements, and offers selection functions that can be driven by user behavior to transform these data-driven visualizations on the fly. After studying the basics of D3.js, I began to examine the files that power the Linked Jazz network visualization.

I was able to copy all the files needed directly from the Linked Jazz website. These files included:

  • The main HTML file;
  • JavaScript files;
  • CSS files;
  • The RDF triples file for the names derived from transcript processing (found on the Linked Jazz data access page);
  • The RDF triples file of the relationships derived from transcript processing (found on the Linked Jazz data access page);
  • All necessary images for musicians, logos, menu items, etc.

This allowed me to create a local instance of the interactive network graph that I could then edit to add a gender visualization view. This was primarily achieved by adding new conditional statements for a gender visual mode.

At this stage, I also decided that the visualization no longer needed to serve as a record of the data enrichment experiment, but should instead be a working tool for analyzing gender in our network. Although gender had been acquired for the majority of names, a fair number remained unidentified. Finding reliable gender information for those entities took considerable work, sometimes requiring a manual search of interviews for gendered pronouns referring to the person. Once this was complete, I output a gender triples file of names and gender values using the FOAF (Friend of a Friend) gender predicate, and added this file so that I could create the gender view.

An example of a gender triple (Clora Bryant):

<http://dbpedia.org/resource/Clora_Bryant> <http://xmlns.com/foaf/0.1/gender> "female"@en .
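
A file of such triples can be turned into a simple lookup from person URI to gender. The following is a minimal sketch rather than the code used in the tool: the function name parseGenderTriples is hypothetical, and it assumes one triple per line with a plain or language-tagged literal as the object.

```javascript
// Hypothetical helper (not from the Linked Jazz code base): builds a lookup
// of person URI -> gender value from N-Triples text.
const FOAF_GENDER = "http://xmlns.com/foaf/0.1/gender";

function parseGenderTriples(ntriplesText) {
  const genderByUri = new Map();
  // Matches: <subject> <predicate> "literal" with an optional @lang tag, then " ."
  const triplePattern =
    /^<([^>]+)>\s+<([^>]+)>\s+"([^"]*)"(?:@[A-Za-z0-9-]+)?\s*\.\s*$/;

  for (const rawLine of ntriplesText.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue; // skip blanks and comments
    const match = triplePattern.exec(line);
    if (!match) continue; // ignore anything this simple pattern cannot read
    const [, subject, predicate, literal] = match;
    if (predicate === FOAF_GENDER) {
      genderByUri.set(subject, literal.toLowerCase());
    }
  }
  return genderByUri;
}

// The example triple from above.
const sample =
  '<http://dbpedia.org/resource/Clora_Bryant> <http://xmlns.com/foaf/0.1/gender> "female"@en .';
const genders = parseGenderTriples(sample);
console.log(genders.get("http://dbpedia.org/resource/Clora_Bryant")); // female
```

A regular expression is enough for a file this uniform; a general-purpose RDF parser would be the safer choice for triples with escaped quotes or other object types.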

Inherited design choices and the need for departures

Because this view would function within the existing Linked Jazz platform, it needed to conform to that platform in appearance and behavior, so the majority of the code was reused. I left many aesthetic settings unchanged: representing the social network as a force-directed graph; scaling node size by number of connections; pinning a selection of nodes along the periphery of the network to reduce clutter and prevent overlapping; and using interactions such as hovering and clicking to isolate areas of the graph and give the user more information. However, the gender prototype built in Gephi also served as a guide for necessary departures from the Linked Jazz graph. The concept of the gender view is more focused: it moves beyond the representational and navigational goals of the original network graph to let users ask more specific gender-related questions of our dataset.


The new gender view for the Linked Jazz network visualization tool.

This difference translated into a different application of color to the nodes and edges, one similar to the Gephi prototype: blue for men and red for women. This color scheme was applied to three elements: node color, node outline color, and edge color. Edges were colored according to the gender of the person mentioned, since mentions are the key signal in the graph for analyzing gender (the gender of the person interviewed, by contrast, would only reflect whose interviews we decided to process at Linked Jazz).
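
The coloring rule described above can be sketched as a pair of small functions. This is an illustration under stated assumptions, not the tool's actual code: the function names, the uri field on nodes, the use of named CSS colors, and the gray fallback for a missing gender are all hypothetical, and edges are assumed to hold node objects in source and target.

```javascript
// Hypothetical palette: the post specifies blue for men and red for women,
// but not exact color values. The gray fallback is an assumption.
const GENDER_COLORS = new Map([
  ["female", "red"],
  ["male", "blue"],
]);
const FALLBACK_COLOR = "gray";

function genderColor(gender) {
  return GENDER_COLORS.get(gender) || FALLBACK_COLOR;
}

// A node takes the color of the person it represents.
function nodeColor(node, genderByUri) {
  return genderColor(genderByUri.get(node.uri));
}

// An edge takes the color of the person being mentioned (the edge target),
// not of the interviewee doing the mentioning (the edge source).
function edgeColor(edge, genderByUri) {
  return genderColor(genderByUri.get(edge.target.uri));
}

// Made-up example data (example.org URIs, not real Linked Jazz entries).
const genderByUri = new Map([
  ["http://example.org/interviewee", "male"],
  ["http://example.org/mentioned", "female"],
]);
const edge = {
  source: { uri: "http://example.org/interviewee" },
  target: { uri: "http://example.org/mentioned" },
};
console.log(edgeColor(edge, genderByUri)); // red, because the mentioned person is female
```

Functions of this shape could be passed to D3 style or attribute setters, so that the same lookup drives node fill, node outline, and edge stroke.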


Exploring the gender distribution of people mentioned by Mary Lou Williams in an interview.
(Interview from The Jazz Oral History Project at Rutgers University)


The most mentioned woman in the Linked Jazz network to date, according to node size: Sarah Vaughan.


Exploring the gender distribution of people mentioned by Count Basie in an interview.
(Interview from The Nathaniel C. Standifer Video Archive of Oral History: Black American Musicians Located in the African American Music Collection at the University of Michigan)


The most mentioned man in the Linked Jazz network to date, according to node size: Duke Ellington.

Unlike the original overview, the gender graph uses photos only for people who have been interviewed; all others are represented by colored nodes. This makes the gender encoding easier to read. The “Creating triples” and “Rendering network” animation that appears while the network is generated was used to explain this difference in representation: “Interviewed musicians are shown as images”.

By default, the Linked Jazz graph pinned only the 20 most connected individuals to the elliptical periphery of the graph. I increased this number to 60 to include all the interviews processed to date, and split the pinned nodes across the original ellipse and a second, smaller one to prevent crowding. Another significant change was the inclusion of all nodes. In the original graph, only entities with two or more connections are displayed. In the gender view, all people are represented to give a more accurate depiction of gender across our set of interviews. Including all entities has the added benefit of offering a more personal portrait of each interviewed person through these small reference clusters.

As a final step, I removed the pop-up info box that appears when hovering over a person. With many more nodes now on the graph, the pop-up info box overloaded the visualization, and the same information is available in other views. Instead, the only information shown on hover is the name label for the node.
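
The degree threshold and the two-ellipse pinning can be sketched as follows. This is a rough illustration, not the tool's code: the function names, the degree field, the share of nodes placed on the outer ellipse, and the scale of the inner ellipse are assumptions, since the post does not say how the 60 nodes were divided. The fixed flag is the D3 version 3 force-layout convention for pinned nodes (later versions use fx and fy); which version the tool uses is not stated here.

```javascript
// Keep only nodes with at least `minDegree` connections.
// Under the description above, the original view would use 2 and the gender view 1.
function visibleNodes(nodes, minDegree) {
  return nodes.filter((node) => node.degree >= minDegree);
}

// Pin the `pinCount` most connected nodes on two concentric ellipses.
// `outerShare` and `innerScale` are assumed values, not taken from the tool.
function pinToEllipses(nodes, pinCount, width, height, options = {}) {
  const { outerShare = 0.6, innerScale = 0.8 } = options;
  const pinned = [...nodes]
    .sort((a, b) => b.degree - a.degree)
    .slice(0, pinCount);
  const outerCount = Math.ceil(pinned.length * outerShare);
  const centerX = width / 2;
  const centerY = height / 2;

  pinned.forEach((node, i) => {
    const onOuter = i < outerCount;
    const ringSize = onOuter ? outerCount : pinned.length - outerCount;
    const ringIndex = onOuter ? i : i - outerCount;
    const scale = onOuter ? 1 : innerScale;
    const angle = (2 * Math.PI * ringIndex) / ringSize; // even spacing on the ring
    node.x = centerX + (width / 2) * scale * Math.cos(angle);
    node.y = centerY + (height / 2) * scale * Math.sin(angle);
    node.fixed = true; // D3 v3 convention; D3 v4+ would set node.fx and node.fy
  });
  return pinned;
}

// Made-up nodes for illustration only.
const exampleNodes = [
  { name: "a", degree: 1 },
  { name: "b", degree: 2 },
  { name: "c", degree: 5 },
];
console.log(visibleNodes(exampleNodes, 2).length); // 2
console.log(visibleNodes(exampleNodes, 1).length); // 3
console.log(pinToEllipses(exampleNodes, 2, 1000, 600).map((n) => n.name)); // [ 'c', 'b' ]
```

Spacing nodes evenly by angle keeps labels from colliding on each ring; the inner ellipse simply reuses the same center with smaller radii.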

These design decisions have been carried over to the gender version of the ego network tool.


The gender view of Ella Fitzgerald’s ego network. The representation of her node as a color indicates that although she has been mentioned by many, an interview with her is not yet part of the Linked Jazz dataset.

Conclusion

This interactive gender network graph marries the design considerations of the earlier Gephi prototype with the aesthetics and functionality of the existing Linked Jazz platform. As stated above, this view is narrower and more targeted in its goals because it provides a single facet: gender. In designing the gender graph, I considered different audience levels: the casual user, for whom an overview of gender densities in the network is sufficient; the intermediate user, who may be interested in drilling into single ego networks or seeing the exact names of people mentioned; and the advanced user, who uses the gender information to decide which passages in which transcripts to explore. Extending beyond visualization, a complete research tool would not only let users identify passages of interest through such facets, but would also let them apply their discoveries to querying the backend data (for example, using SPARQL) to retrieve a customized set of transcript data for further use.
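
To make the SPARQL idea concrete, here is a hedged sketch of how such a query might be assembled in JavaScript. It only builds the query text and assumes nothing about an endpoint. The function name is hypothetical, the predicate linking an interviewee to a mentioned person is a placeholder because it is not named in this post, and the interviewee URI is assumed to follow the DBpedia pattern of the Clora Bryant example.

```javascript
const FOAF_GENDER = "http://xmlns.com/foaf/0.1/gender";
// Assumption: placeholder URI for the "mentions" relationship; replace it with
// the predicate actually used in the relationship triples.
const MENTIONS_PREDICATE = "http://example.org/vocab/mentions";

// Builds a SPARQL query listing the people of a given gender
// mentioned by one interviewee.
function mentionsByGenderQuery(intervieweeUri, gender) {
  if (/[<>"\s]/.test(intervieweeUri)) {
    throw new Error("URI contains characters that are not allowed here");
  }
  if (!/^[a-z]+$/.test(gender)) {
    throw new Error("gender must be a lowercase word such as 'female'");
  }
  return [
    "SELECT DISTINCT ?mentioned WHERE {",
    `  <${intervieweeUri}> <${MENTIONS_PREDICATE}> ?mentioned .`,
    `  ?mentioned <${FOAF_GENDER}> "${gender}"@en .`,
    "}",
  ].join("\n");
}

// Assumed URI, following the DBpedia pattern shown earlier in this post.
const query = mentionsByGenderQuery(
  "http://dbpedia.org/resource/Mary_Lou_Williams",
  "female"
);
console.log(query);
```

The two input checks keep arbitrary text out of the query string, which matters if the values ever come from user interaction with the graph.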