{"id":3947,"date":"2015-11-10T15:03:15","date_gmt":"2015-11-10T20:03:15","guid":{"rendered":"http:\/\/research.prattsils.org\/?p=3947"},"modified":"2015-11-10T15:03:15","modified_gmt":"2015-11-10T20:03:15","slug":"the-linked-jazz-network-through-a-gender_lens","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/visualization\/the-linked-jazz-network-through-a-gender_lens\/","title":{"rendered":"The Linked Jazz Network through a Gender Lens"},"content":{"rendered":"<p><strong>Background and goals<\/strong><\/p>\n<p>Since 2014, I have been working as a research assistant for Linked Jazz, a Pratt Institute-based project that experiments with applying semantic web and linked open data (LOD) technologies to cultural heritage materials. The largest and most visible segment of our work is the visualization of relationships between jazz musicians.\u00a0The nodes of the graph are derived from transcripts of interviews with jazz musicians, with each node representing a person interviewed or a person mentioned. Each edge represents a directional \u201cknows of\u201d relationship between the person interviewed and the person being mentioned in the interview. To date, ca. 52 interviews have been processed and are represented in our graph.<\/p>\n<p>The past year, I worked with this set of names\u00a0towards enriching the entire list of people with a gender attribute. This was an experiment in tapping\u00a0linked open data resources via endpoints and APIs to enrich our dataset. Using\u00a0Python scripts, LOD resources, such as VIAF, DBpedia (representing the Wikipedia dataset), and MusicBrainz were queried to obtain a gender attribute. 
The details of this experiment lie beyond the scope of this lab report, but more can be found in\u00a0<a href=\"https:\/\/linkedjazz.org\/enriching-the-linked-jazz-name-list-with-gender-information\/\" target=\"_blank\">my blogpost on the Linked Jazz website<\/a>.<\/p>\n<p>A final realization of this work would\u00a0be the enhancement of the current Linked Jazz tool with a\u00a0gender overlay and set of filters. As a first step, however, I decided to use\u00a0our Gephi lab day to create preliminary visualizations of the results of this data enrichment. I was less interested in quantifying gender division in our network than I was in representational methods that could adequately address the ways in which one might use a visualization to explore gender. I tried to think of questions a person might ask of the interviews, for example, \u201cWhat does the overall distribution of gender look like in this series of interviews?\u201d or \u201cDo women tend to mention other women more than men mention women?\u201d Another question might be, \u201cAre there any men who mentioned a lot of women or conversely no women at all?\u201d Because the current Linked Jazz visualization tool provides access to\u00a0actual transcript passages, the hints provided by a gender overlay would enable researchers to click further into the source material to read precisely what was said.<\/p>\n<p>A secondary concern was to faithfully represent this as\u00a0an experiment in designing automated methods for data enrichment. 
For example, often no gender information was found for a person using these methods, in which case the gender was always stored as \u201cunknown\u201d, regardless of whether the person&#8217;s gender was actually common knowledge.<\/p>\n<p><strong>Inspiration<\/strong><\/p>\n<p>The primary inspiration for my visualization is the <a href=\"https:\/\/linkedjazz.org\/network\/\" target=\"_blank\">Linked Jazz network visualization tool<\/a>,\u00a0originally developed by Matt Miller, who co-directs Linked Jazz with Cristina Pattuelli and serves as its developer. I have always appreciated how the network requires only a very basic introduction, or even none at all, to enjoy. Understanding it as an experiment in building tools on\u00a0LOD foundations requires more extensive orientation, but again, that orientation is not required\u00a0to discover connections\u00a0and explore\u00a0these oral histories in this novel way.<\/p>\n<div id=\"attachment_3946\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/Linked_Jazz_full.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3946\" class=\"size-medium wp-image-3946\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/Linked_Jazz_full-620x346.png?resize=620%2C346\" alt=\"Linked Jazz visualization tool representing relationships in the jazz community. Source: https:\/\/linkedjazz.org\/network\/\" width=\"620\" height=\"346\" \/><\/a><p id=\"caption-attachment-3946\" class=\"wp-caption-text\">Linked Jazz visualization tool representing relationships in the jazz community.<br \/>Source: https:\/\/linkedjazz.org\/network\/<\/p><\/div>\n<p>A second series of visualizations that I found interesting to use as a model was created by the ViDi research group at UC Davis. 
These\u00a0visualizations represent the <a href=\"http:\/\/vidi.cs.ucdavis.edu\/projects\/AggressionNetworks\/\" target=\"_blank\">friendship and aggression network of students, grades 8-12<\/a>. Unfortunately, it is not stated whether it represents only one school or many schools. Nevertheless, because grade distribution was\u00a0an important clustering factor, I like that the first friendship network visualization shows this\u00a0clustering and then the second proceeds to represent gender distribution across these clusters.<\/p>\n<div id=\"attachment_3944\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/FriendshipGraph.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3944\" class=\"size-medium wp-image-3944\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/FriendshipGraph-620x463.png?resize=620%2C463\" alt=\"A network graph depicting friendships among school children, grades 8-12. 
Source: http:\/\/vidi.cs.ucdavis.edu\/projects\/AggressionNetworks\/\" width=\"620\" height=\"463\" \/><\/a><p id=\"caption-attachment-3944\" class=\"wp-caption-text\">A network graph depicting friendships among school children, grades 8-12.<br \/>Source: http:\/\/vidi.cs.ucdavis.edu\/projects\/AggressionNetworks\/<\/p><\/div>\n<div id=\"attachment_3945\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/GenderGraph.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3945\" class=\"size-medium wp-image-3945\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/GenderGraph-620x463.png?resize=620%2C463\" alt=\"The same network graph as above, now colored by gender. Source: http:\/\/vidi.cs.ucdavis.edu\/projects\/AggressionNetworks\/\" width=\"620\" height=\"463\" \/><\/a><p id=\"caption-attachment-3945\" class=\"wp-caption-text\">The same network graph as above, now colored by gender.<br \/>Source: http:\/\/vidi.cs.ucdavis.edu\/projects\/AggressionNetworks\/<\/p><\/div>\n<p>A third and final visualization that I found interesting, less for any aesthetic beauty than for some of the choices made in communicating information, was created by Marc Smith and was the subject\u00a0of a <a href=\"http:\/\/thesocietypages.org\/cyborgology\/2012\/08\/24\/post-asa-reflection-gender-networks-use-of-asa-hashtags\/\" target=\"_blank\">Cyborgology blogpost by Whitney Erin Boesel<\/a>. It represents tweets with the hashtag for the 2012 American Sociological Association meeting, with further information about the top hashtags from clustered groups. On a very immediate level, I like the use of face images to represent people in the network and that they are sized proportionally to frequency. 
Densities in the network and nodes connected by no more than one or two edges are also easy to discern.<\/p>\n<div id=\"attachment_3943\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/Graph-1016.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3943\" class=\"size-medium wp-image-3943\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/Graph-1016-620x449.png?resize=620%2C449\" alt=\"A network graph showing the Twitter activity of an American Sociological Association meeting in 2012. Source: http:\/\/thesocietypages.org\/cyborgology\/2012\/08\/24\/post-asa-reflection-gender-networks-use-of-asa-hashtags\/\" width=\"620\" height=\"449\" \/><\/a><p id=\"caption-attachment-3943\" class=\"wp-caption-text\">A network graph showing the Twitter activity of an American Sociological Association meeting in 2012.<br \/>Source: http:\/\/thesocietypages.org\/cyborgology\/2012\/08\/24\/post-asa-reflection-gender-networks-use-of-asa-hashtags\/<\/p><\/div>\n<p><strong>Preparing the data<\/strong><\/p>\n<p>Linked Jazz publishes most of its datasets on its website, including the nodes and edges files used by Matt Miller to create the network visualization in Gephi. In order to conform as much as possible to the existing tool for future integration, I downloaded his files to use as the base for my own.<\/p>\n<p>Our list of name entities consists of over 2000 people derived from interviews. Matching on the literal person name in Matt&#8217;s CSV nodes file, I used Python to append the acquired gender data from my experiment for 1330 names to the nodes file. 
There were several other groups of names that justifiably needed to be identified by hand (without undermining the experiment), and these were also added to the file, bringing the total number of people with positive gender data to 1564. The only gender values represented were those acquired programmatically: \u2018male\u2019, \u2018female\u2019, and \u2018unknown\u2019. In cases where no value was acquired, \u2018unknown\u2019 was added as the default. I also wanted to include photos of the musicians, but only for the interviewees, to avoid distracting from the main focus of the visualization. I downloaded these photos from the Linked Jazz website and added the file names to my CSV file. The\u00a0Gephi plugin Image Preview needed to be installed, however, in order to enable images to be used for nodes.<\/p>\n<p>Once this preparation work was done, the amended node file and unaltered edge file were easily imported into Gephi to begin creating the network graph.<\/p>\n<p><strong>Creating the graph<\/strong><\/p>\n<p>The graph consists of 2006 nodes and 3646 edges and was created in Gephi using the Force Atlas algorithm. After experimenting with several versions, I realized that the graph communicated most effectively when the hubs were stretched out to show the edges and the fanning of nodes around interviewed subjects more clearly. Since the focus is gender, I changed the color of the nodes in Gephi\u2019s Data Laboratory tab according to the gender column value: blue for \u2018M\u2019, red for \u2018F\u2019, and gray for \u2018Unknown\u2019, complying with standard coloring conventions to ensure readability. Edges were set to the color of edge targets, since the focus is gender distribution in the content of the interviews, not in our choice of which interviews to process. 
The outline of the nodes was set to the source color to ensure that the gender of the nodes with images could be read, and the nodes were set to be sized proportional to rank (number of edge connections). This was my preliminary graph:<\/p>\n<div id=\"attachment_3941\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3941\" class=\"size-medium wp-image-3941\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke-620x620.png?resize=620%2C620\" alt=\"The Linked Jazz relationship graph depicted by gender. Images represent people interviewed, whereas nodes represent people mentioned. Edges and nodes are colored by the gender of the target.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3941\" class=\"wp-caption-text\">The Linked Jazz relationship graph depicted by gender. Images represent people interviewed, whereas colored nodes represent people mentioned. Edges and nodes are colored by the gender of the target.<\/p><\/div>\n<p>This graph also can be viewed as a troubleshooting tool for the Linked Jazz project and as a snapshot of the current state of open data resources with regard to gender information. An example of the former is that the loose nodes along the periphery represent the fact that the relationships for one processed transcript are no longer represented in our graph, and that the people from that transcript now exist as orphans in our database. 
An example of the latter point is that many entities, including some of our interviewees, did not acquire gender data when these open data resources were queried, as evinced by the abundance of gray lines and nodes and a few gray-outlined images.<\/p>\n<p>In order to see the gender distribution more clearly, I decided to remove all nodes that are not interviewees.<\/p>\n<div id=\"attachment_3937\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_female_male_target.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3937\" class=\"size-medium wp-image-3937\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_female_male_target-620x620.png?resize=620%2C620\" alt=\"The same Linked Jazz relationship graph with all nodes removed that are not interviewees.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3937\" class=\"wp-caption-text\">The same Linked Jazz relationship graph with all nodes removed that are not interviewees.<\/p><\/div>\n<p>Going further, I used filter queries in the Overview window to show only the edges for one target gender at a time. 
The contrast in line density between blue and red is extremely clear.<\/p>\n<div id=\"attachment_3939\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_male_target.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3939\" class=\"size-medium wp-image-3939\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_male_target-620x620.png?resize=620%2C620\" alt=\"The Linked Jazz relationship graph filtered by edges where the target's gender is male.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3939\" class=\"wp-caption-text\">The Linked Jazz relationship graph filtered by edges where the target&#8217;s gender is male.<\/p><\/div>\n<div id=\"attachment_3940\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_unknown_target.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3940\" class=\"size-medium wp-image-3940\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_unknown_target-620x620.png?resize=620%2C620\" alt=\"The Linked Jazz relationship graph filtered by edges where the target's gender is unknown.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3940\" class=\"wp-caption-text\">The Linked Jazz relationship graph filtered by edges where the target&#8217;s gender is unknown.<\/p><\/div>\n<div id=\"attachment_3938\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_female_target.png\"><img 
data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3938\" class=\"size-medium wp-image-3938\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_female_target-620x620.png?resize=620%2C620\" alt=\"The Linked Jazz relationship graph filtered by edges where the target's gender is female.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3938\" class=\"wp-caption-text\">The Linked Jazz relationship graph filtered by edges where the target&#8217;s gender is female.<\/p><\/div>\n<p>In the last graph representing interviewees (photo nodes) of different genders mentioning women (the red lines), it is easy to see who mentioned many women and who mentioned one or none (with the caveat that some may only <strong>look<\/strong> like one or none and actually be more, if any of the mentioned people of \u201cunknown\u201d gender turn out to be women). On the other end of the spectrum, this loose graph also allowed me to identify denser intersections that represent a person who is mentioned by many people. 
I identified three large intersections in the mid-section of the graph and a fourth, lesser intersection in the lower half.<\/p>\n<div id=\"attachment_3942\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_female_target_marked.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3942\" class=\"size-medium wp-image-3942\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_female_target_marked-620x620.png?resize=620%2C620\" alt=\"Busy intersections in the Linked Jazz relationship graph filtered by women mentioned, identified at a glance.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3942\" class=\"wp-caption-text\">Busy intersections in the Linked Jazz relationship graph filtered by women mentioned, identified at a glance.<\/p><\/div>\n<p>When I inspected the nodes on each of those intersections, I found they were all well-known female jazz vocalists. By moving the filtered graph into a new workspace, I used further filtering and selection to add labels to those nodes. 
In order of ranked mention, those nodes are Sarah Vaughan, Ella Fitzgerald, Billie Holiday, and Lena Horne, as seen here (name label size relates to rank).<\/p>\n<div id=\"attachment_3936\" style=\"width: 620px\" class=\"wp-caption alignnone\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_big_women_with_Lena.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-3936\" class=\"size-medium wp-image-3936\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2015\/11\/LJ_gender_01forke_big_women_with_Lena-620x620.png?resize=620%2C620\" alt=\"The names of the women most mentioned in our network of jazz musicians.\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-3936\" class=\"wp-caption-text\">The names of the women most mentioned in our network of jazz musicians.<\/p><\/div>\n<p><strong>Future directions<\/strong><\/p>\n<p>As expressed above, the ideal\u00a0final version\u00a0of these visualizations\u00a0would be their integration as a tool on the current Linked Jazz platform, allowing users such as jazz researchers and digital humanists to easily explore the Linked Jazz social network and interview data through the lens of gender. If\u00a0this method proves successful, additional attributes could be added to the name nodes, like instrument played, date of birth, and places lived.<\/p>\n<p>In the meantime, however, the main objective will be to create an interactive prototype based on the visualizations shown here, possibly by using the Gephi plugin Sigmajs Exporter. 
The main interactive functionality to implement would be\u00a0gender filtering on both the interviewees and people mentioned, as well as the ability to see the name labels for nodes\/intersections of interest on mouseover.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Background and goals Since 2014, I have been working as a research assistant for Linked Jazz, a Pratt Institute-based project that experiments with applying semantic web and linked open data (LOD) technologies to cultural heritage materials. The largest and most visible segment of our work is the visualization of relationships between jazz musicians.\u00a0The nodes of&hellip;<\/p>\n","protected":false},"author":242,"featured_media":3967,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[59,53,60,39,61,62,63,64,65,66,67],"coauthors":[],"class_list":["post-3947","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-visualization","tag-archives","tag-digital-humanities","tag-gender","tag-gephi","tag-interview","tag-jazz","tag-knowledge-organization","tag-linked-jazz","tag-linked-open-data","tag-python","tag-sparql"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-11F","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/3947","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\
/wp-json\/wp\/v2\/users\/242"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=3947"}],"version-history":[{"count":0,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/3947\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media?parent=3947"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=3947"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=3947"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=3947"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}