{"id":9195,"date":"2018-04-19T11:12:32","date_gmt":"2018-04-19T15:12:32","guid":{"rendered":"http:\/\/studentwork.prattsi.org\/infovis\/?p=9195"},"modified":"2018-04-19T11:27:10","modified_gmt":"2018-04-19T15:27:10","slug":"from-treemaps-to-network-graphs","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/labs\/from-treemaps-to-network-graphs\/","title":{"rendered":"From Treemaps to Network Graphs: Further Visualizations of Hierarchical Relationships in The Art Genome Project"},"content":{"rendered":"<h1><b>Introduction<\/b><\/h1>\n<p><span style=\"font-weight: 400\">This report explores categorical and hierarchical data from The Art Genome Project, Artsy\u2019s ongoing study into the characteristics and connections between artists and artworks, using Gephi, an open-source network analysis and visualization software. The goal of the resulting network visualization is to graphically represent abstract information hierarchies, providing a shared point of reference for internal stakeholders at Artsy.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Background<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">As described in my previous post \u201c<\/span><a href=\"http:\/\/studentwork.prattsi.org\/infovis\/2018\/03\/01\/tagp-tableau\/\"><span style=\"font-weight: 400\">Visualizing Dynamic Hierarchies in The Art Genome Project<\/span><\/a><span style=\"font-weight: 400\">,\u201d Artsy\u2019s mission is to make all the world\u2019s art accessible to anyone with an internet connection. By partnering with the world\u2019s leading art auctions, galleries, fairs, and museums, the site functions as a free and powerful resource for users interested in art collecting and education. 
At present, there are more than 800,000 works of art and design featured on the platform by more than 80,000 artists.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The Art Genome Project (\u201cTAGP\u201d) is the classification system and technological framework that fuels Artsy. TAGP maps the characteristics, referred to as &#8220;genes&#8221; in-house, that connect artists, artworks, architecture, and design objects across time. There are currently over 1,300 genes of more than 40 types covering art-historical movements, subject matter, formal qualities, and more. Importantly, unlike tags, which are binary\u2014something is either tagged &#8220;chair&#8221; or not\u2014a gene is evaluated on a scale from 0 to 100 and then applied by hand to an artist or artwork record by an expert contributor. While TAGP also uses tags to highlight specific iconography, motifs, and subject matter, dynamic genoming allows for greater nuance when capturing the conceptual and formal aspects of cultural heritage records.<\/span><\/p>\n<p><span style=\"font-weight: 400\">TAGP is a semi-structured controlled vocabulary; however, it does not comply with ISO 25964 or related standards (ISO, n.d.), and its information hierarchy is relatively flat, going two levels deep at most. To meet these standards and create a deep taxonomy that is truly user-friendly, one of the first steps is to visualize its current structure and identify which areas are working and which need improvement, in order to prioritize future iterations.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Previous work in Tableau Public resulted in a unified dashboard that visualizes quantitative analytic data through line graphs, packed bubble charts, and treemaps. 
Rather than focusing on user engagement with gene pages, however, this visualization studies the hierarchical connectivity of TAGP through network mapping.<\/span><\/p>\n<p><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Inspiration<\/span><\/span><\/p>\n<p><a href=\"http:\/\/wikiverse.io\/\"><span style=\"font-weight: 400\">Wikiverse<\/span><\/a><span style=\"font-weight: 400\">, a \u201cgalactic reimagining of Wikipedia\u201d as a cosmic web of knowledge by Owen Cornec, provided initial inspiration for visualizing TAGP (2014). Although the interactive 3D visualization relies on JavaScript and WebGL in lieu of Gephi and dwarfs the TAGP dataset of 1,301 genes with its 250,000 Wikipedia articles, Wikiverse\u2019s ability to cluster related concepts in a way that transforms abstract ideas into a navigable, representational tool directly informed this network graph.<\/span><\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-9197 size-large\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/wikiverse.jpg?resize=840%2C473\" alt=\"\" width=\"840\" height=\"473\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/wikiverse.jpg?resize=1024%2C576&amp;ssl=1 1024w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/wikiverse.jpg?resize=300%2C169&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/wikiverse.jpg?resize=768%2C432&amp;ssl=1 768w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/wikiverse.jpg?w=1328&amp;ssl=1 1328w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/p>\n<p><b>Figure 1<\/b>. 
Screenshot of <em>Wikiverse<\/em> by Owen Cornec.<\/p>\n<p><span style=\"font-weight: 400\">Erin Gallagher\u2019s network visualizations of Twitter hashtags associated with #MarchForOurLives and #NeverAgain marches in March 2018 also served as inspiration for the TAGP mapping. Unlike the custom code of <em>Wikiverse<\/em>, Gallagher uses Gephi\u2019s plug-and-play GUI, specifically its built-in OpenOrd and Force Atlas 2 layout algorithms, to create the Twitter user-to-hashtag graph. Gallagher\u2019s visualization inspired the author to try changing the layout algorithm for the TAGP network visualization from the Fruchterman-Reingold algorithm to OpenOrd, although ultimately the former was used in the final version.<\/span><\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-9198\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/march-for-our-lives.png?resize=840%2C488\" alt=\"\" width=\"840\" height=\"488\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/march-for-our-lives.png?resize=1024%2C595&amp;ssl=1 1024w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/march-for-our-lives.png?resize=300%2C174&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/march-for-our-lives.png?resize=768%2C447&amp;ssl=1 768w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/march-for-our-lives.png?w=2000&amp;ssl=1 2000w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/march-for-our-lives.png?w=1680 1680w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/p>\n<p><b>Figure 2<\/b>. 
Screenshot of user-to-hashtag graph of 12,987 #MarchForOurLives tweets from March 23 to\u00a024 by Erin Gallagher.<\/p>\n<p><span style=\"font-weight: 400\">Finally, a network visualization found on the Gephi <\/span><a href=\"http:\/\/forum-gephi.org\/viewtopic.php?t=2314\"><span style=\"font-weight: 400\">forums<\/span><\/a><span style=\"font-weight: 400\"> that uses Gephi\u2019s GUI, Google Maps API, and Valve\u2019s Steam Web API to render a large graph of Steam Community members informed the clustering of the TAGP graph. Despite the link rot in the original forum post, the author was able to find the code on GitHub, along with an <\/span><a href=\"http:\/\/graphmap.net\/mapper\"><span style=\"font-weight: 400\">active link<\/span><\/a><span style=\"font-weight: 400\"> to the Steam Community network.<\/span><\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-large wp-image-9199\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/steam.jpg?resize=840%2C525\" alt=\"\" width=\"840\" height=\"525\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/steam.jpg?resize=1024%2C640&amp;ssl=1 1024w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/steam.jpg?resize=300%2C188&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/steam.jpg?resize=768%2C480&amp;ssl=1 768w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/steam.jpg?w=1920&amp;ssl=1 1920w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/steam.jpg?w=1680 1680w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/p>\n<p><b>Figure 3<\/b>. 
Screenshot of Steam Community network graph.<\/p>\n<h1><b>Methodology<\/b><\/h1>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Data Collection and Transformation<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">This dataset was created by running SQL queries in Looker, a proprietary data analytics platform, against Artsy\u2019s AWS Redshift data warehouse. After querying all genes, types, and families, the results were exported to a local CSV file to be further transformed in OpenRefine, an open-source desktop application for data wrangling.<\/span><\/p>\n<p><span style=\"font-weight: 400\">OpenRefine was used to shorten and standardize the names of the various gene types. For example, \u201cX &#8211; Automated Collector Category (Concept) (Display by Artwork, do not factor into similarity) DO NOT USE,\u201d \u201cX2 &#8211; Automated Collector Category (Content) (Display by artwork, DO factor into similarity) DO NOT USE,\u201d and \u201cY &#8211; Automated Collector Category (Display by Artist, do not factor into similarity) DO NOT USE\u201d were all changed to \u201cCollector Categories\u201d in the interest of reducing the label lengths later in Gephi. To optimize the import process into Gephi, the column labels for \u201cGene Name\u201d and \u201cGene Type\u201d were changed to \u201cSource\u201d and \u201cTarget.\u201d<\/span><\/p>\n<p><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Visualization Creation<\/span><\/span><\/p>\n<p><span style=\"font-weight: 400\">The CSV file of the genome data was imported into Gephi, resulting in a directed graph of 1,330 nodes and 1,301 edges. Labels were selectively added in the Data Laboratory mode for gene types, as listing every gene name would overwhelm the visualization. A warm color palette was chosen with automated gradient values calculated by Gephi based on node ranking by out-degree. 
Inspired by the aforementioned network visualizations, a black background with white labels and pale grey edges was selected to create a firework-like effect. Since the Gephi desktop application does not offer inline frame features for web publishing, the visualization was exported as a PNG file for display. <\/span><\/p>\n<h1><b>Results<\/b><\/h1>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9200\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/tagp.png?resize=840%2C840\" alt=\"\" width=\"840\" height=\"840\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/tagp.png?w=1024&amp;ssl=1 1024w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/tagp.png?resize=150%2C150&amp;ssl=1 150w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/tagp.png?resize=300%2C300&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/tagp.png?resize=768%2C768&amp;ssl=1 768w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/p>\n<p><b>Figure 4<\/b>. Network graph of The Art Genome Project hierarchy at one level of inheritance.<\/p>\n<h1><b>Future Directions<\/b><\/h1>\n<p><span style=\"font-weight: 400\">In the next iteration of this design, the author envisions adding attributes that capture the count of artists and artworks for each gene application and changing the weights in the edge table from a uniform 1.0 to reflect the term frequency-inverse document frequency (tf-idf) values. tf-idf weighting ensures that genes that occur frequently throughout the global genome (e.g. \u201cPainting\u201d) are weighted less heavily than genes that occur rarely (e.g. \u201cPolitical Figures\u201d), even if the gene for \u201cPainting\u201d has been applied at a value of 100. 
By including gene application counts and tf-idf values in the node and edge tables, a richer, more complex network graph could be generated that maps The Art Genome Project with greater nuance.<\/span><\/p>\n<h1><b>References<\/b><\/h1>\n<p><span style=\"font-weight: 400\">Cornec, O. (2014). <\/span><i><span style=\"font-weight: 400\">Wikiverse<\/span><\/i><span style=\"font-weight: 400\"> [Web application]. Retrieved from <\/span><a href=\"http:\/\/wikiverse.io\/\"><span style=\"font-weight: 400\">http:\/\/wikiverse.io\/<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Gallagher, E. (2018, March 25). <\/span><i><span style=\"font-weight: 400\">#MarchForOurLives &amp; #NeverAgain<\/span><\/i><span style=\"font-weight: 400\"> [Blog post]. Retrieved from <\/span><a href=\"https:\/\/medium.com\/@erin_gallagher\/marchforourlives-neveragain-a59ee4a078cb\"><span style=\"font-weight: 400\">https:\/\/medium.com\/@erin_gallagher\/marchforourlives-neveragain-a59ee4a078cb<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">International Organization for Standardization. (n.d.). <\/span><i><span style=\"font-weight: 400\">The international standard for thesauri and interoperability with other vocabularies<\/span><\/i><span style=\"font-weight: 400\"> (ISO No. 25964). Retrieved from <\/span><a href=\"http:\/\/www.niso.org\/schemas\/iso25964\"><span style=\"font-weight: 400\">http:\/\/www.niso.org\/schemas\/iso25964<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Large Steam network visualization with Google Maps + Gephi. (2012, November 18). 
Retrieved from <\/span><a href=\"http:\/\/forum-gephi.org\/viewtopic.php?t=2314\"><span style=\"font-weight: 400\">http:\/\/forum-gephi.org\/viewtopic.php?t=2314<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This report explores categorical and hierarchical data from The Art Genome Project, Artsy\u2019s ongoing study into the characteristics and connections between artists and artworks, using Gephi, an open-source network analysis and visualization software. The goal of the resulting network visualization is to graphically represent abstract information hierarchies, providing a shared point of reference for&hellip;<\/p>\n","protected":false},"author":471,"featured_media":9197,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[149],"tags":[39,89],"coauthors":[],"class_list":["post-9195","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-labs","tag-gephi","tag-network-graphs"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/04\/wikiverse.jpg?fit=1328%2C747&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-2oj","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/9195","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/471"}],"replies":[{"embeddable":t
rue,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=9195"}],"version-history":[{"count":4,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/9195\/revisions"}],"predecessor-version":[{"id":9208,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/9195\/revisions\/9208"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media\/9197"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media?parent=9195"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=9195"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=9195"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=9195"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}