{"id":9515,"date":"2018-05-09T10:31:18","date_gmt":"2018-05-09T14:31:18","guid":{"rendered":"http:\/\/studentwork.prattsi.org\/infovis\/?p=9515"},"modified":"2018-05-13T13:47:45","modified_gmt":"2018-05-13T17:47:45","slug":"networked-tagp","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/projects\/networked-tagp\/","title":{"rendered":"The Networked Art Genome Project"},"content":{"rendered":"<h1><b>Introduction<\/b><\/h1>\n<p><span style=\"font-weight: 400\">This report explores categorical and hierarchical data from The Art Genome Project (\u201cTAGP\u201d), Artsy\u2019s ongoing study into the characteristics and connections between artists and artworks, using Gephi, an open-source network analysis and visualization software. The goal of the resulting network visualization is to graphically represent abstract information hierarchies, providing a shared point of reference for internal stakeholders at Artsy. Research into the mental models of TAGP\u2019s abstract hierarchy for typical users was conducted through an unmoderated card sort on a thematic subset of the genome using Optimal Workshop, a proprietary web-based platform that offers a suite of user testing tools. 
Recommendations for future iterations of this visualization based on this user research are included as well.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Background<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">As described in my previous posts \u201c<\/span><a href=\"http:\/\/studentwork.prattsi.org\/infovis\/2018\/03\/01\/tagp-tableau\/\"><span style=\"font-weight: 400\">Visualizing Dynamic Hierarchies in The Art Genome Project<\/span><\/a><span style=\"font-weight: 400\">\u201d and \u201c<\/span><a href=\"http:\/\/studentwork.prattsi.org\/infovis\/2018\/04\/19\/from-treemaps-to-network-graphs\/\"><span style=\"font-weight: 400\">From Treemaps to Network Graphs: Further Visualizations of Hierarchical Relationships in The Art Genome Project<\/span><\/a><span style=\"font-weight: 400\">,\u201d Artsy\u2019s mission is to make all the world\u2019s art accessible to anyone with an internet connection. By partnering with the world\u2019s leading art auctions, galleries, fairs, and museums, the site functions as a free and powerful resource for users interested in art collecting and education. At present, there are over 900,000 works of art and design published on the platform by more than 100,000 artists and cultural makers.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The Art Genome Project is the classification system and technological framework that fuels Artsy. TAGP maps the characteristics, referred to as &#8220;genes&#8221; in-house, that connect artists, artworks, architecture, and design objects across time. There are currently over 1,300 genes of more than 40 types covering art-historical movements, subject matter, formal qualities, and more. Importantly, unlike tags, which are binary\u2014something is either tagged &#8220;chair&#8221; or not\u2014a gene is evaluated on a scale from 0 to 100 and then hand-applied to an artist or artwork record by an expert contributor. 
While TAGP also uses tags to highlight specific iconography, motifs, and subject matter, dynamic genoming allows for greater nuance when capturing the conceptual and formal aspects of cultural heritage records.<\/span><\/p>\n<p><span style=\"font-weight: 400\">TAGP is a semi-structured controlled vocabulary; however, it is not in compliance with ISO 25964 or related standards (ISO, n.d.). Its information hierarchy is relatively flat, going two levels deep at most. To meet these standards and create a deep taxonomy that is truly user-friendly, the genome team\u2019s highest priority is to visualize the current structure and identify which areas are working and which need improvement, so that future iterations can be prioritized.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Previous work in Tableau Public resulted in a unified dashboard that visualizes quantitative analytic data through line graphs, packed bubble charts, and treemaps, while hierarchical connectivity one level deep was visualized with Gephi. This project extends the latter visualization, attempting not only to add another level of hierarchy to the graph but also to incorporate user experience research, narrowing the scope of the visualization and iteratively improving on the design.<\/span><\/p>\n<h1><b>Methodology<\/b><\/h1>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Card Sorting<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">Card sorting is a user experience (UX) research technique in which users organize individual topics into groups according to criteria that make sense to them (Sherwin, 2018). This technique is noted for its usefulness in identifying issues with a product\u2019s category structures, such as the information architecture on a website, by uncovering how users think about content. 
Open, unmoderated card sorting (versus closed or moderated) was selected due to its flexibility and speed\u2014users can organize content into groups with an online tool on their own time and results are immediately available for analysis. Extensive in-person debriefing was used to gain qualitative insights into users\u2019 rationale that might not have been discovered during the largely unmoderated session.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Of the 1,303 genes that make up The Art Genome Project, 219 Content (e.g. \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/landscapes\"><span style=\"font-weight: 400\">Landscapes<\/span><\/a><span style=\"font-weight: 400\">,\u201d \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/portrait\"><span style=\"font-weight: 400\">Portrait<\/span><\/a><span style=\"font-weight: 400\">\u201d) and Concept (e.g. \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/political\"><span style=\"font-weight: 400\">Political<\/span><\/a><span style=\"font-weight: 400\">,\u201d \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/popular-culture\"><span style=\"font-weight: 400\">Popular Culture<\/span><\/a><span style=\"font-weight: 400\">\u201d) genes were selected as a subset for card sorting. This is an exceptionally large number of cards to work with; however, the eight users selected were all TAGP catalogers (\u201cgenomers\u201d) and were considered domain experts. This TAGP gene data subset was created by running SQL queries in Looker, a proprietary data analytics platform, against Artsy\u2019s AWS Redshift data warehouse and exporting the results to a local CSV file. 
OptimalSort, a card sorting tool by Optimal Workshop, was used to import the CSV file and configure the study.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Target Audience<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">The following user profile for the target audience of TAGP\u2019s internal stakeholders and users is based on the demographics of past and present genomers:<\/span><\/p>\n<ul>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Gender identity: Any<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Age: 25-35<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Income: Any<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Education: Master\u2019s degree or higher<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Web expertise: Expert<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Operating systems: Any<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Web browsers: Any<\/span><\/li>\n<li style=\"font-weight: 400\"><span style=\"font-weight: 400\">Other requirements: Encyclopedic knowledge of art history (prehistoric-present), expert research skills, and experience cataloging artists and\/or artworks in a digital environment<\/span><\/li>\n<\/ul>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Data Collection and Transformation<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">This dataset was created in the same manner as the sample dataset for the card sort\u2014SQL queries, Looker, and AWS Redshift. After querying all genes, types, and families, the results were exported to a local CSV file to be further transformed in OpenRefine, an open source desktop application for data wrangling. 
OpenRefine was used specifically for standardizing gene types and families, including merging certain redundant categories, in the interest of reducing the label lengths later in Gephi. However, a more granular dataset of genes, types, and families directly related to the card sort was kept relatively unedited for visualizing the user research.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Before importing the CSV data into Gephi, the cleaned data from OpenRefine was imported to Google Sheets to create node and edge tables. This included merging three distinct columns for \u201cGene Name,\u201d \u201cGene Type,\u201d and \u201cGene Family\u201d into two for \u201cSource\u201d and \u201cTarget,\u201d as well as adding another column for direction \u201cType\u201d (i.e. directed or undirected).<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Visualization Creation<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">The CSV file of the genome data was imported into Gephi, resulting in a directed graph of 1,334 nodes and 2,405 edges. Gephi\u2019s built-in ForceAtlas2 algorithm, a linear-attraction linear-repulsion model, was selected for the layout over ForceAtlas based on its improved performance for large networks. The layout was tuned with stronger gravity (g = 0.01) to attract nodes to the center and prevent islands of gene clusters from drifting away. A warm color palette was chosen with automated gradient values calculated by Gephi based on node ranking by in-degree. Node size, label inclusion, and label font size were also determined by in-degree ranking. The same black background color was selected to make the paler nodes and edges stand out while keeping with the style established in previous visualizations. 
Since the Gephi desktop application does not offer inline frame features for web publishing, the visualization was exported as a PNG file for display.<\/span><\/p>\n<h1><b>Results<\/b><\/h1>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Card Sorting Analysis<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">Among the eight participants sorting 219 cards, a total of 210 unique categories were created. The smallest category of unique cards was a single gene, while the largest category was 42 genes. Participants created an average of 26 groups in a median time of 1 hour and 2 minutes, with 43.15 minutes as the lowest observed time and 161.15 minutes as the highest observed time. Five of the eight participants completed their card sorts in New York, NY at Artsy HQ, while the remaining three finished their individual studies in North Adams, MA, Berkeley, CA, and New Delhi, India.<\/span><\/p>\n<p><span style=\"font-weight: 400\">After manually standardizing similar gene categories, inter-rater reliability (IRR) was calculated to determine which mental models of TAGP\u2019s hierarchy are shared between one or more raters with over 80% agreement (IRR &gt; 0.8). 
For example, three participants had an IRR of 0.61 for a category loosely called \u201cPerforming the Self\u201d which included the genes \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/self-as-subject\"><span style=\"font-weight: 400\">Self as Subject<\/span><\/a><span style=\"font-weight: 400\">,\u201d \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/personal-histories\"><span style=\"font-weight: 400\">Personal Histories<\/span><\/a><span style=\"font-weight: 400\">,\u201d \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/self-portrait\"><span style=\"font-weight: 400\">Self-Portrait<\/span><\/a><span style=\"font-weight: 400\">,\u201d \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/alter-egos-and-avatars\"><span style=\"font-weight: 400\">Alter Egos and Avatars<\/span><\/a><span style=\"font-weight: 400\">,\u201d \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/individual-portrait\"><span style=\"font-weight: 400\">Individual Portrait<\/span><\/a><span style=\"font-weight: 400\">,\u201d and \u201c<\/span><a href=\"https:\/\/www.artsy.net\/gene\/diaristic\"><span style=\"font-weight: 400\">Diaristic<\/span><\/a><span style=\"font-weight: 400\">.\u201d While there were a number of standardized categories that exceeded IRR &gt; 0.8, none of these had a number of participants who created similar categories within the high IRR standardized category greater than four (i.e. less than half of the participants would be in true agreement based on their sorting patterns). This means that either the ad hoc standardizations should be changed and\/or additional card sort studies on these categories should be conducted. 
Still, any categories with an IRR &gt; 0.8 were included in the revised visualization, regardless of the number of participants in agreement.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Detailed results of the card sort, including a standardization grid, similarity matrix, dendrograms, and participant-centric analysis (PCA) can be viewed at <\/span><a href=\"https:\/\/www.optimalworkshop.com\/optimalsort\/tagp\/content-concepts-2018\/shared-results\"><span style=\"font-weight: 400\">https:\/\/www.optimalworkshop.com\/optimalsort\/tagp\/content-concepts-2018\/shared-results<\/span><\/a><span style=\"font-weight: 400\">. Note that these study results are password protected\u2014please email <\/span><a href=\"mailto:rachel@artsymail.com\"><span style=\"font-weight: 400\">rachel@artsymail.com<\/span><\/a><span style=\"font-weight: 400\"> for access.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Visualizations<\/span><\/span><\/h3>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9521\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome.png?resize=840%2C840\" alt=\"\" width=\"840\" height=\"840\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome.png?w=1024&amp;ssl=1 1024w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome.png?resize=150%2C150&amp;ssl=1 150w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome.png?resize=300%2C300&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome.png?resize=768%2C768&amp;ssl=1 768w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/p>\n<p><b>Figure 1<\/b>. 
A network graph of The Art Genome Project.<\/p>\n<p><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"alignnone size-full wp-image-9523\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome-labels.png?resize=840%2C840\" alt=\"\" width=\"840\" height=\"840\" srcset=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome-labels.png?w=1024&amp;ssl=1 1024w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome-labels.png?resize=150%2C150&amp;ssl=1 150w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome-labels.png?resize=300%2C300&amp;ssl=1 300w, https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome-labels.png?resize=768%2C768&amp;ssl=1 768w\" sizes=\"auto, (max-width: 840px) 100vw, 840px\" \/><\/p>\n<p><b>Figure 2<\/b>. The same network graph with labels.<\/p>\n<h1><b>Future Directions<\/b><\/h1>\n<p><span style=\"font-weight: 400\">Based on user research, the next design iteration in this project would be visualizing the \u201cSubject Matter\u201d gene cluster in further detail, styling and exporting it using the same methods, and then updating the node and edge tables with the results from the card study by adding deeper hierarchies for standardization categories with high IRR values. <\/span><\/p>\n<p><span style=\"font-weight: 400\">Ideally, this same user research would be conducted through additional unmoderated card sorts for each gene family until the entire genome has been studied. These results could then be used to update node and edge tables, creating a series of iterative visualizations that better reflect the mental models of TAGP\u2019s internal stakeholders and users. 
Once a shared model is established and visualized for these \u201cback-end\u201d users, the entire method could be repeated with a new target audience\u2014the auctioneers, artists, art lovers, critics, dealers, educators, museum-goers, patrons, and students who visit the front-end of the site to discover, learn about, and collect art. If TAGP aims to be truly user-friendly, it must be understandable to this audience in addition to the genomers who built it.<\/span><\/p>\n<h1><b>References<\/b><\/h1>\n<p><span style=\"font-weight: 400\">Sherwin, K. (2018, March 18). Card sorting: Uncovering users\u2019 mental models for better information architecture. Retrieved from <\/span><a href=\"https:\/\/www.nngroup.com\/articles\/card-sorting-definition\/\"><span style=\"font-weight: 400\">https:\/\/www.nngroup.com\/articles\/card-sorting-definition\/<\/span><\/a><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This report explores categorical and hierarchical data from The Art Genome Project (\u201cTAGP\u201d), Artsy\u2019s ongoing study into the characteristics and connections between artists and artworks, using Gephi, an open-source network analysis and visualization software. 
The goal of the resulting network visualization is to graphically represent abstract information hierarchies, providing a shared point of reference&hellip;<\/p>\n","protected":false},"author":471,"featured_media":9521,"comment_status":"open","ping_status":"open","sticky":true,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[150],"tags":[39],"coauthors":[],"class_list":["post-9515","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-projects","tag-gephi"],"jetpack_featured_media_url":"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infovis\/wp-content\/uploads\/sites\/3\/2018\/05\/genome.png?fit=1024%2C1024&ssl=1","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-2tt","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/9515","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/471"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=9515"}],"version-history":[{"count":3,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/9515\/revisions"}],"predecessor-version":[{"id":9524,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/9515\/revisions\/9524"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media\/9521"}],"wp:attachment":[{"href":"https:\/\/studentwo
rk.prattsi.org\/infovis\/wp-json\/wp\/v2\/media?parent=9515"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=9515"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=9515"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=9515"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}