{"id":8644,"date":"2018-03-01T11:57:14","date_gmt":"2018-03-01T16:57:14","guid":{"rendered":"http:\/\/studentwork.prattsi.org\/infovis\/?p=8644"},"modified":"2018-05-09T10:29:43","modified_gmt":"2018-05-09T14:29:43","slug":"tagp-tableau","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/projects\/tagp-tableau\/","title":{"rendered":"Visualizing Dynamic Hierarchies in The Art Genome Project"},"content":{"rendered":"<h1><b>Introduction<\/b><\/h1>\n<p><span style=\"font-weight: 400\">This report explores analytical data from The Art Genome Project, Artsy\u2019s ongoing study of the characteristics of and connections between artists and artworks, using Tableau Public, a free data visualization tool. The goal of the resulting dashboard is to reveal salient features of quantitative genomic data that will aid in prioritizing content for future information architecture restructuring efforts.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Background<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">Artsy\u2019s mission is to make all the world\u2019s art accessible to anyone with an internet connection. By partnering with the world\u2019s leading art auctions, galleries, fairs, and museums, the site functions as a free and powerful resource for users interested in art collecting and education. At present, the platform features more than 800,000 works of art and design by more than 80,000 artists.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The Art Genome Project (\u201cTAGP\u201d) is the classification system and technological framework that fuels Artsy. TAGP maps the characteristics, referred to as &#8220;genes&#8221; in-house, that connect artists, artworks, architecture, and design objects across time. There are currently over 1,300 genes of more than 40 types covering art-historical movements, subject matter, formal qualities, and more. 
Importantly, unlike tags, which are binary (something is either tagged &#8220;chair&#8221; or not), a gene is scored on a scale from 0 to 100 and then applied by hand to an artist or artwork record by an expert contributor. While TAGP also uses tags to highlight specific iconography, motifs, and subject matter, dynamic genoming allows for greater nuance when capturing the conceptual and formal aspects of cultural heritage records.<\/span><\/p>\n<p><span style=\"font-weight: 400\">TAGP is a semi-structured controlled vocabulary; however, it does not comply with ISO 25964 or related standards (ISO, n.d.), and its information hierarchy is relatively flat, going two levels deep at most. To meet these standards and create a deep taxonomy that is truly user-friendly, one of the first steps is to visualize its current structure and identify which areas are working and which need improvement, so that future iterations can be prioritized.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Inspiration<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">Stephen Few\u2019s guidelines for representing quantitative data in <\/span><i><span style=\"font-weight: 400\">Now You See It<\/span><\/i><span style=\"font-weight: 400\"> (2009) provided both inspiration and reference for creating the line graphs in this dashboard. To quote Few: \u201cIf your objective is to see how quantitative values have changed during a continuous period of time, nothing works better than a line graph. Lines work better than any other means to make visible the sequential flow of values as they have changed with the passage of time\u201d (2009, p. 150). Although Few notes that the great deal of overplotting in Figure 7.19 \u201cWebsite Visitors by Hour\u201d makes comparison difficult (2009, p. 
154), it still served as inspiration for \u201cFollowers by Gene Family\u201d in the dashboard.<\/span><\/p>\n<p><span style=\"font-weight: 400\">In the search for a Tableau-friendly way to visualize a network, the space-filling treemap technique that Ward, Grinstein, and Keim recommend for displaying hierarchical structures in <\/span><i><span style=\"font-weight: 400\">Interactive Data Visualization: Foundations, Techniques, and Applications<\/span><\/i><span style=\"font-weight: 400\"> (2010, p. 273, Figure 8.2) provided inspiration for \u201cGene Hierarchies.\u201d The packed bubble chart in the dashboard was influenced by Ken Flerlage\u2019s <\/span><i><span style=\"font-weight: 400\">Word Usage in Sacred Texts<\/span><\/i><span style=\"font-weight: 400\"> (2017).<\/span><\/p>\n<h1><b>Methodology<\/b><\/h1>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Data Collection and Transformation<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">This dataset was created by running SQL queries in Looker, a proprietary data analytics platform, against Artsy\u2019s AWS Redshift data warehouse. Rather than attempting to write a single, complex query that would denormalize Artsy\u2019s relational data into tabular data better suited for visualization in Tableau, the author exported the relational data to a local CSV file for further transformation in OpenRefine, an open-source desktop application for data wrangling.<\/span><\/p>\n<p><span style=\"font-weight: 400\">Using OpenRefine, the dataset was transposed to turn the count of followers of a given gene each month into tabular data. Text faceting was used to consolidate gene types from 40 to 30 in order to reduce overcrowding in the visualizations. Beyond transposing and faceting, no further data transformation or wrangling was required. 
The dataset was then exported to CSV for use in Tableau Public.<\/span><\/p>\n<h3><span style=\"text-decoration: underline\"><span style=\"font-weight: 400\">Visualization Creation<\/span><\/span><\/h3>\n<p><span style=\"font-weight: 400\">After connecting the dataset to Tableau Public (Desktop), the author spent considerable time experimenting with the drag-and-drop interface. Tableau Help provided excellent documentation for novice users such as the author, particularly articles on <\/span><a href=\"https:\/\/onlinehelp.tableau.com\/current\/pro\/desktop\/en-us\/buildexamples_treemap.html\"><span style=\"font-weight: 400\">building treemaps<\/span><\/a><span style=\"font-weight: 400\"> and <\/span><a href=\"https:\/\/onlinehelp.tableau.com\/current\/pro\/desktop\/en-us\/qs_hierarchies.html\"><span style=\"font-weight: 400\">creating custom hierarchies<\/span><\/a><span style=\"font-weight: 400\">. The dimensions \u201cGene Family,\u201d \u201cGene Type,\u201d and \u201cGene Name\u201d were combined into one such custom hierarchy, which is used by every visualization in the dashboard.<\/span><\/p>\n<p><span style=\"font-weight: 400\">The \u201cGene Hierarchies\u201d treemap was created first. Size represents the count of distinct gene names, because the tabular formatting introduced many duplicate names. Color represents gene families and is carried through the subsequent visualizations. The \u201cFollowers by Gene Family\u201d line graph plots the sum of followers on the vertical axis against Tableau\u2019s automatically generated date hierarchy on the horizontal axis, with the custom gene hierarchy as its dimension. 
Since \u201cFollowers by Gene Family\u201d sums followers at the topmost level of the gene hierarchy, \u201cFollowers by Genes\u201d was created to illustrate the most-followed individual genes at the lowest level of the hierarchy.<\/span><\/p>\n<h1><b>Results<\/b><\/h1>\n<p>[iframe src=&#8221;https:\/\/public.tableau.com\/views\/lis-658-lab2\/GeneDashboard?:showVizHome=no&amp;:embed=true&#8221; width=&#8221;90%&#8221; height=&#8221;500&#8243;]<\/p>\n<p><span style=\"font-weight: 400\">Not only does the \u201cStyle and Movement\u201d gene family contain the greatest number of distinct genes, it also has the most followers, regardless of month, quarter, or year. Of all the genes, \u201cEmerging Art\u201d from the \u201cStyle and Movement\u201d gene family has the largest number of followers. Interestingly, \u201cPhotography\u201d and \u201cPrints,\u201d two material genes, along with \u201cDesign\u201d and \u201cLatin America and the Caribbean,\u201d rise toward the top of the most-followed genes amid a sea of style and movement genes.<\/span><\/p>\n<h1><b>Future Directions<\/b><\/h1>\n<p><span style=\"font-weight: 400\">With more time and resources, the author envisions connecting Tableau to Artsy\u2019s Amazon Redshift database to keep the data fresh. Although outside the scope of this data research agreement, adding more dimensions, such as gene pageviews, conversions, entrances, and search query frequencies, would create better metrics for identifying problems in TAGP\u2019s structure than relying solely on the count of followers. For instance, is there a high bounce rate on certain gene pages and, if so, is this an issue of inaccurate labeling or of poor content and layout? Are search queries suggesting what users want and cannot find in current genes? If so, should these be translated into new genes? By adding more analytical data to the dashboard, many of these questions can be answered in the future. 
Adding more granular analytical data would also open up opportunities for new dashboard visualizations, such as radar graphs for gene pageviews.<\/span><\/p>\n<h1><b>References<\/b><\/h1>\n<p><span style=\"font-weight: 400\">Flerlage, K. (2017). <\/span><i><span style=\"font-weight: 400\">Word usage in sacred texts<\/span><\/i><span style=\"font-weight: 400\">. Retrieved from http:\/\/www.kenflerlage.com\/2017\/07\/word-usage-in-sacred-texts.html<\/span><\/p>\n<p><span style=\"font-weight: 400\">Few, S. (2009). <\/span><i><span style=\"font-weight: 400\">Now you see it: Simple visualization techniques for quantitative analysis<\/span><\/i><span style=\"font-weight: 400\">. Oakland, CA: Analytics Press.<\/span><\/p>\n<p><span style=\"font-weight: 400\">International Organization for Standardization. (n.d.). <\/span><i><span style=\"font-weight: 400\">The international standard for thesauri and interoperability with other vocabularies<\/span><\/i><span style=\"font-weight: 400\"> (ISO No. 25964). Retrieved from <\/span><a href=\"http:\/\/www.niso.org\/schemas\/iso25964\"><span style=\"font-weight: 400\">http:\/\/www.niso.org\/schemas\/iso25964<\/span><\/a><\/p>\n<p><span style=\"font-weight: 400\">Ward, M. O., Grinstein, G., &amp; Keim, D. (2010). <\/span><i><span style=\"font-weight: 400\">Interactive data visualization: Foundations, techniques, and applications<\/span><\/i><span style=\"font-weight: 400\">. Natick, MA: A K Peters, Ltd.<\/span><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction This report explores analytical data from The Art Genome Project, Artsy\u2019s ongoing study into the characteristics and connections between artists and artworks, using Tableau Public, a free data visualization software. 
The goal of the resulting dashboard is to reveal salient features of quantitative genomic data that will aid in prioritizing content for future information&hellip;<\/p>\n","protected":false},"author":471,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"template-fullwidth.php","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[150],"tags":[39,6,26,23,12],"coauthors":[],"class_list":["post-8644","post","type-post","status-publish","format-standard","hentry","category-projects","tag-gephi","tag-line-graph","tag-packed-bubbles","tag-tableau","tag-tree-map"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-2fq","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/8644","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/471"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=8644"}],"version-history":[{"count":9,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/8644\/revisions"}],"predecessor-version":[{"id":9518,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/8644\/revisions\/9518"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media?parent=8644"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\
/wp-json\/wp\/v2\/categories?post=8644"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=8644"},{"taxonomy":"author","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=8644"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}