{"id":4783,"date":"2016-06-21T10:30:18","date_gmt":"2016-06-21T14:30:18","guid":{"rendered":"http:\/\/research.prattsils.org\/?p=4783"},"modified":"2016-06-21T10:30:18","modified_gmt":"2016-06-21T14:30:18","slug":"baseball-champions-network","status":"publish","type":"post","link":"https:\/\/studentwork.prattsi.org\/infovis\/visualization\/baseball-champions-network\/","title":{"rendered":"The Baseball Champions Network"},"content":{"rendered":"<p><strong>Introduction<\/strong><\/p>\n<p>In exploring network visualizations, I thought it would be interesting to build a network graph of relationships among baseball teammates.\u00a0 Specifically, I considered the case of teammates on MLB world championship teams.\u00a0 For the purpose of this exercise, I limited myself to champions since the advent of free agency, in 1977.\u00a0 The introduction of free agency to baseball has meant an increasing mobility of players between teams, which could potentially be reflected in a network graph, showing associations with multiple championship teams by certain players.\u00a0 I also expected that the network graph would highlight certain baseball dynasties, such as the New York Yankees of the late 1990s to early 2000s, and the San Francisco Giants of the last decade.<\/p>\n<p><strong>Inspirations<\/strong><\/p>\n<p>For the purpose of this exercise, a force-directed graph seemed to be most appropriate.\u00a0 As in the image below, my expectation was to see significant clustering for individual championship teams, with a small number of players acting as bridges between teams.\u00a0 Unlike the below image, where we see a single central node for each cluster, every member of a team is connected to every other teammate, with edges connecting the entire team.\u00a0 As a result, I would expect an even tighter clustering for individual teams.<\/p>\n<p><a 
href=\"http:\/\/support.sas.com\/documentation\/cdl\/en\/grnvwug\/62918\/HTML\/default\/viewer.htm#p0q343kxjyj36jn1e2z6lulkda3j.htm\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/i0.wp.com\/support.sas.com\/documentation\/cdl\/en\/grnvwug\/62918\/HTML\/default\/images\/nvgrph-force.gif?resize=360%2C317\" alt=\"Force-directed network graph\" width=\"360\" height=\"317\" \/><\/a><\/p>\n<p>The below graph is an example of a similar project, where individual baseball players are linked based on having played on the same team.\u00a0 We see a number of effects that I would expect in my own visualization, such as clustering of players who remained on the same team, with bridges of players who moved between teams.\u00a0 Additionally, I note the use of color to differentiate between clusters.\u00a0 I would expect this to be highly correlated to the individual teams played for, and it would be especially suited to highlight dynasties in my own visualization.\u00a0 A key difference between this network graph and my own project is my focus on individual championship teams, whereas the below graph represents the entire player universe over a shorter timespan.\u00a0 Looking at championships adds an element of increased familiarity as these have been the most visible teams and players, and also places a historical context onto the network as it draws these relationships over several decades.<\/p>\n<p><a href=\"https:\/\/griffsgraphs.wordpress.com\/tag\/network\/\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/griffsgraphs.files.wordpress.com\/2012\/07\/baseball_players_label_lowres.jpg?resize=840%2C840\" alt=\"Baseball players network graph\" width=\"840\" height=\"840\" \/><\/a><\/p>\n<p>To highlight team affiliations, it seemed appropriate to apply team colors to the clusters which my network graph revealed.\u00a0 The image below illustrates team colors for the 30 
MLB teams, although the Florida Marlins used a different set of colors when they won their championships in 1997 and 2003.\u00a0 For this graph, I selected the original teal colors for those Marlins teams.<\/p>\n<p><a href=\"http:\/\/chenglor55.deviantart.com\/art\/Major-League-Baseball-team-logos-436736322\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/i0.wp.com\/pre05.deviantart.net\/c07f\/th\/pre\/f\/2014\/056\/f\/0\/major_league_baseball_team_logos_by_chenglor55-d780rwi.png?resize=840%2C586\" alt=\"Baseball logos and colors\" width=\"840\" height=\"586\" \/><\/a><\/p>\n<p><strong>Materials<\/strong><\/p>\n<p>The data were collected from the baseball statistical resource <a href=\"http:\/\/www.baseball-reference.com\/\">www.baseball-reference.com<\/a>.\u00a0 I looked up the team roster for every championship team between 1977 and 2015, including every player who played at least one game for that team during the regular season.\u00a0 The data were cleaned to remove extraneous characters representing the handedness of individual players.\u00a0 I then processed the data to create a pairwise link between every two players on the same championship team.\u00a0 Because the teammate relationship is symmetric, the connections are undirected, and the network has a single mode: individual players.<\/p>\n<p>This dataset was then imported into the network visualization software Gephi, which was used to plot the data as a force-directed network graph.\u00a0 Individual node size was set to represent the number of connections for that player, with the expectation that this would highlight players on multiple championship teams.\u00a0 I then ran a clustering algorithm to break the nodes up into individual teams, and applied team colors to each cluster.\u00a0 Finally, node labels were applied to identify the players represented, with label size proportional to their number of connections.\u00a0 Below is the 
visualization that resulted:<\/p>\n<div id=\"attachment_4875\" style=\"width: 620px\" class=\"wp-caption aligncenter\"><a href=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2016\/06\/Network-of-Champions.png\"><img data-recalc-dims=\"1\" loading=\"lazy\" decoding=\"async\" aria-describedby=\"caption-attachment-4875\" class=\"size-medium wp-image-4875\" src=\"https:\/\/i0.wp.com\/studentwork.prattsi.org\/infoshow\/wp-content\/uploads\/sites\/2\/2016\/06\/Network-of-Champions-620x620.png?resize=620%2C620\" alt=\"Network of Championship Teammates, 1977-2015\" width=\"620\" height=\"620\" \/><\/a><p id=\"caption-attachment-4875\" class=\"wp-caption-text\">Network of Championship Teammates, 1977-2015<\/p><\/div>\n<p>&nbsp;<\/p>\n<p><strong>Results<\/strong><\/p>\n<p>The network graph displays a number of interesting features.\u00a0 Firstly, as expected, individual teams show significant clustering, appearing as very tight bundles in the image.\u00a0 One can also clearly observe the bridge players, typically occupying the spaces between teams.\u00a0 Coloring according to team clusters also achieved the desired effect of highlighting championship dynasties, especially the Yankees dynasty which dominates the center of the graph, in addition to the aforementioned Giants on the far right, as well as significant presence for the Boston Red Sox on the bottom right, and the St. 
Louis Cardinals on the top right.<\/p>\n<p>An unanticipated outcome was a more or less linear progression across time.\u00a0 This graph can be read from left to right, with 1970s championship teams appearing on the far left, 2010s teams on the right, and the 1990s-2000s Yankees appearing in the middle of the timeline.\u00a0 This arrangement makes sense: players are less connected across decades than within them, since a playing career rarely lasts longer than 15 years or so.\u00a0 The force-directed layout therefore naturally pushes the more historically disconnected players toward opposite ends.\u00a0 This lends the graph a natural historical readability, enhancing the information it represents.<\/p>\n<p>There are, however, a number of issues with the visualization.\u00a0 The presence of dark team colors, especially for the Yankees, makes the node labels difficult to read in places.\u00a0 The very tight clustering results in a great deal of overlap between labels, which further affects readability.\u00a0 Although some color values were adjusted to differentiate between teams, there is still a great deal of similarity between some team colors, such as the reds of the Red Sox, Cardinals, and Minnesota Twins, and the oranges of the Giants and the Baltimore Orioles.\u00a0 While the player names help disambiguate these teams for readers who are familiar with them, the image itself lacks a certain clarity as a result.<\/p>\n<p>I also note that, while the clustering was effective at identifying individual teams, it was not quite perfect, and there are several instances of different teams being grouped together.\u00a0 This should be resolvable by adjusting the degree of clustering.<\/p>\n<p>Finally, because players are identified by name alone, it is possible for one node to represent two different players who share a name.\u00a0 One notable instance is the node for Dave Roberts in the bottom center of the 
graph, who is connected to both the 1979 Pittsburgh Pirates and the 2004 Boston Red Sox.<\/p>\n<p><strong>Future Directions<\/strong><\/p>\n<p>There are a number of adjustments that could be made to address the issues mentioned with this graph.\u00a0 Regarding the data, it would be advantageous to use unique player IDs in order to disambiguate players such as Dave Roberts, and have these IDs associated with labels representing the actual player names.\u00a0 Colors should be adjusted to avoid dark-on-dark issues, and to better differentiate between teams.\u00a0 Placing a text label with the team name next to each team cluster would also help.<\/p>\n<p>The issue of overlapping labels is a bit trickier, but could be addressed by converting the visualization to an interactive format where labels only appear when the user selects an individual node.<\/p>\n<p>Finally, I would be interested to see the resulting graph if this exercise were extended over the whole of baseball history.\u00a0 New dynasties would emerge and new players would be highlighted.\u00a0 However, accomplishing this would require developing a script to process the data, as the method used here was manual and very time-consuming.\u00a0 Using a base dataset from a relational database could make this process more streamlined.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Introduction In exploring network visualizations, I thought it would be interesting to build a network graph of relationships among baseball teammates.\u00a0 Specifically, I considered the case of teammates on MLB world championship teams.\u00a0 For the purpose of this exercise, I limited myself to champions since the advent of free agency, in 1977.\u00a0 The introduction 
of&hellip;<\/p>\n","protected":false},"author":171,"featured_media":4875,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"jetpack_post_was_ever_published":false,"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1],"tags":[78,81,89],"coauthors":[],"class_list":["post-4783","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-visualization","tag-baseball","tag-mlb","tag-network-graphs"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"jetpack_shortlink":"https:\/\/wp.me\/paBdcV-1f9","_links":{"self":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/4783","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/users\/171"}],"replies":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/comments?post=4783"}],"version-history":[{"count":0,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/posts\/4783\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/"}],"wp:attachment":[{"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/media?parent=4783"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/categories?post=4783"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/tags?post=4783"},{"taxonomy":"author","embeddable":true,"href":"ht
tps:\/\/studentwork.prattsi.org\/infovis\/wp-json\/wp\/v2\/coauthors?post=4783"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}