2017 Houston Astros Regular Season


Final Projects, Visualization

Introduction

Visualizing sports data is popular among sports fans, and sports stats and records have been visualized in many ways. These visualizations help users catch trends quickly, and they entertain in a way that plain numerical statistics do not. For example, MLB.com releases a postseason projection graph around September. Based on data from Fangraphs.com, the graph shows each team’s probability of winning its division, winning a wild card, or earning any postseason berth. This helps users picture which team has the highest or lowest probability, and how a team’s postseason chances have changed. The graph responds dynamically as users move the cursor along a line, so they can see how the probabilities changed over the season with little effort.

2017 MLB Postseason Projection Graph (source: MLB.com)

 

In this project, I used Tableau Desktop to visualize recent baseball data: the 2017 Houston Astros regular season. The Houston Astros won the 2017 World Series, the first title in franchise history; the franchise began play in 1962 and has been known as the Astros since 1965. Before 2017, the team’s best postseason result was reaching the 2005 World Series, where it was swept 0-4 by the Chicago White Sox. Besides the championship, the Astros performed outstandingly throughout the season. They recorded 101 wins, their first 100-win season since 1998, and Jose Altuve, the Astros’ second baseman, won the American League MVP award. This followed years of extensive rebuilding: the team lost 111 games in 2013 after trading away nearly all of its veteran players in return for prospects. The Astros’ average payroll in 2017 ranked 19th among the 30 MLB teams. My visualization reviews the Astros’ 2017 highlights, events, and the performance of major players. The final visualization is published on the Tableau Public server.

Methodology

1. Design Choices

Visualizations of sports statistics appeal to sports fans. For a sports visualization, it is also important to stimulate users’ curiosity and interest.

The following infographic highlights the 2012 San Francisco Giants, the year the team won the World Series. Although it was designed for print and is not dynamic, it influenced my choice of topic and supporting ideas. The upper half shows the records, events, and highlights of the 2012 Giants, and the bottom half presents the 2012 postseason games. A bar chart is a common way to present scores or winning percentage; in addition, the author added a dotted timeline below the bar chart, in which each dot represents the win or loss of one game. Below the timeline, the author presented significant numbers in a larger font size to draw the audience’s attention. In terms of design, the infographic limits its palette to the Giants’ representative orange, black, and cream. The author mixed graphics and text harmoniously and used simple layouts to handle various types of data.

San Francisco Giants: champions by Alejandro Colmenarez (Source: http://visualoop.com/infographics/san-francisco-giants-champions)

Another visualization that I reviewed is “MLB Player Value”. It focuses only on each player’s value and the related data: salary and WAR (Wins Above Replacement). It does not present players’ stats or other elements, so users focus on each player through the given XVAL value.

MLB Player Value (XVAL) (Source: http://www.osmguy.com/2013/04/mlb-player-value-tableau-public-elite-8-contest-entry/)

Having reviewed two other baseball visualizations, I determined what to show in my visualization, which presents an overview of the Astros’ regular season:

  • 2017 regular season wins and losses
  • Timeline: Present events of the Astros’ 2017 season sequentially
  • Salary and WAR (Wins Above Replacement) of active players
  • Home runs of each player: Visualize the distance, angle, and number of home runs in two graphs
  • Stats of top 10 hitters and pitchers

2. Datasets and Design Process

Raw MLB datasets are mostly open to the public and are well organized and maintained by sports media companies. I downloaded the basic datasets from various sources, then cleaned some of them and joined them with other datasets.

1) 2017 Regular season wins and losses 

The 2017 Houston Astros Media Information report from MLB.com contains all records and stats of the 2017 season. I converted the Astros’ day-by-day table in the report to a spreadsheet. To clean it up, I changed formats, removed redundant characters (asterisks and at signs), and separated the column of win-loss records into three columns: score, opponent score, and win-loss. The timeline events are also taken from the same report.
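
The column split described above could be sketched in Python roughly as follows. This is only an illustration: the raw format of the day-by-day table is not reproduced here, so the input shape ("W 5-3", possibly with stray "*" or "@" characters) and the function name split_result are assumptions, not the actual spreadsheet steps.

```python
import re

# Assumed raw format: an outcome letter followed by two scores, e.g. "W 5-3".
RESULT_PATTERN = re.compile(r"^([WL])\s*(\d+)-(\d+)$")


def split_result(raw):
    """Split one win-loss record into (score, opponent_score, outcome)."""
    # Drop the redundant marker characters before parsing.
    cleaned = raw.replace("*", "").replace("@", "").strip()
    match = RESULT_PATTERN.match(cleaned)
    if match is None:
        raise ValueError(f"unrecognized result: {raw!r}")
    outcome = match.group(1)
    first, second = int(match.group(2)), int(match.group(3))
    # The outcome letter decides which number belongs to the team,
    # so the order of the two scores in the raw text does not matter.
    if outcome == "W":
        return max(first, second), min(first, second), outcome
    return min(first, second), max(first, second), outcome
```

For example, split_result("W 5-3*") yields a score of 5, an opponent score of 3, and the outcome "W".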

To show the Astros’ score and the opponent’s score in the same column, I added a calculated field holding the negative of the opponent’s score, so that the graph presents the opponent’s score on the opposite side of the x-axis.
For the timeline graph, I also created a calculated field of MIN(0) so that all data points are aligned horizontally along the x-axis.
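
The same two ideas can be illustrated outside Tableau with a small Python sketch. The field names "score" and "opp_score" are hypothetical, and the constant 0 simply plays the role of the MIN(0) field.

```python
def diverging_scores(games):
    """Return (team_value, opponent_value) pairs for a diverging bar chart.

    The opponent's score is negated so its bar grows in the opposite
    direction from the team's bar.
    """
    return [(g["score"], -g["opp_score"]) for g in games]


def timeline_positions(games):
    """Place every game at the same height (0) so marks sit on one line."""
    return [0 for _ in games]
```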

2) Home Runs per Ball Park during 2017 Season

I wanted to show 1) the ball parks in which the Houston Astros hit home runs in 2017, and 2) the physical home run data plotted on a baseball field image, helping users grasp roughly how far and in which direction the home runs went.

Ball parks in which the Astros hit one or more home runs are presented on the U.S. map. I downloaded a JSON file of MLB stadium addresses with longitude and latitude and converted it to CSV format. In Tableau, the longitude and latitude values are assigned geographic roles so that they can be plotted on the map. Team logos are saved as custom shapes and assigned to each stadium manually.
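
A JSON-to-CSV conversion of this kind might look like the sketch below. The structure of the stadium file is not shown in this write-up, so the assumption here is a JSON array of objects, and the field names ("name", "team", "lat", "lng") are placeholders.

```python
import csv
import io
import json


def stadiums_json_to_csv(json_text, fields=("name", "team", "lat", "lng")):
    """Convert a JSON array of stadium objects to CSV text.

    Only the listed fields are kept; other keys (such as an address)
    are ignored, and missing keys become empty cells.
    """
    records = json.loads(json_text)
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(fields),
        extrasaction="ignore",
        lineterminator="\n",
    )
    writer.writeheader()
    for record in records:
        writer.writerow(record)
    return buffer.getvalue()
```

The resulting text can be saved to a .csv file and connected in Tableau as a regular text data source.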

To mark the positions of homers on the field map, I used home run data from ESPN Home Run Tracker. The table called “2017 Season Home Runs” contains the horizontal and elevation angles and true distances of all home runs. The x and y coordinates of each home run are obtained from the horizontal angle and true distance using trigonometric functions. The x and y values are created in calculated fields using the following equations:

  • x = -([True Dist.]*COS(RADIANS([Horiz. Angle])))
  • y = [True Dist.]*SIN(RADIANS([Horiz. Angle]))
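
For readers who want to check the coordinates outside Tableau, the two calculated fields translate directly into Python. This is a sketch that mirrors the formulas above; the function and parameter names are my own.

```python
import math


def home_run_xy(true_distance, horizontal_angle_deg):
    """Convert a home run's distance and horizontal angle to x/y coordinates.

    Mirrors the calculated fields: x is negated, y is not.
    """
    angle = math.radians(horizontal_angle_deg)
    x = -(true_distance * math.cos(angle))
    y = true_distance * math.sin(angle)
    return x, y
```

With these formulas, a ball hit at a horizontal angle of 90 degrees lands on the y-axis, and the distance from the origin to (x, y) always equals the true distance.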

The field image is a generic image that helps users understand the chart.

At first I created those two charts separately. Then I found a shared column, ball parks, in the two datasets: Houston Astros home runs and MLB ball park locations. I joined the two tables in the data source pane and set the ball park map as a filter.
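
Conceptually, the join attaches a stadium's location to every home run hit there. A minimal Python sketch is shown below; the key name "ballpark" and the location fields are hypothetical, and the sketch assumes an inner join, which drops home runs whose ball park has no match.

```python
def join_on_ballpark(home_runs, stadiums):
    """Attach each stadium's coordinates to the home runs hit there."""
    # Index the stadium rows by the shared column for quick lookup.
    by_park = {s["ballpark"]: s for s in stadiums}
    joined = []
    for hr in home_runs:
        stadium = by_park.get(hr["ballpark"])
        if stadium is None:
            continue  # inner-join behavior: skip unmatched rows
        joined.append({**hr, "lat": stadium["lat"], "lng": stadium["lng"]})
    return joined
```

Once both datasets share one row per home run, selecting a ball park on the map can filter the home runs drawn on the field chart.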

3) Salary vs. WAR

WAR (Wins Above Replacement) summarizes a player’s total contributions to their team in one statistic. It is all-inclusive and provides a useful reference point for comparing players. It is not an absolute index of a player’s performance, but at least we can say that a player with a WAR of 5.0 is more valuable than one with 4.0. In general, the scale for a single season is:

  • 8+ MVP Quality
  • 5+ All-Star Quality
  • 2+ Starter
  • 0-2 Reserve
  • < 0 Replacement Level
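
The scale above can be expressed as a small lookup function. This is a sketch; the list does not say how the boundary values are treated, so the assumption here is that a value exactly on a threshold (for example 2.0) belongs to the higher tier.

```python
def war_tier(war):
    """Map a single-season WAR value to the tier names listed above."""
    if war >= 8:
        return "MVP Quality"
    if war >= 5:
        return "All-Star Quality"
    if war >= 2:
        return "Starter"
    if war >= 0:
        return "Reserve"
    return "Replacement Level"
```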

I placed salary on the columns and WAR on the rows, and added average lines to help users roughly understand which players were paid more or less and which performed better or worse.
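
The two average lines split the chart into four quadrants. The sketch below shows how a player's quadrant could be computed; the field names are hypothetical, and treating a value exactly equal to the average as "below" is an arbitrary choice for this example.

```python
from statistics import mean


def salary_war_quadrants(players):
    """Label each player relative to the average salary and average WAR.

    Returns a dict mapping player name to a (salary_side, war_side) pair,
    where each side is "above" or "below" the corresponding average.
    """
    avg_salary = mean(p["salary"] for p in players)
    avg_war = mean(p["war"] for p in players)
    quadrants = {}
    for p in players:
        salary_side = "above" if p["salary"] > avg_salary else "below"
        war_side = "above" if p["war"] > avg_war else "below"
        quadrants[p["name"]] = (salary_side, war_side)
    return quadrants
```

A player labeled ("below", "above") was paid less than average but contributed more than average, which is the quadrant the chart makes easy to spot.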

4) Top 10 Hitters and Pitchers based on WAR

These charts provide the 2017 stats of the Astros’ top 10 hitters and pitchers, selected and ordered by WAR. The behavior of the filter can be changed under “Actions” on the dashboard; I applied “Exclude all values”. With this setting the filter works in the opposite way to the default: the stat fields are empty unless a specific player is selected, rather than showing every player at once. This prevents users from seeing the sum of all values, which is not relevant in this case.
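
Selecting and ordering players by WAR amounts to a sort followed by a cut. A minimal sketch, with a hypothetical "war" field on each player record:

```python
def top_by_war(players, n=10):
    """Return the n players with the highest WAR, best first."""
    return sorted(players, key=lambda p: p["war"], reverse=True)[:n]
```

The function would be applied to hitters and pitchers separately, since the dashboard lists the two groups in separate charts.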

User Research

First draft for user research

1. Methods

User Research Methods 

After finishing the first draft, I conducted user tests for further iteration: two in-person sessions combining interview and observation, and two remote user tests. The remote tests were unmoderated and took place in a natural setting. In general, infographics and visualizations are not explained or guided by the author or a third party; users perceive them as they read. The in-person test is beneficial in that I can receive immediate feedback, make observations, and have a conversation during the test. During the observation I can also notice which parts users struggle with at first.

For the in-person sessions, I gave the user brief information about the topic, observed first, and then interviewed them verbally. For the remote tests, I emailed the link to two users and asked them to give any type of feedback from the following points of view:

  • Design
  • Content
  • Composition / structure
  • Usability

Test user profiles 

I recruited four users with different background knowledge of and levels of enthusiasm for baseball.

  • User 1: Diehard lifetime Houston Astros fan
  • User 2: Watches some interesting or major MLB games (e.g. post-season games), but not an Astros fan
  • User 3: Does not watch MLB games, but enjoys other baseball leagues regularly
  • User 4: Does not watch baseball games regularly, but is able to read and understand baseball statistics. This user is interested in sports visualizations for informative purposes.

2. Findings and Takeaways

All the feedback from the user tests was first listed in no particular order. It was then broken down into four categories (content, design, structure, and usability), and the issues were prioritized for the next revision. This kind of analysis is usually done with an affinity diagram. A common way of creating an affinity diagram is to use sticky notes of different colors: each issue is written on one sticky note and classified into one of a few categories, and each color can represent a category or section (e.g. the part in which the problem occurs). An affinity diagram is useful when multiple people work together as a group. Borrowing the idea of the affinity diagram, I colored each issue according to its category and later combined them to draw findings from the user tests.

User research results

  • Clarifying the target audience is the first thing to consider for a visualization.
    : This guides the preparation of the right datasets and the design choices. If the visualization targets Astros fans, they want interesting subjects rather than basic, well-known information. They want to see how much the Astros surpassed other teams or previous years. If the target audience is people who are interested in baseball in general or who have little knowledge of baseball, the subjects should be more informative and detailed. Just as designers create personas at an early stage of design, I learned that a persona of the target audience should be kept in mind throughout the visualization process.
  • Supporting data should be balanced.
    : Subjects in the first draft leaned toward batting stats, which made the Win-Loss graph seem out of place. If the vis shows general stats, the subjects (pitching and hitting) should be balanced.
  • For interactive visualization, indicate to users what actions they can take.
    : Users may not be familiar with hovering to see tooltips or clicking symbols to filter the data. In an interactive visualization, users need cues for further actions. Telling them what actions they can take gives a better user experience.
  • Provide a glossary for novice users.
    : If using abbreviations is unavoidable, provide a glossary in an accessible form.

Iteration

I revised and redesigned the first draft reflecting the takeaways from the user research.

  • Add a brief description of the main purpose and content of the visualization to deliver a clear message to users
  • Add a glossary of baseball stat abbreviations for users with little knowledge of baseball
  • Create a timeline of significant Houston Astros events and news in 2017 to enhance the qualitative aspects
  • Divide the subjects visually so that users know which graphs are related and linked
  • Include pitching stats and pitchers’ WAR and salary so that the subjects are balanced around a single topic
  • Replace fonts with web-safe fonts. If a font used in the vis is not installed on a user’s computer, Tableau substitutes an alternative font, which may disrupt the layout.

Click here or the image below to see the final visualization in Tableau.

Final visualization

 

Future Development

As I learned from the user research, the target audience of a sports visualization should be defined at an early stage. This project is an informative visualization that layers general stats in various ways, aimed at a general audience rather than at so-called baseball fans. My next project will be something that draws the attention of avid baseball fans, which will require a more comprehensive understanding of baseball and MLB. Adding a compelling storyline to the visualization will be another challenge.