I visualized the total home runs of each player in Texas Rangers (MLB) history. The franchise was established in 1961 as the Washington Senators and moved to Arlington, Texas after the 1971 season. For this lab assignment, I used only data from 1972 onward, which covers the current Texas Rangers records.
1. Visualization References
Since baseball is all about statistics, there are plenty of data visualizations to draw on. I referred to several baseball-related articles, especially ones made with Tableau.
- The History of the Single Season Home Run Record: the author visualized the single-season home run records since 1927 as the cumulative number of home runs per game. The author also visualized the home runs of Babe Ruth and of the players who later broke the record. This visualization was made with Tableau.
- Visualization: Miguel Cabrera and his 44 Triple Crown-winning home runs: The author plotted all 44 of Cabrera’s home run flight paths in 2012, as viewed from the side at ground level.
- How Mariano Rivera Compares to Baseball’s Best Closers: Mariano Rivera is a former Yankees relief pitcher who retired in 2013. The graph shows Mariano Rivera’s cumulative saves alongside those of other pitchers with 100 or more career saves. Starting from a hand-drawn sketch, the chart was created using R. The R chart was then cleaned up and annotated in Adobe Illustrator for publication. (source: http://blog.revolutionanalytics.com/2012/05/mariano-rivera-nyt.html)
2. Preparing the data
1) Source: Lahman’s database is a free, reliable source of baseball statistics. It has every player’s standard batting and pitching line for every year. Its limitation is that data is available only per season, with no monthly breakdowns.
2) File preparation: I used the batting file from the 2016 comma-delimited version of the database. It contains all batting-related data for every player since 1871. I filtered for Texas Rangers rows only, then copied and pasted them into a new csv file (batting_TEX.csv). The batting file identifies players by playerID rather than by name. Lahman’s database also offers master.csv, which has a profile of every player, so I join the two files later in Tableau.
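The filter-and-save step above could also be scripted instead of done by hand. The following is only a sketch: it assumes the batting file has `playerID`, `yearID`, `teamID`, and `HR` columns and that the Rangers are coded as `TEX`, and it runs on a few made-up rows rather than the real file. The helper names `filter_team` and `write_csv` are hypothetical.

```python
import csv
import io

# Tiny stand-in for the Lahman batting file; these rows are made up for illustration.
SAMPLE_BATTING = """playerID,yearID,teamID,HR
playera01,1971,WS2,10
playera01,1972,TEX,12
playerb01,1972,NYA,30
playerb01,1973,TEX,5
"""


def filter_team(rows, team_id="TEX", first_year=1972):
    """Keep only rows for one team from first_year onward."""
    return [
        row for row in rows
        if row["teamID"] == team_id and int(row["yearID"]) >= first_year
    ]


def write_csv(rows, fieldnames):
    """Serialize rows back to CSV text (what would be saved as batting_TEX.csv)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


reader = csv.DictReader(io.StringIO(SAMPLE_BATTING))
tex_rows = filter_team(list(reader))
tex_csv = write_csv(tex_rows, reader.fieldnames)
print(tex_csv)
```

With the real data, the `io.StringIO` objects would be replaced by files opened for reading and writing.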
3. Visualization using Tableau Public
1) Connect to a file: Load “batting_TEX.csv”.
2) Join & Merge: Drag the “master.csv” file into the canvas to join it. Users can choose an inner, left, right, or full outer join; I selected inner join. The two files share a column called “Player ID”. Since master.csv only has separate first and last name columns, I created a new column called “Name Full”. Select the two columns to be merged, right click, and select “Create Calculated Field”. Then I entered [Name First] + " " + [Name Last] and named the column “Name Full”.
3) Create worksheet: Click “Worksheet” to create a new sheet. Drag “Year ID” to Columns and “HR” to Rows at the top of the worksheet. Drag “Name Full” to Color under Marks, and drag “Name Full” again to Label. To create a filter, drag “Name Full” to Filters.
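For readers who prefer to see the join outside Tableau, here is a rough equivalent in plain Python. It is a sketch under assumptions: the column names `playerID`, `nameFirst`, and `nameLast` are how I expect the raw csv headers to look, the rows are invented, and `inner_join_with_names` is a hypothetical helper, not anything Tableau runs internally.

```python
# Made-up rows standing in for batting_TEX.csv and master.csv.
batting = [
    {"playerID": "playera01", "yearID": 1972, "HR": 12},
    {"playerID": "playerb01", "yearID": 1973, "HR": 5},
    {"playerID": "playerz99", "yearID": 1974, "HR": 1},  # no profile row
]
master = [
    {"playerID": "playera01", "nameFirst": "Alex", "nameLast": "Able"},
    {"playerID": "playerb01", "nameFirst": "Ben", "nameLast": "Baker"},
    {"playerID": "playerc01", "nameFirst": "Cal", "nameLast": "Cole"},  # never batted
]


def inner_join_with_names(batting_rows, master_rows):
    """Inner join on playerID, adding a combined nameFull column."""
    profiles = {p["playerID"]: p for p in master_rows}
    joined = []
    for row in batting_rows:
        profile = profiles.get(row["playerID"])
        if profile is None:
            continue  # inner join: drop batting rows without a profile
        full_name = profile["nameFirst"] + " " + profile["nameLast"]
        joined.append({**row, **profile, "nameFull": full_name})
    return joined


joined = inner_join_with_names(batting, master)
for row in joined:
    print(row["yearID"], row["nameFull"], row["HR"])
```

An inner join keeps only players present in both files, which is why the unmatched rows on either side disappear from the result.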
4. Results
(Click the link in case you are not able to see the chart)
1) Yearly Texas Rangers Home Run Totals since 1972
2) Texas Rangers Home Runs since 1972: Running Total
On SUM(HR), I added a quick table calculation -> Running Total, so the graph shows each player's cumulative home runs for the Texas Rangers.
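As a sanity check on what the running total shows, the same calculation can be sketched in plain Python: sum home runs per player and season, then accumulate across seasons for each player. The player names and numbers below are invented, and `running_totals` is a hypothetical helper that only approximates what Tableau computes when the calculation is partitioned by player.

```python
from collections import defaultdict
from itertools import accumulate

# (name, season, home runs) rows; values are made up for illustration.
season_hr = [
    ("Alex Able", 1972, 12),
    ("Alex Able", 1973, 20),
    ("Ben Baker", 1973, 5),
    ("Alex Able", 1974, 8),
    ("Ben Baker", 1974, 7),
]


def running_totals(rows):
    """Return {name: [(season, cumulative HR), ...]} ordered by season."""
    by_player = defaultdict(dict)
    for name, year, hr in rows:
        # Equivalent of SUM(HR): add up multiple rows for the same season.
        by_player[name][year] = by_player[name].get(year, 0) + hr
    result = {}
    for name, seasons in by_player.items():
        years = sorted(seasons)
        totals = accumulate(seasons[year] for year in years)
        result[name] = list(zip(years, totals))
    return result


totals = running_totals(season_hr)
for name, series in totals.items():
    print(name, series)
```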
5. Limitation
In the running total, I would like to show cumulative home runs only for the years each player actually spent with the Texas Rangers. For example, Ivan Rodriguez played for the Rangers from 1991 to 2002 and came back in 2009 for half a season. Because the graph counts only home runs hit as a Ranger, it could be misread as showing that Ivan Rodriguez hit almost no home runs for the Rangers between 2002 and 2009. (He actually hit a total of 88 home runs over that span for Florida, Detroit, and other teams.) With the original data and the Tableau features I used, I could not find a way to express this.
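One way to at least spot such cases is to list the seasons missing between a player's first and last year with the team. This is a small sketch, not a fix for the chart; `missing_seasons` is a hypothetical helper, and the years are the Rangers seasons mentioned in the example.

```python
def missing_seasons(years_with_team):
    """Return seasons between the first and last team year that have no row."""
    present = set(years_with_team)
    return [
        year for year in range(min(present), max(present) + 1)
        if year not in present
    ]


# Rangers seasons from the example: 1991 through 2002, then a return in 2009.
rangers_years = list(range(1991, 2003)) + [2009]
gap = missing_seasons(rangers_years)
print(gap)  # [2003, 2004, 2005, 2006, 2007, 2008]
```

A player with a non-empty gap is one whose running-total line stays flat for reasons the chart does not explain.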