Introduction to the Data
For this exercise in understanding Tableau Public, I used the National Material Capabilities dataset (v4.0), produced by the Correlates of War Project, an ongoing study begun by J. David Singer at the University of Michigan in 1963. The dataset contains annual values, from 1816 to 2007, for six variables that are considered indicators of a country's material capabilities and are therefore often used to investigate international conflict throughout history. The six indicators are military expenditure, military personnel, energy consumption, iron and steel production, urban population, and total population. The dataset covers 244 countries and also includes each country's CINC (Composite Indicator of National Capability) score by year, which is calculated from the six material capability indicators.
The dataset contains 14,200 records, and because it includes multiple variables, it presents vast possibilities for analysis. That breadth, however, also makes it difficult to interpret the information and decide which variables and relationships to focus on. Experimenting in Tableau Public improved my understanding of the data and shed light on possibilities for future exploration, as well as on issues I need to address before using this data again.
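The CINC calculation averages a country's share of the system-wide total across the six indicators. The Python sketch below illustrates that averaging for a single year. It assumes the v4.0 column names milex, milper, irst, pec, tpop, and upop, and complete data (no -9 values, non-zero totals); how the published scores handle missing components is described in the codebook and is not reproduced here. The function name is hypothetical.

```python
# Indicator columns as named in the v4.0 data file (assumed; check the codebook).
INDICATORS = ("milex", "milper", "irst", "pec", "tpop", "upop")


def cinc_scores(year_data):
    """Compute a CINC-style score for every country in one year.

    year_data maps a country code to a dict of the six indicator values.
    Assumes complete data: no -9 sentinels and a non-zero total per indicator.
    """
    # System-wide total of each indicator for this year.
    totals = {k: sum(values[k] for values in year_data.values()) for k in INDICATORS}
    # Each country's score is the mean of its six shares of those totals.
    return {
        country: sum(values[k] / totals[k] for k in INDICATORS) / len(INDICATORS)
        for country, values in year_data.items()
    }
```

Because each component is a share of the total, the scores for all countries in a given year sum to 1 when the data are complete.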
Inspirational Visualizations
For inspiration, I sought out examples that communicate time, space, and/or correlation, because this dataset has so many elements to consider. First, I came across this U.S. Census visualization, which shows increasing urbanization across U.S. cities from 1790 to 1890. In particular, I aimed to use the time slider seen here to demonstrate changes over time dynamically. The use of both color and size to encode values on the map is also effective, since it shows the user distribution as well as differences in magnitude. Combining the map with a line graph lets the user see distribution over space and time on the map alongside the overall trend of population growth in the line graph.
Second, Tableau Public's gallery features this visualization of worldwide fuel consumption, presented in three different ways. Though it is not a time series, I was inspired by its use of a scatterplot in conjunction with the world map. The scatterplot depicts the relationship between fuel consumption and production, supplementing the map, which encodes only one variable, consumption.
Third, I was inspired by the Gapminder tool because it uses multiple visual encodings, including color, size, motion, and position, without being overwhelming. Gapminder graphs show both correlation and change over time, which I hoped to achieve in my own visualizations. This particular visualization, “200 Years that Changed the World,” resembles my data in its focus on comparisons between countries over a long period of history.
Methods and Results
The data was already normalized, so it needed no manipulation in OpenRefine. A few issues did have to be addressed in Tableau Public before starting, however. First, the dataset uses the value “-9” to represent missing values, which would distort any graph. I resolved this by filtering out “-9” in each of my visualizations. Second, monetary values from 1816 to 1913 are in British pounds, while values from 1914 to 2007 are in U.S. dollars. I added a reference line to mark the change in currency, although it does not appear in the final version shown here; it was most likely removed accidentally during editing.
I created three separate visualizations: a line graph, a set of line graphs as small multiples, and a choropleth map. I immediately realized that I needed to limit the number of countries shown in the graphs, since with all of them the lines and points would be unreadable. For this lab period, I selected just four countries out of 244: China, Germany, Russia, and the U.S.
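The same two preparation steps could also be scripted outside Tableau. This is a minimal sketch assuming a CSV export with stateabb and year columns and a variable column such as milex (names taken from the v4.0 data file as I understand it; verify against the codebook). It skips -9 values for the chosen variable and tags each value with its currency era rather than converting it, because converting between pounds and dollars would require an exchange-rate series that is outside this sketch. The function names are hypothetical.

```python
import csv
import io

MISSING = -9  # sentinel the dataset uses for "no data"


def currency_for_year(year):
    """Monetary values are in British pounds through 1913 and U.S. dollars from 1914."""
    return "GBP" if year <= 1913 else "USD"


def load_series(csv_text, variable):
    """Return (stateabb, year, value, currency) tuples, skipping rows where variable is -9.

    The currency tag is only meaningful for monetary variables such as milex.
    """
    series = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = float(row[variable])
        if value == MISSING:
            continue  # same effect as filtering out -9 in Tableau
        year = int(row["year"])
        series.append((row["stateabb"], year, value, currency_for_year(year)))
    return series
```

With a local copy of the data file read into a string, `load_series(text, "milex")` would return only the usable military-expenditure rows, each marked with the currency it is recorded in.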
I first made a line graph (figure 1) that includes all four countries, differentiated by color. I placed year on the x-axis to compare trends over time and primary energy consumption on the y-axis, although a line graph would work for showing trends in any one of these variables. Though clear trends appear in this first graph, some problems stand out immediately. The most obvious is the decline and sharp rise in Germany's consumption between 1945 and 1991, which results from missing values during that span. Because not all countries have recorded values for the same years, the data is inconsistent, and the graph may be misleading as a result.
To display two different variables on the same line graph, I next tried small multiples (figure 2). Separate graphs for each country, sharing the same year scale on the x-axis, allow comparison while avoiding clutter. I wanted to include variables other than energy consumption this time, so I plotted total population and military expenditure, both to show trends over time and to look for a possible correlation between population and the amount of money spent on the military. I also converted both y-axes to show absolute values rather than the thousands in which population and expenditure are recorded in the original dataset. This was not strictly necessary, but it makes the values clearer.
Ultimately, the small multiples appear to be the most compelling visualization at this stage. The difference between China and the U.S., for example, is immediately clear: as of 2007 the U.S. had a much smaller population than China but much higher military expenditure. However, I am concerned that these graphs are also misleading because of the y-axis scale. Because the values span such a huge range, some changes look smaller than they are, and others look larger.
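The Germany artifact and the unit conversion can both be handled before a chart is built. In the hypothetical helpers below, `missing_year_runs` lists stretches of years with no observation (after -9 values have been removed), and `split_at_gaps` breaks a series into contiguous segments, so that a plotting tool that connects neighboring points does not draw a line straight across a gap. `to_absolute` applies the thousands-to-units conversion described above; that population and expenditure are stored in thousands comes from the text, and all names are illustrative only.

```python
def to_absolute(value_in_thousands):
    """Convert a value recorded in thousands (e.g. population) to an absolute figure."""
    return value_in_thousands * 1000


def missing_year_runs(years):
    """Return (first_missing, last_missing) runs between consecutive observed years."""
    observed = sorted(set(years))
    runs = []
    for prev, nxt in zip(observed, observed[1:]):
        if nxt - prev > 1:
            runs.append((prev + 1, nxt - 1))
    return runs


def split_at_gaps(points):
    """Split (year, value) points into runs of consecutive years.

    Drawing each run as its own line leaves a visible break where data is
    missing instead of a misleading straight segment across the gap.
    """
    segments = []
    for year, value in sorted(points):
        if segments and year == segments[-1][-1][0] + 1:
            segments[-1].append((year, value))  # continues the current run
        else:
            segments.append([(year, value)])  # first point, or a gap precedes it
    return segments
```

For a series observed only up to 1945 and again from 1991, `missing_year_runs` reports a single run from 1946 to 1990, which is exactly the stretch a connected line would bridge.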
I created a choropleth map (figure 3) to bring in more of the geographical aspect of the data. I did not limit the number of countries here, because values for all countries can be shown clearly on a map, unlike the jumble of lines they would produce on a graph. In hindsight, however, I would show only the four originally selected countries so that the three visualizations are easier to compare. I encoded military personnel with color, using a scale of red hues; a single hue was enough to represent one variable. I chose military personnel because I thought it would complement the variables shown in the small multiples.
I first tried adding a time slider by placing year on the Filters shelf, but this produced a slider that selects a range of years. I wanted to show change year by year rather than across intervals, and this was easily fixed by placing year on the Pages shelf instead. Now, as the user drags the slider, they see the values for a single year. Displaying change over time this way is interactive and may engage the user more than a static line graph. It is striking, for example, to see the U.S. and Russia heavily saturated in 1945, while in 2007 both are much lighter but still comparable to each other in number of personnel.
One problem with the map is that a single encoded variable is not very meaningful on its own. Pairing the map with the other graphs, or adding another variable such as population, would give the user more to interpret.
In the last minutes of the lab, I quickly experimented with a scatterplot (figure 4) to see whether I could represent even more variables on one plane and look for correlations among the variables already used in the other graphs. I plotted primary energy consumption on the x-axis and military expenditure on the y-axis, assigned a color to each country, and used point size to indicate population. I also added another time slider, which makes the points move around the graph and change size as the years advance. As with the small multiples, scale is a problem here: too many points appear to sit near zero even though their values are not close to zero.
Conclusions
There are many possibilities for future analysis of this dataset, but also many problems. First, more countries could be included in the visualizations and grouped in various ways to show different relationships; for example, particular historical conflicts could be chosen and the countries involved displayed for the relevant range of years. Second, I was not able to include all six indicators and would like to do so, along with the CINC scores. The colors used to encode values also need reconsidering, especially in the small-multiple line graphs, where the colors are too similar.
While there is potential for future analysis, I think another dataset would be a better choice for visualizing a time series. This data unfortunately lacks consistency between years, as not all countries have records for the same dates, a problem whose extent I did not realize until I was working with the data in Tableau Public. It is also difficult to compare hugely different values on the same axes, because the scale becomes misleading. Using percentages instead of raw values might make the different sets of values more comparable.
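One way to read the percentages suggestion is to index each country's series to a base year, so every line starts at 100 and shows relative change regardless of absolute magnitude; dividing by the system-wide total for each year, as the CINC score does, is the other natural reading. The sketch below assumes a series stored as a year-to-value dict whose base-year value exists and is non-zero (that is, not a filtered -9); the function name is hypothetical.

```python
def percent_of_base(series, base_year):
    """Re-express a year -> value series as a percentage of its base-year value.

    Assumes series[base_year] exists and is non-zero.
    """
    base = series[base_year]
    return {year: 100.0 * value / base for year, value in sorted(series.items())}
```

Under this scheme, a country whose population doubled and a country whose military expenditure doubled would both reach 200, even if their raw values differ by orders of magnitude, so both could share a single axis without either being flattened near zero.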
Sources
Bloodworth, C. (2012). Worldwide Fuel Consumption [Graph]. Retrieved from http://www.tableau.com/learn/gallery/worldwide-oil-usage
Correlates of War Project. (2010). National Material Capabilities (Version 4.0) [Data file and code book]. Retrieved from http://tinyurl.com/oak68fm
Gapminder. (2008). 200 Years that Changed the World [Graph]. Retrieved from http://www.bit.ly/PLZIL0
United States Census Bureau. (2012). Increasing Urbanization [Graph]. Retrieved from http://www.census.gov/dataviz/visualizations/005/