Info Vis Lab 2: TableauPublic


Lab Reports

Link to the Visualization

For this lab the goal was twofold: to become familiar with OpenRefine and TableauPublic, and to create more complex visualizations using cleaned data. The first visualization is “Deaths by Gender and Decade,” which is a bar graph that displays the total deaths for each decade, split by male and female. The second visualization is “Death Totals by Year,” which displays the death totals for each year in a dot chart. The final visualization is “Deaths by Age Group,” which shows the number of deaths for each age group (in intervals of ten years) via a bar graph. These visualizations are designed to display the data with different filters so that alternate trends can be recognized and analyzed.

There were three visualizations that I used as inspiration. The first was “Age-adjusted Death Rates and Life Expectancy at Birth,” based on NCHS data. This visualization showed the country’s death rate from 1900-2013, with a filter for both Race and Sex. I used this as the basis for the “Deaths by Gender and Decade” data sorting, since I had the Sex data available. The second example I used was “Age-adjusted Death Rates for All Causes: United States, 1999–2015,” which came from the CDC. I liked the way this dot chart looked when using yearly data over a period of time, which is how the “Death Totals” data looks when filtered to a single decade. The third visualization is “Under-5 Mortality Rate, 2016” from healthdata.org. The age range visualizations in the top right section made me realize that I could group my age data, therefore making it easier to parse.

In order to create these visualizations, the original dataset had to be converted to a CSV file and then purged of unnecessary columns in OpenRefine. The data is from mortality.org, and it is a record of all the reported deaths in the United States from 1933-2014, with the subject’s age, sex, and year of death. TableauPublic was then used to create the three visualizations.
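The column purge was done interactively in OpenRefine, but the same step can be sketched as a short script. The column names (Year, Age, Sex, Deaths, Notes) and the sample rows below are assumptions for illustration, not the actual layout of the mortality.org export.

```python
import csv
import io

# Hypothetical column names; the real export may label these differently.
KEEP = ["Year", "Age", "Sex", "Deaths"]

def keep_columns(src, dst, keep=KEEP):
    """Copy CSV rows from src to dst, retaining only the columns in keep."""
    reader = csv.DictReader(src)
    # extrasaction="ignore" silently drops any column not listed in keep.
    writer = csv.DictWriter(dst, fieldnames=keep, extrasaction="ignore")
    writer.writeheader()
    for row in reader:
        writer.writerow(row)

# Made-up sample rows, for illustration only.
sample = io.StringIO(
    "Year,Age,Sex,Deaths,Notes\n"
    "1933,0,Female,100,a\n"
    "1933,0,Male,120,b\n"
)
out = io.StringIO()
keep_columns(sample, out)
cleaned = out.getvalue()
```

With real files, the two StringIO objects would be replaced by files opened with newline="" as the csv module recommends.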

I chose these three visualizations because I wanted to display each of the facets that was included in the dataset. “Deaths by Gender and Decade” uses the Year and the Sex facets, “Death Totals by Year” uses just the Year facet and the raw death count, and “Deaths by Age Group” uses the death count and age facet (though the ages are grouped by decade).

I had some trouble with the “Death Totals by Year” visualization, mainly because the dataset originally had data for the year 2015. This data skewed the entire set because the year contained 640+ records, whereas the rest of the years didn’t have more than 450 records each. This resulted in a jump of over 2 million deaths, which heavily altered the visualization’s range. Once I removed the 2015 data, the scale became much more reasonable. I also had issues with “Deaths by Gender and Decade,” which were due to the unintuitive nature of the facets. I repeatedly tried to drag the Color filter over to the facets, instead of the other way around.
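A check like the one below could flag a year such as 2015 before the data ever reaches TableauPublic. It is only a sketch: the row layout and the tiny sample are invented, and the helper names are hypothetical.

```python
from collections import Counter

def records_per_year(rows):
    """Count how many records each year contributes."""
    return Counter(row["Year"] for row in rows)

def drop_year(rows, year):
    """Return the rows with every record for the given year removed."""
    return [row for row in rows if row["Year"] != year]

# Invented sample: 2015 has more records than 2014, mimicking the outlier.
rows = [
    {"Year": 2014, "Deaths": 10},
    {"Year": 2014, "Deaths": 12},
    {"Year": 2015, "Deaths": 11},
    {"Year": 2015, "Deaths": 9},
    {"Year": 2015, "Deaths": 13},
]
counts = records_per_year(rows)
filtered = drop_year(rows, 2015)
```

Printing counts.most_common(3) on the full dataset would put any year with an unusually high record count at the top of the list.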

When it came to the “Deaths by Age Group” visualization, there were many entries that had a +/-5-year margin of error and were therefore designated as occurring at the midpoint of their respective decades (e.g., 35, 45, 55). Since this would skew the data if it were presented in individual years, I grouped the data by decade, making it more accurate and easier to understand.
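The decade grouping was done in TableauPublic, but the idea can be sketched in a few lines. The (age, deaths) pairs are made-up values and the function names are hypothetical. The point is that a midpoint entry such as 35 lands in the same 30-39 bin as exact ages like 31 or 39.

```python
from collections import defaultdict

def age_group(age):
    """Map an integer age to a ten-year label such as '30-39'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def deaths_by_age_group(records):
    """Sum deaths per ten-year age group from (age, deaths) pairs."""
    totals = defaultdict(int)
    for age, deaths in records:
        totals[age_group(age)] += deaths
    return dict(totals)

# Made-up (age, deaths) pairs; 35 and 45 stand in for midpoint entries.
records = [(3, 5), (35, 2), (31, 4), (39, 1), (45, 7)]
totals = deaths_by_age_group(records)
```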

Overall, I found cleaning up my data and columns in OpenRefine useful, but I had to go back to it a few times once I started visualizing my data in TableauPublic. It was frustrating to have to go back and re-upload the data to TableauPublic after I removed the 2015 data, especially since it wasn’t clear that it would be an issue even after I looked at the data in OpenRefine.

In terms of usability, I often found TableauPublic to be incomprehensible and nonsensical. Instead of being able to rearrange data or filters, I ended up simply undoing steps until I was back to before the problem occurred. In one instance, I wanted to change the labeling of the Year axis to be listed by decades instead of individual years. This proved nearly impossible: the only way to edit the axis was to right-click on it, but the years were so tightly packed that every spot on the axis corresponded (when clicked) to a specific year. It turned out that the problem was that the Year data was set to “discrete” instead of “continuous,” so the program believed there to be distinct and unchangeable breaks between each year. I mainly focused on watching the tutorial videos on how to clean up and then input data, and I wish I’d spent more time looking at how to actually create the visualizations.

One feature I did like is the ability to use a visualization as a filter. By filtering my data via “Deaths by Gender and Decade,” I was able to notice the drop in child mortality (from ages 0-9) that occurs as the decades pass, even as the total deaths rise (See Fig. 1 and Fig. 2).

Fig. 1: 1930s Data

Fig. 2: 1970s Data

Ultimately, while I’m quite happy with the visualizations produced in TableauPublic, I need a lot more time to learn its quirks and shortcuts so that the next dashboard I create won’t be as much of a grind.

Visualization Examples

https://vizhub.healthdata.org/mortality/5q0-analysis

https://www.cdc.gov/nchs/data-visualization/mortality-leading-causes/

https://blogs.cdc.gov/nchs-data-visualization/deaths-in-the-us/

Data Source

http://www.mortality.org/cgi-bin/hmd/country.php?cntr=USA&level=1