October 1, 2016
Introduction:
For my first project in the Data Visualization class, I wanted to work with a dataset that pertained to New York City residents, and chose one on the leading causes of death for people who lived in the city. I was interested to see whether certain diseases that lead to death are more prevalent in particular ethnic groups and, further, whether that differs between men and women within an ethnic group. My goal was to address these points and to see whether diseases truly target certain groups at all.
Three Visualizations:
The first visualization that informed my design was a bar graph depicting the daily routines of creative people, fittingly titled “The Daily Routine of Famous Creative People”. The various types of creative people and the multiple categories of the day (sleep, exercise, etc.) made me think about ways to work with my dataset, which has different ethnic groups and categories of diseases identified within them. I liked the organization of a simple bar graph, the use of color, and the way clicking on a color category filtered the data, which was impactful.
This visualization from the Wall Street Journal, called “Battling Infectious Diseases in the 20th Century: The Impact of Vaccines”, has the most obvious ties to my own data in terms of topic: health in the U.S. over the course of years. There are some notable differences: the WSJ data was collected over the course of 70 years, while the NYC Open Data used in my lab spanned less than a decade; the WSJ data and visualizations reflected nationwide findings, whereas my data was limited to one city; and the WSJ used heat maps. Even so, I found it helpful to see that they had approached each disease with a separate visualization to ensure clarity, and that the colors used are those associated with the points they were trying to depict (red in the graph denotes a high level of illness/danger, blue denotes very low levels of illness/stability).
The last visualization is from CNN, and it shows diversity in the U.S. across different generations. What informed my design was the way the CNN visualization used color to represent different ethnic groups, which lets viewers read the information clearly and strengthens the overall effect.
Materials Used:
The dataset I used was called “NYC Leading Causes of Death”, sourced from NYC Open Data, where over 1500 datasets generated by city agencies and organizations are available to the public online. While my data came from the Health section of the site, other data categories available included Education and Housing & Development. My dataset spanned 2007 to 2011 and reflected men and women from the following groups: Non-Hispanic White, Asian & Pacific Islander, Hispanic, and Non-Hispanic Black. Examples of the causes of death in the data were malignant neoplasms, heart disease, and pneumonia.
Tableau Public was the software used for the visualizations in this lab. In this software, a spreadsheet (in my case saved as a CSV) can be accessed by Tableau, and then interactive data visualizations can be made.
I referred to lecture slides, specifically those from 8/30 (of particular note were the slides “Less is More” and the “Memorable Embellishments” of graphs).
The class readings that I mainly referenced were Hadley Wickham’s “Tidy Data” and Stephen Few’s “Effectively Communicating Numbers” (in the latter, the section “A Step by Step Graph Selection and Design Process” was very useful). Also helpful was Lindsay MacDonald’s “Using Color Effectively in Computer Graphics”, which reminded me of color association and of taking users’ needs into account.
Methods Used:
After I downloaded the dataset from NYC Open Data and saved it as a CSV, I went back over the data to check that it was “clean” and that no cells in the sheet were empty. Once the data was clean, I could easily use it in Tableau. In my “Sheet” I began to explore the features that would help to build my visualizations, including dragging dimensions to different areas of the sheet, customizing colors for different ethnicities, and selecting the best visualization option in the “marks” section.
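The empty-cell check described above was done by hand in the spreadsheet, but it could also be sketched in a few lines of Python. This is only an illustration: the column names are an assumption about how the CSV export is laid out, the two sample rows are invented values rather than figures from the dataset, and `find_empty_cells` is a hypothetical helper name.

```python
import csv
import io


def find_empty_cells(rows):
    """Return (row_number, column_name) for every blank cell in the rows."""
    empty = []
    # Start counting at 2 because row 1 of the spreadsheet is the header.
    for row_number, row in enumerate(rows, start=2):
        for column, value in row.items():
            if value is None or not value.strip():
                empty.append((row_number, column))
    return empty


# Invented sample standing in for the downloaded file (not real data).
# The header names are assumed, not taken from the actual export.
sample_csv = """Year,Ethnicity,Sex,Cause of Death,Count,Percent
2007,HISPANIC,F,DISEASES OF HEART,1100,27
2007,HISPANIC,M,,950,
"""

rows = list(csv.DictReader(io.StringIO(sample_csv)))
empty_cells = find_empty_cells(rows)
print(empty_cells)
```

With a real download, the `io.StringIO(sample_csv)` part would be replaced by an opened file; an empty result would mean the sheet has no blank cells.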
Once I had assessed the initial stages of the disease visuals, I decided to edit out diseases that had very low showings across all ethnicities. I only included those with a showing of 20% or more within any ethnic group at any point in time. As an example, Asian & Pacific Islanders, of both sexes, had higher levels of pneumonia than many other groups, which had less than 20%, so I kept that disease in the data.
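The 20% rule above was applied by eye in Tableau, but the same logic can be written out as a small Python sketch. It assumes each record carries a percent value for one cause within one group, sex, and year; the field names, the placeholder group and cause labels, and the numbers are all made up for illustration, and `causes_to_keep` is a hypothetical helper.

```python
def causes_to_keep(records, threshold=20.0):
    """Return causes that reach the threshold in at least one record."""
    keep = set()
    for record in records:
        # One qualifying group/year is enough to keep the cause everywhere.
        if record["percent"] >= threshold:
            keep.add(record["cause"])
    return keep


# Placeholder records (invented values, not figures from the dataset).
records = [
    {"year": 2007, "ethnicity": "Group A", "sex": "F", "cause": "Cause X", "percent": 25.0},
    {"year": 2008, "ethnicity": "Group B", "sex": "M", "cause": "Cause X", "percent": 5.0},
    {"year": 2007, "ethnicity": "Group A", "sex": "M", "cause": "Cause Y", "percent": 3.0},
    {"year": 2009, "ethnicity": "Group B", "sex": "F", "cause": "Cause Y", "percent": 4.0},
]

kept = causes_to_keep(records)
filtered = [record for record in records if record["cause"] in kept]
print(sorted(kept), len(filtered))
```

Note that a kept cause stays in the data for every group, including groups where it falls below the threshold, which matches the pneumonia example above.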
Results:
It was interesting to see that men and women of the same ethnic groups did have slight variations in the charts. Overall, it appears that if a disease affects a group, it affects both the men and the women of that group. Asian & Pacific Islanders, of either sex, had a high risk of cerebrovascular disease and pneumonia/influenza. Hispanics also shared a high risk of pneumonia/influenza. Non-Hispanic Black men and women were at high risk when it came to diabetes. My final visualization, below, aided in my findings.
Future directions:
It would be interesting to also work with current data on this topic, as the dataset I used stops at 2011. Having updated data, say up to 2015, would give a more comprehensive view of the health risks that impact New York City residents.