New York City Death Analysis (since 2007)


Introduction & Inspiration

Health is always an important topic in our daily lives. I enjoy reading articles about health and researching how we might live healthier. My inspiration for this lab came from the World Health Organization. I was shocked to learn that the top two causes of death worldwide in both 2000 and 2016 were ischaemic heart disease and stroke. Therefore, in this lab, I would like to find out the leading causes of death in New York City and how death relates to other potential factors.

Material

I found a dataset called New York City Leading Causes of Death on NYC Open Data. The dataset goes back to 2007 and was last updated on Sep 14, 2019. It is provided by the Department of Health and Mental Hygiene (DOHMH). The cause of death is derived from the NYC death certificate, which is issued for every death that occurs in the city. The dataset mainly contains the year, the cause of death, and sex.

Methodology

Before visualizing the data, I walked through the dataset and cleaned it. I decided to use the death rate rather than the raw number of deaths, because death counts are not meaningful without the size of the population behind them. In the raw dataset, some death rate cells contained a “-” symbol that could not be read as a number, so I deleted them in Microsoft Excel. This way, the death rate could be aggregated correctly. I then used Tableau Public to visualize the dataset, and also to group closely related causes of death so that the graphs stay clean and easy to understand.
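The cleaning step was done by hand in Excel, but it could also be scripted. The following is only a minimal sketch of that idea in Python: the column names and the sample values are made up for illustration, and dropping the whole row (rather than just blanking the cell) is an assumption about how the “-” entries should be handled.

```python
import csv
import io

# Made-up sample in the rough shape of the dataset.
# Column names and values are assumptions for illustration only.
RAW_CSV = """Year,Leading Cause,Sex,Deaths,Death Rate
2014,Cause A,F,1000,120.5
2014,Cause B,F,5,-
2014,Cause A,M,900,.
2013,Cause C,M,200,25.1
"""


def parse_rate(value):
    """Return the death rate as a float, or None if the cell is not numeric."""
    try:
        return float(value)
    except ValueError:
        return None


def load_clean_rows(text):
    """Read CSV text and keep only rows whose death rate is a number."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rate = parse_rate(row["Death Rate"].strip())
        if rate is None:
            continue  # skip placeholder cells such as "-"
        row["Death Rate"] = rate
        rows.append(row)
    return rows


clean_rows = load_clean_rows(RAW_CSV)
print(len(clean_rows), "rows kept")
```

Scripting the step keeps the raw file untouched, so the cleaning can be repeated if the dataset is updated.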

Final Result

Link to the full dashboard.

Visualization 1- Different Causes of Death by Sex

I’m very interested in the leading causes of death in New York City and how sex might relate to them. In this graph, I display the sum of the death rate by cause and by sex. To begin with, I made a graph of the sum of the death rate by cause. I sorted the causes so that their summed death rates are in descending order, and I grouped the causes that were barely visible into “All Other Causes”. Then I applied different colors to each sex, but the result was not clear enough for readers to compare the two sexes. Therefore, I duplicated the worksheet and filtered each copy to one sex. In my final dashboard, I placed the worksheets for males and females side by side so that they are easy to compare. I assigned a warm color to females and a cool color to males to match readers’ expectations. The leading cause of death in New York City since 2007 is heart disease, which is somewhat in line with the global trend. Another interesting finding is that some causes appear to be strongly related to sex, such as assault and Alzheimer’s disease.
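The aggregation behind this chart (sum by cause for one sex, fold the smallest causes into “All Other Causes”, sort in descending order) was done in Tableau. As a rough sketch of the same logic in Python, under the assumption that a simple cutoff decides which causes are “barely visible”: the records and the `min_total` threshold below are hypothetical, not values from the dashboard.

```python
from collections import defaultdict

# Made-up records: (cause, sex, death rate). Not real data.
records = [
    ("Cause A", "F", 100.0),
    ("Cause A", "M", 120.0),
    ("Cause B", "F", 40.0),
    ("Cause B", "M", 10.0),
    ("Cause C", "F", 2.0),
    ("Cause C", "M", 1.0),
    ("Cause D", "M", 1.5),
]


def summarize_by_cause(records, sex, min_total=5.0):
    """Sum death rates per cause for one sex, largest first.

    Causes whose total falls below min_total (a hypothetical cutoff)
    are merged into a single "All Other Causes" entry.
    """
    totals = defaultdict(float)
    for cause, rec_sex, rate in records:
        if rec_sex == sex:
            totals[cause] += rate

    grouped = defaultdict(float)
    for cause, total in totals.items():
        key = cause if total >= min_total else "All Other Causes"
        grouped[key] += total

    return sorted(grouped.items(), key=lambda item: item[1], reverse=True)


female = summarize_by_cause(records, "F")
male = summarize_by_cause(records, "M")
print(female)
print(male)
```

Calling the function once per sex mirrors the two side-by-side worksheets: each list is sorted independently, so the ranking of causes can differ between females and males.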

Visualization 2- Difference in Death Rate

Another angle that interests me is the overall trend in the death rate over the years, so my second graph shows the relationship between the sum of the death rate and time. However, the sum of the death rate does not change much over time; each year it stays at around 4500-5000. To display the changes more clearly, I switched to the difference in death rate from one year to the next. Now we can see that the death rate increased in 2011 and 2013 and decreased in the other years.
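The switch from yearly totals to year-over-year differences can be illustrated with a few lines of Python. This is only a sketch of the calculation, assuming the difference is taken against the immediately preceding year; the totals are invented and are not the figures from the dashboard.

```python
def yearly_differences(total_by_year):
    """Return {year: total(year) - total(previous year)} for consecutive years.

    The earliest year has no predecessor, so it is left out.
    """
    years = sorted(total_by_year)
    return {
        year: total_by_year[year] - total_by_year[prev]
        for prev, year in zip(years, years[1:])
    }


# Made-up totals for illustration only.
totals = {2007: 5000.0, 2008: 4900.0, 2009: 4950.0}
diffs = yearly_differences(totals)
print(diffs)
```

A positive value means the summed death rate rose compared with the previous year, and a negative value means it fell. Small changes that vanish on a 4500-5000 scale become visible this way.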

Visualization 3- Death Rate of Different Races

I’m also interested in how race relates to the death rate. The third chart shows the death rate of different races. I used areas to represent each race’s percentage of the total death rate, and a color palette and annotations to tell the races apart. The dimensions “Other Race” and “Unknown” are not included in this chart because their death rates were close to zero. We can see that White Non-Hispanic has the highest death rate over the past years, and Asian and Pacific Islander has the lowest. Moreover, the trend does not change dramatically over time.
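The percentages behind the area chart can be sketched as follows. This is an illustrative Python version of the calculation, assuming the shares for one year are computed after “Other Race” and “Unknown” are excluded; the group names and rates are placeholders, not values from the dataset.

```python
def race_shares(rate_by_race, excluded=("Other Race", "Unknown")):
    """Return each group's percentage of the total death rate for one year.

    Groups listed in `excluded` are dropped before the total is computed.
    """
    kept = {race: rate for race, rate in rate_by_race.items()
            if race not in excluded}
    total = sum(kept.values())
    if total == 0:
        return {race: 0.0 for race in kept}
    return {race: 100.0 * rate / total for race, rate in kept.items()}


# Placeholder values for a single year. Not real data.
sample = {"Group A": 300.0, "Group B": 100.0, "Unknown": 0.0}
shares = race_shares(sample)
print(shares)
```

Because the shares always add up to 100 percent, this view shows how the mix between groups shifts over time rather than whether the overall death rate rises or falls.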

Reflections

This lab was really interesting and practical, and I find Tableau a very powerful tool. I can use it to produce nearly any kind of graph I can imagine. Its features are also very handy: the same color palette can be applied to the same dimensions across different worksheets, and the dashboard updates automatically after I make changes to a worksheet.

Another important takeaway from this lab is to talk to people and ask for feedback at an early stage of a project. At first, I created some worksheets and then got stuck on how to explore my dataset further. It took me a long time before I asked my classmates for feedback, and they gave me new inspiration and ideas on how I might display the data. In future research, I would like to talk to people more often to get inspired.

References

  1. https://www.who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death
  2. https://data.cityofnewyork.us/Health/New-York-City-Leading-Causes-of-Death/jb7j-dta