Introduction
In the American consciousness, the perception of aircraft fatalities is largely dominated by terrorist attacks, particularly 9/11. For this lab, I wanted to examine fatal civilian aircraft crashes from 1993 until 2003 to determine where and when they occur and what their most common causes are. I excluded the 9/11 hijackings: the loss of life from those attacks, nearly 3,000 people in total, is an outlier that is not representative of normal year-to-year operations, and coordinated, multiple-plane attacks are not the norm.
Mortality maps like this one were my primary inspiration for this visualization. While the color ranges lose some detail and require some pre-existing geographical and political knowledge to understand fully, they deliver a quick and dramatic burst of information at a glance, even to a reader with only basic geographical knowledge. For instance, this map quickly shows regions of the world with higher mortality rates, such as Eastern Europe and Africa outside the north, while still allowing more in-depth analysis, such as per-country comparisons, if desired.
Methods & Materials
The dataset was obtained from the Information is Beautiful datasets and downloaded as a .tsv file. Some of its formatting was unsuitable for my use, such as a single combined date field, and it needed minor typo cleanup and standardization, so I opened it in the browser-based OpenRefine 3.5.2 tool to clean it for analysis. There, I renamed some unclear fields, corrected typos in the data, and consolidated redundant facet values using the “Cluster and edit” feature. The dataset had two “cause” columns, one a broad category and the other a detailed cause (for example, “hydraulic failure” falls under the “mechanical” category). I renamed the latter to “detailed cause”; I will not examine fatality causes at that level of detail in this report, but I wanted to preserve the column for further analysis.
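The cleanup itself was done in OpenRefine, but the two main steps (splitting the single date field and merging near-duplicate labels) can be sketched in Python for readers who want to reproduce them in code. This is only an illustration under assumptions: the column names `date`, `cause`, and `fatalities`, the ISO date format, and the sample rows are all hypothetical, and `fingerprint` is a simplified stand-in for key-collision clustering, not OpenRefine's actual implementation.

```python
import csv
import io
import string
from collections import defaultdict
from datetime import datetime


def split_date(raw, fmt="%Y-%m-%d"):
    """Split one date string into (year, month) integers.

    The ISO format is an assumption; adjust `fmt` to match the real file.
    """
    parsed = datetime.strptime(raw.strip(), fmt)
    return parsed.year, parsed.month


def fingerprint(value):
    """Simplified clustering key: lowercase, drop punctuation, sort unique tokens."""
    no_punct = value.strip().lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(sorted(set(no_punct.split())))


def cluster(values):
    """Group raw values whose keys collide; only groups with variants are kept."""
    groups = defaultdict(set)
    for value in values:
        groups[fingerprint(value)].add(value)
    return {key: sorted(vs) for key, vs in groups.items() if len(vs) > 1}


def clean_rows(tsv_text):
    """Read tab-separated text and add separate year and month fields."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = []
    for row in reader:
        row["year"], row["month"] = split_date(row["date"])
        rows.append(row)
    return rows


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_TSV = (
    "date\tcause\tfatalities\n"
    "2000-01-15\tHuman error\t10\n"
    "2000-08-02\thuman error.\t4\n"
)

if __name__ == "__main__":
    rows = clean_rows(SAMPLE_TSV)
    print([(r["year"], r["month"]) for r in rows])
    print(cluster(r["cause"] for r in rows))
```

Each group returned by `cluster` lists spellings that probably mean the same thing; as with “Cluster and edit,” a person should still confirm each merge before applying it.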
After I completed the data clean-up step, I downloaded the cleaned-up data, again as a .tsv, and this time uploaded it to the Tableau Public web application to create the visualizations.
Visualizations
A color map of the world was used to show total deaths by the country in which the crash or accident occurred, with lower totals shaded darker blue and higher totals darker red. Countries and regions with no crashes or no data appear grey.
A line graph was used to display year-to-year fatalities in order to see any temporal trends or spikes in deaths that might indicate many incidents or high-fatality crashes.
Bar graphs were used to display deaths by cause category and fatalities by month. Causes of fatalities were divided into 5 categories:
- Criminal: Intentional downings caused by terrorist acts, hijackings, or attacks on the plane or persons aboard.
- Human error: Mistakes made by the pilots, flight crew, or ground crews.
- Mechanical: A failure of the plane’s systems not directly attributable to human actions, such as a flight control failure.
- Unknown: The primary reason for the fatalities is not known.
- Weather: Severe weather conditions such as storms or turbulence.
Each chart was left black: with so few groupings, color would be unnecessary and distracting, and because these charts depict loss of human life, color added purely for aesthetics would be disrespectful to the victims.
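Tableau computed the totals behind the line and bar charts, but the same aggregation can be sketched in a few lines of Python. This is a rough illustration, not the method used for the charts: the field names, the rule in `is_911` for identifying the excluded hijackings, and the sample rows are all assumptions made for the example.

```python
from collections import Counter


def is_911(row):
    """Assumed exclusion rule: Criminal-category rows dated 11 September 2001."""
    on_date = (row["year"], row["month"], row["day"]) == (2001, 9, 11)
    return on_date and row["cause"] == "Criminal"


def total_by(rows, key):
    """Sum fatalities per distinct value of `key`, skipping the excluded rows."""
    totals = Counter()
    for row in rows:
        if is_911(row):
            continue
        totals[row[key]] += row["fatalities"]
    return dict(totals)


# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_ROWS = [
    {"year": 2001, "month": 9, "day": 11, "cause": "Criminal", "fatalities": 100},
    {"year": 2001, "month": 11, "day": 12, "cause": "Mechanical", "fatalities": 5},
    {"year": 2002, "month": 7, "day": 1, "cause": "Human error", "fatalities": 7},
    {"year": 2002, "month": 12, "day": 3, "cause": "Human error", "fatalities": 3},
]

if __name__ == "__main__":
    print(total_by(SAMPLE_ROWS, "year"))   # series for the line graph
    print(total_by(SAMPLE_ROWS, "cause"))  # bars for the cause chart
    print(total_by(SAMPLE_ROWS, "month"))  # bars for the month chart
```

Using one function with a `key` argument keeps the exclusion rule in a single place, so the yearly, per-cause, and per-month totals all leave out the same rows.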
Results and Interpretation
With the visualizations created, several things become apparent. First, the color map of total fatalities over the covered timespan shows that two countries, Russia and the USA, have the highest fatalities by a good margin, even with the 9/11 hijackings excluded; Nigeria and Indonesia also show notably high totals.
In the year-to-year line graph, we can see that despite occasional spikes, there is a general downward trend of yearly aviation fatalities, dropping to less than half from 1993 to 2016.
In the causes graph, human error accounts for by far the most loss of life. Criminal acts, 9/11 excluded, are in fact the least common cause of death, falling below even unknown causes. It surprised me that weather was the third most common cause of death, though the next graph explains this somewhat.
In the fatalities-by-month chart, fatalities are highest in July, August, and December, with May close behind. Recall from the map that the US and Russia account for the most fatalities, and we can start to guess why: the summer months in the Northern Hemisphere, and Christmas in December, mean more vacation travel. May is harder to tease out, but its total may also reflect more travel around Mother’s Day, the end of winter, and graduation season for students.
Reflection
In future analysis with this data, I would like to test my hypotheses about the month-to-month changes in fatalities using another dataset that shows vacations and overall air travel. It would also be interesting to compare air travel fatalities against the overall population of each country. This dataset was fairly limited and focused in scope, and adding other datasets would allow me to make more insightful and detailed observations from the data directly, instead of making inferences based largely on pre-existing knowledge.
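One way to approach the population comparison is to express each country's total as fatalities per million residents. The sketch below assumes a second, hypothetical table of populations keyed by country name; the function name and every number in it are placeholders for illustration, not real figures.

```python
def fatalities_per_million(fatalities_by_country, population_by_country):
    """Return fatalities per one million residents for each country.

    Countries with a missing or zero population are skipped rather than guessed.
    """
    rates = {}
    for country, deaths in fatalities_by_country.items():
        population = population_by_country.get(country)
        if not population:
            continue
        rates[country] = deaths * 1_000_000 / population
    return rates


if __name__ == "__main__":
    # Placeholder values only: two countries with equal totals, unequal populations.
    deaths = {"Country A": 50, "Country B": 50, "Country C": 5}
    people = {"Country A": 100_000_000, "Country B": 10_000_000}
    print(fatalities_per_million(deaths, people))
```

Two countries with the same raw total can differ tenfold on this measure, which is the kind of contrast the raw-count map cannot show. The same pattern would apply to the monthly hypothesis, dividing fatalities by the number of flights or passengers in each month.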