Introduction
With this project, I explored my interest in the severe weather events that caused fatalities in the United States between January 1, 2011, and December 31, 2015. About 10 years ago, I studied severe and unusual weather in pursuit of becoming a “Trained Spotter,” so this data spoke directly to my personal curiosity. While I allowed myself plenty of room to experiment with the data, I focused my visualizations on these 2 questions:
1/ In the last 5 years, what type(s) of severe weather events caused the most fatalities?
2/ If 1 type of event emerges as the most fatal, will 1 particular storm be revealed in the data, or will there simply be an increase in the number of events categorized as that type?
Materials
First, using Excel, I created a dataset from the Storm Events Database (http://www.ncdc.noaa.gov/stormevents/), made publicly available by the National Oceanic and Atmospheric Administration’s (NOAA) National Centers for Environmental Information (NCEI). Then, I took this Excel sheet into OpenRefine (2.6-rc.2), formerly Google Refine, for the following purposes:
- To eliminate fields that were irrelevant to my questions (e.g., “Time Zone,” “Crop Damage,” etc.)
- To delete rows representing U.S. territories (Guam, Puerto Rico, U.S. Virgin Islands, etc.), as each of these events was represented by an “XX” in the “State” field, which would complicate any attempt at mapping the events
- To group event types into larger categories (e.g., “High Wind,” “Thunderstorm Wind,” and “Strong Wind” all became “Wind”)
Later, I also used OpenRefine to isolate tornadic events, in the creation of a second, much smaller dataset.
Both of these datasets, then, were taken into Tableau Public 10.0, where I made all of my visualizations.
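The cleaning steps listed above can also be expressed in code. The following Python is only a minimal sketch of the same logic, not the OpenRefine operations themselves. The sample rows are invented for illustration, and the names `DROP_FIELDS`, `TYPE_GROUPS`, and `clean` are hypothetical; only the field names and the “Wind” grouping come from the description above.

```python
# Invented sample rows; the values are illustrative only, not real NOAA records.
rows = [
    {"State": "AL", "Type": "Tornado", "Deaths": 3, "Time Zone": "CST", "Crop Damage": "0"},
    {"State": "XX", "Type": "High Wind", "Deaths": 1, "Time Zone": "AST", "Crop Damage": "0"},
    {"State": "MO", "Type": "Thunderstorm Wind", "Deaths": 2, "Time Zone": "CST", "Crop Damage": "0"},
    {"State": "TX", "Type": "Strong Wind", "Deaths": 1, "Time Zone": "CST", "Crop Damage": "0"},
]

# Fields considered irrelevant to the two questions (assumed subset).
DROP_FIELDS = {"Time Zone", "Crop Damage"}

# Mapping from detailed event types to broader categories (assumed subset).
TYPE_GROUPS = {
    "High Wind": "Wind",
    "Thunderstorm Wind": "Wind",
    "Strong Wind": "Wind",
}


def clean(rows):
    """Drop territory rows, remove unused fields, and group event types."""
    cleaned = []
    for row in rows:
        if row["State"] == "XX":  # "XX" marks events in U.S. territories
            continue
        kept = {k: v for k, v in row.items() if k not in DROP_FIELDS}
        kept["Type"] = TYPE_GROUPS.get(kept["Type"], kept["Type"])
        cleaned.append(kept)
    return cleaned


cleaned = clean(rows)

# The second, smaller dataset: tornadic events only.
tornadoes = [row for row in cleaned if row["Type"] == "Tornado"]
```

Keeping the type mapping in a single dictionary makes the grouping decisions easy to review and extend, which is roughly what a list of cluster-and-edit choices provides in OpenRefine.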
Methods & Discussion
Initially, I wanted to illustrate the types of severe weather events, the number of fatalities, and time. Many of the example visualizations that I found representing similar data resorted to layered area graphs. With this type of graph, much is lost; as you can see in the US Weather Fatalities, 1940-2011 example, whole categories of weather events are completely invisible for large stretches of time (e.g., the problematically undefined “Other” category is invisible for more than 50 years, despite representing 1 of the highest spikes in the last decade). This graph also employs far too many colors, making it difficult to decode. These were things that I wanted to avoid in creating my first visualization.
For my first design, in Tableau Public, I added “Date” to columns, “Type” to rows, and “Deaths” to color. I thought that the resulting heat map was effective, so I started playing with elements inside of this framework. For example, I toyed with time measurements (e.g., when looking at “months” rather than “years,” the increase in severe weather events during warmer months became clear) and with the addition of various “Mark Labels.” After adding the “Mark Labels” that represent each cell as a percentage of the full table, I was surprised to see that, in the last 5 years, 2011’s tornadic events caused nearly 20% of all severe weather fatalities! This led me to wonder whether there were more tornadoes that year or whether a small number of events caused a great deal of damage.
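The percent-of-table labels described above amount to simple arithmetic: each cell's deaths divided by the total deaths across every type and year. The Python below is a minimal sketch of that calculation, not Tableau's implementation. The records are invented and do not reproduce the real figures, and `percent_of_table` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented (year, type, deaths) records; not real NOAA figures.
records = [
    (2011, "Tornado", 6),
    (2011, "Wind", 2),
    (2012, "Tornado", 1),
    (2012, "Heat", 3),
    (2013, "Wind", 4),
    (2013, "Heat", 4),
]


def percent_of_table(records):
    """Return each (type, year) cell's share of all deaths, as a percentage."""
    cells = defaultdict(int)
    for year, event_type, deaths in records:
        cells[(event_type, year)] += deaths  # one heat-map cell per type and year
    total = sum(cells.values())
    return {cell: 100 * deaths / total for cell, deaths in cells.items()}


shares = percent_of_table(records)
for (event_type, year), share in sorted(shares.items()):
    print(f"{event_type:8} {year}  {share:.1f}%")
```

In this toy table, the 2011 tornado cell holds 6 of 20 deaths, so its label would read 30.0%.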
Again, the sample visualizations that I found combining the “Number of Records” with the kinds of fields I was already using employed layered graphs. This example of a layered bar graph has the same problem with lost information as the layered area graph that I presented earlier. I also found the differing units of measurement along the left and right sides of the graph to be confusing. So, once more, this stood as an example of what not to do.
Working with small multiples in my second visualization, I resolved some of the issues that I saw in the examples I have discussed here; I was pleased with the consistent units of measurement and the monochromatic presentation. At this point, the “Mark Labels” made the table look messy, and their purpose was less apparent, so I removed them. This new graph revealed that there were, in fact, more fatal tornadoes in 2011 (85) than in all of the other 4 years combined (72). This led me to believe that there was not necessarily one particular event that caused the surge, but I wanted to take a closer look.
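The comparison above rests on counting records (one per fatal event) rather than summing deaths. A minimal Python sketch of that count follows; the event list is invented and far smaller than the real dataset, and `events_per_year` is a hypothetical helper name.

```python
from collections import Counter

# Invented (year, type) pairs, one per fatal event; not real NOAA records.
events = [
    (2011, "Tornado"),
    (2011, "Tornado"),
    (2011, "Tornado"),
    (2011, "Wind"),
    (2012, "Tornado"),
    (2013, "Wind"),
    (2014, "Tornado"),
]


def events_per_year(events, event_type):
    """Count the fatal events of one type in each year."""
    return Counter(year for year, etype in events if etype == event_type)


counts = events_per_year(events, "Tornado")
in_2011 = counts[2011]
other_years = sum(n for year, n in counts.items() if year != 2011)
print(f"2011: {in_2011} fatal tornadoes; all other years combined: {other_years}")
```

Counting events and summing deaths answer different questions, which is why a high event count alone does not rule out a single exceptionally deadly storm.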
At this point, I went back into OpenRefine to isolate all of the fatal tornado events in the year 2011, and then brought that new dataset into Tableau Public. Inspired by this example visualization, which deals with similar content, I made a line graph and later added annotations. When I changed the time measurement to days rather than years, a huge spike emerged on one specific day. When I googled the date, I discovered that the most fatal tornado event in nearly a century occurred on that day! This led me to search for the other significant dates, in order to annotate all of the noticeable spikes on this third visualization.
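Switching the time measurement to days is, in effect, summing deaths per calendar day and then looking for the largest totals. The Python below is a minimal sketch of that idea. The dates and death counts are invented placeholders (deliberately set in the year 2000 so they cannot be mistaken for the 2011 records), and `daily_deaths` and the threshold of 3 are hypothetical choices.

```python
from collections import defaultdict
from datetime import date

# Invented (date, deaths) pairs, one per fatal tornado; placeholder values only.
events = [
    (date(2000, 3, 1), 1),
    (date(2000, 3, 1), 2),
    (date(2000, 6, 10), 9),
    (date(2000, 6, 10), 4),
    (date(2000, 9, 5), 2),
]


def daily_deaths(events):
    """Sum deaths for each calendar day."""
    totals = defaultdict(int)
    for day, deaths in events:
        totals[day] += deaths
    return dict(totals)


totals = daily_deaths(events)

# The single worst day, i.e. the tallest spike on the line graph.
peak_day = max(totals, key=totals.get)

# Every day at or above an arbitrary threshold: candidates for annotation.
spikes = sorted(day for day, deaths in totals.items() if deaths >= 3)

print(peak_day, totals[peak_day])
```

The threshold only shortlists dates; identifying what happened on each one still requires looking the dates up, as described above.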
Finally, since I had data about the states in which each of the severe weather events took place, I decided to make a quick map, in the true fashion of weather studies. While this did not support my goal of looking at trends across time, it ultimately supported my findings: sure enough, Alabama was the state most affected by 2011’s “Super Outbreak,” and Missouri was the state most affected by both of the smaller outbreaks. This does not explain the other prominent states, but that is a project for another day.
Future Directions
If I were to move forward with this project, I think that it would be interesting to zero in on each of the event types, as I did with 2011’s tornadic events, to take a closer look at each of the major happenings and annotate all of the most severe storms. On the other hand, I would also like to zoom out and look at a much larger stretch of time, in order to create a more impactful heat map; it was relatively time-consuming to collect 5 years’ worth of data, but with more time, it would be possible to go back as far as January of 1950, using the same Storm Events Database.