Flooding looks like waves crashing into beach houses. It looks like hurricanes turning roads into rivers. It looks like cars floating down the street. But what many discount is that it also looks like waterlogged intersections after routine rain, unusable crosswalks in densely packed inland neighborhoods, and basements at high risk of becoming lakes even though their owners are miles from the closest river or ocean. In this project I combine data from FEMA and 311 to tackle the common misconceptions that drive underestimation of flood risk. Through an interactive, map-based visualization I unpack what it means for FEMA to define an area as high risk, how this affects New York City, and what the city’s 311 record of routine flooding tells us about what to expect going forward.
I coded my data cleaning and analysis in Python, relying heavily on the GeoPandas package for manipulation of the spatial datasets, and created the visualization itself in ESRI’s ArcGIS StoryMaps. You can find all of my Python code on GitHub.
This decision of medium followed from my narrative goals. I wanted this project to:
- Intuitively illustrate for those with no or limited mathematical background how common misinterpretations of FEMA’s probability estimates can drastically underestimate flood risk
- Apply the correct interpretation of FEMA’s estimates to highlight the most vulnerable areas of New York City
- Supplement these abstract depictions of flood risk with relatable images of what routinely flooded neighborhoods in New York City look like
- Inspire users to build on the narrative visualization’s takeaways and further explore neighborhoods of personal interest to them
I chose ArcGIS StoryMaps because it can overlay a scrolling sidecar of text on a slideshow of visualizations. This format, commonly used by news organizations such as the New York Times, is well-suited to my goals because it is driven by the visualizations and contextualized by the sidecar text. To unpack some of these complex misconceptions I needed both tools.
Because many of the goals of my project are spatial in nature, the majority of the visualizations in my project are maps. ArcGIS StoryMaps unfortunately can only create very simple maps (e.g., dropping one marker on a location that you want to highlight), not the interactive point and choropleth maps that I required. To get around this I used the following pipeline:
- Export the analyzed spatial datasets needed to populate the map from Python as GeoJSON files (except for one, which I had to export as a zipped shapefile because it exceeded ESRI’s size limit)
- Import each dataset into ESRI ArcGIS Online as a hosted feature layer (It’s possible to not host the data directly on ESRI’s server, but not with the dataset sizes I required.)
- Create an ArcGIS Web Map for each basemap configuration I wanted to include in my StoryMap (one satellite imagery and one dark gray canvas with state/coastal boundaries)
- Add each hosted feature layer as a layer in each Web Map and format it (e.g., define color scales, format clickable pop-ups, add/suppress geographic labels)
- Add a slide in the ArcGIS StoryMap sidecar template and import the Web Map with the desired basemap as the background (ArcGIS StoryMaps conveniently lets you select a ‘view’ for each Web Map, so you can select the layers from the Web Map that you want to display instead of creating a separate Web Map for each slide of the StoryMap.)
- Add the sidecar text above each Web Map directly in the StoryMap software
The first few slides of the narrative visualization were the only aspect that required a different pipeline. These tackle the first goal of the visualization: intuitively illustrating, for those with no or limited mathematical background, how common misinterpretations of FEMA’s probability estimates can drastically underestimate flood risk. To create these visualizations, I used Python’s Plotly package, manually defining the position of the red and blue dots to illustrate the point. For the slides that combine these dots with a timeline, I overlaid the resulting Plotly PNG with a timeline in Google Drawings.
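As a rough illustration of the dot slides, the sketch below lays 100 dots out on a grid and colors a chosen number of them red. The function names (`dot_grid`, `draw`), the 10-by-10 layout, the marker size, and the output file name are all assumptions made for this example, not the code used in the project. Plotly is imported inside `draw` so the layout logic can be checked on its own.

```python
def dot_grid(n_dots=100, n_cols=10, n_red=1):
    """Return x positions, y positions, and colors for a grid of dots.

    The first `n_red` dots are red (flooded); the rest are blue.
    """
    xs = [i % n_cols for i in range(n_dots)]
    ys = [i // n_cols for i in range(n_dots)]
    colors = ["red" if i < n_red else "blue" for i in range(n_dots)]
    return xs, ys, colors


def draw(xs, ys, colors, path="dots.png"):
    """Render the dots on a dark background and save them as a PNG."""
    import plotly.graph_objects as go  # imported lazily; only needed for drawing

    fig = go.Figure(
        go.Scatter(x=xs, y=ys, mode="markers",
                   marker=dict(color=colors, size=18))
    )
    fig.update_xaxes(visible=False)
    fig.update_yaxes(visible=False)
    fig.update_layout(plot_bgcolor="black", paper_bgcolor="black",
                      showlegend=False)
    fig.write_image(path)  # static export requires the kaleido package


# Hypothetical usage: one slide with a single red dot, one with 63.
# draw(*dot_grid(n_red=1), path="one_year.png")
# draw(*dot_grid(n_red=63), path="one_century.png")
```

Setting the number of red dots by hand, rather than sampling them at random, keeps the slides reproducible and mirrors the manual positioning described above.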
There are two angles from which to examine flood risk: what has happened in the past and what scientists model will happen in the future. In my visualization I chose to tackle the issue from both directions.
The Future: FEMA’s NYC Flood Risk Boundaries
The Federal Emergency Management Agency (FEMA) is responsible for conducting flood risk assessments for the 22,000 communities that elect to participate in its National Flood Insurance Program (NFIP), the primary source of flood insurance in the US. These risk assessments segment participating communities into risk groups based on the probability that a neighborhood will flood and the level of flooding that is expected. Based on this assessment, FEMA assigns each neighborhood a flood hazard categorization:
- High risk: 1% or higher chance of flooding each year
- Moderate risk: 0.2% to 1% chance of flooding each year
- Minimal risk: Less than 0.2% chance of flooding each year
New York City is one of these 22,000 participating communities for which FEMA publishes flood hazard maps. The raw data used to determine these hazard categorizations is accessible on FEMA’s website. Using this data and documentation published by FEMA, I was able to aggregate the dataset’s FLD_ZONE column into the above-mentioned ‘high’ and ‘moderate’ categories and map the areas of NYC that FEMA estimates are most at risk.
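The aggregation of FLD_ZONE values could look something like the sketch below. The zone codes listed are FEMA’s standard Special Flood Hazard Area labels, but the exact set of values present in the NYC file, the `ZONE_SUBTY` companion column, and the function name `classify_zone` are assumptions for illustration; the mapping actually used in the project may differ.

```python
# Assumption: these FEMA zone codes mark Special Flood Hazard Areas
# (1% or higher annual chance of flooding).
HIGH_RISK_ZONES = {"A", "AE", "AH", "AO", "AR", "A99", "V", "VE"}

# Assumption: moderate-risk areas carry this label, either directly in
# FLD_ZONE or as the subtype of an "X" zone.
MODERATE_LABEL = "0.2 PCT ANNUAL CHANCE FLOOD HAZARD"


def classify_zone(fld_zone, zone_subtype=""):
    """Collapse a raw FLD_ZONE value into 'high', 'moderate', or 'other'."""
    zone = (fld_zone or "").strip().upper()
    subtype = (zone_subtype or "").strip().upper()
    if zone in HIGH_RISK_ZONES:
        return "high"
    if zone == MODERATE_LABEL or (zone == "X" and MODERATE_LABEL in subtype):
        return "moderate"
    return "other"


for raw in ["AE", "VE", MODERATE_LABEL, "X"]:
    print(f"{raw!r} -> {classify_zone(raw)}")
```

With the data loaded into a GeoPandas GeoDataFrame, a function like this could be applied to the FLD_ZONE column with `map` to produce a new risk column for styling the choropleth.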
Note: I chose to pull this data from FEMA’s most recent 2013 flood hazard maps even though full implementation of these maps is currently stalled by a dispute with the city government. The alternative maps are based on 1983 data and were clearly proven to severely underestimate flood risk when Hurricane Sandy hit in 2012.
The Past: 311 Records
When a street floods in New York City, citizens are able to report it via the 311 portal. To get a sense of where flooding has historically occurred in New York City and how this compares to FEMA’s estimates, I used the 311 API to pull all complaints from 2015-2020. Because the API returns at most 50,000 records per request, I had to split the pull into multiple batched requests. I identified the flood-specific complaints by keeping only the records assigned to the Department of Environmental Protection where a cleaned version of the ‘descriptor’ column contained the keyword ‘flood’. I also imported a shapefile of the NYC zip codes (i.e., the mappable outlines of each zip code) from the NYC Open Data Portal and merged this with the 311 complaints to make the data mappable. By using GeoPandas to project the zip code polygons to a coordinate reference system that preserves area, I was able to calculate the square mileage of each zip code and use this to normalize the 311 counts for a choropleth map. This ensures that the map emphasizes areas with the most 311 complaints per square mile, as opposed to simply emphasizing the largest zip codes. To ensure that outliers weren’t driven by population, I also dropped all records that had the word ‘duplicate’ in the resolution_description field. These mark floods that were reported by multiple people, and including them would highlight areas that have more people, not areas that have a disproportionate level of flooding. This zip code dataset also contained population data, which I incorporated into the narrative.
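The batching, filtering, and normalization steps can be sketched with the standard library alone. Everything named here is an assumption for illustration: the endpoint URL and dataset ID, the field names `agency`, `incident_zip`, and `unique_key`, the agency code "DEP", and the helper names. The square mileage per zip code is passed in as a plain dictionary, standing in for the areas computed from the projected polygons.

```python
import json
import urllib.parse
import urllib.request

# Assumption: Socrata endpoint for NYC's 311 service request dataset.
BASE_URL = "https://data.cityofnewyork.us/resource/erm2-nwe9.json"
PAGE_SIZE = 50_000  # per-request record limit described in the text


def fetch_page(offset, limit=PAGE_SIZE):
    """Request one page of records, ordered so that pages do not overlap."""
    params = {"$limit": limit, "$offset": offset, "$order": "unique_key"}
    url = BASE_URL + "?" + urllib.parse.urlencode(params)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def fetch_all(fetch=fetch_page, limit=PAGE_SIZE):
    """Keep requesting pages until one comes back shorter than the limit."""
    records, offset = [], 0
    while True:
        page = fetch(offset, limit)
        records.extend(page)
        if len(page) < limit:
            return records
        offset += limit


def is_flood_complaint(rec):
    """DEP complaint whose descriptor mentions flooding and is not a duplicate."""
    agency = (rec.get("agency") or "").strip().upper()
    descriptor = (rec.get("descriptor") or "").strip().lower()
    resolution = (rec.get("resolution_description") or "").lower()
    return (agency == "DEP" and "flood" in descriptor
            and "duplicate" not in resolution)


def complaints_per_sq_mile(records, sq_miles_by_zip):
    """Count complaints per zip code and divide by that zip code's area."""
    counts = {}
    for rec in records:
        zip_code = rec.get("incident_zip")
        if zip_code in sq_miles_by_zip:
            counts[zip_code] = counts.get(zip_code, 0) + 1
    return {z: counts.get(z, 0) / area for z, area in sq_miles_by_zip.items()}
```

Passing the page-fetching function into `fetch_all` as an argument makes the loop easy to try against a small in-memory list before pointing it at the live API.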
All pictures used in the visualization were open source. They can be found listed in the citations section at the end of this post.
The numbers in the sidecar text in the first few slides of the presentation (unpacking how FEMA’s flood risk categorization compounds over time) come from the following probability calculation that I performed:
FEMA defines a high risk area as one that has a 1% chance of flooding each year. What is the chance that it floods at least once in a century?
You might think that because the chance of it flooding in a given year is 1 in 100 and a century is 100 years, the probability that it floods in a century is 1%. This overlooks the fact that the 1% probability is repeated every single year. The probability actually compounds as follows:
$$P(\text{floods at least once in 100 years}) = 1 - P(\text{does not flood at all in 100 years}) = 1 - 0.99^{100} \approx 63\%$$
The same way you’d be more likely to get hurt if you went skydiving every day versus trying it out once, a high risk area is much more likely to flood in 100 years than it is to flood in one. As expected, the probability over 30 years is somewhere in the middle:
$$P(\text{floods at least once in 30 years}) = 1 - P(\text{does not flood at all in 30 years}) = 1 - 0.99^{30} \approx 26\%$$
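For readers who want to check the arithmetic, here is a minimal sketch of the calculation. It assumes, as the calculation above does, that each year is independent and carries exactly a 1% chance of flooding; the function name is my own.

```python
def prob_at_least_one_flood(annual_risk, years):
    """Chance of at least one flood over `years` independent years."""
    return 1 - (1 - annual_risk) ** years


for years in (1, 30, 100):
    print(f"{years:>3} years: {prob_at_least_one_flood(0.01, years):.0%}")
# prints:
#   1 years: 1%
#  30 years: 26%
# 100 years: 63%
```

Changing `annual_risk` to 0.002 gives the same comparison for the lower bound of FEMA’s moderate risk category.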
This reality is confused by the fact that FEMA often refers to high risk areas as 100-year floodplains, a phrase that evokes the expectation that disaster will only strike once a century. Because so many interpret this high risk designation incorrectly, before showing which areas are high risk in New York City, I found it important to convey the correct interpretation of this statistic and the seriousness of the issue.
After building an initial draft of the visualization, I did think-aloud sessions with two potential users to get a sense of where my narrative was unclear or had room for growth. Given that I aim for this visualization to be comprehensible to a wide range of users, I deliberately chose to conduct this exercise with both a novice and an expert user. Both participants were men in their 20s (ideally I’d be able to do this with a larger, more diverse pool in the future). The first has very limited exposure to maps, interactive visualizations, and flooding science. The second has significant knowledge in all of these areas. I incorporated the results of this exercise into my final draft, discussed below.
Initial design decisions
In the initial draft of this visualization I made a number of deliberate design decisions intended to support my narrative goals:
- Progress from abstract to concrete: Part of the challenge with the public’s conception of flood risk is that humans have a tendency to depersonalize and ‘other’ destabilizing topics. Because it can be overwhelming to confront impending disaster, many hold on to some version of, “Flooding is so unlikely. Yes, I see it in the news, but those people must have built in an unsafe area. I’m sure that will never happen here.” I wanted this visualization to gently, but firmly provide a counterpoint to that logic.
This is not easy. A visualization can’t change its audience’s views if they become overwhelmed and defensive, yet challenging this ‘othering’ directly threatens people’s sense of safety. To try to lessen the shock of this visualization, I decided to ease the audience into the more personal aspects of the issue. I started abstract, tackling FEMA’s definitions and probabilistic predictions, letting the audience initially maintain the facade that flooding is something that only affects ‘other people’. With the aid of this depersonalization, the viewer is equipped to honestly confront the undesirable odds of high risk flood areas, and ease into the narrative.
As the visualization progresses I become more concrete and personal, pointing out where exactly in New York is probabilistically high risk and then shifting angles to the undeniable by presenting where in the city has historically flooded. I then shift away from abstract choropleth maps to satellite imagery to give the visualization a more tangible feel, further attempting to not overwhelm the viewer by easing into this portion of the narrative. The imagery is first presented at a coarse, pan-New York level, the scrolling gradually zooming in on Flatbush, Brooklyn until the viewer is confronted with a high resolution view of the buildings and sidewalks that have frequently flooded over the past five years. The satellite imagery fades into the murky reflection of a person laden with shopping bags and trying to cross a street filled with puddles. The neighborhood, the viewer is compelled to admit, looks like any other in New York. The person in the water? “Well, that could be me.”
- Mundane headline imagery: It’s tempting to select a dramatic picture for the headline, but a wave crashing into a house or a street flooded to the roofs directly contradicts the point of this story: flooding is an everyday phenomenon. I selected the picture of the bench because it’s something involved in the mundane cadence of our day-to-day lives. I thought the black and white solemnity of the image also set an appropriate tone for this narrative.
- Tackle the takeaway without tackling the math: In the first four slides of the visualization I tackle the misconception that an area which has a 1 in 100 risk of flooding each year has a 1% chance of flooding in a century. As the above math section demonstrates, the reason this is not true relies on probability theory related to compounding events. Given that I wanted this piece to be accessible to a wide audience, I decided to not try to tackle the math itself. Instead, I developed a visualization that anticipates people’s expectations and creates a moment of unignorable surprise. The page filling with red dots, juxtaposed to the previous slide which seemed so harmless, communicates the lesson: a 1 in 100 chance of flooding does not produce favorable odds over a lifetime.
- Keep map anchored in the same place throughout: Visual cognition research shows that viewers can more easily draw comparisons between images in a slideshow when an element is held constant. Throughout the slideshow the positioning of the map is my consistent anchor. This makes it much easier to see differences. For example, when the map goes from showing just the high risk area to the high and moderate risk areas, all the viewer sees is the moderate risk area appearing overlaid onto the previous view, when in reality the entire map is reloading. If the map were to move between the slides the viewer would need to refocus and it would take longer to process what the map was trying to show.
I also attempt to keep the color palette similar throughout, only deviating from this when I intentionally want to surprise the viewer. For example, I progress from a blue color palette to a red color palette when I shift from the FEMA data to the point map of all the 311 records. This emphasizes the dissonance between FEMA’s flooding expectations and the lived reality of New Yorkers.
- Dark background: I chose to give the visualizations a dark background throughout because I find this helps the message pop. I find that a light background, especially with maps, often distracts from what I want the viewer to focus on. This meant that in the choropleth maps I had to use color palettes that become lighter in the direction of emphasis.
- Progress from narrative to exploratory: The flow of this visualization was inspired by the martini glass structure described by Edward Segel and Jeffrey Heer in their 2010 paper on narrative visualization. “The structure,” they explain, “resembles a martini glass, with the stem representing the single-path author-driven narrative and the widening mouth of the glass representing the available paths made possible through reader-driven interactivity.” I find this structure compelling because the initial narrative element can serve as both inspiration and a tutorial, showing the viewer why it would be worth investigating this issue and what types of things they might want to explore. I tried to bridge the gap between these two sections by leaving the pins of the heavily affected neighborhoods in place while maintaining the capability to explore areas beyond these.
UX Results and Revisions
The results from the UX testing were helpful in refining my design. The users made the following observations while scrolling through the visualization.
User 1 (Novice)
This user was given the initial draft.
| User Observation | Reflection and Revisions |
|---|---|
| Didn’t understand why the math worked how it did, but was convinced by the visualization that high risk areas in fact have a 63% chance of flooding in a century. | I was surprised this had worked as intended! |
| The idea that light colors were the bad areas to focus on took a second to adjust to, but made sense once he could see the legend. | I followed up and presented three different alternative color palettes. He chose the original. Ideally in the future I would get to test these on new participants. |
| Found the legends difficult to find. | I agreed. Unfortunately this is a limitation of the software. |
| Felt the satellite imagery made it more relatable and personal in a good way, but the progression from the satellite imagery to the exploratory part in the initial draft was jolting and depersonalized the experience. (In the initial draft I had a different image/format in this section.) | I used this feedback to revise the transition, making everything fall into the scrollable format and incorporating the picture of the person reflected in the puddle that you find in the final version. |
User 2 (Expert)
This user was given the final draft, which incorporated the feedback of user 1.
| User Observation | Reflection and Revisions |
|---|---|
| Loved the progression of the satellite imagery to the puddle picture because it gave the visualization a ‘human element’. Would have loved to see even more personalized pictures at the end of the narrative. | I 100% agree and hope to be able to incorporate these at some point. The open source images available are unfortunately limited for my rhetorical purposes. |
| Would be curious to see more at the end that digs into why, beyond impermeability, these areas flood. | This wasn’t surprising given that User 2 had significantly more background in urban planning and flooding than User 1. I think this is beyond the scope for a visualization intended for a wide audience, but is an area that I would love to explore in the future as an extension. |
- Legend: The legend is a critical component of most maps, but there is no way to customize the legend or make it appear automatically in the visualization. I found this to limit the comprehensibility of my visualizations.
- Importing data: It seemed unnecessarily complex to import the data and create the maps necessary for this visualization. Even with all the data pre-analyzed and spatially formatted, it took multiple steps using the hosted feature layer and Web Map capabilities of ArcGIS Online to create the necessary maps. The fact that this was separate from ArcGIS StoryMaps, and a multi-step process every time I wanted to change something, made it difficult to iterate quickly on little design tweaks.
I believe that this topic is increasingly important as our climate changes and once unprecedented floods become commonplace. People deserve to be aware of the risk that they face. I’m particularly interested in digging more into the following questions:
- What, other than impermeability, makes the 3 highlighted areas more likely to flood than surrounding areas? (as raised by User 2)
- How else can we help people intuitively grasp probabilistic projections of flooding?
- What types/levels of flooding do New Yorkers typically report via 311? (This is currently unavailable in the public records, but based on the questions in the ‘Report a flood’ form I think the DEP has the data).
I hope to have the opportunity to explore this further in the future.
Open Source Images from Pixabay