Comparing Flu Shot Site Locations and Neighborhood Populations


For this lab, I knew I wanted to focus on some aspect of public health in New York City but was unsure which one, so I browsed the Health section of the NYC Open Data site to see what location-based data was available. I came across a data set listing all of the publicly accessible locations in NYC where New Yorkers could receive a flu shot. While the list contained over five hundred sites, it was obvious from a quick glance at the data that they were not evenly distributed throughout the city. Were there areas in New York City with high populations that didn’t have ready access to a flu shot site? I set out to create a map that would make it easy to compare flu shot site locations with the populations of New York City’s neighborhoods.

For inspiration, I sought out maps that dealt with health or access to health-based sites, or that compared specific locations with choropleth-mapped data like population or income. The first map I found was Mapping Duane Reade, from I Quant NY, which shows which of the major drug store chains is most common in each part of New York City. The map’s creator color coded each New York City lot, as found on the Bytes of the Big Apple site, to match which of the four major drug store chains is closest. This map is very clear and easy to follow, although it would have been nice to have an interactive feature that let the user focus on one drug store chain at a time. The second map, originally from Reddit and posted by Gothamist, compares the locations of shootings to neighborhood income. It pinpoints the locations of shootings from January to August 2013 and colors neighborhoods based on reported income from the 2010 census. The map is mostly easy to read and understand thanks to the legend, although it is unclear what distinguishes the two different location markers for shootings. I would also like to see data from a full year rather than just part of one. Additionally, I had some questions about the validity of the data, as the creator admitted that the crime data did not come from “official sources” but was cobbled together from news and Twitter sources. The third map is from a community-driven project called Flu Near You, which seeks to track the spread of the flu in the United States. Users can report flu activity, which is then mapped. The map covers all fifty states, as well as United States territories like Puerto Rico. As the data is user-reported, it is questionable whether the reported symptoms are actually flu symptoms, but the site also provides an option to look at flu activity reported by the CDC.

Before anything else, I needed to clean the data and assign proper geocodes. The data set provided latitude and longitude, but not precisely enough to map the locations of the sites accurately. It also listed the building number, street address, city, state, and zip code in separate columns, which made the addresses difficult to read. I merged these columns into a single address field and ran the addresses through Batch Geocode. Because it was tricky to insert the new data into the existing sheet, I created a new sheet, copied over the information I needed from the original, and added the complete addresses, the more precise latitude and longitude, and the accuracy rating from Batch Geocode. Using Google Refine, I filtered out the locations that Batch Geocode had not been able to geocode accurately, found the errors in those addresses, and re-ran them to get more accurate locations.
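The merge-and-check steps above could also be scripted. The sketch below is not the workflow used here (that was Batch Geocode and Google Refine); it assumes hypothetical column names (building_number, street, city, state, zip_code) and a hypothetical numeric accuracy score where higher is better, with an arbitrary review threshold of 8. Both would need to be checked against the real export and the geocoder’s output.

```python
# Minimal sketch of the address cleanup step. Column names, the "accuracy"
# field, and its scale are hypothetical; adjust them to the real data.
ADDRESS_FIELDS = ["building_number", "street", "city", "state", "zip_code"]


def full_address(row: dict) -> str:
    """Merge the separate address columns into one geocodable string."""
    building, street, city, state, zip_code = (
        str(row.get(field, "")).strip() for field in ADDRESS_FIELDS
    )
    return f"{building} {street}, {city}, {state} {zip_code}".strip()


def needs_review(rows: list, min_accuracy: float = 8.0) -> list:
    """Return rows whose (hypothetical) accuracy score falls below the
    threshold: the ones to inspect by hand and re-geocode."""
    return [r for r in rows if float(r.get("accuracy", 0)) < min_accuracy]


if __name__ == "__main__":
    # Made-up address, for illustration only.
    site = {"building_number": "123", "street": "Example St", "city": "Brooklyn",
            "state": "NY", "zip_code": 11201}
    print(full_address(site))  # 123 Example St, Brooklyn, NY 11201
    geocoded = [{"name": "Site A", "accuracy": 9}, {"name": "Site B", "accuracy": 4}]
    print([r["name"] for r in needs_review(geocoded)])  # ['Site B']
```

Keeping the low-accuracy rows as a separate list mirrors the filter-fix-rerun loop described above: only the flagged addresses need manual correction.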

I decided to use Neighborhood Tabulation Areas (NTAs), as they were created to project populations at a small-area level, and I thought that finer grain might reveal more variation in population between areas. I was also unfamiliar with them and curious about how they would look. The shape files were easy to find on Bytes of the Big Apple, where I also found data that converted the 2010 census population numbers from census blocks to NTAs.
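Conceptually, converting block-level census counts to NTAs is an aggregation: each block is assigned to one NTA and the block populations are summed. The sketch below assumes a simple block-ID-to-NTA lookup table; the block IDs and NTA codes are made-up placeholders, and the file published on Bytes of the Big Apple may be laid out differently. Returning the unmatched blocks is one way to catch the kind of conversion error wondered about later in connection with park NTAs.

```python
from collections import defaultdict


def population_by_nta(block_populations: dict, block_to_nta: dict):
    """Sum census-block populations into NTA totals.

    Returns (totals, unmatched): the per-NTA totals, plus the IDs of any
    blocks with no NTA match, so they can be checked instead of dropped.
    """
    totals = defaultdict(int)
    unmatched = []
    for block_id, population in block_populations.items():
        nta = block_to_nta.get(block_id)
        if nta is None:
            unmatched.append(block_id)
        else:
            totals[nta] += population
    return dict(totals), unmatched


if __name__ == "__main__":
    # Toy, made-up block IDs and NTA codes purely for illustration.
    blocks = {"block-1": 100, "block-2": 250, "block-3": 40, "block-4": 7}
    lookup = {"block-1": "NTA-A", "block-2": "NTA-A", "block-3": "NTA-B"}
    totals, missing = population_by_nta(blocks, lookup)
    print(totals)   # {'NTA-A': 350, 'NTA-B': 40}
    print(missing)  # ['block-4']
```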


The interactive version of the map can be found here.

Once in CartoDB, I imported the flu shot location data set. Using the Wizard tool, I set the visualization type to category, so it would be easy to see what types of facilities are available to the public. I chose colors that would be easy to tell apart, and avoided colors like red that might carry a negative connotation. I also slightly reduced the size of the points so the choropleth sections of the map would be easier to read when zoomed out (like above). Next, I imported the NTA shape files and population data sets and used the Column Join feature to merge them into one mappable data set. I set this layer’s visualization type to choropleth, choosing a green-based color range because it felt easy to read and didn’t have the negative connotations of the default red-based range. I then went back and edited the point colors in the flu shot site layer so that none of them resembled the colors on the choropleth layer. I set Quantification to Quantile and Color Buckets to seven, as these settings gave the map a better visual range for population and made the differences between areas clearer. Rather than adding text labels, I used tooltips that displayed the Facility Name and Address for the flu shot sites, and the NTA Name and 2010 population for the NTA areas. Finally, I edited the legend for clarity and added necessary information like the Title and Data Source.
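With Quantile classification, each of the seven color buckets holds roughly the same number of NTAs, rather than covering an equal slice of the population range. CartoDB computes these breaks internally; the sketch below shows one common way to compute quantile breaks and assign a bucket, and is not CartoDB’s own code (its handling of ties and uneven counts may differ).

```python
from bisect import bisect_left


def quantile_breaks(values: list, buckets: int = 7) -> list:
    """Upper edge of each bucket, so each bucket holds about len(values)/buckets items."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    n = len(ordered)
    breaks = []
    for k in range(1, buckets + 1):
        # Index of the last item in bucket k: ceil(k * n / buckets) - 1
        idx = (k * n + buckets - 1) // buckets - 1
        breaks.append(ordered[idx])
    return breaks


def classify(value: float, breaks: list) -> int:
    """Return the 0-based bucket (color) index for a value."""
    return min(bisect_left(breaks, value), len(breaks) - 1)


if __name__ == "__main__":
    # The seven NTA populations quoted in this post; the real map uses every NTA.
    populations = [28630, 75282, 34495, 91958, 72340, 72101, 1849]
    breaks = quantile_breaks(populations)
    print(breaks)                    # [1849, 28630, 34495, 72101, 72340, 75282, 91958]
    print(classify(75282, breaks))   # 5
```

With only seven sample values, each lands in its own bucket; with all of the city’s NTAs, each bucket would hold a proportionally larger group.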


Looking at the city view, it is obvious that there are many flu shot sites in Manhattan, especially in areas like the Upper East Side, the Upper West Side, Midtown, and Lower Manhattan. North of Central Park, flu shot sites become less and less frequent, though the types of sites become more varied. Most of Manhattan’s flu shot locations are pharmacies, typically big-name chains like Walgreens, CVS, or Duane Reade. In Upper Manhattan, other types of facilities appear, like Public Clinics and Child Health clinics. Most of Manhattan’s flu shot sites are in NTAs whose populations fall in the lower or middle range. For example, Midtown has 19 flu shot sites, all pharmacies, and a population of 28,630 residents. Meanwhile, Central Harlem North/Polo Grounds has 3 sites (two clinics and a pharmacy) but a population of 75,282 residents. Because so many of the flu shot sites are pharmacies in drug store chains, the chains may find it more profitable to operate in areas like Midtown, where many people work and residents tend to have higher incomes, than in residential areas like Central Harlem, where residents tend to have lower incomes.
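Normalizing the site counts by population makes the Midtown versus Central Harlem North contrast concrete. The sketch below computes flu shot sites per 100,000 residents from the counts and 2010 populations quoted above; returning None for zero-population areas (such as some park NTAs) is a choice made for this sketch.

```python
from typing import Optional


def sites_per_100k(sites: int, population: int) -> Optional[float]:
    """Flu shot sites per 100,000 residents; None for zero-population areas."""
    if population <= 0:
        return None
    return sites / population * 100_000


if __name__ == "__main__":
    # Site counts and 2010 populations as quoted in this post.
    midtown = sites_per_100k(19, 28_630)
    harlem_north = sites_per_100k(3, 75_282)
    print(f"Midtown: {midtown:.1f}")                    # Midtown: 66.4
    print(f"Central Harlem North: {harlem_north:.1f}")  # Central Harlem North: 4.0
    print(f"Ratio: {midtown / harlem_north:.1f}")       # Ratio: 16.7
```

By this measure, Midtown has roughly sixteen to seventeen times as many flu shot sites per resident as Central Harlem North/Polo Grounds, though a per-resident rate ignores the many people who work in Midtown but live elsewhere.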


The flu shot sites in Brooklyn are less frequent and less concentrated than in Manhattan, but some similar patterns appear. Areas with high populations do not necessarily have many flu shot sites, while some areas with lower populations do. While most Brooklyn NTAs have one or two flu shot sites, the Downtown Brooklyn area has 6 but a population of only 34,495 residents. Meanwhile, East New York has a population of 91,958 residents and no flu shot sites. To be fair, East New York is larger than Downtown Brooklyn, but even highly populated areas of comparable size are underserved: Sunset Park East, for example, has a population of 72,340 and also no flu shot sites. Other highly populated areas of similar size have few sites, like Bushwick South, with 72,101 residents and only two sites. Again, one might assume that many people who live in these areas can reach a flu shot site near where they work, but certainly some cannot, not just in Manhattan and Brooklyn but in all five boroughs. This is concerning because people who cannot easily reach a flu shot site may delay getting the shot, which may lead to them catching the flu or infecting others. Flu shots are important not only because they protect the individual but because they increase community immunity, which protects those who are unable to get the vaccine for medical reasons.
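Per-NTA counts like the ones above come down to asking which NTA boundary each geocoded site falls inside. The post does not say how the counts were tallied; one way to get them is the standard even-odd ray-casting point-in-polygon test, sketched below on plain (longitude, latitude) pairs. The two square areas are toy shapes: real NTA boundaries would come from the shape files (often as multipolygons with holes, which this sketch does not handle), points lying exactly on a shared edge may be assigned inconsistently, and a spatial database would typically handle this with a function such as PostGIS’s ST_Contains.

```python
Point = tuple  # (x, y), e.g. (longitude, latitude)


def point_in_polygon(x: float, y: float, polygon: list) -> bool:
    """Even-odd ray casting: count how many edges a ray going right from (x, y) crosses."""
    inside = False
    j = len(polygon) - 1
    for i in range(len(polygon)):
        xi, yi = polygon[i]
        xj, yj = polygon[j]
        # Only edges that straddle the ray's y-coordinate can cross it.
        if (yi > y) != (yj > y):
            x_cross = (xj - xi) * (y - yi) / (yj - yi) + xi
            if x < x_cross:
                inside = not inside
        j = i
    return inside


def count_sites_per_area(sites: list, areas: dict) -> dict:
    """Count how many sites fall inside each named polygon."""
    counts = {name: 0 for name in areas}
    for x, y in sites:
        for name, polygon in areas.items():
            if point_in_polygon(x, y, polygon):
                counts[name] += 1
                break  # assumes areas don't overlap, so stop at the first match
    return counts


if __name__ == "__main__":
    # Toy areas and sites, not real NTA boundaries or flu shot locations.
    areas = {
        "Area A": [(0, 0), (2, 0), (2, 2), (0, 2)],
        "Area B": [(2, 0), (4, 0), (4, 2), (2, 2)],
    }
    sites = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]
    print(count_sites_per_area(sites, areas))  # {'Area A': 2, 'Area B': 1}
```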

For future versions of this visualization, I would like to try a different shape file/population combination. The Neighborhood Tabulation Areas can be somewhat confusing, and it is difficult to tell whether the population numbers might be off. For example, the NTAs include areas like Central Park and Prospect Park, labeled “park-cemetery-etc-borough name”. One would assume that these areas have no population, but they typically do: the NTA that covers Central Park has a population of 1,849 residents. Are they people who live in apartment buildings ringing the park? An estimated homeless population? Or are these numbers an error from translating census blocks to NTAs? To find out, I would need to try a different shape file/population combination and see whether the maps differ substantially.

I would also be interested in looking at the average incomes of neighborhoods and how they compare to the prevalence of flu shot sites in those neighborhoods. Income can have a huge impact on access to a service like a flu shot. If a low-income New Yorker works in an area like Midtown or Downtown Brooklyn, they can walk to a nearby pharmacy to get a flu shot. However, if a low-income New Yorker lives and works in neighborhoods without easy access to flu shot sites, they may have to set aside additional money and travel time to get one. This map would involve swapping the population data for income data, but it would make sense to keep using NTAs, so that the income-based map could be compared directly with the original population-based map.