For my CartoDB project, I am interested in the distribution and characteristics of food retail stores in New York City. I find a dataset on data.ny.gov that has the location of all licensed food retail stores in New York State, and decide to use it for my analysis. For inspiration I look at three examples of data visualizations, all maps that deal with food retail in the United States.
The first example is a project that analyzes the ratio of bars to grocery stores across the country. It uses a cartogram of even-sized circles colored on a seven-step, dichromatic scale. The two-tone scale works well because it shows a clear distinction between bar-dominated (more brown) and grocery store-dominated (more blue-green) areas, and uses a neutral white where the balance is even. The choice of white could be reconsidered, though, as it seems to suggest a lower density of stores or a lack of data, even though this is not the case. This is especially true because the visualization is displayed on a white background.
The second example visualizes the travel distance to the nearest grocery store from different starting points across the U.S. Line segments span from each starting point (distributed at regular intervals) to the nearest store, and I find the result visually very effective. Metropolitan areas along the East and West Coasts with high population density are, not surprisingly, characterized by short distances to the nearest store. Areas with desert and mountains have noticeably longer travel distances, suggesting a much lower population density.
As a third example, I am looking at a set of small multiples showing “Grocery Store Geography”, that is, the distribution of a selection of America’s leading food retail stores. The use of small multiples is very effective: it is easy to imagine that if all the data had been shown in one map, it would be cluttered to the point of being impossible to read. Instead, each map shows only one retail chain, making its presence across the country easy to assess.
For my visualization, my first approach is to create a map showing the density of grocery stores across NYC’s five boroughs. Opening the dataset in OpenRefine, I discover that there are 14,172 entries related to the counties I am interested in. I could have exported only these rows, but instead I import a CSV of the statewide data into Carto, so that I have the opportunity to look at statewide data as well if needed. I add a shapefile with data on the five NYC boroughs. To eliminate the data points that I don’t need, and to be able to style the data in relation to the boroughs, I use the “Intersect Second Layer” analysis, which eliminates all data points not located within the counties of the shapefile.
Since a choropleth map doesn’t make a lot of sense when there are only five areas, I try using hexbins colored with a five-step, monochrome gradient by the attribute “count_vals_density” that Carto created as a result of the analysis. The result is successful in that it shows high-density areas very clearly. However, the visualization is very unstable in Carto, and trying to manipulate it further results only in error messages, to the point that I can no longer reopen the map.
Having tried this a couple of times, and also realizing that this visualization doesn’t provide many insights (the hexbins make it impossible to get information about individual stores), I decide on a different approach: since store square footage is also a field in my dataset, I decide to focus on store sizes. To do so, I go back to plotting individual stores as dots, but this time sized by square footage. The values in the dataset range from 0 (which I will also investigate further) to 230,000 sq ft, with an average of 2,500. I try out different sizing options and find that an equal interval from size 4 to size 45 is most effective in showing the difference between smaller store locations and really big ones.
The approximate 1:10 ratio between the smallest and largest dot sizes mimics the roughly 1:100 ratio between the average (which is relatively close to the minimum) and the maximum value.
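To make the sizing logic concrete, here is a minimal Python sketch of an equal-interval mapping from square footage to dot size. It is only an illustration: the function name, the choice of five bins, and the clamping behavior are my own assumptions, and Carto's actual binning may work differently. The value range (0 to 230,000), the average (2,500) and the size range (4 to 45) are the ones mentioned above.

```python
def equal_interval_size(value, vmin, vmax, size_min, size_max, bins):
    """Map a value to a dot size using equal-width bins (hypothetical helper)."""
    if vmax <= vmin or bins < 2:
        raise ValueError("need vmax > vmin and at least two bins")
    # Keep out-of-range values inside the expected range.
    clamped = min(max(value, vmin), vmax)
    bin_width = (vmax - vmin) / bins
    # The maximum value would land in bin number `bins`, so cap the index.
    index = min(int((clamped - vmin) / bin_width), bins - 1)
    # Spread the sizes evenly so the first bin gets size_min, the last size_max.
    size_step = (size_max - size_min) / (bins - 1)
    return size_min + index * size_step


# Figures from the text; the bin count of 5 is an assumption.
SQFT_MIN, SQFT_MAX, SQFT_AVG = 0, 230_000, 2_500
SIZE_MIN, SIZE_MAX, BINS = 4, 45, 5

average_size = equal_interval_size(SQFT_AVG, SQFT_MIN, SQFT_MAX, SIZE_MIN, SIZE_MAX, BINS)
largest_size = equal_interval_size(SQFT_MAX, SQFT_MIN, SQFT_MAX, SIZE_MIN, SIZE_MAX, BINS)

size_ratio = SIZE_MAX / SIZE_MIN    # ratio of largest to smallest dot size
value_ratio = SQFT_MAX / SQFT_AVG   # ratio of maximum to average square footage
area_ratio = size_ratio ** 2        # area ratio, if size is read as dot width

print(average_size, largest_size)
print(size_ratio, value_ratio, area_ratio)
```

With these assumptions, an average-sized store falls in the smallest size class, which matches the impression of many small dots and a few very large ones. If the size is read as the width of a dot, the area ratio between the largest and smallest dots ends up in the same order of magnitude as the ratio between the maximum and the average square footage.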
I use a stepped color scale to enhance the difference, and also to tone down the visual impact of the many, many small dots representing small food retail stores.
Finally, I add a legend and a pop-up feature that allows you to see the store name and the square footage of each place. For the sizing legend, I decide to use a neutral gray because there is no option to use stepped color; using one solid color could be a source of confusion, because the way Carto assigns transparency makes the smallest dots look darkest (the opposite of my color assignment).
The final visualization shows a striking difference between a few very big retail locations, such as Target (notably, the one in downtown Brooklyn doesn’t show up), and many, many small ones. It is also still possible to get a sense of the density of the retail stores in different parts of the city.
For a further development of this project, I would work more with the dataset. Upon investigation, I realize that there are more than 1,000 entries with a 0 value for square footage, which obviously cannot be true. I choose to assume that these stores are indeed small, and that the overall impression of the visualization is thus still valid. However, the average value for square footage is probably not very reliable. Another concern is the very small number of food retail stores in Queens, which doesn’t seem accurate.
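As a rough illustration of why the zero values matter for the average, the following Python sketch counts the zero entries and compares the mean with and without them. The function name is hypothetical and the sample list is made up purely for demonstration; it is not taken from the actual dataset.

```python
from statistics import mean


def summarize_square_footage(values):
    """Count zero entries and compare means with and without them.

    `values` must be a non-empty list of square footage numbers.
    """
    zero_count = sum(1 for v in values if v == 0)
    nonzero = [v for v in values if v > 0]
    return {
        "zero_count": zero_count,
        "mean_all": mean(values),
        # Treat zeros as missing values rather than as real store sizes.
        "mean_nonzero": mean(nonzero) if nonzero else None,
    }


# Made-up sample values, NOT real data from the store dataset.
sample = [0, 0, 1200, 800, 3000, 45000]
summary = summarize_square_footage(sample)
print(summary)
```

Treating zeros as missing rather than as real sizes can only raise the mean, so an average that includes them understates the typical store size.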
With a more complete dataset, I would consider expanding the visualization to cover the entire state, to investigate whether this store distribution is characteristic only of NYC.