The history of human settlement is a topic that would naturally lend itself to mapping, save for the fact that the data is extremely limited, as historians and archeologists only have so much to work with. So I was excited to find this dataset of 6,000 years of urbanization from Meredith Reba, Femke Reitsma, and Karen C. Seto of Yale University. The dataset includes cities’ locations and populations, digitizing and aggregating two older datasets compiled by historians through painstaking analysis.
The authors created this map, which uses color to show when cities were established:
This presentation is clear, and the color scale communicates the differences in time period effectively.
Quartz used the data to create an animated version:
The video format allows for a guided tour, with annotations focusing on significant areas and time periods. While eye-catching, the bright style of the bubbles is somewhat hard to interpret, and the animation does not lend itself to exploring the data.
Data and Tools
The dataset includes information on cities’ names, the current country they fall within, latitude, longitude, year, population, and certainty of estimates. The observations in the dataset are city-years, where each row is an instance of a city in a particular year.
The data is provided as two CSV files, one for each of the datasets digitized by the authors, with an R script for aggregating the two files into one combined file. There are unavoidably some methodological issues with combining the two datasets, such as differences in coding or estimating, although the authors note that this shouldn’t impede analysis.
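The authors' aggregation is done in R, but the basic idea of stacking two city-year files into one can be sketched in Python. This is only an illustration under assumptions: the column names, the placeholder rows, and the "source" tag are hypothetical and do not reflect the actual files or the authors' script.

```python
import csv
import io

# Hypothetical stand-ins for the two source files. Column names and values
# are placeholders for illustration, not taken from the real dataset.
SOURCE_A = """city,country,latitude,longitude,year,population,certainty
Alpha,Country X,31.0,45.0,-3000,40000,2
Beta,Country Y,41.0,12.0,100,450000,1
"""
SOURCE_B = """city,country,latitude,longitude,year,population,certainty
Beta,Country Y,41.0,12.0,100,500000,1
Gamma,Country Z,34.0,108.0,-200,100000,2
"""


def read_rows(text, source):
    """Parse CSV text into dicts, converting numeric fields and tagging the source."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        row["year"] = int(row["year"])  # BC years stay negative integers
        row["population"] = int(row["population"])
        row["source"] = source
        rows.append(row)
    return rows


def combine(*row_lists):
    """Stack all rows and sort them by city, year, and source."""
    combined = [row for rows in row_lists for row in rows]
    combined.sort(key=lambda r: (r["city"], r["year"], r["source"]))
    return combined


combined = combine(read_rows(SOURCE_A, "A"), read_rows(SOURCE_B, "B"))
for row in combined:
    print(row["city"], row["year"], row["population"], row["source"])
```

Keeping a source tag on each row makes it easy to spot city-years where the two datasets disagree, which is where the coding and estimation differences mentioned above would show up.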
After some minor edits to align with the “v2” data files, I was able to use the R script to process the datasets and create a new aggregated data file.
To create the maps, I used Carto, a powerful web-based mapping platform. Carto is easy to get up and running (it runs in the browser and does not require coding knowledge) but also allows for significant customization using HTML and SQL. Ultimately, Carto proved to have some limitations with this particular dataset, although it made visualization easy.
After uploading the dataset in Carto, I created a map and began adjusting the styling. I used a satellite-style basemap (one of the “Here” options) to suggest the importance of natural geography such as rivers and land cover in determining where humans settled. This is somewhat misleading given that the geography has changed over the last 6,000 years, but I assume this will be understood by the user.
I began by displaying the cities as bubbles and adding a widget to count the number of city-year records by country:
One of my first steps was to size the bubbles by population. Ultimately, I decided to make two maps: one with smaller bubbles appropriate for a world view where bubbles are tightly packed, and one with larger bubbles and labels appropriate for a more zoomed-in view where bubbles are more spread out.
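For readers curious about the arithmetic behind population-sized bubbles, here is a minimal Python sketch. It is not how Carto computes sizes internally; it assumes a common convention of scaling bubble area (rather than radius) with population, and the two radius ranges are hypothetical values chosen only to mirror the "world" and "zoomed-in" maps.

```python
import math


def bubble_radius(population, max_population, min_radius, max_radius):
    """Map a population to a radius so that bubble area grows with population.

    Using the square root of the population share keeps large cities from
    visually overwhelming the map, since area scales with radius squared.
    """
    if max_population <= 0:
        raise ValueError("max_population must be positive")
    share = max(0.0, min(1.0, population / max_population))
    return min_radius + (max_radius - min_radius) * math.sqrt(share)


# Hypothetical pixel ranges for the two maps (assumed, not from Carto).
WORLD_VIEW = {"min_radius": 1.0, "max_radius": 10.0}
ZOOMED_VIEW = {"min_radius": 4.0, "max_radius": 30.0}

for name, view in (("world", WORLD_VIEW), ("zoomed", ZOOMED_VIEW)):
    radius = bubble_radius(250_000, 1_000_000, **view)
    print(name, round(radius, 2))
```

The same population gets a smaller radius in the world preset than in the zoomed-in one, which is the trade-off described above: tight packing at a global scale, more room for size and labels at a regional scale.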
Next, I applied color based on the certainty value, which indicates how reliable each estimate is. I used a custom color scale to ensure enough contrast with the basemap, and applied a thinner stroke given the large number of small bubbles.
I also added popups on hover, and a legend, opting for a note about bubble size rather than a bubble size legend given the lack of customization options. I used the HTML editor to add the note and customize the tooltips.
One challenge was the time-series data. The data codes BC years as negative numbers, which Carto does not recognize as dates, even after applying date formats in Excel. As a result, the time series widget uses number formatting (2.0K) rather than date formatting, which makes it impossible to select individual years:
Filtering the data to only AD years allowed Carto to recognize the years as dates, but this still did not solve the issue as the time series widget only allows for a certain number of bins, and the data contained too many years even after rounding to the nearest century. I chose to keep the year column in number format in order to display all the data.
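To see why rounding to the nearest century still leaves many time steps, the count can be sketched in Python. The year values below are illustrative rather than taken from the dataset, and MAX_BINS is a hypothetical limit used only for the comparison; I do not know the widget's actual cap.

```python
from collections import Counter


def nearest_century(year):
    """Round a year (negative for BC) to the nearest multiple of 100.

    Integer arithmetic avoids float rounding surprises; floor division
    behaves consistently for negative years.
    """
    return (year + 50) // 100 * 100


# Illustrative years spanning roughly 6,000 years, one every 50 years.
sample_years = list(range(-3700, 2001, 50))

# Count how many records fall into each century bin.
records_per_century = Counter(nearest_century(y) for y in sample_years)

MAX_BINS = 30  # hypothetical widget limit, assumed for illustration only

print("distinct century bins:", len(records_per_century))
print("fits in widget:", len(records_per_century) <= MAX_BINS)
```

Even at century resolution, a span of about 6,000 years produces several dozen bins, so any widget with a modest bin limit would need coarser rounding or a numeric axis instead.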
The world version, showing all city-year records in the default view:
The zoomed-in version, initially focused on the Fertile Crescent and limited to a specific two-hundred-year period:
Exploring the map shows a number of interesting patterns. The historical importance of the Middle East and Mediterranean region is clear, as the earliest records of cities are all in that area. Interestingly, after the Middle East and Mediterranean, cities next emerge in China and Central America:
Overall, one major takeaway is that multiple millennia passed with only a handful of cities around the world; the period of widespread human settlement began recently in comparison. Future analyses could explore this timeline in more detail, and could include other datasets such as the availability of natural resources to look at possible causal factors.