Languages on the Map



The existence, state, and death of languages can be influenced by geography. On a map, languages can be plotted to show their proximity to each other in space. A map also shows how natural land formations, such as mountains or deserts, influence where languages do or do not survive. This lab plots both dead and living languages on a geographical map and permits the exploration of this dynamic.


The project was created on Carto, a cloud platform that provides GIS, web mapping, and spatial data science tools.

The datasets used were World Language Family Map and Extinct Languages, found on Kaggle.

The map file used was a customized version of the Bubble map on Mapbox. 


I found the datasets while browsing Kaggle at the beginning of the semester. When the map assignment came up, I saw that these datasets might be suitable for this lab.

An inspiration for this map was After Babylon, which was both informative and visually appealing. I especially liked the first map, which used points to show where languages were from. I also appreciated the simple aesthetic. My lab would not include the additional graphs, though they did provide useful information.

After importing one dataset, Extinct Languages, into Carto, my first priority was getting a map that displayed geography. I believe that mountain regions, islands, deserts, and coastlines influence how a language may grow or be hindered. It is also interesting to see languages in isolated areas, such as islands. I recognized that I did not have data to show non-geographical factors, such as war, politics, and famine, that would also lead to the endangerment or extinction of an aspect of culture such as language. I initially used the grey geographical map provided in Carto.

Because the dataset had population data, I used that to control the size of each point. I then used color to distinguish between different states of endangerment. For the order of these states, I referenced The Guardian’s article about this dataset.
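The styling itself was done through Carto's interface, but the idea behind it can be sketched in a few lines of Python. This is only an illustration under assumptions: the state names follow the five UNESCO categories that I assume the dataset uses, the hex colors are a hypothetical warm palette, and the square-root scaling and radius limits are my own choices, not Carto's defaults.

```python
import math

# Assumed ordering of endangerment states, from least to most severe.
DEGREES = [
    "Vulnerable",
    "Definitely endangered",
    "Severely endangered",
    "Critically endangered",
    "Extinct",
]

# Hypothetical warm palette, one color per state in the same order.
COLORS = ["#ffe066", "#ffb347", "#ff7f27", "#e03c31", "#8b0000"]


def color_for(degree):
    """Return the color assigned to an endangerment state."""
    return COLORS[DEGREES.index(degree)]


def radius_for(speakers, max_speakers, min_radius=2.0, max_radius=12.0):
    """Scale a point's radius so its area grows with the speaker count."""
    if not speakers or speakers <= 0 or max_speakers <= 0:
        # Missing or zero population still gets a visible point.
        return min_radius
    fraction = math.sqrt(min(speakers, max_speakers) / max_speakers)
    return min_radius + fraction * (max_radius - min_radius)


print(color_for("Extinct"), radius_for(250, 1000))
```

Using the square root keeps a language with four times as many speakers from looking four times as wide, which would exaggerate the difference on the map.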

I created pop-ups that included the name of the language, the region of the world it is from, and the population of speakers. While there was additional information that I wanted to add (such as the name of the language in its original language), many of these values were missing and produced a “null” line that was not helpful. Also, some pop-ups required scrolling or had to be enlarged to an unwieldy size, so I ultimately kept them simple.
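One way to avoid the “null” lines would be to build the pop-up text only from fields that actually have a value. The sketch below is not how Carto builds pop-ups; it is a small Python illustration, and the field names (name, native_name, region, speakers) are hypothetical stand-ins for the dataset's real columns.

```python
def popup_lines(record, fields):
    """Build pop-up lines, skipping any field whose value is missing."""
    lines = []
    for label, key in fields:
        value = record.get(key)
        if value is None or str(value).strip() == "":
            continue  # leave the line out instead of showing "null"
        lines.append(f"{label}: {value}")
    return lines


# Hypothetical field list and record, for illustration only.
FIELDS = [
    ("Language", "name"),
    ("Native name", "native_name"),
    ("Region", "region"),
    ("Speakers", "speakers"),
]
example = {"name": "Hawaiian", "native_name": None, "region": "Pacific", "speakers": 1000}
print("\n".join(popup_lines(example, FIELDS)))
```

A speaker count of 0 is kept on purpose, since for an extinct language that number is information and not a gap.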

I thought it would be valuable to show these points in relation to the whole body of existing languages. I imported the World Languages dataset, which is a list of all languages. I made these points blue and gave them all the same size, which is inaccurate but larger than any point in the Extinct Languages dataset. I did this because the population was not included in this dataset, and it was possible that those points would be overwhelmingly large in comparison to the small red, orange, and yellow dots of Extinct Languages. I created a legend that showed what each color represented.

I created pop-ups for this second dataset, but only provided the name of the language and the region of the world it is from, since the population data was missing.

After asking for help, I was able to find a visually pleasing (but still colorful) geographical map on Mapbox. Wanting the grey quality of Carto’s map, but also the clearer distinction of land and the artistic quality of the Mapbox map, I customized the colors of the Mapbox map to be greyscale.
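I changed the colors by hand in Mapbox, but the conversion can also be computed. The sketch below is an assumption about one reasonable approach, not what Mapbox does internally: it turns a hex color into a grey of similar brightness using the common luma weights 0.299, 0.587, and 0.114 for red, green, and blue.

```python
def to_grey(hex_color):
    """Convert a '#rrggbb' color to a grey of similar perceived brightness."""
    value = hex_color.lstrip("#")
    r, g, b = (int(value[i:i + 2], 16) for i in (0, 2, 4))
    # Weighted sum: green contributes most to perceived brightness.
    grey = min(255, round(0.299 * r + 0.587 * g + 0.114 * b))
    return "#{0:02x}{0:02x}{0:02x}".format(grey)


print(to_grey("#ff0000"))
```

Weighting the channels this way keeps land and water distinguishable after the color is removed, which a plain average of the three channels does not always do.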

While clicking around, I realized that there was duplicate information. For example, a Hawaiian language was in the Extinct Languages dataset as well as in World Languages. To fix this, I merged and de-duplicated the datasets, then imported the result. However, the sizes were inaccurate because the World Languages dataset lacked population data. I then split the de-duplicated dataset back into two files and re-uploaded them separately. I recreated the settings I had from the earlier datasets.
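The merge, de-duplicate, and split steps can be sketched in Python as follows. This is a minimal illustration and not the exact process I used: it assumes each record has a hypothetical "name" field and that two records are duplicates when their names match after ignoring case and extra spaces. Real data would likely need a stronger key, such as a language code.

```python
def normalize(name):
    """Lowercase a name and collapse whitespace so near-identical names match."""
    return " ".join(name.casefold().split())


def dedupe_and_split(endangered, world):
    """Merge both lists, keep the first record per name, then split them again.

    Endangered records come first in the merge, so when a language appears in
    both datasets the copy with population data is the one that survives.
    """
    merged = [("endangered", r) for r in endangered] + [("world", r) for r in world]
    kept = {}
    for source, record in merged:
        kept.setdefault(normalize(record["name"]), (source, record))
    endangered_out = [r for s, r in kept.values() if s == "endangered"]
    world_out = [r for s, r in kept.values() if s == "world"]
    return endangered_out, world_out


# Hypothetical records, for illustration only.
endangered = [{"name": "Hawaiian", "speakers": 1000}]
world = [{"name": "hawaiian "}, {"name": "English"}]
print(dedupe_and_split(endangered, world))
```

Keeping the two outputs separate means each file can still be styled on its own, with sized points for one and fixed-size points for the other.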


The final product (link here) is a grey world map with blue and warm-colored dots representing languages in various states. A legend sits at the top left of the map. Each point has a pop-up with the name of the language and the region of the world it is from. Endangered and extinct languages also show the population of speakers.

Many endangered languages exist in Central America, South America, Africa south of the Sahara, along the Himalayan mountain range, and along the Pacific Rim. Languages not considered endangered exist in all regions with human population, with great diversity in Africa, the Americas, South Asia, and Australia.

There is a greater density of languages near the equator, which thins out toward the north. Languages also exist on small land masses such as islands. No languages exist in Antarctica.


This map needs more information and context. Migration, trade, and politics throughout history could inform a viewer, and dotted lines could almost trace a pathway showing the movement of language across land. With the map alone, however, a user cannot see that movement.

Perhaps annotations would have helped, but what the map needs the most is more data that can somehow provide a timeline, world events, or other historical and cultural information.

In addition, it would be helpful to see a count of endangered and extinct languages.
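Such a count could be produced from the dataset before it is uploaded. The following is a small Python sketch; the "degree" field name and the sample records are hypothetical and stand in for the real column that holds each language's state.

```python
from collections import Counter


def count_by_degree(records, key="degree"):
    """Count how many languages fall into each endangerment state."""
    return Counter(record[key] for record in records)


# Hypothetical records, for illustration only.
sample = [
    {"name": "A", "degree": "Extinct"},
    {"name": "B", "degree": "Vulnerable"},
    {"name": "C", "degree": "Extinct"},
]
for degree, total in count_by_degree(sample).most_common():
    print(degree, total)
```

These totals could then be added to the legend next to each color.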

This lab was also fun because I was able to see in a geographical (“physical”) way how data relates to history, human movement, and information movement. I enjoyed looking at the variety of map textures, and can imagine how other forms of language, such as social media, can travel across spaces without being influenced by mountains and deserts in the same way.