Examining Non-English Languages Spoken by NYC Residents

Happy people saying hello in different languages.
Image by pch.vector on Freepik


New York City is home to an array of diverse people and cultures. Anyone who lives in, works in, or visits one of the five boroughs that make up NYC is almost certain to hear several languages other than English over the course of a day. While there are certainly many foreign travelers who contribute to the range of languages spoken in NYC, my visualization deals specifically with people who reside in one of the many communities across New York City. A significant number of these residents speak and understand a language other than English more fluently than English. My visualization reflects a data set provided by the NYC Civic Engagement Commission.


The inspiration for my work did not begin with a specific visualization example but with the idea of using a tree map. While viewing examples of tree maps, I found them easy to understand and learned that they work well for data sets with fewer categories, as well as for visualizing hierarchical data. I appreciate simple visualizations that can tell a story without being too cluttered. However, due to a change in direction and my novice abilities with Tableau, my final visualization departed from this format.

Screenshot depicting multiple examples of tree maps from flowingdata.com
Screenshot of Tree Maps Example from FlowingData.com
Source: (Yau, n.d.)


The data set used in my visualization was retrieved from NYC Open Data. It was published by the NYC Civic Engagement Commission in 2021 and last updated in 2022. According to the Commission, the data set, “derived from the Census Bureau’s American Community Survey (ACS), includes information on over 1.7 million limited English proficient (LEP) residents and a subset of that population called limited English proficient citizens of voting age (CVALEP) at the Community District level” (NYC Civic Engagement Commission, 2021). The data set contains 8,024 rows and 9 columns and can be found at the following link: Population and Languages of the Limited English Proficient (LEP) Speakers by Community District

Preview of table data from a data set called "Population and Languages of the Limited English Proficient (LEP) Speakers by Community District"
A section of the table preview on NYC Open Data

After acquiring the data set from NYC Open Data, the next step in my process was to run the exported CSV file through OpenRefine to assess and clean the data. Fortunately, after I reviewed the rows and columns for errors, most of the data was ready to use. The only change I made in OpenRefine was a minor one: I split the first column into two, so that one column contained the start date and the other contained the end date of the time period the information covers.
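The column split was done through OpenRefine's interface, so there is no code from my actual workflow. As a rough illustration of the same transformation, here is a minimal Python sketch that assumes the first column holds a period label such as "2015-2019", with both years joined by a hyphen. The column name "ACS Period" and the output names "Begin Year" and "End Year" are hypothetical, and the toy row is not taken from the real data set.

```python
import csv
import io


def split_period(period, sep="-"):
    """Split a period label such as '2015-2019' into (begin_year, end_year).

    Assumes both years are joined by `sep`; the real export may differ.
    """
    begin, end = period.split(sep, 1)
    return int(begin.strip()), int(end.strip())


def split_period_column(csv_text, column, sep="-"):
    """Replace `column` in every CSV row with two hypothetical columns,
    'Begin Year' and 'End Year', mirroring the OpenRefine column split."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        begin, end = split_period(row.pop(column), sep)
        row["Begin Year"] = begin
        row["End Year"] = end
        rows.append(row)
    return rows


# Toy input for illustration only; not rows from the real data set.
sample = "ACS Period,Borough,Language\n2015-2019,Bronx,Spanish\n"
print(split_period_column(sample, "ACS Period"))
# [{'Borough': 'Bronx', 'Language': 'Spanish', 'Begin Year': 2015, 'End Year': 2019}]
```

Storing the two years as integers, rather than keeping the original label, makes it easier to filter or sort by time period later in Tableau.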

Next, I exported the data set from OpenRefine as a new CSV file and uploaded it to Tableau Public to create the final visualization. As I mentioned previously, I had gone in with the idea of creating a treemap for this data set, and as you can see in the image below, my first iteration took this form. I liked how the differently sized areas depicted the relative size of each non-English language by borough. However, I felt that the treemap did not offer everything I wanted. One small detail that made me change course was that I did not like how repetitive the borough name labels were. Since I’m a beginner with Tableau Public, I decided to present the data in a different style so that I could practice with another chart type.

Snapshot of the first visualization iteration

Ultimately, my final visualization became a bar chart that retains some characteristics of the tree map. The visualization can be found on Tableau Public at the following link: Non-English Languages Spoken by Residents in NYC (2015-2019), and I have added a snapshot of the worksheet below. The main fields selected for the visualization were Language, Borough, LEP Population, and the ACS time period begin and end dates. I had the option to use Community District Name in the chart instead of Borough; however, because community district names are not as identifiable to most people, I decided to stick with the five boroughs, which are widely recognized. As for the aesthetic of the chart, I kept the same colors I had selected for the tree map, although I would welcome feedback on this aspect.

To make the visualization more useful, I added a filter to the final chart for users who want to narrow the view to a specific language. The discussion with my peer review partner helped me see the need for a borough filter as well. When a user selects a single borough, labels appear for more languages than are visible with no filter applied, which lets users take a closer look at the various languages that residents speak fluently. Take Staten Island, for example: there is less data for this borough, so it makes up a smaller part of the chart, and without the filter a user has to hover over its cells to see the language names. This is acceptable because the visualization is meant to be interactive: hovering over a language cell presents the user with labels for the data points mentioned previously. Ultimately, the intention of this visualization is to reveal trends in the non-English languages spoken by residents in a comprehensible way.


Snapshot of Final Visualization Worksheet

The main outcome I drew from this visualization was that, not surprisingly, an overwhelmingly large population of NYC residents spoke Spanish more fluently than English during the 2015-2019 period. In each borough, Spanish noticeably tops the chart. Interestingly, a close second turned out to be Chinese (including Mandarin and Cantonese), which holds second place in four of the five boroughs. Russian and Bengali appear to be the next two most commonly spoken languages. The long list of languages reinforces the diversity of NYC’s residents.
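The ranking described above was read off the Tableau chart, but the same kind of summary can be sketched in code. The minimal Python example below assumes each row is one community district and language, sums the population figures up to the borough level, and keeps the largest languages per borough; the optional `borough` argument loosely mirrors the chart's borough filter. The column names and the toy rows are hypothetical, and the sketch does not reproduce the actual results.

```python
from collections import defaultdict


def top_languages(rows, n=3, borough=None):
    """Sum LEP population per (borough, language) and return the n largest
    languages for each borough, optionally restricted to one borough.

    `rows` are dicts with hypothetical keys 'Borough', 'Language' and
    'LEP Population'; the real column names may differ.
    """
    totals = defaultdict(int)
    for row in rows:
        if borough is not None and row["Borough"] != borough:
            continue
        # Community district rows are summed up to the borough level;
        # thousands separators such as "1,200" are stripped first.
        population = int(str(row["LEP Population"]).replace(",", ""))
        totals[(row["Borough"], row["Language"])] += population

    by_borough = defaultdict(list)
    for (b, language), population in totals.items():
        by_borough[b].append((language, population))

    # Largest population first; ties broken alphabetically for stable output.
    return {
        b: sorted(pairs, key=lambda pair: (-pair[1], pair[0]))[:n]
        for b, pairs in by_borough.items()
    }


# Toy rows for illustration only; these are not the real LEP estimates.
toy_rows = [
    {"Borough": "Bronx", "Language": "Spanish", "LEP Population": "100"},
    {"Borough": "Bronx", "Language": "Spanish", "LEP Population": "50"},
    {"Borough": "Bronx", "Language": "Bengali", "LEP Population": "20"},
    {"Borough": "Queens", "Language": "Chinese", "LEP Population": "70"},
]
print(top_languages(toy_rows, n=2))
# {'Bronx': [('Spanish', 150), ('Bengali', 20)], 'Queens': [('Chinese', 70)]}
print(top_languages(toy_rows, borough="Queens"))
# {'Queens': [('Chinese', 70)]}
```

Aggregating before ranking matters here: a language that is spread across many community districts could otherwise look smaller than it is at the borough level.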


Creating the visualization in Tableau Public definitely involved a learning curve and was not as straightforward as I had imagined. I had trouble with the software freezing on my desktop; it would be beneficial if Tableau offered a different way to access its tools, perhaps through an online platform instead of a desktop download. The worksheet also allowed less customization of the chart than I expected. However, I believe the chart fundamentally succeeds as a visualization that users can manipulate and understand easily. In the future, I would like to explore incorporating another data source in order to learn more from the data.


NYC Civic Engagement Commission. (2021). Population and Languages of the Limited English Proficient (LEP) Speakers by Community District [Data set]. Retrieved February 12, 2023, from https://data.cityofnewyork.us/City-Government/Population-and-Languages-of-the-Limited-English-Pr/ajin-gkbp

pch.vector. (n.d.). Happy young people saying hello in different languages. Students with speech bubbles and hands in greeting gesture [Vector image]. Freepik. Retrieved February 19, 2023, from https://www.freepik.com/free-vector/happy-young-people-saying-hello-different-languages-students-with-speech-bubbles-hands-greeting-gesture_11235604.htm#page=2&query=languages&position=48&from_view=search&track=sph#position=48&page=2&query=language

Yau, N. (n.d.). Treemap. FlowingData. Retrieved February 17, 2023, from https://flowingdata.com/charttype/treemap/