How is Chicago's population distributed in relation to the city's land area?


Visualization

Introduction

Last summer I visited Chicago, and it left me with good memories. My least pleasant memory, though, is how crowded the city was; it almost stressed me out as a tourist. Apart from NYC, no city has given me such an overwhelming feeling. According to NBC Chicago, Chicago was ranked the second-worst city for drivers in the US in 2022. Other sources describe Chicago as a diverse and dynamic city with a population of over 2.7 million people, and understanding its demographics can help policymakers, businesses, and individuals make informed decisions. For this geographic assignment, I therefore wanted to visualize Chicago's population to gain insight into the demographics, trends, and patterns of the city's residents and their ways of living. Based on my assumptions and previous coursework, I expected major urban areas to have much larger populations than rural or suburban ones. Combining the geographic dimension with a timeline across several years can tell us more about both.

Figure 1. Chicago City Boundary Map by Population Distribution, 2018 to 2021

Choice of Data & Tools

DATA.GOV

My main data file comes from data.cityofchicago. It is the main population source I wanted, and it breaks the population into 24 age groups for each year from 2018 to 2021. This is my second attempt at finding the right data, because the first dataset I found turned out to be insufficiently specific and out of date. This source is the Chicago Population Counts dataset. I had mistakenly assumed that the dataset for this assignment had to be a "GIS"-based file, so at first I searched for all of my master files on a GIS website whose data turned out to lack credible sourcing.

GitHub Gist

This is the file where I misinterpreted the term "geographic location." Since the in-class demo carefully connected income with geographic locations, I assumed I had to gather all "physical locations," which I took to mean the latitude and longitude of each location, to analyze and display in QGIS. It turned out that when I combined files to make my master file, the zip code worked better as the join key than the exact coordinates, because my shapefile identifies areas by zip code rather than by point locations. This is the file I tried to combine, with zip code, latitude, and longitude. For Chicago specifically, the zip codes range from 60601 to 60827.
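Because the zip code ended up being the join key, it helps to keep it as clean five-character text. The Python sketch below is hypothetical and not part of my QGIS/OpenRefine workflow: it assumes the Gist file has a header row with a `ZIP` column, strips any ZIP+4 suffix, and keeps only codes in the 60601 to 60827 range mentioned above. That numeric range is only a rough filter; not every code inside it is necessarily a Chicago zip code.

```python
import csv
import io

# Range quoted in the text; a rough filter, not an official list of Chicago zips.
CHICAGO_ZIP_MIN = 60601
CHICAGO_ZIP_MAX = 60827


def normalize_zip(value):
    """Return a 5-character zip string, or None if the value is not a zip."""
    digits = value.strip().split("-")[0]  # drop a ZIP+4 suffix such as "-1234"
    if len(digits) != 5 or not digits.isdigit():
        return None
    return digits


def chicago_zip_rows(csv_text, zip_column="ZIP"):
    """Keep rows whose zip falls in the 60601-60827 range, keyed by zip text.

    The column name "ZIP" is an assumption about the Gist's header row.
    """
    kept = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        zip_code = normalize_zip(row[zip_column])
        if zip_code is not None and CHICAGO_ZIP_MIN <= int(zip_code) <= CHICAGO_ZIP_MAX:
            kept[zip_code] = row
    return kept
```

Keeping the zip as text rather than a number preserves leading zeros and keeps both sides of a join the same type; as far as I know, mismatched field types are a common reason a join comes back empty.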

Chicago Data Portal

This is where I found my shapefile. Beyond the two bundles I downloaded for my two attempts (Boundaries-City and Boundaries-ZIP), the portal offers many other shapefiles to explore. I plan to explore and use more of them in my final project.

OpenRefine

The original data file mixes zip codes and years: every zip code appears repeatedly because the file is organized by year. As a result, I had about 60 different rows, plus one total-population row and the introductory rows, repeated over four years. OpenRefine helped me split the main CSV file into four separate files, each containing the data from only one year. OpenRefine also helped me combine my population file with my geolocation file, although I abandoned this combination in my redo because the zip code works better as a join key.
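The same year-by-year split that OpenRefine handled can be sketched in a few lines of Python. This is a minimal sketch, not the method I used for this lab; the column names `Year` and `Geography Type` and the value `Zip Code` are assumptions about the Chicago Population Counts export and may need adjusting. Rows that are not zip-level (such as a citywide total) are dropped so each year keeps only per-zip rows.

```python
import csv
import io
from collections import defaultdict


def split_by_year(csv_text, year_column="Year", geo_type_column="Geography Type",
                  keep_geo_type="Zip Code"):
    """Group rows of a long-format population table into one list per year.

    Column names and the "Zip Code" value are assumptions about the export.
    """
    by_year = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row[geo_type_column] != keep_geo_type:
            continue  # skip citywide or other non-zip rows
        by_year[row[year_column]].append(row)
    return dict(by_year)


def write_year_files(by_year, fieldnames, prefix="population_"):
    """Write one CSV per year, e.g. population_2018.csv (hypothetical names)."""
    for year, rows in by_year.items():
        with open(f"{prefix}{year}.csv", "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
```

Grouping in memory first and writing afterwards keeps the split logic separate from file handling, so the grouping can be checked before any files are created.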

Process

I made a major mistake in this lab: my first attempt lacked data verification, and I confused myself when building the side table. As Figure 2 shows, the units in my first attempt were wrong, and I had no idea which unit to list in the visualization for the data I had gathered. Another major problem was that I did not check the sources and units of the shapefile I gathered, so I applied a 2013 shape map to 2016 datasets, and many of the numbers and units did not match.

Figure 2. Population Distribution Map of Chicago (first attempt): the shapefile does not match the data's proportions, and the population values are misaligned.

In my redo, I was careful with data selection and made sure all data covered the same time period and used the same units. The two bundles present the city differently, and I chose ZIP because it includes each area's size and makes it easier to tell the different areas apart once the population data is added. The shapefile also records the documented area of each city or zip code it outlines, so it can show how the city is divided and lets me visualize how land is distributed in relation to population, which I can add as layers later on.

My second step was adding my four population files to the visualization. I kept losing my main figure whenever I added layers, so I had to keep zooming to the layer to bring the main figure back, and it took me a while to figure this out. I joined each of the four files to the shapefile by matching their Geography field with the shapefile's Zip column, so that QGIS could read both as one dataset.
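The join described above can be sketched as a left join in plain Python. This is only an analogy for what QGIS's Layer Properties > Joins does, not its actual implementation; the field names `zip` (shapefile) and `Geography` / `Population - Total` (population table) are assumptions, and the sketch assumes population values are plain integers without thousands separators. Features with no matching row are kept with an empty value, which is how gaps such as the missing 2020 information can show up as blank areas.

```python
def left_join(features, table, feature_key="zip", table_key="Geography",
              value_field="Population - Total"):
    """Attach one table value to each feature by matching zip codes.

    Every feature is kept; unmatched or blank values become None, similar to
    NULL in a QGIS join. Field names are assumptions.
    """
    # Build a lookup from zip code (as trimmed text) to the raw table value.
    lookup = {str(row[table_key]).strip(): row[value_field] for row in table}
    joined = []
    for feature in features:
        raw = lookup.get(str(feature[feature_key]).strip())
        # Assumes plain integer strings such as "14000".
        value = int(raw) if raw not in (None, "") else None
        joined.append({**feature, value_field: value})
    return joined
```

Comparing both keys as trimmed text means a zip stored as a number in one file and as text in the other still matches.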

Visualization and Interpretation

For this lab, I ended up with five visualizations and one GIF file showing how the population changed from 2018 to 2021.

Figure 3. Chicago City Boundary Distribution by Land Area

Figure 3 shows the land distribution I wanted to display. It is the main component of my argument: I want to highlight how much area each division on the map covers, which ideally should balance out with the population distribution so that no area is too crowded or too quiet. The legend for this figure was hard to handle because the area values carry more than ten digits after the decimal point. And since I chose to display the colors as a smooth gradient, I had 30 color categories, which produced 30 rows in the legend table. I therefore chose not to display the legend because it was too distracting. The figure shows that the top left and bottom right of Chicago tend to have more land than central and north-central Chicago: their deeper colors indicate larger areas than the paler ones.
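If the long decimals are the main problem with the legend, converting and rounding the area before labeling is one option. The sketch below assumes the shapefile's stored area value is in square feet; that is an assumption to check against the layer's metadata or CRS before trusting the conversion. The function names are hypothetical. QGIS's Graduated symbology also has a precision setting for legend labels, which may be the simpler fix.

```python
SQ_FT_PER_SQ_MI = 5280 ** 2  # 27,878,400 square feet in one square mile


def to_square_miles(shape_area_sq_ft, digits=2):
    """Convert an area in square feet (assumed unit) to rounded square miles."""
    return round(shape_area_sq_ft / SQ_FT_PER_SQ_MI, digits)


def legend_label(low_sq_ft, high_sq_ft, digits=1):
    """Build a short legend label such as '1.0 to 2.5 sq mi'."""
    low = low_sq_ft / SQ_FT_PER_SQ_MI
    high = high_sq_ft / SQ_FT_PER_SQ_MI
    return f"{low:.{digits}f} to {high:.{digits}f} sq mi"
```

Fewer, rounded classes (for example five instead of 30) would also keep the legend short enough to display.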

Figure 4. Chicago total population distribution by year, 2018 to 2021

Figure 4 is a GIF I made from the remaining four layouts I exported from my visualization file. Joining the two files made it possible to combine the two datasets. In QGIS, I set the layer properties to each year's total population to get the population distribution for each year. When exporting to print layouts, I failed several times before realizing I had to set up a grid first so that every element stays in the same position. One of my four datasets, the 2020 population, is missing some information, so the map scaling is not perfectly proportional. In the layout arrangement settings, I placed the symbol legends for the four maps on the right and set the Legend item properties to show only items inside the linked map. I exported more than 20 times to get the maps to match perfectly.
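One way to make the four yearly maps directly comparable is to compute a single set of class breaks from all four years together and reuse it for every map, instead of letting each year's legend scale itself. The sketch below uses equal-interval classes, which is my own choice of method here (QGIS's Graduated renderer offers an Equal Interval mode among others); missing values are ignored when computing the breaks. The function names are hypothetical.

```python
def shared_equal_interval_breaks(values_by_year, n_classes=5):
    """Compute one set of equal-interval class breaks shared by all years.

    Missing values (None) are ignored. Returns n_classes + 1 boundaries from
    the overall minimum to the overall maximum.
    """
    all_values = [v for values in values_by_year.values() for v in values if v is not None]
    if not all_values:
        raise ValueError("no population values to classify")
    low, high = min(all_values), max(all_values)
    step = (high - low) / n_classes
    return [low + step * i for i in range(n_classes)] + [high]


def classify(value, breaks):
    """Return the 0-based class index for a value, or None if it is missing."""
    if value is None:
        return None
    for i in range(len(breaks) - 1):
        if value <= breaks[i + 1]:
            return i
    return len(breaks) - 2  # values above the top break fall in the last class
```

Because every year is classified against the same boundaries, a given color means the same population range on all four maps.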

From the changes over the four years, we can clearly see that the population distribution is not proportional to the size of the areas. The pattern stays concentrated around three main areas, dividing Chicago into top, middle, and bottom sections, each centered on a very dense and crowded area. Combining Figures 3 and 4, we can see that even though northeast Chicago does not have much land, its population is very high. And although northwest Chicago is large, it did not attract much population during the four-year period.
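The observation that population is not proportional to land area could be checked numerically by computing population density (people per square mile) for each zip code. This is a minimal sketch, assuming areas have already been converted to square miles and one year's populations are keyed by zip code; the function name is hypothetical. Large areas with few residents sink to the bottom of the ranking, and small crowded areas rise to the top.

```python
def density_ranking(areas_sq_mi, populations):
    """Rank zip areas by population density (people per square mile).

    Zip codes missing from either mapping, or with zero area, are skipped.
    Returns (zip, density) pairs sorted from densest to least dense.
    """
    densities = {}
    for zip_code, area in areas_sq_mi.items():
        population = populations.get(zip_code)
        if population is None or area <= 0:
            continue
        densities[zip_code] = population / area
    return sorted(densities.items(), key=lambda item: item[1], reverse=True)
```

Mapping density instead of raw counts would show directly where people are concentrated relative to land.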

Reflection on Lab 4

Overall, this lab was interesting and useful for working with demographic datasets. It was easier for me than R because it is more visual and accepts more file formats. The shapefile and GIS files held me up longer than I expected, and as a non-coder, I found it very hard to check whether data in a shapefile was wrong or misleading. Half of my time was wasted on the wrong dataset, but I was lucky to get an extension and redo the whole process. I was also surprised by how much information was hidden inside the shapefile that I originally did not know how to use. Once I joined my datasets, I gradually learned how to combine them and got used to dragging and dropping data from one to another.

Other aspects I wish I had more time to explore include details like color combinations. Also, for population analysis, a heatmap can be a very effective visualization tool. However, although the demo showed a heatmap option in the dropdown menu, it did not appear in mine. Moreover, when visualizing data by year, I noticed some areas were blank or missing information. Since I used the same shape layer for every year, I could not tell which information was missing or where the blank units were. Also, when I tried to set fixed values for the legend, the classes were not set to consistent ranges, so I had to set every range manually to make them specific and consistent. I will definitely explore these problems further in the final project and hope to improve these illustrations.
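To find which areas are blank in each year, one option is to compare the shapefile's zip codes against each year's joined values. This hypothetical sketch assumes each year's populations are stored as a dictionary keyed by zip code. Inside QGIS, a similar check could be done with Select by Expression using a condition such as `"Population - Total" IS NULL` on the joined field, although the actual joined field name depends on the join's prefix.

```python
def missing_by_year(shape_zips, populations_by_year):
    """List, for each year, the shapefile zip codes that have no population value.

    populations_by_year maps a year to a {zip: population} dict; a zip that is
    absent, None, or an empty string counts as missing (0 does not).
    """
    report = {}
    for year, populations in populations_by_year.items():
        report[year] = sorted(
            z for z in shape_zips if populations.get(z) in (None, "")
        )
    return report
```

The resulting lists point to exactly which zip areas to inspect on each year's map.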

References

https://www.nbcchicago.com/news/local/chicago-ranked-the-second-worst-city-for-drivers-in-us-new-study-reveals/3013342/

https://catalog.data.gov/dataset/chicago-population-counts/resource/571ce83e-9d97-4e6f-b396-8e3cab880d94

https://data.cityofchicago.org/browse?tags=shapefiles

https://gisapps.chicago.gov/mapchicago/

https://data.cityofchicago.org/Facilities-Geographic-Boundaries/Boundaries-ZIP-Codes/gdcf-axmw

https://gist.githubusercontent.com/erichurst/7882666/raw/5bdc46db47d9515269ab12ed6fb2850377fd869e/US%2520Zip%2520Codes%2520from%25202013%2520Government%2520Data

http://www.diva-gis.org/