The Government Of New York City Job Postings From 2013 To 2018


Lab Reports

Introduction And Questions

For this data visualization project, I chose to explore jobs posted by the Government of New York City. Curious about the topic, I found a rich dataset on Open Data NYC that gave me enough information to work with and a chance to learn the data visualization software Tableau. I wanted to answer the following three questions through visualizations.

  1. What are the major categories of jobs posted by all governmental offices?
  2. Which governmental office has posted the most jobs?
  3. What is the trend of jobs posted by each government office?

I created three graphs, one for each question, each using a different type of visualization. For this project, I also wanted to explore kinds of visualization that are not commonly used. By commonly used types, I mean graphs that are often seen in reports and publications, such as histograms and line charts.

Inspiration

My inspiration came from the TED talk “The Beauty of Data Visualization” by David McCandless, in which he showcased his previous work for the press. He talked about how he used visualizations to help people understand concepts better, and why visualization is so powerful. He mentioned his project “Billion Dollar O-Gram,” which grew out of his frustration at not being able to grasp the spending figures for different purposes reported in the press. He used a treemap to compare the amounts of money spent in different categories, with color representing the motivation behind the spending.

I was also inspired by the treemap example given in the textbook “Now You See It.” It creates a sense of clarity and organization that is very appealing to me. Because the color intensity of each cell is based on its quantitative value, the chart conveys the value of each cell very nicely. The eye absorbs multiple dimensions of information simultaneously.

Materials

The dataset was found on Open Data NYC under the category of “city government.” The dataset is named “NYC Jobs” and was created by the Department of Citywide Administrative Services. The original dataset contains 28 columns and 3,358 rows. A separate spreadsheet explains the meaning of each column, which helped me understand the dataset I was working with. I used OpenRefine to clean up the dataset, and then I used Tableau Public to create the visualizations.

Method

I used two tools to create this set of visualizations: OpenRefine and Tableau. When I first looked at the dataset, I realized that much of the data does not contribute to answering my questions. I therefore decided to clean up the data in OpenRefine first.

I ran into several problems while cleaning up the dataset, and the process helped me understand OpenRefine’s functions better. The original dataset contains 28 columns and 3,358 rows, which is far more information than my study needed. I reduced the columns to three: “Agency,” “Job Category,” and “Posting Date.” Then I looked at the text facet for each column and realized that the data was not clean enough to use in Tableau directly. There are 51 unique agencies and 135 different job categories in the entire dataset. All of the 135 categories can be broken down into 12 unique categories.

There was also one text facet, covering 72 rows, that did not fit in this column. Instead of deleting these rows, I went back and looked at the original dataset again. It looked as if the value that belongs in the neighboring column of the original dataset, “Residency Requirement,” had been misplaced in this column. Within the 72 selected rows, working through each facet group, I copied the values that should have been in the “Job Category” column over to the correct column. I deleted 9 rows with blank fields. I created a new column from the “Posting Date” column that keeps only the year, month, and day. I ended up with 3,277 rows and four columns.
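The cleaning itself was done in OpenRefine, but the core steps (keep three columns, drop rows with blank fields, derive a date-only column) can be sketched in Python for readers who prefer a script. This is only an illustration under stated assumptions: the sample rows, the "Other Column" field, the ISO-style timestamp format, and the new column name "Posting Day" are all hypothetical and not taken from the dataset. The fix for the 72 misplaced values is not reproduced here.

```python
from datetime import datetime

# The three columns kept for the study (names as given in the report).
KEEP = ["Agency", "Job Category", "Posting Date"]

def clean(rows):
    """Keep three columns, drop rows with a blank field, add a date-only column."""
    cleaned = []
    for row in rows:
        # Keep only the columns of interest and trim stray whitespace.
        slim = {col: (row.get(col) or "").strip() for col in KEEP}
        # Skip rows where any of the three fields is blank.
        if not all(slim.values()):
            continue
        # Assumption: timestamps look like 2018-03-05T00:00:00.
        stamp = datetime.strptime(slim["Posting Date"], "%Y-%m-%dT%H:%M:%S")
        # "Posting Day" is a hypothetical name for the derived column.
        slim["Posting Day"] = stamp.date().isoformat()
        cleaned.append(slim)
    return cleaned

# Hypothetical sample rows, for illustration only.
sample = [
    {"Agency": "Agency A", "Job Category": "Health",
     "Posting Date": "2018-03-05T00:00:00", "Other Column": "x"},
    {"Agency": "Agency B", "Job Category": "",
     "Posting Date": "2017-11-20T00:00:00", "Other Column": "y"},
]

cleaned_rows = clean(sample)
for row in cleaned_rows:
    print(row)
```

The second sample row is dropped because its job category is blank, which mirrors the removal of blank rows described above.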

With the clean dataset, what I intended to do in Tableau was reasonably straightforward. First, I created the treemap (Figure 1), which shows which department posted the most jobs from 2013 to 2018. The size of each rectangle represents the total number of job postings by each department. The color intensity shows how frequently each department posted jobs, measured by the total number of unique posting dates.

Figure 1 Treemap
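The two measures behind Figure 1 were computed by Tableau, but they can be sketched outside it as well. The snippet below is a minimal illustration, assuming each cleaned row has been reduced to an (agency, posting day) pair; the agency names and dates are made up.

```python
from collections import defaultdict

def treemap_measures(rows):
    """Return {agency: (total postings, number of distinct posting days)}."""
    totals = defaultdict(int)   # drives rectangle size
    days = defaultdict(set)     # drives color intensity
    for agency, day in rows:
        totals[agency] += 1
        days[agency].add(day)
    return {agency: (totals[agency], len(days[agency])) for agency in totals}

# Hypothetical (agency, posting day) pairs, for illustration only.
postings = [
    ("Agency A", "2018-03-05"),
    ("Agency A", "2018-03-05"),
    ("Agency A", "2018-04-01"),
    ("Agency B", "2017-11-20"),
]

measures = treemap_measures(postings)
for agency, (total, distinct_days) in measures.items():
    print(agency, total, distinct_days)
```

Keeping the two measures separate matters: a department can post many jobs on a single day, so a large rectangle does not necessarily come with a dark color.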

The second graph was the packed bubble chart (Figure 2). Because many job categories overlapped, I wanted to know which major job categories all governmental departments needed from 2013 to 2018. I chose a packed bubble chart because the size of each bubble shows the total number of postings in each category, which makes the most frequently mentioned job categories stand out.

Figure 2 Packed Bubble Chart
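The bubble sizes in Figure 2 come down to a count of postings per job category. A minimal sketch of that count, using made-up category labels rather than values from the dataset:

```python
from collections import Counter

# Hypothetical job category values, one per posting, for illustration only.
categories = ["Health", "Engineering", "Health", "Legal", "Health", "Engineering"]

# Count postings per category; each count would set one bubble's size.
bubble_sizes = Counter(categories)

# List the two largest categories first, as the chart makes them most visible.
top_two = bubble_sizes.most_common(2)
print(top_two)
```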

The third graph was the highlight table (Figure 3). I used the year as the columns and listed each department in a separate row. The intensity of the highlight color represents the total number of jobs posted by each department in each year.

Figure 3 Highlight Table
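The grid behind Figure 3 is a department-by-year count. The following sketch builds that grid from (agency, posting day) pairs, assuming the day is stored as a YYYY-MM-DD string; the sample values are hypothetical.

```python
from collections import Counter

def highlight_table(rows):
    """Return {agency: {year: number of postings}}, with zeros for empty cells."""
    # The first four characters of a YYYY-MM-DD string are the year.
    counts = Counter((agency, int(day[:4])) for agency, day in rows)
    agencies = sorted({agency for agency, _ in counts})
    years = sorted({year for _, year in counts})
    # Counter returns 0 for missing keys, so every cell is filled.
    return {a: {y: counts[(a, y)] for y in years} for a in agencies}

# Hypothetical (agency, posting day) pairs, for illustration only.
postings = [
    ("Agency A", "2017-06-01"),
    ("Agency A", "2018-03-05"),
    ("Agency A", "2018-04-01"),
    ("Agency B", "2017-11-20"),
]

table = highlight_table(postings)
for agency, by_year in table.items():
    print(agency, by_year)
```

Filling empty cells with zero keeps every row the same length, so a department with no postings in a given year still gets a (blank) cell in the table.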

Result Discussion

Highlighted Table (heat map): Years are listed horizontally, which gives an intuitive sense of time progression. With the help of color highlighting, the table shows how the number of jobs posted by each department differs from year to year. Color intensity is good at conveying trends over time.

Packed Bubbles: Even though the human eye is not good at judging the precise size of circles, this chart clearly shows the few most significant job categories that stand out from the others. However, it would not work well for comparing the totals of the different categories by year. It works well for getting a sense of the overall numbers for each category.

The treemap shows which governmental department of NYC has posted the most jobs. It also shows on how many distinct days each department posted, which indicates how frequently it posts jobs. If one department has posted on more dates than another, positions open more frequently in that department.

In addition to choosing the best visualization type, I decided not to include a second color, because each graph conveys its topic without the use of color variation. Adding another color would be distracting in this case. Tableau’s default blue is calming and a reasonable color choice for a job-related topic.

Future Direction

In the future, I would like to explore which categories of jobs are most needed in different years. I would split job postings that have combined job categories into unique stand-alone groups and compare each job category by year. I would also want to add interactivity to the charts, so that the viewer can decide which year to look at or view the trend of how each job category is needed across different years.
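One possible way to split combined job categories is to match each combined label against a list of base categories and then count each base category by year. The sketch below is only an assumption about how this could work: the base category names and sample postings are hypothetical, and simple substring matching would double count if one base name were contained in another.

```python
from collections import Counter

# Hypothetical base categories; the real list would hold the 12 unique ones.
BASE_CATEGORIES = ["Health", "Legal Affairs", "Public Safety"]

def split_categories(combined, base=BASE_CATEGORIES):
    """Return the base categories that appear inside a combined label."""
    return [name for name in base if name in combined]

def category_by_year(rows):
    """Count postings per (base category, year) from (label, day) pairs."""
    counts = Counter()
    for label, day in rows:
        year = int(day[:4])  # assumes YYYY-MM-DD strings
        for name in split_categories(label):
            counts[(name, year)] += 1
    return counts

# Hypothetical (combined category label, posting day) pairs.
postings = [
    ("Health Legal Affairs", "2017-05-02"),
    ("Health", "2017-09-14"),
    ("Public Safety Legal Affairs", "2018-01-30"),
]

yearly = category_by_year(postings)
for (name, year), total in sorted(yearly.items()):
    print(name, year, total)
```

A posting with a combined label is counted once for each base category it contains, so the yearly totals describe demand per category rather than the number of postings.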

Because I developed these graphs with different questions in mind, the three graphs do not present well in one dashboard. If I were to arrange all the graphs in a single dashboard, I would approach the questions differently. For a dashboard, I would consider which types of graphs flow together in one view and help the viewer understand a topic coherently. Presenting different charts in one dashboard should deepen the understanding of a single issue.