Visualization of New York City Hall Library Publications


Visualization

Introduction

In local government, agencies file various types of reports on their activities to organize information and to inform the public and elected officials of the work they are doing and the progress they’re making.  These reports include everything from budget reports to legislative documentation.  From these reports, it is possible to detect trends that can inform future government actions.  This presentation seeks to show how many reports are filed each year, which types of reports are most common, and which agencies are the most active in New York’s City Hall.

Materials

To best represent all of the information for this presentation, the program TableauPublic was used.  The information was drawn from NYC Open Data, a compilation of datasets collected from New York City government agencies.

Methods

This presentation uses three visualizations.  The first is a line graph representing the number of publications made by City Hall over time.  The second is a bar graph showing how many publications were produced by the top ten most active agencies.  The third is a packed bubble chart showing what types of reports were made for the different topics.
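The aggregations behind these three charts can be sketched outside of TableauPublic as well.  The following is a minimal illustration in Python, not the method used for this presentation: the field names ("year", "agency", "subject", "report_type") and the sample records are placeholders invented for the example and do not reflect the actual columns or contents of the NYC Open Data dataset.

```python
from collections import Counter

# Placeholder records for illustration only. The field names and values
# are assumptions, not the real schema or data of the City Hall Library
# publications dataset.
SAMPLE_RECORDS = [
    {"year": 2003, "agency": "Agency A", "subject": "Housing", "report_type": "Legislative Document"},
    {"year": 2004, "agency": "Agency A", "subject": "Housing", "report_type": "Legislative Document"},
    {"year": 2004, "agency": "Agency A", "subject": "Housing", "report_type": "Legislative Document"},
    {"year": 2005, "agency": "Agency B", "subject": "Budget", "report_type": "Budget Report"},
    {"year": 2005, "agency": "Agency B", "subject": "Budget", "report_type": "Budget Report"},
    {"year": 2005, "agency": "Agency C", "subject": "Housing", "report_type": "Annual Report"},
]


def publications_per_year(records):
    """Line graph input: (year, count) pairs sorted by year."""
    counts = Counter(record["year"] for record in records)
    return sorted(counts.items())


def top_agencies(records, n=10):
    """Bar graph input: the n agencies with the most publications."""
    counts = Counter(record["agency"] for record in records)
    return counts.most_common(n)


def subject_type_counts(records):
    """Packed bubble input: count for each (subject, report type) pair."""
    return Counter(
        (record["subject"], record["report_type"]) for record in records
    )


if __name__ == "__main__":
    print(publications_per_year(SAMPLE_RECORDS))
    print(top_agencies(SAMPLE_RECORDS, n=2))
    print(subject_type_counts(SAMPLE_RECORDS))
```

Each function corresponds to one chart: counts per year feed the line graph, the ranked agency counts feed the bar graph, and the counts per subject and report type determine bubble size and color in the packed bubble chart.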

User testing was conducted both face-to-face and through social media.  The tests included one interview, two observations, and a Likert scale.  All of the testers come from different backgrounds and have varying degrees of experience with regard to data visualization, local government, and library publications.  For example, one tester is a journalist who has covered events at City Hall before, but otherwise has no experience with the institution.  Another tester is a grant writer for non-profit organizations with a background in law.  The third tester is knowledgeable about statistics and has experience in creating information visualizations.  The Likert scale was sent to an educator who had experience with colorblind people.

Design Choice Rationale

The line graph was chosen for the first chart because it is the easiest way to showcase the number of publications filed over time.  It’s simple, easy to read, and provides context for the two graphs below it.  The only difficulty was that one user felt the title of the presentation was the title of that first graph alone.  Because of this, he didn’t realize that the other two graphs were connected.  To rectify this, the main title was bolded and centered to clarify the information for the reader.

Due to the vast number of agencies that published their works to the New York City Hall library, the number of agencies on the second graph was reduced to the top ten.  Initially, the graph used vertical bars, but one user felt that horizontal bars were easier to understand because the labels could be read.  Color-coding was also provided in the first draft of the graph to show the range of topics.  However, one user who didn’t have a strong affinity for color found it confusing that two graphs were color-coded.  Because of this, the color-coding was removed from the second graph.  The second graph was also moved up to make more room for the third graph.

The packed bubble graph was perhaps the most complicated to work with, but in turn, it became the most interesting.  Initially, several visualizations were examined to determine which would best represent the information.  The packed bubble graph was chosen because it was the easiest to interpret.  The size of each bubble reflected the number of publications on a specific topic, and the bubbles were also color-coded by the types of reports published under each topic.  One user felt that sharp contrast between colors would be better than a standard palette.  As such, a gradient between green and gold was used for the color-coding.

One tester suggested a certain degree of interactivity to better read the information provided by the graphs.  Since TableauPublic has an inherent interactivity function, more direct action did not appear to be necessary.  However, another tester who was unfamiliar with the software was unaware that one could hover over different points on the graph and gain specific insight on each point, bar, and bubble.  For the purposes of further clarification, clickable filters and instructions were added.

Results

The final presentation has some rather intriguing results, as shown below:

The most salient information shown in the first graph is a sudden spike in the number of publications starting in 2004.  Before then, City Hall was publishing fewer than one article per year.  After this point, the trend shows peaks and valleys, with a sharper decline after 2008.  One can only speculate as to the reason behind this trend.  The only indication of a cause is an increase in Housing and Building legislative documents in 2003.  Otherwise, there are no readily available articles from 2004 or 2008 that suggest any significant events that would warrant these changes.  In addition, it is not clear whether this decline continues after 2010, because the dataset contains no data beyond that year.

The second graph suggests some interesting implications.  While the other agencies in the top ten each published a few hundred reports in total, the Department of City Planning outpaces them all with total reports numbering in the thousands.  This holds significance when viewed in conjunction with the information displayed on the third graph about topics and report types.

The randomized appearance of the third visualization is due to the nature of the packed bubble format.  Nevertheless, there is interesting information that can be derived from it.  For example, the biggest bubble shows that most legislative documents are related to housing and buildings.  The size of this bubble correlates with the overwhelming number of articles published by the Department of City Planning that is shown in the second graph.  It is possible that this correlation is linked to New York City’s housing market, which favors landlords over small businesses.  This imbalance of power renders many small businesses unable to pay their rent, hence the overwhelming number of legislative documents.  The correlation may suggest a causal relationship, but more research is necessary to confirm it.

User Testing Results

User testing highlighted some of the limitations of TableauPublic.  As stated before, one user felt that there was no real indication that the graphs were interactive.  That same user felt that the layout would have been more easily read if stacked vertically.  However, if that design were chosen, all of the relevant information would be compressed to the point where it would be rendered illegible.  The best solution was to insert blank spaces between each of the graphs to clear up the confusion.

The spatial restriction that cut off some of the labels was a point of contention brought up by another user.  Initial drafts compressed the legends of the second and third graphs so that they wouldn’t detract from the main visualizations.  This led to labels being cut off, which in turn frustrated the user.  With the packed bubble graph, certain bubbles were simply too small to properly show labels without hovering over them.  Removing the labels on the packed bubble graph was considered, but it only served to make the graph more chaotic.  As such, there wasn’t an easy fix to this complaint.  The full legend on the right was the best tactic to alleviate the problem.

Moving Forward

While the dataset up to 2010 is complete, there is surprisingly little analytical research with regard to the conclusions drawn from this information.  In particular, the spike in publications in the year 2004 and subsequent decline after 2008 have no firm explanation.  To fully understand why this trend exists, further research into the subject may be required.  Additional information may reveal whether this decline continues after 2010 and why.

The graphs in this presentation have shown the information supplied in the dataset.  Nevertheless, the limitations of TableauPublic prevented a fully comprehensive display of the data.  Visual borders around each graph, or a stacked vertical layout, might make the visualization more accessible.  However, space limitations and format restrictions would not allow for this particular design.  Perhaps user testing of the application may lead to further advancement in visualization and allow analysts to showcase their datasets in a manner that is easier to comprehend.