NYC Open Data – Foundations of Information

Metadata for All!

I attended “NYC Open Data’s Metadata for All Initiative: Project Presentation,” hosted by the Metropolitan New York Library Council (METRO) on September 25th, for my event review for INFO 601-04. This event presented the Metadata for All Initiative’s results after six months of working with New York City Open Data. The Sloan Foundation sponsored the project, which was completed in partnership with the Mayor’s Fund to Advance New York City, METRO, the Mayor’s Office of Data Analytics, the NYC Open Data Team, Pratt Institute, Brooklyn Public Library, Queens Public Library, New York Public Library, and Tiny Panther Consulting.

The initiative aimed to help make New York City’s Open Data, which includes more than 2,100 datasets, more accessible to general users by improving metadata standards. Tiny Panther Consulting, a team of data librarians founded by Julia Marden, was brought on board to conduct a pilot study on how to make the 100 most-used datasets more user-friendly. They approached this through discussions with relevant government departments, workshops in all five boroughs, and the creation of templates for certain metadata documents.

Metadata Improvements

To assess the current metadata quality, Tiny Panther created a dataset documentation checklist. (This was provided as a handout for the audience.) The checklist contained a rubric for evaluating each dataset’s overall usability, user guide, and data dictionary. The goal was to determine whether a user would be able to understand what was in the dataset and, maybe more importantly, what was not in it.

The data dictionaries function like a Rosetta Stone for each dataset and are required for users to understand what is actually in it, for example what all of the rows and columns mean. However, only 90% of the datasets currently have a dictionary, and there is no standard template for them, so they are of varying quality.

In addition to improving the data dictionaries, Tiny Panther recommended the creation of user guides tailored to each dataset. These guides would provide context for the data, state when it was last modified, and clarify which data was raw and which was added by the city, among many other details. Tiny Panther found that many of the documents associated with datasets used insider lingo that would not be comprehensible to users who were not employed within the departments that created the dataset. The three proposed user guides were provided as handouts as well.

Audience

The most successful aspect of this event wasn’t directly about the initiative. What made this event most noteworthy was its audience. The event was not directed towards librarians and information professionals, who probably already buy into the idea of accessible metadata. About half of the audience consisted of government workers. (This is based on a show of hands conducted early in the presentation.) These are the professionals who create and maintain the datasets, and they do not necessarily have a background in information studies. It was very valuable to hear their points of view. A few members of the panel were representatives of departments Tiny Panther worked with, and they discussed their impressions of the challenges around the project. The audience was given an opportunity to ask questions about the project as well.

The only way the metadata standards can be maintained across all of the datasets is if these government workers understand why they are important. They are the ones who will be doing this extra work, on top of everything else they are responsible for. It is not as if each department has a data librarian whose sole role is to maintain its open data, although there is probably enough work that it could be a full-time job! I think the presentation was accessible to them, and hopefully it demonstrated the power and utility of comprehensive metadata.

Conclusion

I will be keeping tabs on NYC Open Data, checking what the metadata looks like, as well as the actual data, over the next few ‘data dumps’ to see how their metadata evolves. Open data is a very exciting tool for civic engagement, but only if users can understand what the data are actually telling them.

NYC Open Data 101 (& 102!)

On Monday, I attended an event that I found both very interesting and rewarding: a free workshop held by Civic Hall, located at 118 West 22nd Street, 12th Floor, New York, NY 10010.

I found this government-focused class on Eventbrite. It is one of several classes, seminars, conferences, and networking events organized by BetaNYC. Thanks to Eventbrite for making events open and visible to all internet users.

To my surprise, I entered Civic Hall through a normal-sized door on 22nd Street, which is just a 10-minute walk from PMC. I had imagined Civic Hall to be as large as the New York Public Library, or even a tourist site. That is not the case, but I was equally excited.

When I got home, I looked up the event organizer, Civic Hall. They call themselves a “community space,” and a new location called Civic Hall @ Union Square received approval from the full New York City Council in August 2018, with an estimated opening in the fall of 2020.

On the 12th floor, there is a welcome desk on the right-hand side. The classroom was not difficult to find, since there was a large welcome notice on the whiteboard to the right, even though no one was at the welcome desk. In the “hall” of Civic Hall, another workshop was going on, which made me less nervous, as I was just one of many learners with similar goals.

I was delighted to find a number of classmates who shared my goal, and as a handcraft lover, I was even more excited to see playful cards in a data-related class. In the introduction session, I mentioned that I was taking the class because my professor requires us to attend an event related to our academic focus area. By the time this interactive class ended, however, I was eagerly waiting for the next open class. I am not a complete outsider to data, as I have been taking classes at school for several weeks. Still, the class was successful and exceeded my expectations. I will explain why I would recommend this type of class.

The class began as a normal lecture covering fundamental concepts of information and an introduction to NYC Open Data. Then came the “human data icebreaker,” which made the class much warmer and everyone more relaxed.

The human data icebreaker is a class activity in which everyone holds one service request, with the cases falling under several categories. Participants line up and rearrange themselves to simulate the functions “Filter,” “Sort,” “Group by,” “Roll-up,” and “Count.” For example, given the instruction “Rank the top 5 complaint types,” we first “Group” our cases by complaint type, then “Count” how many cases are in each group, then “Sort” the groups into the correct order in line. After walking through these steps on our own feet, we could tell clearly and correctly which were the top 5 complaint types, interactively and visually.
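For readers who prefer code to walking around a room, the same “Rank the top 5 complaint types” exercise can be sketched in a few lines of Python. This is only my own illustration, not material from the class: the field names (complaint_type, borough), the sample requests, and the function name top_complaint_types are all made up for the example.

```python
from collections import Counter

# Hypothetical service requests, one per "participant".
# The complaint types, boroughs, and counts are invented for illustration.
requests = (
    [{"complaint_type": "Noise", "borough": "Brooklyn"}] * 6
    + [{"complaint_type": "Heating", "borough": "Bronx"}] * 5
    + [{"complaint_type": "Illegal Parking", "borough": "Queens"}] * 4
    + [{"complaint_type": "Pothole", "borough": "Brooklyn"}] * 3
    + [{"complaint_type": "Graffiti", "borough": "Manhattan"}] * 2
    + [{"complaint_type": "Rodent", "borough": "Staten Island"}] * 1
)


def top_complaint_types(records, n=5, borough=None):
    """Return the n most common complaint types as (type, count) pairs."""
    # "Filter": optionally keep only the requests from one borough.
    if borough is not None:
        records = [r for r in records if r["borough"] == borough]
    # "Group by" + "Count": tally the requests under each complaint type.
    counts = Counter(r["complaint_type"] for r in records)
    # "Sort": order the groups from most to fewest requests, keep the top n.
    return counts.most_common(n)


if __name__ == "__main__":
    for complaint_type, count in top_complaint_types(requests):
        print(complaint_type, count)
```

Calling top_complaint_types(requests, borough="Brooklyn") adds the “Filter” step before grouping, which is the equivalent of asking only the Brooklyn participants to step forward before forming lines.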

After the icebreaker exercise, everyone got to know each other better and felt closer, as we had been in the same dataset and worked on one task. Each of us acted as a datum, and we served together for a common purpose. We then moved to the next activity: playing with cards!

Before the cases began, we were introduced to four types of cards: “Story,” “Action,” “Data Column,” and “Data Value.” We were encouraged to come up with a task of our own and then work through it by laying out cards as possible steps. This time, the steps were taken with cards on display rather than on our feet. This is much easier to understand and compare than calculating or processing hypothetically in one’s head. After each case, we presented the task and the actions we took to accomplish it. By acting as a data platform ourselves, we became confident about actually working with the datasets on NYC Open Data.

After several rounds of “card games,” we were given the class handout and told that the second class, Open Data 102, would be held on Thursday.

Thanks to Noel Hidalgo, who gave this wonderful class (the first gentleman from the left). Big thanks to BetaNYC (or NYC Open Data), who provided the class handout, slides, and cards to everyone. Thanks to the Sloan Foundation, who sponsored this class.

Additionally, here are the handout and slides for NYC Open Data class 102! I am looking forward to the following classes.

Foundations of Information

Pratt School of Information
