Sciencescape’s Approach to Big Data in the Realm of Scientific Research

The current image of consumer technology, of personal computers and front-end applications, is one of simplicity: small boxes that house smaller components and microchips, moving further and further away from the hulking, esoteric, push-pin and vacuum-tube monoliths of the past. This is, of course, ideal, as an increasingly tech-minded society compels its citizens to participate more fully within its self-designated bounds.

Even Google’s server rooms express the hushed, confident uniformity of its homepage.

But this picture belies the massive data swarm on the other side of the front end. We might only catch glimpses of it through our interfaces, but Big Data is consuming the world, for good and ill.

In their article “Critical Questions for Big Data,” danah boyd and Kate Crawford characterize the concept:

We define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of:

(1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.

(2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.

(3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy (663).

This interplay defines the Big Data approach to research and thought: the capacity of technology to gather, manipulate, process, and analyze torrents of data grounds the belief that Big Data holds meaningful value for understanding the world thus constructed.

Sciencescape very much falls under the banner of this paradigm: its data set is the universe of scientific academic research and journal articles. I am intensely interested in the site because of its attempt to organize, for our ease of access, a nebula that doubles in output every nine years (Van Noorden). To this end, the site claims to have indexed 24 million articles.

This number, in itself, is not impressive, as big academic indexes are common. Instead, the novelty arises from Sciencescape’s method of presentation and management. Here is a video from their About section:

From this clip, it appears that Sciencescape wishes to establish a web of scholarly data, using computational, algorithmic methods to pick out key elements such as subjects, authors, and institutions, and to order papers in a way that exposes their topic, field of study, and impact. Applying Big Data methodology to Big Data sets, Sciencescape works to provide meaningful orientation in the realm of science at a scale that would seem too grand for more traditional approaches. The site offers free accounts for access to its database, so let us now explore how the site’s aspirations fare in their implementation.

We will discover the functionality of the site by contemplating its elements as they present themselves on the guided tour of Sciencescape. I do this not only to easily distinguish features of note, but also as a means to peer into the minds of the site’s creators; I want to get an idea of what they regard as valuable, novel, and unfamiliar in their approach to Big Data analysis.

First, the overall presentation of the site before log-in feels very much like the now familiar startup interface (though the company’s headquarters is in Toronto).

Very professional and enticing to potential users and investors alike, this web page contains all the standard links to information about the project and those behind it. Clicking on the About tab allows a user to learn about the functionality of the site. Of particular note is the publishing partners sub-tab:

From the About tab, we are also able to access the guided tour, which allows us to bypass log-in.

PAPERS

The first stop is a page for an individual paper:

This particular web portal holds many interactive elements. For instance, we can follow the authors and the journal, both back into the past and forward into the future as new publications are added. If the paper is available to us, then we can download a PDF. Other Web 2.0 mainstays like “Add to Library,” “Broadcast,” and “Bookmark” also appear, as well as buttons for Facebook, Twitter, and email.

At the bottom of the page, there are lists of citations to and from this specific paper, as well as “recommended readings” and “fields.” I am not certain what factors go into the recommendations, perhaps relevance to authors and topics, but the “Fields” tab interests me the most. The list is populated with associated concepts, all of which expand through time to encompass other articles.

FIELDS

The next main section of the tour presents an example of these “Fields.” Displayed is the timeline associated with AIDS research:

These field pages, I think, really display the greatest value in Sciencescape. This research timeline shows the sudden jump in scientific awareness of AIDS during the period of April-July 1983, when 825 citations mentioned it. The three months before show only 30 citations. This tool is immediately helpful in accounting for the history of the disease as science has seen it.

The other significant value on this page is the “Eigenfactor,” which Sciencescape uses to rate papers’ impact. In the field of AIDS research, we see that a paper published in May 1983 had the greatest influence on others.

The Eigenfactor is one of the defining characteristics of the whole engine:

The Eigenfactor algorithms take into account not only the citations, but where the citations come from. This is similar to how modern search engine algorithms rank websites. A citation from a highly referenced journal, article or author carries more weight than one from a journal nobody has read (https://sciencescape.org/eigenfactor).

With this factor, scientists have a potential metric for significance that accounts for the entire web of data, rather than just explicitly referenced citations.
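To make the idea of weighted citations concrete, here is a minimal sketch of a PageRank-style ranking, the kind of search engine algorithm the quotation alludes to. It is not Sciencescape’s code, nor the actual Eigenfactor algorithm, whose details I have not seen; the function name, the damping value, and the example papers are all my own assumptions for illustration.

```python
def rank_by_weighted_citations(citations, damping=0.85, iterations=50):
    """Score papers so that citations from high-scoring papers count for more.

    `citations` maps each paper to the list of papers it cites.
    This is a simplified PageRank-style iteration, not the real Eigenfactor.
    """
    papers = set(citations)
    for cited_list in citations.values():
        papers.update(cited_list)
    n = len(papers)
    score = {paper: 1.0 / n for paper in papers}

    for _ in range(iterations):
        # Papers that cite nothing share their score evenly with everyone.
        dangling = sum(score[p] for p in papers if not citations.get(p))
        new_score = {p: (1 - damping) / n + damping * dangling / n for p in papers}
        # Every citing paper splits its current score among the papers it cites.
        for citing, cited_list in citations.items():
            for cited in cited_list:
                new_score[cited] += damping * score[citing] / len(cited_list)
        score = new_score
    return score


# Hypothetical papers: "x" and "y" are each cited exactly once,
# but "x" is cited by the heavily cited "hub", while "y" is cited by "p4".
citations = {
    "p1": ["hub"],
    "p2": ["hub"],
    "p3": ["hub"],
    "hub": ["x"],
    "p4": ["y"],
}
scores = rank_by_weighted_citations(citations)
print(scores["x"] > scores["y"])  # True: same citation count, different weight
```

In this toy network, a plain citation count would rate “x” and “y” equally, while the weighted score favors the paper cited by the more influential source.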

THE FINAL STOPS

The last two sections of the tour deal with the facilities in place to allow someone to keep up to date on her research. She can follow elements such as journals, universities, authors, and fields. New papers associated with these elements then appear in her Feed as they are published. An extra level of organization also exists, called “Streams.” These groups can be constructed by the follower as a means to further link the items she follows. She can sort by these streams to focus her attention on one area of association.
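As an information scientist, I find it helpful to picture the Feed and Streams as a simple data structure. The sketch below is only my guess at the shape of the idea, not a description of how Sciencescape stores anything; the class names, fields, and example papers are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Paper:
    title: str
    tags: frozenset  # journals, universities, authors, and fields tied to the paper


@dataclass
class Follower:
    following: set = field(default_factory=set)
    streams: dict = field(default_factory=dict)  # stream name -> set of followed items

    def follow(self, item, stream=None):
        """Follow an item, optionally filing it under a named stream."""
        self.following.add(item)
        if stream is not None:
            self.streams.setdefault(stream, set()).add(item)

    def feed(self, papers, stream=None):
        """Return papers tagged with any followed item, or only one stream's items."""
        wanted = self.following if stream is None else self.streams.get(stream, set())
        return [paper for paper in papers if paper.tags & wanted]


# Hypothetical example data
papers = [
    Paper("Paper A", frozenset({"AIDS", "Journal One"})),
    Paper("Paper B", frozenset({"Genomics", "Journal Two"})),
    Paper("Paper C", frozenset({"Astronomy"})),
]
reader = Follower()
reader.follow("AIDS")
reader.follow("Genomics", stream="lab work")
print([p.title for p in reader.feed(papers)])              # ['Paper A', 'Paper B']
print([p.title for p in reader.feed(papers, "lab work")])  # ['Paper B']
```

Seen this way, a stream is just a named subset of what the reader already follows, which is why sorting by stream narrows the Feed rather than adding to it.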

Overall, the tour was promising to me, as an information scientist. Ideally, this specific conglomeration of social and semantic tools should increase relevancy and visibility over the extreme output of scientific research. Nevertheless, I worry about the implementation of such algorithms over such big data sets; I am not a bio-scientist, and so I cannot judge how well Sciencescape’s metrics work in coordinating fields and impacts. Certainly, more inquiry is needed, but the concepts that underlie the site are compelling.

Works Cited

boyd, danah, and Kate Crawford. “Critical Questions for Big Data.” Information, Communication & Society 15.5 (2012): 662-679.

Van Noorden, Richard. “Global Scientific Output Doubles Every Nine Years.” News blog. Web. 30 Nov. 2015. <http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html>

Foundations of Information

Pratt School of Information
