Sciencescape’s Approach to Big Data in the Realm of Scientific Research

The current image of consumer technology, of personal computers and front-end applications, is one of simplicity: small boxes that house smaller components and microchips, departing further and further from the hulking, esoteric, push-pin and vacuum-tube monoliths of the past. This is, of course, ideal, as an increasingly tech-minded society compels its citizens to participate more wholly within its self-designated bounds.

 

Even Google’s server rooms express the hushed, confident uniformity of its homepage.

 

But this picture belies the massive data swarm on the other side of the front end. We might only catch glimpses of it through our interfaces, but Big Data is consuming the world, for good and ill.

In their article “CRITICAL QUESTIONS FOR BIG DATA,” danah boyd and Kate Crawford characterize the concept:

We define Big Data as a cultural, technological, and scholarly phenomenon that rests on the interplay of:

(1) Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets.

(2) Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims.

(3) Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy (663).

This interplay implies the Big Data approach to research and thought, where the potential of technology to facilitate, manipulate, process, and analyse torrents of data roots the belief that Big Data holds meaningful value in understanding the world thus constructed.

Sciencescape very much falls under the banner of this paradigm; its data set is the universe of scientific academic research and journal articles. I am intensely interested in the site because of its attempt to organize, for our ease of access, a nebula that doubles in output every nine years (Van Noorden). To this end, the site claims to have indexed 24 million articles.
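To put the nine-year doubling figure in more familiar terms, here is a minimal sketch that converts a doubling time into an annual growth rate. It assumes smooth exponential growth, which is a simplification, and the function names are my own, not anything from Sciencescape or the cited blog post.

```python
import math


def annual_growth_rate(doubling_years):
    """Yearly growth rate implied by a doubling time, assuming steady exponential growth."""
    return 2 ** (1.0 / doubling_years) - 1.0


def years_to_multiply(factor, doubling_years):
    """Years needed for output to grow by `factor` under the same assumption."""
    return doubling_years * math.log2(factor)


rate = annual_growth_rate(9)
print(f"Implied annual growth: {rate:.1%}")
print(f"Years to grow eightfold: {years_to_multiply(8, 9):.0f}")
```

Under this assumption, doubling every nine years works out to roughly 8% growth per year, and an eightfold increase in output within 27 years.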

This number, in itself, is not impressive, as big academic indexes are common. Instead, the novelty arises from Sciencescape’s method of presentation and management. Here is a video from their About section:

From this clip, it appears as though Sciencescape wishes to establish a web of scholarly data, using computational algorithmic methods to pick out key concepts such as subjects, authors, and institutions, and to order them in a way that exposes their topic, field of study, and impact. Applying Big Data methodology to Big Data sets, Sciencescape works to provide meaningful orientation in the realm of science at a level that would seem too grand for more traditional approaches. The site offers free accounts to access their database, so let us now explore how the site’s aspirations fare in their implementation.

We will discover the functionality of the site by contemplating the elements as they present themselves on the guided tour of Sciencescape. I do this not only to easily distinguish features of note, but as a means to peer into the minds of the site’s creators; I want to get an idea of what they regard as valuable, novel, and unfamiliar in their approach to Big Data analysis.

First, the overall presentation of the site before log-in feels very much like the now familiar startup interface (though the company’s headquarters is in Toronto).

[Screenshot: the Sciencescape landing page before log-in]

 

 

Very professional and enticing toward potential users and investors alike, this web page contains all the standard links to the information about the project and those behind it. Clicking on the about tab allows a user to learn about the functionality of the site. Of particular note, the publishing partners sub-tab:

 

[Screenshot: the publishing partners sub-tab]

 

From the about tab, we are also able to access the guided tour, which allows us to bypass log-in.

PAPERS

The first stop is a page for an individual paper:

 

[Screenshot: the page for an individual paper]

 

This particular web portal holds many interactive elements. For instance, we can follow the writers and the journal, both into the past and into the future as new publications are added. If the paper is available to us, then we can download a PDF. Other Web 2.0 mainstays like “Add to Library,” “Broadcast,” and “Bookmark” also appear, as well as buttons for Facebook, Twitter, and email.

At the bottom of the page, there are lists of citations to and from this specific paper, but also “recommended readings” and “fields.” I am not certain what factors go into the recommendations, perhaps relevance to authors and topics, but the “Fields” tab interests me the most. The list is populated with associated concepts, all of which expand through time to encompass other articles.

FIELDS

The next main section of the tour presents an example of these “Fields.” Displayed is the timeline associated with AIDS research:

[Screenshot: the timeline for the AIDS research field]

These field pages, I think, really display the greatest value in Sciencescape. This research timeline shows the sudden jump in scientific awareness of AIDS during the period of April-July 1983, when 825 citations mentioned it. The three months before show only 30 citations. This tool is immediately helpful in accounting for the history of the disease as science has seen it.

The other significant value on this page is the “Eigenfactor,” which Sciencescape uses to rate papers’ impact. In the field of AIDS research, we see that a paper published in May 1983 had the greatest influence on others.

The Eigenfactor is one of the defining characteristics of the whole engine:

The Eigenfactor algorithms take into account not only the citations, but where the citations come from.  This is similar to how modern search engine algorithms rank websites.  A citation from a highly referenced journal, article or author carries more weight than one from a journal nobody has read (https://sciencescape.org/eigenfactor).

With this factor, scientists have a potential metric for significance that accounts for the entire web of data, rather than just explicitly referenced citations.
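The idea of weighting a citation by the standing of its source can be illustrated with a small sketch. This is not Sciencescape’s code or the actual Eigenfactor algorithm. It is a simplified PageRank-style iteration over a made-up citation graph, and the function name, paper labels, and damping value are all assumptions chosen for illustration.

```python
def rank_by_weighted_citations(citations, damping=0.85, iterations=50):
    """Score papers so that citations from well-cited papers count for more.

    `citations` maps each paper to the list of papers it cites.
    """
    papers = set(citations)
    for cited in citations.values():
        papers.update(cited)
    papers = sorted(papers)
    n = len(papers)

    # Start with every paper equally important.
    score = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Every paper keeps a small baseline score.
        new_score = {p: (1.0 - damping) / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            if cited:
                # A paper passes its own score on to the papers it cites.
                share = damping * score[p] / len(cited)
                for q in cited:
                    new_score[q] += share
            else:
                # A paper with no references spreads its score evenly.
                share = damping * score[p] / n
                for q in papers:
                    new_score[q] += share
        score = new_score
    return score


# Hypothetical graph: D and F are each cited exactly once,
# but D's citation comes from the well-cited paper C.
toy_graph = {"A": ["C"], "B": ["C"], "C": ["D"], "E": ["F"], "D": [], "F": []}
scores = rank_by_weighted_citations(toy_graph)
for paper in sorted(scores, key=scores.get, reverse=True):
    print(paper, round(scores[paper], 3))
```

In this toy graph, papers D and F have the same raw citation count, yet D ends up with the higher score because its single citation comes from a paper that is itself well cited. That is the distinction the quoted passage draws between a citation from a highly referenced source and one from a source nobody has read.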

THE FINAL STOPS

The last two sections of the tour deal with the facilities that allow someone to keep up to date on her research. She can follow elements such as journals, universities, authors, and fields. New papers associated with these concepts then appear in her Feed as they are published. An extra level of organization also exists, called “Streams.” The follower can construct these groups as a means to further link the items she follows. She can sort by these streams to focus her attention on one area of association.

[Screenshot: the Feed and Streams pages]

Overall, the tour was promising to me, as an information scientist. Ideally, this specific conglomeration of social and semantic tools should increase relevancy and visibility over the extreme output of scientific research. Nevertheless, I worry about the implementation of such algorithms over such big data sets; I am not a bio-scientist, and so I cannot judge how well Sciencescape’s metrics work in coordinating fields and impacts. Certainly, more inquiry is needed, but the concepts that underlie the site are compelling.

 


Works Cited

danah boyd & Kate Crawford (2012) CRITICAL QUESTIONS FOR BIG DATA, Information, Communication & Society, 15:5, 662-679.

Van Noorden, Richard. “Global Scientific Output Doubles Every Nine Years.” News blog. Web. 30 Nov. 2015. <http://blogs.nature.com/news/2014/05/global-scientific-output-doubles-every-nine-years.html>

Digital Fractures in Attention: The Splintered Librarian

It is perhaps not entirely obvious, yet nowhere near controversial, to say that librarians devote themselves to attention. The library is replete with it. Since childhood, its stacks have imbued us with a sense of hushed tones and solemn contemplation, bodies hunched, minds deep in congress with personal gods. But the reality behind this impression runs as deep as the foundation. Without attention, no library would exist as anything more than a massive hulk of unordered books in some forgotten cellar (if that, as even this compilation requires some attention).

Essentially, we, as librarians, deal in attention: by the attention we provide in the formation of collections, for the attention of the members and visitors who hope to connect to the information they require. Classification, organization, and preservation enable minds to access otherwise esoteric, dispersed information, and thus we provide a service, attending to collections so others may more freely attend to the content of their interest.

Libraries establish the source of society’s extended attention span, and attention founds all human experience. To this latter effect, the psychologist William James remarks:

Only those items which I notice shape my mind – without selective interest, experience is an utter chaos. Interest alone gives accent and emphasis, light and shade, background and foreground – intelligible perspective, in a word. […] but without it the consciousness […] would be a gray chaotic indiscriminateness, impossible for us even to conceive (James).

Vanguards against the buzzing manifold, we channel the ever-deepening ocean of information through artifacts of our epistemology, generalities and contexts, so that others can keep themselves afloat in the process of their own inquiry. Digital technology allows librarians to disperse their attention across many more millions of items, but with this advanced reach come many issues. Some problems involve digital archives, preservation, memory, and power. I, however, wish to deal with the toll this new technological information age takes on our own attention spans and, by extension, our well-being.

First, consider Marcia J. Nauratil’s engaging The Alienated Librarian, an exposition of burnout in the library as the “proletarianization of professional labor.” To explain, she focuses on the emergence of bureaucracy as an ultimate power source over librarian autonomy:

Bureaucratic discouragement of professionalism, with its components of suppressed autonomy, role strains, and proletarianization, is a potent inducer of work alienation. The bureaucratic structure of libraries has also fostered and enhanced the alienating effects of all the other developmental factors considered thus far (Nauratil 55).

Hierarchical, bureaucratic oversight from library, university, or government administration stymies regular employees’ self-determination. This structuralized oppression likens librarians to Charlie Chaplin’s character in Modern Times:

Similar to the methods used to enact authority as seen in this clip, authoritarian measures legitimize bureaucracy in libraries. Some of these, as Nauratil lists, are deskilling; role strain, overload, and ambiguity; and intensification of the labor process (23-25).

But these factors of control form behind closed doors, occur perhaps unintentionally, and certainly hide under the guise of budget constraints and austerity measures. My questions: where do librarians feel depersonalization most immediately? What do we hear and do in place of the mechanical repetition in the world presented by Modern Times?

William James expounds on a familiar effect:

[…]the confused, dazed, scatterbrained state which in French is called distraction […] We all know this latter state, even in its extreme degree. Most people probably fall several times a day into a fit of something like this: The eyes are fixed on vacancy, the sounds of the world melt into confused unity, the attention is dispersed so that the whole body is felt, as it were, at once, and the foreground of consciousness is filled, if by anything, by a sort of solemn sense of surrender to the empty passing of time (James).

Here, James distinguishes distraction as the opposite of attention, an effect of contemporary life that we find intimately familiar.

As attention brokerage firms, libraries situate their workers right at the heart of the technological data swarm. One paper asks: “how is it possible to be a knowledgeable librarian in the twenty-first century? […] When constantly overwhelmed with information and distraction by its overabundance, it is difficult to focus or even know where to start (Dewan 101).” The burden of excellence in reference librarianship is at extreme odds with the very nature of the field in its current state. So, when we acknowledge that “people depend on librarians to navigate information that simply overwhelms them,” which requires librarians to “subscribe to email alerts, listservs, RSS feeds […] in an effort to keep pace with today’s ever increasing body of knowledge (100),” the childhood intuition of the quiet reverence of the library belies a furious deluge behind the reference desk.

To cope, librarians tend to multitask, sequentially drawn to and from sources, which splinters attention and stresses the addled mind (Levitin; Dewan 107). Librarians habitualize this constant repositioning of interest and develop an attention deficit trait, ADT (108). This is the pathology of our “age of distraction”:

It is brought on by the demands on our time and attention that have exploded over the past two decades. As our minds fill with noise—feckless synaptic events signifying nothing—the brain gradually loses its capacity to attend fully and thoroughly to anything (Hallowell).

Attention, the process that underpins our connection with the world, “[…] the very root of judgement, character, and will (James),” and the principle of a librarian’s craft, erodes in the cacophonous polyphony of “You Got Mail.” If anything removes us from our “species-being,” look no further than how many tabs you have open in your browser.

However harsh I sound, I am no Luddite. Technology has extended our attention out to the furthest galaxies and the smallest quantum; it gives the world stage to those who would otherwise tremble in silence. The internet has crumbled physical boundaries and allowed perspectives to promulgate. Ultimately, meaning can proliferate beyond all prior bounds. But this does not discount the deliriant effect on us, and on knowledge laborers in particular.

Written in 1989, Nauratil’s book predates the expansion of the internet, with its broadband service, smartphones, and wifi. Nevertheless, under these considerations, we can see that advanced access to information disperses attention: the “opium of the masses.” Nauratil’s remedies for alienation and burnout require us to recognize bureaucratic authority and to take action against the power that would separate us further from our work. Yet, if librarians are kept busy, stressed, and disorientated, the reality of our alienation internalizes; we view ourselves as the problem, not the structural imbalances from which these issues spawn. Our attitude becomes, “I barely know what is right in the world, when everything seems to go in its own direction, and its own set of considerations. How can I know what happens in administration? I am too tired to even look.”

Attention is our direction in the world, that is, to the world. When we focus, with a clear and unburdened mind, we should see the reality of the situation, if only through our own perspective. When our mind fails to acknowledge the world as it is, then we should attend to others at more apt vantage points. Thus our own interest can expand intersubjectively. Technology and the internet enable this effect, but also disable it. Knowing this dual nature is crucial.

 


Works Cited

Dewan, Pauline. “Can I Have Your Attention? Implications of the Research on Distractions and Multitasking for Reference Librarians.” The Reference Librarian 55, no. 2 (2014): 95-117.

Hallowell, Edward. “Overloaded Circuits: Why Smart People Underperform.” Harvard Business Review. Harvard Business Review, 01 Jan. 2005. Web. 29 Oct. 2015.

James, William. “Attention.” The Principles of Psychology (1890), Chapter 11. Classics in the History of Psychology. Ed. Christopher D. Green. York University, Toronto, Ontario. Web. 29 Oct. 2015.

Levitin, Daniel. “Daniel Levitin: ‘The Organized Mind: Thinking Straight in an Age of Information Overload.’” YouTube. YouTube, 24 Oct. 2014. Web. 29 Oct. 2015.

Nauratil, Marcia J. The Alienated Librarian (New York: Greenwood Press, 1989).

Trace Information, Transcendence, and the Proliferation of Meaning

Marcia Bates’s treatment of trace information, in her Fundamental Forms of Information, seemed awfully cursory. While it fits as a necessary component within the overarching framework she establishes for information, she neglects this process of removal, compared to the ample thought given to the other aspects of her topic. But if we give trace information further thought, it appears more significant than merely “the pattern of organization of the residue that is incidental to living processes or which remains after living processes are finished with it (Bates).”

Before we can truly comprehend the importance of trace information, we must briefly explore the foundations that Bates establishes for the entire scope of her system. In her paper, she defines information as patterns of organization, which then get encoded and embodied through how living beings store, translate, and communicate information genetically, neuro-culturally, and exosomatically. Thus, Bates constructs a vision of information as a dynamic flow through complex and interweaving channels. Metaphorically, imagine this process as a tide pool ecosystem, where inlets bring in substance that sustains and shapes life within its bounds. Without this inflow, life would starve; however, life also creates waste, which the tide pool’s outflow then carries away.

For Bates, trace information represents the function of this outflow:

The flow here is of a different sort — the Biblical “dust to dust” – in which structures previously associated with life recede back into their natural, inert forms. Trace information is that information that is degrading from being represented information (encoded or embodied) into being natural information only (neither encoded or embodied) (Bates).

She likens trace information to the “residue” that once “represented” life: “no-longer-used wasps’ nest, waste heaps, carrion, disintegrating ancient scrolls, and so on (Bates).”

All these effects are necessary for the aforementioned reasons. Nevertheless, something seems amiss when juxtaposed with this picture of carrion, an exemplar she mentioned:

 

Pictured: Carrion – not waste, lions, hyenas, and vultures

 

The very name “trace information” implies that it is beholden to the source of its trace. In her example, the residual value of the fire-leveled house resides in the building that once supported life: to expand this idea, death as a slow return to dust. With this, the husk of a wasp’s nest stands as a likely forgotten testament to its time in use. But what then explains all the life in the photograph? Certainly, the lions, hyenas, and vultures find much use in this dead elephant, and apparently not for its trace of origin.

Here, questions arise: what more does death contribute? Does decay allow information to transcend its original use into something greater? Can destruction encode more into life? To further investigate, let us leave the inhospitable systems at play in the savanna for something more digestible by humans.

Before I link this next piece, I shall provide some context. William Basinski created The Disintegration Loops through his process of conversion of old tape loops, a common medium for his compositions, from analog to digital. He realized that, because of the magnetic strips’ advanced age, this transfer considerably deteriorated the music represented on the medium itself. The strips literally fell apart as he recorded them. Because he worked with short loops of larger compositions, he could track the continual degradation. In his own words:

[I] looked at the CD recorder to make sure it was on, and it was, so I just sat there, listening as this gorgeous melody decayed over a period of an hour in such a beautiful way. I was just stunned […] ‘Wow, something different is happening here. I don’t need counter melodies. This is its own thing’ (Basinski).

Clearly, the resultant music is trace information, once from the residue of playback, and twice from the fact that Basinski found the original loops as left over from recordings stored from many years ago. But in their decay, greater meaning emerges, unexpected from what their original inert forms suggest; the sound loops would diminish in value, if their consistency remained.

September 11th, 2001 occurred during Basinski’s production, and gravely, the music symbolizes the transpired events. The linked video communicates this; the loops serve as a diminishing echo as the sun sets on the abject horror of the day. “So grave and so beautiful and stately,” the original sound mirrors the World Trade Center in its life, which then fades, in time, against the world’s growing realization of the situation.

The Disintegration Loops enable us to grapple with this reality. Because of this power, Basinski’s piece, adapted for orchestra, was featured at the Temple of Dendur for the tenth anniversary of September 11th. As Basinski recounts,

[Y]ou know, no one wanted to go out that day, nobody wanted to remember an anniversary. You don’t celebrate this kind of thing, but it was a day of remembrance and several people told me how profoundly moved they were and how they felt that the whole energy had changed and somehow the resonance had lifted. Maybe, somehow, there had been a moment of healing in that silence (Basinski).

The death of a small segment of a forgotten piece of music in one man’s attic transcends into a universal expression about the transience of life. With this art, we can confront that which we do everything in our power to hide. In effect, we re-encode this trace information back into our lives, through the meaning we weave into the holes that open up in the tape.

In this case, as in many others, trace information yields a proliferation of meaning. When information starts to unbind from its original, rigid patterns of organization, a freedom emerges, and freedom begets new uses. The dead elephant becomes a community of consumption around which its participants activate their own patterns of organization. Therefore, trace information does not merely hold value as a residue of its source, but has latent powers in its decay for rebirth as something both novel and profoundly meaningful. Bates characterizes trace information justly, and now we recognize its potential impact and capabilities.

One more point of discussion, however: the practical lesson. By its own nature, the library is replete with trace information. As time progresses and thinkers, scientists, and authors propose and try fresh ideas and methods, old frameworks fall out of date. And, naturally, questions arise about the approach to this obsolescence. Should we think about old books merely in the historical context of their time, as a trace back into the minds and minutiae of their creation? Or does previously obsolete material contain yet dormant powers to re-inform? Certainly the former is obvious, the object of museums and exhibits since their inception. But, the latter, the ability to actively communicate with the past, also seems true, when we acknowledge the phoenix-like ability of trace information.

We see hints of this effect in this week’s reading of Rogers’s “New Theoretical Approaches for Human-Computer Interaction,” where Soviet activity theory was originally intended to explain “cultural practices […] in the development and historical context in which they occurred (Rogers 103).” This lens has transcended the destruction of its initial context and now applies to contemporary studies of Human-Computer Interaction: its Soviet denotation decayed, to enable capitalist appropriation.

A final, more evident example occurred in ethical discourse, where Aristotelian virtue ethics fell out of favor during the Enlightenment and resurged into contemporary use in the 1950s (Anscombe). Today, it stands as a viable and compelling opponent to the prevailing utilitarian and deontological ethical viewpoints.

I recognize the semantic loads that some of the concepts I mentioned carry. We require much further study into the true implications of trace information as it stands as part of our epistemological horizon. Therein, however, lies my point. Perhaps we can now comprehend the substantial weight trace information carries, and therefore we can better explore its depths.


Works Cited

  • Anscombe, G. E. M. (1958). Modern Moral Philosophy. Philosophy 33 (124):1 – 19.
  • Bates, M. J. (2006). “Fundamental forms of information.” Journal of the American Society for Information Science and Technology 57(8): 1033-1045. http://www.gseis.ucla.edu/faculty/bates/articles/NatRep_info_11m_050514.html.
  • Basinski, William. “Divinity From Dust: The Healing Power Of ‘The Disintegration Loops.’” Interview by Lars Gotrich. NPR Music. NPR, 15 Nov. 2012. Web. 27 Sept. 2015. <http://www.npr.org/sections/therecord/2012/11/12/164978574/divinity-from-dust-the-healing-power-of-the-disintegration-loops>.
  • Rogers, Yvonne (2004). New theoretical approaches for human-computer interaction. Annual Review of Information Science and Technology, 38(1), pp. 87–143.