Missing Bodies
Scientists Access Data - They Don't Read Bodies of Text

by Robert Ubell

(Issue 6 · posted April 18, 1997; archived May 2, 1997)


Scientific publishers beginning to establish a Web presence have quickly gotten the message from their readership: Provide data that can be easily browsed and downloaded fast. The Web rule, long established, that readers of electronic publications do not want to read text on-line, leads us to wonder about their reading habits off-line.

Fast rewind. If you were a fly on the wall in a library and could observe scholars as they browse through journals, you'd discover some peculiar but fairly common behavioral characteristics. The first thing an investigator does when she plucks a periodical off the rack is to glance down the table of contents, looking for titles that fall within her circle of interest and searching for familiar names, notable authors, friends, colleagues, and rivals.

Scientists read tables of contents the way travelers study train schedules. Can they make it? Or has their train already pulled out of the station? Will they be left standing on the platform, waving good-bye as the express speeds along past them?

Eugene Garfield was the first to appreciate this particular behavior when he created Current Contents. It was a brilliant insight into the way scientists actually behave - not the way we would like them to, or the way they are expected to, act. Reproducing easily accessible tables of contents right out of publishers' own journals, Current Contents was the first secondary publication in the scientific literature business to penetrate the culture of scholarship in a unique way, to offer scientists all of the tables of contents in their field, not just the ones they happen to have at hand.

Today, you can buy collected contents in print and in a variety of electronic formats. You can even subscribe to on-line contents services. BioMedNet's version of MEDLINE (Evaluated MEDLINE), for example, allows you to select from thousands of journals for browsing or, better yet, to search for topics or authors of interest.

What does the scholar do next? If he spots the name of his rival as the author of an article in the contents, he flips immediately to that paper. But instead of sitting down at a nearby table to read it from beginning to end, our scientist turns to the reference list at the end of the article, skipping the text entirely, to determine whether his name appears among the authors cited. This act is among the most complex of all behaviors. Apart from superficial narcissism, his search through the citations reveals another consequence of the culture of science. With a great sigh of relief, he spots his name. At least he wasn't ignored. But then comes anxiety. Did the author cite his work accurately? Did he accept our scientist's conclusions? Or did he trash them, moving beyond our investigator's results to the next stop, leaving our friend stranded on the platform?

Once again it was Gene Garfield who discovered the persuasive power of citations, an insight that catapulted him into the pantheon of information gurus. Some years ago, International Thomson, the giant publishing conglomerate, paid more than $200 million to Ted Cross, Joe Palazollo, and others for their share of Garfield's incredible and unique database, Science Citation Index. (The sum was also paid for other products, including Current Contents.) Today, you can get ISI's citation index in print, on-line, on tape, on CD-ROM, in whatever format suits you, your lab, or your library. Librarians tell us that ISI's scientific journal impact ratings are among the most important items they review when deciding whether to renew or cancel a subscription. Those periodicals that fall to the bottom of ISI's scale are going to have a tough time making it this summer and autumn when the renewal season comes around.

Let's return to our scholar, observing her as she continues to go through the article. What does she do next? If our scientist is typical, she then scans the article, glancing from illustration to illustration, to absorb the results as displayed in diagrams and flow charts, captured on opposing axes, reproduced as saw-toothed spectra or in halftones as X rays or chromatographs, or simulated in false-color computer graphics. What she sees tells her most of what she needs to know about the meaning of the article, since these figures generally encapsulate the way research was performed, the experimental evidence, the results, and even the conclusion.

Our investigator then browses further through the article, spotting mathematical formulae, algorithms, and other symbols, performing mental calculations as he wanders through the pages. Now he stops to consider his position and judge his rival. He concludes that (1) the author has performed some sound research; (2) his competitor may be on to something important; (3) the bastard scooped him. Or (4), happily, the author is completely off track. If the last, our scientist speeds along on the express, leaving his rival standing vacantly on the platform - a scene that gives new meaning to Einstein's train experiment explaining relativity.

No doubt by now you've noticed that the scientist we're observing hasn't read a word of the body of text. Except perhaps for the methods and materials section, the article's primary text (introduction, results, discussion, conclusion) has been summarily ignored. If, however, you were to approach her as she places the journal down on the table and ask, "Did you just finish reading that issue?" she wouldn't hesitate to respond affirmatively.

Classically, publishers have been guardians of words. Figures have been considered a less elevated species, an afterthought, commonly plopped into empty squares and rectangles on the page. And they remain a nuisance in the newly created word-captured databases. In the emerging Web technology, figures, formulae, and numbers are not easily identified, formatted, or retrieved.

As the above example was intended to illustrate, however, it's clear that even in the dark ages when scholars were found in libraries, words have not been the primary means of conveying and perusing information for the typical scientist. Now the medium exists to exploit this knowledge. The next generation of innovation in the technical information business will not come from words, but from pictures and numbers. The World Wide Web has succeeded, and is mushrooming as the king of content providers. It facilitates the quest for small and numerous and connected bites that are easily absorbed and integrated. Publishers must follow the lead of software vendors and innovative scientific authors to further develop, adapt, and promote graphical representation of complex and varied data. When you can transform what is at present inert into something truly dynamic - allowing interactive manipulations and comparative analysis in addition to purely striking and informative displays of data - you'll have products and services worth the attention of scientists. Another challenge is to assemble the graphics into searchable databases, so that the information they carry can be as easily queried and retrieved as regular text.

David Letterman's lists, the visual world of MTV - these have reinforced the scientist's natural penchant for scanning and seeking out attractive bits. We must respond in kind with a blueprint for scientific publishing on the World Wide Web that enshrines data in bodies other than text.

Robert Ubell is President of BioMedNet USA. He has held positions in the field of professional and scientific publishing for over 25 years, including President and American Publisher of Nature, and creator and founding publisher of Nature Biotechnology.

Andrzej Krauze is an illustrator, poster maker, cartoonist and painter who illustrates regularly for HMS Beagle, The Guardian, The Sunday Telegraph, Bookseller and New Statesman.