PROFILE

Finding the Value in Scientific Pictures
Image Informatics and Scimagix, Inc.

by Deborah J. Ausman

Posted July 21, 2000 · Issue 83


Abstract

While researchers have plenty of systems at their disposal for managing alphanumeric data, the visual descriptions of experimental outcomes - scientific image data - must still be handled manually. Scimagix is using technology borrowed from machine perception and multimedia research to make images searchable, mineable, and, ultimately, accessible.


Clichéd linguistic references aside, what is a picture really worth? An immunoblot or microscopy image might not garner much at eBay, but if that image helps to reveal the mechanism behind a particular pathology, its value could be tied to millions of U.S. dollars in worldwide new drug sales.

Image informatics: Companies find a niche mining image data.

Conversely, the image will have no value at all if researchers never glean any information from it. The preponderance of image data produced by modern labs - as much as 70 percent of the experimental data generated - has revealed a dearth of tools for managing this data effectively. In many labs, image files are stuffed in a directory, inaccessible to anyone other than the researcher who owns that computer. And given the volumes of data generated by today's high-throughput techniques, all those immunohistograms, in situ hybridization results, and electrophoresis patterns can be difficult for researchers to compare manually.

"They call it gel gazing," explains Robert Dunkle, president and CEO of Scimagix, Inc., a start-up software firm tucked along the water in Redwood Shores, California. The "they" he's referring to are proteomics researchers at Parke-Davis, one of the firms that has partnered with Scimagix to create software applications for managing scientific images. To identify patterns in protein expression, Parke-Davis researchers and others like them hunker down and tediously look at and compare gel after gel after gel. It's a decidedly low-tech and laborious end point in an otherwise high-tech process.

Scimagix aims to make images as searchable as text.

"The last decade has seen significant advances in laboratory automation and, along the way, tools for databasing and mining alphanumeric data have matured," says Dunkle, who has spent over 15 years managing and marketing informatics products for chemistry and biology. But compared to these areas, image handling and analysis has lagged. Hence Scimagix, which aims to make image content as searchable as text and numbers through a unique technology developed initially for broadcasting.

Multimedia Meets the Western Blot

The idea propelling Scimagix is not merely to extract alphanumeric data from images, but to turn images themselves into queries that can be used to find other, similar images. The concept may sound far-fetched, but, in fact, the ability to perform low-level image searching and analysis has wended its way from intelligence and military applications to consumer products for organizing things such as photo albums and stamp collections. Today, extensions of the technology are also being used to manage broadcasting and industrial video archives.

Virage pioneered image search technology for broadcasting.

Virage (San Mateo, California) is one of the companies that pioneered image-search technology for broadcasting. Its particular innovation is "visual information retrieval" (VIR), developed by Ramesh Jain, a computer scientist and engineer specializing in multimedia information systems, image databases, machine vision, and intelligent systems. Scimagix was born when Virage saw the opportunity to apply VIR to a new sector: pharmaceutical research and chemical discovery.

"The initial problems Ramesh and his group were trying to solve were along the lines of 'Find the pictures that look like this picture,'" says Paul Lego, Virage's CEO and a member of Scimagix's board of directors. VIR accomplishes this by characterizing four "primitives" that combine to create an image: color, shape, size, and texture. The characterization produces a 70-dimension vector that mathematically describes images. To search for images that are "like" each other, users tune the primitives. An emphasis on color, for instance, would help you tell an apple from an orange; an emphasis on shape would enable you to distinguish between both of these objects and a Rubik's cube.

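To make the idea of "tuning the primitives" concrete, here is a minimal sketch of a weighted similarity search. It is not Virage's implementation: the feature values, the `weighted_distance` and `rank_similar` helpers, and the choice of a weighted Euclidean distance are all assumptions made for illustration, and the toy vectors are far shorter than the 70 dimensions described above.

```python
import math

# Hypothetical layout: one small group of numbers per primitive.
# The values below are invented; they are not real VIR features.
PRIMITIVES = ("color", "shape", "size", "texture")


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature dictionaries."""
    total = 0.0
    for name in PRIMITIVES:
        w = weights.get(name, 1.0)
        total += w * sum((x - y) ** 2 for x, y in zip(a[name], b[name]))
    return math.sqrt(total)


def rank_similar(query, database, weights):
    """Return database keys ordered from most to least similar to the query."""
    return sorted(
        database,
        key=lambda name: weighted_distance(query, database[name], weights),
    )


apple = {"color": [0.9, 0.1, 0.1], "shape": [1.0], "size": [0.5], "texture": [0.2]}
orange = {"color": [0.9, 0.5, 0.1], "shape": [1.0], "size": [0.5], "texture": [0.4]}
cube = {"color": [0.9, 0.1, 0.1], "shape": [0.0], "size": [0.5], "texture": [0.1]}
database = {"orange": orange, "cube": cube}

# Emphasize color only: the red cube looks closest to the red apple.
color_first = rank_similar(
    apple, database, {"color": 1.0, "shape": 0.0, "size": 0.0, "texture": 0.0}
)

# Emphasize shape only: the round orange now looks closest.
shape_first = rank_similar(
    apple, database, {"color": 0.0, "shape": 1.0, "size": 0.0, "texture": 0.0}
)
```

With these made-up numbers, shifting the weight from color to shape changes which stored object ranks first, which is the kind of tuning Lego describes.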
Figure 1
The ability to select "regions of interest" within images makes VIR more efficient than other image-analysis techniques developed to date - and it is what has enabled Scimagix to apply this technology to pharmaceutical research and development successfully. Rather than characterizing and searching entire images using VIR, researchers can instead define a particular region of a 2-D gel that contains interesting features, such as protein "constellations" known to confer drug toxicity. The defined region can then be used to query a database of gel patterns to find other instances of those features (see figure 1).
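A region-of-interest query can be sketched in a few lines. The example below is only an illustration under stated assumptions: it swaps in plain template matching (a sum of absolute pixel differences over a sliding window) for VIR's feature vectors, represents gels as tiny made-up grids of intensities, and uses hypothetical helper names (`crop`, `best_match`, `query_by_region`).

```python
def crop(image, top, left, height, width):
    """Cut a rectangular region out of an image stored as a list of rows."""
    return [row[left:left + width] for row in image[top:top + height]]


def patch_distance(a, b):
    """Sum of absolute pixel differences between two equally sized patches."""
    return sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def best_match(roi, image):
    """Slide the region over the image; return (distance, top, left) of the best fit."""
    height, width = len(roi), len(roi[0])
    best = None
    for top in range(len(image) - height + 1):
        for left in range(len(image[0]) - width + 1):
            d = patch_distance(roi, crop(image, top, left, height, width))
            if best is None or d < best[0]:
                best = (d, top, left)
    return best


def query_by_region(query_image, region, database):
    """Rank stored images by how well they contain the selected region."""
    roi = crop(query_image, *region)
    hits = [(name,) + best_match(roi, image) for name, image in database.items()]
    return sorted(hits, key=lambda hit: hit[1])


# Toy "gels": 0 is background, 9 is a protein spot (invented data).
query_gel = [
    [0, 0, 0, 0],
    [0, 9, 0, 0],
    [0, 0, 9, 0],
    [0, 0, 0, 0],
]
database = {
    "gel_a": [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 9, 0], [0, 0, 0, 9]],
    "gel_b": [[0, 0, 0, 0], [0, 9, 9, 0], [0, 0, 0, 0], [0, 0, 0, 0]],
}

# Select the 2x2 region holding the diagonal pair of spots: (top, left, height, width).
hits = query_by_region(query_gel, (1, 1, 2, 2), database)
```

Here `gel_a` contains the same diagonal pair of spots at a different position and so ranks first with a distance of zero, while `gel_b`, whose spots sit side by side, ranks second.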

Dunkle acknowledges that VIR requires algorithmic tuning. "If the question is one of black versus white, native VIR does fine," he says. "It's the shades of gray, such as telling the difference between healthy tissue and first-stage pathology, that need tweaking." But Dunkle is also adamant that the technology isn't intended to replace visual searching by humans. Rather, it's intended to support human searches so that those same humans can make better decisions.

VIR aids the human eye by organizing information.

"We know that patterns of expression exist across different drug classes," Dunkle points out. "But even with our uncanny perceptual abilities, humans just can't take in the hundreds of images necessary to resolve those patterns. VIR offers a way to pull together related, or even just possibly related, image data and view that data in tandem with other results, which gives researchers a better chance of being able to detect trends and decide what story the data is telling."

The "Image Informatics" Space

Like high-throughput screening and combinatorial chemistry, which were initially criticized for being imprecise and risky, image informatics (Scimagix's term for the space created by its products and services) needs to prove its value before it will become an accepted part of life science R&D. When a reporter asked the list server of the Laboratory Robotics Interest Group to comment on the technology, most respondents assumed that "image informatics" referred to data aggregator technology or to software for one-to-one image comparison. Searching the image content itself? When list server members grasped the concept, most doubted that it was possible.

Most researchers can't imagine searching for image content.

"We've really had to educate researchers, asking them straight out, 'Do you realize that you actually can search on an image and find information relevant to your research?'" explains Suzanne Mattingly, a molecular biologist and vice president of marketing at Scimagix. Part of the problem is cultural, according to Mattingly. Because of their complexity, images have yet to become part of the informatics mix that today includes alphanumeric and chemical structure data. In fact, images are often used after the fact, rounded up and pasted into reports to back up decisions, rather than used in conjunction with other data to guide decisions.

"There was no image-management strategy," responds Michail Esterman, an information consultant at the scientific imaging center at Lilly Research Laboratories (Indianapolis, Indiana), to a question about how his company has managed image data in the past. "The problem has been that images are scattered over many directories, either local or remote. It often takes less time to redo the experiment than to find an image" from an experiment done more than a few weeks back.

Figure 2
Before Scimagix can market its VIR-based technology for searching databases of images (a 2-D gel analysis and mining module is slated for release in the fall), it has had to develop capabilities for getting images into databases in the first place. In March, the company released the Scientific Image Management System (SIMS), an Oracle/Web-based package for organizing, storing, retrieving, and mining images (see figure 2). Both Parke-Davis and Eli Lilly have publicly announced their licensing of SIMS.

It's been a focused effort, according to Dunkle. In the nine months since his official appointment as CEO (announced in October 1999, simultaneously with the company's completion of its first round of venture capital financing), Dunkle has seen Scimagix quadruple in size. The bulk of the new hires are in development, and the team has adopted a round-the-clock development cycle. "Some of our developers work until two or three in the morning, and then our head of engineering [Bryan Van Vliet, a founder of the company] walks in around five or six and gets the day going," Dunkle says. "By the time the morning staff arrives, Bryan has integrated all of the prior day's work into a new build."

Most employees are scientists turned computer scientists.

It's a common sentiment among start-ups, but Mattingly - an executive who states strongly, "I don't do 'also-rans'" - insists that the staff is driven to succeed because they are committed to defining the next step in scientific data management. Most of the employees, particularly in development, are scientists turned computer scientists who have worked with the data types currently handled by other scientific software vendors - vendors that are now scrambling for position as object-oriented programming and the preeminence of Oracle render their current product strongholds obsolete.

"While chemical structures, for instance, are really only relevant to chemists, images can be relevant to everyone if they can only be made widely and readily available," notes Mattingly, whose history includes a VP position at Oxford Molecular. She relates a story from one of the pharmaceutical companies that she visited recently. A member of one project team happened to see an image produced by another project team. One glance, and the team member discovered that the image, generated by a member of a completely different team, had relevance to his area. His team was soon pursuing it as a new drug candidate.

"Success in science often rests on data exchanges."

Dunkle concurs. "Discoveries like this one don't have to be accidents," he says. Making images part of the informatics mix not only helps those responsible for image management, but also opens up further opportunities for discovery as the data contained in images is shared and integrated with other experimental data. "Success in scientific research often rests on data exchanges," says Dunkle. "It's just a matter of getting the right images in front of people so that they can see things they couldn't before. This is image informatics."

Deborah J. Ausman writes about the tools and technologies that support pharmaceutical and chemical discovery and development.
Grant Jerding is a freelance illustrator who specializes in photo collages and manipulations. His clients include Audubon magazine, Better Homes and Gardens, Consumer Reports, Discovery Channel, Popular Science, Scientific American, USA Today, and U.S. News and World Report.


Endlinks

Dart Home Page - includes links to the Shoebox software for managing digital photos. From AT&T Labs.

Computational Vision Lab - investigates how computers perceive color. From the School of Computing Science at Simon Fraser University (Burnaby, British Columbia, Canada).

Computer Vision Homepage - offers resources, links, and in-depth information on research in computer vision and image analysis.

Face Detection Home Page - much of the work in pattern recognition has focused on how humans are able to recognize different faces. This page describes research in defining algorithms for detecting faces in arbitrary scenes.

Medical Imaging Resources: Content Listing - a list of resources on medical imaging, one of the areas into which Scimagix plans to expand. From the Center for Medical Imaging Research at the University of Leeds.

