Abstract
Bioinformatics marries computer science and biology. The young union won't flourish unless we cultivate practitioners well versed in both disciplines, and give them creative work to do.
Bioinformatics is one of the hottest areas in the biomedical marketplace. Every week, in the back of the major scientific weeklies Science and Nature, there are several advertisements for bioinformatics positions in industries ranging from start-up biotechnology companies to Fortune 500 pharmaceutical giants. In the academic world, universities are having a hard time holding on to and replacing bioinformatics specialists lured to industry.
With the increase in complexity and power of both computer systems and bench
research techniques, human "bridges" who understand both disciplines, and can
communicate with scientists in either field, are in great demand.
Bioinformatics was called a scientific "wave of the future" in Science feature
stories in both 1996 and 1997 (paid subscription required to access these
full-text articles). An article in The Scientist last year by Thomas W. Durso
declared that "As Genomics Grows, Future For Bioinformatics Is Bright,"
further bolstering the confident forecasts for the field, since the various
genome sequencing projects are sprinting toward completion.
All these signs point to a prosperous future, and to streets that seem paved with salary and grant gold for scientists who are, or want to be, bioinformatics specialists. But a more careful examination may reveal some hidden clouds in these sunny skies. A closer look at the advertisements for industrial bioinformatics positions reveals that expert training in biological sciences, and the demonstrated ability to solve biological problems, are buried among a fairly long array of requirements regarding knowledge of programming languages and methods. Even though it might be difficult to find bioinformatics scientists to fill positions in departments that offer bioinformatics degrees in academia, there are very few openings in other departments for scientists who use bioinformatics to answer fundamental biological questions. There are only five programs in North America that offer Ph.D. degrees in bioinformatics or computational biology.
In addition, funding for the training of bioinformatics scientists is
limited to training grants in medical informatics programs from the National Library of Medicine, and to Department of Energy fellowships for computer
scientists who want to enter the field. Funding mechanisms for stand-alone
bioinformatics research projects are not yet in place. For certain
projects, such as databases, funding is usually provided only for the
development of the project, and not for its maintenance. Furthermore,
publication of research using only bioinformatics approaches is very rare in
mainstream scientific journals. It seems that on the one hand there is a
great demand for (and only a small supply of) those skilled in
bioinformatics, compounded by the fact that very few institutions provide
formal training in it. On the other hand, there are very few openings and
opportunities other than support and collaborative positions for
bioinformatics scientists in today's job market.
What is the reason for this discrepancy? One answer might be that the field is still very young and not well defined, even among bioinformatics practitioners themselves who, to add to the complexity, come from diverse training backgrounds such as computer science and medicine. Even the name of the discipline - computational biology, or bioinformatics? - is a matter of debate among those in the field. Historically, the use of computers to answer biological questions, which is a functional definition of bioinformatics, started with the development of algorithms and their application to understanding the interactions of biological processes and the phylogenetic relationships among organisms based on gene sequence information. The exponential increase in the amount of genomic sequence data available, as well as the increase in computer-driven machinery for data acquisition and analysis, expanded the breadth of bioinformatics.
Databases must be constructed to hold the data, and specialists must be employed
in computer-based data collection and analysis. Scientists involved in this
type of project are now thought of as support personnel assisting in the
work of the bench scientists, who are exclusively involved in the actual
information collection that goes into the databases constructed or analyzed
by bioinformatics scientists. This is the kind of thinking that informs the
previously mentioned industry hiring practices. These practices are
understandable, since most companies have a lot of data on hand, and much
more on the way, to be stored and retrieved as fast as possible. The
scientists who are best able to construct these databases quickly and well
are computer scientists, who have little background or interest in data
analysis. That is usually the work of the bench-oriented genomics
scientists supported by these databases.
What differentiates a scientific discipline from a support field is that the former involves hypothesis-driven research, while the latter supports such research. Unlike many support fields, bioinformatics has involved hypothesis-driven research since its inception. Theories of molecular evolution have been examined using post-sequencing genomics. Theories on molecular interactions, and on complex processes such as nervous excitation, have been examined using molecular modeling. The genomics and modeling areas of bioinformatics are starting to be viewed as a scientific discipline, as evidenced by the increased publication of stand-alone papers on these subjects.
How could bioinformatics research in areas such as database structure also be
treated as a scientific discipline? A database is conceived to aid data
collection and analysis by bench scientists, and so is a construction for
support and collaboration. But there is much room for hypothesis-driven
research in the database field. One can point to the analogy of the plasmid
construction area of molecular biology: databases that are just storehouses
of data are only as useful as plasmids that store a particular gene without
any additional functionality that would allow investigators to get
information about the activity or the structure of the gene. As plasmids are
usually constructed for use in
particular experiments to answer particular questions, so databases can be
constructed so that particular biological questions can be answered by data
mining.
The challenge in database construction is to establish an architecture that allows for intelligent searching, communication with other databases, and the coupling of specific analytical tools to solve specific biological problems. Scientists who can construct these databases must have the background to determine which particular scientific problems need solving, and which methods best solve them. Scientists not versed in basic biological research cannot meet both of these requirements.
In terms of their acceptance as scientific disciplines, there are many
parallels between the early days of molecular biology and those of
bioinformatics. Only time will tell whether the latter will follow the
course of the former. Watch for the important signs in the back of your
favorite scientific weekly, not in a special section dedicated to the field,
but in the classifieds: positions for bioinformatics researchers in basic
science academic departments, grants and fellowships specifically for
bioinformatics training and research, new bioinformatics departments, and
industry positions requiring expertise in both biology and informatics.
Emmanouil Skoufos is a postdoctoral fellow at the Center for Medical Informatics at Yale University School of Medicine.
Andrzej Krauze is an illustrator, poster maker, cartoonist, and painter who illustrates regularly for HMS Beagle, The Guardian, The Sunday Telegraph, Bookseller, and New Statesman.


Endlinks
Science: Current Positions Advertised - weekly job listings.
Genome Monitoring Table - with up-to-date statistics on the status of the different genome projects.
Genome Sequencing Projects - linked list of all genome sequencing projects.
A Curriculum for Bioinformatics: The Time is Ripe - editorial by Russ Altman of Stanford University. Requires Adobe Reader.
Bioinformatics in a Post-Genomics Age - by Diane Gershon, from the September 25, 1997 issue of Nature. Free registration is required for access.
University Bioinformatics Programs - list of training programs. Maintained by Indiana University.
Computational Biology and the Cross-Disciplinary Challenge: Finding a Home in Academia - paper from the Computer Science and Telecommunications Board National Research Council symposium of May 16, 1996. By E.H. Shortliffe. Requires Adobe Reader.