Molecular Modeling
Internet Resources for Biologists

by Christopher M. Smith


(Posted October 30, 1998 · Issue 41)


If a picture is worth a thousand words, then in a literal sense, computational models are nearly priceless. From the viewpoint of a biological scientist, modeling provides a portal to the exploration of intricate details of biological structure through interactive pictures. The viewer controls the ambience of the encounter, the vista from which the molecular landscape is viewed - in essence becoming an integral part of the model. This experience provides unique opportunities for scientific discovery because the scientist's intuitive wisdom becomes part of the perspective, a perspective that includes subjective reasoning and objective reality. No longer are molecular scenes limited to simple temporal and spatial snapshots. Scientific knowledge, intuition, and hypotheses build upon one another in the creative process of model building, and the modeler is afforded seemingly endless opportunities to analyze and test models as they are being constructed.

The profound influence of modeling (in essence, of art) on the process of scientific discovery was made very evident in a research laboratory in Cambridge, England, during the early 1950s. A few years before, Oswald Avery had clearly demonstrated that deoxyribonucleic acid (DNA), not protein, constituted nature's genetic material. Important chemical properties of DNA (such as Chargaff's rules) had recently been established, and new crystallographic studies had revealed molecular details that could not be explained using then-current methods of data interpretation. What did the data mean? How did DNA work? These were the questions of the day, and they were resolved only when a new mode of data interpretation was tested, with astounding success, by Francis Crick and James Watson. Using physical three-dimensional models, the pair deduced the structure of DNA by literally building it. The structural model immediately suggested how DNA worked and how replication occurred.

It was evident that along with intellectual effort, a mixture of artistic ingenuity, three-dimensional model building, and access to certain critical pieces of data had been required to resolve the enigma of DNA. The importance of multidimensional molecular modeling in scientific discovery had been made clear. Moreover, although it was partially overshadowed by the physical model, the significance of information, or more accurately of access to information, was also quite apparent. It was becoming increasingly clear that modern bioresearchers needed greater access to a broad spectrum of data, and to the tools required to analyze it. Each of the major scientific discoveries of the 1950s, 1960s, and 1970s reemphasized these needs.

Some of the first computational modeling programs were developed in the early 1960s. Over the next 20 years, technological advances in computer science and in mathematical algorithms yielded some very useful modeling applications - from basic visualization tools to high-end homology modeling utilities and routines for predicting tertiary structure from primary structure ab initio. Although these programs were a major asset to those using them, their use and their benefits to the entire biological community were limited by the lack of universal access to the programs themselves, compounded by the lack of ready access to structural data. Most of these hurdles were removed with the advent of the Internet and later of the World Wide Web. Armed with new modeling tools and seemingly unrestricted access to structural data and related information, today's biological researchers are well equipped to carry scientific discovery into the 21st century.

General Molecular Modeling Information Resources

The Web sites of the Center for Molecular Modeling at the National Institutes of Health (NIH) and the CMS Molecular Biology Resource (CMSMBR) at the San Diego Supercomputer Center (SDSC) are treasure troves of information and links to modeling resources on the Web. They contain listings of modeling centers, research projects, and academic programs, along with compendiums of instructional materials and modeling software. Although these software libraries are fairly broad, the most comprehensive and organized resource is perhaps Network Science's List of Computational Chemistry Software. This compendium includes brief summaries of a vast repertoire of programs (commercial, shareware, and public domain) in categories that include molecular modeling, structural chemistry, bioinformatics, and cheminformatics. Modeling software is further organized according to visualization type, and to the mathematical methods used to analyze or create the models. Researchers can easily find modeling tools appropriate for their needs, from simple structure-visualization applications to quantum mechanical routines for calculating surface electrostatics. Other excellent software listings and program repositories include the European Bioinformatics Institute BioCatalog section on Molecular Modeling & Graphics, the Quantum Chemistry Program Exchange (QCPE), and the Computational Center for Macromolecular Structure (CCMS) site. The QCPE, run by a consortium of application developers, provides a mechanism for managing the distribution of their programs. Similar to the QCPE, but on a smaller scale, the CCMS site is a distribution interface for programs developed by research groups at The Scripps Research Institute, the Salk Institute for Biological Studies, and the SDSC.

The use of the Internet as a conduit for distributing software and technical information has expedited biological research considerably by allowing researchers to access needed programs and technical support, and to communicate with program authors almost instantaneously.

Structure and Sequence Information Resources

To visualize or create a molecular model, the modeler must supply the application with data that describes the atomic features of the molecule. Most structure data is derived from X-ray analysis of crystallized substances or from nuclear magnetic resonance (NMR) analysis of molecules in solution, and is written into what are commonly called coordinate files. These contain textual information (such as the molecule name, author, experimental technique, and resolution) along with the spatial location (Cartesian coordinates) of each component atom. Additional information usually describes the characteristics of the coordinate data (e.g., B-factors, which indicate the degree of mobility of each atom). All published coordinate files are archived at the Protein Data Bank (PDB) of the Brookhaven National Laboratory (BNL), and are referred to as PDB files. The BNL PDB archive is being taken over by the Research Collaboratory for Structural Bioinformatics (RCSB), a non-profit consortium comprising the Nucleic Acid Database project at Rutgers University, the San Diego Supercomputer Center at the University of California, San Diego, and the National Institute of Standards & Technology. Researchers can access structures in the PDB using BNL's 3DB Browser, the Entrez Biomolecule 3D Structure Search tool maintained by the National Center for Biotechnology Information (NCBI), and soon new tools from the RCSB.
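To make the idea of a coordinate file concrete, the following Python sketch reads the atom records from PDB-format text. It assumes the fixed-column layout used by ATOM and HETATM records in the PDB format; the names `Atom` and `parse_atoms` are hypothetical, and the two sample records contain made-up values rather than data from a real PDB entry.

```python
from dataclasses import dataclass


@dataclass
class Atom:
    serial: int      # atom serial number
    name: str        # atom name, e.g. "CA" for an alpha carbon
    res_name: str    # residue name, e.g. "MET"
    chain_id: str    # chain identifier
    res_seq: int     # residue sequence number
    x: float         # Cartesian coordinates in angstroms
    y: float
    z: float
    b_factor: float  # temperature factor (B-factor)


def parse_atoms(pdb_text):
    """Return an Atom for every ATOM/HETATM record in a PDB-format string."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")):
            continue  # skip header, remark, and other non-coordinate records
        atoms.append(Atom(
            serial=int(line[6:11]),        # columns 7-11
            name=line[12:16].strip(),      # columns 13-16
            res_name=line[17:20].strip(),  # columns 18-20
            chain_id=line[21],             # column 22
            res_seq=int(line[22:26]),      # columns 23-26
            x=float(line[30:38]),          # columns 31-38
            y=float(line[38:46]),          # columns 39-46
            z=float(line[46:54]),          # columns 47-54
            b_factor=float(line[60:66]),   # columns 61-66
        ))
    return atoms


# Illustrative records only; the values are made up for this example.
SAMPLE = "\n".join([
    "REMARK   illustrative coordinates, not a real PDB entry",
    "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67",
    "ATOM      2  CA  MET A   1      26.266  25.413   2.842  1.00 10.38",
])

atoms = parse_atoms(SAMPLE)
for atom in atoms:
    print(atom.name, (atom.x, atom.y, atom.z), atom.b_factor)
```

Slicing by column rather than splitting on whitespace matters here, because adjacent fields in a PDB record can run together without a separating space.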

Although the rate at which protein structures are being solved has risen dramatically over the years, it has not kept pace with the rate at which new proteins are discovered, or with biological researchers' expanding need for structure information about these new proteins. This information is particularly important in the pharmaceutical industry, where drug development programs almost always depend on modeling the interactions of proteins with small-molecule ligands (potential drugs). One way to address this shortfall is to create a theoretical model of the protein's structure, for example by superimposing the primary sequence and predicted secondary structure of the protein of interest on the known structure of a closely related member of the same protein family, an approach known as homology modeling. The homology modeling for beginners online course at the European Molecular Biology Laboratory (EMBL) Web site provides a useful introduction to this approach. Most homology modeling projects require high-end applications and hardware, but the data and preliminary analyses usually originate from Internet resources.
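The superposition step described above can be illustrated with a deliberately simplified Python sketch. Given a pairwise alignment between the protein of interest (the target) and a relative of known structure (the template), it copies the template's alpha-carbon coordinates onto the aligned target residues and reports the percent identity over aligned positions. The function name, the toy sequences, and the evenly spaced coordinates are all assumptions made for illustration; real homology modeling applications go much further, building loops and side chains and refining the result.

```python
def map_target_to_template(target_aln, template_aln, template_ca):
    """Copy template C-alpha coordinates onto aligned target residues.

    target_aln, template_aln: gapped sequences of equal length ('-' is a gap).
    template_ca: one (x, y, z) tuple per template residue, in sequence order.
    Returns (model, identity): model maps 0-based target residue index to the
    coordinates of the aligned template residue; identity is the fraction of
    aligned (non-gap) positions where the two residues are the same.
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")

    model = {}
    target_pos = 0    # index into the ungapped target sequence
    template_pos = 0  # index into the ungapped template sequence
    aligned = 0
    matches = 0

    for t, s in zip(target_aln, template_aln):
        if t != "-" and s != "-":
            model[target_pos] = template_ca[template_pos]
            aligned += 1
            if t == s:
                matches += 1
        if t != "-":
            target_pos += 1
        if s != "-":
            template_pos += 1

    identity = matches / aligned if aligned else 0.0
    return model, identity


# Toy input: six template residues spaced 3.8 angstroms apart along x.
template_ca = [(3.8 * i, 0.0, 0.0) for i in range(6)]
model, identity = map_target_to_template("MKT-AYI", "MRTGA-I", template_ca)

unmodeled = [i for i in range(6) if i not in model]
print(f"identity over aligned positions: {identity:.0%}")
print("target residues with no template coordinates:", unmodeled)
```

Target residues that fall opposite a gap in the template receive no coordinates in this sketch; in practice those regions must be built by other means.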

Primary sequence alignments may be performed using the CLUSTALW, MSA, or ALIGN multiple sequence alignment utilities, and secondary structure predictions obtained using the PredictProtein Server, SOPM, or Predator computational tools. These Web-based tools will execute analyses on user-submitted data and on information extracted from a remote database. Most of them provide a transparent interface to the major protein sequence databases. The databases - SWISS-PROT, GenBank, PIR (Protein Information Resource), and MIPS (Munich Information Centre for Protein Sequences) - can also be accessed directly. Comprehensive listings of tools for predicting secondary structure and of sequence databases are available via the CMSMBR Structure Prediction & Modeling Services and Protein Sequence Databases & Search Engines Web pages, respectively. Other comprehensive databases and listings of tools include ABI and DBCAT.

Online Visualization

Simple visualization and interactive modeling can now be achieved directly using the Web. Most Web browsers today (including Microsoft’s Internet Explorer and Netscape 4.0) are Java-compliant, and appropriate plug-ins - helper applications that work in concert with the browser, expanding its functionality - are readily available. MDL's Chemscape Chime, NCBI's Cn3D tool, and Roger Sayle's RasMol are three excellent molecular visualization plug-ins that provide basic interactive model visualization and will work on practically any computer. Java-based applications, such as PDB3D, Java Molecular Viewer, WebMol, and QuickPDB, do not require plug-ins, yet provide added features for interactive visualization and manipulation of protein structures. Some viewers, among them QuickPDB, also display the corresponding primary sequences, allowing simultaneous examination (and highlighting) of sequence and structure elements. Although these applications are not as sophisticated as their stand-alone counterparts, their simplicity and ease of use make them excellent assets for occasional and experienced modelers.

Virtual Reality Modeling Language (VRML) literally and figuratively adds a new dimension to structure visualization. With most three-dimensional visualizations, the viewer is always a bystander looking at the molecule from outside. With VRML, the viewer can visually step into the molecule, as if becoming a part of it. What 20 years ago was science fiction, and 10 years ago took months and was only available from Hollywood, is available on our desktop computers today. VRML plug-ins are available for a number of operating systems and Web browsers. The VRML Repository at SDSC is an excellent Web resource for these plug-ins, along with additional VRML information. The Molecules-R-US Web site currently offers VRML-based structure visualization as part of a panel of viewing choices created in response to users' search requests. Potential VRML modelers can also access the Virtual Reality Modeling Language in Chemistry site, which focuses specifically on VRML structure visualization.
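As a rough illustration of how a structure becomes a VRML scene, the Python sketch below writes each atom as a colored sphere in VRML 2.0 syntax. It is a minimal example under stated assumptions, not the method used by any of the sites mentioned above: the function name, the color table, and the uniform sphere radius are hypothetical choices, and the sample coordinates are made up.

```python
# Hypothetical element-to-color table (red, green, blue values from 0 to 1).
ELEMENT_COLORS = {
    "C": (0.5, 0.5, 0.5),
    "N": (0.0, 0.0, 1.0),
    "O": (1.0, 0.0, 0.0),
    "S": (1.0, 1.0, 0.0),
}
DEFAULT_COLOR = (1.0, 0.0, 1.0)  # used for any element not listed above


def atoms_to_vrml(atoms, radius=0.5):
    """Return VRML 2.0 text with one sphere per (element, x, y, z) tuple."""
    parts = ["#VRML V2.0 utf8"]  # required first line of a VRML 2.0 file
    for element, x, y, z in atoms:
        r, g, b = ELEMENT_COLORS.get(element, DEFAULT_COLOR)
        parts.append(
            f"Transform {{ translation {x:.3f} {y:.3f} {z:.3f} children [ "
            f"Shape {{ appearance Appearance {{ material Material "
            f"{{ diffuseColor {r} {g} {b} }} }} "
            f"geometry Sphere {{ radius {radius} }} }} ] }}"
        )
    return "\n".join(parts) + "\n"


# Made-up coordinates for two bonded atoms.
vrml_text = atoms_to_vrml([
    ("N", 27.340, 24.430, 2.614),
    ("C", 26.266, 25.413, 2.842),
])
print(vrml_text)
```

Saving the returned text to a file with a .wrl extension should let a VRML viewer load it; bonds, surfaces, and viewpoints are left out to keep the sketch short.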

Modeling Inroads to Scientific Discovery

We live in the age of information, where the possibilities for scientific discovery are endless. Ideas, thoughts, and hypotheses are less restricted than ever by a lack of information, of access to that information, or of the technology for analyzing it. Web-based tools and resources bypass the need for expensive applications and hardware or advanced computer literacy, and bring modeling capability to all researchers. By reducing the technological impediments to the use of an invaluable analytical tool, they enable researchers to focus on biological problems instead of computational hurdles. Structural bioinformatics, the study of macromolecular structure from atomic movements to macromolecular interactions, has advanced rapidly in the past few years. Progressive scientists may now partake in the fruits of the computational biologists' labors - molecular structure tools for scientific discovery.

Christopher M. Smith is the coordinator for the Protein Kinase Resource/Database project and curator of the CMS Molecular Biology Resource at the San Diego Supercomputer Center.


Endlinks

Molecular Modeling Centers and Research Groups

Modeling Software Information

Macromolecular Structure Data Resources

Small-Molecule Data and Information Resources

Instructional Resources, Guides, Tutorials

Basic Modeling Visualization Applications

Image Resources

