Web Resources for Model Organisms

by Pamela M. Gannon


(Posted July 24, 1998 · Issue 35)


For over a hundred years, research scientists have studied model organisms to examine the mechanisms of inheritance and development. More recently, researchers have analyzed the molecular biology and biochemistry of model organisms to provide insight into gene function. The explosion in sequence data has revealed that many genes in lower eukaryotes and bacteria are homologous to human genes, suggesting that the gene products may share common functions. A wide variety of Internet resources are available to aid in the study of model organisms and to interface with the available genomic information.

E. coli

The simplest model organism is the bacterium Escherichia coli. Work with E. coli and its bacteriophages helped establish DNA as the genetic material in the early 1950s, and later studies in E. coli worked out the chemical details of replication and transcription. The genome of the widely used E. coli lab strain K12 was completely sequenced in September 1997.

Online information about E. coli is scattered across many different Web sites, including several that provide overlapping information. A good starting point is the E. coli Index, assembled by Gavin Thomas at the University of Birmingham, United Kingdom. The index presents an extensive, actively maintained collection of links to, among other things, genome databases, E. coli researchers, journals, protocols, and professional societies. The home page also carries reasonably current news about E. coli.

Two groups of researchers have performed systematic sequencing of the E. coli genome. The E. coli Genome Project at the University of Wisconsin at Madison provides access to the full updated sequence of the E. coli genome with annotations. Users can browse or download sequences of the 4,405 identified open reading frames (ORFs) or of the entire genome sequence. The Escherichia coli WWW Home Page at the Nara Institute of Science and Technology includes the sequences determined by the Japanese E. coli genome project team and other teams worldwide. Users can search for and retrieve information on particular genes, ORFs, and sequences and construct a genomic view of the data. Users can also download the genomic sequence and generate ORF maps.

The E. coli database collection (ECDC), at Justus Liebig University in Giessen, Germany, supplies some functions complementary to those of the genome center sites. Although ECDC does not appear to have been updated since March 1997, it provides a good searchable index with E. coli genetic maps and a collection of data tables that reference genes, promoters, and tRNAs. Sequence homology searches using the BLAST and FASTA search engines are available directly from the site. GenProtEC, an E. coli genome and proteome database at the Marine Biological Laboratory (MBL) in Woods Hole, Massachusetts, lists E. coli genes and gene products and is a good place to browse for genes based on physiological role, gene product, and gene type; it also provides references. The E. coli Genetic Stock Center at Yale University allows you to search for and directly order strains used in publications. The information is nicely linked to other information in the database and to publications, but the interface is not very easy to use.

EcoCyc: Encyclopedia of E. coli Genes and Metabolism uses an inventive graphical interface to provide an overview of E. coli metabolism. Users can click on gene names, components of metabolic pathways, and individual reactions within pathways to acquire detailed information about genes and gene functions. EcoCyc incorporates recent sequence data and references to the literature. Access is free to academic and government institutions; commercial users may purchase a subscription.

Yeast

Saccharomyces cerevisiae, commonly known as baker's or brewer's yeast, has been used by humans since antiquity. Through the efforts of over 100 laboratories worldwide, the complete genome sequence of the S. cerevisiae lab strain S288C was obtained in the spring of 1996, making yeast the first eukaryotic organism to be completely sequenced.

Three major Web sites with extensive databases provide outstanding coverage of online yeast resources. The cornerstone of online information on S. cerevisiae is the Saccharomyces Genome Database (SGD) at Stanford University. Extremely well designed and meticulously maintained, SGD provides genetic information, sequence analysis tools, structural data, listings of mammalian homologies, and yeast nomenclature information. SGD also includes yeast community information and news, and links to other yeast WWW resources. "Hot tips" on the home page show users convenient new ways to use the database.

The excellent Yeast Protein Database at Proteome indexes all yeast proteins whose sequences are known. Protein sequence information is updated daily, and users can use a short or long form to search by gene names, keywords, and protein properties. Detailed protein reports display all the information known about each protein, including accession numbers, synonyms, molecular weight, and modifications. The reports also present information on subcellular localization, molecular function, protein interactions, and purification when available. Users can quickly find related genes and view alignments with protein sequences from other model organisms. For references, direct links to PubMed are provided. Weekly reports summarize new or updated protein reports.

The Yeast Genome Project at the Munich Information Centre for Protein Sequences (MIPS) provides utilities to search the yeast chromosomal and mitochondrial genomes for protein sequences. The site has a particularly good collection of tables that itemize essential and nonessential genes, protein interactions, and transmembrane domains. A number of catalogues provide information on protein function, phenotypes, and physical and genetic pathways in S. cerevisiae.

The World Wide Web Virtual Library: Biosciences: Yeast section provides a comprehensive collection of links to resources for all three of the commonly used yeast model organisms, S. cerevisiae, Schizosaccharomyces pombe (fission yeast), and Candida albicans. The collection includes resources for sequence analysis projects and laboratory protocols. The newsgroup bionet.molbio.yeast, accessible from the site, has been active for many years and discusses current issues concerning all types of yeast.

C. elegans

Caenorhabditis elegans is a small hermaphroditic nematode that was developed as a model organism in the 1960s by Sydney Brenner and colleagues. It is distinguished by the fact that the lineage of every one of its approximately 1,000 cells can be traced. C. elegans is used by researchers primarily to study the genetics of development and neurobiology. The database software that was developed for analysis of the C. elegans genome is used by many other genome projects.

Online resources for C. elegans are few, and those that exist could use further development to take full advantage of Web technology. The most extensive resource is the Caenorhabditis elegans WWW Server, which is actively maintained by the University of Texas Southwestern Medical Center. The site includes announcements of interest to the research community, a list of lab home pages, and a searchable gopher index of C. elegans researchers.

The C. elegans Genome Project, maintained by the Sanger Centre at Cambridge and the Genome Sequencing Center at Washington University School of Medicine in St. Louis, provides access to completed C. elegans sequences and preliminary sequence data. A BLAST server allows users to search the sequence data. Users can also search the current database of C. elegans expressed sequence tags (ESTs) and the C. elegans protein database, WormPep. The site includes links and access to the 1995 and 1997 International Worm Meeting abstracts.

The ACeDB (A C. elegans Data Base) site, provided by the Genome Informatics Group at the National Agricultural Library, offers alternative access to C. elegans genomic information through an online version of ACeDB. The browsable and searchable site integrates a wide range of information, including DNA sequences, expression patterns, and cell groups. The resource includes lists of authors and publications. The complex resource seems to be designed for experts in ACeDB software and language; however, some online help sections are provided.

The Caenorhabditis Genetics Center at the University of Minnesota maintains stocks of over 3,000 C. elegans strains. Users can browse and search the gopher index and order strains via email. The site also maintains an updated C. elegans bibliography. To find additional information or view current C. elegans news, try the moderated newsgroup bionet.celegans.

Drosophila

The fruit fly Drosophila melanogaster, because it lends itself so easily to classic breeding experiments, has been used as a genetic system since early in the 20th century. In the 1980s researchers began characterizing the genes that corresponded to mutant phenotypes and discovered homeobox genes, which are involved in developmental patterning and have since been identified in other species, including vertebrates. Only 12% of the Drosophila genome has been systematically sequenced; however, Venter et al. plan to test their shotgun sequencing approach on Drosophila [1].

Several excellent Drosophila Web sites, many cross-referenced with one another, provide information on sequence data, images, and other Web resources. FlyBase at Indiana University provides comprehensive information on the genetics and molecular biology of Drosophila. FlyBase incorporates results from both the Berkeley and European Drosophila genome projects. The searchable site supplies genome maps, a browsable image library, lists of stock strains (with an associated ordering mechanism), and contact information for Drosophila researchers. The reference database includes classic texts dating back to the 1920s. For current Drosophila news, FlyBase archives personal communications sent to the site and provides direct access to the newsgroup bionet.drosophila.

The Berkeley Drosophila Genome Project (BDGP) provides a variety of tools to access and view Drosophila genomic information. The site has BLAST search and sequence pattern searches, access to software developed by the project, a well-designed query form for the Berkeley Fly Database, laboratory methods, information on obtaining materials, and well-thought-out informational FAQ sections. The new BioView Java tool displays seven different graphical representations of sequence data. The Berkeley genome project is also compiling a library of expressed sequence tags (ESTs) by expression pattern.

The Interactive Fly, developed by Thomas Brody and hosted by the Society for Developmental Biology, is an excellent general informational resource for Drosophila designed to showcase Drosophila genes and their roles in development. The site contains an index of genes categorized by name and function; entries for individual genes present detailed descriptions of the gene and gene product, including the effects of mutations, evolutionary homologues, and protein interactions. Each entry also provides the genetic map position and FlyBase accession number, with references linked to PubMed. The Interactive Fly displays an introduction to the stages of fly development, with images, and also includes a guide to evolutionarily conserved developmental pathways and a well-chosen collection of links.

Two German Web sites provide illuminating and useful Drosophila image collections. FlyView, at the University of Münster, concentrates on expression patterns during development, and cross-references FlyBase and the BDGP site. Flybrain, mirrored in Freiburg, Tokyo, and Tucson, presents an extensive image atlas and database of the Drosophila brain and nervous system. Links to additional Drosophila resources can be found at the Drosophila Virtual Library, which includes links to Drosophila labs on the Web and protocol information.

Mouse

Although the mouse, Mus musculus, and humans are separated by some 75 million years of evolution, their DNA is remarkably similar, and mice are widely used to study the biology of genetic diseases. Some mutant mice have forms of human diseases, such as diabetes, and can be used as model organisms for studying those diseases. In the early 1980s researchers began to produce transgenic animals by inserting human genes into fertilized mouse eggs. Recently, the technique of homologous recombination has been used to target human genes into the mouse genome. These approaches have greatly increased the number of human diseases that can be modeled in mice.

The most extensive online resource for mouse researchers is the outstanding Mouse Genome Informatics (MGI) site at the Jackson Laboratory, a major developer and vendor of mouse strains. The resource offers a wide range of information on the biology and genetics of the laboratory mouse. The Mouse Genome Database includes information on mouse genes and phenotypes, mammalian homologies, mapping data, and molecular probes. The resource incorporates data from many research groups and provides references for all data. The first module of the Gene Expression Database has just become available. The site also provides searchable databases of mouse strains and mouse mutants.

For additional genomic information, the Genetic and Physical Maps of the Mouse Genome site, created at the Whitehead Institute/Massachusetts Institute of Technology Center for Genome Research, provides a database that is searchable by genetic marker or by yeast artificial chromosome (YAC) within the YAC-borne sequence library. Alternatively, users can download genomic data directly. The Mouse Atlas and Gene Expression Database is being developed at the Medical Research Council Human Genetics Unit in Edinburgh. The visual atlas of mouse development will be integrated with gene expression data. Unfortunately, only sample pages from the atlas and anatomical sketches from the Standard Anatomical Nomenclature Database are currently available to all users.

Online databases are available to search for transgenic mouse strains. The Transgenic/Targeted Mutation Database (TBASE) at Johns Hopkins University School of Medicine organizes information on transgenic mice and targeted mutations in the mouse. TBASE includes a citation database, a glossary, and a useful collection of links. The search interface is easy to use, and the resource encourages researchers to submit published and unpublished data. BioMedNet's extensive Mouse Knockout and Mutation Database was originally created from data published in Current Biology. The searchable and browsable database has been revised and updated; the additional material includes gene insertion mutations and other mutations of known molecular nature. Access is by subscription only.

There are a large number of online mouse resources provided by individual researchers or research groups. For a comprehensive listing of mouse links for researchers, try the Mouse and Rat Research Home Page maintained by Eric Mercer at the California Institute of Technology. The collection is well organized and up to date and includes links to genome resources, suppliers, technical information, and conference announcements. An introductory section provides information on recent additions to the resource and personal commentary on the mouse genome projects.

Other Model Organisms

The systems described above are considered to be the major model organisms. However, many other organisms, such as Dictyostelium discoideum, Xenopus laevis, and more recently the zebrafish (Danio rerio), are used to study development. Research scientists also examine a large number of plant systems, the best-known being Arabidopsis thaliana. Specific Web sites and online resources are available for these and many other model systems.

Pam M. Gannon is the founder and Webmaster of Cell and Molecular Biology Online.


Endlinks

XREFdb - a database interfacing genome information from model organisms with the human genome. Cross-references homologous genes and proteins for humans and model organisms. Maintained by the National Center for Biotechnology Information.

Genomics: A Global Resource - a general resource for genomics information containing links to informational resources, specific topic areas, and legislative information. Sponsored by the Pharmaceutical Research and Manufacturers of America and the American Institute of Biological Sciences.

CMS-SDSC Molecular Biology Resource - a well-organized link collection with a good section on model organisms.

Nucleic Acids Research, Volume 26, Issue 1 - the Genome Database issue of Nucleic Acids Research includes summaries and descriptions of online resources available for all of the genome projects, with full text available to subscribers.

