BEAGLE REVIEW
Analysis of Protein Sequences with PepTool

BioTools Inc.

Reviewed by Stuart M. Brown

(Posted February 20, 1998, Issue 25)


Test Platform

Power Computing PowerCenter 150 (150 MHz Macintosh-compatible, PowerPC 604 processor) running System 7.6.1 with 64 MB of RAM

Overview

PepTool is a new program for the analysis of protein sequences developed by BioTools Inc. of Edmonton, Alberta, Canada, and distributed by National Biosciences, Inc. of Plymouth, Minnesota. The program offers an extremely large collection of tools for analysis of protein sequences, including database similarity searching, 3-D structure prediction, pattern/motif searching, and multiple sequence alignment. Many of the classic bioinformatics algorithms have been reworked for improved speed. The program offers a simple, menu-driven interface and an excellent multiple sequence editor, as well as graphical output for many analyses that can be annotated with built-in text and drawing tools.

Program Features


Figure 1
PepTool is more than just a suite of programs for manipulating protein sequences. In addition to the standard set of protein tools including hydrophobic moment, flexibility, helical wheel, dot plot, and pattern search, the program also performs powerful similarity searches, multiple alignments, 3-D structure prediction, and motif searching. PepTool is based on the SeqSee program created as a Unix-based research tool at the University of Alberta by Dr. David Wishart and others. One unique result of this ancestry is that many of PepTool's common bioinformatics algorithms have been modified and improved in a variety of ways. The interface has the feel of a Java applet, with each window having its own set of pull-down menus (see figures 1 and 2). This was a bit unfamiliar on a Macintosh, but was not difficult to get used to. Since the interface is identical on all platforms, users can easily move between computers to collaborate with colleagues. A program such as this is ready for future innovations in network computer technology.


Figure 2
PepTool uses its own similarity searching algorithm called FAST, which is derived from Dr. William Pearson's FASTA program. [1] This review was based on PepTool's demo version, which is fully functional but contains only a 20% subset of the full protein database. On a 150 MHz Power Macintosh computer, a FAST search of this subset took less than one minute, so, assuming search time grows roughly in proportion to database size, a search of the full database (five times larger) should still take well under 5 minutes. One problem with the FAST search algorithm is that it lacks a rigorous statistical measurement for the significance of the matches that it finds. It does produce a similarity score, but this is not directly comparable to the E-values produced by FASTA or BLAST. PepTool also has a version of the Needleman-Wunsch dynamic programming algorithm for rigorous similarity searching, which took about 12 minutes for the same search of the same subset (approximately 1 hour for the full database, by the same scaling) and found essentially the same set of similar sequences. It should be noted that the Needleman-Wunsch method looks for global similarities between sequences, rather than the local similarities found by the popular Smith-Waterman method. Global alignment algorithms are often not as effective as local alignment for highly diverged sequences, and they do not reflect the biological reality that two sequences may share only limited regions of conserved sequence.
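The difference between global and local alignment can be made concrete with a small sketch. The Python function below is not PepTool's code; it is a minimal dynamic programming scorer, with an illustrative scoring scheme (match +1, mismatch -1, linear gap penalty -2, all of them assumptions), that computes either a Needleman-Wunsch (global) or a Smith-Waterman (local) score.

```python
def align_score(a, b, local=False, match=1, mismatch=-1, gap=-2):
    """Best alignment score of sequences a and b by dynamic programming.

    local=False gives a Needleman-Wunsch (global) score,
    local=True gives a Smith-Waterman (local) score.
    The scoring values are illustrative assumptions, not PepTool's.
    """
    cols = len(b) + 1
    # First row: gap penalties accumulate for global, stay at 0 for local.
    prev = [0 if local else j * gap for j in range(cols)]
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0 if local else i * gap] + [0] * (cols - 1)
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(diag, prev[j] + gap, curr[j - 1] + gap)
            if local:
                score = max(score, 0)      # a local alignment may restart anywhere
                best = max(best, score)    # and may end anywhere
            curr[j] = score
        prev = curr
    return best if local else prev[-1]


# A short conserved region embedded in unrelated flanking sequence.
long_seq, short_seq = "GGGGACDEFGGGG", "ACDEF"
print(align_score(long_seq, short_seq))              # global score
print(align_score(long_seq, short_seq, local=True))  # local score
```

With these settings the global score is -11, because the eight unmatched flanking residues must all be paid for as gaps, while the local score is 5, the length of the shared region. This is the sense in which a global method can understate similarity that is confined to one region of a protein.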

PepTool comes with its own compressed, nonredundant protein database derived from PIR and Swiss-Prot. The database occupies only about 52 MB of hard drive space, as compared to 80 MB for the OWL nonredundant protein database. The PepTool database search makes use of an additional simplification: all of the sequences in the database have been grouped into families based on homology, and a single sequence has been designated to represent each family. FAST similarity searches compare the query sequence only to these representative sequences. If a significant match is found, additional comparisons are made to the other sequences in that family. While this is a clever method of speeding up the similarity search process, it is not clear that it is sufficiently rigorous for routine scientific use.
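The two-stage search described above can be sketched in a few lines of Python. Everything here is hypothetical: the `families` layout, the `threshold` value, and the crude `shared_kmers` score stand in for whatever PepTool actually uses, which the program's documentation would have to confirm.

```python
def shared_kmers(a, b, k=3):
    """Crude similarity score: number of distinct k-mers two sequences share."""
    def kmers(s):
        return {s[i:i + k] for i in range(len(s) - k + 1)}
    return len(kmers(a) & kmers(b))


def two_stage_search(query, families, threshold=2, score=shared_kmers):
    """Search a database organized as {representative: [other family members]}.

    Stage 1 scores the query against each family representative only.
    Stage 2 scores the remaining members, but only for families whose
    representative reached the threshold.
    """
    hits = []
    for representative, members in families.items():
        rep_score = score(query, representative)
        if rep_score < threshold:
            continue  # whole family skipped: the speed-up, and the risk
        hits.append((rep_score, representative))
        hits.extend((score(query, member), member) for member in members)
    return sorted(hits, reverse=True)


families = {
    "ACDEFGHIK": ["ACDEFGHLK", "MCDEFGHIK"],
    "WWWWYYYYPP": ["WWWWYYYYPA"],
}
for hit_score, sequence in two_stage_search("ACDEFGH", families):
    print(hit_score, sequence)
```

The sketch also shows where the concern about rigor comes from: a query that resembles a family member but not that family's representative is never compared to the member at all.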

BioTools intends to update this database quarterly as a free download from their Web site or for a small fee by CD-ROM. Although a quarterly update may not be frequent enough for some users, PepTool can also use protein databases in PIR or Swiss-Prot format. Users can download and install their own copies of the full databases if they are uncomfortable with BioTools' nonredundant one, or if they need to search for recently released sequences in a freshly updated database. Unfortunately, it is quite a challenge to import a database successfully, and it will take 5 to 20 times longer to search PIR or Swiss-Prot than the custom BioTools database.

BioTools has also created its own protein motif database, which is similar to the Prosite database, and its own motif pattern searching tool. The PepTool motif searching tool is not able to use custom motif databases supplied by the user, but it can search protein databases with single motifs typed in by the user.
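For readers unfamiliar with motif patterns, the sketch below shows how a pattern written in Prosite's own syntax can be matched against a sequence in Python. The syntax PepTool accepts for typed-in motifs is not described here, so treating it as Prosite-style is an assumption, and the helper names `prosite_to_regex` and `find_motif` are invented for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Translate a Prosite-style pattern, e.g. 'C-x(2,4)-C', into a regex string."""
    regex = pattern.rstrip(".")                          # optional trailing period
    regex = regex.replace("-", "")                       # element separators
    regex = regex.replace("{", "[^").replace("}", "]")   # {P} means "anything but P"
    regex = regex.replace("(", "{").replace(")", "}")    # x(2,4) means a repeat count
    regex = regex.replace("x", ".")                      # x means any residue
    regex = regex.replace("<", "^").replace(">", "$")    # N- and C-terminal anchors
    return regex


def find_motif(pattern, sequence):
    """Return (1-based position, matched text) for every hit, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [(m.start() + 1, m.group(1)) for m in regex.finditer(sequence)]


# N-glycosylation site pattern from Prosite (PS00001).
print(prosite_to_regex("N-{P}-[ST]-{P}"))
print(find_motif("N-{P}-[ST]-{P}", "AANGSAANPSA"))
```

In the example sequence only the first asparagine qualifies, because the second is followed by a proline, which the pattern excludes.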

Graphics


Figure 3

Figure 4

Figure 5
PepTool produces good graphics for many of its standard protein analysis functions such as hydrophobicity (see figure 3) and helical wheel (see figure 4). Once the graphic is generated by the program, drawing tools are provided to modify or annotate the image. There is also a catchall analysis routine called Protein Statistics that provides bar graphs of the amino acid frequency (actual and expected) and amino acid weight as well as a graph of pH/charge distribution over the length of a protein (see figure 5). A bewildering array of other protein statistics is also calculated, ranging from the "ratio of % hydrophilic to % hydrophobic amino acids" to the "nonpolar accessible surface area of folded proteins" in square angstroms. Comments on each statistic provide guidance as to its interpretation.
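A hydrophobicity plot of the kind shown in figure 3 is typically a sliding-window average of per-residue values. The sketch below uses the published Kyte-Doolittle hydropathy scale and a 9-residue window; which scale and window PepTool uses is not stated in this review, so both are assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=9):
    """Mean hydropathy of each full window along the sequence.

    Returns one value per window position, so the result has
    len(sequence) - window + 1 entries (none if the sequence is too short).
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# A hydrophobic stretch flanked by charged residues.
profile = hydropathy_profile("KKDEELLIVVALLIVKKDEE")
print([round(value, 2) for value in profile])
```

Plotting `profile` against window position gives the familiar curve in which sustained high values suggest buried or membrane-spanning segments.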

Multiple Sequence Alignments


Figure 6
PepTool also has tools for both creating and editing multiple alignments. The alignment algorithm is a progressive pairwise method called XALIGN [2], which is very similar to the popular PILEUP program in GCG (Wisconsin Package for sequence analysis, Genetics Computer Group). It was able to align 50 sequences of about 100 amino acids in length in just a few seconds. The alignment editor is one of the most impressive parts of the program. It is attractive, intuitive, and provides many useful functions including highlighting of columns of identical amino acids (with an adjustable cutoff for percent identity), display of a consensus sequence, and coloring of amino acid residues by their chemical properties (see figure 6). Gaps can be inserted anywhere in the aligned sequences, entire blocks of sequences can be shifted, and individual sequences can be added to existing alignments.
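The idea behind a consensus line with an adjustable percent-identity cutoff can be sketched as follows. This is an illustration only: the function name, the use of "." for columns below the cutoff, and the choice to count gaps against identity are assumptions, not a description of how the PepTool editor computes its display.

```python
from collections import Counter


def consensus(alignment, cutoff=0.8, gap="-"):
    """Consensus of equal-length aligned sequences.

    A column contributes its most common residue when that residue occurs
    in at least `cutoff` (a fraction) of all rows; otherwise it contributes
    '.'. Columns made up entirely of gaps stay gaps.
    """
    result = []
    for column in zip(*alignment):
        residues = [r for r in column if r != gap]
        if not residues:
            result.append(gap)
            continue
        residue, count = Counter(residues).most_common(1)[0]
        result.append(residue if count / len(column) >= cutoff else ".")
    return "".join(result)


alignment = ["ACDE-", "ACDQF", "ACNEF", "AC-EF"]
print(consensus(alignment, cutoff=0.75))
print(consensus(alignment, cutoff=1.0))
```

At a 75% cutoff the example yields `AC.EF`, and at 100% only the two fully identical columns survive, giving `AC...`. Raising or lowering the cutoff is what changes which columns an editor would highlight.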

Protein Structure Tools


Figure 7
PepTool makes greater use of protein structure as a form of sequence annotation than any other bioinformatics program. It has an extensive set of protein structure tools that can predict alpha helices and beta sheets based on the Chou-Fasman and Garnier algorithms. A consensus structure is displayed that takes into account information from these methods (see figure 7) in addition to information from prediction of hydrophobic moment and comparisons of a test sequence against databases compiled from published information and structural motifs derived from aligned sequences. Probable membrane-spanning regions are also identified. The predicted structure can then be loaded back into the sequence editor or the multiple alignment editor for comparison with other sequences. Once back in the sequence editor, the user is free to modify the predicted structural information. The ability to view aligned sequences that have been annotated (in color) with structural predictions can provide valuable insights into homologies.
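How PepTool weighs its different predictors when building the consensus structure is not documented in this review. As a purely illustrative stand-in, the sketch below combines several per-residue predictions by simple majority vote, using the common three-state notation (H for helix, E for strand, C for coil); the function name and the tie-breaking rule are assumptions.

```python
from collections import Counter


def consensus_structure(*predictions, tie="C"):
    """Per-residue majority vote over equal-length prediction strings.

    Each prediction uses H (helix), E (strand), or C (coil). When the two
    most common states at a position are tied, the `tie` state is used.
    """
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("predictions must be non-empty and of equal length")
    result = []
    for states in zip(*predictions):
        (top, top_count), *rest = Counter(states).most_common(2)
        tied = bool(rest) and rest[0][1] == top_count
        result.append(tie if tied else top)
    return "".join(result)


# Three hypothetical predictions for the same 8-residue peptide.
print(consensus_structure("HHHHCCEE", "HHHCCCEE", "HHCCCEEE"))
```

For the three hypothetical predictions shown, the vote gives `HHHCCCEE`. A real consensus method would likely weight each predictor by its reliability rather than counting them equally.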

Efficient Use of Resources

PepTool also includes a novel feature called network parallelism, which allows users on a local network (e.g., a university or corporate Ethernet) to make use of the "unused CPU capacity" of other computers on the network. This feature is available only for the Unix and Windows 95 versions of the program. Since most of the computers on a network are not being used at full capacity at any given moment, there is tremendous potential to accelerate time-consuming tasks such as exhaustive (dynamic programming) similarity searches. This feature was not evaluated for this review.

The Bottom Line

The PepTool program is fast, very easy to use, and provides many analysis tools. It has the most comprehensive set of protein analysis functions of any desktop computer program and an interface that is much easier to use than any mainframe program. According to Scot Fortin, BioTools' manager of software development, "PepTool is part of our strategy to bridge the gap between in vitro laboratory molecular biology experiments and 'in silico' experiments in the computer." The cross-platform and network-savvy features make it an attractive option for institutional purchase so that users with different types of computers can access the same tools and easily share data.

While working with the program, this reviewer was alternately amazed and skeptical. BioTools has very much chosen to go their own way, writing their own versions of all of the major bioinformatics algorithms and creating their own protein sequence and motif databases. This produces a lean, fast program with excellent internal consistency and cross-platform compatibility, but it leaves the user a bit uncertain. Will the results achieved with a PepTool analysis stand up to scrutiny by other researchers who use "standard" tools such as GCG, FASTA, and CLUSTAL? The intuitive graphical interface and short learning curve certainly make PepTool an attractive entry point for researchers new to protein analysis. The multiple alignment editor is among the best available for any computer system, with a better interface than most mainframe programs offer. The jury is still out on whether experienced bioinformatics researchers will find that this program meets their analysis needs.

System Requirements

PepTool is available for Macintosh (PowerPC only), Windows 95, Windows NT, Sun Solaris, and SGI IRIX. The manufacturers claim "universal platform compatibility," which allows upgrades to be released simultaneously for all computing platforms and lets users run the same program on different computer hardware. On a Power Macintosh, PepTool used over 19 MB of RAM, and since there were often more than six windows open at one time, the work felt cramped on a 17-inch monitor. Performance was quite good with a 150 MHz processor, but PepTool is not recommended on a Power Macintosh running at less than 80 MHz or with less than 24 MB of RAM. For Windows 95/NT machines, a Pentium processor running at 166 MHz or greater is recommended.

Purchasing Information

PepTool is available for $1,490 ($1,341 for educational users). Discounts are available for multiple copies of the program or for joint purchases with GeneTool (the soon-to-be-released DNA analysis companion to PepTool) or with Oligo (the PCR primer design program). BioTools has made PepTool available from National Biosciences, Inc. (NBI), 3650 Annapolis Lane North, #140, Plymouth, MN 55447-5434. NBI can be contacted by phone at (800) 747-4362, by fax at (800) 369-5118, or by email. More information can be obtained from the NBI Web site or the BioTools Web site. BioTools may be reached at 420 Sun Life Place, 10123 99th Street, Edmonton, Alberta, Canada, T5J 3H1, by phone or fax at (403) 423-1133, or by email.

Stuart M. Brown is a research assistant professor at NYU Medical Center, where he is the bioinformatics consultant to the Research Computing Resource.
