SOFTWARE REVIEW

 

GeneSpring 3.1

Reviewed by Joseph P. Silva



Posted July 7, 2000 · Issue 82


Overall scores
Installation Excellent
Learning curve (beginner who can Web surf and word process) Above average
Technical support Excellent
Features Excellent
Customizability Very good
Utility to biologists Very good
Value for money Excellent

Overview

GeneSpring is a tool for viewing and analyzing gene-expression data. It has a rich feature set and takes some time to master, but it is well worth the investment for anyone trying to make sense of simultaneous expression profiles for thousands of genes. GeneSpring takes into account that an experiment may have multiple sets of measurements, with each set having one or more parameters (a name or a number) associated with it: for instance, patient number and treatment type, or time and tissue type. These associations are used in the various display schemes available.

The program has numerous options for displaying the data: physical position, gene classifications, gene/experiment trees, graphs, scatter plots, and overlays (arbitrary mapping of genes to any 2-D image). Each displayed gene can be colored by expression level and confidence value, by a Venn diagram (unique color for each of three gene lists, with unique coloring for each of the four possible intersections), or by parameter (such as time and tissue type). Data sets that are part of a continuous set of readings - for instance, a time series - can be animated.

Data-analysis features include profile clustering into sets (k-means) or into trees, profile searches (for an existing or a hypothetical profile), regulatory-sequence analysis, and a general binary search using absolute or relative expression-level criteria (e.g., level greater than x in data set A, and level in set B fivefold greater than in set C). Search results can be saved as a gene set, which can be used to limit the genes displayed or as input to further analysis.
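To make the search criteria above concrete, here is a minimal Python sketch of combining one absolute and one relative expression-level criterion. The gene names, levels, and the `matches` helper are invented for illustration, and "fivefold greater" is assumed to mean at least five times the level; this is not GeneSpring's own search code.

```python
# Hypothetical expression levels: gene -> {data set name: level}.
# All names and numbers here are made up for illustration.
levels = {
    "geneA": {"A": 120.0, "B": 500.0, "C": 80.0},
    "geneB": {"A": 30.0, "B": 900.0, "C": 100.0},
    "geneC": {"A": 200.0, "B": 150.0, "C": 100.0},
}


def matches(gene_levels, x=100.0, fold=5.0):
    """Return True when both criteria hold for one gene.

    Absolute criterion: level in data set A is greater than x.
    Relative criterion: level in set B is at least `fold` times that in set C.
    """
    absolute_ok = gene_levels["A"] > x
    relative_ok = gene_levels["B"] >= fold * gene_levels["C"]
    return absolute_ok and relative_ok


# The result plays the role of a saved gene set that later steps can reuse.
gene_list = sorted(gene for gene, lv in levels.items() if matches(lv))
print(gene_list)
```

The resulting list can then be used as a filter, in the same way a saved gene set limits what is displayed or analyzed.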

Available platforms

Macintosh, Windows, Unix

System requirements

All systems
64 MB RAM (128 MB strongly recommended)
15 MB free hard-disk space
1,024 x 768 resolution with 16-bit color
Java Virtual Machine software

Macintosh
Mac OS 7.6.1 or later; Power Macintosh or better

Windows
Windows 95/98/NT/2000, Pentium II or better

Test platform

Gateway GP7-500 500 MHz Pentium III running Windows NT 4.0, 6 GB hard disk, 256 MB RAM, 19-inch monitor, 1,024 x 768 resolution at 32 bits ("true color")

Price

Full version $20,000
Academic $5,000

How Long Did It Take to Learn to Use It Productively?

It took me about two hours to go through the tutorial, which was sufficient to be able to use the program once data was loaded. It took me an additional hour of reviewing the Data Loading manual to be able to load our Lynx MPSS data into the program; however, data from sources for which the program has already been configured may take less time. The Data Loading manual is very thorough, but loading data from a database is documented in a separate online manual that needs some work. Despite that, a little help from Silicon Genetics got me to the point where I could load data directly from our own custom database.

Product Quality

Ease of installation Excellent
User friendliness Very good
Interface Very good graphical interface, plus extensive use of text files
Intuitiveness of design Good

Customizability

See "Ability to Program in Scripts" below.

Ability to Program in Scripts, Add Extension Modules, etc.

No scripting is available. Extension modules are added via a clearly defined interface using text files to describe each extension. Data is passed as text.

Ability to Import and Export in Different File Formats

Imported files must be in one of several formats compatible with GeneSpring. The formats are clearly documented and are all based on simple text files. GeneSpring also supports direct database access via ODBC. This access requires a database description file that the user must create. Silicon Genetics can help with all of these import issues.

Useful or Unusual Features

Gene-expression experiments can have lots of data points. You can easily be looking at expression levels for thousands of genes over a set of one to ten distinct conditions. Indeed, for the MPSS data we have here at Lynx, we are getting expression data for tens of thousands of genes. There is no way one can consider these genes one at a time. You have to rely on graphs and statistical analysis, and separate the data into interesting subsets.

GeneSpring provides several useful ways to graph and analyze the data, the most obvious of which is to graph expression level versus experimental parameters. This graph shows the "profiles" for all of the genes. These profiles are the basis for all statistical analyses. You can draw a hypothetical profile, such as upregulated in one condition and downregulated in all the others, and then have the program search for genes with similar profiles. Alternatively, you could pick a gene of interest, such as one with a known function, and have the program find genes whose expression profiles are similar to it, working on the assumption that coregulated genes are somehow related biologically.
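The profile search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Pearson correlation as the similarity measure, a made-up threshold of 0.9, and invented gene names and profiles. Which metric and cutoff GeneSpring applies internally is not covered by this review.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (0.0 if either is flat)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


# Hypothetical profiles: one expression level per experimental condition.
profiles = {
    "gene1": [1.0, 5.0, 1.2, 0.9],
    "gene2": [2.0, 2.1, 1.9, 2.0],
    "gene3": [0.5, 3.0, 0.6, 0.4],
    "gene4": [3.0, 0.5, 2.8, 3.1],
}

# A drawn "hypothetical profile": up in the second condition, down elsewhere.
target = [0.0, 1.0, 0.0, 0.0]


def find_similar(profiles, target, min_r=0.9):
    """Return (gene, correlation) pairs at or above min_r, best match first."""
    scored = [(name, pearson(p, target)) for name, p in profiles.items()]
    hits = [(name, r) for name, r in scored if r >= min_r]
    return sorted(hits, key=lambda item: item[1], reverse=True)


for name, r in find_similar(profiles, target):
    print(name, round(r, 3))
```

Searching around a known gene instead of a drawn profile works the same way: pass that gene's own profile as `target`.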

Finally, there are two ways to have the program automatically "cluster" genes with similar expression profiles. The clustering can be either k-means (which creates a simple set of clusters) or hierarchical (a dendrogram analysis, that is, clusters of clusters). There are quite a few similarity metrics that can be used for searching and clustering: standard correlation, distance, smooth correlation, change correlation, upregulated correlation, Pearson correlation, Spearman correlation, Spearman confidence, and two-sided Spearman confidence. I must confess, taking a statistics class is on my to-do list, so I can offer no advice on the suitability of these metrics.
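For readers unfamiliar with k-means, the following Python sketch shows the basic idea on a handful of invented profiles. It assumes squared Euclidean distance and a deterministic start from the first k profiles; these are choices made for the example and are not taken from GeneSpring, whose clustering options are listed above but not documented in detail.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_profile(rows):
    """Column-wise mean of a list of equal-length profiles."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def kmeans(profiles, k, iterations=20):
    """Assign each gene to one of k clusters; returns {gene: cluster index}."""
    names = list(profiles)
    # Deterministic start (an assumption): the first k profiles are the centers.
    centers = [list(profiles[name]) for name in names[:k]]
    assignment = {}
    for _ in range(iterations):
        # Step 1: put every gene in the cluster with the nearest center.
        new_assignment = {
            name: min(range(k), key=lambda c: sq_dist(profiles[name], centers[c]))
            for name in names
        }
        if new_assignment == assignment:
            break  # nothing moved, so the clustering has converged
        assignment = new_assignment
        # Step 2: move each center to the mean of its current members.
        for c in range(k):
            members = [profiles[name] for name in names if assignment[name] == c]
            if members:
                centers[c] = mean_profile(members)
    return assignment


# Hypothetical profiles over three conditions.
profiles = {
    "g1": [1.0, 5.0, 1.0],
    "g2": [5.0, 1.0, 5.0],
    "g3": [1.2, 4.8, 0.9],
    "g4": [4.9, 1.1, 5.2],
}
clusters = kmeans(profiles, k=2)
print(clusters)
```

Hierarchical clustering differs in that it repeatedly merges the closest clusters into a tree rather than fixing the number of clusters in advance.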

A key method for making sense of large data sets is to filter them, that is, to select some interesting subset to display. In GeneSpring, filtering is done with "gene lists." A list can come from an analysis, such as the ones mentioned above, or it can come from public genome annotations that group genes into various classifications. For instance, the yeast-experiment data that comes with GeneSpring contains numerous classifications from the MIPS and PIR databases: "cell growth," "meiosis," "cytokinesis," "calcium transport," and "lipid metabolism."

One analysis function apparently unique to GeneSpring, and potentially very useful, is the "Find Regulatory Sequences" function. If the program is provided with the sequence information for a genome and given a list of genes of interest plus various settings, the program will search for likely regulatory sequences for those genes.

Another way to make sense of large data sets is with color coding. The primary coding scheme in GeneSpring is to map relative expression levels to colors: reddish colors for upregulated genes and bluish for downregulated genes. The program can also map a confidence value for each data point to the intensity of the coloring. In this case, high-confidence readings have bright colors, while low-confidence readings have dark colors, to the point where the colors can barely be seen on screen. There is also an option to color the data based on the classic Venn diagram. You can assign a gene list to each of the three sets in the diagram. Each gene that is in only one of the sets gets a color unique to that set. Genes that are in more than one set get a color unique to that particular intersection: A and B, B and C, A and C, or all three.
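The Venn-diagram coloring amounts to a lookup on set membership: three single-set regions plus four intersections give seven colored regions. A minimal Python sketch, with invented gene lists and arbitrary placeholder colors (not GeneSpring's actual palette), might look like this:

```python
# Three hypothetical gene lists assigned to the sets A, B, and C.
list_a = {"g1", "g2", "g3", "g7"}
list_b = {"g2", "g4", "g7"}
list_c = {"g3", "g4", "g5", "g7"}

# One color per region, keyed by (in A, in B, in C).
# The color names are placeholders chosen for this example.
REGION_COLORS = {
    (True, False, False): "red",      # A only
    (False, True, False): "blue",     # B only
    (False, False, True): "yellow",   # C only
    (True, True, False): "purple",    # A and B
    (False, True, True): "green",     # B and C
    (True, False, True): "orange",    # A and C
    (True, True, True): "white",      # all three
}


def venn_color(gene):
    """Return the region color for a gene, or None if it is in no list."""
    key = (gene in list_a, gene in list_b, gene in list_c)
    return REGION_COLORS.get(key)


for gene in ["g1", "g2", "g3", "g4", "g5", "g6", "g7"]:
    print(gene, venn_color(gene))
```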

Last but not least, GeneSpring has what it calls an "Overlay" view. This is simply the ability for you to give it a 2-D graphic plus a list that maps genes to locations on the graphic. I think this tool has great potential. The yeast example includes a schematic diagram of the cell cycle for mitosis. The GeneSpring tutorial suggests that "this can be used to confirm and display hypotheses on such things as regulatory and metabolic pathways." I totally agree with this statement. We are all in the process of reverse engineering a very complicated system, with many interacting subsystems. Schematics are very useful in making sense of such complexity. With my background in software and hardware engineering, I find that schematics are essential to understanding complex systems.

Limitations

GeneSpring is specially targeted to gene-expression analysis and, as such, one might find it lacking in some of the data-analysis methods that a more general-purpose analysis tool would have.

Getting data into GeneSpring is a somewhat demanding task. The data must be supplied as a specifically formatted tab-delimited text file. Translation tools for common file formats would be useful here, to avoid the step of creating that tab-delimited file. GeneSpring can, however, read many common data layouts (not file formats, but column and row arrangements) used with devices from vendors such as Affymetrix, Incyte, and CLONTECH.
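In the absence of a built-in translator, the delimiter conversion itself is easy to script. The Python sketch below turns comma-separated text into tab-delimited text using the standard `csv` module. The sample column names and values are invented, and the sketch says nothing about which columns GeneSpring expects; that layout is described in its Data Loading manual, not here.

```python
import csv
import io


def csv_to_tab_delimited(csv_text):
    """Convert comma-separated text to tab-delimited text, row by row."""
    reader = csv.reader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


# Hypothetical input: a header row followed by one row per gene.
sample = "gene,control,treated\nYAL001C,12.5,40.1\nYAL002W,3.0,2.7\n"
converted = csv_to_tab_delimited(sample)
print(converted)
```

For real files, the same reader and writer objects can be wrapped around files opened with `open(path, newline="")` instead of in-memory strings.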

The program assumes that the expression data set is always to be normalized to some control, or to itself. But in the case of the MPSS data here at Lynx, we are dealing with absolute expression levels, which can typically vary over four orders of magnitude. I found it difficult to tell GeneSpring to leave the data as is (i.e., absolute values).

Also, I found that its mapping from expression level to on-screen color does not work for a large range of values. I tried the program option to interpret the data as the base-10 log of the expression level, but even then the color scheme was not properly scaled to the range of the data.
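To show what scaling the colors to the data range could mean for values spanning four orders of magnitude, here is a small Python sketch. It is an assumption about one reasonable approach, not a description of how GeneSpring computes its colors: the level is log-transformed, placed linearly between the logs of the smallest and largest level, and mapped from blue (low) to red (high).

```python
from math import log10


def log_scaled_color(level, lo, hi):
    """Map a positive level to an (R, G, B) tuple on a log scale.

    lo and hi are the smallest and largest levels in the data set
    (both positive, lo < hi). Levels outside that range are clamped.
    """
    t = (log10(level) - log10(lo)) / (log10(hi) - log10(lo))
    t = min(1.0, max(0.0, t))  # clamp to the [0, 1] interval
    return (round(255 * t), 0, round(255 * (1 - t)))


# Hypothetical absolute levels spanning four orders of magnitude.
for level in [1, 10, 100, 1000, 10000]:
    print(level, log_scaled_color(level, lo=1, hi=10000))
```

With a linear mapping over the same range, every level below a few hundred would collapse into nearly the same blue, which is consistent with the problem described above.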

It is useful to look at the fold change in absolute number of mRNA between experimental conditions; however, it is not necessarily the case that all of the measurements in an experiment are to be compared to those at one particular condition (i.e., the normalized control). Sometimes you need to look at fold changes between various combinations of data sets. For example, I was looking at a particular plant experiment where there were data sets for two parents and two children. In this case, none of these was a control, and depending on what I was looking for, I wanted to see the fold changes between any two of the four data sets. It was not obvious to me how to make these analyses. The vendor said that most users usually want to normalize the whole data set to a control and not use the absolute values, hence the difficulty in doing analysis without normalization to a single value.
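The comparison described above, fold changes between any two data sets with no designated control, is straightforward to express outside the program. The Python sketch below uses invented data-set names and counts, and defines fold change as the second level divided by the first; it is an illustration of the calculation, not a GeneSpring feature.

```python
from itertools import combinations

# Hypothetical absolute mRNA counts: data set -> {gene: count}.
counts = {
    "parent1": {"geneX": 10.0, "geneY": 2000.0},
    "parent2": {"geneX": 40.0, "geneY": 1000.0},
    "child1": {"geneX": 20.0, "geneY": 500.0},
    "child2": {"geneX": 10.0, "geneY": 4000.0},
}


def pairwise_fold_changes(counts, gene):
    """Return {(set_a, set_b): level_b / level_a} for every pair of data sets.

    No data set is treated as the control. A zero count in set_a would
    raise ZeroDivisionError; real data would need a rule for that case.
    """
    result = {}
    for a, b in combinations(counts, 2):
        result[(a, b)] = counts[b][gene] / counts[a][gene]
    return result


for pair, fold in pairwise_fold_changes(counts, "geneX").items():
    print(pair, fold)
```

Four data sets give six pairs, so each gene has six fold changes to inspect rather than three relative to a single control.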

I found the automated annotation updater to be very slow with a large genome that has lots of unannotated genes. The program can use a local copy of the public database for its updates, rather than going to the public site on the Internet, but it was not readily apparent to me that this is enabled via a setting in the preferences.

The program's documentation seems incomplete and out-of-date. There is an "introductory" manual in PDF format that is very incomplete. There is also a 200-page Microsoft Word document that has "GeneSpring 3.0" on its title page; it appears to be the 2.0 manual with only the title page changed. These manuals do not cover all of the items in the program's user interface, nor do they explain the various similarity metrics for k-means and hierarchical clustering or profile searches. I could also find no explanation of the criteria used for the "Interesting Genes" feature. Due to user demand, the version 3.0 software was released before the manual was ready. The manual has just been released, but I did not receive it before this review was completed.

Comparisons with Similar Software

SpotFire, with the optional Array Explorer package, is comparable. SpotFire itself is a general-purpose scientific visualization tool. It is designed to be used with any set of measurements. The Array Explorer package adds features that are unique to gene-expression data. With the optional package, SpotFire provides expression-profile display and analysis, and the profile analyses are similar, although the set of similarity metrics is smaller and somewhat different from GeneSpring's: Euclidean distance, correlation, and "city block distance." One thing of note: SpotFire's documentation for these analysis functions is much better than that in the GeneSpring manuals.
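The three metrics named above can give quite different answers about which profiles are "similar." The Python sketch below uses textbook definitions and two invented profiles; neither vendor's exact formulas are documented in this review, so treat the code as an illustration only.

```python
from math import sqrt


def euclidean(p, q):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def city_block(p, q):
    """City block (Manhattan) distance: sum of absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def correlation(p, q):
    """Pearson correlation: compares profile shape, ignoring scale and offset."""
    n = len(p)
    mean_p, mean_q = sum(p) / n, sum(q) / n
    cov = sum((a - mean_p) * (b - mean_q) for a, b in zip(p, q))
    sd_p = sqrt(sum((a - mean_p) ** 2 for a in p))
    sd_q = sqrt(sum((b - mean_q) ** 2 for b in q))
    return cov / (sd_p * sd_q)


# Two hypothetical profiles with the same shape but different magnitude.
low = [1.0, 2.0, 3.0]
high = [2.0, 4.0, 6.0]

print("Euclidean distance:", euclidean(low, high))
print("City block distance:", city_block(low, high))
print("Correlation:", correlation(low, high))
```

Here the two distance measures report the profiles as clearly separated, while the correlation is essentially 1.0 because the shapes match. That difference is the kind of thing better documentation of the metrics would help a user weigh.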

I really cannot recommend one program over the other. Both GeneSpring and SpotFire offer powerful tools to understand gene-expression data. GeneSpring has been designed specifically for this purpose, whereas SpotFire is a general-purpose data-analysis tool with some features added that are unique to gene-expression data. I suspect that some researchers would prefer one to the other, and I myself find the genome-centric design of GeneSpring to be a plus. SpotFire seems mainly a visualization tool and may be more directly comparable to GeneSpring Lite. Unfortunately, I have not had time to explore what general-purpose tools SpotFire has. I recommend taking a look at both programs before you make a decision between the two.

Technical Support and Documentation

Documentation is provided via PDF files. The tutorial is very helpful. The GeneSpring Web site offers a FAQ list and email discussion list.

Commercial licenses include: one day of installation and training, although travel expenses outside the San Francisco Bay area are extra; two days of file-format customization; unlimited email support for one year; and phone support (the lesser of two hours or 10 calls per month) for one year.

Target Users

Anyone who needs to measure gene-expression levels for many genes at once will find this tool invaluable. Since data is input as a text file, one can transform data from many different sources into a format that GeneSpring can use for analysis.


Publisher information

Silicon Genetics
2601 Spring Street
Redwood City, CA 94063

Tel: (650) 591-4459
Fax: (650) 591-5574

Web site: www.sigenetics.com

Pricing structure

Prices are per copy.

Full version: $20,000 professional/$5,000 academic, with volume discounts available. Includes one year of upgrades and support. Additional years of upgrades and support are $995/yr.

Lite version: $1,995 professional/$995 academic. Online purchasing available.

Lite version includes line, scatter, and bar graphs; array layouts; k-means clustering; "comprehensive" normalization options; "Find Similar Genes" function; and links to annotation.

The Lite version does not contain:

  • Hierarchical clustering (gene/experiment trees)
  • Overlay view
  • Venn diagram coloring
  • Update genes from GenBank or LocusLink
  • Multiexperiment correlation of expression profiles
  • "Find Interesting Gene" function
  • "Find Regulatory Sequences" function

A representative from Silicon Genetics said, "[The Lite version] is primarily a visualization tool. Less analysis. GeneSpring Lite is a companion tool for a research assistant to do preliminary analysis."

Software class

Data analysis and visualization


Joseph P. Silva has been working in the bioinformatics group at Lynx Therapeutics for the last year.

