This article will appear in a forthcoming issue of Trends in Genetics.
Abstract
Recently, much has been made of the prospects (e.g. Ref. [1]) and pitfalls [2] of using single nucleotide polymorphisms (SNPs) to map complex diseases. This meeting provided the beginning of an answer to a number of burning questions in the field: are common diseases caused by a few common SNPs or by many rare alleles? Are genome scans or candidate gene studies more productive? Is linkage disequilibrium in the human genome large or small? Is it better to do association studies with population samples of unrelated patients, or pedigree studies of families with multiple affected individuals?
And the answer is: both (and neither). Based on the large amount of data presented at this meeting, the very idea that there would be one answer to these questions seems so . . . 20th century. The new data show that there is tremendous variation among diseases and among different regions of the human genome, so the answer to each question depends on where you look. And this is not bad news. This complexity reflects the fact that, for the first time, we have the technology for exposing the genomic complexity of human populations at high resolution. Lots of exciting new technology was presented at this meeting. However, in this brief report, I will focus on the problems that were exposed.
Weak Signals
Many speakers highlighted the difficulties inherent in association studies of complex diseases. Despite all the new technology, the statistical signal is often marginal. What do you do with a hit that has a LOD SCORE of 2 (see Glossary) when the size of your study makes it likely that you could get such a hit by chance alone? Researchers find themselves struggling to boost the signal at every turn - to get more markers, more informative markers, more affected individuals, more pedigrees, more phenotypes, more personal data that can root out unwanted stratification, and so on.
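To make the multiple-testing problem concrete, here is a minimal sketch of how a LOD score translates into a pointwise p-value and an expected number of chance hits. It assumes the standard large-sample approximation that 2 ln(10) x LOD follows a chi-square distribution with one degree of freedom when there is no linkage; the function names and the figure of 400 independent tests are illustrative assumptions, not values from the meeting.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise p-value for a LOD score.

    Assumption: under no linkage, 2*ln(10)*LOD is chi-square distributed
    with 1 degree of freedom (large-sample approximation).
    """
    chi_sq = 2.0 * math.log(10.0) * lod
    # Survival function of a chi-square(1 df) variable is erfc(sqrt(x / 2)).
    two_sided = math.erfc(math.sqrt(chi_sq / 2.0))
    # Linkage tests are one-sided, so halve the two-sided tail probability.
    return two_sided / 2.0


def expected_chance_hits(lod: float, n_independent_tests: int) -> float:
    """Expected number of hits at or above `lod` if nothing is truly linked."""
    return n_independent_tests * lod_to_pointwise_p(lod)


# Hypothetical scan: 400 effectively independent marker tests.
p_lod2 = lod_to_pointwise_p(2.0)
hits_lod2 = expected_chance_hits(2.0, 400)
print(f"pointwise p at LOD 2: {p_lod2:.4f}")
print(f"expected chance hits in 400 tests: {hits_lod2:.2f}")
```

Under these assumptions a LOD of 2 corresponds to a pointwise p-value of roughly 0.001, so a scan with several hundred independent tests would be expected to produce such a hit by chance a substantial fraction of the time.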
For example, everyone is finding SNP detection rates in real populations to be well below the 95% validation rate reported for the public SNP dataset. Leaving aside technical failures (which range from 14-40% of SNPs depending on the assay, according to Leena Peltonen, UCLA, CA, USA), up to 20% of SNPs might be "private SNPs" found in only a small number of people (Ray Miller, Washington University School of Medicine, St Louis, MO, USA). Seventy-three percent were present at good frequency (in 20% or more of people) in at least one of several ethnic groups, but focusing on Caucasians, that number can fall as low as 40-50% (Patricia Taillon-Miller, Washington University School of Medicine, St Louis, MO, USA). Many speakers commented that a noticeable percentage of public database SNPs violate HARDY-WEINBERG EQUILIBRIUM and are likely to be paralogous loci rather than genuine polymorphisms.
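As an illustration of how such violations are detected, the following is a minimal sketch of the textbook chi-square test for Hardy-Weinberg equilibrium at a biallelic SNP. The function name and the genotype counts are hypothetical; the extreme case of every individual appearing heterozygous mimics what a paralogous locus (two near-identical copies differing at one base) would tend to look like.

```python
import math


def hwe_chi_square(n_hom_ref: int, n_het: int, n_hom_alt: int):
    """Chi-square test (1 df) of Hardy-Weinberg proportions.

    Returns (chi_square, p_value). Counts are numbers of individuals
    with each genotype at a single biallelic SNP.
    """
    n = n_hom_ref + n_het + n_hom_alt
    if n == 0:
        raise ValueError("no genotypes supplied")
    # Frequency of the reference allele estimated from the genotype counts.
    p = (2 * n_hom_ref + n_het) / (2 * n)
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        # Monomorphic site: nothing to test.
        return 0.0, 1.0
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_ref, n_het, n_hom_alt)
    chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of chi-square(1 df) is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi_sq / 2.0))
    return chi_sq, p_value


# Hypothetical counts: a well-behaved SNP versus an all-heterozygote artifact.
chi_ok, p_ok = hwe_chi_square(25, 50, 25)
chi_bad, p_bad = hwe_chi_square(0, 100, 0)
print(chi_ok, p_ok)
print(chi_bad, p_bad)
```

In practice a lenient threshold is often used for filtering, because genuine polymorphisms can also deviate from equilibrium through selection, population structure or small samples.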
Almost all (99%) SNPs are not in genes. Moreover, many SNPs are strongly linked to others and thus add little independent information. Craig Venter (Celera Genomics, Rockville, MD, USA) suggested that a large fraction of the current SNP dataset will not be very useful, estimating that of the 2.3 million human SNPs, only 2,000 change an amino acid.
Differing arguments presented by Eric Lander (Whitehead Institute, MIT, Cambridge, MA, USA), Andrew Clark (Pennsylvania State University, PA, USA) and others demonstrate that, by choosing different assumptions, one can justify both the assertion that common diseases should be caused in many cases by common alleles, and the opposite conclusion. In the latter scenario, association studies will probably fail. In practice, both cases are observed, for example in the Finnish population. Leena Peltonen cited a set of Finnish disease alleles that are common (present in a third of Finns), as opposed to phenylketonuria and cystic fibrosis (present in only 0.3-0.6% of Finns, despite being common in other populations). This debate appears to resolve into a detailed question of history: whether a given disease mutation occurred before or after a major bottleneck, such as the one hypothesized for the European founder population 50,000 years ago.
Haplotype Mapping
HAPLOTYPE linkage structure was another major focus of many studies. Recent estimates of LINKAGE DISEQUILIBRIUM (LD) of 30-60 kb in the human genome are 5-7 times higher than previous predictions, reflecting a major bottleneck in the human population, probably associated with its migration from Africa. And indeed, many speakers reported large LD - for example, 200 kb in lactase (Peltonen), 400 kb in CYP2D6 (Chun-Fang Xu, GlaxoSmithKline, Stevenage, Herts., UK), up to 1.9 Mb (Taillon-Miller) - mostly in the form of haplotype blocks. For example, for ten SNPs in β-fibrinogen, just seven haplotypes account for 95% of the population (François Cambien, INSERM, Paris, France).
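For readers less familiar with how LD between two SNPs is quantified, here is a minimal sketch of the usual pairwise measures D, |D'| and r², computed from phased haplotype counts. It assumes haplotype phase is already known; the function name and the counts are hypothetical and are not data from any of the studies mentioned.

```python
def pairwise_ld(n_AB: int, n_Ab: int, n_aB: int, n_ab: int):
    """Return (D, |D'|, r_squared) for two biallelic SNPs.

    Arguments are counts of the four phased haplotypes, where A/a are the
    alleles at the first SNP and B/b the alleles at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    # D: deviation of the observed AB frequency from independence.
    d = p_AB - p_A * p_B
    # r^2: squared correlation between alleles at the two SNPs.
    r_squared = d * d / denom
    # D': D scaled by its maximum possible magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    d_prime = abs(d) / d_max
    return d, d_prime, r_squared


# Hypothetical counts: only three of the four possible haplotypes are seen.
d, d_prime, r_sq = pairwise_ld(40, 10, 0, 50)
print(d, d_prime, r_sq)
```

The example shows why the choice of measure matters: when one haplotype is missing, |D'| reaches 1 even though r² stays well below 1, so two SNPs can be in "complete" LD by one measure while still carrying partly independent information by the other.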
These haplotype blocks are important for obtaining a large enough target for disease gene mapping by genome scans. In PPARγ, two haplotype blocks were found using SNPs (19 kb, 55 kb); the former, consisting of seven SNPs in total LD with the Pro12Ala mutation, is associated with Type II diabetes (Lander). By contrast, in APOA1, which is associated with cardiovascular disease, two haplotype blocks of strong LD were observed, but neither was linked to the APOA1 mutation that has now been identified as being associated with disease risk (Cambien). Thus, it would have been extremely difficult to find this SNP by genome scanning. In this case, a candidate gene approach, re-sequencing selected genes in a patient population, was necessary to find the associated SNP.
At the opposite extreme, very large haplotype blocks can also be a problem. Lander presented data on mapping of inflammatory bowel disease susceptibility to a 250-kb region of SNPs in strong LD with each other. Although this large LD facilitated the initial mapping, it also means that most of these SNPs are not informative for high-resolution mapping (and there are eight genes in this 250 kb).
Many speakers agreed that the next phase of research requires the construction of a haplotype map of the human genome. Among other things, accurate LOD score detection for mapping phenotypes with these markers depends on knowing the detailed haplotype structure in the population under study. In addition to a public effort, Venter said Celera will resequence every human gene in 40-50 individuals, giving many more SNPs and a haplotype map for each gene, which can be translated into an optimally condensed set of informative SNPs.
"Proximal Phenotypes" as Quantitative Trait Loci
Many speakers suggested that stronger signals might be obtainable from "proximal phenotypes;" for example, instead of studying asthma as a Yes/No clinical condition, study airway hyperreactivity (and additional parameters) as a QUANTITATIVE TRAIT. This synthesizes the focus on understanding the biology of a disease (e.g. the candidate gene approach) with mapping methods. First, this could help to dissect the multiple causes of a complex disease into observably distinct traits that can be mapped individually. A given clinical disease might be the conjunction of multiple proximal phenotypes - a certain combination leads to disease. Second, proximal phenotypes could increase signal by expanding the useful sample; for example, within pedigrees, many individuals who do not have clinical disease might show quantitative effects in proximal phenotypes that are quite informative for mapping them. Thus, the effective population size that is informative for mapping a proximal phenotype could be much larger than for mapping the disease itself.
Model Organisms
One of the most exciting syntheses of these ideas is the use of model organisms. By taking proximal phenotypes from the human disease to a mouse model, and using the genome SYNTENY to relate mouse results back to human genes, the full power of the diverse set of inbred mouse strains can be brought to bear on human disease. These strains solve the problems of whether disease results from rare versus common mutations (each mutation is amplified to 100% frequency in a given mouse strain), association versus pedigree studies (by doing crosses we can create any set of pedigrees we want), candidate gene approaches versus undirected mapping (treating proximal phenotypes as a QTL mapping problem synthesizes biological understanding of the disease with a mapping approach), and genetic versus environmental factors (both are totally controlled in the model organism experiments).
Gary Peltz (Roche Bioscience, Palo Alto, CA, USA) presented an exciting example of successfully mapping one cause of human asthma, by crossing mouse strains with high versus low responses to experimentally induced airway hyperreactivity, and using the high versus low responder extremes from the progeny to map the QTL. This showed that expression of complement factor 5 (C5) controls airway hyperreactivity, by regulating monocyte IL-12 production. IL-12 inhibits asthma in experimental models. This work also illuminated the interesting role of C5 and IL-12 at the interface between innate and adaptive immune responses. Peltz reported that experimental automation can perform a full genome scan in one day. Such approaches with mouse and other model organisms seem to have enormous potential.
The Benefits of Data Sharing: "Four refutations = One confirmation"
Eric Lander described a striking example of the need for larger study samples. The PPARγ locus was reported as being associated with Type II diabetes in a Japanese population, but four subsequent studies found no significant association. However, when data from all four studies were combined (Lander), PPARγ again emerged with a statistically significant LOD score, confirming the original study instead of refuting it. An independent study of 5,000 samples by Lander's group has also confirmed the association. The specific mutation (Pro12Ala) has been identified and appears to account for about 25% of the disease risk.
Unless these four studies had agreed to make their data available for an analysis that ultimately reversed their conclusions, the result might have been missed. Even with free, open access to community genomics data, mapping of complex diseases is very difficult. Without such access, it could be impossible. As the Human Genome Project has demonstrated, this field needs universal archiving and sharing of raw data (i.e. chromatogram traces, not just processed sequences), mandated by funding agencies as a condition for funding, and by journals as a condition for publication.
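The "four refutations = one confirmation" effect in the Type II diabetes example above is easy to reproduce numerically. The sketch below is not the analysis Lander described (which was reported in terms of a LOD score); it swaps in a simple fixed-effect, inverse-variance meta-analysis of log odds ratios, and all allele counts are invented purely for illustration.

```python
import math

# Hypothetical allele counts per study, invented for illustration only:
# (case_risk, case_other, control_risk, control_other)
STUDIES = [
    (115, 85, 100, 100),
    (60, 40, 52, 48),
    (230, 170, 205, 195),
    (90, 70, 80, 80),
]


def log_odds_ratio(a, b, c, d):
    """Return (log odds ratio, variance) for one 2x2 table (Woolf's method)."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def z_to_two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def pooled_fixed_effect(studies):
    """Inverse-variance pooled log odds ratio, its standard error and z."""
    weight_sum = 0.0
    weighted_effects = 0.0
    for table in studies:
        log_or, var = log_odds_ratio(*table)
        weight_sum += 1.0 / var
        weighted_effects += log_or / var
    pooled = weighted_effects / weight_sum
    se = math.sqrt(1.0 / weight_sum)
    return pooled, se, pooled / se


# z statistic of each study analysed on its own.
per_study_z = []
for table in STUDIES:
    log_or, var = log_odds_ratio(*table)
    per_study_z.append(log_or / math.sqrt(var))

pooled_log_or, pooled_se, pooled_z = pooled_fixed_effect(STUDIES)

for i, z in enumerate(per_study_z, start=1):
    print(f"study {i}: z = {z:.2f}, p = {z_to_two_sided_p(z):.3f}")
print(f"pooled: OR = {math.exp(pooled_log_or):.2f}, z = {pooled_z:.2f}, "
      f"p = {z_to_two_sided_p(pooled_z):.4f}")
```

With these made-up counts, every study on its own falls short of the conventional 0.05 threshold, yet the pooled estimate is clearly significant, because each study sees a modest effect in the same direction and pooling shrinks the standard error. This only works when the underlying data are available to be combined.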


