Abstract
At the Wellcome Trust Genome Campus, three institutes cooperatively pursue genome research. Supported by government, yet funded well beyond what government resources alone could provide, the campus is a world leader in molecular biology, though its commitment to placing DNA sequence information in the public domain has aroused the ire of commercial researchers. The campus plans to grow in size and scope.
Where do you go to find the greatest expertise in gene sequencing and bioinformatics in Europe, possibly in the world? To a small village called Hinxton in the English countryside not far from Cambridge, home to a gleaming new complex of state-of-the-art laboratories, the Wellcome Trust Genome Campus.
The campus consists of three independent but closely linked institutes.
First on the site was the Sanger Centre, established jointly by the Wellcome Trust and the UK Medical Research Council to carry out automated high-throughput genome sequencing and associated research. The center first opened in 1993, and in 1995 committed itself to sequencing at least 500 million bases - one-sixth of the entire human genome - by the year 2002. The impetus for its foundation came from the success of methods developed by John Sulston and his colleagues at the MRC Laboratory of Molecular Biology in Cambridge to sequence the genome of the nematode worm.
"People were saying that for the sake of the Human Genome Project, the nematode work had to be capitalized on," says Sulston, now the director of the Sanger Centre. "The Wellcome Trust fortuitously had come into the money at the same time and it suited them to have a big project." The trust's income vastly increased in 1992 through the sale of shares in the pharmaceutical company Wellcome PLC. The trust has since become the largest source of grants for biomedical research in the UK. But never before has it invested on such a scale in a single project. It is funding the Sanger Centre's work to the tune of £10-12 million ($16-20 million) per year for the next five years. It has also secured the center's future by buying land at Hinxton and sharing with the MRC the construction cost of the spacious modern laboratories of the campus. The trust's purchase included Hinxton Hall, a gracious eighteenth-century country house, which has been restored as a conference center. The new development was officially opened on October 8 by Princess Anne.
The Wellcome Trust's determination to support genome research came at just the right time for the other two partners on the site. The European Molecular Biology Laboratory in Heidelberg wanted to spin out its DNA data library as a separate institute. With the trust's backing, the United Kingdom government successfully bid to house the new European Bioinformatics Institute at Hinxton. Graham Cameron and his team moved there in 1994. "I think it was a great decision," he says, "because the context is ideal for it. We're far nearer to sequencing the genome than we are to understanding the genome - so as well as managing databases of gene sequences, protein sequences, and protein structure, we have some theoretically and computationally oriented research biologists here. They're looking into sequence-structure relationships, gene finding systems, molecular evolution, and so on."
The third partner is the UK MRC Human Genome Mapping Project Resource Centre, funded by the Medical Research Council to support genome researchers. It acts as a central source of DNA libraries - sets of DNA fragments reproduced in yeasts or bacteria, ready for screening to find the location of a genetic marker - and other biological material needed by researchers, as well as computing and training services. "The important thing is that all three institutes do really quite different things," says the Resource Centre's director Keith Gibson. "They complement each other incredibly well."
Two principles underpin the foundation and development of the campus. The
first is to maintain the world-leading tradition established in Cambridge
at the Laboratory of Molecular Biology. There James Watson and Francis
Crick discovered the structure of DNA. John Kendrew and Max Perutz solved
the three-dimensional structure of a protein for the first time. Fred
Sanger was the first to sequence a protein, and then went on to develop a
method of reading DNA sequences, essentially the same as that in use today.
Not surprisingly, the founders of the Sanger Centre chose to honor this
double Nobel laureate by giving it his name.
John Sulston is in no doubt that if British molecular biology had had to rely solely on limited government funding through the MRC, this momentum could have been lost. "It's quite clear that we would not have been able to do this at all," he says. "The interesting thing is that we're not just maintaining a foothold internationally, we're taking a lead. The existence of the Sanger Centre clearly has influenced events in America, made them go at a faster pace and maybe changed their direction slightly." It is the style of work that Sulston pioneered in collaboration with Bob Waterston at Washington University in St. Louis that has been so influential. Rather than looking for new technologies to speed up genome sequencing, they scaled up their use of the existing technology, the automated sequencers supplied by Applied Biosystems, and increased its efficiency. Seeing the speed at which Sulston and Waterston were approaching the complete nematode genome was one of the factors convincing the international biomedical community that a complete human sequence was achievable in a realistic time frame.
The second principle underlying the operation of the campus is that DNA
sequence information should be in the public domain. New sequences from the
Sanger Centre are placed on the European Bioinformatics Institute database
as soon as they are confirmed. "We make the information available to
people throughout the world, effectively free of charge," Cameron
says. "Nowadays that means providing Web access. The Web site that we
run is seeing about 100,000 hits a day. Meanwhile, the amount of data in
the sequence database is doubling somewhere between every one and two
years." The Sanger Centre is the biggest single provider of public
domain sequence information in the world, but EBI collects from a wide
range of other groups throughout Europe and includes data on around 20,000
other species apart from humans. It also has agreements with GenBank in
the United States and the DNA Data
Bank of Japan to exchange data freely.
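Cameron's doubling figure is easier to appreciate with a little arithmetic. What follows is only a rough sketch in Python, not anything the EBI is described as using: the function name growth_factor and the ten-year horizon are assumptions chosen purely for illustration, and steady doubling is itself a simplification. It estimates how much larger a database would become if it kept doubling every one or two years.

```python
def growth_factor(years: float, doubling_time_years: float) -> float:
    """Return the factor by which a quantity grows over `years`
    if it doubles once every `doubling_time_years`."""
    if doubling_time_years <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (years / doubling_time_years)


if __name__ == "__main__":
    horizon_years = 10.0  # hypothetical horizon, not a figure from the article
    # The article gives a doubling time of "between every one and two years".
    for doubling_time in (1.0, 2.0):
        factor = growth_factor(horizon_years, doubling_time)
        print(
            f"doubling every {doubling_time:g} yr: "
            f"about {factor:,.0f} times larger after {horizon_years:g} yr"
        )
```

Under these assumptions the spread is wide: doubling every two years gives a 32-fold increase over a decade, while doubling every year gives a 1,024-fold increase.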
John Sulston is fully committed to this philosophy of open access, an attitude that has brought him into conflict with the commercial world. Many biotechnology companies argue that they must patent sequence information in order to protect the investment they make in developing new products. "What people should own are inventions and developments that are useful in medicine that come out of the genome sequence, but they should not own the genome itself," Sulston argues. He and Waterston proved their point by sequencing the region that includes BRCA2, a gene for familial breast cancer, and then publishing the data openly. "We followed our principles, and we got both castigated and applauded for doing so, but it was a good opportunity to make a statement about how things ought to be," he recalls.
Sulston is sanguine about the long-term resolution of such conflicts. He
argues that once the whole genome sequence is available, a target that
should be reached within the next ten years, the patent question will cease
to be an issue. Meanwhile all three of the Genome Campus institutes are
happy to work with commercial companies as long as the public availability
of their own data is not compromised. The Wellcome Trust now hopes to
foster further links by establishing a "biopark" on the Hinxton
site. "I think that's a very good thing," says Sulston. "It
will increase the critical mass here to the benefit of the UK, so Hinxton
will become really an epicenter of progress not only in sequencing, but
exploitation as well."
If the human genome sequence is completed by 2005, what will there be left for the Hinxton community to do? Its leaders have already moved on to thinking about the next stage. "Many of the things that biologists call databases kind of happened by accident," says EBI's Graham Cameron. "They were never really designed as databases - people just realized that they had a lot of information kicking around and they'd better do something about it. If I were to be critical of the biologists I would say there is still a big emphasis on capturing today's data, instead of looking a little bit further into the future and saying, 'Well, we've got the genome, what are we going to do now?'" Much of the work of his own research group is directed towards using the vast amount of data held by the EBI to answer the questions of the future.
"When the sequencing is finished, that's just the beginning,"
agrees the Resource Centre's Keith Gibson. "The real work then starts
in terms of finding out what these genes do. An organization such as the
Resource Centre will evolve over the next decade; we'll need different
sorts of tools, different resources that can be made available to the
community to look at these functional questions. So we've got a job for a
long time!"
Gesturing toward a pile of files containing problems in the nematode sequence still waiting to be solved, John Sulston admits that for him the sequence remains an exciting challenge, an end in itself. "The key reason for doing this is the investment for the future - it's the archiving of this sequence which after all is permanent, it's going to be important as long as there are people in the universe, and it will always be the reference point for doing biology."
Georgina Ferry is a scientific journalist based in Oxford, England.
Andrzej Krauze is an illustrator, poster maker, cartoonist, and painter who illustrates regularly for HMS Beagle, The Guardian, The Sunday Telegraph, Bookseller, and New Statesman.


Endlinks
Web sites mentioned in this column:
Sources of Bio-Information - a list of over 800 bioinformatics tools and resources available on the Web. Maintained by the National Human Genome Research Institute.
The C. elegans Genome Project - includes the latest sequence information from the C. elegans project and a link to ACEDB (a C. elegans database).
Human Genome Project Resources - includes information on human sequencing projects and resources as well as links to databases and sequencing projects of other organisms.
Genetic Sites - this well-organized listing of genetics-related newsgroups, mailing lists, databases, and other resources is accessible from the European site of HUM-MOLGEN, an international communication forum in human genetics.