Ordering the Genome

The Wellcome Trust Genome Campus

by Georgina Ferry

(Posted October 31, 1997, Issue 19; archived November 14, 1997)

Abstract

At the Wellcome Trust Genome Campus, three institutes cooperatively pursue genome research. Supported by government yet funded beyond what government resources alone would allow, the campus is a world leader in molecular biology, though its commitment to placing DNA sequence information in the public domain has aroused the ire of commercial researchers. The campus plans to grow in size and scope.


Where do you go to find the greatest expertise in gene sequencing and bioinformatics in Europe, possibly in the world? To a small village called Hinxton in the English countryside not far from Cambridge, home to a gleaming new complex of state-of-the-art laboratories, the Wellcome Trust Genome Campus.

The campus consists of three independent but closely linked institutes. First on the site was the Sanger Centre, established jointly by the Wellcome Trust and the UK Medical Research Council to carry out automated high-throughput genome sequencing and associated research. The center first opened in 1993, and in 1995 committed itself to sequencing at least 500 million bases - one-sixth of the entire human genome - by the year 2002. The impetus for its foundation came from the success of methods developed by John Sulston and his colleagues at the MRC Laboratory of Molecular Biology in Cambridge to sequence the genome of the nematode worm.

"People were saying that for the sake of the Human Genome Project, the nematode work had to be capitalized on," says Sulston, now the director of the Sanger Centre. "The Wellcome Trust fortuitously had come into the money at the same time and it suited them to have a big project." The trust's income vastly increased in 1992 through the sale of shares in the pharmaceutical company Wellcome PLC. The trust has since become the largest source of grants for biomedical research in the UK. But never before has it invested on such a scale in a single project. It is funding the Sanger Centre's work to the tune of £10-12 million ($16-20 million) per year for the next five years. It has also secured the center's future by buying land at Hinxton and sharing with the MRC the construction cost of the spacious modern laboratories of the campus. The trust's purchase included Hinxton Hall, a gracious eighteenth-century country house, which has been restored as a conference center. The new development was officially opened on October 8 by Princess Anne.

The Wellcome Trust's determination to support genome research came at just the right time for the other two partners on the site. The European Molecular Biology Laboratory in Heidelberg wanted to spin out its DNA data library as a separate institute. With the trust's backing, the United Kingdom government successfully bid to house the new European Bioinformatics Institute at Hinxton. Graham Cameron and his team moved there in 1994. "I think it was a great decision," he says, "because the context is ideal for it. We're far nearer to sequencing the genome than we are to understanding the genome - so as well as managing databases of gene sequences, protein sequences, and protein structure, we have some theoretically and computationally oriented research biologists here. They're looking into sequence-structure relationships, gene finding systems, molecular evolution, and so on."

The third partner is the UK MRC Human Genome Mapping Project Resource Centre, funded by the Medical Research Council to support genome researchers. It acts as a central source of DNA libraries - sets of DNA fragments reproduced in yeasts or bacteria, ready for screening to find the location of a genetic marker - and other biological material needed by researchers, as well as computing and training services. "The important thing is that all three institutes do really quite different things," says the Resource Centre's director Keith Gibson. "They complement each other incredibly well."

Two principles underpin the foundation and development of the campus. The first is to maintain the world-leading tradition established in Cambridge at the Laboratory of Molecular Biology. There James Watson and Francis Crick discovered the structure of DNA. John Kendrew and Max Perutz solved the three-dimensional structure of a protein for the first time. Fred Sanger was the first to sequence a protein, and then went on to develop a method of reading DNA sequences, essentially the same as that in use today. Not surprisingly, the founders of the Sanger Centre chose to honor this double Nobel laureate by giving it his name.

John Sulston is in no doubt that if British molecular biology had had to rely solely on limited government funding through the MRC, this momentum could have been lost. "It's quite clear that we would not have been able to do this at all," he says. "The interesting thing is that we're not just maintaining a foothold internationally, we're taking a lead. The existence of the Sanger Centre clearly has influenced events in America, made them go at a faster pace and maybe changed their direction slightly." It is the style of work that Sulston pioneered in collaboration with Bob Waterston at Washington University in St. Louis that has been so influential. Rather than looking for new technologies to speed up genome sequencing, they scaled up their use of the existing technology, the automated sequencers supplied by Applied Biosystems, and increased its efficiency. Seeing the speed at which Sulston and Waterston were approaching the complete nematode genome was one of the factors convincing the international biomedical community that a complete human sequence was achievable in a realistic time frame.

The second principle underlying the operation of the campus is that DNA sequence information should be in the public domain. New sequences from the Sanger Centre are placed on the European Bioinformatics Institute database as soon as they are confirmed. "We make the information available to people throughout the world, effectively free of charge," Cameron says. "Nowadays that means providing Web access. The Web site that we run is seeing about 100,000 hits a day. Meanwhile, the amount of data in the sequence database is doubling somewhere between every one and two years." The Sanger Centre is the biggest single provider of public domain sequence information in the world, but EBI collects from a wide range of other groups throughout Europe and includes data on around 20,000 other species apart from humans. It also has agreements with GenBank in the United States and the DNA Data Bank of Japan to exchange data freely.

John Sulston is fully committed to this philosophy of open access, an attitude that has brought him into conflict with the commercial world. Many biotechnology companies argue that they must patent sequence information in order to protect the investment they make in developing new products. "What people should own are inventions and developments that are useful in medicine that come out of the genome sequence, but they should not own the genome itself," Sulston argues. He and Waterston proved their point by sequencing the region that includes BRCA2, a gene for familial breast cancer, and then publishing the data openly. "We followed our principles, and we got both castigated and applauded for doing so, but it was a good opportunity to make a statement about how things ought to be," he recalls.

Sulston is sanguine about the long-term resolution of such conflicts. He argues that once the whole genome sequence is available, a target that should be reached within the next ten years, the patent question will cease to be an issue. Meanwhile all three of the Genome Campus institutes are happy to work with commercial companies as long as the public availability of their own data is not compromised. The Wellcome Trust now hopes to foster further links by establishing a "biopark" on the Hinxton site. "I think that's a very good thing," says Sulston. "It will increase the critical mass here to the benefit of the UK, so Hinxton will become really an epicenter of progress not only in sequencing, but exploitation as well."

If the human genome sequence is completed by 2005, what will there be left for the Hinxton community to do? Its leaders have already moved on to thinking about the next stage. "Many of the things that biologists call databases kind of happened by accident," says EBI's Graham Cameron. "They were never really designed as databases - people just realized that they had a lot of information kicking around and they'd better do something about it. If I were to be critical of the biologists I would say there is still a big emphasis on capturing today's data, instead of looking a little bit further into the future and saying, 'Well, we've got the genome, what are we going to do now?'" Much of the work of his own research group is directed towards using the vast amount of data held by the EBI to answer the questions of the future.

"When the sequencing is finished, that's just the beginning," agrees the Resource Centre's Keith Gibson. "The real work then starts in terms of finding out what these genes do. An organization such as the Resource Centre will evolve over the next decade; we'll need different sorts of tools, different resources that can be made available to the community to look at these functional questions. So we've got a job for a long time!"

Gesturing toward a pile of files containing problems in the nematode sequence still waiting to be solved, John Sulston admits that for him the sequence remains an exciting challenge, an end in itself. "The key reason for doing this is the investment for the future - it's the archiving of this sequence which after all is permanent, it's going to be important as long as there are people in the universe, and it will always be the reference point for doing biology."

Georgina Ferry is a scientific journalist based in Oxford, England.
Andrzej Krauze is an illustrator, poster maker, cartoonist, and painter who illustrates regularly for HMS Beagle, The Guardian, The Sunday Telegraph, Bookseller, and New Statesman.


Endlinks

Web sites mentioned in this column:

Sources of Bio-Information - a list of over 800 bioinformatics tools and resources available on the Web. Maintained by the National Human Genome Research Institute.

The C. elegans Genome Project - includes the latest sequence information from the C. elegans project and a link to ACEDB (a C. elegans database).

Human Genome Project Resources - includes information on human sequencing projects and resources as well as links to databases and sequencing projects of other organisms.

Genetic Sites - this well-organized listing of genetics-related newsgroups, mailing lists, databases, and other resources is accessible from the European site of HUM-MOLGEN, an international communication forum in human genetics.
