NIH funds new centers to expand and diversify the human reference genome

The UC Santa Cruz Genomics Institute will play a leading role in the ambitious new Human Pangenome Reference Sequence Project

David Haussler directs the UC Santa Cruz Genomics Institute.
Karen Miga directs the Data Production Center at UCSC for the Human Pangenome Project and is the co-lead of the Telomere-to-Telomere consortium (T2T). Her career has focused on closing the largest, most repetitive gaps that remain in human reference genomes.
Benedict Paten directs the UCSC Computational Genomics Lab and is a co-leader of the Human Pangenome Reference Center.

New grants from the National Institutes of Health (NIH) totaling approximately $30 million will enable scientists at the University of California, Santa Cruz, and other collaborating institutions to generate and maintain a completely new and comprehensive reference sequence of the human genome that represents human genetic diversity.

The first human genome sequence, produced by the international Human Genome Project in 2000, was a landmark achievement that gave rise to the burgeoning field of genomic medicine. Improved and annotated over the years, that genome sequence (based mostly on one person's genome) has been an essential reference for making sense of new genomic data. But the current reference genome is still an incomplete sequence and woefully inadequate as a representation of human diversity and genetic variation.

The new project will address those shortcomings by creating a new “human pangenome reference” based on the complete genome sequences of 350 individuals. The project will be carried out by two new centers funded by the National Human Genome Research Institute (NHGRI), part of the NIH.

"One human genome cannot represent all of humanity. The human pangenome reference will be a key step forward for biomedical research and personalized medicine. Not only will we have 350 genomes representing human diversity, they will be vastly higher quality than previous genome sequences," said David Haussler, professor of biomolecular engineering at UC Santa Cruz and director of the UC Santa Cruz Genomics Institute.

"It has grown more and more important to have a high-quality, highly usable human genome reference sequence that represents the diversity of human populations," said Adam Felsenfeld, NHGRI program director in the Division of Genome Sciences. "The proposed improvements will serve the growing basic and clinical genomics research communities by helping them interpret both research and patient genome sequences."

The two centers, a sequencing center and a reference center, are funded by separate grants. NHGRI has awarded approximately $3.5 million per year over a five-year period to UC Santa Cruz, with major collaborators including the University of Washington in Seattle, Washington University in St. Louis, and Rockefeller University in New York, to form the Human Pangenome Sequencing Center. The center will aim to sequence up to 350 diverse human genomes using state-of-the-art technologies, producing high-quality sequences that are more broadly representative of human diversity.

NHGRI also awarded $2.5 million per year for five years to three institutions—Washington University in St. Louis (WashU), UC Santa Cruz (UCSC), and the European Bioinformatics Institute (EBI), which will coordinate with the National Center for Biotechnology Information—to form the WashU-UCSC-EBI Human Pangenome Reference Center. The center will provide a next-generation reference sequence of the human genome as a resource for the scientific community.

Human Pangenome Project

Together, these grants initiate a new Human Pangenome Reference Sequence Project, or "Human Pangenome Project."

Haussler said the project will push the capabilities of current DNA sequencing technology well beyond what has been attempted before. Directing the sequencing center will be Karen Miga, a research scientist at UCSC who co-led a team with Adam Phillippy (NHGRI) that recently achieved the first complete sequence of a human chromosome. This "telomere-to-telomere" assembly (telomeres are the very tips of the chromosomes) of a complete human X chromosome, posted on the bioRxiv server in August, shows that complete chromosome sequences with no gaps are possible using current technologies.

Nanopore sequencing, a sequencing technology pioneered at UC Santa Cruz and commercialized by Oxford Nanopore, is one of several advanced technologies enabling more complete and accurate genome sequencing that will be employed in this project. Oxford Nanopore and other leading DNA sequencing companies (Pacific Biosciences and Illumina) will be supporting the center as contributing partners, each bringing uniquely powerful technologies.

"We are going to use all of the latest and best sequencing technologies and push their capabilities to get the most complete and accurate sequences possible," Haussler said.

The other big challenge for the project will be creating a useful reference product from 350 individual genomes. "On their own, 350 separate genomes are not very useful as a reference," explained Benedict Paten, assistant professor of biomolecular engineering at UCSC and a co-leader of the Human Pangenome Reference Center.

Paten and others have been working for several years to develop a new way to represent genomes that can serve as a comprehensive reference map of human genetic variation. The idea is to weave together a collection of genome sequences into a map that shows where they are the same and where they are different. This representation must also enable computational strategies for comparing any new genome sequence to the reference map.
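As a rough illustration of that idea (not the project's actual method or software), the minimal Python sketch below weaves three short, made-up sequences into a toy graph. It assumes the sequences are already aligned, have equal length, and differ only by single-letter substitutions; real pangenome graphs must also handle insertions, deletions, and rearrangements. The function names (build_graph, variable_sites, novel_bases) and the sample data are hypothetical.

```python
from collections import defaultdict


def build_graph(aligned):
    """Build a toy sequence graph from equal-length, pre-aligned sequences.

    Each node is a (position, base) pair; each edge links the nodes that
    some input sequence visits at consecutive positions.
    """
    length = len(next(iter(aligned.values())))
    if any(len(seq) != length for seq in aligned.values()):
        raise ValueError("toy example expects equal-length aligned sequences")
    nodes = defaultdict(set)  # (position, base) -> names of genomes using it
    edges = set()             # pairs of nodes at consecutive positions
    for name, seq in aligned.items():
        for i, base in enumerate(seq):
            nodes[(i, base)].add(name)
            if i + 1 < length:
                edges.add(((i, base), (i + 1, seq[i + 1])))
    return nodes, edges


def variable_sites(nodes):
    """Return positions where the input genomes differ, with the bases seen."""
    bases_at = defaultdict(set)
    for pos, base in nodes:
        bases_at[pos].add(base)
    return {pos: sorted(b) for pos, b in sorted(bases_at.items()) if len(b) > 1}


def novel_bases(nodes, query):
    """List (position, base) pairs in a new sequence that the graph lacks."""
    return [(i, b) for i, b in enumerate(query) if (i, b) not in nodes]


# Hypothetical sample data, invented for this illustration.
genomes = {"sample_a": "ACGTAC", "sample_b": "ACCTAC", "sample_c": "ACGTTC"}
nodes, edges = build_graph(genomes)

print(variable_sites(nodes))          # {2: ['C', 'G'], 4: ['A', 'T']}
print(novel_bases(nodes, "ACCTTG"))   # [(5, 'G')]

# Against a single reference, the same new sequence shows three differences,
# with nothing to indicate that two of them are already known variants.
single, _ = build_graph({"sample_a": "ACGTAC"})
print(novel_bases(single, "ACCTTG"))  # [(2, 'C'), (4, 'T'), (5, 'G')]
```

In this toy case the graph shows at a glance where the three genomes agree and where they differ, and a new sequence can be compared against all of them at once rather than against one genome at a time.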

"Our job is to build that representation and make it easy to use," Paten said. "It's a huge challenge to make it usable and computable. But it's very important, because if you rely on one reference genome for all comparisons, you are blind to many types of variations in human genomes."

Longstanding collaborations

The new centers will build on longstanding collaborations between the scientists at the participating institutions. The lead investigators for the Human Pangenome Sequencing Center include Evan Eichler at the University of Washington in Seattle, Ira Hall at Washington University in St. Louis, and Erich Jarvis at Rockefeller University. The UCSC participants in the genome sequencing, in addition to principal investigator David Haussler, include Karen Miga, Benedict Paten, Ed Green, and Mark Akeson.

The lead investigators for the Human Pangenome Reference Center include Ting Wang and Ira Hall at Washington University in St. Louis, Benedict Paten at UCSC, and Paul Flicek at the European Bioinformatics Institute. Other participants in the project include Richard Durbin at Cambridge University, Gene Myers at the Max Planck Institute of Molecular Cell Biology and Genetics in Germany, Kirsten Howe at the Wellcome Sanger Institute in England, Adam Phillippy at NHGRI, Heng Li at the Broad Institute in Boston, Eimear Kenny at the Mount Sinai BioMe Biobank, Alissa Resch at the Coriell Institute for Medical Research, and Paolo Carnevali at the Chan Zuckerberg Initiative in Palo Alto.

"We have assembled a 'dream team' of the top scientists in genome assembly, sequencing, and comparison. It's a very ambitious project, but we have the expertise and experience to do this," Haussler said.