A new nanopore technology for direct sequencing of long strands of DNA has resulted in the most complete human genome ever assembled with a single technology.
The research, published January 29 in Nature Biotechnology, involved scientists at UC Santa Cruz; the U.S. National Human Genome Research Institute (NHGRI); the University of Nottingham, University of Birmingham, and University of East Anglia in the U.K.; and the University of Utah, University of British Columbia, and the Ontario Institute for Cancer Research in Toronto.
Using a pocket-sized, portable DNA sequencer based on nanopore sequencing technology pioneered at UC Santa Cruz, the scientists sequenced a complete human genome in fragments hundreds of times larger than usual, enabling new biological insights. These included the detection of structural variants and epigenetic modifications in the genome, as well as the closing of 12 gaps in the human reference genome, thereby improving its accuracy.
"The ability to get long reads is one of the strengths of this technology, and as a result this is the most contiguous human genome assembly ever done," said co-first author Miten Jain, a postdoctoral researcher in biomolecular engineering at UC Santa Cruz. Jain is one of seven lead authors who contributed equally to the paper, three of whom are at the UC Santa Cruz Genomics Institute.
Ultra-long sequences
The researchers developed a new method for sequencing “ultra-long” sequences of DNA, more than a thousand times longer than the original reads used to generate the human genome reference sequence in 2001. More recently, the authors used this method to generate the longest read ever sequenced, at 1,204,840 bases in length, about 8,000 times longer than a typical sequencing read.
This drastically reduces the complexity of piecing together the genome compared to previous techniques. The authors speculate that these reads and longer ones can be generated routinely in the future, enabling nanopore sequencing of human genomes as complete as the reference genome, which involved over 20 years of work and more than $2 billion of funding.
"About 8 percent of the human genome has yet to be assembled, mostly in long, complex regions with lots of repetitive DNA sequences. With repetitive sequences, if you only have short reads it's hard to piece them together, and you can't tell how much there is," Jain explained. "If you can cover that region with a few long reads, however, it's a much easier puzzle to solve."
As well as sequencing previously uncharacterized regions of the genome, the new analysis provided greater insight into regions of the genome that are responsible for functions such as immunity and tumor growth. This in turn may have a profound impact on clinical practice by, for example, enabling detection of large genome rearrangements important in the development of cancer or determining a person’s inherited repertoire of antibody genes.
Personal genomes
The ability to sequence using a portable device that costs only $1,000 may also put personalized genome sequencing into the mainstream.
“This is a landmark for genomics. The long reads that are possible with nanopore sequencing will provide us with a much clearer picture of the overall structure and organization of the genome than ever before,” said corresponding author Matt Loose of the University of Nottingham.
“If you imagine the process of assembling a genome together is like piecing together a jigsaw puzzle, the ability to produce extremely long sequencing reads is like finding very large pieces of the puzzle, which makes the process far less complex,” added Nick Loman of the University of Birmingham, also a corresponding author.
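The puzzle analogy, and Jain's earlier point about repetitive DNA, can be illustrated with a toy sketch. It is not the consortium's assembly method: the repeat unit, flanking sequences, copy numbers, and read lengths below are all hypothetical, and the model ignores sequencing errors and read depth. It shows only that two genomes differing in the number of repeat copies can yield exactly the same set of short reads, while reads long enough to span the repeat tell them apart.

```python
def reads(genome: str, length: int) -> set:
    """Return every distinct substring ("read") of the given length."""
    return {genome[i:i + length] for i in range(len(genome) - length + 1)}


# Hypothetical sequences, chosen only for illustration.
UNIT = "TTAGGG"                    # repeat unit
LEFT = "ACGTACCATGCAAGTCCGAT"      # unique sequence left of the repeat
RIGHT = "GGATCCTAGCTTACGGATCA"     # unique sequence right of the repeat

genome_a = LEFT + UNIT * 10 + RIGHT   # 10 copies of the repeat
genome_b = LEFT + UNIT * 14 + RIGHT   # 14 copies of the repeat

# Reads shorter than the repeat array: both genomes give the same pieces.
short_same = reads(genome_a, 12) == reads(genome_b, 12)

# Reads long enough to span the array plus some unique flanking sequence.
long_same = reads(genome_a, 90) == reads(genome_b, 90)

print("short reads identical:", short_same)
print("long reads identical:", long_same)
```

In this toy model the short reads cannot reveal how many repeat copies are present, whereas a single long read that reaches from one unique flank to the other fixes the copy number directly.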
The study uncovered new information about the major histocompatibility complex, a region of the genome that plays a critical role in the immune system and is used for tissue typing before a transplant. This area is particularly difficult to analyze as it contains many duplicated regions, including gene families and repeated sequences. No two individuals, apart from identical twins, have the same sequence in this region of their genome.
The researchers also used ultra-long sequences generated in the project to determine, for the first time, the lengths of individual telomeres directly from the sequence data. Telomeres are the caps at the ends of each DNA strand that protect the chromosomes and play an important role in how cells age. The older a cell, the shorter its telomeres, and disruptions in the pattern of a telomere's DNA are a significant issue found in many tumors. These regions are also highly repetitive, often appearing identical, making them hard to study.
International effort
The international research effort used the Oxford Nanopore Technologies MinION sequencer. The sequencer, approximately the size of a mobile phone, sequences the DNA by detecting the change in current flow as single molecules of DNA pass through a tiny hole (a "nanopore") in a membrane.
“We hope that a pocket-size sequencer is going to give us the ability to bring sequencing much closer to the patient,” Loman said. “At the moment sequencing is quite laborious and occurs in expensively equipped laboratories, but in the future we can imagine sequencing using pocket-size devices in GP surgeries, in clinics, and even in people’s own homes. The ability to sequence and assemble even very large complex genomes may have value one day in diagnostics and monitoring the evolution of diseases such as cancer and a wide range of infections.”
In addition to Jain, coauthors of the paper affiliated with the UC Santa Cruz Genomics Institute include Karen Miga, Arthur Rand, Ian Fiddes, Hugh Olsen, and Benedict Paten. Loman and Loose led the international consortium. This work was supported in part by the U.S. National Institutes of Health, the U.K. Biotechnology and Biological Sciences Research Council, and the Canadian Institutes of Health Research.