Surprising 'ultra-conserved' regions discovered in human genome

Researchers comparing the human genome with the genomes of other species have discovered a surprising number of matching DNA sequences in a variety of vertebrate species, including the mouse, rat, dog, and chicken. The fact that these sequences have remained unchanged over long periods of evolutionary history indicates that they are biologically important, but for now their functions are largely a mystery.

Published May 6 by Science Express (the online edition of the journal Science), these findings are the joint work of Gill Bejerano, a postdoctoral researcher at the University of California, Santa Cruz; David Haussler, professor of biomolecular engineering at UCSC and a Howard Hughes Medical Institute investigator; UCSC research scientist W. James Kent; and a team of researchers from the University of Queensland in Australia.

By scanning the human, rat, and mouse genomes for matching regions of 200 or more DNA bases (As, Cs, Gs, and Ts), the researchers found 481 regions that were completely unchanged. All of the unchanged regions, referred to as "ultra-conserved elements," were also found in the dog and chicken genomes, and two-thirds of them were found in the fish genome. But they could not be traced beyond the fish to nonvertebrate species whose genomes have been sequenced, such as the sea squirt, fly, and worm.
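The scan described above can be illustrated with a minimal sketch. It is not the researchers' actual pipeline, which the article does not describe. It assumes the three genomes have already been aligned into equal-length strings with "-" marking gaps, and the function name `ultraconserved_runs` and its parameters are hypothetical.

```python
def ultraconserved_runs(aligned, min_len=200):
    """Find stretches of an alignment where every sequence is identical.

    aligned: list of equal-length strings (assumed pre-aligned, '-' = gap).
    Returns a list of (start, end) column ranges, end exclusive, covering
    maximal runs of at least min_len identical, ungapped columns.
    """
    if not aligned:
        return []
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    bases = set("ACGT")
    runs = []
    start = None  # start column of the run currently being extended
    # Iterate one position past the end so a run touching the end is closed.
    for i in range(length + 1):
        if i < length:
            column = {seq[i].upper() for seq in aligned}
            # A column matches only if all sequences share one real base.
            match = len(column) == 1 and column <= bases
        else:
            match = False
        if match:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    return runs
```

With the threshold used in the study, a call would look like `ultraconserved_runs([human, mouse, rat], min_len=200)`, where `human`, `mouse`, and `rat` are placeholder names for the aligned sequences.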

"As far as we can tell, most of these ultra-conserved elements showed up during the evolution of vertebrates, perhaps during the period when land animals emerged, or a bit earlier. But their early evolutionary history is still mysterious," Haussler said.

Although they have been conserved meticulously through hundreds of millions of years of evolution, only a small fraction of these elements code for proteins. Protein coding, whereby DNA code directs the production of a specific protein, is how most genes carry out their functions. But fewer than a quarter of the ultra-conserved elements overlap coding regions of the human genome, and in most of those cases they overlap only a short span of the coding region and extend beyond it to noncoding areas.

Nevertheless, most of the 481 ultra-conserved elements appear to be associated with genes in some way, either overlapping them or residing near genes or in the noncoding portions of genes. Furthermore, they tend to be associated with parts of the genome that are involved in regulating the expression of genes in various ways.

"These parts of the genome are far more conserved than we would have imagined. We think these segments evolved in the past, then froze into place and were inherited unchanged from then on," Bejerano said.

More than half of the ultra-conserved elements that overlap coding regions are associated with genes that take more than one form, depending on how they are transcribed to RNA. Through a process known as alternative splicing, different parts of a gene may be spliced out under different circumstances, so that a single gene can produce several different proteins. Bejerano thinks the association of ultra-conserved elements with alternatively spliced genes is significant.

"It's a cautious hypothesis that these elements may cause some type of interaction to determine what part of the gene will be spliced out," he said.

The ultra-conserved elements that do not overlap with any coding region tend to be found in regions of the genome that are associated with gene regulation, the transcription of DNA to RNA, or the binding of regulatory proteins to the DNA.

"There was some speculation among biologists as to whether we would find new kinds of things when we sequenced the complete genomes of animals, including our own genome, or if it would just be more of the same kinds of things that were already known. Well, it's not just more of the same," Haussler said.

The discovery of ultra-conserved elements in the human genome came from investigating the 5 percent of the genome known to be highly conserved among the human, mouse, and rat. Most of this highly conserved DNA (3.5 percent of the genome) is in noncoding regions and has no known purpose. Bejerano and his colleagues sought to find functional elements within that 3.5 percent.

"We began by looking for meaningful families within these conserved, noncoding elements," he said.

Some of them fell into clusters of at least two elements with a common genomic ancestor. These were most likely duplications within one genome that were retained and modified over time. In one such family, the different elements within the human genome were 80 to 90 percent similar, while each human element was 96 percent similar to the corresponding elements in the mouse and rat genomes.

Because 96 percent agreement between the similar elements in two species was surprisingly high, the researchers wondered if they could find other areas of such agreement. They did, and eventually looked for 100 percent agreement, which uncovered the 481 ultra-conserved segments that were the focus of this research.

Looking beyond the human, mouse, and rat genomes for earlier evidence of these ultra-conserved elements, the researchers found that 97 percent of them can be aligned with similar regions in the chicken with 95 percent agreement, even though only 4 percent of the human genome can be aligned with the chicken genome at all. Human and chicken lines are thought to have diverged about 300 million years ago. In the fugu fish, which diverged from the human line more than 400 million years ago, two-thirds of the ultra-conserved regions could be aligned with 77 percent similarity.
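Agreement figures such as 95 or 77 percent are typically computed as percent identity over an alignment. The sketch below shows one common convention and is not necessarily the one used in the study: columns where both sequences have a gap are ignored, and a gap in only one sequence counts as a mismatch. The function name `percent_identity` is hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of compared alignment columns where two sequences agree.

    Assumes seq_a and seq_b are already aligned (equal length, '-' = gap).
    Columns gapped in both sequences are skipped; a gap in only one
    sequence is counted as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # nothing to compare in this column
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared
```

Other conventions, such as excluding every gapped column from the denominator, give slightly different numbers for the same alignment.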

Because they were not able to trace the ultra-conserved segments to even more distant species, the authors speculate that these particular parts of the genome represent innovations in the genomes of chordate species that evolved rapidly at first, then became effectively frozen in birds and mammals.

"These ultra-conserved elements are long, they evolved rather rapidly, and they are now evolutionarily frozen. We don't know of a biomolecular mechanism that would explain them," Haussler said.

When the researchers compared individuals within the human population, they also found little variation in these ultra-conserved elements. To determine this, the team combed the conserved elements for genome variations called single nucleotide polymorphisms (SNPs), which are changes in individual genomes that are often used in genetic testing to distinguish one human from another. They found 20-fold fewer SNPs in the conserved regions than would be expected if SNPs were distributed randomly throughout the genome.

Interestingly, the rate of change in these regions across species also appears to be 20-fold lower than expected, supporting the idea that the conserved elements evolve about 20 times more slowly than the genome as a whole.
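The "20-fold fewer" comparison amounts to dividing an expected SNP count by the observed one. The sketch below assumes the simplest null model, in which SNPs fall uniformly at random across the genome. The article does not give the underlying counts, so the function name `snp_fold_depletion`, its parameters, and the numbers in the usage note are all hypothetical.

```python
def snp_fold_depletion(observed_snps, region_bases, genome_snps, genome_bases):
    """Ratio of expected to observed SNPs in a set of regions.

    Expected count assumes SNPs are spread uniformly across the genome:
    expected = genome_snps * (region_bases / genome_bases).
    A result of 20.0 means the regions hold 20-fold fewer SNPs than expected.
    """
    if observed_snps <= 0:
        raise ValueError("observed_snps must be positive to form a ratio")
    if genome_bases <= 0:
        raise ValueError("genome_bases must be positive")
    expected = genome_snps * region_bases / genome_bases
    return expected / observed_snps
```

For example, with made-up inputs of 3,000,000 known SNPs in a 3,000,000,000-base genome, 100,000 bases of conserved sequence would be expected to hold 100 SNPs, so observing 5 would correspond to a 20-fold depletion.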

The only other part of the genome having a level of conservation approaching that of these ultra-conserved elements is the DNA that codes for ribosomes and their actions in the cells. Ribosomes are complex molecular machines made of RNA and proteins. They translate the genetic code to carry out protein synthesis in all cells. According to Bejerano, ribosomal sequences are highly conserved because they are essential to all forms of life.

"Ribosomes are crucial. If anything goes wrong with them, the organism will not survive," he said.

The DNA sequences that code for ribosomal RNA contain long stretches of bases that are perfectly conserved throughout evolution. Unlike the ultra-conserved elements uncovered in this study, though, ribosomal RNA is ancient and is common to all species.

"The ultra-conserved elements are not nearly as old as the previously known conserved elements in the ribosomal RNA, which is truly ancient and fundamental to all branches of life," Haussler noted.

In addition to Bejerano, Haussler, and Kent, the coauthors on the new paper include University of Queensland researchers Michael Pheasant, Igor Makunin, Stuart Stephen, and John Mattick.

The work at UC Santa Cruz on ultra-conserved elements in the human genome was supported by the National Human Genome Research Institute (NHGRI), the National Cancer Institute, and the Howard Hughes Medical Institute.