UCSC awarded $5 million grant for genome research

The ENCODE project is building a parts list of biologically functional elements in the human genome.

The National Human Genome Research Institute (NHGRI) today announced a $5 million grant to the University of California, Santa Cruz, as part of a four-year project to build a "parts list" of biologically functional elements in the human genome. Under the grant, a team led by James Kent, associate research scientist at UCSC's Center for Biomolecular Science and Engineering (CBSE), will establish a Data Coordination Center for the ENCyclopedia Of DNA Elements (ENCODE) project.

In its pilot phase, ENCODE yielded provocative new insights into the organization and function of the human genome (see earlier story). Kent's group handled data coordination efforts during the pilot phase, which focused on just 1 percent of the genome. Now, NHGRI is scaling up the ENCODE project to survey the entire genome, awarding more than $80 million in grants to support this next phase.

"The data will be coming from about a dozen different labs, and they'll be generating data on a pretty large scale," Kent said. "Our role is to be good librarians for the data--somewhere between a resource library and Google. So we'll be capturing data from diverse sources, putting it into a database, and building tools that will make it easy for people to find what they're looking for."

Kent's team includes CBSE director David Haussler, professor of biomolecular engineering, as well as software developers Kate Rosenbloom and Rachel Harte, project manager Donna Karolchik, quality assurance manager Robert Kuhn, graduate student Daryl Thomas, and postdoctoral scholar Ting Wang.

Kent and Haussler played a crucial role in the international Human Genome Project, assembling the finished human genome sequence and making it publicly available to researchers worldwide through the UCSC Genome Browser. Kent created the Genome Browser, which now incorporates a wide range of information on the genomes of many different organisms and is a valuable tool used by biomedical researchers throughout the world.

While the sequencing of the human genome was a major scientific achievement, it was just the first step toward the ultimate goal of using genomic information to diagnose, treat, and prevent disease. In recent years, researchers have made major strides in using DNA sequence data to help find genes, which are the parts of the genome that code for proteins. The protein-coding component of these genes, however, makes up just a small fraction of the human genome--about 1.5 percent. There is strong evidence that other parts of the genome have important functions, but very little information exists about where these other functional elements are located and how they work. The ENCODE project aims to fill this critical gap in genomics research.

In June, the ENCODE research consortium published a set of landmark papers in the journals Nature and Genome Research that found the organization, function, and evolution of the genome to be far more complicated than most had suspected. For example, while researchers have traditionally focused on studying genes and their associated proteins, the ENCODE data indicate the genome is a very complex, interwoven network in which genes are just one of many types of DNA sequences with functional impact.

"Some of the most exciting parts of the project are telling us how the genome as a whole works together and how genes are regulated," Kent said. "Only about 2 percent of the genome is producing proteins and other molecules that we know are directly functional. It looks like as much or more of the genome is concerned with the control processes that keep the parts working smoothly together."

The ENCODE data will be incorporated into the UCSC Genome Browser as well as other existing tools for exploring and studying the genome. Kent's team will also develop tools specifically designed to handle ENCODE data. On the browser, different tracks enable users to view different kinds of information about a particular sequence in the genome. But the diversity of the ENCODE data, in addition to its sheer quantity, presents a challenge.

"We don't want to end up with a thousand different tracks, which would be too much to sort through on the browser. So part of the work is to let users see the data at various levels and be able to dig down to deeper levels of information when they find an area that's particularly interesting," Kent said.

In addition to running the Data Coordination Center, CBSE researchers are working with ENCODE investigators at other institutions on different aspects of the project.

A related project funded by NHGRI, the model organism ENCODE (modENCODE) project, also involves UCSC researchers. Whereas the main ENCODE project focuses on the human genome, modENCODE aims to identify all of the functional elements in the genomes of two widely used model organisms--the roundworm Caenorhabditis elegans and the fruit fly Drosophila melanogaster. Susan Strome, professor of molecular, cell, and developmental biology, is part of a $7 million modENCODE project to study DNA packaging in the roundworm.

More information about the ENCODE project is available on the project's web site.