Scientists from around the world have announced a new challenge to find the best algorithms for detecting all of the abnormal RNA molecules in a cancer cell. The challenge is a collaborative, crowd-sourced benchmarking effort open to all scientists and enthusiasts.
Based on the success of other recent challenges, the new Somatic Mutation Calling RNA (SMC-RNA) challenge will use a cloud model in which contestants submit their algorithms, not their results, for evaluation. It will be the first challenge to use the new NCI Cloud Pilots, which will provide access to co-located data and shared tools, as well as some free compute credits for use by participants and for final scoring. The challenge was launched on Thursday, June 30, 2016. Registration is available at the Sage Bionetworks Synapse website.
Genomic rearrangements in cancer cells produce fusion transcripts, which may give rise to chimeric protein products not present in normal cells. In addition, cancer cells can express alternate forms of encoded messages that give rise to protein variants different from those in normal tissue. These chimeras and protein variants can serve as robust diagnostic markers or drug targets. Moreover, ongoing research efforts are beginning to unveil the potential clinical relevance of these variant RNA products. Expanding the known “alterome” of tumors by fully characterizing their RNA landscapes will deepen our understanding of cancer mechanisms, provide new biomarkers, and reveal possible new RNA-based therapeutics, thus improving personalized patient treatment.
“Predicting RNA species in a cancer cell is a particularly challenging task,” said Josh Stuart, professor of biomolecular engineering at UC Santa Cruz and one of the challenge leaders. “RNA expression reflects much of the deranged complexity of the underlying cancer cell DNA and then adds another level of derangement on top of that.”
RNA sequencing
The goal of the SMC-RNA Challenge is to identify the best methods for detecting rearrangements in RNA sequencing (RNA-seq) data. Sub-challenges focus on detecting and quantifying mRNA fusions and isoforms. Two key questions will be addressed: What is the best way to estimate the abundances of a set of known RNA isoforms, and what is the best way to predict the presence of novel gene fusions? For both questions, methods will be evaluated on in silico generated and wet lab spiked-in RNA sequencing data.
As in the SMC-Het challenge, contestants will contribute their code as self-contained virtual machines that can be run by the challenge administrators. The contestant code will then be executed on one of the three NCI Cloud Pilots that have been established to facilitate analysis of large-scale cancer genomics datasets.
The Cancer Genomics Cloud Pilots are designed to explore innovative methods for accessing and computing on large genomic data. They aim to bring data and analysis together on a single platform by creating a set of data repositories with co-located computational capacity and an application programming interface (API) that provides secure data access. The goals of the cloud pilots are to democratize access to NCI-generated genomic and related data and to create a cost-effective way to provide computational support to the cancer research community. Three contracts were awarded to develop the cloud pilots, to the Broad Institute, the Institute for Systems Biology, and Seven Bridges Genomics. Each of these groups is developing infrastructure and a set of tools to access, explore, and analyze molecular data.
“The NCI is intrigued by the potential of the DREAM challenge. Leveraging the cloud pilot concept to enable crowdsourcing to improve cancer transcript detection and quantification shows the kind of significant impact the cloud-based infrastructure can have,” said Tanja Davidsen, a biomedical informatics program manager at the National Cancer Institute.
Cloud compute
The challenge will initially leverage cloud compute available from the Institute for Systems Biology and then expand to include the compute provided by Seven Bridges and the Broad Institute. SMC-RNA is based on containerized software and portable workflow descriptions. As such, upon completion, any compatible cloud system will also be able to replicate the execution and evaluation of all submitted code.
To motivate a high level of collaboration, Sage Bionetworks’ Synapse platform provides leaderboards, the ability for teams to dynamically form and re-form as the Challenge proceeds, and a discussion forum where participants can share ideas. As an added incentive, all individuals and teams that submit a final model will be invited as consortium coauthors on an overview paper of the challenge that will be submitted to Nature Biotechnology as the official journal partner of the challenge. Top performers will receive travel awards and speaking invitations to the 2017 DREAM Conference.
“It is an exciting development to see several technologies converge on this challenge so elegantly,” said Kyle Ellrott, researcher with the OHSU Knight Cancer Institute, assistant professor at the OHSU School of Medicine, and one of the challenge leaders. “The cloud pilots are available to provide access to scalable compute to large datasets. With the SMC-Het, and now SMC-RNA, we employ an evaluation mechanism that produces reproducible bioinformatics methods. The results of these challenges can be used to solve important problems in cancer genomics. And at the end of the challenge any of the submitted techniques could be made available to the users of the cloud pilots for them to apply to their own data. It is truly a dynamic combination that is set to accomplish great things.”
The Ontario Institute for Cancer Research (OICR) is the central coordinating agency of the DREAM Challenge, led by DREAM Challenge director Dr. Paul Boutros of OICR.