The browser team at the UC Santa Cruz Genomics Institute has launched a new landing page for resources related to the COVID-19 pandemic, including the SARS-CoV-2 Genome Browser and lung gene expression datasets on the UCSC Cell Browser.
SARS-CoV-2 is the novel coronavirus that causes the disease COVID-19, which is now a global pandemic. The UCSC Genome Browser team posted the SARS-CoV-2 genome assembly on the browser in early February and has now posted the first release of novel coronavirus annotation data. Most of these data were sourced from outside groups and include gene annotations and variant data; the release also includes multiple genome alignments produced locally.
The UCSC Genome Browser is an interactive web-based interface that aims to facilitate genome research by offering data visualization, genome annotations, and other tools. It is used by researchers all over the world to study genetic data.
The UCSC Cell Browser is an interactive viewer for single-cell gene-expression data. To help researchers studying the effects of COVID-19 in the lungs, the browser team has added multiple lung datasets to the Cell Browser.
The SARS-CoV-2 Genome Browser now includes the following data annotation tracks:
- NCBI Genes – This track shows genes annotated on the SARS-CoV-2 genome released by the National Center for Biotechnology Information (NCBI) on 2/16/20.
- Nextstrain Genes – This track shows genes annotated by Nextstrain.org in relation to their collection and processing of SARS-CoV-2 variant data from the Global Initiative on Sharing All Influenza Data (GISAID).
- UniProt Protein Annotations – A collection of 11 tracks that show protein sequence annotations from the UniProt/SwissProt database, mapped to genomic coordinates. All data have been curated from scientific publications by the UniProt/SwissProt staff.
- Immune Epitope Database and Analysis Resource (IEDB) Epitopes – This track shows immune epitope predictions for B cells, CD4 T cells, and CD8 T cells, generated with various software packages.
- RT-PCR Primers – This track shows RT-PCR primers used in viral detection kits, aligned to the SARS-CoV-2 genome. The primers come from six different sources, including government agencies from the US, China, Japan, and Thailand.
- Nextstrain Variants – This track displays all single-nucleotide variants in the thousands of SARS-CoV-2 genome sequences from GISAID collected and processed by Nextstrain.org. This track can be used to examine variation, protein changes, and sequence conservation among SARS-CoV-2 sequences.
- Nextstrain Clades – This track shows the location of variants that distinguish each of the branches of interest defined by Nextstrain.org and can be used in conjunction with their tree and map diagrams to examine viral lineage.
- Multiz Alignment & Conservation (44 Strains with bats as hosts) – This track shows multiple alignments of 44 virus sequences, aligned to the SARS-CoV-2 reference sequence SARS-CoV-2/NC_045512.2. It also includes measurements of evolutionary conservation across all 44 virus sequences, computed with two methods from the PHAST package (phastCons and phyloP).
Additional information is available on the Genome Browser news site. The SARS-CoV-2 Genome Browser and data annotation tracks are funded by generous individual donors including Pat & Rowland Rebele.