Documentation
Data
The content of the SARS-CoV-2 databases has been compiled from several publicly available sequence databases accessible through the INSDC (International Nucleotide Sequence Database Collaboration). In addition, publicly available data from the Johns Hopkins University Coronavirus Resource Center and literature databases such as Europe PMC and PubMed have been used for curation.
Sequence Data
The sequence data, including assemblies and nucleotide and protein sequences, have been downloaded from ENA (European Nucleotide Archive). The sequences can be downloaded from the “Download” pages.
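Once a FASTA file has been obtained from the download pages, it can be read with standard-library Python alone. The following is a minimal sketch rather than a tool provided by this resource: the function name `parse_fasta` and the example record are made up for illustration.

```python
from typing import Dict, List


def parse_fasta(text: str) -> Dict[str, str]:
    """Map each record identifier (first word of the header) to its sequence."""
    records: Dict[str, List[str]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: keep only the identifier, drop the description
            words = line[1:].split()
            current = words[0] if words else ""
            records[current] = []
        elif current is not None:
            # Sequence line belonging to the most recent header
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical record, not taken from the database
example = ">seq1 illustrative record\nACGT\nTTGA\n"
sequences = parse_fasta(example)
print(sequences)
```

For large files, a streaming parser that yields one record at a time would be preferable to holding every sequence in memory.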
Contextual database
The contextual database has been compiled from available metadata parsed from BioProject (Study), BioSample (Sample), Assembly (Analysis) and Run (Experiment) files. The parsed contextual data has been curated to ensure consistency across the database. Phylogenetic lineages have been predicted using Pangolin 2.0 (https://github.com/cov-lineages/pangolin).
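Pangolin writes its lineage assignments to a CSV report. As an illustration of how such a report might be summarised, the sketch below counts the sequences assigned to each lineage. It assumes the report has a `lineage` column alongside a `taxon` column; the column set can differ between Pangolin versions, so check the header of the actual file. The function name `count_lineages` and the sample rows are hypothetical.

```python
import csv
import io
from collections import Counter


def count_lineages(report_csv: str) -> Counter:
    """Count sequences per lineage in a Pangolin-style CSV report.

    Assumes a 'lineage' column is present (an assumption about the report layout).
    """
    reader = csv.DictReader(io.StringIO(report_csv))
    # Rows with an empty or missing lineage value are ignored
    return Counter(row["lineage"] for row in reader if row.get("lineage"))


# Made-up report content for illustration only
sample_report = "taxon,lineage\nsampleA,B.1\nsampleB,B.1.1.7\nsampleC,B.1\n"
counts = count_lineages(sample_report)
print(counts.most_common())
```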
BLAST databases
The BLAST database has been created in-house and can be downloaded from the “Download” pages.
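After downloading, the database can be searched locally with the NCBI BLAST+ tools. When results are written in tabular form (`-outfmt 6`), each hit is a tab-separated line with twelve default columns. The sketch below parses that format; it is an example under stated assumptions, not part of this resource. The function name `parse_blast_tabular` and the sample hit are invented, and the parser assumes the default column set was not customised.

```python
from typing import Dict, List

# Default column order of BLAST+ tabular output (-outfmt 6)
BLAST_TABULAR_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_blast_tabular(text: str) -> List[Dict[str, object]]:
    """Parse default-format tabular BLAST output into a list of hit dictionaries."""
    hits: List[Dict[str, object]] = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        values = line.split("\t")
        if len(values) != len(BLAST_TABULAR_FIELDS):
            raise ValueError(f"expected 12 columns, got {len(values)}: {line!r}")
        hit: Dict[str, object] = dict(zip(BLAST_TABULAR_FIELDS, values))
        # Convert numeric columns from text to numbers
        for name in INT_FIELDS:
            hit[name] = int(values[BLAST_TABULAR_FIELDS.index(name)])
        for name in FLOAT_FIELDS:
            hit[name] = float(values[BLAST_TABULAR_FIELDS.index(name)])
        hits.append(hit)
    return hits


# Made-up hit line for illustration only
sample = "query1\tsubjectA\t99.5\t200\t1\t0\t1\t200\t101\t300\t1e-100\t365\n"
for hit in parse_blast_tabular(sample):
    print(hit["sseqid"], hit["pident"], hit["evalue"])
```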
Data repository
The sequences can be downloaded from the “Download” page.
REST API
An API gives programmatic access to the data. For more details, see here.