Covid dashboard

Anton Kovalsky

This article contains additional details about the workflows found in the COVID-19 workspace. You'll find a table outlining example inputs and outputs for each of the workflows listed in the workspace dashboard below. For more information about this workspace, see COVID-19 workspaces, data, and tools in Terra.

Getting Data into your workspace 

genbank_ingest
Pulls FASTA files and metadata from NCBI Virus for SARS-CoV-2 and formats the metadata for NextStrain workflows.
Input Output
Google_maps_api_key Seqs_fasta
User_email seqs_metadata

 

Fetch_sra_to_bam

This workflow downloads .fastq files from SRA, given an SRA_ID as input. The workflow outputs the unaligned BAM (uBAM) file that is needed for viral assembly.

Input Output
Sra_accession biosample_accession
  library_id
  run_date
  sample_collected_by
  sample_collection_date
  sample_geo_loc
  sample_strain
  sequencing_center
  sequencing_platform
  sequencing_platform_model
  sra_metadata
  reads_uBAM
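
As a rough illustration of supplying this workflow's single input, the sketch below builds a Cromwell-style inputs JSON, in which each key is the workflow name joined to the input name with a dot. The helper `build_inputs_json`, the workflow name `fetch_sra_to_bam`, and the accession `SRR0000000` are placeholders assumed for this example. Check the workflow's input configuration in Terra for the exact input names.

```python
import json


def build_inputs_json(workflow_name, inputs):
    """Return a Cromwell-style inputs JSON string.

    Each key is prefixed with the workflow name, e.g. "wf.input_name".
    """
    qualified = {f"{workflow_name}.{name}": value for name, value in inputs.items()}
    return json.dumps(qualified, indent=2, sort_keys=True)


# Hypothetical workflow name and placeholder accession, for illustration only.
fetch_inputs = build_inputs_json("fetch_sra_to_bam", {"Sra_accession": "SRR0000000"})
print(fetch_inputs)
```

The same helper can be reused for any workflow in this article by changing the workflow name and the input dictionary.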

 

Sequence Data processing and assembly

Fastq_to_ubam

This workflow accepts paired-end or single-end fastq files and converts them to uBAM files.

Input Output
fastq_1 unmapped_bam
fastq_2  
library_name  
platform_name  
platform_unit  
readgroup_name  
run_date  
sample_name  
sequencing_center  
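
Because the workflow accepts either paired-end or single-end reads, one way to prepare its inputs is to include `fastq_2` only when a second read file exists. The sketch below assumes that `fastq_2` can be left unset for single-end data. The function names and the gs:// paths are hypothetical, and the remaining inputs from the table (library_name, platform_name, and so on) are omitted for brevity.

```python
def fastq_to_ubam_inputs(sample_name, fastq_1, fastq_2=None):
    """Collect inputs for one sample, omitting fastq_2 for single-end reads."""
    inputs = {"sample_name": sample_name, "fastq_1": fastq_1}
    if fastq_2 is not None:
        inputs["fastq_2"] = fastq_2
    return inputs


def read_layout(inputs):
    """Report whether an inputs dictionary describes paired-end or single-end reads."""
    return "paired-end" if "fastq_2" in inputs else "single-end"


# Hypothetical bucket paths, for illustration only.
single = fastq_to_ubam_inputs("sample_A", "gs://example-bucket/sample_A.fastq.gz")
paired = fastq_to_ubam_inputs(
    "sample_B",
    "gs://example-bucket/sample_B_R1.fastq.gz",
    "gs://example-bucket/sample_B_R2.fastq.gz",
)
print(read_layout(single), read_layout(paired))
```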

 

Demux_deplete

Picard-based demultiplexing and basecalling from a tarball of a raw bcl directory, followed by QC metrics and depletion.

Input Output
   

 

Assemble_refbased
This workflow takes a raw read file (uBAM) and assembles a viral genome by aligning the reads to a reference genome.
Input Output
reads_unmapped_bams  
reference_fasta  

 

Calling Lineages and Clades

Sarscov2_lineages

Calls NextClade and PANGOlin on a single SARS-CoV-2 genome.
Input Output
genome_fasta  

 

Sarscov2_nextclade_multi

Creates NextClade visualizations for many SARS-CoV-2 genomes.

Input Output
basename auspice_json
genome_fastas nextclade_json
  nextclade_tsv

NextStrain Phylogenetic Analysis 

Sarscov2_nextstrain

Aligns assemblies, builds trees, and produces a JSON representation for NextStrain visualization.
Input Output
build_name  
builds_yaml  

 

genbank_curate

Pulls FASTA files and metadata from NCBI Virus and formats the metadata for use in NextStrain workflows.
Input Output
   

All-in-one Workflows 

Diagram of the Sarscov2_illumina_full workflow. The diagram shows four colored circles connected in a line, with the labels 'demux_deplete', 'sarscov2_lineages', 'sarscov2_nextclade', and 'sarscov2_genbank'. The text above these circles reads, 'Full SARS-COV-2 analysis workflow starting from raw illumina flowcell (.tar.gz) and metadata and performing assembly, spike-in analysis, qc, lineage assignment, and packaging for data release.'

 

Sarscov2_illumina_full

This workflow is a combination of several workflows, as illustrated in the diagram above.

Input Explanation
samplesheet Custom-formatted TSV samplesheet
spikein_db Spike-in FASTA file
author_template_sbt List of authors for submission to NCBI
spuid_namespace Defaults to Broad_GCID
amplicon_bed_prefix Amplicon primers to trim, in reference coordinate space
biosample_attributes The post-submission attributes file that NCBI generates after BioSample submission
flowcell_tgz gs:// path to flowcell.tar.gz
instrument_model Controlled-vocabulary text field for the instrument model
reference_fasta Reference genome to align reads to
sra_title Title that will appear in the NCBI SRA packed file (free text)
sample_rename_map Two-column TSV that links the sample ID assigned to a sample upon receipt in the lab to a reformatted sample ID acceptable to NCBI
submitting_lab_name  
account_name  
bioproject  
ftp_config_js  
id_salt  
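
To make the sample_rename_map input described above more concrete, here is a minimal sketch that parses two-column TSV text into a lookup from lab sample ID to NCBI-ready sample ID. It assumes the file has no header row, which may not match the real file format. The function name `read_rename_map` and the sample IDs are made up for illustration.

```python
import csv
import io


def read_rename_map(tsv_text):
    """Parse two-column TSV text into {lab_sample_id: ncbi_sample_id}."""
    mapping = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row or all(not cell.strip() for cell in row):
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row}")
        lab_id, ncbi_id = (cell.strip() for cell in row)
        if lab_id in mapping:
            raise ValueError(f"duplicate lab sample id: {lab_id}")
        mapping[lab_id] = ncbi_id
    return mapping


# Hypothetical sample IDs, for illustration only.
example = "LAB-0001\tSITE-2020-0001\nLAB-0002\tSITE-2020-0002\n"
rename_map = read_rename_map(example)
print(rename_map["LAB-0001"])
```

Rejecting duplicate lab IDs and rows with the wrong number of columns is a design choice in this sketch, intended to surface formatting mistakes before a workflow run.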

 

 

sarscov2_sra_to_genbank

Full SARS-CoV-2 analysis workflow that starts from SRA data and metadata and performs assembly, spike-in analysis, QC, lineage assignment, and packaging of assemblies for data release.
Input Output
submitting_lab_name  
account_name  
author_template_sbt  
spuid_namespace  
submission_name  
submission_uid  
amplicon_bed_default  
reference_fasta  
spikein_db  
SRA_accessions  

 

sarscov2_genbank

Workflow to prepare SARS-CoV-2 assemblies for GenBank submission. It runs QC checks with NCBI's VADR tool and filters out genomes that do not pass those checks.
Input Output
submitting_lab_name gisaid_fasta
account_name gisaid_meta_csv
assemblies_fasta num_input
spuid_namespace num_successful
submission_name num_weird
submission_uid submission_xml
assembly_stats_tsv submission_zip
author_sbt_defaults_yaml submit_ready
author_sbt_j2_template vadr_output
biosample_attributes weird_genbank_xml
  weird_genbank_zip
  weird_gisaid_fasta
  weird_gisaid_meta_csv

 

 
