This article describes the workflows in the COVID-19 workspace in more detail. For each workflow listed in the workspace dashboard, the tables below outline example inputs and outputs. For more information about this workspace, see COVID-19 workspaces, data, and tools in Terra.
Getting Data into your workspace
genbank_ingest | |
Pulls fasta files and metadata from NCBI Virus for SARS-CoV-2 and formats the metadata for NextStrain workflows. | |
Input | Output |
Google_maps_api_key | Seqs_fasta |
User_email | seqs_metadata |
Fetch_sra_to_bam | |
This workflow downloads .fastq files from SRA, given an SRA_ID as input, and produces the unaligned BAM (uBAM) file needed for viral assembly. | |
Inputs | Outputs |
Sra_accession | biosample_accession |
| library_id |
| run_date |
| sample_collected_by |
| sample_collection_date |
| sample_geo_loc |
| sample_strain |
| sequencing_center |
| sequencing_platform |
| sequencing_platform_model |
| sra_metadata |
| reads_uBAM |
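Because this workflow is driven by a single SRA accession, it can help to check accession strings before launching a submission. The following is a minimal sketch, not part of the workspace: it assumes the input is an SRA run accession (the prefixes SRR, ERR, or DRR followed by digits), and the function names are hypothetical. The workflow itself may accept other identifiers.

```python
import re

# Assumption: inputs are SRA *run* accessions, i.e. SRR/ERR/DRR plus digits.
_SRA_RUN = re.compile(r"[SED]RR\d+")


def is_sra_run_accession(value: str) -> bool:
    """Return True if the stripped value looks like an SRA run accession."""
    return _SRA_RUN.fullmatch(value.strip()) is not None


def filter_accessions(values):
    """Split candidates into (valid, rejected) lists, preserving order."""
    valid, rejected = [], []
    for value in values:
        cleaned = value.strip()
        (valid if is_sra_run_accession(cleaned) else rejected).append(cleaned)
    return valid, rejected
```

Running `filter_accessions` over a column of candidate IDs before filling in the data table separates well-formed run accessions from entries that need a second look, such as BioSample IDs pasted by mistake.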
Sequence Data processing and assembly
Fastq_to_ubam | |
This workflow accepts paired-end or single-end fastq files and converts them to uBAM files. | |
Input | Output |
fastq_1 | unmapped_bam |
fastq_2 | |
library_name | |
platform_name | |
platform_unit | |
readgroup_name | |
run_date | |
sample_name | |
sequencing_center | |
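Workflow inputs are commonly supplied as a JSON object whose keys take the form `workflow_name.input_name`. The sketch below builds such an object from the input names in the table above. The workflow name `fastq_to_ubam`, the `gs://` path, and the idea that `fastq_2` can simply be left out for single-end data are assumptions made for illustration. Check the workflow's own input list before relying on them.

```python
import json


def build_inputs(workflow_name, values):
    """Prefix each input with the workflow name and drop unset (None) values."""
    return {
        f"{workflow_name}.{name}": value
        for name, value in values.items()
        if value is not None
    }


# Single-end example: fastq_2 is left unset, so it is omitted from the JSON.
single_end = build_inputs(
    "fastq_to_ubam",  # assumed workflow name
    {
        "fastq_1": "gs://my-bucket/sample1_R1.fastq.gz",  # hypothetical path
        "fastq_2": None,
        "sample_name": "sample1",
        "library_name": "lib1",
        "platform_name": "ILLUMINA",
    },
)

print(json.dumps(single_end, indent=2))
```

Dropping `None` values rather than writing them as `null` keeps the same helper usable for both paired-end and single-end samples.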
Demux_deplete | |
Picard-based demultiplexing and basecalling from a tarball of a raw bcl directory, followed by QC metrics and depletion. | |
Input | Output |
Assemble_refbased | |
This takes a raw read file (uBAM) and assembles a viral genome by aligning to a reference genome. | |
Input | Output |
reads_unmapped_bams | |
reference_fasta | |
Calling Lineages and Clades
Sarscov2_lineages | |
Calls NextClade and PANGOlin on a single SARS-CoV-2 genome. | |
Input | Output |
genome_fasta | |
Sarscov2_nextclade_multi | |
Creates NextClade visualizations on many SARS-CoV-2 genomes. | |
Input | Output |
basename | auspice_json |
genome_fastas | nextclade_json |
| nextclade_tsv |
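Once the `nextclade_tsv` output has been downloaded, a quick tally of clade assignments is often the first thing to look at. The sketch below assumes the file is tab-separated with a header row containing a `clade` column. That column name, the helper name `count_clades`, and the example rows are illustrative assumptions, not taken from the workspace.

```python
import csv
import io
from collections import Counter


def count_clades(tsv_text, clade_column="clade"):
    """Count how many sequences were assigned to each clade."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # Skip rows where the clade cell is missing or empty.
    return Counter(
        row[clade_column] for row in reader if row.get(clade_column)
    )


# Made-up example content standing in for a downloaded TSV file.
example = "seqName\tclade\nsampleA\t20I\nsampleB\t21J\nsampleC\t21J\n"
clade_counts = count_clades(example)
```

For a real file, read its contents with `open(path).read()` and pass the text to `count_clades`.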
NextStrain Phylogenetic Analysis
Sarscov2_nextstrain | |
Aligns assemblies, builds trees, and produces a JSON representation for NextStrain visualization. | |
Input | Output |
build_name | |
builds_yaml | |
genbank_curate | |
Pulls fasta files and metadata from NCBI Virus and formats the metadata for use in NextStrain workflows. | |
Input | Output |
All-in-one Workflows
Sarscov2_illumina_full | |
This workflow is a combination of several workflows, as illustrated in the diagram above. | |
Input | Explanation |
samplesheet | Custom formatted tsv samplesheet |
spikein_db | Spike-in fasta file |
author_template_sbt | List of authors for submission to NCBI |
spuid_namespace | Defaulted to Broad_GCID |
amplicon_bed_prefix | Amplicon primers to trim in reference coordinate space |
biosample_attributes | The post-submission attributes file that is generated by NCBI after BioSample submission |
flowcell_tgz | gs:// path to flowcell.tar.gz |
instrument_model | Controlled vocabulary text field for instrument model |
reference_fasta | Reference genome to align reads to |
sra_title | Title that will appear in NCBI SRA packed file (free text) |
sample_rename_map | Two column tsv that links the sample id assigned to sample upon receipt in lab to a reformatted sample id acceptable to NCBI |
submitting_lab_name | |
account_name | |
bioproject | |
ftp_config_js | |
id_salt | |
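The `sample_rename_map` input is described above as a two-column tsv linking lab-assigned sample ids to NCBI-acceptable ids. A minimal sketch of writing such a file follows. The header names `internal_id` and `external_id`, the helper name, and the example ids are hypothetical, since the expected header is not documented here. Confirm the required column names with the workflow before use.

```python
import csv
import io


def write_rename_map(pairs, handle, header=("internal_id", "external_id")):
    """Write (lab id, submission id) pairs as a two-column TSV.

    Raises ValueError if two lab ids would map to the same submission id.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(header)  # header names are an assumption
    seen = set()
    for internal_id, external_id in pairs:
        if external_id in seen:
            raise ValueError(f"duplicate submission id: {external_id}")
        seen.add(external_id)
        writer.writerow([internal_id, external_id])


buffer = io.StringIO()
write_rename_map([("LAB-001", "Sample_001"), ("LAB-002", "Sample_002")], buffer)
rename_map_text = buffer.getvalue()
```

To produce a file instead of an in-memory string, pass a handle opened with `open(path, "w", newline="")`.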
sarscov2_sra_to_genbank | |
Full SARS-CoV-2 analysis workflow that starts from SRA data and metadata and performs assembly, spike-in analysis, QC, lineage assignment, and packaging of assemblies for data release. | |
Input | Output |
submitting_lab_name | |
account_name | |
author_template_sbt | |
spuid_namespace | |
submission_name | |
submission_uid | |
amplicon_bed_default | |
reference_fasta | |
spikein_db | |
SRA_accessions | |
sarscov2_genbank | |
Prepares SARS-CoV-2 assemblies for GenBank submission. It runs QC checks with NCBI's VADR tool and filters out genomes that do not pass its tests. | |
Input | Output |
submitting_lab_name | gisaid_fasta |
account_name | gisaid_meta_csv |
assemblies_fasta | num_input |
spuid_namespace | num_successful |
submission_name | num_weird |
submission_uid | submission_xml |
assembly_stats_tsv | submission_zip |
author_sbt_defaults_yaml | submit_ready |
author_sbt_j2_template | vadr_output |
biosample_attributes | weird_genbank_xml |
| weird_genbank_zip |
| weird_gisaid_fasta |
| weird_gisaid_meta_csv |