This article contains additional details about the workflows found in the [COVID-19] workspace. Below, you'll find a table of example inputs and outputs for each workflow listed in the workspace dashboard.
Getting Data into your workspace
genbank_ingest | |
Pulls fasta files and metadata from NCBI Virus for SARS-CoV-2 and formats the metadata for NextStrain workflows. | |
Input | Output |
Google_maps_api_key | Seqs_fasta |
User_email | seqs_metadata |
Fetch_sra_to_bam | |
Given an SRA_ID as input, this workflow downloads .fastq files from SRA and produces the unaligned BAM (uBAM) file needed for viral assembly. | |
Inputs | Outputs |
Sra_accession | biosample_accession, library_id, run_date, sample_collected_by, sample_collection_date, sample_geo_loc, sample_strain, sequencing_center, sequencing_platform, sequencing_platform_model, sra_metadata, reads_uBAM |
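Because this workflow takes a single SRA identifier, it can help to check a list of IDs before launching it. The following is a minimal sketch, not part of the workspace: it assumes the IDs are run-level accessions, which begin with SRR, ERR, or DRR followed by digits. Whether the workflow also accepts other accession types is not stated here, and the function name and sample IDs are hypothetical.

```python
import re

# Run-level SRA accessions begin with SRR (NCBI), ERR (EBI) or DRR (DDBJ),
# followed by digits. Assumption: the workflow is given run-level accessions.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def is_run_accession(sra_id: str) -> bool:
    """Return True if sra_id looks like a run-level SRA accession."""
    return RUN_ACCESSION.fullmatch(sra_id.strip()) is not None


# Hypothetical IDs, for illustration only.
candidates = ["SRR0000001", "ERR0000002", "sample_1"]
valid = [c for c in candidates if is_run_accession(c)]
print(valid)
```

A check like this only catches formatting mistakes; it does not confirm that the accession exists in SRA.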
Sequence Data processing and assembly
Fastq_to_ubam | |
This workflow accepts paired-end or single-end fastq files and converts them to uBAM files. | |
Input | Output |
fastq_1 | "gs://test/pass1_fastq.gz" |
fastq_2 | "gs://test/pass2_fastq.gz" |
library_name | veroSTAT-IKO-illumina |
platform_name | Illumina |
platform_unit | M01472 |
readgroup_name | A |
run_date | 04082020 |
sample_name | Sample1 |
sequencing_center | Broad Institute |
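The example values above can be collected into an inputs file. The sketch below is illustrative only: it assumes inputs are keyed as `<workflow name>.<input name>`, which is the usual convention for WDL inputs JSON files, and that the workflow name matches the table title. Confirm both against the workflow's configuration in the workspace. The helper name `build_inputs_json` is hypothetical.

```python
import json

# Assumption: the workflow name used as the key prefix matches the table title.
WORKFLOW_NAME = "Fastq_to_ubam"

# Example values copied from the table above.
example_inputs = {
    "fastq_1": "gs://test/pass1_fastq.gz",
    "fastq_2": "gs://test/pass2_fastq.gz",  # omit for single-end data
    "library_name": "veroSTAT-IKO-illumina",
    "platform_name": "Illumina",
    "platform_unit": "M01472",
    "readgroup_name": "A",
    "run_date": "04082020",
    "sample_name": "Sample1",
    "sequencing_center": "Broad Institute",
}


def build_inputs_json(workflow_name: str, inputs: dict) -> str:
    """Prefix each input with the workflow name and serialize to JSON."""
    for key in ("fastq_1", "fastq_2"):
        path = inputs.get(key)
        # The example fastq paths are Google Cloud Storage paths.
        if path is not None and not path.startswith("gs://"):
            raise ValueError(f"{key} should be a gs:// path, got {path!r}")
    prefixed = {f"{workflow_name}.{name}": value for name, value in inputs.items()}
    return json.dumps(prefixed, indent=2)


inputs_json = build_inputs_json(WORKFLOW_NAME, example_inputs)
print(inputs_json)
```

For single-end data, drop the `fastq_2` entry from the dictionary before building the file.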
Demux_deplete | |
Picard-based demultiplexing and basecalling from a tarball of a raw bcl directory, followed by QC metrics and depletion. | |
Input | Output |
Assemble_refbased | |
This takes a raw read file (uBAM) and assembles a viral genome by aligning to a reference genome. | |
Input | Output |
Calling Lineages and Clades
Sarscov2_lineages | |
Call NextClade and PANGOlin on a single SARS-CoV-2 genome | |
Input | Output |
Sarscov2_nextclade | |
Create NextClade visualizations on many SARS-CoV-2 genomes | |
Input | Output |
NextStrain Phylogenetic Analysis
Sarscov2_nextstrain | |
Aligns assemblies, builds trees, and produces a JSON representation for NextStrain visualization. | |
Input | Output |
genbank_curate | |
Pulls fasta files and metadata from NCBI Virus and formats the metadata for use in NextStrain workflows. | |
Input | Output |
All-in-one Workflows
Sarscov2_illumina_full | |
This workflow is a combination of several workflows, as illustrated in the diagram above. | |
Input | Output |
samplesheet | Custom formatted tsv samplesheet |
spikein_db | Spike-in fasta file |
author_template_sbt | List of authors for submission to NCBI |
spuid_namespace | Defaulted to Broad_GCID |
amplicon_bed_prefix | Amplicon primers to trim in reference coordinate space |
biosample_attributes | The post-submission attributes file that is generated by NCBI after BioSample submission |
flowcell_tgz | gs:// path to flowcell.tar.gz |
instrument_model | Controlled vocabulary text field for instrument model |
reference_fasta | Reference genome to align reads to |
sra_title | Title that will appear in NCBI SRA packed file (free text) |
sample_rename_map | Two-column tsv that links the sample id assigned to a sample upon receipt in the lab to a reformatted sample id acceptable to NCBI |
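As an illustration of the sample_rename_map input, the sketch below writes a two-column, tab-separated mapping from lab sample ids to reformatted ids. It is a sketch under assumptions: whether the workflow expects a header row, and what the column names would be, is not stated here, so the header is optional and off by default. The function name and all sample ids are hypothetical.

```python
import csv
import io


def write_rename_map(pairs, handle, header=None):
    """Write (lab_id, reformatted_id) pairs as a two-column TSV.

    header is optional because the expected header row, if any,
    is not documented here (assumption).
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if header is not None:
        writer.writerow(header)
    seen = set()
    for lab_id, reformatted_id in pairs:
        # Each lab id should map to exactly one reformatted id.
        if lab_id in seen:
            raise ValueError(f"duplicate lab sample id: {lab_id}")
        seen.add(lab_id)
        writer.writerow([lab_id, reformatted_id])


# Hypothetical ids, for illustration only.
pairs = [("lab_0001", "SAMPLE-0001"), ("lab_0002", "SAMPLE-0002")]

buffer = io.StringIO()
write_rename_map(pairs, buffer)
rename_map_text = buffer.getvalue()
print(rename_map_text, end="")
```

Writing to a file handle instead of `io.StringIO` produces a tsv file that could be uploaded to the workspace bucket and referenced by its gs:// path.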
sarscov2_sra_to_genbank | |
Input | Output |
sarscov2_genbank | |
Input | Output |