Covid dashboard

Anton Kovalsky

This article provides additional details about the workflows in the [COVID-19] workspace. For each workflow listed in the workspace dashboard, the tables below outline example inputs and outputs.

Getting Data into your workspace 

genbank_ingest
Pulls fasta files and metadata for SARS-CoV-2 from NCBI Virus and formats the metadata for NextStrain workflows.
Input                  Output
Google_maps_api_key    Seqs_fasta
User_email             seqs_metadata

 

Fetch_sra_to_bam

Given an SRA_ID as input, this workflow downloads .fastq files from SRA and produces the unaligned BAM (uBAM) file needed for viral assembly.

Inputs           Outputs
Sra_accession    biosample_accession
                 library_id
                 run_date
                 sample_collected_by
                 sample_collection_date
                 sample_geo_loc
                 sample_strain
                 sequencing_center
                 sequencing_platform
                 sequencing_platform_model
                 sra_metadata
                 reads_uBAM

 

Sequence Data processing and assembly

Fastq_to_ubam

This workflow accepts paired-end or single-end fastq files and converts them to uBAM files.

Input                Example value
fastq_1              "gs://test/pass1_fastq.gz"
fastq_2              "gs://test/pass2_fastq.gz"
library_name         veroSTAT-IKO-illumina
platform_name        Illumina
platform_unit        M01472
readgroup_name       A
run_date             04082020
sample_name          Sample1
sequencing_center    Broad Institute
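
The example values above can be collected into an inputs file. The sketch below is one way to do that in Python. It assumes the common WDL convention of prefixing each input with the workflow name, and it assumes the workflow is named Fastq_to_ubam as in the heading. Neither detail is confirmed by this article, so check the input names shown in your workspace before relying on it.

```python
import json

# Assumption: the workflow name matches the heading in this article.
# Adjust it if the workflow in your workspace is named differently.
WORKFLOW_NAME = "Fastq_to_ubam"

# Example values copied from the table in this article.
example_inputs = {
    "fastq_1": "gs://test/pass1_fastq.gz",
    "fastq_2": "gs://test/pass2_fastq.gz",
    "library_name": "veroSTAT-IKO-illumina",
    "platform_name": "Illumina",
    "platform_unit": "M01472",
    "readgroup_name": "A",
    "run_date": "04082020",
    "sample_name": "Sample1",
    "sequencing_center": "Broad Institute",
}


def build_inputs_json(workflow_name, inputs):
    """Prefix every input name with the workflow name (WDL inputs JSON style)."""
    return {f"{workflow_name}.{name}": value for name, value in inputs.items()}


inputs_json = build_inputs_json(WORKFLOW_NAME, example_inputs)
print(json.dumps(inputs_json, indent=2))
```

The printed JSON can be saved to a file and compared against the input fields in the workflow configuration.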

 

Demux_deplete

Picard-based demultiplexing and basecalling from a tarball of a raw bcl directory, followed by QC metrics and depletion.

Input Output
   

 

Assemble_refbased
This workflow takes a raw read file (uBAM) and assembles a viral genome by aligning the reads to a reference genome.
Input Output
   

 

Calling Lineages and Clades

Sarscov2_lineages

Runs NextClade and PANGOlin on a single SARS-CoV-2 genome.
Input Output
   

 

Sarscov2_nextclade

Creates NextClade visualizations for many SARS-CoV-2 genomes.

Input Output
   

NextStrain Phylogenetic Analysis 

Sarscov2_nextstrain

Aligns assemblies, builds trees, and produces a JSON representation for NextStrain visualization.
Input Output
   

 

genbank_curate

Pulls fasta files and metadata from NCBI Virus and formats the metadata for use in NextStrain workflows.
Input Output
   

All-in-one Workflows 

[Image: workflow diagram (Screen_Shot_2021-03-26_at_12.00.09_PM.png)]

 

Sarscov2_illumina_full

This workflow is a combination of several workflows, as illustrated in the diagram above.

Input                   Description
samplesheet             Custom-formatted tsv samplesheet
spikein_db              Spike-in fasta file
author_template_sbt     List of authors for submission to NCBI
spuid_namespace         Defaults to Broad_GCID
amplicon_bed_prefix     Amplicon primers to trim, in reference coordinate space
biosample_attributes    Post-submission attributes file generated by NCBI after BioSample submission
flowcell_tgz            gs:// path to flowcell.tar.gz
instrument_model        Controlled-vocabulary text field for the instrument model
reference_fasta         Reference genome to align reads to
sra_title               Title that will appear in the NCBI SRA packed file (free text)
sample_rename_map       Two-column tsv that links the sample ID assigned to a sample upon receipt in the lab to a reformatted sample ID acceptable to NCBI
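
To illustrate the sample_rename_map input, here is a minimal Python sketch that reads a two-column tsv into a lookup table and rejects malformed or duplicated rows. The sample IDs are hypothetical, and the sketch assumes the file has no header row, which this article does not specify.

```python
import csv
import io

# Hypothetical file content: column 1 is the lab-assigned sample ID,
# column 2 is the reformatted ID intended for NCBI. No header row is assumed.
example_tsv = "lab_id_001\tncbi_id_001\nlab_id_002\tncbi_id_002\n"


def load_rename_map(handle):
    """Read a two-column tsv into a dict of lab ID -> NCBI-ready ID."""
    mapping = {}
    reader = csv.reader(handle, delimiter="\t")
    for line_number, row in enumerate(reader, start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 2:
            raise ValueError(f"line {line_number}: expected 2 columns, got {len(row)}")
        lab_id, ncbi_id = row
        if lab_id in mapping:
            raise ValueError(f"line {line_number}: duplicate sample ID {lab_id!r}")
        mapping[lab_id] = ncbi_id
    return mapping


rename_map = load_rename_map(io.StringIO(example_tsv))
print(rename_map["lab_id_001"])
```

Checking the file this way before launching the workflow can catch missing tabs or repeated sample IDs early.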

 

sarscov2_sra_to_genbank

 
Input Output
   

 

sarscov2_genbank

 
Input Output
   

 

 

 
