This article includes data storage and analysis cost estimates for selected featured workspaces. Actual time and cost will vary depending on the size of your dataset and whether you use preemptible VMs.
Ultima Genomics whole genome germline
This workspace contains a fully reproducible example workflow for pre-processing germline whole-genome sequence data derived from the Ultima Genomics Platform.
Estimated time and cost to run on sample data
Workflow Configuration | Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|---|
Ultima_Genomics | downsampled_NA12878 | ~3.00 GB | 3 hr 46 min | 0.98 |
Ultima_Genomics | 004731-UGAv3-30-CTGCCAGACTGTGA | 55.62 GB | 26 hrs | 15.56 |
CNest - Terra
This workspace runs CNest, a copy number estimator and variant caller developed for large scale analysis of copy number from NGS data.
It primarily uses read depth information to generate robust copy number estimates for individual samples and is most appropriate for use in very large cohorts (minimum of 1000 samples).
Estimated time and cost to run on sample data
Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|
NA12878 | 64.89 GB | 3:05:00 | 0.68 |
GATK4 Germline Preprocessing Variant Calling Joint Calling
This tutorial workspace contains notebooks and workflows for pre-processing and for SNP and indel variant calling.
Estimated time and cost to run on sample data
Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|
NA12878_24RG_small | 3.11 GB | 1:28:00 | 0.19 |
NA12878 | 64.89 GB | 22:35:00 | 5.23 |
downsampled-1kgp-50-exomes | 32.13 GB | 02:07:00 | 7.65 |
Human-Pangenome-Giraffe-DeepVariant-AnVIL-ASHG-Jan22
This workspace demonstrates the Giraffe/DeepVariant pipeline for calling germline variants using the Human Pangenome Reference Consortium's (HPRC) year one pangenome, and shows how a pangenome from the HPRC can be used in AnVIL and Terra.
Estimated time and cost to run on sample data
Input Coverage | Time | Cost $ |
---|---|---|
35X | 10 hours | $15.75 |
GEM Showcase
This workspace demonstrates a gene-environment interaction analysis pipeline on Terra using the software program GEM (Gene-Environment interaction analysis for Millions of samples).
Estimated time and cost to run on sample data
Analysis | Sample size | # variants | Time (CPU hrs) | Cost $ |
---|---|---|---|---|
1KG genome-wide interaction study | 1656 | 13.5M | 1.94 | 0.40 |
DRAGEN-GATK whole genome germline pipeline
This workspace contains a fully reproducible example workflow for whole-genome germline sequence data pre-processing using the DRAGEN-GATK mode of the Whole Genome Germline Single Sample (WGS) Pipeline.
Estimated time and cost to run on sample data
Workflow Configuration | Sample Name | Number of Entities | Sample Size | Time | Cost $ |
---|---|---|---|---|---|
Functional Equivalence | NA12878 | 24 | ~3.00 GB | 4 h 12 min | 0.90 |
Maximum Quality | NA12878 | 24 | ~3.00 GB | 4 h 7 min | 0.90 |
Functional Equivalence
This workflow evaluates functional equivalence so that researchers can combine results from multiple sources into larger datasets. Functional equivalence ensures that genomic data from different sources, processed with different pipelines, can be used interchangeably without risking batch effects.
Estimated time and cost to run on sample data
Sample set | No. Replicates | Time | Cost $ |
---|---|---|---|
HG002 | 3 | 90 min | 3.03 |
GATK4 RNA Germline Variant Calling
This workspace demonstrates how to call germline short variants (SNPs/Indels) from RNAseq data using GATK v4.1 and related tools.
Estimated time and cost to run on sample data
Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|
NA12878 | 3.09 GB | 9:32:00 | 0.49 |
TRUST4
Tcr Receptor Utilities for Solid Tissue (TRUST) is a computational tool to analyze TCR and BCR sequences using unselected RNA sequencing data, profiled from solid tissues, including tumors. TRUST4 performs de novo assembly on V, J, C genes including the hypervariable complementarity-determining region 3 (CDR3) and reports consensus of BCR/TCR sequences. See the TRUST4 workspace.
Estimated time and cost to run on sample data
Sample | Format | Read Pairs | Time | Cost $ |
---|---|---|---|---|
FZ-116 | BAM | 86M | 47m | $0.05 |
FZ-116 | FASTQ | 86M | 1h 22m | $0.09 |
Peat-Demo
Demo of how to use Peat (external link) to save overhead by grouping jobs into fewer WDL scatter branches. To compare scatter with and without Peat, this workspace has two simple demo workflows using WDL scatter: one with, and one without Peat.
Scatter Without Peat
Performs a simple job (writing a line to a file) many times via simple WDL scatter, then additionally concatenates all files into a single output file.
Estimated time and cost to run on sample data (without Peat)
n_jobs | time | link | cost $ |
---|---|---|---|
1000 | 0:30 | link | 2.99 |
1200 | 0:57 | link | 3.81 |
1500 | 1:00 | link | 5.08 |
2000 | 1:14 | link | 6.51 |
Scatter With Peat
Performs the same job, but using Peat to run multiple jobs on each WDL scatter branch, then additionally concatenates all files into a single output file.
Estimated time and cost to run on sample data (with Peat)
n_jobs | n_groups | time | link | cost $ |
---|---|---|---|---|
1000 | 50 | 0:11 | link | 0.16 |
1200 | 50 | 0:32 | link | 0.24 |
1500 | 50 | 0:33 | link | 0.16 |
2000 | 50 | 0:29 | link | 0.16 |
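The grouping idea behind the two tables above can be illustrated with a minimal Python sketch. This is not Peat's implementation or API: `group_jobs` is a hypothetical helper that only shows how n_jobs job indices might be split into n_groups near-equal batches, so that each scatter branch handles a batch rather than a single job. The cost figures at the end are copied from the 1000-job rows of the tables.

```python
from typing import List


def group_jobs(n_jobs: int, n_groups: int) -> List[range]:
    """Split job indices 0..n_jobs-1 into at most n_groups contiguous batches.

    Hypothetical helper for illustration only; Peat's own grouping
    logic may differ.
    """
    if n_jobs < 0 or n_groups <= 0:
        raise ValueError("n_jobs must be >= 0 and n_groups must be > 0")
    base, extra = divmod(n_jobs, n_groups)
    groups: List[range] = []
    start = 0
    for i in range(n_groups):
        # The first `extra` groups take one additional job each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break  # fewer jobs than groups: stop once all jobs are assigned
        groups.append(range(start, start + size))
        start += size
    return groups


groups = group_jobs(1000, 50)
print(len(groups), len(groups[0]))  # 50 20

# Cost ratio for 1000 jobs, using the values reported in the tables above.
cost_without_peat = 2.99
cost_with_peat = 0.16
print(round(cost_without_peat / cost_with_peat, 1))  # 18.7
```

With 1000 jobs in 50 groups, each scatter branch would run 20 jobs, which is where the reduction in per-branch overhead comes from.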
Intro to HCA data on Terra
This tutorial workspace is a step-by-step guide to importing, accessing, and analyzing standardized cell-by-gene count matrices (Loom format) from the Human Cell Atlas (HCA) Data Portal using community-supported single-cell analysis tools.
Estimated time and cost to run on sample data
Workflow/Notebook | Timing | Notes | Cost ($) |
---|---|---|---|
Cumulus workflow | 18 min | Runs on entire matrix (5 donors) | 0.16 |
Bioconductor notebook | ~ 28 min | Runs on matrix subset (1 donor) | 0.09 |
Pegasus notebook | ~ 5 min | Runs on matrix subset (1 donor) | 0.02 |
Scanpy notebook | ~ 8 min | Runs on matrix subset (1 donor) | 0.03 |
Seurat notebook | ~ 21 min | Runs on matrix subset (1 donor) | 0.07 |
InferCNV
A fully reproducible example workflow for inferring copy number from single-cell RNA sequencing data. See the InferCNV workspace.
Estimated time and cost to run on sample data
Time | Cost ($) |
---|---|
30 minutes | < $0.01 |
CRDC-Dynamic-Queries-for-NIH-Genomic-Data-Commons-Projects
This workspace shows you how to take a query result from the NCI Genomic Data Commons (GDC) data portal and use it as the input to a workflow (or Notebook) in Terra.
Estimated time and cost to run on sample data
File Name | Time | Cost $ |
---|---|---|
htseq_counts.txt.gz | 4m | <0.01 |
CTAT mutations
A fully reproducible example workflow for detecting variants from RNA sequencing data. Go to the workspace.
Estimated time and cost to run on sample data
Sample Name | Time | Cost $ |
---|---|---|
test | 3 hours, 17 minutes | $0.18 |
GATK Structural Variation on Single Samples
This integrated structural variation detection and resolution pipeline calls many forms of structural variation in whole genome sequencing data obtained from a single sample. The pipeline will identify, genotype, and annotate structural variation. Go to the workspace.
Estimated time and cost to run on sample data
Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|
NA12878 | 18.17 GiB | 23hrs | ~$7.71 |
Whole Genome Analysis Pipeline
This workspace contains fully reproducible example workflows for whole genome sequence data pre-processing, germline short variant discovery, and joint variant calling, as used for production by the Genomics Platform at the Broad Institute and recommended for research purposes.
Estimated time and cost to run on sample data
Sample Name | Time | Cost $ |
---|---|---|
WGS_JointGenotyping | 04:05:00 | $7.93 |
CHIP Detection Mutect2
This workspace builds on the GATK4 somatic variant WDL workflow, Mutect2, enabling investigators to perform variant calling and filtering for CHIP data in a consistent and reproducible manner. Users who would benefit from this workspace include investigators interested in the biological implications of CHIP including its role in both malignant and non-malignant disease.
Estimated time and cost to run on sample data
Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|
SRS000030 (Mutated) | 59.84 GB | 1:50:00 | $0.10 |
SRS000035 | 37.53 GB | 1:49:00 | $0.10 |
bisulfite-seq-tools-grch38
Workflows in this workspace can be used for alignment and quality control analysis for DNA methylation protocols including Whole Genome Bisulfite Sequencing (WGBS), Reduced Representation Bisulfite Sequencing (RRBS), and Hybrid Selection Bisulfite Sequencing (HSBS).
Note: This workspace is pre-configured for GRCh38.
Viral Insertion Detection
This workspace demonstrates a proof-of-concept approach to viral insertion detection. It includes a pipeline for identifying viral reads found in a host organism and detecting potential insertion sites in a host's genome.
Estimated time and cost to run on sample data
Sample Name | Sample Size | Time | Cost $ |
---|---|---|---|
1 | 753 KB | 0:21 | 0.03 |
Exome Analysis Pipeline
This workspace contains fully reproducible example workflows for exome sequence data pre-processing, germline short variant discovery, and joint variant calling, as used for production by the Genomics Platform at the Broad Institute and recommended for research purposes.
Estimated time and cost to run on sample data
Sample Name | Number of Entities | Sample Size | Time | Cost $ |
---|---|---|---|---|
NA12878 | 2 | 8.08 GB | 06:21:00 | ~$0.64 |
Trinity
A fully reproducible example workflow for RNA-Seq de-novo assembly using Trinity. Go to the workspace.
Estimated time and cost to run on sample data
Number of reads | Time | Cost $ |
---|---|---|
10 million | 130 minutes | $1.40 |
50 million | 360 minutes | $4.56 |
HCA_Optimus_Pipeline
The Optimus pipeline, developed in collaboration with the Human Cell Atlas Data Coordination Platform (HCA DCP) and the BRAIN Initiative Cell Census Network (BICCN), processes 3 prime single-cell or single-nucleus transcriptome data from the 10x Genomics v2 or v3 assay. This workspace currently describes v5.5.0 of the Optimus pipeline and provides fully reproducible examples of the workflow.
Estimated time and cost to run on sample data
Sample Set Name | Set Size | Sample Set R1.fastq Size | Sample Set R2.fastq Size | Time | Cost $ |
---|---|---|---|---|---|
neurons2k_mouse | 6 entities | 88.26 MB | 277.58 MB | 1:22:00 | 0.09 |
pbmc4k_human | 2 entities | 26.84 MB | 59.58 MB | 1:14:00 | 0.16 |
pbmc_human_v3 | 2 entities | 106.95 MB | 220.04 MB | 1:36:00 | 0.11 |
ENCODE-Tutorial-May-2020
Learn how to search, analyze, and visualize ENCyclopedia Of DNA Elements (ENCODE) data. The resources in this workspace cover binning ENCODE ChIP-seq datasets into non-overlapping 5 kb bins and determining the signal enrichment in each bin. More information about the ENCODE project can be found at https://www.encodeproject.org (external link).
Estimated time and cost to run on sample data
Workflow Name | Time to Run 1 file | Time to Run 100 files | Cost for 1 file (range) | Cost for 100 files |
---|---|---|---|---|
PBS-bam | 10-15 minutes | 15-30 minutes | $0.03 | < $3.15 |
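As a rough illustration of the binning step described for this workspace, the sketch below assigns read start positions to non-overlapping fixed-width bins and counts the reads in each. It is not the workspace's workflow: the function names are hypothetical, positions are assumed to be 0-based coordinates on a single chromosome, and a raw read count stands in for the signal enrichment that the actual analysis computes.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

BIN_SIZE = 5_000  # bin width in base pairs, matching the 5 kb bins described above


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Return the index of the non-overlapping bin containing a 0-based position."""
    return position // bin_size


def bin_interval(index: int, bin_size: int = BIN_SIZE) -> Tuple[int, int]:
    """Return the half-open interval [start, end) covered by a bin."""
    return index * bin_size, (index + 1) * bin_size


def count_reads_per_bin(
    read_starts: Iterable[int], bin_size: int = BIN_SIZE
) -> Dict[int, int]:
    """Count reads per bin, keyed by bin index (hypothetical helper)."""
    return dict(Counter(bin_index(pos, bin_size) for pos in read_starts))


counts = count_reads_per_bin([10, 4_999, 5_000, 12_345])
print(counts)           # {0: 2, 1: 1, 2: 1}
print(bin_interval(1))  # (5000, 10000)
```

Because the bins do not overlap, each position maps to exactly one bin through integer division, so no interval search is needed.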
Cumulus
This workspace is a showcase of Cumulus (external link), a cloud-based single-cell/single-nucleus data analysis framework. It uses a large-scale single-cell dataset and demonstrates Cumulus in both workflow and interactive analysis.
Estimated time and cost to run on sample data
Step | CPU | Memory | Time | Cost $ |
---|---|---|---|---|
cellranger_workflow | 32 * 8 | 120 GB * 8 | 1h34min | $2.65 |
cumulus | 32 | 200 GB | 22min | $0.17 |
2019 ASHG Reproducible GWAS (v2)
This workspace reproduces the steps in a genome-wide association study (GWAS), using 1,000 Genomes Project¹ (phase 3) genotypes and simulated phenotypes.
The analysis is structured in two parts:
- Explore phenotypes and population structure (Jupyter Notebook - Hail/Python)
- Test for genetic associations using mixed-models and generate summary visualizations (WDL workflow)
Estimated time and cost to run on sample data
Sample Size | # Variants | Time | Cost $ |
---|---|---|---|
2,500 samples | 22,000 | 8m | $0.49 |