If you're new to running GATK on a cloud-based platform, or new to Terra, this article will help you get started. Showcase workspaces provide fully reproducible workflows for key use cases, from preprocessing raw sequencing data through variant calling and joint calling, and include extensive documentation and sample data to practice on.
For slides and notebook-based tutorials from GATK workshops, see this blog post.
Click here for GATK workshop videos on YouTube.
- What are GATK Best Practices showcase workspaces?
- What input file types does a workflow accept?
- How do I link data to a workspace and run a GATK workflow?
- How do I change the input/output files a workflow uses?
What are GATK Best Practices showcase workspaces?
GATK Best Practices showcases are curated template workspaces that demonstrate all the components of a complete project workspace: analysis tools, sample data, and extensive documentation. They feature pipeline workflows for use cases that include preprocessing genomic data and discovering germline and somatic variants: SNPs and indels, copy number variants, and structural variants. You can explore the featured workspaces in read-only mode, or clone a copy under your own billing project (or using free credits) and try running the workflows on the included sample data or on your own data.
To learn more about GATK Best Practices, see this documentation.
- Germline SNPs and Indels variant discovery (hg38)
- Exome analysis pipeline
- Somatic CNVs GATK4
- Somatic SNVs - Indels GATK4
- RNA Germline Variant Calling GATK4
- Whole Genome Analysis Pipeline
- CNN variant filter
- Germline SNPs and Indels variant discovery (hg37)
- Mitochondrial SNPs and Indels variant discovery
What input file types does a workflow accept?
Most of the Broad's GATK workflows accept unmapped BAM (uBAM) files. For the exact specifications, see this document.
Note: If your data files are not in unmapped BAM format, check out this sequence file conversion workspace. It contains workflows that perform the following conversions, so you can use your data with GATK analysis tools:
- Interleaved FASTQ to paired FASTQ
- Paired FASTQ to unmapped BAM
- BAM to unmapped BAM
- CRAM to BAM
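As a rough illustration of how the conversions listed above map onto common input files, here is a minimal Python sketch. The function name `suggest_conversion` and the extension-based lookup are assumptions made for this example and are not part of the conversion workspace. A file extension cannot tell you whether a BAM is already unmapped or whether a FASTQ is interleaved, so the caller supplies the latter as a flag.

```python
from pathlib import PurePosixPath

# Conversions named in the list above, keyed by file extension.
# The extension mapping itself is an assumption for illustration.
CONVERSIONS = {
    ".cram": "CRAM to BAM",
    ".bam": "BAM to unmapped BAM",
    ".fastq": "Paired FASTQ to unmapped BAM",
    ".fq": "Paired FASTQ to unmapped BAM",
}


def suggest_conversion(path: str, interleaved: bool = False) -> str:
    """Suggest which listed conversion applies to a file, based on its name."""
    name = PurePosixPath(path).name.lower()
    if name.endswith(".gz"):
        name = name[: -len(".gz")]  # look through gzip compression
    suffix = PurePosixPath(name).suffix
    if suffix in (".fastq", ".fq") and interleaved:
        # Interleaved reads must be split into pairs first.
        return "Interleaved FASTQ to paired FASTQ"
    if suffix not in CONVERSIONS:
        raise ValueError(f"No listed conversion for file: {path}")
    return CONVERSIONS[suffix]
```

For example, `suggest_conversion("gs://example-bucket/sample1.cram")` returns `"CRAM to BAM"`; the bucket name here is hypothetical.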
How do I link data to a workspace for processing by a workflow?
Your workflows need to know where to find the input data stored in the cloud. You can enter the complete file path for a single input in the workflow card, or use the data table to store metadata for your input files, which also helps keep data organized.
For step-by-step instructions on how to populate the workspace data table, a video, and practice exercises, see this article.
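To make the data table idea concrete, the sketch below builds a tab-separated load file that pairs sample IDs with cloud file paths. It assumes the table is populated from a TSV whose first column header has the form `entity:sample_id`; confirm the expected layout in the linked article before uploading. The function name `build_load_file`, the column name `unmapped_bam`, and the bucket path are hypothetical.

```python
import csv
import io
from typing import Mapping


def build_load_file(
    samples: Mapping[str, str],
    entity_type: str = "sample",
    column: str = "unmapped_bam",  # hypothetical column name
) -> str:
    """Return TSV text with one row per sample ID and its cloud file path."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed header convention: the first column identifies the entity.
    writer.writerow([f"entity:{entity_type}_id", column])
    for sample_id, path in sorted(samples.items()):
        if not path.startswith("gs://"):
            raise ValueError(f"Expected a gs:// path for {sample_id}: {path}")
        writer.writerow([sample_id, path])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical sample ID and bucket, for illustration only.
    print(build_load_file({"NA12878": "gs://example-bucket/NA12878.unmapped.bam"}))
```

Rejecting paths that do not start with `gs://` is a design choice for this sketch: it catches local paths early, before a workflow fails to find its input.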
How to run a workflow on data in a data table
To run a workflow on data in the data table, select the data, click the "Open With" button, and choose the "Workflow" option.
Note that showcase workflows are preconfigured to write metadata for output files to your data table, which streamlines downstream analysis.
How do I change the input/output files a workflow uses?
Showcase workspaces are preconfigured to run on the sample data included in the workspace. To run on your own data, update the workspace data table so that it points to your files. You can use the Terra interface to change input or output file names or locations. See this article for step-by-step instructions, a video tutorial, and a practice exercise.
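If you prefer to prepare the updated table outside the interface, one possible approach is to rewrite the file paths in the table rows so they point at your own bucket. The Python sketch below assumes the rows are already available as dictionaries (for example, parsed from a TSV); the function name `repoint_rows`, the column name, and both bucket names are hypothetical.

```python
from typing import Dict, List


def repoint_rows(
    rows: List[Dict[str, str]], old_prefix: str, new_prefix: str
) -> List[Dict[str, str]]:
    """Return copies of the rows with old_prefix replaced at the start of any value."""
    updated = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            # Only rewrite values that begin with the old location.
            if isinstance(value, str) and value.startswith(old_prefix):
                value = new_prefix + value[len(old_prefix):]
            new_row[key] = value
        updated.append(new_row)
    return updated


if __name__ == "__main__":
    rows = [{"sample_id": "s1", "unmapped_bam": "gs://showcase-bucket/s1.bam"}]
    print(repoint_rows(rows, "gs://showcase-bucket/", "gs://my-bucket/"))
```

The function returns new dictionaries rather than editing the rows in place, so the original table contents remain available for comparison.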