If you're new to running GATK on a cloud-based platform, or new to Terra, this information will help you get started. Showcase workspaces provide fully reproducible workflows for critical use cases, from pre-processing raw sequencing data through variant calling and joint calling. They also include extensive documentation and sample data to practice on.
For slides and notebook-based tutorials from GATK workshops, see this blog post.
Click here for GATK workshop videos on YouTube.
- What are GATK Best Practices showcases?
- What input file types does a workflow accept?
- How do I change the input/output files a workflow uses?
- How do I link data to a workspace for processing by a workflow?
What are GATK Best Practices showcases?
GATK Best Practices showcases cover workflows for pre-processing genomics data and for variant discovery use cases, including germline and somatic SNPs and Indels, copy number variation, and structural variation.
See this documentation for more details on GATK Best Practices.
- Pre-processing (hg37)
- Pre-processing (hg38)
- Germline SNPs and Indels variant discovery (hg37)
- Germline SNPs and Indels variant discovery (hg38)
- Somatic SNVs and Indels variant discovery
- Somatic CNV variant discovery
- CNN variant filter
- Mitochondrial SNPs and Indels variant discovery
- Five dollar genome
What input file types does a workflow accept?
Most of the Broad's GATK workflows accept unmapped BAM (uBAM) files. For the exact specifications, see this document.
Note: If your data files are not in unmapped BAM format, check out this sequence file conversion workspace. It contains workflows for converting the following formats for use in GATK analysis tools:
- Interleaved FASTQ to paired FASTQ
- Paired FASTQ to unmapped BAM
- BAM to unmapped BAM
- CRAM to BAM
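To give a feel for what these conversions involve, here is a minimal sketch that builds the command lines for three of them. It assumes the standard Picard tools bundled with GATK4 (`FastqToSam`, `RevertSam`) and `samtools`; the workflows in the conversion workspace may use different tools or options. The function names and file names are hypothetical, chosen only for illustration.

```python
import shlex


def fastq_to_ubam_cmd(fastq1, fastq2, output_bam, sample_name, read_group="A"):
    """Paired FASTQ -> unmapped BAM, using Picard FastqToSam via the gatk wrapper."""
    return [
        "gatk", "FastqToSam",
        "--FASTQ", fastq1,
        "--FASTQ2", fastq2,
        "--OUTPUT", output_bam,
        "--SAMPLE_NAME", sample_name,
        "--READ_GROUP_NAME", read_group,
    ]


def bam_to_ubam_cmd(input_bam, output_bam):
    """Aligned BAM -> unmapped BAM, using Picard RevertSam via the gatk wrapper."""
    return ["gatk", "RevertSam", "--INPUT", input_bam, "--OUTPUT", output_bam]


def cram_to_bam_cmd(input_cram, reference_fasta, output_bam):
    """CRAM -> BAM, using samtools view with the reference the CRAM was written against."""
    return [
        "samtools", "view", "-b",
        "-T", reference_fasta,
        "-o", output_bam,
        input_cram,
    ]


if __name__ == "__main__":
    # Hypothetical file names; print the commands rather than running them.
    commands = [
        fastq_to_ubam_cmd("sample1_R1.fastq.gz", "sample1_R2.fastq.gz",
                          "sample1.unmapped.bam", "sample1"),
        bam_to_ubam_cmd("sample1.aligned.bam", "sample1.unmapped.bam"),
        cram_to_bam_cmd("sample1.cram", "reference.fasta", "sample1.bam"),
    ]
    for command in commands:
        print(shlex.join(command))
```

The sketch only prints the commands, so you can check them before running anything. In Terra, the conversion workflows handle these steps for you.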
How do I change the input/output files a workflow uses?
Showcase workspaces are pre-configured to run on the sample data included in the workspace, so you will need to change the input file names to run a workflow on your own data. You can use the Terra UI to change input or output file names or locations. See this article for step-by-step instructions, a video tutorial, and a practice exercise.
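The idea of swapping inputs can also be sketched outside the UI. The example below assumes the workflow's inputs are represented as a WDL-style inputs JSON, where each key has the form `WorkflowName.input_name`. The workflow name, input names, and bucket paths are all hypothetical; check the inputs listed for your own workflow for the real ones.

```python
import json


def override_inputs(inputs, overrides):
    """Return a copy of an inputs mapping with selected values replaced.

    Raises KeyError if an override names an input the workflow does not
    define, which catches typos before a run is submitted.
    """
    unknown = sorted(set(overrides) - set(inputs))
    if unknown:
        raise KeyError("Unknown input name(s): " + ", ".join(unknown))
    updated = dict(inputs)
    updated.update(overrides)
    return updated


if __name__ == "__main__":
    # Hypothetical pre-configured inputs pointing at the workspace's sample data.
    default_inputs = {
        "ExampleWorkflow.input_bam": "gs://example-workspace-bucket/sample_data/sample1.unmapped.bam",
        "ExampleWorkflow.output_basename": "sample1",
    }

    # Point the workflow at your own file and rename the outputs to match.
    my_inputs = override_inputs(
        default_inputs,
        {
            "ExampleWorkflow.input_bam": "gs://my-own-bucket/my_sample.unmapped.bam",
            "ExampleWorkflow.output_basename": "my_sample",
        },
    )
    print(json.dumps(my_inputs, indent=2))
```

Rejecting unknown input names is a deliberate choice here: a misspelled key would otherwise be added silently and the workflow would still run on the sample data.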
How do I link data to a workspace for processing by a workflow?
Your workflows need to know where to find the input data stored in the cloud. You can store metadata for your input files, such as their cloud storage locations, in the workspace data table. This helps keep data organized, especially if your workflows are configured to write output files to the data table.
For step-by-step instructions, a video, and practice exercises, see this article.
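As a rough illustration of what a data table holds, the sketch below writes a tab-separated load file of the kind Terra accepts for populating a table. The first column header follows the `entity:<table>_id` convention. The sample IDs, the `unmapped_bam` column name, and the bucket paths are hypothetical.

```python
import csv
import io


def build_load_file(rows, table_name="sample"):
    """Build a tab-separated load file for a workspace data table.

    rows: list of dicts, each with an "id" key plus any metadata columns.
    The first column is named "entity:<table_name>_id".
    """
    # Collect metadata column names in first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key != "id" and key not in columns:
                columns.append(key)

    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["entity:" + table_name + "_id"] + columns)
    for row in rows:
        # Missing values are written as empty cells.
        writer.writerow([row["id"]] + [row.get(col, "") for col in columns])
    return buffer.getvalue()


if __name__ == "__main__":
    # Hypothetical samples, each linked to an input file in cloud storage.
    samples = [
        {"id": "sample1", "unmapped_bam": "gs://my-own-bucket/sample1.unmapped.bam"},
        {"id": "sample2", "unmapped_bam": "gs://my-own-bucket/sample2.unmapped.bam"},
    ]
    with open("sample.tsv", "w", newline="") as handle:
        handle.write(build_load_file(samples))
```

Each row becomes one entry in the table, and a workflow input can then refer to a column (here, `unmapped_bam`) instead of a hard-coded file path.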