If you're new to running GATK on a cloud-based platform, or new to Terra, this information will help get you started. From preprocessing raw sequencing data through variant calling and joint calling, showcase workspaces provide fully reproducible workflows for critical use cases and include extensive documentation and sample data to practice on.
What are GATK Best Practice showcase workspaces?
Curated template workspaces with all the components of a complete project workspace
- Fully reproducible workflows
- Sample data
- Extensive documentation
Use case examples
- Preprocessing genomics data
- Variant discovery for germline and somatic SNPs and Indels
- Copy number and structural variations
You can explore the featured workspaces in read-only mode, or clone a copy to your own billing project (or use the $300 in GCP getting-started credits) and try running the workflows on the included sample data or on your own data.
To learn more, see GATK's best practices documentation.
GATK Best Practices Showcase workspaces
- Germline CNVs - GATK4
- Variant Calling Spark Multicore
- Variant Functional Annotations with Funcotator
- GATK4 Germline Preprocessing Variant Calling Joint Calling
- Somatic CNVs GATK4
- Somatic SNVs - Indels GATK4
- RNA Germline Variant Calling GATK4
- Whole Genome Analysis Pipeline
- CNN variant filter
- Mitochondrial SNPs and Indels variant discovery
What input file types does a workflow accept?
Most of the Broad's GATK workflows accept unmapped BAM (uBAM) files. Read the workspace dashboard for a description of the input files. For the exact specifications, see this document.
Data files not in unmapped BAM (uBAM) format?
If your data files are not in unmapped BAM format, check out this sequence file conversion workspace. It contains workflows that convert other formats for use with GATK analysis tools.
How does the workflow get the input data?
Your workflows need to know where to find the input data stored in the cloud. You can enter the complete file path for a single input in the workflow configuration form, or use a data table to store the paths and other metadata for your input files. We recommend organizing data with tables. To understand why, watch Why use data tables (6:35 minutes on YouTube).
For more information about the steps to use controlled-access input data, see Linking authorization/accessing controlled data on external servers.
For step-by-step instructions on how to populate the workspace data table, a video, and practice exercises, see Managing data with workspace tables.
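To make the idea of a data table concrete, here is a minimal sketch that writes a tab-separated load file listing one uBAM path per sample. The bucket name, the sample IDs, and the `unmapped_bam` column name are hypothetical, and the `entity:sample_id` header follows the load-file convention as we understand it; check Managing data with workspace tables for the exact format before uploading.

```python
import csv
import io


def build_load_file(samples, table_name="sample"):
    """Return TSV text for a data table; `samples` maps sample ID -> uBAM URI."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed convention: the first column header names the table and its ID column.
    writer.writerow([f"entity:{table_name}_id", "unmapped_bam"])
    for sample_id, uri in sorted(samples.items()):
        writer.writerow([sample_id, uri])
    return buffer.getvalue()


# Hypothetical sample IDs and bucket paths, for illustration only.
samples = {
    "sample_B": "gs://example-bucket/ubams/sample_B.unmapped.bam",
    "sample_A": "gs://example-bucket/ubams/sample_A.unmapped.bam",
}
tsv_text = build_load_file(samples)
print(tsv_text)
```

Each row becomes one entry in the table, and the workflow configuration can then point at the column rather than at a hard-coded file path.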
How to run a workflow on data in a data table
To run a workflow on data in the data table, first select the data table that contains the data you want to use.
Select the row with the data to analyze (highlighted with an arrow in the screenshot below), click on the three vertical dots at the upper right, choose the Open With option, and select Workflow (circled).
Where is the generated data stored?
Data from a workflow analysis is stored in the workspace cloud storage (Google Bucket or Azure blob storage container) by default. Showcase workflows are preconfigured to write the URI metadata for output files to the same data table that contains the input files, which streamlines downstream analysis.
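If you want to work with an output URI from the data table outside of Terra, it can help to split it into its bucket and object path. The sketch below assumes a Google Cloud Storage URI of the form `gs://bucket/path`; the helper name and the example URI are hypothetical, and Azure URIs would need different handling.

```python
from urllib.parse import urlparse


def split_gs_uri(uri):
    """Split a gs:// URI into (bucket, object_path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "gs" or not parsed.netloc:
        raise ValueError(f"not a gs:// URI: {uri!r}")
    # The path keeps its leading slash; object names do not have one.
    return parsed.netloc, parsed.path.lstrip("/")


# Hypothetical output URI copied from a data table cell.
bucket, object_path = split_gs_uri("gs://example-bucket/outputs/sample_A.g.vcf.gz")
print(bucket)
print(object_path)
```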
How do I change the input/output files a workflow uses?
Showcase workspaces are preconfigured to run on sample data included in the workspace. To run on your own data, you need to update your workspace data table to include your own data. To learn how to modify, add, or delete a data table, see Making, modifying, and deleting data tables in Terra. You can use the Terra interface to change input or output file names or locations. See How to set up a workflow analysis for step-by-step instructions, a video tutorial, and a practice exercise.
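Before you upload a table with your own data, a quick sanity check can catch formatting slips. This is a minimal sketch, not a Terra tool: the function name is hypothetical, and the rule that the first header starts with `entity:` is an assumption about the load-file format that you should confirm in Making, modifying, and deleting data tables in Terra.

```python
import csv
import io


def check_load_file(tsv_text):
    """Return a list of human-readable problems found in a load file."""
    rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    if not rows or not rows[0]:
        return ["file has no header row"]
    header, data = rows[0], rows[1:]
    problems = []
    # Assumed convention: the first header looks like "entity:sample_id".
    if not header[0].startswith("entity:"):
        problems.append("first column header should start with 'entity:'")
    seen = set()
    for number, row in enumerate(data, start=2):
        if len(row) != len(header):
            problems.append(
                f"line {number}: expected {len(header)} columns, found {len(row)}"
            )
            continue
        if row[0] in seen:
            problems.append(f"line {number}: duplicate ID {row[0]!r}")
        seen.add(row[0])
    return problems


# Hypothetical table contents, for illustration only.
good = "entity:sample_id\tunmapped_bam\nmy_sample_1\tgs://example-bucket/my_sample_1.unmapped.bam\n"
bad = "sample_id\tunmapped_bam\nmy_sample_1\n"
print(check_load_file(good))
print(check_load_file(bad))
```

An empty list means the sketch found nothing to flag; it does not verify that the files exist or that they are valid uBAMs.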
Additional GATK resources
- For slides and notebooks-based tutorials from GATK workshops, see this blog post.
- Click here for GATK workshop videos on YouTube