On March 21, 22, 26, and 27, 2019, members of the Broad Institute community participated in a Genome Analysis Toolkit (GATK) workshop as part of the BroadE workshop series. The workshop focused on the core steps involved in calling variants with the Broad’s GATK, using the “Best Practices” developed by the GATK team. Participants learned why each step is essential to the variant discovery process, the operations performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of their dataset.
This workshop is notable because it was the first time that the GATK workshop was conducted on Terra!
Workshop synopsis
Best Practices for variant calling with the Genome Analysis Toolkit
This workshop focuses on calling germline short variants, somatic short variants, and somatic copy number alterations with Broad's Genome Analysis Toolkit (GATK), using the Best Practices developed by the DSP Methods development team, which develops GATK. The developers will give talks explaining the rationale, theory, and real-world applications of the GATK Best Practices. You will learn why each step is essential to the variant-calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset. If you are an experienced GATK user, you will gain a deeper understanding of how the GATK works under the hood and how to improve your results further, especially with respect to the latest innovations.
The hands-on GATK tutorials in this workshop will be conducted on Terra, a new platform developed at Broad in collaboration with Verily Life Sciences for accessing data, running analysis tools and collaborating securely and seamlessly.
Workshop sessions and materials
Day 1: Introduction to GATK Best Practices
0. Introduction to the workshop
Geraldine van der Auwera, Associate Director, Outreach & Communications
Materials: Slides; Video
1. Introduction to high-throughput sequencing data: Understanding the origin and shape of the data
Mark Fleharty, Computational Scientist
Materials: Slides; Video
2. Introduction to data preprocessing: Mapping and cleaning up sequencing data
Yossi Farjoun, Associate Director, Computational Research Methods
Materials: Slides; Video
3. Introduction to variant discovery: Basic concepts, variant types, and their respective workflows
Megan Shand, Senior Computational Associate
Materials: Slides; Video
4. Introduction to pipelining platforms: How we run workflows
Ruchi Munshi, Senior Software Product Manager
Materials: Slides; Video
5. Introductory case study: Tetralogy of Fallot
Anton Kovalsky, Science Writer
Materials: Slides; Video; Terra workspace
Day 2: Germline short variant discovery
0. Introduction to germline short variant discovery: Key considerations and workflow logic
Laura Gauthier, Associate Director, Germline Computational Methods
Materials: Slides; Video
1. Variant calling with HaplotypeCaller: Basic operation and algorithm
James Emery, Software Engineer
Materials: Slides; Video
2. Joint variant calling: GVCF-based workflow using GenomicsDB and GenotypeGVCFs
Geraldine van der Auwera, Associate Director, Outreach & Communications
Materials: Slides; Video
3. Germline variant discovery tutorial
Kate Noblett, Senior Project Coordinator
Materials: Germline variant discovery tutorial workspace; Video
4. Variant filtering by Variant Quality Score Recalibration: Assigning accurate confidence scores to each putative variant call
Sam Friedman, Machine Learning Scientist
Materials: Slides; Video
5. Genotype refinement workflow: Using additional data to improve genotype calls and likelihoods
Takuto Sato, Senior Computational Associate
Materials: Slides; Video
6. Callset evaluation: Comparing statistics between your callset and external resources
Rori Cremer, Software Engineer
Materials: Slides; Video
Day 3: Somatic variant discovery
0. Introduction to somatic variant discovery: Key considerations and workflow logic
Lee Lichtenstein, Associate Director, Somatic Computational Methods
Materials: Slides; Video
1. Somatic SNV and indel discovery
Andrey Smirnov, Software Engineer
Materials: Slides; Video
2. GATK Mutect2 tutorial
Adelaide Rhodes, Senior Computational Associate
Materials: Somatic variant discovery tutorial workspace
3. Somatic copy number alterations
Steve Huang, Computational Scientist
Materials: Slides; Video
Day 4: Additional hands-on practice workspaces
0. Pipelining with WDL and Cromwell (Using this empty workspace, you'll practice starting a workspace from scratch)
Dan Billings, Principal Software Engineer
Materials: Workspace; Worksheet; Slides; Video
1. WDL puzzles
Kate Noblett, Senior Project Coordinator
Materials: Worksheet
2. How to access and analyze genomics data in real time with BigQuery and a Jupyter Notebook
Allie Hajian, Science Writer
Materials: Workspace; Slides; Video
3. Understanding and using Docker containers
Adelaide Rhodes, Senior Computational Associate
Materials: Slides; Video; Worksheet (due to a technical issue, we couldn't provide the video of this session, so we've substituted a video of a presentation Adelaide gave on the same topic a few weeks later)