[2019 Mar] GATK Workshop for BroadE

On March 21, 22, 26, and 27, 2019, members of the Broad Institute community participated in a Genome Analysis Toolkit (GATK) workshop as part of the BroadE workshop series. The workshop focused on the core steps involved in calling variants with the Broad’s GATK, using the “Best Practices” developed by the GATK team. Participants learned why each step is essential to the variant discovery process, the operations performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of their dataset.

This workshop is notable because it was the first time that the GATK workshop was conducted on Terra!

Workshop synopsis

Best Practices for variant calling with the Genome Analysis Toolkit

This workshop focuses on calling germline short variants, somatic short variants, and somatic copy number alterations with Broad's Genome Analysis Toolkit (GATK), using the Best Practices developed by the DSP Methods team, which develops GATK. The developers will give talks explaining the rationale, theory, and real-world applications of the GATK Best Practices. You will learn why each step is essential to the variant-calling process, what key operations are performed on the data at each step, and how to use the GATK tools to get the most accurate and reliable results out of your dataset. If you are an experienced GATK user, you will gain a deeper understanding of how GATK works under the hood and how to improve your results further, especially with respect to the latest innovations.

The hands-on GATK tutorials in this workshop will be conducted on Terra, a new platform developed at Broad in collaboration with Verily Life Sciences for accessing data, running analysis tools, and collaborating securely and seamlessly.

Workshop sessions and materials

Day 1: Introduction to GATK Best Practices

0. Introduction to the workshop
Geraldine van der Auwera, Associate Director, Outreach & Communications
Materials: Slides; Video

1. Introduction to high-throughput sequencing data: Understanding the origin and shape of the data
Mark Fleharty, Computational Scientist
Materials: Slides; Video

2. Introduction to data preprocessing: Mapping and cleaning up sequencing data
Yossi Farjoun, Associate Director, Computational Research Methods
Materials: Slides; Video

3. Introduction to variant discovery: Basic concepts, variant types, and their respective workflows
Megan Shand, Senior Computational Associate
Materials: Slides; Video

4. Introduction to pipelining platforms: How we run workflows
Ruchi Munshi, Senior Software Product Manager
Materials: Slides; Video

5. Introductory case study: Tetralogy of Fallot
Anton Kovalsky, Science Writer
Materials: Slides; Video; Terra workspace

Day 2: Germline short variant discovery

0. Introduction to germline short variant discovery: Key considerations and workflow logic
Laura Gauthier, Associate Director, Germline Computational Methods
Materials: Slides; Video

1. Variant calling with HaplotypeCaller: Basic operation and algorithm
James Emery, Software Engineer
Materials: Slides; Video

2. Joint variant calling: GVCF-based workflow using GenomicsDB and GenotypeGVCFs
Geraldine van der Auwera, Associate Director, Outreach & Communications
Materials: Slides; Video

3. Germline variant discovery tutorial
Kate Noblett, Senior Project Coordinator
Materials: Germline variant discovery tutorial workspace; Video

4. Variant filtering by Variant Quality Score Recalibration: Assigning accurate confidence scores to each putative mutation call
Sam Friedman, Machine Learning Scientist
Materials: Slides; Video

5. Genotype refinement workflow: Using additional data to improve genotype calls and likelihoods
Takuto Sato, Senior Computational Associate
Materials: Slides; Video

6. Callset evaluation: Comparing statistics between your callset and external resources
Rori Cremer, Software Engineer
Materials: Slides; Video

Day 3: Somatic variant discovery

0. Introduction to somatic variant discovery: Key considerations and workflow logic
Lee Lichtenstein, Associate Director, Somatic Computational Methods
Materials: Slides; Video

1. Somatic SNV and indel variant discovery
Andrey Smirnov, Software Engineer
Materials: Slides; Video

2. GATK Mutect2 tutorial
Adelaide Rhodes, Senior Computational Associate
Materials: Somatic variant discovery tutorial workspace

3. Somatic copy number alterations
Steve Huang, Computational Scientist
Materials: Slides; Video

Day 4: Additional hands-on practice workspaces

0. Pipelining with WDL and Cromwell (Using this empty workspace, you'll practice starting a workspace from scratch)
Dan Billings, Principal Software Engineer
Materials: Workspace; Worksheet; Slides; Video

1. WDL puzzles
Kate Noblett, Senior Project Coordinator
Materials: Worksheet

2. How to access and analyze genomics data in real time with BigQuery and a Jupyter Notebook
Allie Hajian, Science Writer
Materials: Workspace; Slides; Video

3. Understanding and using Docker containers
Adelaide Rhodes, Senior Computational Associate
Materials: Slides; Video; Worksheet (Note: because of a technical issue, the video from this session is not available. In its place is a video of a presentation Adelaide gave on the same topic a few weeks later.)

Additional resources

The Data Biosphere
- A Data Biosphere for Biomedical Research

Terra Resources
- Documentation https://support.terra.bio/hc/en-us
- Ask questions through the button in the upper left hamburger menu, or on the community forum
- Make a feature request here

Running workflows on Terra
- Configure a Tool to run on your data

Terra's Jupyter Notebooks Environment
- Part I - Key Components
- Part II - Key Operations
- Dos and Don'ts - How not to lose data output files or collaborator edits in a notebook

Jupyter Notebooks Resources
- Jupyter Notebooks 101
- Jupyter Notebooks for Data Science (extensions, widgets, and more!)
- Jupyter notebooks cheat sheet
- Mastering markdown
- Markdown cheat sheet

R Resources
Data wrangling, visualization, and analysis
- R for Data Science
- Cheat Sheets for commonly used R packages
- Tidyverse
Developing (and finding the best) R packages
- Advanced R
- R packages
- Finding the best R package amongst the available options: https://www.rdocumentation.org/

BigQuery Resources
- Comprehensive BigQuery documentation
- BigQuery best practices (controlling costs, optimizing query performance, optimizing storage)
- See the giant list of analytical functions on the right-hand side nav bar here
- Using client libraries (and your favorite programming language) with BigQuery
- BigQuery YouTube videos (from the Google Cloud Platform developers)

Google Cloud Platform
- Understanding and controlling cloud costs
- Controlling Cloud costs - sample use cases
- Google Cloud Platform (GCP) for Bioinformatics

Chrome
- Setting up Chrome Profiles