Practice your WDL scripting skills by solving a WDL puzzle. This tutorial shows you how to access and solve a WDL puzzle workflow, upload it to the Broad Methods Repository, export it to Terra, and run it using sample data in the WDL-puzzles tutorial workspace.
Overview
If you've ever created a WDL workflow and run it successfully in Terra, you know how rewarding it is to see your work finally produce results. This is a pleasure that even novice WDL writers should experience!
But... beginners may not feel ready to write a full script that can pull in real data.
That's why this tutorial uses WDL puzzles (workflows with missing pieces) to help beginners hone their WDL writing skills. Once solved, these puzzle workflows can be uploaded to Terra and set up to run off a sample data table, giving you an end-to-end experience of creating and running workflows in Terra.
Follow the step-by-step instructions below to get started cloning the WDL-puzzles workspace, choosing your puzzle, and running your workflow.
Cloning the WDL puzzles workspace
Before you begin, create your own editable copy (clone) of the WDL-puzzles workspace.
- Click the circle with three dots in the upper right corner of the workspace
- Select Clone
- Rename your copy with something memorable (we recommend initials and date)
- Choose your Billing project
- Do not select an Authorization Domain. These are only required when using restricted-access data
- Select the Clone Workspace button to make your own copy
Choosing and downloading a WDL puzzle
You can choose one of two puzzles: the "easy-puzzle" or the "advanced-puzzle." The easy puzzle is a modified version of the commonly used "Hello world!" script. To complete the puzzle, you must identify and fill in a single missing input variable.
The advanced puzzle is a ValidateBam workflow that validates BAM files by calling the Picard tool ValidateSamFile. To complete the puzzle, you must identify and fill in multiple missing input and output variables.
The puzzles are hosted in the WDL-puzzles public Google bucket; you can download them directly from the bucket or from the Workspace Data table (instructions below).
Once you decide which puzzle level you'd like to try, follow the corresponding section below for detailed instructions on downloading and solving the puzzle.
Easy WDL puzzle
Background
For this puzzle, you will fill in a missing input for a modified version of the commonly used “Hello World!” workflow. This workflow is designed to take a string input (a phrase, place, person’s name, etc.) and output a greeting message specific to the input you choose.
This puzzle’s WDL script has two parts:
- A workflow definition (called HelloInput) that defines an input, calls a task, and writes an output.
- A task called WriteGreeting that uses the input defined in the workflow definition to echo the phrase “Hello <input>!” The echo phrase is then written to a text file output.
To get started, follow the instructions below.
Download the easy puzzle WDL file
- Navigate to the cloned copy of the WDL-puzzles workspace (master copy here: https://app.terra.bio/#workspaces/help-gatk/WDL-puzzles)
- Go to the Data page
- Select the Workspace Data table
- Select the link for the easy-puzzle
- Choose Download
- Open the easy-puzzle.wdl file with your favorite text editor (commonly used editors include Sublime and Atom, but any text editor will work!)
Complete the puzzle
Using the instructions below, fill in the missing input variable name.
- Examine the easy-puzzle WDL and find the input section of the HelloInput workflow
Notice the input variable (a string) is missing and has been replaced with “...”.
- Fill in the missing variable name for the string
Hint: Use the workflow's “call” section and the WriteGreeting task to identify the appropriate variable name.
- Check your answer using the Answer Key below. You can also find a correct copy of the workflow script in the easy-puzzle-solved workflow in the workspace.
Answer Key:
workflow HelloInput {
  input {
    String name
  }
  # call and output sections are unchanged
}
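The answer key shows only the input section. For orientation, here is a minimal sketch, in WDL 1.0, of how a complete file with this structure could fit together: the HelloInput workflow passes `name` to the WriteGreeting task, which echoes the greeting and captures it as a text file with `stdout()`. It is an illustration rather than the puzzle file itself; the task's input and output names, the workflow output name, and the `ubuntu:latest` Docker image are assumptions, so compare it with the easy-puzzle-solved workflow in the workspace before relying on any detail.

```wdl
version 1.0

# Hypothetical sketch of the solved easy puzzle's structure.
# Only HelloInput, WriteGreeting, and the input "name" come from the tutorial.
workflow HelloInput {
  input {
    String name
  }

  # Map the workflow input onto the task input (left side = task input).
  call WriteGreeting {
    input:
      name = name
  }

  output {
    # Assumed output name; the puzzle file may use a different one.
    File greeting_file = WriteGreeting.greeting
  }
}

task WriteGreeting {
  input {
    String name
  }

  command <<<
    echo "Hello ~{name}!"
  >>>

  output {
    # stdout() captures the echoed greeting as a text file.
    File greeting = stdout()
  }

  runtime {
    # Assumed image; any image that provides echo would work.
    docker: "ubuntu:latest"
  }
}
```

In the call block, the left side of `name = name` always refers to the task's input and the right side to a value in the workflow, so the two can share a name without conflict.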
Save the WDL
After you’ve completed the exercise, save the WDL with a unique name.
Advanced WDL puzzle
Background
For this puzzle, you will fill in missing task inputs (Part 1) and workflow outputs (Part 2) that are needed to run a WDL workflow that validates BAM files. Specifically, the workflow takes in BAM or SAM files and runs a task that uses the ValidateSamFile tool to output a validation report listing errors in file formatting, alignments, and tags.
To get started, find your favorite text editor and follow the step-by-step instructions below.
Download the advanced puzzle WDL file
- Navigate to the cloned copy of WDL-puzzles workspace (https://app.terra.bio/#workspaces/help-gatk/WDL-puzzles)
- Go to the Data page
- Select the Workspace Data table
- Select the link for the advanced-puzzle
- Choose Download
- Open the advanced_puzzle.wdl file with your favorite text editor (commonly used editors include Sublime and Atom, but any text editor will work!)
Part 1: Define task inputs correctly
Tasks use input variables that can be defined in both the workflow definition and in the task itself.
Using the instructions below, find the puzzle’s missing input variables and fill them in.
- Scroll down the WDL to find the ValidateBAM task
- Examine the task input {} section
Notice the task inputs are broken up into two sections: Command parameters and Runtime parameters. The input variable names are missing in both sections and have been replaced with “...”.
There are four missing Command parameter variable names: a file, an optional string (indicated by "String?"), and two required strings. There is one missing Runtime parameter variable name (a string).
- Fill in the missing variable names
Hint: Use the Workflow Definition’s input {} section and the ValidateBAM task’s command {} section to identify the missing variable names. For the optional string, look for a part of the command that already supplies a default value.
- Check your answer using the answer key below. You’ll also find a copy of the correct script in the advanced_puzzle_solved workflow in the workspace.
Answer Key (Part 1)
task ValidateBAM {
  input {
    # Command parameters
    File input_bam
    String output_basename
    String? validation_mode
    String gatk_path

    # Runtime parameters
    String docker
  }
  # command, runtime, and output sections are unchanged
}
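To see how these inputs are used, here is a hypothetical sketch of a complete ValidateBAM task in WDL 1.0. It illustrates the hint above: the optional `validation_mode` falls back to a default inside the command placeholder (`~{default="SUMMARY" validation_mode}`), while `docker` is consumed in the runtime section. The GATK `ValidateSamFile` arguments shown (`--INPUT`, `--OUTPUT`, `--MODE`), the `SUMMARY` default, the memory and disk values, and the report file name are assumptions; the puzzle's own command may differ.

```wdl
version 1.0

# Hypothetical sketch of a ValidateBAM task; not the puzzle file itself.
task ValidateBAM {
  input {
    # Command parameters
    File input_bam
    String output_basename
    String? validation_mode
    String gatk_path

    # Runtime parameters
    String docker
  }

  # Assumed padding so the VM disk can hold the BAM plus the report.
  Int disk_size = ceil(size(input_bam, "GB")) + 20

  command <<<
    # If validation_mode is not set, the placeholder falls back to SUMMARY.
    ~{gatk_path} ValidateSamFile \
      --INPUT ~{input_bam} \
      --OUTPUT ~{output_basename}.txt \
      --MODE ~{default="SUMMARY" validation_mode}
  >>>

  runtime {
    docker: docker
    memory: "4 GB"
    disks: "local-disk ~{disk_size} HDD"
  }

  output {
    # Assumed file name for the report written by --OUTPUT above.
    File validation_report = "~{output_basename}.txt"
  }
}
```

Keeping `validation_mode` optional lets the workflow run without it; passing VERBOSE instead of relying on the assumed SUMMARY default makes ValidateSamFile list each problem individually rather than as counts.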
Part 2: Define workflow definition outputs correctly
Running a GATK command isn’t terribly meaningful if you don’t create an output. This workflow's task creates a validation report that lists problems with file formatting, faulty alignments, incorrect flag values, etc., but this file will not be a workflow output unless we specify it in the workflow definition outputs.
Using the instructions below, fill in the missing value for the validation_reports output.
- Scroll to the Workflow Definition section of the WDL, which defines ValidateBamsWf
- Examine the workflow definition’s output {} section
Notice the output is an array of files assigned to the variable name “validation_reports”, but the value assigned to it is missing and has been replaced with “...”.
- Fill in the missing assignment for the validation_reports variable
Hint: A workflow output refers to a task output using the syntax <task call name>.<task output name>.
- Check your answer using the Answer Key below. You’ll also find a copy of the correct script in the advanced_puzzle_solved workflow in the workspace.
Answer Key (Part 2)
output {
  Array[File] validation_reports = ValidateBAM.validation_report
}
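Why is the output an `Array[File]` when the task produces a single `File`? Presumably the workflow calls ValidateBAM inside a `scatter` over the input BAMs, and outside a scatter each call output is gathered into an array with one element per iteration. The minimal WDL 1.0 sketch below illustrates that pattern. Only `ValidateBamsWf`, `bam_array`, `ValidateBAM`, `validation_report`, and `validation_reports` come from this tutorial; the other input names, the `broadinstitute/gatk:latest` image and `/gatk/gatk` path defaults, and the abbreviated task are assumptions.

```wdl
version 1.0

# Hypothetical sketch of the workflow definition's scatter/gather pattern.
workflow ValidateBamsWf {
  input {
    Array[File] bam_array
    # Assumed defaults; the puzzle's workflow inputs may differ.
    String gatk_docker = "broadinstitute/gatk:latest"
    String gatk_path = "/gatk/gatk"
  }

  # One ValidateBAM call runs for each BAM file in bam_array.
  scatter (input_bam in bam_array) {
    call ValidateBAM {
      input:
        input_bam = input_bam,
        output_basename = basename(input_bam, ".bam"),
        gatk_path = gatk_path,
        docker = gatk_docker
    }
  }

  output {
    # Outside the scatter, ValidateBAM.validation_report is an Array[File]
    # holding one report per input BAM.
    Array[File] validation_reports = ValidateBAM.validation_report
  }
}

# Abbreviated stand-in for the ValidateBAM task so this file is self-contained.
task ValidateBAM {
  input {
    File input_bam
    String output_basename
    String? validation_mode
    String gatk_path
    String docker
  }

  command <<<
    ~{gatk_path} ValidateSamFile \
      --INPUT ~{input_bam} \
      --OUTPUT ~{output_basename}.txt \
      --MODE ~{default="SUMMARY" validation_mode}
  >>>

  runtime {
    docker: docker
  }

  output {
    File validation_report = "~{output_basename}.txt"
  }
}
```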
Save the WDL
After you’ve completed the exercise, save the WDL with a unique name.
Uploading your solved WDL puzzle to the Broad Methods Repository
- Navigate to the Broad Methods Repository
- Select Create New Method
- Type a unique namespace (like a folder for your WDLs) and a unique workflow name
- Select Load from file... to upload your WDL OR copy and paste the script into the box
- Select Upload at the bottom of the dialogue box
When you upload a WDL, the Methods Repository runs a validation step in the background using WOMtool, a WDL syntax checker. If the WDL still contains errors (for example, a leftover “...” placeholder or a variable name that doesn't match the rest of the script), you'll see an error. You can edit the WDL directly in the dialogue box and re-upload.
Exporting your WDL to your workspace
- From the WDL summary page in the Broad Methods Repository, select Export to Workspace
- Select Use Blank Configuration in the dialogue box
- Select a Destination Workspace
- Select Export to Workspace
- Proceed to the Workspace by selecting Yes in the dialogue box
Setting up and running your WDL using a workspace data table
You can set up and run the WDL puzzle workflows on real data provided on the WDL-puzzles workspace Data page. Follow the instructions for your puzzle below to get started.
Easy WDL puzzle
The easy puzzle workflow has one input variable ("name") and one output variable ("command") to which we need to assign attributes. Specifically, we must choose a name to use as input to our HelloInput workflow and tell Terra where we want to write the output.
We could assign a name manually by typing it as a string in the workflow Inputs configuration. But if we had to run our workflow to echo many names (say, thousands), we'd want to use a data table like the hello_world_name table on the workspace Data page. With a data table, we can not only iterate over many rows but also keep track of our outputs, which will be an individual text file for each name we run.
Step-by-step instructions:
- Go to the Workflows tab and select the easy puzzle you imported from the Methods Repository
- Scroll to Step 1 on the workflow setup page
- Select the hello_world_name table as the root entity
- Click Select Data
- Choose "Process all 3 rows"
- Select OK
- Go to the Inputs tab
- Assign the "name" variable to an attribute
Specifically, you want to use the hello_name column of the hello_world_name data table, which contains the list of names to use as input. To specify a column in the root entity data table, use the "this." syntax followed by the column name. For this use case, type:
this.hello_name
- Select Save
- Go to the Outputs tab and assign the command variable to an attribute
This can be a new column in the root entity data table using the "this." syntax. Your column can have any name. For example, you could specify a new column called "my_output" by typing:
this.my_output
- Select Save
- Select the Run Analysis icon
- Click Launch
- You'll be redirected to the Job History page. When your workflow successfully runs, you will see a green checkmark under the Status column.
- Confirm the outputs in the hello_world_name data table
Your hello_world_name data table should now have a my_output column.
- Click the stdout link in the my_output column
- In the Preview, you should see the greeting your workflow wrote (for example, “Hello <name>!”)
Congratulations! You successfully solved and ran your WDL puzzle workflow!
Advanced WDL puzzle
The advanced puzzle workflow has one input variable ("bam_array") and one output variable ("validation_reports") to which we need to assign values. Specifically, we must choose which BAM files to use as input and tell Terra where we want to write the output.
We could assign an array of BAM files manually by typing them as strings in the workflow Inputs configuration. But if we had to run our workflow to validate multiple sets (say, hundreds of BAM files each), we'd want to use a data table like the bam_set table on the workspace Data page. By using the data table, we not only have the option to iterate, but we can also keep track of our outputs, which will be an array of validation reports for each set.
Step-by-step instructions:
- Go to the Workflows tab and select the advanced puzzle you imported from the Methods Repository
- Scroll to Step 1 on the workflow setup page
- Select the bam_set table as the root entity
- Click Select Data
- Select "Choose existing sets"
- Click the checkbox next to Set1
- Select OK
- Go to the Inputs tab
- Assign the "bam_array" variable to an attribute, specifically the BAM files in the data table
To do this, we first specify the column in the root entity data table that contains the BAM file IDs using the "this." syntax (this.bams), and then the column in the bam data table that contains the cloud location of each BAM file (the file_path column). The final attribute is:
this.bams.file_path
- Select Save
- Go to the Outputs tab and assign the validation_reports variable to an attribute
This can be a new column in the root entity data table. Your column can have any name. For example, you could specify a new column called "my_output" by typing:
this.my_output
- Select Save
- Select the Run Analysis icon
- Click Launch
- You'll be redirected to the Job History page
When your workflow is successful, you will see a green checkmark next to the run.
- Confirm the outputs in the data table
Your bam_set data table should now have a my_output column.
- Click the items in the my_output column; you should see the Google bucket location of the validation report for each BAM file.
Congratulations! You successfully solved and ran your WDL puzzle workflow!
Next Steps
To learn more about WDL, check out the resources below:
OpenWDL
Learn how to stay in touch and participate in the global WDL community.
learn-wdl
Try an open source WDL course on GitHub that includes video tutorials as well as example WDL scripts and resources.