Overview: Getting started with WDL

Kate Herman
  • Updated

The Workflow Description Language (WDL, pronounced 'widdle') is a community-driven, human-readable language for data processing workflows. Whether you want to use WDL in Terra or develop your own WDL workflows, we have the resources to get you started!

WDL: A community-driven data processing language

WDL is a human-readable and writable language originally developed for the Broad Institute's genomic analysis pipelines. Since its inception, it has been widely adopted and developed by a global community of researchers, programmers, engineers and analysts.

Read more about the WDL community and how you can participate at the OpenWDL website. WDL is open source, so anyone can contribute ideas and improvements to the language, as well as tutorial resources.

WDL and Terra: Partners in pipelining

WDL workflows allow you to run whole genomic pipelines in Terra. The Terra platform is specifically designed to integrate a WDL workflow's input and output information directly into the platform. This integration is what allows you to readily configure a workflow from your Terra workspace.

To get started importing and running WDL workflows on Terra, read about Pipelining.

Developing WDL workflows: "Hello, learn-wdl!"

If you're an experienced developer looking to get started with WDL, we highly recommend the learn-wdl tutorial on GitHub. This community resource is open-source and uses a series of video guides developed by Lynn Langit to walk you through a variety of WDL script examples.

When you arrive at the learn-wdl GitHub page, start with the README and the Introductory Video.  

 

Navigating the learn-wdl GitHub

The learn-wdl GitHub repository has a series of folders containing the workflow examples used in the learn-wdl videos. Read on for details about what you can find in each folder.


1_script_examples

Find steps for setting up your Google Compute Engine Virtual Machine (GCE VM) and running your first "hello-world" scripts. This folder also includes examples of the different language patterns you may use when writing a WDL script. 

2_pipeline_examples

Read a brief overview of pipeline patterns and try the accompanying pipeline scripts as you watch the video tutorials.

3_genomic_tool_pipelines

Try commonly used workflows for genomic analyses. This folder includes the WDLs, input configuration files (JSONs), and sample data needed for each analysis.

4_sample_data

Check out example data divided by data type (BAM, VCF, FASTA, etc.). You will use this data as you work through the video tutorials.

5_reference_material

Learn more about WDL with reference material covering:

  • Language concepts, including workflow keywords, style guides, and templates
  • Setting up your development environment
  • External WDL resources, including the WDL 1.0 spec and a Nextflow vs. WDL Quickstart Guide
  • Courses for additional cloud resources (e.g., Google Cloud Platform, Galaxy)

Contribute to learn-wdl

WDL resources are always expanding and improving. Because learn-wdl is open source, any WDL user can contribute ideas for improvements, file issues, or add resource material.

New to computing? No problem! Try these high-level overviews

For anyone interested in learning more about WDL workflows, we curate several guides that provide a high-level overview of the basic WDL script components. These guides are split into two sections: Writing a WDL and Running a WDL. 

Update September 2020: We are in the process of integrating and updating our WDL tutorials. In the meantime, you may find that some GATK commands are out of date, or that the WDL information is incomplete. If you encounter any issues you can't solve, please let us know.

Writing a WDL

First, we'll introduce the building blocks of WDL and how they fit together to form the base structure of a WDL script. Then we'll show how to add variables so that input files and parameters can be specified outside of the script itself, which will allow you to use the same script for different runs without modification. Next, we'll cover how to add plumbing, i.e., how you can chain together the components that perform units of work in different ways to form sophisticated pipelines. Finally, we'll look at how to validate syntax, which sounds boring but is really helpful since it will tell you quickly whether your WDL script is runnable or not. Because nobody likes starting a run only to see it fail because it's missing a closing brace somewhere.
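To make those pieces concrete, here is a minimal sketch of a WDL 1.0 script, not taken from the guides themselves. The workflow, task, and variable names (HelloWorkflow, say_hello, shout, name) and the Docker image are illustrative assumptions. It shows two tasks as building blocks, a workflow-level input variable, and plumbing that feeds one task's output into the next.

```wdl
version 1.0

# A task is one unit of work: inputs, a shell command, and outputs.
task say_hello {
  input {
    String name
  }
  command <<<
    echo "Hello, ~{name}!"
  >>>
  output {
    # Capture whatever the command printed to stdout.
    String greeting = read_string(stdout())
  }
  runtime {
    # Assumed image; any image that provides a POSIX shell should do.
    docker: "ubuntu:20.04"
  }
}

# A second task, so there is something to chain to.
task shout {
  input {
    String message
  }
  command <<<
    echo "~{message}" | tr '[:lower:]' '[:upper:]'
  >>>
  output {
    String shouted = read_string(stdout())
  }
  runtime {
    docker: "ubuntu:20.04"
  }
}

# The workflow declares the variables supplied at run time
# and wires the tasks together.
workflow HelloWorkflow {
  input {
    String name
  }

  call say_hello { input: name = name }

  # Plumbing: the output of say_hello becomes the input of shout.
  call shout { input: message = say_hello.greeting }

  output {
    String result = shout.shouted
  }
}
```

Because `name` is declared as a workflow input rather than hard-coded, the same script can be reused across runs with different values. To check syntax, one option is the Womtool utility distributed alongside Cromwell, for example `java -jar womtool.jar validate hello.wdl`, assuming the script is saved as hello.wdl (the JAR file name varies by release).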

Running a WDL

This is going to be short and sweet. We'll show you how to generate a JSON template for specifying inputs (spoiler: it's super easy) and fill it out. Then we'll present the main options for executing your WDL script, focusing on the execution engine we use for Terra, which is called Cromwell. If you had asked us two years ago if we'd ever have an execution engine that we could use both in development and in production, locally and on the cloud, we would have said when pigs fly...
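As a rough illustration of what a filled-out inputs file looks like, assume a hypothetical workflow named HelloWorkflow that declares a single String input called name. Each key in the JSON is the workflow name and the input name joined by a dot. In a freshly generated template, the value is typically a placeholder showing the expected type (here, "String"), which you replace with a real value:

```json
{
  "HelloWorkflow.name": "World"
}
```

One way to produce the template is Womtool, a utility distributed alongside Cromwell: `java -jar womtool.jar inputs hello.wdl > hello_inputs.json`. A local Cromwell run would then look something like `java -jar cromwell.jar run hello.wdl --inputs hello_inputs.json`. The file names here are assumptions, and the actual JAR names include a version number.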

 
 

Next steps

We will continue to add WDL resources to this document as they are developed. If you have a resource that you would like us to highlight, please post in our community forum.
