Configuring a workflow to run on your data means defining or customizing the parameters the WDL needs to run. These include Inputs - such as reference files, compute parameters, and input data file names and locations - and Outputs. Some workflows (such as those in a Featured Workspace) will be preconfigured, but many will not. Even if a workflow is preconfigured, there may be times when you will not want to use the default values. For example, you may want to adjust the name of an input or output file to be more meaningful to you or to match the input for a second WDL, or change the reference file that comes along with the workflow.
This article will walk you through finding and modifying attributes of your workflow manually using the Terra interface. A future article will outline how to use a JSON file to configure a workflow so you do not have to do it manually (especially useful if you anticipate reusing much the same configuration many times).
- Workflow inputs and outputs
- What are attributes?
- Setting inputs and outputs in the Terra UI
- Setting direct links for inputs and outputs
- Successful runs require matching attributes
- Verifying outputs
- Next steps
- Watch a video
- Practice configuring and running a workflow
Workflow inputs and outputs
Clicking on the workflow name (on the card in the Workflows tab of the workspace) will yield a lot of detail about the inner workings of the WDL. For example, the Inputs and Outputs buttons will tell you variable names and attribute types for every subtask within the workflow. If the workflow is already configured - as Featured Workspaces and some curated workflows are - the Inputs attributes (the actual integer, string, or file) will be filled in (see screenshot below):
Note: Each time you Launch a workflow, a unique submission ID is assigned. This submission ID is also the name of the output folder in the workspace Google bucket. Outputs from multiple submissions of the same workflow in the same workspace will not be overwritten since they are in different submission ID folders.
What are attributes?
Attributes are the integers, strings, or files that the variables represent. They appear in the last column under the Inputs button (see screenshot above). File-type attributes can come from the Data Table or the Workspace Data (references), or they can be hard-coded with complete paths (e.g. gs://kdjvHFHL).
- An attribute from the Data Table has the prefix "this." in the Inputs or Outputs attribute column (see the fourth row, "this.sample_id", in the screenshot above). For a primer on populating the workspace Data Table to link to your data, click here.
- An attribute from Workspace Data has the prefix "workspace." (such as the fifth row, "workspace.ref_fasta", in the screenshot).
- File- or string-type attributes in quotes are hard-coded into the workflow (such as the sixth row, "200", in the screenshot above). Note that you can hard-code input data files by including the whole path and file name in quotes (e.g. "gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram").
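To make the three prefix rules concrete, here is a minimal Python sketch that sorts attribute values by where they come from. It only illustrates the convention described in the list above and is not how Terra itself parses attributes; the function name `classify_attribute` and the returned labels are made up for this example.

```python
def classify_attribute(expression: str) -> str:
    """Return where an attribute value comes from, based on its form.

    Hypothetical helper for illustration only; the labels are not Terra terms.
    """
    text = expression.strip()
    if text.startswith("this."):
        return "data table"
    if text.startswith("workspace."):
        return "workspace data"
    # A value wrapped in double quotes is typed in directly (hard-coded).
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "hard-coded"
    return "unrecognized"


# The example values below are the ones quoted in this article.
examples = [
    'this.sample_id',
    'workspace.ref_fasta',
    '"200"',
    '"gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram"',
]
for expression in examples:
    print(f"{expression} -> {classify_attribute(expression)}")
```

A value that falls into the "unrecognized" group (for example, a path typed without quotes) is a good candidate to double-check before you save.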
Setting inputs and outputs in the Terra UI
Within the workflow's Inputs and Outputs buttons, you will need to add (or modify, if the workflow is preconfigured) all of these values by typing the attributes into the right-hand column and saving. See the screenshots below for an example of what your Inputs will look like before and after you have configured the workflow.
Don't forget to hit the Save button after filling in the attributes!
You will go through a similar process for the Outputs. By default, output data will be written to the workspace Google bucket.
Note that if you want to write metadata for the output files to the workspace data table, you will need to use the format "this.filename" for your Output attribute.
Below are screenshots of your Outputs before and after manually filling them in. Don't forget to save!
Setting direct links for inputs and outputs
If you don't want to use the workspace data table, you can hard-code direct links to input data in a Google bucket into your workflow (don't forget to Save the changes!). The downside to direct links? They're fixed (so you may have to rewrite them whenever you move or change your data), and they do not update your workspace data table. See an example of a hard-coded input file in the screenshot below:
Successful runs require matching attributes
Example 1: Matching sample_ids in the Data Table
In the Workflow inputs, the variable "SampleName" corresponds to the attribute "this.sample_id" (see screenshot):
The prefix "this." tells us that the attribute comes from the Data Table. So the sample_id in the table on the Data tab will be used as the SampleName in the task CramToBamFlow. Looking at the Data tab for the workspace (screenshot below) confirms this. The string "sample_id" is the column header for the samples in the Data Table. Because it matches the attribute, the workflow will be able to find the right input when launched, and the run will succeed.
Example 2: Matching Data Inputs from the Data Table
In the workflow Inputs, the file-type variable "InputCram" has the attribute "this.cram" (see screenshot below):
In other words, the input for CramToBamFlow is a cram file listed in the data table. You know it's in the data table because the attribute starts with "this." and the table header matches the workflow attribute. When it runs, the workflow will look in the data table to find the location of the actual input data file.
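The matching rule from both examples can be checked with a few lines of Python before you launch. This is a sketch under stated assumptions: the `inputs` dictionary and `table_columns` list are typed in by hand here to mirror the screenshots (Terra does not hand them to you in this form), the variable names are shortened to the ones used in this article, and `find_unmatched_attributes` is a hypothetical helper name.

```python
def find_unmatched_attributes(inputs, table_columns):
    """Return {variable: attribute} for every "this." attribute whose
    column name is not a header in the data table.

    Hypothetical helper for illustration only.
    """
    unmatched = {}
    for variable, attribute in inputs.items():
        if attribute.startswith("this."):
            column = attribute[len("this."):]
            if column not in table_columns:
                unmatched[variable] = attribute
    return unmatched


# Attributes as configured in Examples 1 and 2.
inputs = {
    "SampleName": "this.sample_id",
    "InputCram": "this.cram",
}
# Column headers as they appear in the Data tab.
table_columns = ["sample_id", "cram"]

# An empty result means every "this." attribute has a matching header.
print(find_unmatched_attributes(inputs, table_columns))

# A typo in the attribute (an extra "s") no longer matches any header.
typo_inputs = {"InputCram": "this.crams"}
print(find_unmatched_attributes(typo_inputs, table_columns))
```

Anything the check returns is an attribute to fix in the Inputs column, or a column to add to the data table, before launching.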
Verifying output files
If your output attributes have the format "this.stuff", the workflow will write to the workspace data table. You will see additional metadata for these output files in the data table after a successful run.
For example, after running the WDL in the Quickstart practice workspace, you can see that the sample table now contains three extra columns of output metadata. The metadata references files in the workspace Google bucket: outputBai, outputBam, and output_validation_report (see screenshot below). This data is now easily available for use by other WDLs in your workspace.
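As a rough mental model of what happens to the table, the Python sketch below mimics adding output columns to one row. It is a simulation only, not Terra's implementation. The column names come from the example above, while the output variable names, the sample ID, the bucket paths, and the helper `write_outputs_to_row` are all hypothetical.

```python
def write_outputs_to_row(row, output_attributes, produced_files):
    """Return a copy of a data table row with one new column per
    output whose attribute starts with "this.".

    Hypothetical helper that only simulates the behavior described above.
    """
    updated = dict(row)
    for variable, attribute in output_attributes.items():
        if attribute.startswith("this."):
            column = attribute[len("this."):]
            updated[column] = produced_files[variable]
    return updated


# One row of the sample table before the run (sample ID is hypothetical).
row = {"sample_id": "NA12878"}

# Output variable -> attribute typed into the Outputs column.
# The variable names on the left are assumptions for this example.
output_attributes = {
    "outputBam": "this.outputBam",
    "outputBai": "this.outputBai",
    "validation_report": "this.output_validation_report",
}

# Placeholder bucket paths; real paths depend on your workspace and submission.
produced_files = {
    "outputBam": "gs://example-bucket/example-submission/NA12878.bam",
    "outputBai": "gs://example-bucket/example-submission/NA12878.bai",
    "validation_report": "gs://example-bucket/example-submission/NA12878.validation_report",
}

updated_row = write_outputs_to_row(row, output_attributes, produced_files)
for column, value in updated_row.items():
    print(f"{column}: {value}")
```

The row keeps its original sample_id and gains three columns, each holding the location of a file in the bucket rather than the file itself.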
Whether or not you write to the data table, you can find the output files in your workspace Google bucket by clicking on "Files" in the left column of the Data tab:
To see a video tutorial of configuring a workflow, click here
To practice modifying configs and running workflows (Exercise 2), click here
Note that to run the exercises you will need to clone this workspace to your own billing project.
Isn't there an easier way??
It's tedious (not to mention error-prone) to type in every attribute by hand. For this reason, JSON files can vastly simplify the process. We'll cover JSON config files, and how to find and use them, in an article coming soon!
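Until that article is available, here is a rough idea of what such a file might hold, written as a short Python sketch. The key names (`CramToBamFlow.SampleName` and so on) and the overall layout are assumptions based on the examples in this article, not a statement of the exact format Terra expects, so check them against a configuration downloaded from your own workspace.

```python
import json

# Hypothetical inputs for the workflow used in the examples above.
# Keys pair the task name with a variable name; values are the attributes
# you would otherwise type into the Inputs column by hand.
inputs = {
    "CramToBamFlow.SampleName": "this.sample_id",
    "CramToBamFlow.InputCram": "this.cram",
}

# Serialize to JSON text; sort_keys keeps the file stable between edits.
text = json.dumps(inputs, indent=2, sort_keys=True)
print(text)

# Reading the text back gives the same attributes, so the file can be
# reused without retyping anything.
restored = json.loads(text)
print(restored == inputs)
```

Keeping the attributes in one file also makes a typo easier to spot than it is across many rows in the UI.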