Configuring a workflow means customizing the parameters the WDL needs in order to run, so that the workflow fits your analysis: defining Inputs - including reference files, compute parameters, and input data file names and locations - and Outputs.
Some workflows (such as those in a Featured Workspace) will be preconfigured, but many will not. Even if a workflow comes preconfigured, there may be times when you will not want to use the default values. For example, you may want to adjust the name of an input or output file to be more meaningful to you or to match the input for a second WDL, or change the reference file that comes along with the workflow.
This article walks you through modifying the attributes of your workflow manually in the Terra interface. A future article will outline how to use a JSON file to configure a workflow so you do not have to do it manually (especially useful if you anticipate reusing much the same configuration many times).
- Workflow inputs and outputs
- What are attributes?
- Setting inputs and outputs in the Terra UI
- Setting direct links for inputs and outputs
- Successful runs require matching attributes
- Verifying output files
- Next steps
- Practice configuring and running a workflow
Workflow inputs and outputs
Clicking on the workflow name (on the card in the Workflows tab of the workspace) will provide details about the inner workings of the WDL. The Inputs and Outputs tabs are where to find variable names and attribute types for every subtask within the workflow. If the workflow is already configured - as Featured Workspaces and some curated workflows are - the attributes will be filled in (see screenshot below) for the required variables:
What are attributes?
Attributes are the integers, strings, or files that the variables represent. They are in the last column of the Inputs tab (see screenshot above). File-type attributes can reference the data table or the workspace data table, or they can be hard-coded with complete paths (e.g., gs://kdjvHFHL).
- Attributes defined in the data table have the prefix "this." (see the fourth row - "this.sample_id" - in the screenshot above). For a primer on populating the workspace Data Table to link to your data, click here.
- Attributes from the workspace data table have the prefix "workspace." (such as the fifth row - "workspace.ref_fasta" - in the screenshot above).
- File- or string-type attributes in quotes are hard-coded into the workflow configuration (such as the sixth row - "200" - in the screenshot above). Note that you can hard-code input data files by including the whole path and file name in quotes (e.g., "gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram").
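The three prefix conventions above can be summarized in a short Python sketch. It is only an illustration of the naming rules and is not part of Terra; the function name classify_attribute and the category labels are hypothetical.

```python
def classify_attribute(value):
    """Return where an attribute gets its value, judging only by its form.

    Hypothetical helper for illustration; Terra does not expose this function.
    """
    text = value.strip()
    if text.startswith("this."):
        return "data table"
    if text.startswith("workspace."):
        return "workspace data table"
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "hard-coded"
    # Anything else (for example an unquoted value) is not covered by the rules above.
    return "other"


examples = [
    "this.sample_id",
    "workspace.ref_fasta",
    '"200"',
    '"gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram"',
]
for attribute in examples:
    print(attribute, "->", classify_attribute(attribute))
```

The first two examples resolve to a table, while the last two are literal values that stay the same no matter which row of the data table the workflow runs on.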
Setting inputs and outputs in the Terra UI
Within the workflow's Inputs and Outputs tabs, you will need to add (or modify, if the workflow is preconfigured) all of these values by typing the attributes in the right-hand column and saving. See the screenshots below for an example of what your Inputs will look like before and after you have configured the workflow.
Don't forget to hit the Save button after filling in the attributes!
You will go through a similar process for the Outputs.
By default, output data will be written to the workspace Google bucket. Note that if you want to write metadata for the output files to the workspace data table, you will need to use the format "this.filename" in the Output attribute.
Don't forget to save!
Setting direct links for inputs and outputs
If you don't want to use the workspace data table, you can use direct links to input data in a Google bucket for workflow attributes (don't forget to Save the changes!!). The downside to coding in direct links is that they're fixed (so you might have to rewrite them often, if you move or change your data) and they do not update your workspace data table. See an example of a hardcoded input file in the screenshot below:
Successful runs require matching attributes
Example: Matching sample_ids in the Data Table
It's straightforward to check that the attributes in the data table and the workflow card match. In the Workflow inputs, the variable "SampleName" corresponds to the attribute "this.sample_id" (see screenshot):
The string prefix "this." tells us that the variable is in the Data Table. So the sample_id in the Table of the Data tab will be used as the SampleName in the task CramToBamFlow. Looking at the data table (screenshot below) confirms that the string "sample_id" is the header for the samples. Because it matches the attribute, the workflow will be able to find the right input file when launched, and the run will succeed.
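The same check can be sketched in Python for cases where you have many inputs to compare. This is a minimal illustration, assuming you have copied the input attributes and the data table headers into plain Python values yourself; the function name find_unmatched_inputs and the column "cram_path" are hypothetical and do not come from Terra.

```python
def find_unmatched_inputs(inputs, table_columns):
    """Return inputs whose "this." attribute has no matching data table column."""
    unmatched = {}
    for variable, attribute in inputs.items():
        if attribute.startswith("this."):
            column = attribute[len("this."):]
            if column not in table_columns:
                unmatched[variable] = column
    return unmatched


# Column headers as they appear in the data table ("cram_path" is made up).
table_columns = {"sample_id", "cram_path"}

matching = {"SampleName": "this.sample_id"}
mistyped = {"SampleName": "this.sampleid"}

print(find_unmatched_inputs(matching, table_columns))  # {}
print(find_unmatched_inputs(mistyped, table_columns))  # {'SampleName': 'sampleid'}
```

An empty result means every "this." attribute has a matching column header; any entry in the result names a variable whose attribute would not be found in the data table.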
Verifying output files
If your output attributes have the format "this.your_filename", the workflow will write output metadata to the "your_filename" column of the data table. You'll see the additional metadata for these output files in the data table after a successful run.
For example, after running the WDL in the Quickstart practice workspace, you can see that the sample table now contains three extra columns of output metadata. The metadata references files in the workspace Google bucket: outputBai, outputBam, and output_validation_report. This data is now easily available for use by other WDLs in your workspace.
Whether or not you write to the data table, you can find the output files in your workspace Google bucket by clicking on "Files" in the left column of the Data tab:
Note about output file folders: Each time you Launch a workflow, a unique submission ID is assigned. This submission ID is also the name of the output folder in the workspace Google bucket. Outputs from multiple submissions of the same workflow in the same workspace will not be overwritten since they are in different submission ID folders.
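To see why separate submissions do not collide, here is a small Python sketch. It assumes the output folder sits directly under the bucket and uses random UUIDs as stand-ins for submission IDs; the bucket name, the function output_folder, and the exact folder layout are assumptions for illustration, not a description of how Terra builds its paths.

```python
import uuid


def output_folder(bucket, submission_id):
    """Build an illustrative output folder path named after the submission ID."""
    return f"gs://{bucket}/{submission_id}/"


bucket = "fc-example-bucket"  # hypothetical bucket name

# Two launches of the same workflow get two different submission IDs.
first = output_folder(bucket, str(uuid.uuid4()))
second = output_folder(bucket, str(uuid.uuid4()))

print(first)
print(second)
print(first != second)
```

Because each launch gets its own ID, the two folder paths differ, so files written by the second submission cannot replace files written by the first.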
To see a video tutorial of configuring a workflow, click here
To practice modifying attributes and running workflows (Exercise 2), click here
Note that to run the exercises you will need to clone this workspace to your own billing project.
Isn't there an easier way??
It's tedious (not to mention error-prone) to type in every attribute by hand. For this reason, JSON files can vastly simplify the process. We'll cover JSON config files, including how to find and use them, in an article coming soon!