At a basic level, configuring a Tool to run on your data means defining or customizing the parameters a WDL needs to run. These include Inputs - such as reference files, compute parameters, and input data files - and Outputs. Some Tools (such as those that come from a Featured Workspace) will be preconfigured, but many will not. Even if a Tool is preconfigured, there may be use cases when you will not want to use the default values. For example, you may want to adjust the name of an input or output file to be more meaningful to you or to match the input for a second WDL, or change the reference file that comes along with the Tool.
This article will walk you through finding and modifying the attributes of your Tool manually using the Terra GUI. A future article will outline how to use a JSON file to configure a Tool so you do not have to do it manually (especially useful if you anticipate reusing much the same configuration many times).
Tool inputs and outputs
Clicking on the Tool name (on the card in the Tools tab of the workspace) will yield a lot of detail about the inner workings of the WDL. The Inputs and Outputs buttons will tell you the input and output variable names and attribute type for every subtask within the workflow. If the Tool is already configured, such as in Featured Workspaces and some curated Tools, the Inputs attributes (the actual integer, string, or file) will be filled in (see screenshot below):
What are attributes?
Attributes are the integers, strings, or files that a variable represents. File-type attributes can come from the Data Table or the Workspace Data (references), or can be hard-coded.
- Any attribute from the Data Table has the prefix "this." (such as "this.sample_id" in the screenshot above). For a primer on setting up a Data Table to link to your data, click here.
- Any attribute from Workspace Data has the prefix "workspace." (such as "workspace.ref_fasta" in the screenshot)
- Any file or string-type attribute in quotes is hard-coded into the Tool (such as "3500 MB" in the screenshot above). Note that you can hard-code input data files by including the whole path and file name in quotes (e.g. "gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram").
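The three prefix conventions above can be summarized in a few lines of Python. This is only a minimal sketch to illustrate the rules, not how Terra itself parses attributes; the function name `classify_attribute` and the category labels are hypothetical.

```python
def classify_attribute(expression: str) -> str:
    """Classify a Tool attribute using the prefix conventions described above.

    Illustrative only: the function name and the returned labels are
    hypothetical and are not part of Terra.
    """
    expr = expression.strip()
    if expr.startswith("this."):
        return "data_table"       # value comes from a column of the Data Table
    if expr.startswith("workspace."):
        return "workspace_data"   # value comes from Workspace Data (references)
    if len(expr) >= 2 and expr.startswith('"') and expr.endswith('"'):
        return "hard_coded"       # quoted string or file path typed into the Tool
    return "other"                # anything else, e.g. a bare number


# The attribute examples mentioned in this article
examples = [
    "this.sample_id",
    "workspace.ref_fasta",
    '"3500 MB"',
    '"gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram"',
]
for example in examples:
    print(f"{example} -> {classify_attribute(example)}")
```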
Successful runs require matching attributes
Example 1: Matching sample_ids in the Data Table
In the Tool inputs, the variable "SampleName" corresponds to the attribute "this.sample_id" (see screenshot):
The prefix "this." tells us that the attribute's value comes from the Data Table. So the sample_id in the Table of the Data tab will be used as the SampleName in the task CramToBamFlow. Looking at the Data tab for the workspace (screenshot below) confirms this. The string "sample_id" is the header for the samples in the Data Table. Because it matches the Tool's attribute, the Tool will be able to find the right input file when launched, and the run will succeed.
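To make the matching rule concrete, here is a small Python sketch that checks whether every "this." attribute has a matching column header. It assumes the match must be exact (including case); the function name `unmatched_attributes` and the misspelled attribute are hypothetical, made up for illustration.

```python
def unmatched_attributes(tool_inputs, table_headers):
    """Return the inputs whose "this." attribute has no matching column header.

    Assumption for this sketch: the text after "this." must equal a
    Data Table header exactly.
    """
    headers = set(table_headers)
    problems = {}
    for variable, attribute in tool_inputs.items():
        if attribute.startswith("this."):
            column = attribute[len("this."):]
            if column not in headers:
                problems[variable] = attribute
    return problems


# Column headers as they appear in the Data Table
table_headers = ["sample_id", "cram"]

# Matches the header, so nothing is reported
good_inputs = {"SampleName": "this.sample_id"}
# Hypothetical typo: there is no "sampleid" column in the Data Table
typo_inputs = {"SampleName": "this.sampleid"}

print(unmatched_attributes(good_inputs, table_headers))
print(unmatched_attributes(typo_inputs, table_headers))
```

A check like this before launching can catch a misspelled header, which is a common reason a Tool cannot find its input.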
Example 2: Matching Data Inputs from the Data Table
In the Tool's Inputs, the file-type variable "InputCram" has the attribute "this.cram" (see screenshot below):
In other words, the input for the CramToBamFlow is a cram file in the Data Table. You know it's in the Data Table because the attribute starts with "this." and the Table header matches the Tool attribute. When it runs, the Tool will know to refer to the Data Table to find the metadata location for the actual input data file:
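Both examples boil down to a lookup: the Tool takes the attribute, finds the matching column (or Workspace Data key), and uses the value stored there. The Python sketch below illustrates that idea for one row of a Data Table. It is a simplified model, not Terra's implementation; `resolve_attribute`, the sample ID value, and the reference path are hypothetical.

```python
def resolve_attribute(attribute, row, workspace_data):
    """Return the concrete value an attribute stands for (illustrative only)."""
    expr = attribute.strip()
    if expr.startswith("this."):
        # Look up the column in the current Data Table row
        return row[expr[len("this."):]]
    if expr.startswith("workspace."):
        # Look up the key in Workspace Data
        return workspace_data[expr[len("workspace."):]]
    if len(expr) >= 2 and expr[0] == '"' and expr[-1] == '"':
        # Hard-coded value: strip the surrounding quotes
        return expr[1:-1]
    return expr


# One row of the Data Table (the sample ID value is hypothetical)
row = {
    "sample_id": "NA12878",
    "cram": "gs://fc-2c4a1a1e-92fe-4c1b-876d-58b9222f9aae/NA12878.cram",
}
# Workspace Data (hypothetical reference path)
workspace_data = {"ref_fasta": "gs://example-bucket/reference.fasta"}

# Tool inputs from Examples 1 and 2
tool_inputs = {"SampleName": "this.sample_id", "InputCram": "this.cram"}

resolved = {
    name: resolve_attribute(attribute, row, workspace_data)
    for name, attribute in tool_inputs.items()
}
for name, value in resolved.items():
    print(f"{name} = {value}")
```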
Setting inputs and outputs
Under the Tool's Inputs and Outputs buttons, you will need to add (or modify, if the Tool is preconfigured) all of these values by typing the attributes in the right-hand column and saving. See the screenshots below for an example of what your Tool Inputs will look like before and after you have configured the Tool. Don't forget to hit the Save button after filling in the attributes!
You will go through a similar process for the Tool Outputs. By default, output data will be written to the workspace Google bucket.
Note that if you want to write output files to the Data Model (so the metadata will be written to the Data Table in your workspace), you will need to use the format "this.filename" for your Output attribute.
Below are screenshots of your Outputs before and after manually filling them in. Don't forget to save!
Direct links for inputs and outputs
If you don't want to use the Data Model, you can hardwire your Tools with direct links to input data in a Google bucket (don't forget to Save the changes!). The downside to direct links? They're fixed (so you might have to rewrite them often) and they do not update your workspace Data Model (so any outputs will have to be hard-coded as inputs into the next WDL you use). See an example of a hard-coded data file in the screenshot below:
Verifying output files
If your output attributes use the "this." format, the Tool will write to the Data Table. You will see additional metadata for these output files in the Data Table after a successful run.
For example, after running the WDL in the Quickstart practice workspace, you can see that the sample table now contains three extra columns of output metadata. The metadata references files in the workspace Google bucket: outputBai, outputBam, and output_validation_report (see screenshot below). This data is now easily available for use by other WDLs in your workspace.
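The effect on the Data Table can be pictured as adding one column per "this." output attribute to the sample's row. The Python sketch below models that behavior under simplifying assumptions; it is not Terra's implementation. The function `record_outputs`, the output variable names, the "scratch_file" output, and the bucket paths are all hypothetical; only the three column names come from the example above.

```python
def record_outputs(row, output_attributes, produced_files):
    """Return a copy of the row with a new column for each "this." output.

    Illustrative model only: outputs without a "this." attribute stay in the
    bucket and are not recorded in the Data Table.
    """
    updated = dict(row)
    for output_name, attribute in output_attributes.items():
        if attribute.startswith("this."):
            column = attribute[len("this."):]
            updated[column] = produced_files[output_name]
    return updated


# Row before the run (hypothetical sample ID)
row = {"sample_id": "NA12878"}

# Output attributes as typed into the Tool (output variable names are hypothetical)
output_attributes = {
    "outputBam": "this.outputBam",
    "outputBai": "this.outputBai",
    "validation_report": "this.output_validation_report",
    "scratch_file": "",  # left blank: written to the bucket only
}

# Files written to the workspace bucket by the run (hypothetical paths)
produced_files = {
    "outputBam": "gs://example-bucket/NA12878.bam",
    "outputBai": "gs://example-bucket/NA12878.bai",
    "validation_report": "gs://example-bucket/NA12878.validation_report",
    "scratch_file": "gs://example-bucket/NA12878.tmp",
}

updated_row = record_outputs(row, output_attributes, produced_files)
for column, value in updated_row.items():
    print(f"{column}: {value}")
```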
Whether or not you write to the Data Table, you can find the output files in your workspace Google bucket by clicking on "Files" in the left column of the Data tab:
To see a video tutorial of configuring a Tool, click here
To practice modifying configs and running Tools (Exercise 2), click here
Note that to run the exercises you will need to clone this workspace to your own billing project.
Isn't there an easier way??
It's tedious (not to mention error-prone) to type in every attribute by hand. For this reason, JSON files can vastly simplify the process. We'll cover JSON config files, and how to find and use them, in an article coming soon!