Welcome to Part 2 of the Workflows Quickstart. You'll learn how to set up the workflow starting from a blank configuration form and analyze two single entities (samples) of genomic data from the data table. When you run the workflow, Terra will generate a set of the two samples to use for further back-to-back analysis in Part 3.
The tutorial uses the same file format conversion workflow from Part 1, where everything was already set up to run. In Part 2, we'll walk through the process of setting up the workflow in Terra in more detail.
Learning objectives - Time and cost to complete
What you will learn
You'll learn the parts of a workflow configuration form, including general options for running the workflow. You'll learn how to set up the workflow to read input data from the table and write links to generated data files to the same table. You'll see how Terra generates a set of input data you can use to run downstream analysis.
How much will it cost? How long will it take?
The exercise should take no more than fifteen minutes (unless your submission is in the queue a long time) and cost a few pennies.
HINT: Right click to open the tutorial demo in a new tab
Step 1: Select workflow and input data
Overview
For this second Workflows Quickstart exercise, you'll process two samples from the input table with the same CRAM-to-BAM workflow as in Part 1. This time the form will be mostly blank, and you will go through the setup process from scratch.
Terra will run the two workflows in parallel and generate a set of those two samples, which you will use later for a follow-up workflow: running workflows back-to-back (i.e. turning workflows into pipelines).
Step-by-step instructions
1.1. Go to the Workflows page and select the Part-2_CRAM-to-BAM workflow card.
1.2. Confirm that the "Run workflow(s) with inputs defined by data table" radio button is checked and the root entity type is sample.
1.3. Choose the Select Data button. You will be directed to the Select Data form (below).
1.4. Click the Choose specific samples to process radio button.
1.5. Check the box next to the samples NA12878 and my_sample.
Notice that when you choose more than one sample to process, Terra automatically generates a set that includes the subset you've chosen. This makes it easier to repeat an analysis, or run back-to-back analyses on the same subset.
1.6. To change the name Terra gives the set, type an easy-to-remember name into the field.
1.7. Confirm your selection by clicking the blue OK button.
Step 2: Specify workflow inputs (attributes) from the data table
What are input attributes?
Attributes are the integers, strings, or files that correspond to input variables in the workflow. These were pre-configured in the form for Part 1.
How do you specify input data?
You'll specify input data by filling in the attribute field for each required variable in the setup form, using values from the input (root entity) table or the workspace data table.
Step-by-step instructions
2.1. Go to the first required variable that is blank - "InputCram" - and click inside the attribute field. You'll see a dropdown menu of all the inputs available in both the sample table and the workspace data table.
The dropdown menu lists all the columns in the "sample" data table as well as all the workspace-level resource files in the workspace data table. You can usually figure out which one to choose based on the variable name (second column).
How to specify input data from the root entity table
The "this.something" format tells the workflow "go to the root entity type table and look in the 'something' column to find the input for this variable."
For example, "this.CRAM" tells the WDL two important bits of information about the input files:
a. "this." means go to the root entity table (the sample table, in this case).
b. "CRAM" after the period means go to the CRAM column in the table for this file.
2.2. Select "this.CRAM" from the dropdown menu (i.e. the data file in the CRAM column of the input table).
2.3. Go to the next blank variable, SampleName, and follow the same process. See the hint below for help with choosing from the dropdown.
HINT: Sometimes, especially if you didn't write the WDL yourself, you will have to make an educated guess at the attribute that matches the variable name.
The sample_id is the unique ID (name) for each sample in the sample table. Thus, "this.sample_id" is the correct attribute to use for this variable. The "this.sample_id" format tells the WDL to find the value for the SampleName variable in the sample_id column of the root entity table.
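If it helps to see what these attributes are filling in on the WDL side, here is a minimal sketch. The declarations are hypothetical (the actual Part-2_CRAM-to-BAM WDL may name or structure things differently), but the idea is the same: each attribute you pick supplies one declared input, once per selected sample row.

```wdl
version 1.0

# Minimal sketch with hypothetical names -- the actual CRAM-to-BAM WDL may differ.
# Terra fills each declared input from the attribute you chose, once per selected sample row.
workflow CramToBam {
  input {
    File InputCram      # filled by "this.CRAM": the file in the sample row's CRAM column
    String SampleName   # filled by "this.sample_id": the row's unique ID in the sample table
  }
}
```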
2.4. Repeat for each variable with a blank attribute field. Use the variable name (second column) to help figure out what attribute to choose from the dropdown.
Some variables will be from the input table (these start with "this.") and some will be global variables from the workspace data table (these start with "workspace.").
Workspace-level resources (i.e. reference files, Docker images, etc.)
Attributes in the dropdown that begin with "workspace." are from the workspace data table. These workspace-level resources can include Docker images, reference FASTA files, or other inputs that are used when analyzing any entity.
Even if you aren't familiar with the reference files listed in the dropdown, you can take a good guess at the right one to select based on the variable name. See the example below.
HINT: Sometimes, especially if you didn't write the WDL yourself, you will have to make an educated guess at the attribute that matches the variable name. You usually don't have to know exactly what the file is to do this!
For example, RefDict is a kind of reference file used for converting file formats. Even if you don't know exactly what it is, looking at the dropdown, it's a good bet that "workspace.ref_dict" is the correct attribute to use for this variable. The "workspace.ref_dict" format tells the WDL to find the value for the RefDict variable in the ref_dict column of the workspace data table.
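Extending the earlier sketch (again with hypothetical names), a workspace-level resource looks just like any other input in the WDL; the only difference is that the attribute points Terra to the workspace data table, so every sample in the submission shares the same value:

```wdl
version 1.0

# Sketch with hypothetical names: shared resources are ordinary inputs in the WDL.
# Only the attribute you choose in the form decides where Terra reads each value from.
workflow CramToBam {
  input {
    File InputCram     # per-sample: "this.CRAM"
    String SampleName  # per-sample: "this.sample_id"
    File RefDict       # shared:     "workspace.ref_dict"
    File RefFasta      # shared:     e.g. "workspace.ref_fasta" (hypothetical column name)
  }
}
```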
2.5. Save your inputs by clicking the blue "Save" button at the top right of the form.
Step 3: Write output file paths to the data table
Generated data files are stored in the workspace bucket by default. You have the option in the configuration form to write links to the files back to the same table that contains the input data. Writing to the data table keeps generated data organized and associated with the input data.
Formatting requirement for attributes from a table
You'll use the same "this.something" formatting (from the dropdown) as you did for inputs. Note that if the "something" column does not exist in the data table, the workflow will create one.
3.1. Start in the Outputs tab of the setup form.
3.2. For the first output variable, "outputBai", go to the attribute field and type in "this." + a column name for your output files in the table.
You may not want to choose from the dropdown
When you click into the blank field, you will see a dropdown with all the columns that exist in the "sample" data table. This includes the original columns in the data table (i.e. the ones you used for input variables) as well as the columns the workflow created for the outputs generated in Part 1.
To create a new column in the data table for the generated output, type the new column name into the attribute field; don't pick it from the dropdown!
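To connect this back to the WDL, here is another hypothetical sketch (with placeholder output values so it parses): the variables in the Outputs tab are the workflow's declared outputs, and the "this.column_name" attribute tells Terra which sample-table column to write each generated file's link into.

```wdl
version 1.0

# Sketch with hypothetical names and placeholder values -- the real workflow produces a
# converted BAM plus its index. Mapping the outputBai variable to "this.output_bai" in the
# form would tell Terra to write each generated index's link into an output_bai column.
workflow CramToBam {
  input {
    File InputCram
  }
  output {
    File outputBam = InputCram   # placeholder so this sketch parses
    File outputBai = InputCram   # placeholder so this sketch parses
  }
}
```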
3.3. Type in a different output name than the default from Part 1 for this run.
3.4. Save your output attributes by selecting the blue SAVE button at the top right.
The Run Analysis button should turn blue (if it doesn't, you might need to go back and fill in an attribute or click the Save button).
Global and cost-saving runtime options (optional)
There are several other options you can configure on the setup form. Note that the default options for anything else on the setup form are fine when running the quickstart. To learn more about these, see Workflow setup: VM and other options.
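For context, the characteristics of the VM each task runs on (container image, memory, disk, and whether to use cheaper preemptible machines) are typically requested by the WDL itself in each task's runtime block. The sketch below is purely illustrative and does not show the quickstart workflow's actual task or settings.

```wdl
version 1.0

# Illustrative sketch only -- not the quickstart workflow's actual task or settings.
task ConvertCramToBam {
  input {
    File input_cram
  }
  command <<<
    echo "conversion of ~{input_cram} would happen here"
  >>>
  runtime {
    docker: "ubuntu:20.04"        # container image the task runs in
    memory: "4 GB"                # VM memory request
    disks: "local-disk 50 HDD"    # attached disk size and type
    preemptible: 3                # try cheaper preemptible VMs up to 3 times
  }
  output {
    String note = read_string(stdout())
  }
}
```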
Step 4: Launch and monitor analysis
Now that you have everything set up, you are ready to submit the job and let Terra take care of the details of running the workflow in the background on a cloud VM.
4.1. Click on Run Analysis.
4.2. In the new form, click Launch to finalize your submission.
Notice that this will launch two analyses in parallel, one for each sample.
When your jobs are submitted, you'll be redirected to the Job History page to monitor your submission.
To see job status updates, refresh the page.
Step 5: What to expect when your workflow completes
Hopefully your submission will succeed. If so, congratulations on setting up the workflow and running it on input data from a table! If it failed (and especially if it failed immediately), go back and make sure you selected the right input attributes from the dropdown.
5.1. Check (refresh) your Job History page. It should look like the screenshot below.
5.2. Check the sample table in your Data page. It should look similar to this, with additional columns for the generated BAM and BAM index files.
5.3. Click on the sample_set table. This new table is a set that includes the two samples you ran from the sample table.
Now that you've defined a set, you can run this analysis again, or run a downstream analysis on the same group, by selecting the set when you choose the data to run on (we'll do that in Part 3!).
What's in the sample set? Where's the data?
If you click on the "2 samples" entry, you will see that this column includes the names of the samples only, not the data files associated with those samples.
Links to the data files are in the "sample" table. When you run an analysis on a set of single entities, you need to set up the workflow to find the data (Part 3).
Congratulations! You've completed Part 2 of the Workflows Quickstart! |