Workflows Quickstart Part 2 - Configure workflow to run on your own data

Allie Hajian

Welcome to Part 2 of the Workflows Quickstart Tutorial. You'll learn how to set up a workflow in the configuration form to analyze two single entities of genomic data (two samples) from the data table. When you run the workflow, Terra will generate a set containing the two samples that you can use for further analysis.

The tutorial uses the same file format conversion workflow from Part 1, where everything was already set up to run. In Part 2, we'll walk through the workflow setup from scratch in the UI. 


Learning objectives, time, and cost to complete

 

What you will learn
You'll learn the parts of a workflow configuration form, including general options for
running the workflow. You'll learn how to configure the workflow to read input data from
the table and write links to generated data files to the same table. You'll see how Terra
generates a set of input data you can use to run downstream analysis. 

How much will it cost? How long will it take? 
The exercise should take no more than fifteen minutes (unless you are in the queue a long
time) and cost a few pennies.

HINT: Right click to open the tutorial demo in a new tab

1. Set (confirm) workflow options

1.1. Go to the Workflows tab and click on the "Part2_CRAM_to_BAM" workflow to reveal the configuration form.

1.2. Verify default options
Most of the defaults are fine (in order from the top left down): you don't need to change the version of the workflow (dropdown menu), you want to run on inputs defined by the data table (radio button), and the "Root entity type" is "sample" (see the Step 1 dropdown in the screenshot below).

Workflow-QuickStart_Part2_Configuration-form_Screen_shot.png

1.3. Select Data (highlighted above in orange)
Notice that the workflow will run on all samples in the workspace data table by default and generate a set of that group of samples. This is useful if you want to run this or a downstream workflow on the same group of samples.

To choose some particular subset of samples, or to change the name Terra gives the set of samples, click on the blue "Select Data" button. You'll be taken to the Select Data form:

Workflows-QuickStart_Part2_Select-data-set_Screen_shot.png
1.3.1. Select the "Choose specific rows to process" radio button and select both samples in the data table by clicking on the squares at left. This is also where you would pick a particular subset of rows if you did not want to run on all of them.

1.3.2. Give your set a memorable name

1.3.3. Confirm your selection by clicking the "OK" button

The next step tells the workflow exactly where to find inputs in the "root entity" data table.

2. Set up workflow Inputs to read from data table

The Inputs tab is where you give the workflow all of the parameters and attributes it needs. Some inputs are optional, but you must complete every input that has a yellow warning icon next to it.

Workflows-QuickStart_Part2_Configure-inputs_Screen_shot.png

You need to tell the WDL where to find each variable in the table. You do this by entering a specific format in the attribute field.

2.1. Go to the first required variable that is blank - "InputCram". 

2.2. Start typing "this." in the "Attributes" column and a dropdown menu will appear.


What does the Input attribute formatting mean? 

 

The "this.something" format tells the workflow "go to the root entity type table and
look in the 'something' column to find the input for this variable."

For example, "this.CRAM" tells the WDL two things:
a. "this." means "go to the root entity table" (the sample table, in this case).
b. "CRAM" after the period means "go to the CRAM column in that table to find this file."

The drop-down includes all the columns available in the "sample" data table. 

2.3. Select "this.CRAM" from the dropdown

2.4. Go to the next blank variable, "SampleName" and follow the same process
The SampleName variable corresponds to the "sample_id" option in the dropdown: it's the unique identifier for that sample, from the first column of the sample table.

The "this.sample_id" format tells the WDL to find the value for the
SampleName variable in the sample_id column of the root entity table.

2.5. Save your inputs by clicking the blue "Save" button at the top right of the form. 
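
To make the "this.something" convention concrete, here is a minimal Python sketch of how such an expression could be looked up in one row of the root entity table. It is only an illustration, not Terra's implementation: the function name resolve_this, the dictionary layout, and the example gs:// paths are all hypothetical.

```python
# Illustrative model only: one row of the "sample" table as a dict.
# Column names follow the tutorial; the values are made-up placeholders.
sample_row = {
    "sample_id": "sample_A",
    "CRAM": "gs://example-bucket/sample_A.cram",
}


def resolve_this(expression, row):
    """Return the value of a "this.<column>" expression for one table row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    if column not in row:
        raise KeyError(f"no column named {column!r} in the root entity table")
    return row[column]


# Inputs as configured in steps 2.3 and 2.4.
inputs = {"InputCram": "this.CRAM", "SampleName": "this.sample_id"}
resolved = {name: resolve_this(expr, sample_row) for name, expr in inputs.items()}
print(resolved)
```

A misspelled column name raises an error in this sketch, which mirrors why picking the column from the dropdown is safer than typing it by hand.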

Next you'll configure workspace-level attributes, like reference or index FASTA files. You'll do this in the Inputs section of the configuration form, too.


Workspace-level attributes (i.e. workspace data)

 

There is a dedicated place on the workspace data page for workspace-level metadata.
These usually include reference files, Docker images, etc.

For programmers, think of these as your global variables. They are values that don’t
change from sample to sample. Storing them in the workspace data table helps scale your
analysis. 

3. Set up workspace-level inputs

Workflows-QuickStart_Part2_Configure-worspace-variables_Screen_shot.png

3.1. Go to the first workspace-level variable, "RefIndex"

3.2. Start typing "workspace." in the Attributes field

The "workspace.something" format tells the WDL the value for this
variable is in the "something" row of the "workspace data" table.

3.3. Select the right attribute from the dropdown
(hint - use the variable name in the variable column to figure out what to choose in the dropdown)

3.4. Repeat for every required workspace-level variable

3.5. Save your setup by clicking the blue "Save" button at the top right of the form 
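
The "global variable" idea behind "workspace.something" can be sketched in a few lines of Python. This is a hedged illustration rather than how Terra works internally: the key names (ref_fasta, ref_fasta_index), the paths, and the function resolve_workspace are assumptions made up for the example, so check the actual keys in your own workspace data table.

```python
# Illustrative model only: the workspace data table as key/value pairs.
# Key names and paths are hypothetical placeholders.
workspace_data = {
    "ref_fasta": "gs://example-bucket/reference.fasta",
    "ref_fasta_index": "gs://example-bucket/reference.fasta.fai",
}


def resolve_workspace(expression, workspace):
    """Return the value of a "workspace.<key>" expression."""
    prefix = "workspace."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    key = expression[len(prefix):]
    return workspace[key]


# Every sample receives the same workspace-level value.
samples = ["sample_A", "sample_B"]
ref_index_per_sample = {
    sample: resolve_workspace("workspace.ref_fasta_index", workspace_data)
    for sample in samples
}
print(ref_index_per_sample)
```

Because the value lives in one place, changing the reference for every sample means editing a single entry rather than every row of the sample table.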

Using direct references to inputs in Google bucket (instead of the table)

If you don't want to use the data table, you can reference a file in a Google bucket directly by typing its full path in the attribute field. Note that you must include the quotation marks:

"gs://url-to-file-in-bucket"

Workflows-QuickStart_Ex2_Input-full-file-paths.png
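
One way to see why the quotation marks matter is to sketch how an attribute entry might be told apart from a table lookup. The Python below is an assumption-laden illustration, not Terra's actual parser: the function classify_attribute and its category labels are hypothetical.

```python
def classify_attribute(text):
    """Classify an attribute-field entry as a literal or a lookup (illustrative)."""
    text = text.strip()
    # A value wrapped in double quotes is treated as a literal string.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return ("literal", text[1:-1])
    # "this.<column>" points at a column of the root entity table.
    if text.startswith("this."):
        return ("table", text[len("this."):])
    # "workspace.<key>" points at a row of the workspace data table.
    if text.startswith("workspace."):
        return ("workspace", text[len("workspace."):])
    return ("unrecognized", text)


print(classify_attribute('"gs://url-to-file-in-bucket"'))
print(classify_attribute("this.CRAM"))
print(classify_attribute("gs://url-to-file-in-bucket"))
```

In this sketch the unquoted path falls through to "unrecognized", since it is neither a quoted literal nor a "this." or "workspace." reference.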

4. Write output file paths to the data table

Generated data is stored in the workspace bucket by default. You have the option in the configuration form to write links to the output files in the same data table that contains the input data. Writing to the data table keeps generated data organized and associated with the input data.
You'll use the same "this.something" formatting used for reading inputs. Note that if the "something" column does not exist in the data table, the workflow will create it.

4.1. Start in the Outputs tab.

Workflows-QuickStart_Configure_Outputs_Screen_shot.png

4.2. For the first output variable, "outputBai", go to the attribute field and type in "this." + a name for your output file.
In the dropdown, you will see all the columns from the "sample" data table. This includes the original columns in the data table (i.e. the ones you used for input variables) as well as the columns the workflow created for the outputs generated in Part 1.

4.3. Try typing in a different output name for this run!  

Workflows-QuickStart_Part2_Configure-outputs_Screen_shot.png

4.4. Save your output attributes by clicking the blue "Save" button at the top right. The Run Analysis button should turn blue (if it doesn't, you might need to go back and fill in an attribute, or save).
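
Writing outputs back to the table, including creating a column that does not exist yet, can be pictured with the following Python sketch. It is a simplified model under stated assumptions: the table is a plain dictionary, and the function write_output, the column name my_bai, and the gs:// paths are hypothetical.

```python
# Illustrative model only: the "sample" table keyed by sample_id.
sample_table = {
    "sample_A": {"CRAM": "gs://example-bucket/sample_A.cram"},
    "sample_B": {"CRAM": "gs://example-bucket/sample_B.cram"},
}


def write_output(table, row_id, expression, value):
    """Store a generated file path under the column named by "this.<column>"."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    # Assigning to a missing key adds it, which models a new column being created.
    table[row_id][column] = value


# Output attribute as it might be typed in step 4.2 (the column name is made up).
outputs = {"outputBai": "this.my_bai"}
for row_id in sample_table:
    generated_path = f"gs://example-bucket/outputs/{row_id}.bai"
    write_output(sample_table, row_id, outputs["outputBai"], generated_path)

print(sample_table["sample_A"])
```

Each row keeps its original input column and gains the new output column, so generated files stay next to the inputs they came from.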

5. Launch and monitor analysis

5.1. Click on Run Analysis.

5.2. Click Launch to finalize your submission.

Workflows-QuickStart-Part2_Confirm-launch_Screen_shot.png

Notice that this will submit two analyses, one for each sample. When your jobs are submitted, you'll be redirected to the Job History page to monitor your submission. To see job status updates, refresh the page.
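
The reason two analyses are submitted is that the same input configuration is applied once per selected row. A rough Python sketch of that fan-out follows; it is an illustration only, and the function build_jobs, the row contents, and the paths are hypothetical.

```python
# Illustrative model only: the two rows selected in step 1.3.1.
selected_rows = [
    {"sample_id": "sample_A", "CRAM": "gs://example-bucket/sample_A.cram"},
    {"sample_id": "sample_B", "CRAM": "gs://example-bucket/sample_B.cram"},
]

# The same input configuration is shared by every row.
input_config = {"InputCram": "this.CRAM", "SampleName": "this.sample_id"}


def build_jobs(rows, config):
    """Return one dict of resolved inputs per selected row."""
    prefix = "this."
    jobs = []
    for row in rows:
        # Strip the leading "this." and read that column from the current row.
        job = {name: row[expr[len(prefix):]] for name, expr in config.items()}
        jobs.append(job)
    return jobs


jobs = build_jobs(selected_rows, input_config)
print(len(jobs))  # 2
```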

What to expect when your workflow completes 

Hopefully your submission will succeed. If so, congratulations on setting up the workflow and running it on your own data!

Your Job History tab (once you refresh) will look like this:
Workflows-QuickStart_Ex2_Completed-submission_Screen_shot.png

And your sample data table will look like this (notice the additional columns for the generated BAM and BAM index files):
Workflows-QuickStart_Part2_Sample-set-outputs_Screen_shot.png

Notice the additional "sample_set" table! This is a group that includes the two samples in your "sample" table.

You can choose to run this analysis again or to run a downstream analysis on this group by choosing the set when you choose the data to run on (we'll do that in Part 3!).


What's in the sample set? Where's the data? 

 

If you click on "2 samples", you will see that this column includes the
names of the samples only, not the data files associated with those samples.

Links to the data files are in the "sample" table. When you run an analysis on a
set of single entities, you need to set up the workflow to find the data (Part 3).

 

Workflows-QuickStart_Part2_Set-table-output_Screen_shot.png
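
The relationship between the two tables can be sketched in Python: the set stores only sample names, and the file links are found by looking each name up in the "sample" table. This is a simplified model, not Terra's data store; the set name my_sample_set, the function files_for_set, and the paths are hypothetical.

```python
# Illustrative model only: the "sample" table holds the links to the data files.
sample_table = {
    "sample_A": {"CRAM": "gs://example-bucket/sample_A.cram"},
    "sample_B": {"CRAM": "gs://example-bucket/sample_B.cram"},
}

# The "sample_set" table holds only the names of its member samples.
sample_set_table = {
    "my_sample_set": {"samples": ["sample_A", "sample_B"]},
}


def files_for_set(set_name, column):
    """Follow each member name back to the sample table and read one column."""
    members = sample_set_table[set_name]["samples"]
    return [sample_table[member][column] for member in members]


print(files_for_set("my_sample_set", "CRAM"))
```

The set itself never duplicates the file paths, which is why a workflow run on a set needs to be told how to reach the data through the member samples.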


Congratulations! 

 

You've completed Part 2 of the Workflows Quickstart.

Next up: Part 3 - Run downstream analysis on output data

 
