How to use DRS URIs in a workflow (GCP)

Allie Cliffe

Learn how to use DRS URIs as inputs to workflows in two different ways.

You can use DRS URIs as inputs to workflows in two ways: 1) via the data table, or 2) via direct paths in the workflow inputs configuration. In both cases, the workflows should access and process the data without further intervention. See example screenshots of each below. 

DRS URIs, egress, and your data's cloud location

For data files with copies in both Google and Azure cloud storage, Terra resolves DRS URIs to fetch the data from the same cloud as your VM. In other words, when you work in a Terra on GCP workspace, Terra automatically uses the copy stored in Google Cloud Storage (GCS), and when you work in a Terra on Azure workspace, Terra pulls the Azure copy. This saves egress costs by avoiding copying data from a different cloud provider. It applies to some important datasets, such as the open-access 1000 Genomes data.

DRS URIs in a workspace data table

DRS-URIs-Overview_Link-to-data-file-in-ga4gh_drs_uri-column_Screenshot.png

Note: The workflow configuration references the table with the format "this.object".

Closeup view

DRS-URIs-Overview_Link-to-data-file-in-table_Screenshot.png
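
As a mental model only (this is not Terra's implementation), an input expression such as "this.object" can be read as "for each row of the selected table, take the value in the object column." The sketch below uses a hypothetical in-memory table and placeholder DRS URIs; resolve_expression is an illustrative helper, not a Terra function.

```python
# Illustrative model of how a "this.<column>" expression selects a value
# from each row of a data table. This is NOT Terra's actual implementation.

def resolve_expression(expression, row):
    """Return the value that a 'this.<column>' expression refers to in one row."""
    prefix = "this."
    if not expression.startswith(prefix):
        raise ValueError(f"expected an expression starting with {prefix!r}")
    column = expression[len(prefix):]
    if column not in row:
        raise KeyError(f"no column named {column!r} in this row")
    return row[column]


# Hypothetical table rows; the DRS URIs are placeholders, not real objects.
rows = [
    {"sample_id": "s1", "object": "drs://example.org/placeholder-id-1"},
    {"sample_id": "s2", "object": "drs://example.org/placeholder-id-2"},
]

# One workflow input value per row of the table.
inputs = [resolve_expression("this.object", row) for row in rows]
print(inputs)
```

Each row supplies one input value, which is why the same configuration can launch the workflow on many files at once.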

DRS URIs entered directly as workflow input

DRS-URI-in-workflow-configuration-pane_Screenshot.png

Closeup view

DRS-URI-in-workflow_Closeup-Screenshot.png
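
Before pasting a path directly into the inputs configuration, it can help to confirm which kind of reference it is. The following is a minimal sketch, assuming that a DRS URI starts with the drs:// scheme and a Google Cloud Storage path starts with gs://; uri_kind is a hypothetical helper and the example URIs are placeholders.

```python
from urllib.parse import urlparse


def uri_kind(value):
    """Classify a file reference by its URI scheme (illustrative helper only)."""
    scheme = urlparse(value).scheme.lower()
    if scheme == "drs":
        return "drs"
    if scheme == "gs":
        return "gcs"
    return "other"


# Placeholder references, not real data objects.
print(uri_kind("drs://example.org/placeholder-id"))
print(uri_kind("gs://example-bucket/sample.bam"))
```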

Configuring workflows with DRS URIs (PFB or TDR prefix)

If you export your data table from a data repository or the Terra Data Repo (TDR), the table includes a pfb or tdr prefix.

Configure-workflows-inputs_pfb-namespace-in-data-table_Screen shot.png

You must include the pfb or tdr prefix when running a workflow on data from a table. The (required) attribute syntax is this.pfb:file-type or this.tdr:file-type.

Note: This syntax appears in the drop-down menu when you click into the attribute field (see screenshot below).

Configure-workflow-inputs_pfb-namespace-in-dropdown_Screen_shot.png
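
The rule above amounts to "write this. followed by the column name exactly as it appears in the table, prefix included." A minimal sketch of that rule follows; expression_for_header and split_namespace are hypothetical helpers written for illustration, and the column names are examples taken from this page.

```python
# Illustrative helpers for building a workflow attribute expression from a
# data table column header. These are NOT part of Terra.

KNOWN_NAMESPACES = ("pfb", "tdr")


def expression_for_header(header):
    """Build the attribute expression for a column header, keeping any prefix."""
    return f"this.{header}"


def split_namespace(header):
    """Split 'pfb:file-type' into ('pfb', 'file-type'); return (None, header)
    when the header carries no pfb or tdr prefix."""
    namespace, sep, name = header.partition(":")
    if sep and namespace in KNOWN_NAMESPACES:
        return namespace, name
    return None, header


for header in ("pfb:file-type", "tdr:file-type", "WXS_bam_path"):
    print(header, "->", expression_for_header(header), split_namespace(header))
```

A header without a prefix, such as WXS_bam_path, is referenced without one.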


Comments

2 comments

  • Comment author
    Kei Enomoto

    Hi Allie Cliffe,

    I am trying to run a workflow on TCGA WXS bam files. For example in CESC project, some files are stored in Google Cloud and the others are in DRS. In this case, should I export them separately to my workspace and run workflow on each dataset?

    And if so, will the column names in the DRS data table automatically be prefixed with "pfb" or "tdr" or do I have to do it manually?

    Thank you,

    Kei

  • Comment author
    Allie Cliffe

    Hello, Kei Enomoto.  

    I think what you are asking is "can you run a workflow with inputs from a data table column that contains both 'gs://' file references and DRS file references?"

    If I'm understanding it correctly, then yes, it should run fine. If the column includes both DRS URIs and gs:// URIs (like the one you've circled), Terra will know to look in the right place for WDL inputs from the hyperlink in the table. You shouldn't have to do anything as long as your authorization links are current.

    All Terra should need to run the workflow is the correct column header, which in the case you've circled above is 'WXS_bam_path'. You would only need to include the 'pfb' prefix if the prefix was part of the column header in the table.
     
    Cheers,
    Allie

     
