How to specify workflow/task output directory?

Post author
Zhonghui Xu

Hi,

Is there a way in Terra to tell Cromwell where to write workflow/task outputs? Cromwell itself appears to offer several options for specifying the output directory, as discussed in this thread:

https://gatkforums.broadinstitute.org/wdl/discussion/12619/how-to-specify-an-output-directory-by-the-cromwell-option-or-in-the-json-input-file

But I can't find anything in the documentation on how to use these options via Terra.

Thanks,

Zhonghui

Comments

5 comments

  • Comment author
    Jason Cerrato

    Hi Zhonghui,

    Happy to help here. In Terra, outputs are automatically written to the workspace bucket. You can use the method configuration to specify which tables specific outputs are written to. For example, take a look at this configuration from the featured workspace https://app.terra.bio/#workspaces/help-gatk/Somatic-SNVs-Indels-GATK4/workflows/gatk/1-Mutect2_PON:

     

    These variables for output were defined in the WDL, and then the locations of these outputs were set in the method configuration.

    Does this make sense? I recommend reading this article, especially the section titled Configuring outputs to write to the data table: https://support.terra.bio/hc/en-us/articles/360026521831-Configure-a-workflow-to-process-your-data

    If you have any further questions about this, please let me know.

    Kind regards,

    Jason

  • Comment author
    Zhonghui Xu

    Hi Jason,

    Thanks for your reply. I understand how to write output file links to the data table that point to the actual locations in the workspace bucket. My question was how to specify the actual bucket location (gs://) for a given output. The aim is to be able to organize output from multiple jobs in a single flattened directory. Specifically, the feature I am interested in is discussed in these Cromwell GitHub issues:

    https://github.com/broadinstitute/cromwell/issues/1641

    https://github.com/broadinstitute/cromwell/pull/4815

    From the above discussion, it seems the Cromwell options "final_workflow_outputs_dir" and "use_relative_output_paths" may offer a solution. For a local run, you can pass these options to Cromwell by supplying a JSON configuration file, but I don't know how to do this on Terra. If this is not currently possible on Terra, I think this (and more generally the ability to configure Cromwell) could be a useful feature to have in the future, if feasible.

    Regards,

    Zhonghui
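
For readers unfamiliar with the two Cromwell workflow options named in the comment above, the following is a minimal sketch of how such an options file might be generated for a local Cromwell run. The bucket path, the file name `options.json`, and the helper names are hypothetical; only the two option keys come from the thread. As the thread goes on to confirm, this does not apply to Terra.

```python
import json
from pathlib import Path


def build_workflow_options(output_dir: str, relative_paths: bool = True) -> dict:
    """Build a Cromwell workflow-options dict that redirects final outputs.

    output_dir is a hypothetical destination, e.g. a gs:// prefix.
    """
    return {
        # Copy the workflow's final outputs to this directory.
        "final_workflow_outputs_dir": output_dir,
        # Copy outputs relative to their execution directory, so the
        # workflow/call directory prefix is not reproduced at the destination.
        "use_relative_output_paths": relative_paths,
    }


def write_workflow_options(path: str, options: dict) -> None:
    """Serialize the options to a JSON file that Cromwell can read."""
    Path(path).write_text(json.dumps(options, indent=2) + "\n")


if __name__ == "__main__":
    # "gs://my-bucket/results" is a placeholder, not a real location.
    opts = build_workflow_options("gs://my-bucket/results")
    write_workflow_options("options.json", opts)
    # Assumed local invocation (check the flags against your Cromwell version):
    #   java -jar cromwell.jar run workflow.wdl --inputs inputs.json --options options.json
```

With relative output paths enabled, two outputs that share a file name would land on the same destination path, so output file names need to be unique across the workflow.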

  • Comment author
    Jason Cerrato

    Hi Zhonghui,

    You are correct that this is not currently possible in Terra. If you feel that you and other users would greatly benefit from this functionality, I recommend writing a feature request to this board: https://support.terra.bio/hc/en-us/community/topics/360000500452-Feature-Requests

    Other users who agree with your suggestion are able to voice their support and upvote your post, which the development team takes into consideration when deciding what to build.

    Within Terra, the only way to write files to a specific gs:// location would be to use gsutil cp commands in the WDL, so that the outputs go exactly where you want them.

    Kind regards,

    Jason
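
The gsutil cp workaround suggested in the comment above can be sketched roughly as follows. This is only an illustration, written as a small Python helper that a WDL task's command section could call; it assumes the task's Docker image provides both Python and gsutil, and the function names and paths are hypothetical.

```python
import posixpath
import subprocess


def flat_destination(src: str, dest_dir: str) -> str:
    """Map a source path to dest_dir/<file name>, discarding intermediate directories."""
    return dest_dir.rstrip("/") + "/" + posixpath.basename(src)


def build_copy_commands(sources, dest_dir):
    """Return one `gsutil cp` argument list per source file.

    Raises ValueError if two sources share a file name, because a flat
    copy would silently overwrite one of them.
    """
    dests = [flat_destination(s, dest_dir) for s in sources]
    if len(set(dests)) != len(dests):
        raise ValueError("two outputs share a file name; a flat copy would overwrite one")
    return [["gsutil", "cp", s, d] for s, d in zip(sources, dests)]


def copy_outputs(sources, dest_dir):
    """Run the copies; requires gsutil on PATH and write access to dest_dir."""
    for cmd in build_copy_commands(sources, dest_dir):
        subprocess.run(cmd, check=True)
```

Calling `copy_outputs(["out/a.vcf", "out/b.bam"], "gs://my-bucket/results")` at the end of a task would place both files directly under the given prefix. The files are still written to the workspace bucket as usual; this only adds a second copy at the chosen location.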

  • Comment author
    Haley J Abel

    I would definitely vote for this as a useful feature in Terra.

  • Comment author
    Jason Cerrato

    Hi Haley J Abel,

    I will flag up your support with the development team. If you would like to publicly support the feature request, you can do so here: https://support.terra.bio/hc/en-us/community/posts/360061260351-custom-workflow-options-to-cromwell

    Kind regards,

    Jason

