Nested output file path

Comments

7 comments

  • Jason Cerrato

    Hi Sehyun,

    It looks like the first location is a cacheCopy downloaded from a previous run: the job was detected as having already run successfully, so that step isn't being run from scratch. Is there any problem with the second workflow using the file?

    Kind regards,

    Jason

  • Sehyun Oh

    Hi Jason,

    I don't think the problem is the cacheCopy part. The output location saved in the workspace data is different from the actual file location. I probably used basename/filename incorrectly in the CombineVariants and htslib tasks of my WDL, but I'm not sure how to fix it. Here are screen captures from my new run with the cache disabled.

    * Output from Job History (saved as workspace.pon and workspace.pon_idx):

     

    * Saved in workspace data:

     

    * But if I click `normals.merged.min5.vcf.gz`, the file does not exist:


    * They are actually in a nested location: 

  • Jason Cerrato

    Hi Sehyun,

    Could you download the TSV for the Workspace Data section and replace the path with the location where you know the file exists? I think that should get you around the issue of the file not being found.

    If you would like some assistance diving deeper into how this happened, please share your workspace with GROUP_FireCloud-Support@firecloud.org and point us to the workflow you used, where it should have created the file, and where the result is different from what you expected. We'll be happy to take a closer look as soon as we are able.

    Kind regards,

    Jason

  • Sehyun Oh

    Hi Jason,

    I've been manually updating the file path to get around this issue. But I'm about to share my workspace, and I really want to remove this manual fix step.

    I added the above account as a reader of the workspace, waldronlab-sehyun/Test_PON. Submission ID 166f7bf2-93da-4c19-adf6-72135af998fd is where my screen captures came from.

    Thanks,

    Sehyun

  • Jason Cerrato

    Hi Sehyun,

    It looks like the files are being output with the location of the combined_vcf input prepended to the output path.

    Claims to exist here:

    gs://fc-2a17e88c-5324-420d-875f-40c85570f550/166f7bf2-93da-4c19-adf6-72135af998fd/M1_PON/02ab156b-faea-4c33-9c48-4046c2c0ff5f/call-CombineVariants/normals.merged.min5.vcf.gz

    https://console.cloud.google.com/storage/browser/fc-2a17e88c-5324-420d-875f-40c85570f550/166f7bf2-93da-4c19-adf6-72135af998fd/M1_PON/02ab156b-faea-4c33-9c48-4046c2c0ff5f/call-CombineVariants?prefix=normals.merged.min5.vcf.gz&authuser=0

    Actually exists here:

    gs://fc-2a17e88c-5324-420d-875f-40c85570f550/166f7bf2-93da-4c19-adf6-72135af998fd/M1_PON/02ab156b-faea-4c33-9c48-4046c2c0ff5f/call-htslib/fc-2a17e88c-5324-420d-875f-40c85570f550/166f7bf2-93da-4c19-adf6-72135af998fd/M1_PON/02ab156b-faea-4c33-9c48-4046c2c0ff5f/call-CombineVariants/normals.merged.min5.vcf.gz

    https://console.cloud.google.com/storage/browser/fc-2a17e88c-5324-420d-875f-40c85570f550/166f7bf2-93da-4c19-adf6-72135af998fd/M1_PON/02ab156b-faea-4c33-9c48-4046c2c0ff5f/call-htslib/fc-2a17e88c-5324-420d-875f-40c85570f550/166f7bf2-93da-4c19-adf6-72135af998fd/M1_PON/02ab156b-faea-4c33-9c48-4046c2c0ff5f/call-CombineVariants?prefix=normals.merged.min5.vcf.gz&authuser=0
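
    The nesting can be reproduced locally with a small shell sketch. It rests on assumptions: the input is taken to be localized below the task's working directory with its bucket path preserved (which the two paths above suggest), `gzip` stands in for `bgzip`, and the directory names are hypothetical.

```shell
# Sketch only: gzip stands in for bgzip, and all directory names are made up.
set -eu

# Hypothetical task working directory (the role played by call-htslib/).
workdir="$(mktemp -d)"
cd "$workdir"

# Hypothetical localized input: its bucket path is kept below the working directory.
combined_vcf="bucket/submission/call-CombineVariants/normals.merged.min5.vcf"
mkdir -p "$(dirname "$combined_vcf")"
printf '##fileformat=VCFv4.2\n' > "$combined_vcf"

# Compressing in place writes the .gz next to the input, not into the working directory.
gzip "$combined_vcf"

# Relative to the working directory, the result still carries the input's directories.
nested_output="$(find . -name '*.vcf.gz' | sed 's|^\./||')"
echo "$nested_output"
```

    Because the compressed file never leaves the input's directory, an output declared relative to that input would be copied back with the whole directory chain attached.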

     

    You should be able to resolve this by using mv to save the compressed file in the current directory. Something like the following:

        command <<<
            bgzip ${combined_vcf}
            mv ${combined_vcf}.gz ./basename(${combined_vcf})
            tabix basename(${combined_vcf}).gz
        >>>

        runtime {
            docker: htslib_docker # miguelpmachado/htslib:1.9
            memory: "4 GB"
        }

        output {
            File pon = "basename(${combined_vcf}).gz"
            File pon_idx = "basename(${combined_vcf}).gz.tbi"
        }

    Can you let us know if this works for you?

    Kind regards,

    Jason

  • Sehyun Oh

    Hi Jason,

    Thanks for the help! I was finally able to fix this issue. Here is the updated WDL.

    task htslib {
        # input
        File combined_vcf
        String combined_vcf_pre = basename(combined_vcf)
    
        # runtime
        String htslib_docker
        
        command <<<
            bgzip ${combined_vcf}
            mv ${combined_vcf}.gz ./${combined_vcf_pre}.gz
            tabix ${combined_vcf_pre}.gz
        >>>
        
        runtime {
            docker: htslib_docker   # miguelpmachado/htslib:1.9
            memory: "4 GB"
        }
        
        output {
            File pon = "${combined_vcf_pre}.gz"
            File pon_idx = "${combined_vcf_pre}.gz.tbi"
        }
    }

     

    I needed to add the file extension `.gz` to the destination file path in the `mv` command, but other than that minor fix, your script worked great. Thanks!
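
    For anyone who wants to check the move step outside of Terra, here is a minimal shell sketch of the same sequence. It is an illustration under assumptions: `gzip` stands in for `bgzip`, the `tabix` step is left out, and the directory names are hypothetical.

```shell
# Sketch only: gzip stands in for bgzip, tabix is omitted, directory names are made up.
set -eu

workdir="$(mktemp -d)"   # hypothetical task working directory
cd "$workdir"

# Hypothetical localized input nested below the working directory.
combined_vcf="bucket/submission/call-CombineVariants/normals.merged.min5.vcf"
mkdir -p "$(dirname "$combined_vcf")"
printf '##fileformat=VCFv4.2\n' > "$combined_vcf"

# Plays the role of combined_vcf_pre = basename(combined_vcf) in the WDL.
combined_vcf_pre="$(basename "$combined_vcf")"

# Creates ${combined_vcf}.gz beside the input and removes the uncompressed file.
gzip "$combined_vcf"

# The destination keeps the .gz suffix so it matches the name the output block expects.
mv "${combined_vcf}.gz" "./${combined_vcf_pre}.gz"

flat_output="${combined_vcf_pre}.gz"
ls "$flat_output"
```

    Without `.gz` on the destination, the compressed file would land in the working directory under the plain `.vcf` name, and anything looking for `<basename>.gz` would not find it.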

    - Sehyun

  • Jason Cerrato

    Hi Sehyun,

    So glad to hear it worked for you! If there's anything else we can help with, please let us know.

    Kind regards,

    Jason
