file localization not working

Post author
Philipp Hahnel

Hi, I've checked the other related articles on issues with file localization, and my problem doesn't seem to be amongst those. I've written a WDL to use samtools on a bam and a ref fasta. 

Problem 1: The bai does not localize, although all other files do:

2021/11/22 19:12:43 Starting container setup. 
2021/11/22 19:12:45 Done container setup.
2021/11/22 19:12:50 Starting localization.
2021/11/22 19:13:14 Localization script execution started...
2021/11/22 19:13:14 Localizing input gs://fc-secure-faf8c8cd-0082-4cee-84a8-76472106ceed/test_positions.bed -> /cromwell_root/fc-secure-faf8c8cd-0082-4cee-84a8-76472106ceed/test_positions.bed
2021/11/22 19:13:18 Localizing input gs://fc-secure-faf8c8cd-0082-4cee-84a8-76472106ceed/18f90bdc-92fc-433c-93c9-0882d57dad55/callMpileup/ce577710-2cb5-4fc9-9dde-ca5f011484b9/call-Mpileup/script -> /cromwell_root/script
2021/11/22 19:13:20 Localizing input gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta -> /cromwell_root/gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta
2021/11/22 19:13:59 Localizing input gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai -> /cromwell_root/gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai
Copying gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta.fai...
/ [0 files][ 0.0 B/ 2.7 KiB] / [1 files][ 2.7 KiB/ 2.7 KiB]
Operation completed over 1 objects/2.7 KiB.
2021/11/22 19:14:04 Localizing input gs://fc-7320b057-72ec-4702-9c4f-662efc9af9e1/Brastianos_BrainTumor_Sample_set_2015/RP-328/Exome/BMM-13S/v4/BMM-13S.bam -> /cromwell_root/fc-7320b057-72ec-4702-9c4f-662efc9af9e1/Brastianos_BrainTumor_Sample_set_2015/RP-328/Exome/BMM-13S/v4/BMM-13S.bam
2021/11/22 19:23:23 Localization script execution complete.
2021/11/22 19:23:39 Done localization.
2021/11/22 19:23:42 Running user action: docker run -v /mnt/local-disk:/cromwell_root -v /mnt/d-c74a541aa27f13cfe59c2f998a664729:/mnt/d9e025138b28caa42dd4006fc3636661:ro --entrypoint=/bin/bash us.gcr.io/broad-gotc-prod/genomes-in-the-cloud@sha256:4fca8ca945c17fd86e31eeef1c02983e091d4f2cb437199e74b164d177d5b2d1 /cromwell_root/script
[mpileup] fail to load index for /cromwell_root/fc-7320b057-72ec-4702-9c4f-662efc9af9e1/Brastianos_BrainTumor_Sample_set_2015/RP-328/Exome/BMM-13S/v4/BMM-13S.bam

Why is the bai not localized? It's an input into the task.

Problem 2: If I make localization optional for the bam, bai, fasta, and fai, samtools cannot open the fasta:

2021/11/22 19:16:43 Starting container setup. 
2021/11/22 19:16:45 Done container setup.
2021/11/22 19:16:49 Starting localization.
2021/11/22 19:17:20 Localization script execution started...
2021/11/22 19:17:20 Localizing input gs://fc-secure-faf8c8cd-0082-4cee-84a8-76472106ceed/test_positions.bed -> /cromwell_root/fc-secure-faf8c8cd-0082-4cee-84a8-76472106ceed/test_positions.bed
2021/11/22 19:17:23 Localizing input gs://fc-secure-faf8c8cd-0082-4cee-84a8-76472106ceed/140b7701-b79e-4dad-96b3-78ac9a9cb36a/callMpileup/8fda3d75-38fd-4cd8-938a-815085847433/call-Mpileup/script -> /cromwell_root/script
2021/11/22 19:17:25 Localization script execution complete.
2021/11/22 19:17:35 Done localization.
2021/11/22 19:17:36 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash us.gcr.io/broad-gotc-prod/genomes-in-the-cloud@sha256:4fca8ca945c17fd86e31eeef1c02983e091d4f2cb437199e74b164d177d5b2d1 /cromwell_root/script
[fai_load] build FASTA index.
[fai_build] fail to open the FASTA file gs://gcp-public-data--broad-references/hg19/v0/Homo_sapiens_assembly19.fasta

It's also beyond me why it needs to rebuild the fasta index as it's supplied.

I'm happy to share access to the testing workspace with the support team so they can have a look.

Best,

Philipp

Comments

18 comments

  • Comment author
    Jason Cerrato

    Hi Philipp,

    Thank you for writing in about this issue. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.

    Please provide us with

    1. A link to your workspace
    2. The relevant submission ID
    3. The relevant workflow ID

    We’ll be happy to take a closer look as soon as we can!

    Kind regards,

    Jason

    0
  • Comment author
    Philipp Hahnel

    Hi Jason,

    Thanks for getting back to me!

    I've shared the workspace with the email. The link is https://app.terra.bio/#workspaces/carterlabtest/test_mpileup

    I've set up this workspace to showcase exactly these two problems, so there is only one workflow present (mpileup) and two job submissions. Let me know if you also need me to copy the bam over to this workspace so you can run the workflow.

    Best,

    Philipp

    0
  • Comment author
    Jason Cerrato

    Hi Philipp,

    Thanks for sharing that workspace. Would you be willing to add my account (jcerrato@broadinstitute.org) to the authorization domain carterlab temporarily for the purposes of troubleshooting?

    Kind regards,

    Jason

    0
  • Comment author
    Philipp Hahnel

    My PI is admin of that group. He'll do that as soon as he can.

    0
  • Comment author
    Jason Cerrato

    Great! Let me know when I've been added and I'll be happy to take a closer look.

    0
  • Comment author
    Philipp Hahnel

    All right, in the interest of time, I've re-created the testing workspace without an authorization domain: https://app.terra.bio/#workspaces/carterlabtest/test

    Let me know if you can't access it!

    0
  • Comment author
    Jason Cerrato

    Hi Philipp,

    I can access that one! I'll take a closer look and get back to you as soon as I can.

    Kind regards,

    Jason

    0
  • Comment author
    Jason Cerrato

    Hi Philipp,

    Would you be able to give my account (jcerrato@broadinstitute.org) access to this WDL, or share the .wdl file?

    Kind regards,

    Jason

    0
  • Comment author
    Philipp Hahnel

    I gave you full access to the WDL

    0
  • Comment author
    Jason Cerrato

    Hi Philipp,

    I took a look at your two submissions and noticed something interesting. In your first submission (957f51c9-c60a-4396-a2b9-bfb8ca650fe1, workflow ID 9b4c9b17-74d3-478a-b0e3-4c49faa214de), I opened the inputs for your Mpileup task, and it looks like .bam files are being fed in for both the bam and bai inputs.

    This appears to be because of the way the task is called in the WDL: the workflow's bam input is provided for both the bam and the bai inputs of the Mpileup task.

    Changing this so that the Mpileup task bai gets the workflow's bai input should fix this issue.
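
    As a rough illustration of that change, here is a minimal sketch of the corrected call. Only the names callMpileup, Mpileup, bam, and bai and the Docker image come from this thread; the command line, the output name, and the assumption that the index is stored next to the BAM under a name samtools recognizes are hypothetical.

    ```wdl
    version 1.0

    # Hypothetical reconstruction, not the author's actual WDL.
    workflow callMpileup {
      input {
        File bam
        File bai
      }

      call Mpileup {
        input:
          bam = bam,
          bai = bai  # the failing run passed the workflow's bam here as well
      }
    }

    task Mpileup {
      input {
        File bam
        File bai  # declared so the index is localized; samtools finds it by file name
      }

      # Illustrative command only; the real task also takes a reference and a BED file.
      command <<<
        samtools mpileup ~{bam} > pileup.txt
      >>>

      output {
        File pileup = "pileup.txt"
      }

      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud@sha256:4fca8ca945c17fd86e31eeef1c02983e091d4f2cb437199e74b164d177d5b2d1"
      }
    }
    ```

    With the inputs wired this way, the localization log should show a separate "Localizing input" line for the .bai file.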

    I'll see if I can find out what's going on with the fasta file.

    Kind regards,

    Jason

    0
  • Comment author
    Philipp Hahnel

    Hi Jason,

    Thanks a lot for spotting this embarrassing bug! I'm curious to hear your verdict on the fasta file too!

    Best,

    Philipp

    0
  • Comment author
    Jason Cerrato

    Hi Philipp,

    I see you've set localization_optional for the ref_fasta file in your WDL. I'm wondering if this is what's causing the failure with the file, because Cromwell is being told not to localize the file to disk.

    Some tools, like certain GATK tools, can stream files directly from their gs:// paths, which means they don't need the files localized in order to work. Do you know if samtools is capable of streaming in files when given only the gs:// path? I wonder if turning localization_optional off will allow this to work. What do you think?

    Kind regards,

    Jason

    0
  • Comment author
    Jason Cerrato

    To be clear, you may need to remove localization_optional for all files if samtools isn't capable of streaming in the files when given their gs:// paths.
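
    For illustration, a minimal sketch of the task with localization left on for every file. The parameter_meta block that Cromwell uses for this option is shown commented out. The input names ref_fai and positions, the output name, and the command line are assumptions; only bam, bai, ref_fasta, and the Docker image appear in this thread.

    ```wdl
    version 1.0

    # Hypothetical sketch, not the author's actual task.
    task Mpileup {
      input {
        File bam
        File bai        # index files are inputs so they are copied next to their data files
        File ref_fasta
        File ref_fai
        File positions  # assumed name for the BED file of positions
      }

      # Removed: with this block Cromwell skips the download and ~{ref_fasta}
      # expands to the gs:// URL, which is the path shown in the fai_build error.
      # parameter_meta {
      #   ref_fasta: { localization_optional: true }
      # }

      command <<<
        samtools mpileup -f ~{ref_fasta} -l ~{positions} ~{bam} > pileup.txt
      >>>

      output {
        File pileup = "pileup.txt"
      }

      runtime {
        docker: "us.gcr.io/broad-gotc-prod/genomes-in-the-cloud@sha256:4fca8ca945c17fd86e31eeef1c02983e091d4f2cb437199e74b164d177d5b2d1"
      }
    }
    ```

    This may also explain the index rebuild attempt in the original post: samtools looks for the .fai next to the FASTA path it is given, so a gs:// path it cannot open leaves it without a usable index either.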

    0
  • Comment author
    Philipp Hahnel

    That's a good point. Looking at the Mutect2 WDL, one of the tasks is CramToBam, which uses samtools, and does not make localization optional. 

    Do you know where I can look into how GATK streams files into the command? I'm wondering if it's easy enough to write a wrapper for samtools to make localization optional ... but that's just curious me.

    For now, I'll use the working localization option.

    Many thanks for taking the time to look into these issues!

    Best,

    Philipp

    0
  • Comment author
    Jason Cerrato

    Hey Philipp,

    I'm not privy to how GATK streams files but I can try to find out next week and get back to you! For now localizing the files definitely seems like the right route.

    I'll follow up and let you know what I find, if anything.

    Have a great rest of your week!

    Kind regards,

    Jason

    0
  • Comment author
    Jason Cerrato

    Hey Philipp,

    I've learned that GATK streams in files using Java's NIO package. You can search the GATK git repo for issue tickets and PRs associated with "nio" for examples.

    I hope this helps you find what you need!

    Kind regards,

    Jason

    1
  • Comment author
    Philipp Hahnel

    Thanks, Jason! I'll take a look. 

    0
  • Comment author
    Maika

    I am not familiar with how GATK streams files, but I can try to find out next week. I would suggest localizing the files for now.

    I'll follow up and let you know if I find anything.

    Enjoy the rest of your week!

    Kind regards!!!

    Maika

    0
