Running ExomeGermlineSingleSample across multiple samples fails at CheckContamination
I’m having trouble running the ExomeGermlineSingleSample_v2.4.7 workflow in parallel across multiple samples in a single analysis run.
The workspace I’m using is here:
https://app.terra.bio/#workspaces/santi-test/Santi
I’ve shared the Workspace with GROUP_FireCloud-Support@firecloud.org
I have a set of uBAMs defined in a data table named “sample”. To keep it simple, this table has just three columns: sample_id, collection_data, and uBAM. collection_data is just a placeholder metadata field. uBAM is the path to the unmapped BAM file in the cloud. I have one unmapped BAM per sample.
My goal is to create a gVCF for each uBAM. Since I want one gVCF per uBAM, I believe the right thing to do is run the workflow with “sample” as the root entity type. This should create a separate job for each uBAM.
I note that when I select multiple samples from the sample table, I see a message that this is going to create a new sample_set table which will be added to my workspace (but the GUI still indicates that “sample” will be the root entity type).
My JSON syntax for the sample_and_unmapped_bams variable is
{ "sample_name": this.sample_id, "base_file_name": this.sample_id, "flowcell_unmapped_bams": this.uBAM, "final_gvcf_base_name": this.sample_id, "unmapped_bam_suffix": ".bam" }
When I run the analysis, things appear to go as expected: if I select two samples from my sample table to submit, two jobs are spawned. In the Job Manager, it looks like the JSON worked properly, as the sample_and_unmapped_bams variable was set to, e.g.:
sample_and_unmapped_bams: { "base_file_name": "DC_29", "final_gvcf_base_name": "DC_29", "flowcell_unmapped_bams": [ "gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/DC_29_fastqtosam.bam" ], "sample_name": "DC_29", "unmapped_bam_suffix": ".bam" }
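To make the per-sample mapping above concrete, here is a minimal Python sketch of how one row of the sample table ends up as one sample_and_unmapped_bams value. It assumes each row can be treated as a plain dict; `resolve_inputs` is a hypothetical helper for illustration, not a Terra or Cromwell function, and the "example" value for collection_data is made up. Terra performs this resolution itself.

```python
import json


def resolve_inputs(row):
    """Build the sample_and_unmapped_bams struct for one sample-table row.

    Hypothetical helper: it only mirrors the input expression shown above.
    """
    sample_id = row["sample_id"]
    return {
        "sample_name": sample_id,
        "base_file_name": sample_id,
        # The resolved value shown above is an array, so the single uBAM
        # path is wrapped in a one-element list.
        "flowcell_unmapped_bams": [row["uBAM"]],
        "final_gvcf_base_name": sample_id,
        "unmapped_bam_suffix": ".bam",
    }


# One dict per row of the "sample" table; one job is expected per row.
rows = [
    {
        "sample_id": "DC_29",
        "collection_data": "example",  # placeholder metadata, unused here
        "uBAM": "gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/DC_29_fastqtosam.bam",
    },
]

for row in rows:
    print(json.dumps(resolve_inputs(row), indent=2, sort_keys=True))
```

With "sample" as the root entity type, each selected row should yield exactly one such struct and therefore one job.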
Each job fails at the CheckContamination Task:
Task UnmappedBamToAlignedBam.CheckContamination:NA:1 failed. Job exit code 1. Check gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/stderr for more information. PAPI error code 9. Please check the log file for more details: gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/CheckContamination.log.
The CheckContamination.log file indicates that a required file (something generated by the workflow, I believe) was not found. Here is the last part:
2021/11/27 00:15:15 Starting delocalization.
2021/11/27 00:15:16 Delocalization script execution started...
2021/11/27 00:15:16 Delocalizing output /cromwell_root/memory_retry_rc -> gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/memory_retry_rc
2021/11/27 00:15:19 Delocalizing output /cromwell_root/rc -> gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/rc
2021/11/27 00:15:20 Delocalizing output /cromwell_root/stdout -> gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/stdout
2021/11/27 00:15:22 Delocalizing output /cromwell_root/stderr -> gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/stderr
2021/11/27 00:15:23 Delocalizing output /cromwell_root/DC_29.preBqsr.selfSM -> gs://fc-86f8893d-747d-4cfd-8fe8-694a8315fc38/5e4f9967-2b4a-4038-b7aa-8a121426eb47/ExomeGermlineSingleSample/2aee927a-7f5e-4f13-bf9a-e19a6290dbd7/call-UnmappedBamToAlignedBam/UnmappedBamToAlignedBam/86a5e603-fe9e-40d9-9f85-e84a1a0f4aca/call-CheckContamination/DC_29.preBqsr.selfSM
Required file output '/cromwell_root/DC_29.preBqsr.selfSM' does not exist.
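For anyone triaging several failed jobs, a minimal Python sketch for pulling the missing-output paths out of a locally downloaded copy of the log. It assumes the message always has the form shown in the last line above; `find_missing_outputs` is a hypothetical helper name, not part of any Cromwell tooling.

```python
import re

# Matches the delocalization failure message exactly as it appears above.
MISSING_OUTPUT = re.compile(r"Required file output '([^']+)' does not exist")


def find_missing_outputs(log_text):
    """Return every output path the log reports as missing, in order."""
    return MISSING_OUTPUT.findall(log_text)


# Two lines copied from the end of the log excerpt above.
log_text = """2021/11/27 00:15:16 Delocalization script execution started...
Required file output '/cromwell_root/DC_29.preBqsr.selfSM' does not exist.
"""

print(find_missing_outputs(log_text))
```

The missing file is an expected output of the task, so the real cause is more likely recorded in the task's stderr than in this delocalization log.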
Any suggestions, Kylee Degatano? I'm happy to try anything else.
Comments
Hi Don Conrad,
Thanks for writing in. We'll take a look at your issue and get back to you as soon as we can.
Best,
Samantha
Hi Don Conrad,
It seems like you may be running into the issue mentioned in this old GATK thread: https://sites.google.com/a/broadinstitute.org/legacy-gatk-forum-discussions/2020-01-07-2019-07-10/24408-bug-in-gatk-exome-pipeline---verifybamid-step.
Per the post: "It [VerifyBamID] reports warning if the number of polymorphic markers are less than 1,000 or less than 10% of provided marker."
Can you try the suggestion in the thread and see if that resolves the error?
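As a rough illustration of the condition quoted above, here is a minimal Python sketch. It is only a literal reading of that sentence, not VerifyBamID's actual code, and the function and parameter names are hypothetical.

```python
def too_few_markers(n_polymorphic, n_provided):
    """Return True when the quoted warning condition would apply.

    Literal reading of the forum quote: fewer than 1,000 polymorphic
    markers, or fewer than 10% of the provided markers.
    """
    below_absolute_floor = n_polymorphic < 1000
    # Integer form of n_polymorphic < 0.10 * n_provided.
    below_relative_floor = n_polymorphic * 10 < n_provided
    return below_absolute_floor or below_relative_floor


print(too_few_markers(500, 2000))    # under the 1,000-marker floor
print(too_few_markers(1500, 20000))  # over 1,000 but under 10% of provided
print(too_few_markers(5000, 20000))  # clears both thresholds
```

Exome samples with low coverage over the marker sites are the kind of input that could trip either threshold.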
Best,
Samantha