output directory is missing? /cromwell_root/script: line 103: 22 Killed salmon quant
I wrote a workflow that uses Salmon. One of the arguments to salmon is --output, i.e., the directory Salmon will write to. Before calling salmon I run
```
mkdir salmon.out
```
salmon runs for about 10 minutes and then exits. The return code is 137. I am following up with the Salmon team to figure out what that error code means.
I notice that after the Terra job completes, the execution directory does not include the salmon.out directory. This is really strange; Salmon should create ./salmon.out/logs/salmon_quant.log.
Before running on Terra, I tested using 'cromwell run' on one of the servers at my university. The only differences I am aware of are 1) in run mode Cromwell does not use the runtime parameters, and 2) I used reads from a different project.
I based my WDL on https://api.firecloud.org/ga4gh/v1/tools/mxhe:salmon_quant_array/versions/9/plain-WDL/descriptor There are some other workflows that use Salmon; they are all very similar.
My workflow is in salmon_paired_reads.wdl. In the salmon_paired_reads.log file I see
```
[2021-03-29 16:21:11.509] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig table | Time = 553.34 s
-----------------------------------------
size = 45242875
-----------------------------------------
| Loading contig offsets | Time = 14.76 s
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 2.0848 s
-----------------------------------------
/cromwell_root/script: line 103: 22 Killed salmon quant -i $refIndexDir --libType A -1 "/cromwell_root/fc-secure-519db2bc-049f-43a0-ab75-a2eb9c2cb059/6a6c9b92-3026-47d3-8944-60f0842c566e/samToFastqTest/5f578d2f-7e74-4402-955a-4d4623b83ead/call-samToFastq/GTEX-111CU-0526-SM-5EGHK.2.fastq.gz" -2 "/cromwell_root/fc-secure-519db2bc-049f-43a0-ab75-a2eb9c2cb059/6a6c9b92-3026-47d3-8944-60f0842c566e/samToFastqTest/5f578d2f-7e74-4402-955a-4d4623b83ead/call-samToFastq/GTEX-111CU-0526-SM-5EGHK.1.fastq.gz" -p 8 --recoverOrphans --validateMappings --gcBias --seqBias --rangeFactorizationBins 4 --output salmon.out
+ salmonRet=137
+ echo 'AEDWIP in time salmonRet='
AEDWIP in time salmonRet=
+ '[' 137 -eq 0 ']'
+ echo 'Salmon ERROR code 137'
Salmon ERROR code 137
```
Line 103 in my WDL file is a comment. This is line 103 in the script file:
```
) > "$out77446922" 2> "$err77446922"
```
This is what I see in my execution bucket.
https://api.firecloud.org/ga4gh/v1/tools/aedavids.ucsc.edu:SalmonPairedReadQuantTask/versions/5/plain-WDL/descriptor
Hi Andrew Davidson,
Thanks for writing in. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.
Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.
Best,
Samantha
Hi Samantha
Sorry, I cannot share the workspace. The data comes from GTEx and contains subject-identifiable information. I can zip up any of the files in the execution bucket if you like.
Can you tell me more about "/cromwell_root/script: line 103: 22 Killed salmon quant"? Why would Cromwell kill my process? Could it be because memory was exhausted? How does Terra/Cromwell/Docker handle swap space?
I configured the runtime to use 80 GB of disk. Maybe I need to bump memory to 16 GB.
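For what it's worth, a common shell convention (an aside on my part, not something from the Terra logs) is that an exit status of 128+N means the process was terminated by signal N. So 137 would be SIGKILL, which is what the Linux out-of-memory killer sends:

```shell
# Decode an exit status above 128 into the signal that killed the process
status=137
sig=$((status - 128))        # 137 - 128 = 9, i.e. SIGKILL
echo "killed by signal $sig ($(kill -l $sig))"
```

That would be consistent with the memory-exhaustion theory.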
Kind regards
Andy
Andy
I tried running with 32 GB of memory and 80 GB of disk. I still get the same error.
Let me see if I can reproduce the bug in a new workspace using samples I can share.
Andy
Hi Samantha
I created a shareable workspace that does not contain user-identifiable information. I still get the same error.
Please see the comments on the Dashboard
https://app.terra.bio/#workspaces/test-aedavids-proj/testSalmonPairedReadQuantTask
https://app.terra.bio/#workspaces/test-aedavids-proj/testSalmonPairedReadQuantTask/job_history
There is only one workflow in this workspace, salmonPairedReadQuantTask.
Kind regards
Andy
Hi Andrew Davidson,
Thanks, we'll take a look and get back to you as soon as we can.
Best,
Samantha
Hi Andrew Davidson,
It looks like the workspace has not been shared with us yet. Can you please share the workspace with GROUP_FireCloud-Support@firecloud.org so we can access it?
Thanks,
Samantha
Hi Samantha
I think you should have access now. The sharing UI is not obvious: before, I just pasted in the email address and clicked the "save" button, and I missed the detail about needing to hit Enter first. It might be nice if the save button were inactive until then, to let the user know they need to do something else.
Kind regards
Andy
Hi Samantha
I suspect the reason for exit code 137 is that memory was exhausted and Terra ignored my 'runtime' configuration. My reference index must be loaded into memory, and it is 19 GB. Notice in the log file that the docker run command was called without --memory.
In my workflow input section, I have salmon_quant memoryGb Int 32.
Here are the sections of my WDL that deal with input and runtime:
```
workflow salmon_quant {
    #String dockerImg = 'quay.io/biocontainers/salmon:1.4.0--hf69c8f4_0'
    String dockerImg = 'quay.io/biocontainers/salmon:1.3.0--hf69c8f4_0'
    #String dockerImg = 'ubuntu:latest'
    Int runtime_cpu = 8
    Int memoryGb = 8
    Int diskSpaceGb = 40

    call salmon_paired_reads {
        input:
            sampleId=sampleId,
            refIndexTarGz=refIndexTarGz,
            leftReads=leftReads,
            rightReads=rightReads,
            outDir=outDir,
            dockerImg=dockerImg,
            runtime_cpu=runtime_cpu,
            memoryGb=memoryGb,
            diskSpaceGb=diskSpaceGb
    }
}

task salmon_paired_reads {
    String dockerImg
    Int runtime_cpu
    Int memoryGb
    Int diskSpaceGb

    runtime {
        disks: 'local-disk ${diskSpaceGb} HDD'
        cpu: '${runtime_cpu}'
        memory: '${memoryGb} GB'
        docker: '${dockerImg}'
        # https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
        # instances that last a maximum of 24 hours in general, and provide no availability guarantees.
        # Preemptible VMs are priced lower than standard Compute Engine
        # preemptible: '${runtime_preemptible}'
    }
}
```
```
2021/03/30 00:55:09 Localizing input gs://fc-527c7176-37ea-4499-ba34-31bd05f7d80e/sel.align.gencode.v35.ucsc.rmsk.salmon.v1.3.0.sidx.tar.gz -> /cromwell_root/fc-527c7176-37ea-4499-ba34-31bd05f7d80e/sel.align.gencode.v35.ucsc.rmsk.salmon.v1.3.0.sidx.tar.gz
2021/03/30 00:57:09 Localizing input gs://fc-527c7176-37ea-4499-ba34-31bd05f7d80e/eaafb4b4-c279-43ac-b17d-d33c4506dab7/salmon_quant/1abe75d8-04cf-4865-bb97-d3e82f36ee2b/call-salmon_paired_reads/script -> /cromwell_root/script
2021/03/30 00:57:10 Localization script execution complete.
2021/03/30 00:57:58 Done localization.
2021/03/30 00:57:59 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash quay.io/biocontainers/salmon@sha256:b1b5136321e8d5849e49035cd59e5dda755ba759f4e6fe3ffe1e914444a711af /cromwell_root/script
```
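One debugging idea I may try (a sketch, not something in my current WDL): print the container's view of memory at the top of the task's command block, so the log shows how much memory the job actually had. Note /proc/meminfo is Linux-specific and 'free' may be missing from slim images:

```shell
# Log the memory visible inside the container before launching salmon,
# to confirm whether the runtime memory request reached the VM
grep MemTotal /proc/meminfo
# don't fail the task if `free` isn't installed in the image
free -h 2>/dev/null || true
```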
Hi Andrew Davidson,
We're still looking into the error, but wanted to address your question regarding the output directory you're trying to create. I noticed your current WDL does not have any workflow-level outputs - you'll need to set up a workflow-level output, which will get saved to the execution directory.
You can't delocalize a directory itself, but you can either output the contents as an array, or you can tar the directory and output the tar. Here's a forum post from a user with a similar question: https://support.terra.bio/hc/en-us/community/posts/360067788071-Outputting-and-inputting-a-directory. In the comments for the post, you'll see examples for the two options I mentioned.
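For instance, here's a minimal sketch of the tar option; the mkdir/echo lines just stand in for the real salmon.out directory that `salmon quant --output salmon.out` would create:

```shell
# Stand-in for the directory the real task produces
mkdir -p salmon.out/logs
echo "example log" > salmon.out/logs/salmon_quant.log

# Bundle the directory into a single file that Cromwell can delocalize
tar -czf salmon.out.tar.gz salmon.out
tar -tzf salmon.out.tar.gz
```

The task's output block would then declare something like `File quantOut = "salmon.out.tar.gz"` (here `quantOut` is a hypothetical output name, not from your WDL).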
Best,
Samantha
Thanks, Samantha
Strange that it worked differently during the test on my local machine. This is great news: Salmon writes its log files in that subdirectory, so maybe we will find something that helps debug the exit code 137 issue.
I will post my findings.
Andy
Hi Samantha
I made a small change to my WDL: in the task I switched to the '<<< >>>' command syntax. It still fails; however, it seems to work better. I now get
"Task salmon_quant.salmon_paired_reads:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation"
https://support.terra.bio/hc/en-us/articles/360039010292-Error-message-PAPI-error-code-10
This could be "insufficient memory or disk". I looked at the log line for docker run, and it looks like it is ignoring my runtime configuration.
Kind regards
Andy
Hi Andrew Davidson,
I took a look at the most recent run, and it does look like the submission is failing due to insufficient memory. Here's the error message in the log file:
Can you try increasing the memory and resubmitting the workflow?
Best,
Samantha
Hi Samantha
I think the new version of Cromwell works better. I bumped memory up to 64 GB and was able to run to completion.
Andy
Hi Andrew,
Glad to hear! Let us know if you need assistance with anything else.
Best,
Samantha