output directory is missing? /cromwell_root/script: line 103: 22 Killed salmon quant
I wrote a workflow that uses Salmon. One of the arguments to salmon is --output, i.e., the directory Salmon will write to. Before calling salmon I run
```
mkdir salmon.out
```
salmon runs for about 10 minutes and then exits. The return code is 137. I am following up with the Salmon team to figure out what that error code means.
I notice that after the Terra job completes, the execution directory does not include the salmon.out directory. This is really strange; Salmon should create ./salmon.out/logs/salmon_quant.log.
Before running on Terra, I tested using 'cromwell run' on one of the servers at my university. The only differences I am aware of are 1) in run mode Cromwell does not use the runtime parameters, and 2) I used reads from a different project.
I based my WDL on https://api.firecloud.org/ga4gh/v1/tools/mxhe:salmon_quant_array/versions/9/plain-WDL/descriptor There are some other workflows that use Salmon; they are all very similar.
My workflow is in salmon_paired_reads.wdl. In the salmon_paired_reads.log file I see
```
[2021-03-29 16:21:11.509] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig table | Time = 553.34 s
-----------------------------------------
size = 45242875
-----------------------------------------
| Loading contig offsets | Time = 14.76 s
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 2.0848 s
-----------------------------------------
/cromwell_root/script: line 103: 22 Killed salmon quant -i $refIndexDir --libType A -1 "/cromwell_root/fc-secure-519db2bc-049f-43a0-ab75-a2eb9c2cb059/6a6c9b92-3026-47d3-8944-60f0842c566e/samToFastqTest/5f578d2f-7e74-4402-955a-4d4623b83ead/call-samToFastq/GTEX-111CU-0526-SM-5EGHK.2.fastq.gz" -2 "/cromwell_root/fc-secure-519db2bc-049f-43a0-ab75-a2eb9c2cb059/6a6c9b92-3026-47d3-8944-60f0842c566e/samToFastqTest/5f578d2f-7e74-4402-955a-4d4623b83ead/call-samToFastq/GTEX-111CU-0526-SM-5EGHK.1.fastq.gz" -p 8 --recoverOrphans --validateMappings --gcBias --seqBias --rangeFactorizationBins 4 --output salmon.out
+ salmonRet=137
+ echo 'AEDWIP in time salmonRet='
AEDWIP in time salmonRet=
+ '[' 137 -eq 0 ']'
+ echo 'Salmon ERROR code 137'
Salmon ERROR code 137
```
Line 103 in my WDL file is a comment. This is line 103 in the script file:
```
) > "$out77446922" 2> "$err77446922"
```
This is what I see in my execution bucket.
https://api.firecloud.org/ga4gh/v1/tools/aedavids.ucsc.edu:SalmonPairedReadQuantTask/versions/5/plain-WDL/descriptor
Hi Andrew Davidson,
Thanks for writing in. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.
Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.
Best,
Samantha
Hi Samantha
Sorry, I cannot share the workspace. The data comes from GTEx and contains subject-identifiable information. I can zip up any of the files in the execution bucket if you like.
Can you tell me more about "/cromwell_root/script: line 103: 22 Killed salmon quant"? Why would Cromwell kill my process? Could it be because memory was exhausted? How does Terra/Cromwell/Docker handle swap space?
I configured the runtime to use 80 GB of disk. Maybe I need to bump memory to 16 GB.
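For what it's worth, a common shell convention (an aside on my part, not something from the Terra logs) is that an exit status of 128+N means the process was terminated by signal N. So 137 would be SIGKILL, which is what the Linux out-of-memory killer sends:

```shell
# Decode an exit status above 128 into the signal that killed the process
status=137
sig=$((status - 128))        # 137 - 128 = 9, i.e. SIGKILL
echo "killed by signal $sig ($(kill -l $sig))"
```

That would be consistent with the memory-exhaustion theory.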
Kind regards
Andy
Andy
I tried running with 32 GB of memory and 80 GB of disk. I still get the same error.
Let me see if I can reproduce the bug in a new workspace using samples I can share.
Andy
Hi Samantha
I created a shareable workspace that does not contain user-identifiable information. I still get the same error.
Please see the comments on the Dashboard
https://app.terra.bio/#workspaces/test-aedavids-proj/testSalmonPairedReadQuantTask
https://app.terra.bio/#workspaces/test-aedavids-proj/testSalmonPairedReadQuantTask/job_history
There is only one workflow in this workspace, salmonPairedReadQuantTask.
Kind regards
Andy
Hi Andrew Davidson,
Thanks, we'll take a look and get back to you as soon as we can.
Best,
Samantha
Hi Andrew Davidson,
It looks like the workspace has not been shared with us yet. Can you please share the workspace with GROUP_FireCloud-Support@firecloud.org so we can access it?
Thanks,
Samantha
Hi Samantha
I think you should have access now. The sharing UI is not obvious: before, I just pasted in the email address and clicked the "save" button, and I missed the detail about needing to hit Enter first. It might be nice if the save button were inactive until then, to let the user know they need to do something else.
Kind regards
Andy
Hi Samantha
I suspect the reason for exit code 137 is that memory was exhausted and Terra ignored my 'runtime' configuration. My reference index must be loaded into memory, and it is 19 GB. Notice in the log file that the docker run command was called without --memory.
In my workflow input section, I have salmon_quant memoryGb Int 32.
Here are the sections of my WDL that deal with input and runtime:
```
workflow salmon_quant {
    #String dockerImg = 'quay.io/biocontainers/salmon:1.4.0--hf69c8f4_0'
    String dockerImg = 'quay.io/biocontainers/salmon:1.3.0--hf69c8f4_0'
    #String dockerImg = 'ubuntu:latest'
    Int runtime_cpu = 8
    Int memoryGb = 8
    Int diskSpaceGb = 40

    call salmon_paired_reads {
        input:
            sampleId=sampleId,
            refIndexTarGz=refIndexTarGz,
            leftReads=leftReads,
            rightReads=rightReads,
            outDir=outDir,
            dockerImg=dockerImg,
            runtime_cpu=runtime_cpu,
            memoryGb=memoryGb,
            diskSpaceGb=diskSpaceGb
    }
}

task salmon_paired_reads {
    String dockerImg
    Int runtime_cpu
    Int memoryGb
    Int diskSpaceGb

    runtime {
        disks: 'local-disk ${diskSpaceGb} HDD'
        cpu: '${runtime_cpu}'
        memory: '${memoryGb} GB'
        docker: '${dockerImg}'
        # https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
        # instances that last a maximum of 24 hours in general, and provide no availability guarantees.
        # Preemptible VMs are priced lower than standard Compute Engine
        # preemptible: '${runtime_preemptible}'
    }
}
```
```
2021/03/30 00:55:09 Localizing input gs://fc-527c7176-37ea-4499-ba34-31bd05f7d80e/sel.align.gencode.v35.ucsc.rmsk.salmon.v1.3.0.sidx.tar.gz -> /cromwell_root/fc-527c7176-37ea-4499-ba34-31bd05f7d80e/sel.align.gencode.v35.ucsc.rmsk.salmon.v1.3.0.sidx.tar.gz
2021/03/30 00:57:09 Localizing input gs://fc-527c7176-37ea-4499-ba34-31bd05f7d80e/eaafb4b4-c279-43ac-b17d-d33c4506dab7/salmon_quant/1abe75d8-04cf-4865-bb97-d3e82f36ee2b/call-salmon_paired_reads/script -> /cromwell_root/script
2021/03/30 00:57:10 Localization script execution complete.
2021/03/30 00:57:58 Done localization.
2021/03/30 00:57:59 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash quay.io/biocontainers/salmon@sha256:b1b5136321e8d5849e49035cd59e5dda755ba759f4e6fe3ffe1e914444a711af /cromwell_root/script
```
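One debugging idea I may try (a sketch, not something in my current WDL): print the container's view of memory at the top of the task's command block, so the log shows how much memory the job actually had. Note /proc/meminfo is Linux-specific and 'free' may be missing from slim images:

```shell
# Log the memory visible inside the container before launching salmon,
# to confirm whether the runtime memory request reached the VM
grep MemTotal /proc/meminfo
# don't fail the task if `free` isn't installed in the image
free -h 2>/dev/null || true
```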
Hi Andrew Davidson,
We're still looking into the error, but wanted to address your question regarding the output directory you're trying to create. I noticed your current WDL does not have any workflow-level outputs - you'll need to set up a workflow-level output, which will get saved to the execution directory.
You can't delocalize a directory itself, but you can either output the contents as an array, or you can tar the directory and output the tar. Here's a forum post from a user with a similar question: https://support.terra.bio/hc/en-us/community/posts/360067788071-Outputting-and-inputting-a-directory. In the comments for the post, you'll see examples for the two options I mentioned.
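For instance, here's a minimal sketch of the tar option; the mkdir/echo lines just stand in for the real salmon.out directory that `salmon quant --output salmon.out` would create:

```shell
# Stand-in for the directory the real task produces
mkdir -p salmon.out/logs
echo "example log" > salmon.out/logs/salmon_quant.log

# Bundle the directory into a single file that Cromwell can delocalize
tar -czf salmon.out.tar.gz salmon.out
tar -tzf salmon.out.tar.gz
```

The task's output block would then declare something like `File quantOut = "salmon.out.tar.gz"` (here `quantOut` is a hypothetical output name, not from your WDL).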
Best,
Samantha
Thanks, Samantha
Strange that it worked differently during the test on my local machine. This is great news: Salmon writes its log files in that subdirectory, so maybe we will find something that helps debug the exit code 137 issue.
I will post my findings.
Andy
Hi Samantha
I made a small change to my WDL: in the task I switched to the '<<< >>>' command syntax. It still fails; however, it seems to work better. I now get
"Task salmon_quant.salmon_paired_reads:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation"
https://support.terra.bio/hc/en-us/articles/360039010292-Error-message-PAPI-error-code-10
This could be "insufficient memory or disk". I looked at the log line for docker run, and it looks like it is ignoring my runtime configuration.
Kind regards
Andy
Hi Andrew Davidson,
I took a look at the most recent run, and it does look like the submission is failing due to insufficient memory. Here's the error message in the log file:
Can you try increasing the memory and resubmitting the workflow?
Best,
Samantha
Hi Samantha
I think the new version of Cromwell works better. I bumped memory up to 64 GB and was able to run to completion.
Andy
Hi Andrew,
Glad to hear! Let us know if you need assistance with anything else.
Best,
Samantha