output directory is missing? /cromwell_root/script: line 103: 22 Killed salmon quant

Post author
Andrew Davidson

I wrote a workflow that uses Salmon. One of the arguments to salmon is --output, i.e., the directory salmon will write to. Before calling salmon, I run

```
mkdir salmon.out
```

Salmon runs for about 10 minutes and then exits with return code 137. I am following up with the Salmon team to figure out what that error code means.
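
For anyone hitting the same code: by shell convention, an exit status above 128 means the process was terminated by a signal, and the signal number is the status minus 128. So 137 is signal 9 (SIGKILL), which matches the "Killed" message in the title. SIGKILL is what the Linux out-of-memory killer sends, but other things can send it too, so this is a hint rather than proof. A small sketch of the decoding (decode_exit is a made-up helper name, not part of Salmon or Cromwell):

```shell
# Decode a shell exit status: values above 128 mean
# "terminated by signal (status - 128)".
decode_exit() {
  local status=$1
  if [ "$status" -gt 128 ]; then
    local sig=$((status - 128))
    # kill -l N prints the name of signal N without the SIG prefix.
    echo "killed by signal $sig (SIG$(kill -l "$sig"))"
  else
    echo "exited with status $status"
  fi
}

decode_exit 137
```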

I notice that after the Terra job completes, the execution directory does not include the salmon.out directory. This is really strange: Salmon should create ./salmon.out/logs/salmon_quant.log

 

Before running on Terra, I tested with 'Cromwell run' on one of the servers at my university. The only differences I am aware of are 1) in run mode Cromwell does not use the runtime parameters, and 2) I used reads from a different project.

I based my WDL on https://api.firecloud.org/ga4gh/v1/tools/mxhe:salmon_quant_array/versions/9/plain-WDL/descriptor. There are some other workflows that use Salmon; they are all very similar.

 

My workflow is in salmon_paired_reads.wdl. In the salmon_paired_reads.log file I see

 

```

[2021-03-29 16:21:11.509] [jointLog] [info] Loading dense pufferfish index.
-----------------------------------------
| Loading contig table | Time = 553.34 s
-----------------------------------------
size = 45242875
-----------------------------------------
| Loading contig offsets | Time = 14.76 s
-----------------------------------------
-----------------------------------------
| Loading reference lengths | Time = 2.0848 s
-----------------------------------------
/cromwell_root/script: line 103: 22 Killed salmon quant -i $refIndexDir --libType A -1 "/cromwell_root/fc-secure-519db2bc-049f-43a0-ab75-a2eb9c2cb059/6a6c9b92-3026-47d3-8944-60f0842c566e/samToFastqTest/5f578d2f-7e74-4402-955a-4d4623b83ead/call-samToFastq/GTEX-111CU-0526-SM-5EGHK.2.fastq.gz" -2 "/cromwell_root/fc-secure-519db2bc-049f-43a0-ab75-a2eb9c2cb059/6a6c9b92-3026-47d3-8944-60f0842c566e/samToFastqTest/5f578d2f-7e74-4402-955a-4d4623b83ead/call-samToFastq/GTEX-111CU-0526-SM-5EGHK.1.fastq.gz" -p 8 --recoverOrphans --validateMappings --gcBias --seqBias --rangeFactorizationBins 4 --output salmon.out
+ salmonRet=137
+ echo 'AEDWIP in time salmonRet='
AEDWIP in time salmonRet=
+ '[' 137 -eq 0 ']'
+ echo 'Salmon ERROR code 137'
Salmon ERROR code 137

```

 

Line 103 in my WDL file is a comment. This is line 103 in the script file:

```

) > "$out77446922" 2> "$err77446922"

```

This is what I see in my execution bucket?

 

Comments

16 comments

  • Comment author
    Andrew Davidson

    https://api.firecloud.org/ga4gh/v1/tools/aedavids.ucsc.edu:SalmonPairedReadQuantTask/versions/5/plain-WDL/descriptor

  • Comment author
    Samantha (she/her)

    Hi Andrew Davidson,

     

    Thanks for writing in. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.

     

    Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.

     

    Best,

    Samantha

  • Comment author
    Andrew Davidson

    Hi Samantha

    Sorry, I cannot share the workspace. The data comes from GTEx and has subject-identifiable information. I can zip up any of the files in the execution bucket if you like.

     

    Can you tell me more about "/cromwell_root/script: line 103: 22 Killed salmon quant"? Why would Cromwell kill my process? Could it be because memory was exhausted? How does Terra/Cromwell/Docker handle swap space?

     

    I configured the runtime to use 80 GB of disk. Maybe I need to bump memory to 16 GB.

     

    Kind regards

     

    Andy

     

  • Comment author
    Andrew Davidson

    I tried running with 32 GB of memory and 80 GB of disk. I still get the same error.

  • Comment author
    Andrew Davidson

    Let me see if I can reproduce the bug in a new workspace using samples I can share.

    Andy

  • Comment author
    Andrew Davidson

    Hi Samantha

    I created a shareable workspace that does not have user-identifiable information. I still get the same error.

    Please see the comments on the Dashboard

    https://app.terra.bio/#workspaces/test-aedavids-proj/testSalmonPairedReadQuantTask

    https://app.terra.bio/#workspaces/test-aedavids-proj/testSalmonPairedReadQuantTask/job_history

    There is only one workflow in this workspace, salmonPairedReadQuantTask

    Kind regards

    Andy

  • Comment author
    Samantha (she/her)

    Hi Andrew Davidson,

     

    Thanks, we'll take a look and get back to you as soon as we can.

     

    Best,

    Samantha

  • Comment author
    Samantha (she/her)

    Hi Andrew Davidson,

     

    It looks like the workspace has not been shared with us yet. Can you please share the workspace with GROUP_FireCloud-Support@firecloud.org so we can access it?

     

    Thanks,

    Samantha

  • Comment author
    Andrew Davidson

    Hi Samantha

    I think you should have access now. The sharing UI is not obvious. Before, I just pasted the email address and clicked the "Save" button; I missed the detail about needing to press Enter. It might be nice if the Save button were disabled until then, to let the user know they need to do something else.

    Kind regards

    Andy

  • Comment author
    Andrew Davidson
    • Edited

    Hi Samantha

    I suspect that the reason for error code 137 is that memory was exhausted and Terra ignored my 'runtime' configuration. My reference index must be loaded into memory, and it is 19 GB. Notice that in the log file, the docker run command was called without --memory.

    In my workflow input section, I have set salmon_quant memoryGb (Int) to 32.

    Here are the sections of my WDL that deal with input and runtime:

    ```

    workflow salmon_quant {
      #String dockerImg = 'quay.io/biocontainers/salmon:1.4.0--hf69c8f4_0'
      String dockerImg = 'quay.io/biocontainers/salmon:1.3.0--hf69c8f4_0'
      #String dockerImg = 'ubuntu:latest'
      Int runtime_cpu = 8
      Int memoryGb = 8
      Int diskSpaceGb = 40

      call salmon_paired_reads {
        input:
          sampleId=sampleId,
          refIndexTarGz=refIndexTarGz,
          leftReads=leftReads,
          rightReads=rightReads,
          outDir=outDir,

          dockerImg=dockerImg,
          runtime_cpu=runtime_cpu,
          memoryGb=memoryGb,
          diskSpaceGb=diskSpaceGb
      }
    }

    task salmon_paired_reads {
      String dockerImg
      Int runtime_cpu
      Int memoryGb
      Int diskSpaceGb

      runtime {
        disks: 'local-disk ${diskSpaceGb} HDD'
        cpu: '${runtime_cpu}'
        memory: '${memoryGb} GB'
        docker: '${dockerImg}'

        # https://cloud.google.com/kubernetes-engine/docs/how-to/preemptible-vms
        # instances that last a maximum of 24 hours in general, and provide no availability guarantees.
        # Preemptible VMs are priced lower than standard Compute Engine
        # preemptible: '${runtime_preemptible}'
      }
    }

    ```

     

     

     

    2021/03/30 00:55:09 Localizing input gs://fc-527c7176-37ea-4499-ba34-31bd05f7d80e/sel.align.gencode.v35.ucsc.rmsk.salmon.v1.3.0.sidx.tar.gz -> /cromwell_root/fc-527c7176-37ea-4499-ba34-31bd05f7d80e/sel.align.gencode.v35.ucsc.rmsk.salmon.v1.3.0.sidx.tar.gz
    2021/03/30 00:57:09 Localizing input gs://fc-527c7176-37ea-4499-ba34-31bd05f7d80e/eaafb4b4-c279-43ac-b17d-d33c4506dab7/salmon_quant/1abe75d8-04cf-4865-bb97-d3e82f36ee2b/call-salmon_paired_reads/script -> /cromwell_root/script
    2021/03/30 00:57:10 Localization script execution complete.
    2021/03/30 00:57:58 Done localization.
    2021/03/30 00:57:59 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash quay.io/biocontainers/salmon@sha256:b1b5136321e8d5849e49035cd59e5dda755ba759f4e6fe3ffe1e914444a711af /cromwell_root/script
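
    A note on the missing --memory flag: as far as I understand, on Terra's Google backend the memory runtime attribute sizes the VM rather than adding a limit to docker run, so its absence from this log line does not by itself show that the runtime block was ignored. A minimal sketch of a check that could go at the top of the task's command block, assuming a Linux container with /proc/meminfo and awk available (the function name mem_total_gib is made up for illustration):

    ```shell
    # Report total RAM visible to this process, in GiB (reads Linux /proc/meminfo,
    # where MemTotal is given in kB).
    mem_total_gib() {
      awk '/^MemTotal:/ { printf "%.1f\n", $2 / 1024 / 1024 }' /proc/meminfo
    }

    echo "MemTotal: $(mem_total_gib) GiB"

    # Also show how much disk is available in the working directory.
    df -h .
    ```

    Comparing the printed value with the requested memoryGb, and with the 19 GB index, would show whether the VM was actually sized as requested.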

  • Comment author
    Samantha (she/her)

    Hi Andrew Davidson,

     

    We're still looking into the error, but wanted to address your question regarding the output directory you're trying to create. I noticed your current WDL does not have any workflow-level outputs - you'll need to set up a workflow-level output, which will get saved to the execution directory.  

    You can't delocalize a directory itself, but you can either output the contents as an array, or you can tar the directory and output the tar. Here's a forum post from a user with a similar question: https://support.terra.bio/hc/en-us/community/posts/360067788071-Outputting-and-inputting-a-directory. In the comments for the post, you'll see examples for the two options I mentioned.
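
    A minimal sketch of the tar option, assuming the directory is named salmon.out as in the original post. The archive name salmon.out.tar.gz and the placeholder log contents are made up for illustration; the first lines only create a stand-in directory so the sketch runs on its own, whereas in the real task salmon would have written it.

    ```shell
    # Work in a scratch directory so this sketch does not touch real results.
    cd "$(mktemp -d)"

    # Stand-in for the directory salmon writes to (placeholder contents).
    mkdir -p salmon.out/logs
    echo "placeholder log line" > salmon.out/logs/salmon_quant.log

    # Bundle the whole directory into a single file that a task output can point at.
    tar -czf salmon.out.tar.gz salmon.out

    # List the archive contents to confirm the log file was captured.
    tar -tzf salmon.out.tar.gz
    ```

    The task's output section would then declare the tarball as a File, for example File salmonOutTar = "salmon.out.tar.gz" (output name hypothetical), and the workflow would expose it as a workflow-level output.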

     

    Best,

    Samantha

  • Comment author
    Andrew Davidson

    Thanks, Samantha

    Strange that it worked differently during the test on my local machine. This is great news. Salmon writes its log files in the subdirectory, so maybe we will find something that helps debug the exit code 137 issue.

     

    I will post my findings 

    Andy

     

  • Comment author
    Andrew Davidson

    Hi Samantha

    I made a small change to my WDL: in the task I switched to the '<<< >>>' syntax. It still fails; however, it seems to work better. I now get

     

    "Task salmon_quant.salmon_paired_reads:NA:1 failed. The job was stopped before the command finished. PAPI error code 10. The assigned worker has failed to complete the operation"

    https://support.terra.bio/hc/en-us/articles/360039010292-Error-message-PAPI-error-code-10

    This could be "insufficient memory or disk". I looked at the log line for docker run. It looks like it is ignoring my runtime configuration.

    2021/04/02 03:04:54 Running user action: docker run -v /mnt/local-disk:/cromwell_root --entrypoint=/bin/bash quay.io/biocontainers/salmon@sha256:f97b5c3cdc67e7b8f459e10e2fd8b49cf093a0f8fd52d54c8d62f464a0f2b08d /cromwell_root/script
    + cat /etc/os-release

    Kind regards

     

    Andy

  • Comment author
    Samantha (she/her)

    Hi Andrew Davidson,

     

    I took a look at the most recent run, and it does look like the submission is failing due to insufficient memory. Here's the error message in the log file:

     

    Can you try increasing the memory and resubmitting the workflow?

     

    Best,

    Samantha

  • Comment author
    Andrew Davidson

    Hi Samantha

    I think the new version of Cromwell works better. I bumped memory up to 64 GB and was able to run to completion

     

    Andy

  • Comment author
    Samantha (she/her)

    Hi Andrew,

    Glad to hear! Let us know if you need assistance with anything else.

    Best,

    Samantha
