Getting beyond Container Overrides limit of 8192

Post author
T Molosh

Hi,

I am running into what I believe is an issue of scale.

I am running Cromwell 4.7 with AWS and Docker. If I try to run more than 25 samples in my pipeline (I need to run hundreds), my final step (reporting) fails with the error:

     "Container Overrides length must be at most 8192"

Upon gathering the shards, the reporting task is passed many S3 file paths, each ~90 characters long.

#### a simplified & generalized view of the workflow
version 1.0

workflow simplified {
  input {
    Array[Array[String]] fastqs
  }
  scatter (fq in fastqs) {
    call workflow.main {
      input:
        R1 = fq[0],
        R2 = fq[1]
    }
  }
  # need to wait for main workflow to be completed
  call report.report {
    input:
      # "ok" is the output of the scattered "main" call; FINAL is only visible inside it
      aok = main.ok
  }
}

# where workflow.main ~~
version 1.0

workflow main {
  input {
    File R1
    File R2
  }
  call DS {
    input:
      R1 = R1,
      R2 = R2
  }
  call TM {
    input:
      dsR1 = DS.dsR1,
      dsR2 = DS.dsR2
  }
  call QC {
    input:
      qcR1 = DS.dsR1,
      qcR2 = DS.dsR2
  }
  call ALN {
    input:
      aln = TM.out
  }
  call FINAL {
    input:
      qc = QC.out,
      aln = ALN.out
  }

  # just a flag to hold up "report"
  output {
    String ok = FINAL.out
  }
}

Mainly, I need a way to prevent the reporting task from executing before the tasks in each analysis shard have completed. I tried passing just an array of "ok" flags from each shard, but it looks like the report task still receives the full S3 source of the shards. Since I cannot pass the entire S3 paths, if I can find a way to wait for the analyses to finish, I can use "aws s3 ls --recursive $AWS_CROMWELL_WORKFLOW_ROOT" to find the files I need for reporting.
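
One way the "wait without passing paths" idea could be sketched in WDL 1.0 is to hand the report task only the count of finished shards. Because length() needs the fully gathered array, the call cannot start until every shard is done, yet only a single integer crosses the task boundary. The task name "analyze", the workflow name "gate_on_count", and the input "n_done" are hypothetical stand-ins, the container is assumed to have the AWS CLI installed, and whether this keeps the Container Overrides under 8192 on the AWS backend is an untested assumption.

```wdl
version 1.0

# Hypothetical stand-in for the real per-sample sub-workflow.
task analyze {
  input {
    String sample
  }
  command <<<
    echo "done ~{sample}"
  >>>
  output {
    String ok = read_string(stdout())
  }
}

task report {
  input {
    Int n_done  # only an integer is passed in, no S3 paths
  }
  command <<<
    echo "shards finished: ~{n_done}"
    # As described in the post: discover the outputs instead of receiving them as inputs.
    aws s3 ls --recursive "$AWS_CROMWELL_WORKFLOW_ROOT"
  >>>
  output {
    Array[String] listing = read_lines(stdout())
  }
}

workflow gate_on_count {
  input {
    Array[String] samples
  }
  scatter (s in samples) {
    call analyze { input: sample = s }
  }
  # length() depends on the gathered array, so report waits for all shards.
  call report { input: n_done = length(analyze.ok) }
}
```

The dependency on analyze.ok is what enforces the ordering; the integer itself is only used for logging here.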

How does one get beyond such limitations? Is there a way to flatten or parse the array generated from the gathered shards, and then use something like "basename" when making the call? Is there a conditional for whether the array exists? Neither "if(myarray){call task}" nor "if( -e myarray){call task}" works.
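
For reference, WDL 1.0 does provide flatten(), basename(), and length(), and an "if" block takes a Boolean expression rather than an array or a shell-style file test. The following is a minimal sketch, not a confirmed fix for the AWS limit: the workflow name "shorten_paths" and the input "paths_per_shard" are hypothetical, and the paths are declared as String rather than File so that Cromwell treats them as plain text instead of files to localize.

```wdl
version 1.0

task report {
  input {
    Array[String] names
  }
  command <<<
    # write_lines() puts the array in a file, so the command holds one path.
    cat ~{write_lines(names)}
  >>>
  output {
    Array[String] seen = read_lines(stdout())
  }
}

workflow shorten_paths {
  input {
    Array[Array[String]] paths_per_shard  # hypothetical gathered S3 paths
  }

  # Collapse the per-shard arrays into one flat array.
  Array[String] all_paths = flatten(paths_per_shard)

  # Keep only the file name of each path.
  scatter (p in all_paths) {
    String short_name = basename(p)
  }

  # The condition must be a Boolean, so test the array's length.
  if (length(short_name) > 0) {
    call report { input: names = short_name }
  }
}
```

If the array may be entirely absent rather than empty, it would have to be declared optional (Array[String]?) and tested with defined() instead.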

Thanks,
Tom

Comments

2 comments

  • Comment author
    Jason Cerrato

    Hi Tom,

    Thanks for writing in here. Because this board is specifically for support in Terra, including troubleshooting issues of Cromwell as it relates to Terra, and your issue has to do with using Cromwell 4.7 in combination with AWS, this may not be the most suitable place to get help with your issue.

    I recommend posting to http://bioinformatics.stackexchange.com/ and adding the tag Cromwell. Our Cromwell engineers monitor this forum for posts with this tag, and would be better suited to answering questions about Cromwell that are beyond the scope of its implementation in Terra.

    If you have any questions about this, please let me know!

    Kind regards,

    Jason

  • Comment author
    T Molosh

    OK - Thanks Jason.

    Best,

    Tom

