
Support workflows with more than 50,000 nodes

Completed

Comments

8 comments

  • Matt Bookman

    Two additional notes:

    1- My statement above was incorrect:

    "I can then re-run the workflow with call caching enabled, commenting out the ApplyRecalibration, in order to gather the metrics."

    CollectMetricsSharded takes the output of ApplyRecalibration as its input, so it isn't as simple as I indicated. We will need to craft a separate workflow that takes the ApplyRecalibration output as input and does the metrics collection and gathering.

    2- I also noticed that the maximum number of jobs is configurable in Cromwell and the default is 1,000,000:

    https://github.com/broadinstitute/cromwell/blob/9d0cf9d964ef1328f73b69da7e21f51f3b604bc4/engine/src/main/scala/cromwell/engine/workflow/lifecycle/execution/WorkflowExecutionActor.scala

    private val DefaultTotalMaxJobsPerRootWf = 1000000
    private val DefaultMaxScatterSize = 1000000
    private val TotalMaxJobsPerRootWf = params.rootConfig.getOrElse("system.total-max-jobs-per-root-workflow", DefaultTotalMaxJobsPerRootWf)
    private val MaxScatterWidth = params.rootConfig.getOrElse("system.max-scatter-width-per-scatter", DefaultMaxScatterSize)

    If possible, please raise the job limit in Terra's Cromwell configuration to 60,000 so that the joint discovery workflow can run to completion.
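As a rough illustration of the lookup quoted above, the following Python sketch models how a configured value would take precedence over the built-in default. The two keys and the 1,000,000 defaults come from the quoted Cromwell source; the `resolve_limits` function and the plain dict standing in for the root configuration are hypothetical, not Cromwell's API.

```python
# Defaults as quoted from WorkflowExecutionActor.scala above.
DEFAULT_TOTAL_MAX_JOBS_PER_ROOT_WF = 1_000_000
DEFAULT_MAX_SCATTER_SIZE = 1_000_000


def resolve_limits(root_config):
    """Return (total_max_jobs, max_scatter_width) for a config mapping.

    `root_config` is a hypothetical flat dict standing in for Cromwell's
    root configuration; a missing key falls back to the default.
    """
    total_max_jobs = root_config.get(
        "system.total-max-jobs-per-root-workflow",
        DEFAULT_TOTAL_MAX_JOBS_PER_ROOT_WF,
    )
    max_scatter_width = root_config.get(
        "system.max-scatter-width-per-scatter",
        DEFAULT_MAX_SCATTER_SIZE,
    )
    return total_max_jobs, max_scatter_width


# No override: both limits fall back to the defaults.
print(resolve_limits({}))
# The 60,000 job limit requested in this comment; the scatter width is untouched.
print(resolve_limits({"system.total-max-jobs-per-root-workflow": 60_000}))
```

The two limits are independent in this model, so raising the per-workflow job limit would leave the per-scatter width limit where it was.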

  • Matt Bookman

    Note that I have added a GitHub issue for the workflow itself:

    https://github.com/gatk-workflows/gatk4-germline-snps-indels/issues/40

    If this 50,000 limit is going to stay as a hard limit, there are options within the workflow to examine.

  • Matt Bookman

    According to:

    https://support.terra.bio/hc/en-us/articles/360033659472-September-23-2019

    "In Terra, each batch analysis workflow is subject to a limit on the number of jobs it can launch. In this release, the limit is increasing from 50,000 to 200,000."

    So this issue looks to have been addressed.

  • Sushma Chaluvadi

    Hello Matt,

    I just double-checked our internal ticket, and it does indeed look like this request was completed!

  • Giulio Genovese

    A job I submitted yesterday on Terra failed with the following message:

    Workflow has scatter width 38717, which is more than the max scatter width 35000 allowed per scatter!

    The scatter was not calling any task, so I did not expect it to be a problem when I wrote the WDL. Thankfully, it was easy to remove the scatter from the WDL and package it as a separate task. But I could not find this hard limit in the documentation. Where would developers learn about such limits?
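Another way to stay under a per-scatter width limit, sketched here in Python rather than WDL, is to split the input collection into batches so that no single scatter exceeds the limit. The 38,717 and 35,000 figures come from the error message above; `split_into_batches` is a hypothetical helper, and this is not the fix described in this comment, which moved the scatter body into a task.

```python
def split_into_batches(items, max_width):
    """Split `items` into consecutive batches of at most `max_width` elements."""
    if max_width < 1:
        raise ValueError("max_width must be at least 1")
    return [items[i:i + max_width] for i in range(0, len(items), max_width)]


# Figures from the error message above.
shards = list(range(38717))
batches = split_into_batches(shards, 35000)
print([len(batch) for batch in batches])  # → [35000, 3717]
```

Batching changes only the width of each scatter. The total number of jobs stays the same, so the separate per-workflow job limit would still apply.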

    Giulio

  • Jason Cerrato

    Hi Giulio Genovese,

    Thanks for writing in. Let me check with our documentation team and Cromwell team to see if we have this documented anywhere. If we don't, I'll make sure we get it documented!

    Kind regards,

    Jason

  • Giulio Genovese

    I looked at the code and the error is generated from the ScatterKey.scala file:

    if(scatterSize > maxScatterWidth) {
      workflowExecutionActor ! JobFailedNonRetryableResponse(this, new Exception(s"Workflow has scatter width $scatterSize, which is more than the max scatter width $maxScatterWidth allowed per scatter!"), None)
      WorkflowExecutionDiff(Map(this -> ExecutionStatus.Failed))
    }

    The MaxScatterWidth is defined in file WorkflowExecutionActor.scala as follows:

    private val DefaultTotalMaxJobsPerRootWf = 1000000
    private val DefaultMaxScatterSize = 1000000
    private val TotalMaxJobsPerRootWf = params.rootConfig.getOrElse("system.total-max-jobs-per-root-workflow", DefaultTotalMaxJobsPerRootWf)
    private val MaxScatterWidth = params.rootConfig.getOrElse("system.max-scatter-width-per-scatter", DefaultMaxScatterSize)

    So it is my understanding that by default Cromwell allows scatters with a width of 1,000,000 but somehow in Terra it is configured with a more modest limit of 35,000. Is there a way to see what configuration file is used to run the Cromwell server behind Terra?
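To make the behaviour concrete, here is a small Python model of the check quoted above. The comparison and the error text follow the ScatterKey.scala excerpt, and the two limits are the ones mentioned in this thread. The function name `check_scatter_width` and the use of a Python exception are illustrative assumptions, not Cromwell code.

```python
DEFAULT_MAX_SCATTER_SIZE = 1_000_000  # default quoted from WorkflowExecutionActor.scala
TERRA_MAX_SCATTER_WIDTH = 35_000      # limit reported in the Terra error message


def check_scatter_width(scatter_size, max_scatter_width=DEFAULT_MAX_SCATTER_SIZE):
    """Raise if the scatter is wider than the limit, mirroring the quoted check."""
    if scatter_size > max_scatter_width:
        raise RuntimeError(
            f"Workflow has scatter width {scatter_size}, which is more than "
            f"the max scatter width {max_scatter_width} allowed per scatter!"
        )


# Passes under the 1,000,000 default.
check_scatter_width(38717)

# Fails under the 35,000 limit seen in Terra.
try:
    check_scatter_width(38717, TERRA_MAX_SCATTER_WIDTH)
except RuntimeError as err:
    print(err)
```

The comparison is strict, so in this model a scatter exactly as wide as the limit is still accepted.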

  • Jason Cerrato

    Hi Giulio Genovese,

    You can find this max scatter definition for Terra here: https://github.com/broadinstitute/firecloud-develop/blob/dev/base-configs/cromwell/cromwell.conf.ctmpl#L134

    Note that you need to be a member of the broadinstitute GitHub organization to access the file.

    I've added a note to our internal documentation about scatter so others are made aware of this limit! Thank you for flagging this up.

    Kind regards,

    Jason

