Hog Factors on Terra

Post author
Alexander Bick

I have recently run several workflows on ~2,000 samples that scatter across 5-20 intervals. A large number of these tasks get stuck in "QueuedInCromwell" status for more than 5 hours. I believe this may be related to the limits imposed by hog factors in Cromwell (https://cromwell.readthedocs.io/en/develop/cromwell_features/HogFactors/). However, I cannot find any documentation on how hog factors are configured in Terra, or on what constraints they impose on the total number of tasks a user should try to run in a given workflow.

As the number of Terra users and the scale of their projects grow (2,000 samples is not a particularly large data set), this may become a significant limitation on what can be accomplished within the Terra environment.



  • Comment author
    Adelaide Rhodes

    I have requested more information.

  • Comment author
    Adelaide Rhodes

    Here is the information on how the "hog factors" are configured for Terra.


    In Terra, we allow a system-wide total of 75,000 concurrent jobs and a hog factor of 19. We use the Google project as the hog group. In other words, each Google project gets around 4,000 concurrent jobs at a time.
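    A minimal sketch of that arithmetic. It assumes the per-group cap is the system-wide limit divided by the hog factor, which reproduces the "around 4,000" figure (75,000 / 19 ≈ 3,947). The function name `per_group_limit` is hypothetical and not part of Cromwell or Terra.

```python
# Sketch of the per-project concurrency cap described above.
# ASSUMPTION: per-hog-group limit = system-wide limit // hog factor.

TOTAL_CONCURRENT_JOBS = 75_000  # Terra system-wide limit (from the post)
HOG_FACTOR = 19                 # Terra hog factor (from the post)


def per_group_limit(total_jobs: int, hog_factor: int) -> int:
    """Maximum concurrent jobs for one hog group (in Terra, one Google project)."""
    if hog_factor < 1:
        raise ValueError("hog factor must be >= 1")
    return total_jobs // hog_factor


limit = per_group_limit(TOTAL_CONCURRENT_JOBS, HOG_FACTOR)
print(f"Concurrent jobs per Google project: {limit}")
```

    With a hog factor of 1, a single project could take every slot; raising the factor shrinks each project's share so that no single project can starve the others.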

    Users don't need to throttle their own submissions; Cromwell can handle the load. It simply chooses to slow down a heavy user so that other users still get a responsive system.

    Incidentally, 4,000 may not seem very high, but it is based on PAPI (Google Pipelines API) limitations: if we submit more than that number of jobs per project into Google Cloud, PAPI starts failing to cope.

    It also depends on how widely those 2,000 samples are scattered and how fast the jobs run once they are selected.

    The user might have spun up 40,000 jobs, with Cromwell limiting them as expected, or something else could be going on.
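    A rough, hypothetical estimate of how the numbers in the question interact with the per-project cap. It assumes every scattered shard is one job occupying one slot, all samples run in the same Google project with nothing else running there, jobs take roughly equal time, and the cap is 75,000 // 19. The helper names are illustrative only.

```python
import math

# ASSUMPTIONS: one shard = one job = one slot; a single Google project;
# no other workflows in that project; cap = 75,000 // 19 (about 3,947).
PER_PROJECT_LIMIT = 75_000 // 19


def queued_jobs(samples: int, shards_per_sample: int, limit: int = PER_PROJECT_LIMIT) -> int:
    """Jobs that cannot start immediately and would wait in QueuedInCromwell."""
    total = samples * shards_per_sample
    return max(0, total - limit)


def minimum_waves(samples: int, shards_per_sample: int, limit: int = PER_PROJECT_LIMIT) -> int:
    """Lower bound on rounds of execution if every job takes about the same time."""
    return math.ceil(samples * shards_per_sample / limit)


for shards in (5, 20):
    total = 2000 * shards
    print(f"{shards} intervals: {total} jobs, "
          f"{queued_jobs(2000, shards)} queued at first, "
          f"at least {minimum_waves(2000, shards)} waves")
```

    Under these assumptions, 20 intervals per sample yields the 40,000 jobs mentioned above, so most of them would wait in the queue by design; long waits at smaller scatter widths would point to another cause.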

    Spoiler alert: it was something else going on. We are currently troubleshooting what that might be.

