Job takes >24hr to start

Post author
Laura Gauthier

I submitted two nearly identical jobs; one finished but the other won't start. Same workflow, same method config. The workflow's first task requests a fairly typical VM:

    runtime {
        docker: "us.gcr.io/broad-gatk/gatk:4.2.0.0"
        memory: "3 GB"
        cpu: "1"
        disks: "local-disk 100 HDD"
    }

The inputs are both ~24 GB VCFs. fa971369-71ba-43fc-bf1c-bf49ae2e46bf started immediately and finished the first task in ~4 hr, but f41784ee-c980-45a3-883a-7cee939a439d has been "running" for over 24 hr with nothing in the output directory except the localization/delocalization and execution scripts. This is in the broad-firecloud-dsde-methods/1000G-high-coverage-2019%20Laura workspace, which I believe is already shared with support.

Comments

3 comments

  • Comment author
    Emily Barnes

    Hi Laura,

     

    Thanks for writing in with this issue! I took a look at the workflow dashboard for submission f41784ee-c980-45a3-883a-7cee939a439d and noticed an AwaitingCloudQuota message. This message indicates that you are hitting your project's resource quotas, so tasks have to wait for capacity before their VMs can start.

     

    I took a look at your project's quotas and noticed that persistent disk (PD) and CPU usage have both been near capacity over the past week.

    I would suggest submitting a request to Google to increase these quotas. Terra billing project owners can now request quota increases directly from Google. To do so, please follow the steps below:

     

    1. Navigate to Google Cloud Console: https://console.cloud.google.com/welcome?project=<projectID>, where <projectID> is your workspace's Google Project ID, found in the workspace dashboard under "Cloud Information."
    2. From the Navigation Menu (three horizontal lines) in the top left, select IAM & Admin > Quotas
    3. Select one or more quotas you'd like to increase, then click "EDIT QUOTAS"
    4. Follow the prompts on your screen to request your new quota limit increase.

    Let me know if you have any questions!

     

    Best,

    Emily

  • Comment author
    Laura Gauthier

    Hi Emily,

    I'm a little surprised that this is a persistent disk quota issue. Does Cromwell always allocate persistent disks, and do they then get deleted when the task is done?

  • Comment author
    Emily Barnes

    Hi Laura,

    The Persistent disk standard quota indicates how much total disk (non-SSD) you can have attached at once to your task VMs. For the example task you provided above, disks: "local-disk 100 HDD" would count towards this quota.
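
To make the arithmetic concrete, here is a rough sketch of how many such tasks could hold their disks at the same time under a given quota. The quota value (4096 GB) is hypothetical, the 10 GB boot disk per VM and the idea that it counts against the same quota are assumptions, and the helper names `parse_disk_spec` and `max_concurrent_tasks` are made up for illustration. Check your project's real limits in the Cloud Console.

```python
import re


def parse_disk_spec(spec):
    """Split a WDL runtime 'disks' entry such as 'local-disk 100 HDD'
    into (mount point, size in GB, disk type)."""
    match = re.fullmatch(r"(\S+)\s+(\d+)\s+(HDD|SSD|LOCAL)", spec.strip())
    if match is None:
        raise ValueError(f"unrecognized disk spec: {spec!r}")
    mount, size_gb, disk_type = match.groups()
    return mount, int(size_gb), disk_type


def max_concurrent_tasks(quota_gb, disk_spec, boot_disk_gb=10):
    """Estimate how many task VMs fit under a standard (non-SSD) PD quota.

    Assumption: each VM also carries a boot disk of boot_disk_gb that
    counts against the same quota. Pass boot_disk_gb=0 to ignore it.
    """
    _, size_gb, disk_type = parse_disk_spec(disk_spec)
    if disk_type != "HDD":
        raise ValueError("only HDD disks count toward the standard PD quota")
    per_task_gb = size_gb + boot_disk_gb
    return quota_gb // per_task_gb


if __name__ == "__main__":
    # Hypothetical quota of 4096 GB; each task uses 100 GB + 10 GB boot disk.
    print(max_concurrent_tasks(4096, "local-disk 100 HDD"))
```

Under these assumptions, any task beyond that count would wait for disk quota to free up, which would be consistent with a job sitting in AwaitingCloudQuota while other work in the same project holds the disks.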

    Let me know if you have any questions on this!

    Best,

    Emily

