Submission queued for an hour
Is there a current issue with Terra? A submission I made has been in the queued state for an hour.
Thanks, Chet
Comments
34 comments
Hi all,
I've created a Known Issues post for this issue. All relevant updates will be posted there.
https://support.terra.bio/hc/en-us/community/posts/360058314871-Queued-submissions-and-inaccurate-job-status
Kind regards,
Jason
It began processing after sitting in queue for an hour and 20 minutes. Was there a known issue that caused this delay?
Hi Chet,
Thanks for writing in about this. One of our engineers checked the logs and it looks like there was a huge submission that you likely got stuck behind. Are you experiencing any queue issues at this time?
Kind regards,
Jason
Not experiencing any issues currently. Thank you for looking into this. Is there a way of monitoring the queue or reporting an estimated wait time in queue?
thanks,
Chet
Today workflows are getting stuck for hours in the queue
Hi Luda,
Thank you for reporting these wait times. The FC wait time estimator is not always accurate (which is why it is no longer present in the Terra interface)—would you be willing to let us know if the original submission you showed as having an estimated wait time of four hours actually did need four hours to start? Please provide the submission ID & workflow ID if so, and share the workspace with GROUP_FireCloud-Support@firecloud.org.
Many thanks,
Jason
My workflows are so far stuck for an hour but I see in another workspace that workflows are stuck for 2 hours. Those are different workspaces/workflows/billing projects.
My workspace is called broad-firecloud-ibmwatson/Wu_Richters_IBM. It is already shared with GROUP_FireCloud-Support@firecloud.org.
Thank you,
Luda
I am also again experiencing a workflow submission that is stuck waiting in queue.
-Chet
I also have this problem, and have been waiting 3 hours for my job to run. Workspace: broadtagteam/TAG_735_CompareArrayWGSSites Id: 6fd328de-3aab-4a3d-b2f3-5da0ee3514ab
Hi Chet and Luda,
It looks like there was a massive submission this morning that's the root cause of this queue.
Hello Jason,
Thank you for this update; however, my workflows are still stuck in the queue (3+ hours). I will keep you posted on the progress.
Luda
I am just worried that at 8:00 pm tonight I won't be able to submit any workflows. So I have 4 hours to start my workflows, and the monitor still shows a 3-hour wait. I really hope that, as you said, it is not accurate.
Hi Luda,
Terra developer here. We're able to confirm that we had a high volume of submissions today. The system does not provide specific guarantees around when new submissions will start, but we are looking into how to make this more fair and efficient so that a single large submission does not impact other users.
Best,
Adam
Hello Adam,
Thank you for the update. I am not sure what is happening now, as I do not see many jobs running. The monitor states that only ~3K jobs are active, and yet I have to wait for 4 more hours. My jobs have already been sitting for 4 hours.
Hello again Luda,
The system's global queued submission count has returned to zero, so I would expect that your submissions should be running. Please let me know if you see otherwise.
Best,
Adam
It is running. Thank you
Though some of my jobs are running, they have been running very slowly and some tasks have been queued for hours. Is there still a queue backlog?
Hi Cora,
Yesterday's queue issue was related to Rawls—in this case, the task is queued in Cromwell. One of our engineers has taken a look and has confirmed that this queueing is due to the large number of jobs submitted from your billing project.
I hope this answers your question. If I can help clarify anything else please let me know!
Kind regards,
Jason
That answered my question. Thank you for your help!
Hello Jason,
I am curious what counts as a large number of submissions from the same billing project ("queueing is due to the large number of jobs submitted from your billing project").
My workflow is stuck in the queued state in Cromwell. I see in the monitor that there are 2910 active workflows. Given that most likely not all of those workflows are from the same billing project, what is the upper bound on the number of workflows from the same billing project?
Thank you,
Luda
Also, I just checked our billing project and see no VMs running (I am the owner of the billing project broad-firecloud-ibmwatson). Could there be another reason for this stuck-in-Cromwell issue?
And I just submitted the job from another billing project, and it is also stuck in Cromwell.
Hi Luda,
While it may be frustrating that your jobs are taking longer than you're used to, it is normal for jobs to queue in a multi-user system that experiences variability in load.
We do not have any evidence that jobs are getting "stuck" such that they never make progress.
Best,
Adam
Hi,
My jobs were also queued for over an hour, but now the status says that they've been "running" for almost 2 hours. I cannot see the log files or anything, and I don't actually think they're running. Is there a way to ensure they are indeed running? I should be able to see the log files and gs directories.
These issues of jobs queuing and things taking forever to start running are happening more and more frequently, and it is really frustrating, especially since I thought this system was supposed to be scalable.
Sarah
Hello Adam,
I understand that these kinds of issues are bound to happen. What I am trying to understand is what causes this, as I do not see an enormous number of jobs currently running. Plus, I do not think waiting 2+ hours for a job to start is the behavior we should expect from a multi-user system.
Thank you,
Luda
Hello Adam,
So my job that takes 15 minutes to run has been stuck in Cromwell for the past 3 hours. It is faster for me to spin up my own VM and run it.
Thank you,
Luda
Hi all,
Your concerns are heard! We don't have any immediate remedies to offer, but it appears that the queue is once again on its way to resolving itself.
Hope this helps,
Adam
Thanks for that update, but what about my job that says it's running when the directory is completely empty and it's not actually running?