Summary
The issue was found at approximately 10:45 AM ET on March 9, 2020 and impacts users trying to submit jobs through Cromwell. Users will see jobs queued for a longer-than-expected period of time, with the Queued or QueuedInCromwell status. Once these jobs have started, statuses may be slow to update, but the underlying PAPI VMs are running normally and statuses will eventually become consistent.
See the Timeline section for the latest troubleshooting and resolution updates and the Impact section to understand how this could impact your use of the system.
Timeline
March 11, 2020 10:00 AM ET - Issue resolution - Our engineers have confirmed that this issue is now resolved.
March 10, 2020 4:20 PM ET - Hotfix released - Our engineers released a hotfix for the issue and will follow up with additional cleanup and configuration improvements.
March 10, 2020 10:15 AM ET - Hotfix in progress - Our engineers are working on a hotfix to resolve the issue.
March 9, 2020 ~3 PM ET - Issue mitigation - Our engineers started working on adjustments to reduce how long users see their jobs queued.
March 9, 2020 10:45 AM ET - Issue discovered - Our engineers detected heavy load on Cromwell leading to long queue times for submissions.
Impact
Users may experience longer-than-typical queue times for submissions until the issue is resolved. Additionally, jobs that do submit will have statuses that are slow to update.
For more information
Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.