Service Incident - March 1, 2021 (PipelinesApiRequestHandler & long queue times)

Jason Cerrato

Summary

We've identified an issue affecting some user workflows, which are failing with the following error:

PipelinesApiRequestHandler actor termination caught by manager

Some users have also reported that their submissions are stuck in Queued status.

See the Timeline section for the latest troubleshooting and resolution updates, and the Impact section to understand how this could affect your use of the system.

Timeline 

March 1, 2021 7:25 PM ET - Issue resolution - We have confirmed that the long queue issue is self-correcting in Cromwell, and no submissions should remain queued by late March 1. Our engineers will perform a retrospective on the situation to determine whether any improvements can be made to prevent this from happening again.

March 1, 2021 4:00 PM ET - Issue investigation - Some users are still seeing long queue times for their submissions. If you are experiencing this issue, please do not abort your submission. Our engineers are currently investigating the underlying cause.

March 1, 2021 1:00 PM ET - Issue resolution - The rollback is complete, and we no longer expect users to run into this error message. The queue time issue is currently resolving itself, and jobs should soon start within normal time windows.

March 1, 2021 12:37 PM ET - Update - As of 12:37 PM ET, the errors appear to have almost entirely stopped. We are continuing to monitor a handful of residual errors and will update this page once the issue is confirmed to be fully resolved.

March 1, 2021 12:16 PM ET - Issue remediation - Google has identified the underlying issue and is currently working to roll back to a working state. We are awaiting confirmation that the rollback is complete.

March 1, 2021 11:40 AM ET - Issue investigation - Our engineers are continuing to investigate the root cause of the issue and are in communication with Google.

March 1, 2021 11:05 AM ET - Issue discovered - One of our engineers received this error message for their workflow and alerted our Batch team for investigation. The Batch team started investigating right away.

Impact

Some users may see their workflows fail with the error message above. Users may also see longer queue times for workflow submissions.

For more information

Please follow this article to get the most up-to-date information on this incident. If you would like to be notified of all service incidents or upcoming scheduled maintenance, click Follow on this page.


Comments

2 comments

  • Jason Cerrato

    Hi Maíra R. Rodrigues,

    Yes, it should be safe to relaunch at this time! Please write to us at support@terra.bio if you still experience any issues.

    Kind regards,

    Jason

  • Maíra R. Rodrigues

    Dear Jason,

    One of my workflows failed with the error "Unable to complete PAPI request due to system or connection error (PipelinesApiRequestHandler actor termination caught by manager)".

    Is it safe to relaunch the workflow yet?

    Thank you in advance!

    Best,

    Maíra