Since October 15, workflows in Terra that use images from Docker Hub have been failing sporadically because of ongoing Docker Hub instability.
Affected workflows fail with an error message referencing “key not found” or similar. Because the failures are sporadic, a workflow may run successfully the first time and then fail with the error the next time. The Terra team is unable to do anything to improve the situation with Docker Hub. We cannot give an estimate as to when normal operations will resume.
However, we can suggest a workaround: use images hosted in another registry, such as Google Container Registry (GCR), in your workflows. Every user report so far indicates that workflows proceed as expected once Docker Hub images are removed. If you host an image in GCR, please note that your "Proxy Group" (listed under Profile) will need read access to the Google bucket where the GCR image is stored. Here is an article that outlines the steps for pushing a Docker image to GCR. As a bonus, pulling images from GCR is faster and cheaper!
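As a rough illustration of what copying an image from Docker Hub into GCR involves, the sketch below prints the standard `docker pull`, `docker tag`, and `docker push` commands for a given image. The function name `mirror_commands` and the project ID `my-project` are hypothetical placeholders, and the sketch assumes your local Docker client is already authenticated to push to GCR (commonly done with `gcloud auth configure-docker`). Follow the linked article for the authoritative steps.

```python
def mirror_commands(image, project):
    """Return docker CLI commands that copy `image` into gcr.io/<project>.

    `project` is a placeholder for your own Google Cloud project ID.
    """
    # Keep only the last path component, e.g. "library/python:2.7" -> "python:2.7".
    name = image.rsplit("/", 1)[-1]
    target = f"gcr.io/{project}/{name}"
    return [
        f"docker pull {image}",            # fetch the image from Docker Hub
        f"docker tag {image} {target}",    # give it a GCR name
        f"docker push {target}",           # upload it to GCR
    ]


if __name__ == "__main__":
    # "my-project" is a hypothetical project ID; substitute your own.
    for command in mirror_commands("python:2.7", "my-project"):
        print(command)
```

Because Docker Hub itself is unstable, the initial `docker pull` may also need a retry before it succeeds.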
Note that if your workflow contains multiple tasks and/or subworkflows, there will be more than one place to update. Even very generic, commonly used images, such as docker: "python:2.7", come from Docker Hub and will need to be replaced.
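To help locate every place that needs updating, here is a minimal sketch that scans WDL text for `docker:` runtime lines and reports the ones that resolve to Docker Hub. It relies on Docker's naming convention that an image reference whose first path component is not a hostname (no "." or ":" and not "localhost") is pulled from Docker Hub. The helper names are hypothetical, and the sketch only catches images written as quoted string literals. Images passed in through workflow inputs or variables will need to be checked by hand.

```python
import re

# Matches a runtime line such as:  docker: "python:2.7"
DOCKER_LINE = re.compile(r'^\s*docker\s*:\s*["\']([^"\']+)["\']', re.MULTILINE)

# Hostnames that explicitly point at Docker Hub.
DOCKER_HUB_HOSTS = {"docker.io", "index.docker.io", "registry-1.docker.io"}


def is_docker_hub(image):
    """Return True if `image` would be pulled from Docker Hub."""
    first, sep, _ = image.partition("/")
    if not sep:
        return True  # no registry prefix at all, e.g. "python:2.7"
    looks_like_host = "." in first or ":" in first or first == "localhost"
    return (not looks_like_host) or first in DOCKER_HUB_HOSTS


def find_docker_hub_images(wdl_text):
    """Return (line_number, image) pairs for Docker Hub images in `wdl_text`."""
    hits = []
    for match in DOCKER_LINE.finditer(wdl_text):
        image = match.group(1)
        if is_docker_hub(image):
            # Line number of the image string itself (1-based).
            line_number = wdl_text.count("\n", 0, match.start(1)) + 1
            hits.append((line_number, image))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_docker_hub_images.py my_workflow.wdl
    with open(sys.argv[1]) as handle:
        for line_number, image in find_docker_hub_images(handle.read()):
            print(f"line {line_number}: {image}")
```

Run it on each WDL file in the workflow, including any imported subworkflows, since each task carries its own runtime block.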
Finally, we can confirm that when a task appears to keep running even though the workflow shows as failed, no charge will be incurred.
Updates directly from Docker Hub can be found here: status.docker.com