Workflow error, key not found

Post author
Justin Rhoades

I'm trying to rerun a pair set through the Mutect2-GATK4 workflow, which I've run several times before, and am getting a workflow error that states "key not found: DockerInfoRequest(DockerImageIdentifierWithoutHash(None,None,python,2.7),List(PipelinesApiDockerCredentials(...". I also see that the workflow shows as failed, but Job Manager shows the task as still running. My questions are: 1) can you check whether these failed workflows still have tasks running, and 2) what does this error message mean and how can I debug it? Thanks for your help! The workspace should already be shared.

Workspace: blood-biopsy/early_stage_BC_whole_genome_analysis

Submission ID: e689bc7c-acc6-4168-bcab-a23300b8888f or 8d529a76-8e62-48b2-a79d-a7f731af7f18

 

Best,

Justin
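
For readers decoding the error text: the four fields in `DockerImageIdentifierWithoutHash(None,None,python,2.7)` appear to correspond to the registry host, repository, image name, and tag of the image `python:2.7`. That mapping is an assumption read off the message, not something confirmed in this thread. A minimal Python sketch of such a split is below; `ImageRef` and `parse_image` are hypothetical names for illustration and are not Cromwell or Terra APIs.

```python
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    """Hypothetical four-part view of a Docker image name (assumed field order)."""
    host: Optional[str]
    repository: Optional[str]
    image: str
    reference: str


def parse_image(name: str) -> ImageRef:
    """Split a Docker image name into host, repository, image, and tag or digest.

    Simplified: a name carrying both a tag and a digest is not handled.
    """
    parts = name.split("/")
    last = parts[-1]
    if "@" in last:
        # Digest form, e.g. ubuntu@sha256:...
        image, reference = last.split("@", 1)
    elif ":" in last:
        # Tag form, e.g. python:2.7
        image, reference = last.split(":", 1)
    else:
        # Docker falls back to the "latest" tag when none is given.
        image, reference = last, "latest"

    prefix = parts[:-1]
    host = None
    # By Docker convention, the first component is a registry host if it
    # contains "." or ":" or is exactly "localhost".
    if prefix and ("." in prefix[0] or ":" in prefix[0] or prefix[0] == "localhost"):
        host = prefix[0]
        prefix = prefix[1:]
    repository = "/".join(prefix) if prefix else None
    return ImageRef(host, repository, image, reference)


print(parse_image("python:2.7"))
print(parse_image("sfreeman/gitc_plusmixcr:v3"))
```

Under this reading, the image named in the error is the public `python:2.7` image with no explicit registry host or repository, which is why the first two fields are `None`.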

Comments

43 comments

  • Comment author
    Ruchi Munshi

    Is the Docker image you're using private? If not, I don't need your workflow or workspace, just the Docker image name. Thanks for your help debugging!

  • Comment author
    Samuel Freeman

    Hi Ruchi,

    I am using the public docker image sfreeman/gitc_plusmixcr:v3 

  • Comment author
    Ruchi Munshi

    Thank you so much, Sam, for the Docker image! Before I start reproducing your issue: are you seeing tasks fail 100% of the time on Terra with that particular Docker image?

  • Comment author
    Samuel Freeman

    Yes, 100% of my workflows using this Docker image fail with the "key not found" error.

    However, all of the tasks in those workflows keep running after the workflow fails. They eventually complete and write their outputs, but the outputs are not linked to any attributes because the workflow has already failed.

  • Comment author
    GE

    @Adam Nichols - Per my above comments, I strongly suspect this is *not* an issue with Docker Hub, for a few reasons:

    1) Docker Hub is no longer reporting an outage and there are no other users online reporting issues with Docker Hub at the moment.

    2) In my tests, Docker has no issues on local computers; it is working perfectly.

  • Comment author
    GE

    Most importantly, the KeyNotFound error is causing specific tasks to fail 100% of the time, not sporadically as you wrote previously. At the same time, other Cromwell tasks that use the exact same Docker images as the failing tasks are finishing successfully 100% of the time. When some Cromwell tasks fail systematically (across 10 trials) while others using the *same* Docker images always succeed, that indicates a bug in how Terra is communicating with Docker, not a sporadic issue with Docker.

     

  • Comment author
    GE

    My strong suspicion is that after the Docker Hub outage, Docker changed something in its codebase that is affecting how Terra communicates with Docker Hub. This in turn is causing a systematic bug in how Terra runs tasks. I urge you all to take a close look.

    @Ruchi Munshi - I can't share workspaces due to HIPAA issues, but I have sent Sushma a detailed screenshot showing this. The Docker image is: broadinstitute/genomes-in-the-cloud:2.3.1-1512499786

  • Comment author
    Adam Nichols

    Hi GE,

    You have repeatedly posted with great confidence about what you think the root cause is.

    The team developing the product has access to considerably more information, such as Cromwell internal logs that are full of the error "Timeout looking up docker hash".

    Please trust that our interests are aligned in finding a solution as soon as possible.

    Best,

    Adam

     

  • Comment author
    Adam Nichols

    Hi all – please try again now.

  • Comment author
    Sushma Chaluvadi

    Hello All,

     

    The 10/15 Docker Hub outage had an impact on Cromwell that extended into 10/16 and 10/17. This impact has now been resolved.

     

    Please confirm if you are able to successfully run Workflows.

  • Comment author
    Justin Rhoades

    Workflows seem to be running for me again.

  • Comment author
    Adam Nichols

    Hi Justin - thank you for the feedback. We apologize for the inconvenience.

