Job seems stuck indefinitely at the delete intermediate files step and does not complete

Comments

  • Jason Cerrato

    Hi Giulio,

    Thanks for flagging this up. We'll be happy to dig into this a little more and see what's going on here. I'll be in touch with updates!

    Kind regards,

    Jason

  • Giulio Genovese

    It looks like someone did something: the job used to say "Started Jul 22, 1:20 AM" and now it says "Started: Today, 5:33 PM" and "Ended: Today, 5:34 PM (0h 0m)".

    It also claims to have failed while trying to read some workflow variables (which had previously been read successfully) with read_tsv() and read_lines(), because the files do not exist. The files, of course, do not exist because they are intermediate files and were deleted. It looks as if the job restarted after some of the intermediate files got deleted.

  • Khalid Shakir

    > It looks like someone did something

    > The files, of course, do not exist because they are intermediate files and were deleted. It looks as if the job restarted after some of the intermediate files got deleted.

    Exactly. I tried to restart your workflow, but the restart ran into the newly noticed issue you outlined above: restarting a workflow with partially deleted intermediates fails the workflow.

    Two things:

    • Even though the workflow is now "Failed", the workflow outputs should still be available. If you can't access the GCS paths via the Web UI, let us know and we can help you use the Terra REST API's "Get workflow outputs" call, which should work.
    • As you noticed, only some of the intermediate files were deleted by Cromwell; others are still left in GCS. The current version of "delete intermediates" will NOT go back and try to clean up the rest, so if you want those intermediates deleted, it will need to be through some other procedure for now. We have a feature in our backlog to delete intermediates for previously completed workflows, not only just-finished ones, but it's still a ways off.
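
    As a rough sketch of what "some other procedure" could look like: assuming you already have a listing of the object paths under the workflow's execution directory, plus the list of final output paths you want to keep, the leftover intermediates are the paths under that directory that are not outputs. The helper name and the example bucket paths below are hypothetical and not part of Cromwell or Terra. Listing and deleting the objects in GCS is deliberately left out.

```python
from typing import Iterable, List, Set


def select_leftover_intermediates(
    all_objects: Iterable[str],
    final_outputs: Iterable[str],
    workflow_prefix: str,
) -> List[str]:
    """Return the paths under workflow_prefix that are not final outputs.

    This only computes the candidate list; it does not touch any storage.
    """
    keep: Set[str] = set(final_outputs)
    # Normalize the prefix so that ".../id" does not also match ".../id2/...".
    prefix = workflow_prefix.rstrip("/") + "/"
    return sorted(
        path
        for path in all_objects
        if path.startswith(prefix) and path not in keep
    )


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    prefix = "gs://example-bucket/submission-id/workflow-name/workflow-id"
    listing = [
        prefix + "/call-a/shard-0/tmp.tsv",
        prefix + "/call-a/shard-1/tmp.tsv",
        prefix + "/call-b/final.vcf.gz",
        "gs://example-bucket/unrelated/file.txt",
    ]
    outputs = [prefix + "/call-b/final.vcf.gz"]
    for path in select_leftover_intermediates(listing, outputs, prefix):
        print(path)
```

    Reviewing the resulting list before deleting anything is the safer order of operations, since a path missing from the outputs list would otherwise be treated as an intermediate.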

     

  • Giulio Genovese

    Thank you Khalid. I am fine for this particular workflow. I wrote the WDL and I know exactly what it does so I know how to handle this. I am more worried about this happening to other users that will be using this WDL on their data. Do you have a guess about what went wrong while deleting intermediate files? Is it related to the large number of (small) intermediate files?

  • Khalid Shakir

    > Is it related to the large number of (small) intermediate files?

    Yes. The large number of files to delete hit a pathological combo of issues inside Cromwell. The Terra/Cromwell team is already looking at this, with an early patch in review that will hopefully be released in the next few days.

    The related issue of restarting a workflow in the process of deleting intermediates has not been triaged, so I'm not sure yet what the team has planned for that case.

  • Giulio Genovese

    Okay! There is no immediate hurry on my side. I am very happy to know that a solution is in the making. I know I am pushing the limits of Terra a bit, but overall I have been very positively impressed by what was actually achievable in Terra and I think the users that will end up using this very complicated workflow I wrote will also agree on this. I wish a great weekend to the whole Terra/Cromwell team. :-)

