Inaccurate job status: Terra says job still running but job succeeded

Post author
Gwen Miller

I've run STAR on a bunch of samples, and when I look at each sample, I see that it ran successfully, yet the status of the job is still "running." For example:

The task has succeeded, but the status of the job is stuck on running and none of the columns in the sample data frame have been updated with the output of the successful job.

I'm not sure if it's related, but I also noticed that the "Last Changed" column hasn't been updated since yesterday around noon (Mar 16 12:29 PM) for any of the samples I ran.

Comments

11 comments

  • Comment author
    Jason Cerrato

    Hi Gwen,

    Thanks for flagging this up. If this workspace does not exist under an authorization domain, can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace (see the icon with the three dots at the top-right)?

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field
    2. Click Add User
    3. Click Save

    We'll take a look and get back to you as soon as we can.

    Kind regards,

    Jason

  • Comment author
    Gwen Miller

    Hi Jason,

    Unfortunately this workspace does have an authorization domain, so I cannot share the workspace with you. What other information would be useful to you?

    Best,

    Gwen

  • Comment author
    Jason Cerrato

    Hi Gwen,

    Our engineers were able to examine this workflow and are currently investigating the bug related to the job status. The job does show as having "Success" status on the backend.

    Thank you for reporting this.

    Kind regards,

    Jason

  • Comment author
    Gwen Miller

    Hi Jason,

    Any updates on the progress of this issue and potential fixes? I would like to be able to continue processing my data as soon as I can.

    Thanks,

    Gwen

  • Comment author
    Jason Cerrato

    Hi Gwen,

    Word from our engineers is that you should still be able to run jobs; this bug does not need to be resolved for you to continue submitting them. To avoid the incorrect status, make sure that Delete intermediate outputs is not selected for workflows that don't produce output, as this is what triggers the status bug that's being worked on.

    If you run a workflow without this selected, it should run without the status issue.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi Gwen,

    In addition, our Cromwell team will take a look at the original submission you wrote in about and make sure the status is showing the same as we're seeing on the backend, just so there's no confusion there.

    Kind regards,

    Jason

  • Comment author
    Gwen Miller
    • Edited

    Hi Jason,

    Thank you for the informative updates. I'm a little confused about why I'm running into this bug. The issue should only appear for workflows without output when Delete intermediate outputs is checked, but the workflow I ran should produce several outputs (click here for full WDL).

    You are correct that I could run new samples through this workflow regardless of the error. However, I cannot continue analyzing these RNASeq samples because I need to use the outputs of this workflow as inputs for subsequent workflows. The job is still stuck with the "running" status, and the appropriate columns in the sample TSV haven't been updated with the outputs of the workflow. This means that I can't continue with the next steps in processing the files.

    Thanks,
    Gwen

  • Comment author
    Jason Cerrato

    Hi Gwen,

    Thanks for that clarification. It's good to know that you are expecting outputs that you aren't seeing written to the data table. We're taking a closer look to see what's going on with this job and we'll get back to you as soon as we can.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi Gwen,

    Our engineers have investigated your submission and have identified it as being affected by a known bug with workflows submitted using the "Delete intermediate outputs" option. The bug happens when the box is selected and there are 0 intermediate outputs to delete.

    Please note that there are no additional costs associated with this job being stuck in the "Running" state.

    If you resubmit your job with call caching enabled and without checking the "Delete intermediate outputs" option, the job should complete successfully (assuming the workflow is written and configured correctly). If you instead decide to wait, the expected outputs will be written once the bug is resolved. Our recommendation is to re-run the job without the box checked so that you are unblocked as soon as possible.

    We will provide you with an update once the bug has been fixed. Please note that for these types of workflows, the "Delete intermediate outputs" option is not helpful, and won’t be once the bug is fixed, as there are no intermediate outputs to delete!

    If you have any questions, please let us know!

    Kind regards,
    Jason
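
For anyone who submits jobs programmatically rather than through the UI, the resubmission recommended above (call caching on, "Delete intermediate outputs" off) might look roughly like the request body below. This is only a sketch: the field names (`useCallCache`, `deleteIntermediateOutputFiles`, and the method configuration and entity fields) are assumptions modeled on the Terra/FireCloud submission API and should be checked against the current API documentation. The helper `build_submission_payload` and the example workspace values are hypothetical. The code only builds and prints the payload; it does not contact Terra.

```python
import json


def build_submission_payload(config_namespace, config_name, entity_type,
                             entity_name, expression=None):
    """Build a request body for re-running a workflow on one entity.

    All field names are assumptions based on the Terra/FireCloud
    submission API; verify them before sending a real request.
    """
    payload = {
        "methodConfigurationNamespace": config_namespace,
        "methodConfigurationName": config_name,
        "entityType": entity_type,
        "entityName": entity_name,
        # Reuse results of calls that already succeeded in the stuck run.
        "useCallCache": True,
        # Keep this off: the bug in this thread is triggered when the option
        # is on and there are no intermediate outputs to delete.
        "deleteIntermediateOutputFiles": False,
    }
    if expression is not None:
        # Only needed when submitting on a set (assumed field name).
        payload["expression"] = expression
    return payload


if __name__ == "__main__":
    # Hypothetical namespace, configuration, and sample names.
    body = build_submission_payload(
        "my-namespace", "star-alignment", "sample", "sample_001"
    )
    print(json.dumps(body, indent=2))
```

With call caching enabled, tasks that already succeeded should be reused rather than recomputed, so the re-run mainly serves to get the outputs written back to the data table.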

  • Comment author
    Jason Cerrato

    Hi Gwen,

    This is a notification to let you know that the fix for the Delete intermediate outputs bug is scheduled to go out on Monday. At that time, Cromwell will restart, and the stuck job(s) will resolve their state of their own accord.

    Kind regards,

    Jason

  • Comment author
    Jason Cerrato

    Hi Gwen,

    This job should be showing the correct state at this time. If you find this is not the case, please let us know!

    Kind regards,

    Jason
