Job failing with exit 1 but displayed as succeeded in Terra
I have a submission that I think is running out of memory and fails with exit code 1. But Terra is still exiting gracefully and displaying that the job succeeded. Has anyone seen this before?
Also, is it possible to update the status? I could not find it in the public API. Is there a private API for updating the status of a submission? I would like to force-fail submissions that failed.
Thanks,
Ilya
Traceback (most recent call last):
OSError: R fail
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
SystemExit: 1
Comments
13 comments
Ilya,
It would be easiest for us to determine what is causing this issue if we had access to your workspace! I believe you may have shared it with us previously, but if not, can you share the workspace and send us its name or link?
Thanks!
I would rather not provide workspace names and links in this forum. Can I have an email address to send them to? I sent an email for a quota request, and you can find my email address there. I sent that email to:
fc-quota-requests@googlegroups.com
Thanks!
You can send an email to Terra-support@broadinstitute.zendesk.org and we will get an internal email!
Hi, I sent an email and did not receive a response. Now I got a message that my email bounced.
Hello Ilya -
My apologies for the mistake in the email, a moment of copy-paste gone wrong! Can you share your workspace with the user
GROUP_FireCloud-Support@firecloud.org? We will be able to access the workspace once you add this user to it with at least Writer access. A link to the workspace is not necessary, but the name of the workspace would be great. I can look into this ASAP.
I sent the information to the above email. Please let me know if it was received.
Thanks,
Ilya
Ilya -
That address is a user that the workspace needs to be shared with. Here are the steps:
1. Click on the three dots and choose Share.
2. Then add the user GROUP_FireCloud-Support@firecloud.org as Writer.
This is the method by which you can *share* your workspace so that we can get access to look inside the logs and submissions that you ran.
Thanks
Done.
Ilya,
Apologies for the back and forth, but I am trying to look at this workspace and I see that it has an Authorization Domain. The user above is not a member of that group and will need to be added to it in order to enter the workspace. Can you also add GROUP_FireCloud-Support@firecloud.org to the VividGenomics group? I believe you may have added my personal email before, but adding the user above will allow our technical team to access the workspace as well.
Done
Ilya,
We took a look at the stderr and the script, and it looks like the Terra UI shows the submission as a Success because the exit code that gets reported is that of the final step, `rm -rf geno-prefix pheno-file`, even though the earlier steps are failing. Since that last step succeeds, the UI shows the submission as a success.
You can remove the rm step, since any file that is not part of the `output` section is automatically removed when the task exits. With this step removed, you should see an accurate Failed status in the UI.
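To illustrate the behaviour described above, here is a minimal sketch in plain bash, outside of Terra. It assumes `false` as a hypothetical stand-in for the failing analysis step and runs in a throwaway directory; the variable names are made up for the example. It only shows that a script's exit status is the status of its last command.

```shell
# Throwaway directory so the rm below cannot touch real files.
scratch=$(mktemp -d)

# Failing step followed by a cleanup: the cleanup's status (0) is what the
# script reports, so the failure is hidden.
status_with_cleanup=0
( cd "$scratch" && bash -c 'false; rm -rf geno-prefix pheno-file' ) || status_with_cleanup=$?

# Same failing step with no cleanup after it: the failure is the last
# status, so the script reports it.
status_without_cleanup=0
( cd "$scratch" && bash -c 'false' ) || status_without_cleanup=$?

echo "with trailing rm:    $status_with_cleanup"
echo "without trailing rm: $status_without_cleanup"

rmdir "$scratch"
```

Run locally, the first status is 0 and the second is 1, which matches what the UI was showing.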
Makes sense. Good to know that the last exit code is the one read by Terra. Thank you!
For what it's worth, what we do is add the following to the beginning of each bash shell code block to ensure that a Terra workflow step exits immediately with a failed exit code status if a command fails:
We also add the following to the header to force piped commands to fail with a non-0 exit code if any of the commands in the pipe fail:
See more documentation on bash here:
https://www.gnu.org/software/bash/manual/html_node/The-Set-Builtin.html
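As a hedged sketch of the two behaviours described in this comment: the options shown here, `set -e` and `set -o pipefail`, are an assumption based on the descriptions and on the Set Builtin page linked above, not a quote of the original header. `false` is a hypothetical stand-in for a failing command, and the variable names are made up for the example.

```shell
# set -e: the script stops at the first failing command, so later steps
# (such as a cleanup) can no longer mask the failure.
status_errexit=0
bash -c '
  set -e
  false
  echo "not reached"
' || status_errexit=$?

# By default, a pipeline reports the status of its last command only.
status_pipe_default=0
bash -c 'false | cat' || status_pipe_default=$?

# set -o pipefail: the pipeline fails if any command in it fails.
status_pipe_strict=0
bash -c '
  set -o pipefail
  false | cat
' || status_pipe_strict=$?

echo "set -e:              $status_errexit"
echo "pipe, default:       $status_pipe_default"
echo "pipe, with pipefail: $status_pipe_strict"
```

In a real task, the two `set` lines would sit at the top of the command block rather than inside `bash -c`; the wrapper here is only so the exit statuses can be captured and printed side by side.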