Is it possible to have a task directly modify an input file?
While working on converting UWGAC workflows from CWL into WDL as part of BioData Catalyst, I came across one that caused a permissions issue. For weeks I thought this was due to the fact I was using symlinks, but I now believe the issue is that the workflow calls an Rscript that attempts to modify the input file directly.
According to my tests, Cromwell, even in "local mode," localizes input files with rw-r--r-- permissions. This is the case whether they come from a gs:// URI or from another task. So, when Cromwell is run as root, the files can be edited directly. But on Terra, you (quite understandably!) do not have root permissions, meaning that you cannot edit input files directly. At least, that's my theory.
I created a simple Python script in a WDL that demonstrates this a little more simply than the UWGAC workflow I'm converting. The Python WDL passes when run locally, presumably because I am running as root, but it gives a permissions error and fails on Terra. https://github.com/aofarrel/upon-thine-inputs
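For readers who do not want to open the repo, here is a minimal sketch of the kind of check described above. It is not the script from the repository: the function name try_modify_in_place is hypothetical, and appending a single newline is just an assumed stand-in for "modifying the input directly."

```python
def try_modify_in_place(path):
    """Try to append a newline to `path` without copying it first.

    Returns True if the write succeeded and False if the OS denied it.
    """
    try:
        # "ab" opens the existing file for appending, so this touches
        # the localized input itself rather than a duplicate.
        with open(path, "ab") as handle:
            handle.write(b"\n")
        return True
    except PermissionError:
        # This is the "permission denied" case reported on Terra.
        return False
```

Calling this on a localized input inside the task's command block should show whether the container user is allowed to write to that file.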
Two questions:
1. Is my hypothesis correct? I have searched the Cromwell and openWDL spec repos, but thus far have not been able to find documentation on this.
2. For the UWGAC workflow I am converting, I cannot modify the Rscripts; we want to use the exact same Docker image that the CWLs use and the Rscript is in that image. It seems that the only possible workaround is to cp the input file into the execution directory, and point the Rscript to that new copy (which presumably has wider permissions). Terra (again, quite understandably) does not grant me mv or chmod permissions, so it seems this is the only possible workaround. But is there another way that doesn't involve duplication?
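The copy workaround from question 2 could look roughly like the sketch below, written in Python rather than as a cp call. The helper name copy_into_workdir and the idea of passing the working directory as an argument are assumptions for illustration. The new file is created by whichever user runs the task, so that user owns it.

```python
import os
import shutil


def copy_into_workdir(input_path, workdir="."):
    """Copy a localized input into `workdir` and return the new path.

    shutil.copyfile copies only the contents, so the duplicate is a new
    file created (and therefore owned) by the current user.
    """
    dest = os.path.join(workdir, os.path.basename(input_path))
    if os.path.abspath(dest) == os.path.abspath(input_path):
        raise ValueError("input is already in the working directory")
    shutil.copyfile(input_path, dest)
    return dest
```

The script that needs write access would then be pointed at the returned path instead of the original input. The cost is the duplication the question is asking how to avoid.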
Comments
19 comments
Hi Aisling,
Thanks for writing in and detailing your issue. We'll take a closer look at this and get back to you as soon as we can!
Kind regards,
Jason
Hi Aisling,
Apologies for the wait here, we've been experiencing a higher-than-usual number of support requests. I will work to get you answers to your questions before end of day.
Kind regards,
Jason
Hi Aisling,
For question 1, when you run a workflow, Cromwell makes local copies of your input files in the Google VM it spins up and you should be able to make edits and move these files as needed. I see in your GitHub repo that you're getting the permission denied error. One of our engineers performed a test using the same commands in their WDL and was able to successfully make edits to that file.
They were able to perform this successfully when they were using the python:latest Docker image for the task. When they tried to use your Docker image, they ran into the same error you did. They ran a whoami command and saw that the command was executed as topmed rather than as root.
You should be able to get around this by either changing the Docker you use or by editing the Docker so it runs commands as root instead of as topmed.
For question 2, if your script requires the file to be located at a particular location you can definitely mv the file. Here is an example of running a mv command on a file within a task.
Moving the files to the execution directory will keep the same permissions -rw-r--r--, so you should be able to chmod the file as well if needed for your Rscript (assuming your Docker is set up to run as root).
I hope this helps! If you have any questions please let us know.
Kind regards,
Jason
Thank you for your response and testing, it's helpful to know this has something to do with the container permissions. However, it seems to raise even more questions. When I run the WDL locally via Cromwell in the topmed container, I can modify the file directly. Although Cromwell in local mode ignores most runtime attributes, it has to use a Docker container, otherwise none of my local tests of the pipeline I'm converting would work (as they call a script in the image). Even weirder is that if I add a whoami to my WDL and run it locally, it says that I'm running as topmed, which matches what I'm running as on Terra. And yet, it seems that when run locally, the topmed user suddenly has root permissions? I'm not sure if this is even a quirk of Cromwell, as the original pipeline where I ran into this error is using the exact same topmed Docker image in its CWL, and the CWL version of the pipeline is able to modify files directly.
It doesn't seem to make sense for a CWL, a local WDL, and a Terra WDL that all use the same Docker image as the same topmed user to end up with different file permissions.
The only idea that I have is that the topmed user might have write permissions, but perhaps Terra sees that it isn't root and limits those write permissions, but would allow a root user write permissions as demonstrated by the Python Docker being able to edit the file on Terra...?
Hi Aisling,
We are investigating this further and will get back to you ASAP!
Kind regards,
Jason
Hi Aisling,
Can you add an ls -l to your command to list the input permissions and run the workflow on your local Cromwell instance so we know what permissions are required to edit the input when run locally?
Can you also confirm whether topmed has root or escalated privileges on your local system? It's possible that you are able to edit the input files locally only because topmed has escalated permissions on your system, but it doesn't on the Terra-created VM instance.
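If it is easier than adding shell commands, the same information can be collected from Python. This is only a sketch: the function names current_user and describe are made up for illustration, and the pwd and grp modules it relies on are available on Unix-like systems only.

```python
import grp
import os
import pwd
import stat


def current_user():
    """Name of the effective user, similar to what whoami prints."""
    uid = os.geteuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # The uid has no passwd entry inside this container.
        return str(uid)


def describe(path):
    """One line per file with mode, owner and group, similar to ls -l."""
    st = os.stat(path)
    try:
        owner = pwd.getpwuid(st.st_uid).pw_name
    except KeyError:
        owner = str(st.st_uid)
    try:
        group = grp.getgrgid(st.st_gid).gr_name
    except KeyError:
        group = str(st.st_gid)
    return f"{stat.filemode(st.st_mode)} {owner}:{group} {path}"
```

Printing current_user() and describe() for each input, both locally and on Terra, should make any difference in ownership visible.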
Kind regards,
Jason
I made some edits to my workflow to print this information:
https://dockstore.org/workflows/github.com/DataBiosphere/analysis_pipeline_WDL/vcf-to-gds-wdl:debug-permissions?tab=files
Two quick notes:
* The host system I run locally is monouser with root; the topmed user only exists in the container.
* My folder names locally start with "bark-bark" because I modified my Cromwell config to do that in order to ensure my more important Cromwell config edits were taking place, and because I like dogs.
When I run that locally, here's what I get:
Interestingly, when I run this modified workflow on Terra, whoami still resolves to the topmed user, and the topmed user still is in the sudo group.
I might be barking up the wrong tree here, but it seems that when Terra localizes files, those files are considered owned by root instead of the topmed user (which is considered the owner on local and, presumably, Seven Bridges when running the CWL counterpart of this). That would explain why earlier you were able to modify those files in the Python based image; that image runs as root and root owns those files. But in any Docker image that isn't root, it looks like modifying input files isn't possible on Terra?
Assuming that's correct, it's not clear to me why local-Cromwell would localize files as a different owner than Terra-Cromwell. I ran this pipeline twice locally, once pointing to local inputs, and once pointing to the same gs URIs used by Terra (the Dockstore CLI has a plugin that allows for this on local runs). In both instances the results were the same.
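To spell out the ownership theory, here is a small sketch of the classic Unix permission check. It is a simplification that ignores ACLs, capabilities and read-only mounts, and the function name may_write and the uid and gid numbers in the usage note are hypothetical.

```python
import stat


def may_write(mode, owner_uid, owner_gid, uid, gids):
    """Approximate the kernel's write check for a user on one file.

    Only one class of bits applies: owner bits if the user owns the
    file, otherwise group bits if the user is in the file's group,
    otherwise the "other" bits. Root bypasses the check entirely.
    """
    if uid == 0:
        return True
    if uid == owner_uid:
        return bool(mode & stat.S_IWUSR)
    if owner_gid in gids:
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)
```

With rw-r--r-- (0o644), may_write(0o644, 0, 0, 1000, {1000}) is False for a non-root user on a root-owned file, while may_write(0o644, 1000, 1000, 1000, {1000}) is True when the same user owns the file. That would be consistent with the same image behaving differently depending on who owns the localized inputs.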
Hi Aisling,
Thanks for following up with those details. We're running a couple more tests and we'll get back to you as soon as we can with more information.
Kind regards,
Jason
Hi Aisling,
We believe your assessment is correct. You are able to edit the input files when the workflow is run locally because when they are localized locally they are localized under the topmed user, rather than root. When they are localized in a Terra workflow, they are localized using the root user.
Since you mentioned that the topmed user has sudo access, we were wondering if you would be able to run a sudo command to change file ownership to topmed in your workflow. Once the permissions are changed, the python script should be able to run normally on the files, and effectively run the same way it does locally. If you decide to give this a test, let us know.
I'll look to get you an answer about the difference in behavior between local Cromwell and Terra Cromwell.
Kind regards,
Jason
Hi Aisling,
The team informed me of a workaround you can use for your situation. In your WDL, you can run the command sudo su - root prior to the commands that modify your input files. This should allow you to make your necessary changes without more dramatically changing your code.
Kind regards,
Jason
Am I understanding the suggestion correctly? Is it this?
This results in this appearing in the logs:
...and the task failing with the same error.
Hi Aisling,
Hmm do you know at what point in the script it's failing with this message? Do you get the same result if you run a command to change the localized files to be owned by topmed prior to running your script?
Kind regards,
Jason
The R script I am running is using openfn to open a gds file. Normally this function opens in read-only mode, but the R script specifically disables this, thereby attempting to open the file in a way that grants write permissions. Terra, which localizes the files in the way it does, blocks this as the files do not have write access with regard to the topmed user. I cannot edit the Rscript nor the docker image, as the whole point of this WDL is to be 1:1 to the CWL version, which uses a particular docker image with the Rscript inside of it. However, as I mentioned before, this *does* work on local Cromwell, so clearly something is being handled differently between platforms even though the user is topmed in both cases and the files are read-only with regard to topmed in both cases.
As far as I'm aware there is no way for me to run a command to change the ownership of the localized files on Terra. Everything I've tried either gives the ttyname error or operation not permitted.
Hi Aisling,
A member of our team was able to replicate the error. It seems sudo su - works if you run the command interactively (thread), but runs into trouble when being run as a command in a Docker in a non-interactive mode. Instead of trying to switch to root, you can change the permissions of your input files to be accessible by anyone using chmod. So it would go into your command like this:
I have already re-written the task in question to avoid this by simply duplicating the input and running the Rscript on the duplicate, but my new WDL (exact same Docker image so permissions still apply) is not so easily fixed due to how I glob the output, so I'm revisiting this. I've tried chmod 777 before and it didn't work, but this time I tried it exactly as written here to include the pipefail. Nevertheless it still isn't working.
I am wondering if it has to do with how the files are getting modified. In the example I gave earlier, it was an Rscript modifying the files. In my new code, I am using os.rename() via inline Python. As with before, this works perfectly fine locally but sends a permission denied error on Terra, even though both cases run as the topmed user. Is it possible that execution of scripts has different permissions?
Here is my current task. The error is OSError: [Errno 13] Permission denied. The relevant stdout is in the screenshot below the task screenshot. Note that the permissions do seem to have been changed after the chmod, but Terra seems to be ignoring that change somehow. I do know the problem is not that Terra cannot edit *any* files, as it can modify anything that isn't an input just fine. Duplicates of inputs are no problem, but since I need to glob the output as File renamed_variants = glob("*.gds")[0], I cannot set a non-input variable as an output in Terra, and these permission errors would likely also apply to deleting the original input, it seems this bug(?) is a hard blocker.
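One thing that may be relevant here: on POSIX systems, renaming a file changes an entry in the directory that contains it, so os.rename() generally needs write and search permission on that directory, and the mode bits of the file itself do not come into it. A chmod on the file alone would therefore not be expected to help. The sketch below only reports what the parent directory looks like; the function name rename_requirements is hypothetical.

```python
import os
import stat


def rename_requirements(path):
    """Report the state of the directory that holds `path`.

    os.rename() within a directory rewrites that directory's entries,
    so the parent's owner and mode are what decide whether it works.
    """
    parent = os.path.dirname(os.path.abspath(path))
    st = os.stat(parent)
    return {
        "parent": parent,
        "parent_mode": stat.filemode(st.st_mode),
        "parent_owner_uid": st.st_uid,
        # W_OK | X_OK: may the current user add and remove entries here?
        "writable_by_me": os.access(parent, os.W_OK | os.X_OK),
    }
```

If writable_by_me comes back False for the directory holding the localized input, a rename or delete would fail with Errno 13 even when the file itself shows rwxrwxrwx.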
Hi Aisling,
Thanks for the update. We'll take a look and get back to you as soon as we can.
Kind regards,
Jason
Hi Aisling,
One of our engineers did a test with a modified version of your workflow. They added an ls -lha . step. The listing showed that /cromwell_root/ is owned by root and the permissions are automatically set for it such that anyone can read/write/execute. However, the input files that are nested in the localized directory (seen in the listing as fc-8bf3be10-9439-4686-b9c4-53b7ef59c956) are owned by root and only root can write to them. When they tried creating a file in that dir using python they got the same os permissions error you did.
To work around this, they added find . -type d -exec sudo chmod -R 777 {} + near the beginning of their command block. This command looks for any directories in the current directory (cromwell_root) and changes their permissions so that anyone can read and write. (This also leaves files in the cromwell_root alone, so stderr and stdout files still keep their original permissions).