I am running a prototype to see how my lab can use Terra.
I ran a small test and found that it took about 6 hours to run our pipeline on 10 samples. We use preemptible containers. Overall, the cost was really low.
For the next part of my test, I plan to process about 400 samples. Here are some questions about how best to go about this:
1) My workflow does not use or need scatter. My 10-sample test caused 10 containers to spin up at the same time. If I selected all 400 samples at once, would this impact the availability of Terra to other users? What if I want to run on all 17,000 GTEx samples?
2) Given that we are using preemptible containers, I assume some jobs will fail. Is there an easy way to select and re-run only the failed ones?
3) Is there an easy way to know when everything is done? I do not want to have to babysit my batch jobs.
4) In an ideal world, the total wall-clock time for running all samples would be about the same as for running 10, and the overall cost would scale linearly with the number of samples. What should I expect?
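To make question 4 concrete, here is a rough sketch of the arithmetic I have in mind. It is only a back-of-the-envelope estimate, not a statement about how Terra actually schedules jobs. The assumptions are mine: each sample takes about 6 hours (the time of the whole 10-sample test, so probably an upper bound), cost is proportional to container-hours, and preemption retries are ignored. The `estimate` function and the `max_concurrent` cap are hypothetical names used for illustration; I do not know what the real concurrency limit is.

```python
import math

# Assumption: every sample takes about as long as the whole 10-sample test did.
HOURS_PER_SAMPLE = 6.0


def estimate(n_samples, max_concurrent=None, hours_per_sample=HOURS_PER_SAMPLE):
    """Return (wall_clock_hours, container_hours) for a batch of samples.

    max_concurrent is a hypothetical cap on how many containers run at once;
    None means every sample gets its own container immediately.
    """
    if max_concurrent is None:
        max_concurrent = n_samples
    # Samples run in "waves" of at most max_concurrent containers each.
    waves = math.ceil(n_samples / max_concurrent)
    wall_clock_hours = waves * hours_per_sample
    # Container-hours (a proxy for cost) grow linearly with the sample count.
    container_hours = n_samples * hours_per_sample
    return wall_clock_hours, container_hours


if __name__ == "__main__":
    for n in (10, 400, 17000):
        wall, total = estimate(n)
        print(f"{n} samples: {wall} h wall clock, {total} container-hours")
```

If all containers really do start at once, the wall-clock time stays at about 6 hours and only the container-hours grow. With a cap, for example `estimate(400, max_concurrent=100)`, the same 400 samples would take four waves, so the wall-clock time grows while the total container-hours stay the same.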