Where to check if a base image is cached (status: Planned)
Cached images were introduced to make the problem of unnecessarily large base images less painful.
However, it is not clear where one can track which versions are currently cached. I apologize if this documentation already exists.
Can you please ask the engineers to help provide such tracking?
Thank you!
Steve
Comments
Hi all, I wanted to share that we hope to better support this use case in the near term. Please see our new roadmap article for updates on our notebook experience improvements: https://support.terra.bio/hc/en-us/articles/31191625622811.
Hi Steve,
Thank you for writing to Terra Support!
All images available in the dropdown when you're configuring a new cloud environment are cached. Are you looking to check whether particularly old versions are cached?
Best,
Anthony
Hi Steve,
I appreciate your patience as I poked around for more information! I have some additional details to share with you.
We do not cache any of the terra-jupyter-base images because we do not use them to spin up VMs or Dataproc clusters in Terra (they are only used as a base on which we build more specific Python, GATK, or Hail images). We do not have version tracking for these images (we are testing ways of improving on this), but the versions cached in Leonardo are specified here.
Others have encountered the timeout error you described, and I confirmed that solutions are in the works. The Interactive Analysis team is working on providing a base CPU image that is much smaller (9 GB instead of 18 GB), so you should not run into the timeout issues. With a smaller base image, integrating with terra-docker should be a lot easier; this way, your image will always be cached.
I hope this provides some additional clarity! We hope that this fix will be available soon. Please let me know if there is anything else that I can help you with in the meantime.
Best,
Anthony
Hi Anthony,
Thanks for looking into this. And that link to the shell script is very helpful.
As an alternative, can you ask whether the maximum timeout can be increased further?
=============
Now, regarding the minimization of the Docker images.
It's simply taking a really long time for this minimization to happen. Some folks even opened PRs for it over a year ago.
https://github.com/DataBiosphere/terra-docker/pull/346
Yet in the meantime, the base image has become larger and larger.
These base images were absolutely cached before. I know because I test-created environments from them before, and the creation took less than 5 minutes. But somehow they are no longer cached. We don't know why.
All of this has led to much more frequent timeouts. I just tried creating a custom environment multiple times in hopes of getting lucky, but it failed each time.
Because the base image is so large and is no longer cached, users effectively cannot build custom environments anymore.
Here's the evidence that the base images used to be cached:
https://github.com/DataBiosphere/leonardo/blame/4e90c32557cf9a37fa7510a8c70a6a97aa15c2c9/jenkins/gce-custom-images/prepare_gce_image.sh#L20
One place to check which base images are cached (other than the dropdown menu) seems to be here (from a Slack explanation by Liz Baldo):
https://github.com/DataBiosphere/leonardo/blob/4cb83ec06f3ab0c98acbcf9eca01e2c3971ab58b/jenkins/gce-custom-images/prepare_gce_image.sh#L19
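Until proper tracking exists, one stopgap is to read the pinned image versions out of that script's text. Below is a minimal sketch in Python. It assumes the script pins each cached image as a quoted shell assignment of the form name="registry/path/image:tag"; I have not verified that format against every revision of the file. The sample text, the project name, and the helper names (cached_images, is_cached) are all made up for illustration.

```python
import re

# Assumption: each cached image is pinned as a quoted shell assignment such as
#   some_name="registry/path/image:tag"
# Lines that do not look like this are ignored.
IMAGE_ASSIGNMENT = re.compile(r'^\s*(\w+)="([^"\s]+/[^"\s:]+):([^"\s]+)"\s*$')


def cached_images(script_text):
    """Map each image repository to the tag pinned in the script text."""
    images = {}
    for line in script_text.splitlines():
        match = IMAGE_ASSIGNMENT.match(line)
        if match:
            _variable, repository, tag = match.groups()
            images[repository] = tag
    return images


def is_cached(script_text, repository, tag):
    """Return True if the script pins exactly this repository and tag."""
    return cached_images(script_text).get(repository) == tag


# Hypothetical excerpt for demonstration only, not the real file contents.
SAMPLE_SCRIPT = '''
terra_jupyter_python="us.gcr.io/example-project/terra-jupyter-python:1.0.0"
terra_jupyter_gatk="us.gcr.io/example-project/terra-jupyter-gatk:2.0.0"
OUTPUT_IMAGE_NAME=leo-gce-image
'''

if __name__ == "__main__":
    for repository, tag in sorted(cached_images(SAMPLE_SCRIPT).items()):
        print(f"{repository}:{tag}")
```

To use it for real, download the script from the link above, pass its contents as script_text, and compare the result with the base image and tag your custom image builds on.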