Moving from your local cluster to Terra

Post author
Anika Das

The situation:

"I already have a workflow using Jupyter notebooks that works well for me on my on-premises cluster, so I'm hoping to replicate it as closely as I can to minimize the overhead of switching platforms. The reason I'm switching is that I have found my on-premises cluster to be limiting for interactive work due to the 36-hour time limit on jobs and the variable wait times in the queue to dispatch new jobs. I really just need a computer with sufficient memory to run a Jupyter notebook server that can stay up for more than 36 hours and that is ready when I need it."

Do I have free rein to install any software I need?

Yes, you are free to install anything you want in the notebook environment using pip, R's package installers, or conda. You can also customize the environment by providing custom startup scripts or Docker images. This page has more information: https://support.terra.bio/hc/en-us/articles/360038125912-Understanding-and-Customizing-your-Cloud-Environment
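
As a minimal sketch of the pip route from inside a Python notebook: invoking pip through `sys.executable` installs into the same interpreter the kernel is running, rather than whichever `pip` happens to be first on the PATH. In a notebook cell, IPython's `%pip install <package>` magic achieves the same thing. The helper names below are hypothetical, for illustration only.

```python
import subprocess
import sys


def pip_install_command(*packages: str, upgrade: bool = False) -> list[str]:
    """Build a pip command bound to the interpreter running this kernel."""
    if not packages:
        raise ValueError("at least one package name is required")
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    cmd.extend(packages)
    return cmd


def pip_install(*packages: str, upgrade: bool = False) -> None:
    """Install packages into the current kernel's environment (raises on failure)."""
    subprocess.check_call(pip_install_command(*packages, upgrade=upgrade))
```

For example, `pip_install("pysam")` in a notebook cell installs that package for the running kernel; in an R notebook the equivalent is `install.packages()`.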

Are there Google Cloud products that are incompatible with Terra?

Notebooks run on Google Compute Engine (GCE) VMs or Dataproc clusters, we support detachable persistent disks, and many users access data in Google Cloud Storage (GCS) or BigQuery.

Can I set up an environment in a VM image or Docker container and then run my instance off of that?

You can launch a Jupyter environment in a custom Docker container. The process is: extend a Terra base image, push it to a container registry (we support GCR, Docker Hub, and GitHub Container Registry), and then enter the image URL in the Terra UI when creating a runtime.
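
The "image URL" is an ordinary fully qualified image reference. The sketch below assembles one for each of the three registries using their public hostname conventions and produces the matching `docker tag` / `docker push` commands. The helper names are hypothetical, and whether the Terra UI expects the explicit `docker.io/` prefix for Docker Hub images (rather than the short `user/image:tag` form) is an assumption worth checking against the Terra documentation.

```python
# Public hostname conventions for the registries named above.
REGISTRY_HOSTS = {
    "gcr": "gcr.io",           # gcr.io/<project-id>/<image>:<tag>
    "dockerhub": "docker.io",  # docker.io/<user-or-org>/<image>:<tag>
    "ghcr": "ghcr.io",         # ghcr.io/<owner>/<image>:<tag>
}


def image_url(registry: str, namespace: str, image: str, tag: str) -> str:
    """Assemble a fully qualified image reference for a supported registry."""
    try:
        host = REGISTRY_HOSTS[registry]
    except KeyError:
        raise ValueError(
            f"unsupported registry {registry!r}; expected one of {sorted(REGISTRY_HOSTS)}"
        ) from None
    if not (namespace and image and tag):
        raise ValueError("namespace, image and tag must all be non-empty")
    return f"{host}/{namespace}/{image}:{tag}"


def push_commands(local_image: str, url: str) -> list[list[str]]:
    """Docker CLI commands that retag a locally built image and push it."""
    return [["docker", "tag", local_image, url], ["docker", "push", url]]
```

Pinning an explicit tag (rather than relying on `latest`) makes it clear which image version a given runtime was created from.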

Is Terra compatible with JupyterLab in addition to Jupyter Notebook?

Today we officially support only Jupyter Notebook, but adding JupyterLab is on our roadmap.

Is it possible to install other kernels (besides Python and R), for example Julia?

Only Python and R kernels are available by default, but it should be possible to install Julia via a custom Docker image.
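
Once a custom image with Julia is running, one way to confirm that Jupyter actually sees a Julia kernel is to query `jupyter kernelspec list --json` from a notebook. This is a sketch that assumes the image also registers a Jupyter kernel for Julia (for example via the IJulia package, which typically names kernels like `julia-1.10`). The helper names are hypothetical.

```python
import json
import subprocess


def kernel_names(kernelspec_json: dict) -> list[str]:
    """Return installed kernel names from `jupyter kernelspec list --json` output."""
    return sorted(kernelspec_json.get("kernelspecs", {}))


def installed_kernels() -> list[str]:
    """Ask the Jupyter CLI on this VM which kernels are installed."""
    out = subprocess.run(
        ["jupyter", "kernelspec", "list", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return kernel_names(json.loads(out))


def has_julia(names: list[str]) -> bool:
    """Match on the name prefix, since the Julia version is part of the kernel name."""
    return any(name.lower().startswith("julia") for name in names)
```

Running `has_julia(installed_kernels())` in a Python notebook on the custom image should return `True` if the Julia kernel was registered.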

Regarding preemptible instances, I know they are limited to a maximum of 24 hours of runtime. I want to explore whether the preemption rate is low enough that I could work on one for a day, save the VM state overnight, and then resume on a new preemptible instance the next day.

You can launch a Dataproc cluster with preemptible workers in a workspace’s interactive cloud environment, but not a single preemptible VM. Single preemptible VMs are available for running batch workflows. However, there are other ways to control cost in the interactive cloud environment: VMs are automatically paused when idle, and you can delete your VM while keeping its persistent disk detached, which preserves the files on that disk so you pay only for storage.
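
A detached disk preserves files, not the in-memory state of a running kernel, so anything you want to resume the next day has to be written to the disk first. Below is a minimal sketch using Python's `pickle`, assuming the persistent disk is mounted at `/home/jupyter` (adjust `DEFAULT_STATE_DIR` if your environment differs). The function names are hypothetical, and objects such as open file handles or database connections cannot be pickled and would need to be recreated.

```python
import pickle
from pathlib import Path

# Assumption: the persistent disk is mounted at /home/jupyter.
DEFAULT_STATE_DIR = Path("/home/jupyter/saved_state")


def save_state(variables: dict, name: str = "session",
               state_dir: Path = DEFAULT_STATE_DIR) -> Path:
    """Pickle a dict of picklable objects to a file on the persistent disk."""
    state_dir = Path(state_dir)
    state_dir.mkdir(parents=True, exist_ok=True)
    path = state_dir / f"{name}.pkl"
    with open(path, "wb") as fh:
        pickle.dump(variables, fh, protocol=pickle.HIGHEST_PROTOCOL)
    return path


def load_state(name: str = "session", state_dir: Path = DEFAULT_STATE_DIR) -> dict:
    """Load a dict previously written by save_state."""
    path = Path(state_dir) / f"{name}.pkl"
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

For example, call `save_state({"df": df, "params": params})` before deleting the VM, then `globals().update(load_state())` in a notebook on the new VM once the disk is reattached.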

I've heard that you can use a tool called FUSE to mount a GCS bucket as a file system on your VM. Do you know whether this is possible with instances running on Terra?

GCS FUSE is not supported today; we've experimented with it in the past but encountered some performance and security issues. Our users typically use client libraries to read and write data in GCS. terra-notebook-utils is a library we maintain, and the standard GCS client libraries are available as well.
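
As a sketch of the client-library approach, the helpers below copy single objects between a bucket and the VM's local disk using the standard `google-cloud-storage` package (`Client`, `bucket`, `blob`, `download_to_filename`, `upload_from_filename`). They assume the package is installed and that the default Google credentials on the VM can access the bucket; `parse_gs_uri`, `download`, and `upload` are hypothetical helper names, and this does not use terra-notebook-utils.

```python
def parse_gs_uri(uri: str) -> tuple[str, str]:
    """Split 'gs://bucket/path/to/object' into ('bucket', 'path/to/object')."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, blob_name = uri[len(prefix):].partition("/")
    if not bucket or not blob_name:
        raise ValueError(f"URI must name both a bucket and an object: {uri!r}")
    return bucket, blob_name


def download(uri: str, local_path: str) -> None:
    """Copy one GCS object to a local file."""
    from google.cloud import storage  # lazy import: parse_gs_uri works without it
    bucket_name, blob_name = parse_gs_uri(uri)
    client = storage.Client()  # picks up the VM's default credentials
    client.bucket(bucket_name).blob(blob_name).download_to_filename(local_path)


def upload(local_path: str, uri: str) -> None:
    """Copy one local file to a GCS object."""
    from google.cloud import storage
    bucket_name, blob_name = parse_gs_uri(uri)
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)
```

Unlike a FUSE mount, this makes each transfer explicit: you copy the inputs you need onto the VM's disk, work on local files, and upload results when you're done.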
