Since Terra uses a standard Jupyter Notebooks server implementation, the interface and core capabilities are more or less what you would see in any other setting. You can take advantage of the wealth of documentation and tutorials available on the Internet. Here you'll learn one thing that's truly different about how Jupyter Notebooks work in Terra versus typical local installations: how the computing environment is set up.
For more details about the key components of a Jupyter Notebook, see Part I here.
Content for this article was contributed by Matt Bookman from Verily Life Sciences based on work done in Terra for AMP PD, a public/private partnership collaborating toward biomarker discovery to advance the development of Parkinson’s Disease therapies.
Jupyter Cloud Environments
Creating a cloud environment
If a Cloud Environment (the notebook VM) does not already exist in the workspace, the notebook service creates one, along with an associated boot disk that persists until you delete the environment. The environment also includes a Detachable Persistent Disk, which can be reattached to a different Cloud Environment after you delete the existing one.
Your cloud environment's region
Notebook VMs are created in one of the us-central1 zones by default. To learn how to customize your VM's region, read Customizing where your data are stored and analyzed.
Your Cloud Environment on Terra is yours and yours alone
No one else can view or access your notebook (a billing project owner can delete it but not open it). The reason for this is security: your Google credentials are stored on the Google VM, so the VM cannot be shared with other users.
Cloud environment costs
Cloud environments accrue costs, which fall into two categories: running a notebook and maintaining the environment's persistent disk (where the notebook's outputs are stored).
Running a notebook
Billing for your Cloud Environment begins when your Cloud Environment is created and continues until you pause or delete it, regardless of whether the VM is doing any calculations. Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) in the same Cloud Environment.
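Because every open notebook shares the same VM, it can help to check what the environment actually has available before starting a heavy computation. The following is a minimal sketch using only the Python standard library. The function name `resource_snapshot` is hypothetical, and the memory lookup assumes a Unix-like system such as the Linux container the notebook runs in.

```python
import os
import shutil


def resource_snapshot(path="/"):
    """Return CPU count, total RAM (if available), and free disk space for `path`."""
    snapshot = {"cpu_count": os.cpu_count()}

    # Total physical memory; os.sysconf is only available on Unix-like systems.
    try:
        total_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        snapshot["total_ram_gb"] = round(total_bytes / 1e9, 2)
    except (AttributeError, ValueError, OSError):
        snapshot["total_ram_gb"] = None

    # Free space on the file system that contains `path`.
    snapshot["disk_free_gb"] = round(shutil.disk_usage(path).free / 1e9, 2)
    return snapshot


print(resource_snapshot())
```

The numbers describe the whole VM, not a single notebook, so they are shared by every kernel running in the workspace.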
The Persistent Disk
When you delete your Cloud Environment, you can choose to keep your Detachable Persistent Disk. If you do, you will incur a charge of $2.00/month (for a 50 GB disk; larger disks cost more).
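As a rough guide, the figure above works out to $0.04 per GB per month. The sketch below estimates the monthly charge for other disk sizes. It assumes that the cost scales linearly with size, which the text above does not state explicitly, and the function name is hypothetical; check current Google Cloud pricing for exact figures.

```python
# Rate implied by the figure above: $2.00 per month for a 50 GB disk.
# Linear scaling with disk size is an assumption of this sketch.
PRICE_PER_GB_MONTH = 2.00 / 50


def monthly_disk_cost(size_gb):
    """Estimate the monthly cost in USD of keeping a persistent disk of `size_gb` GB."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return round(size_gb * PRICE_PER_GB_MONTH, 2)


for size in (50, 100, 500):
    print(f"{size} GB: ${monthly_disk_cost(size):.2f}/month")
```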
Pausing a cloud environment
When you're done working and close the notebook, Terra tells Google Cloud to pause the Cloud Environment (VM) but save its state. The saved portion includes the state of the Jupyter Notebooks container, with any modifications you may have made - by installing packages, for example - and any files present on its local storage partition. You can resume working at any time with minimal effort: when you reopen the notebook, Terra restarts the VM and restores the notebook Cloud Environment to its saved state.
Save money with autopause
To save you from incurring additional costs, Terra automatically saves the notebook and pauses your notebook Cloud Environment after a period of inactivity.
Inactivity includes when your computer goes to sleep unless the notebook kernel is doing operations. If your kernel is active, Terra will not pause the Cloud Environment (to prevent long-running jobs from aborting). Note: Autopause will resume after 24 hours, even if your kernel is active.
You can explicitly pause your notebook by clicking the cloud icon in the right sidebar and then the pause icon under the Jupyter logo.
What happens when you pause a cloud environment?
When a notebook Cloud Environment is paused, its Compute Engine VM disappears, but the boot disk and Persistent Disk don't. When you reopen your notebook, the notebook VM is created more quickly because the disks don't need to be recreated and you don't need to reinstall your software.
Any running Jupyter kernel processes are gone, so the notebook state is lost. This includes calculated and other variables, including environment variables. To restore the notebook state, open the notebook and rerun the relevant cells of the notebook.
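If rerunning cells is expensive, one option is to write important variables to a file before pausing and read them back afterward, since files on local storage survive a pause. The following is a minimal sketch using Python's standard `pickle` module. The function names and the file location are hypothetical; the example writes to a temporary directory so it runs anywhere, whereas in practice you would choose a directory on the persistent disk.

```python
import pickle
import tempfile
from pathlib import Path


def save_checkpoint(variables, path):
    """Write a dict of picklable variables to `path`."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(variables, handle)


def load_checkpoint(path):
    """Read back the dict written by save_checkpoint."""
    with Path(path).open("rb") as handle:
        return pickle.load(handle)


# Hypothetical location; replace with a directory on the persistent disk.
checkpoint_file = Path(tempfile.gettempdir()) / "checkpoints" / "analysis_state.pkl"

save_checkpoint({"sample_count": 128, "threshold": 0.05}, checkpoint_file)
restored = load_checkpoint(checkpoint_file)
print(restored)
```

Only unpickle files you created yourself, because loading a pickle can execute arbitrary code. Environment variables are not captured this way and still need to be set again.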
Deleting a cloud environment
If you don't need your notebook VM or cluster and want to save on the cost of the boot disk, or if you're going to pick up a new feature that requires you to rebuild your notebook VM, you can delete the Cloud Environment by doing the following:
1. Click on the Jupyter icon on the right-hand panel of any tab in your workspace:
2. Click Pause to pause the environment. This may take a few minutes.
3. Once Terra has finished pausing the environment, click Settings.
4. Scroll down to the bottom of the environment configuration menu and click Delete.
5. Choose whether to keep or delete the Persistent Disk and click Delete.
What is deleted along with the Cloud Environment?
- Boot disk
- Software that is part of the Cloud Environment (not installed on the PD)
What is deleted with the Persistent Disk?
- Installed software on the Persistent Disk
- Generated data not explicitly saved to the Workspace bucket or other external file
What to do after you delete a PD
To run your notebook after deleting the Persistent Disk, you need to reinstall any additional libraries or tools you installed previously and re-create any generated data.
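Reinstalling is easier if you record what was installed before deleting the disk. The sketch below lists the Python packages visible to the current kernel in `name==version` form, using the standard `importlib.metadata` module. The function names are hypothetical, and the sketch covers Python packages only, not R packages or system tools.

```python
from importlib import metadata


def installed_requirements():
    """Return sorted 'name==version' lines for every installed distribution."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


def write_requirements(path):
    """Write the package list to `path` and return the number of packages recorded."""
    lines = installed_requirements()
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(lines) + "\n")
    return len(lines)


print(f"{len(installed_requirements())} packages found")
```

A file written this way can later be passed to `pip install -r`. Keep it somewhere that outlives the disk, such as the Workspace bucket.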
What is kept when deleting the Cloud Environment (saved notebook files and data)?
Your notebooks and any data explicitly saved to your Workspace storage are still in long-term storage in the Workspace Google bucket as described in the Copy notebook output to the Workspace bucket article.
Opening a notebook
3 modes to open a notebook
There are three modes in which to open a notebook:
1. Edit mode allows you to run, edit, and save changes to a notebook. Only one collaborator at a time can open a notebook in edit mode.
2. Playground mode allows you to run a notebook but not save any changes to the notebook or outputs generated by the analysis. Multiple collaborators can open a notebook at the same time in playground mode. To learn more, read Keep from overwriting shared notebooks with Playground mode.
3. Read-only mode (a.k.a. preview mode) allows you to view the contents of a notebook without running or editing it. It does not require the creation of a cloud environment, and is therefore much faster than opening a notebook in edit or playground mode. Clicking on a notebook in the Analyses tab opens the notebook in read-only mode.
What's going on under the hood?
The steps below give more in-depth information about what's going on under the hood when you open a notebook.
- If your workspace does not already have a cloud environment (aka cluster or notebook virtual machine), Terra will create one. The cloud environment includes a boot disk and a detachable persistent disk (PD).
- After setting up the application compute, the Notebook Service starts a Docker container with all the core software that your notebook will run. Because it is a Docker container, not a true VM, you can't do certain things (such as running Docker within the container). Items you create inside your notebook's Docker container exist as long as you don't delete the detachable Persistent Disk (or the Cloud Environment, if you don't have a detachable Persistent Disk). Inside the Docker container, your user ID is jupyter (or jupyter-user for older images), and your notebook files on the detachable persistent disk are created in one of the following directories, depending on whether you open the notebook in "edit" or "playground" mode. See "Edit" and "Playground" notebook modes for more information.
  /home/jupyter/WORKSPACE_NAME/edit/NOTEBOOK_NAME.ipynb
  or
  /home/jupyter/WORKSPACE_NAME/safe/NOTEBOOK_NAME.ipynb
- A Jupyter extension managed by the Notebook Service copies your notebook from the workspace bucket to the Persistent Disk attached to your Cloud Environment (Notebook VM).
  Note: The notebook file is copied to a workspace-specific directory in the HOME directory of the Jupyter user inside the Docker container.
- The Notebook Server loads your notebook file from the persistent disk and starts the Jupyter kernel.
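The directory layout described in the steps above can be expressed as a small helper, which is handy when a notebook needs to locate its own file. This is a sketch only. The function name is hypothetical, and the per-mode directory names are assumptions: "edit" comes from the path shown above, while "safe" for playground mode should be confirmed in your own environment.

```python
from pathlib import PurePosixPath

# Directory used for each mode. "safe" for playground mode is an assumption.
MODE_DIRECTORIES = {"edit": "edit", "playground": "safe"}


def local_notebook_path(workspace_name, notebook_name, mode="edit",
                        home="/home/jupyter"):
    """Build the expected path of a notebook file inside the container."""
    if mode not in MODE_DIRECTORIES:
        raise ValueError(f"unknown mode: {mode!r}")
    return PurePosixPath(home, workspace_name, MODE_DIRECTORIES[mode],
                         f"{notebook_name}.ipynb")


print(local_notebook_path("WORKSPACE_NAME", "NOTEBOOK_NAME"))
```

For older images, pass a different `home` value that matches the jupyter-user account.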
Where is everything saved?
Notebooks
When you create a notebook, the Notebook Service creates a (.ipynb) file and saves it in your workspace storage (Google Bucket).
Make sure to save your work
When you edit a notebook, the changes are not saved to disk or copied to Cloud Storage unless:
1. you are in "Edit" mode and you explicitly save your notebook
or
2. the Jupyter autosave process kicks in. For Jupyter on Terra, the autosave frequency is every 5 seconds.
While there are unsaved changes, Jupyter displays a notification:
When Terra autosaves changes, you will see this notification:
Notebook outputs
When you run a code cell in a notebook, the cell execution creates an output associated with that code. The output is only stored in the Jupyter kernel runtime until you save the notebook. If you have a detachable Persistent Disk, the output is saved there.
Analysis outputs generated in a Jupyter Notebook are not copied to Workspace Cloud Storage (Google Bucket) until you explicitly save them.
For step-by-step instructions on saving outputs to the Workspace bucket, see Copying notebook output to a Google bucket.
- When you (or Terra) save a notebook, the current copy (in the Jupyter kernel) is first saved to the notebook file on your Persistent Disk. Then, the file on disk is delocalized (copied) to your Workspace bucket by a Jupyter extension.
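Unlike the notebook file itself, analysis outputs have to be copied to the Workspace bucket explicitly. One way to do this from a notebook is to call the `gsutil cp` command, as sketched below. Several details are assumptions: the helper names and the `notebook-outputs` prefix are hypothetical, and the sketch assumes the bucket URL is available in a `WORKSPACE_BUCKET` environment variable, falling back to a placeholder if it is not.

```python
import os
import shutil
import subprocess


def build_copy_command(local_path, bucket, dest_prefix="notebook-outputs"):
    """Return the gsutil command that copies `local_path` into `bucket`."""
    file_name = os.path.basename(local_path)
    destination = f"{bucket.rstrip('/')}/{dest_prefix.strip('/')}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_bucket(local_path, bucket):
    """Run the copy if gsutil is installed; return False if it is not."""
    if shutil.which("gsutil") is None:
        return False
    subprocess.run(build_copy_command(local_path, bucket), check=True)
    return True


# Assumption: WORKSPACE_BUCKET holds the bucket URL; the fallback is a placeholder.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://YOUR_WORKSPACE_BUCKET")
print(" ".join(build_copy_command("results.tsv", bucket)))
```

Building the command separately from running it makes it easy to inspect the destination before anything is copied.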
Collaborating in a shared Terra workspace
You must share your workspace with a collaborator for them to see or run your notebooks. However, sharing your workspace does not entail sharing all of the resources attached to it.
Each collaborator has their own cloud environment
Your notebook VM is specific to you since each individual user has a separate notebook cluster. Any work collaborators do in the notebook will not affect the state of your own Cloud Environment.
This also means that any additional libraries and packages that you install in your Cloud Environment are specific to your environment. If you install an additional package when you run a notebook, your collaborator will have to do the same for their cloud environment.
Managing cloud environments across notebooks
All Jupyter notebooks in a workspace work off of the same cloud environment. You don't have to reinstall libraries or packages for different notebooks in the same workspace.
If you pause or delete the Cloud Environment in one notebook or terminal in a workspace, it will also affect every other notebook and terminal in the same workspace.
Collaborators share notebooks
However, the notebook files in the Workspace storage (Google Bucket) are shared. The system will automatically save any changes collaborators make to the shared document in the workspace, so it's important to set expectations clearly with your collaborators about whether it's okay for them to modify the notebook or whether they should work in a separate copy.
To avoid having multiple people making conflicting changes simultaneously, Terra will "lock" the notebook document in the workspace whenever someone is actively working with it. When this happens, your collaborator can open the notebook in the read-only preview mode or "Playground" mode.
Installing software and dependencies
You may need to install libraries or tools on your notebook VM to extend the basic functionality of the kernel. Both pip install and install.packages() place packages in the $HOME/packages/ directory of the detachable persistent disk in the Docker container. Installing libraries may take a long time the first time you install them. However, because they are installed on the detachable PD, they will be available automatically when you rerun a notebook as well as when you recreate the Cloud Environment (assuming you keep the PD).
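Because installed packages persist on the PD, a notebook's setup cell can skip installation when a package is already present. The following sketch checks whether a module is importable and calls pip only if it is not. The function names are hypothetical, and the check expects a top-level module name (for example `numpy`, not `numpy.linalg`).

```python
import importlib.util
import subprocess
import sys


def is_installed(module_name):
    """Return True if the top-level module can be imported in the current kernel."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name, pip_name=None):
    """Install a package with pip only if its module is not already importable.

    Returns True if an installation was performed, False otherwise.
    """
    if is_installed(module_name):
        return False  # already available, nothing to do
    subprocess.run(
        [sys.executable, "-m", "pip", "install", pip_name or module_name],
        check=True,
    )
    return True


# "json" ships with Python, so this reports it as installed without calling pip.
print(is_installed("json"))
```

Using `sys.executable -m pip` installs into the same Python interpreter that the kernel is running, which avoids installing into a different Python by accident.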