Terra uses a standard Jupyter Notebooks server implementation, so the interface and core capabilities are more or less what you would see in any other setting. As a result, you can take advantage of the wealth of documentation and tutorials available on the Internet for learning how to use the various menu options, widgets and so on that we're not going to cover in detail here.
The one thing that is truly different about how Jupyter Notebooks work in Terra versus typical local installations is how the computing environment is set up. This article walks through what happens when you perform key operations with your notebooks in Terra. Understanding what is happening behind the curtain can help you avoid pitfalls while making your notebook analysis process more efficient.
For more details about key components of a Jupyter notebook, see Part I here.
Content for this article was contributed by Matt Bookman from Verily Life Sciences, based on work done in Terra for AMP PD, a public/private partnership collaborating toward biomarker discovery to advance the development of Parkinson’s Disease therapies.
Creating a notebook
When you create a notebook, the Notebook Service creates a notebook (.ipynb) file and saves it in your workspace bucket.
Opening a notebook
When you open a notebook, the Notebook Service executes several steps in order:
- Create a Cloud Environment (aka cluster, or notebook VM) if one doesn't already exist for your Cloud project - this includes a Boot Disk and a Detachable Persistent Disk
- Start a Docker container
- Localize the notebook to your VM or cluster
- Open the notebook in a Jupyter kernel
1. Create a Cloud Environment
If a Cloud Environment (the notebook VM) does not already exist for you in the workspace, the notebook service will first create one, along with an associated boot disk (that persists until you delete the environment) and a Detachable Persistent Disk (that can be re-attached to a different Cloud Environment after deleting the existing one).
Note on billing when running a notebook: Billing for your Cloud Environment will begin when your Cloud Environment is created and will continue until you pause or delete it, regardless of whether the VM is doing any calculations. Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) on the same Cloud Environment.
Note on billing for Detachable Persistent Disks: When you delete your Cloud Environment, you can choose to keep your Detachable Persistent Disk. If you do, you will incur a charge of $2.00/month (50 GB disk).
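As a rough worked example of the disk charge above, the sketch below scales the article's figure ($2.00/month for a 50 GB disk, i.e. $0.04 per GB per month) to other disk sizes. The function and constant names are hypothetical, and the rate is only the one implied by this article's example; actual pricing may differ.

```python
# Rate implied by the article's example: $2.00 per month for a 50 GB disk.
# This is an illustrative assumption, not an authoritative price.
ASSUMED_USD_PER_GB_MONTH = 2.00 / 50


def monthly_disk_cost(size_gb: float,
                      usd_per_gb_month: float = ASSUMED_USD_PER_GB_MONTH) -> float:
    """Estimate the monthly charge for keeping a persistent disk of size_gb."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    # Round to cents so the result reads like a billing figure.
    return round(size_gb * usd_per_gb_month, 2)


if __name__ == "__main__":
    for size in (50, 100, 500):
        print(f"{size} GB disk: about ${monthly_disk_cost(size):.2f} per month")
```

Under this assumed rate, doubling the disk size doubles the monthly charge, so it is worth deleting a kept disk you no longer need.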
Note on region/zone of notebook VMs: Notebook VMs are created in one of the
Note that your Cloud Environment on Terra is yours and yours alone. No one else can view or access your notebook (a billing project owner can delete it but not open it). The reason for this is security. We store your Google credentials on the Google VM, which cannot be shared with other users:
2. Start Docker container
After setting up the application compute, the Notebook Service will start a Docker container with all the core software that your notebook will run. Because it is a Docker container, not a true VM, you will not be able to do some things (such as running Docker inside the container). Things you create inside your notebook's Docker container will exist as long as you do not delete the detachable Persistent Disk (or the Cloud Environment, if you don't have a detachable Persistent Disk). Inside the Docker container, your user ID is jupyter-user and your HOME directory (on the detachable persistent disk) is
3. Localize notebook
A Jupyter extension managed by the Notebook Service copies your notebook from the workspace bucket to the Persistent Disk attached to your Cloud Environment (Notebook VM).
Note that the notebook file is copied to a workspace-specific directory in the HOME directory of the Jupyter user inside the Docker container.
4. Open notebook in Jupyter kernel
The Notebook Server loads your notebook file from the persistent disk and starts the Jupyter kernel.
Saving a notebook
If you open a notebook in "Edit" mode, Terra will autosave every five seconds. When you (or Terra) save a notebook, the current copy (in the Jupyter kernel) is first saved to the notebook file on your Persistent Disk:
Then the file on disk is delocalized (copied) to your Workspace bucket by a Jupyter extension:
Editing a notebook (i.e. ipynb file)
When you edit a notebook, the changes are initially only reflected in the Jupyter kernel. Changes are not saved to disk or copied to Cloud Storage unless 1) you are in "Edit" mode and 2) you explicitly save them or the Jupyter autosave process kicks in. For Jupyter on Terra, the autosave frequency is every 5 seconds.
While there are unsaved changes, Jupyter displays a notification:
When Terra autosaves changes, you will see this notification:
Running notebook code
When you run a code cell in a notebook, the cell execution creates output associated with that code. The output is only stored in the Jupyter kernel runtime until you save the notebook. If you have a detachable Persistent Disk, the output is saved there.
Analysis outputs generated in a Jupyter notebook are not copied to Cloud Storage until you explicitly save them.
For step-by-step instructions on saving outputs to the Workspace bucket, see this article.
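As a minimal sketch of what explicitly saving an output can look like, the Python below copies one local file to the Workspace bucket with "gsutil cp". It assumes that the bucket URL (gs://...) is available in an environment variable named WORKSPACE_BUCKET and that gsutil is on the PATH; the helper names and the "notebook-outputs" folder are hypothetical. The linked article remains the reference for the supported procedure.

```python
import os
import shutil
import subprocess
from typing import List


def build_copy_command(local_path: str, bucket: str,
                       dest_dir: str = "notebook-outputs") -> List[str]:
    """Build a gsutil command that copies one local file into the bucket."""
    filename = os.path.basename(local_path)
    destination = f"{bucket.rstrip('/')}/{dest_dir}/{filename}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_workspace_bucket(local_path: str) -> None:
    """Copy a file from the notebook's disk to the Workspace bucket."""
    # Assumption: the bucket URL is exposed as WORKSPACE_BUCKET.
    bucket = os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("WORKSPACE_BUCKET is not set")
    if shutil.which("gsutil") is None:
        raise RuntimeError("gsutil was not found on PATH")
    # check=True raises CalledProcessError if the copy fails.
    subprocess.run(build_copy_command(local_path, bucket), check=True)
```

Calling copy_to_workspace_bucket("results.csv") after a cell writes results.csv would place a copy under the hypothetical notebook-outputs folder in the bucket, where it survives deletion of the Persistent Disk.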
Installing software and dependencies
You will likely need to install libraries or tools on your notebook VM to extend the basic functionality of the kernel. Both "pip install" and "install.packages()" place packages in the $HOME/notebooks/packages/ directory of the jupyter-user in the Docker container, which lives on the detachable persistent disk. Installing libraries may take a long time the first time, but because they are installed on the detachable PD, they will be available automatically when you rerun a notebook (i.e. resume - or even recreate - the Cloud Environment).
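Because installed packages persist across sessions, a setup cell can skip the slow install when a package is already present. The sketch below is one way to do that in Python; the helper names are hypothetical, and it assumes each pip distribution name matches its top-level import name, which is not true for every package.

```python
import importlib.util
import subprocess
import sys
from typing import Iterable, List


def missing_packages(import_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot currently be imported."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


def install_if_missing(import_names: Iterable[str]) -> List[str]:
    """Install only the missing packages with pip; return what was installed."""
    to_install = missing_packages(import_names)
    if to_install:
        # Assumption: the pip name equals the import name.
        subprocess.run(
            [sys.executable, "-m", "pip", "install", *to_install],
            check=True,
        )
    return to_install
```

Running install_if_missing(["numpy", "pandas"]) at the top of a notebook installs on the first run and returns an empty list on later runs once both packages are importable.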
Availability of software across workspaces and Terra Billing Projects
Each workspace has its own notebook VM and its own installed software. You will need to install libraries and packages - either from a notebook or the workspace terminal - separately for each workspace. You do not have to reinstall libraries or packages for different notebooks in the same workspace.
Note that if you pause or delete the Cloud Environment in one notebook or terminal in a workspace, it will also affect every other notebook and terminal in the same workspace.
However, pausing or updating the Cloud Environment in one workspace will not affect any other workspace Cloud Environments.
Note that this is different for workspaces created before September 26, 2021, when workspaces in the same Terra Billing project shared a Cloud Environment VM (per user).
Pausing a Cloud Environment
When you're done working and close the notebook, Terra tells Google Cloud to pause the Cloud Environment but save its state. The saved state includes the Jupyter Notebooks container, with any modifications you may have made - by installing packages, for example - and any files present on its local storage partition. That way, you can resume working at any time with minimal effort: when you reopen the notebook, Terra restarts the VM and restores the notebook Cloud Environment to its saved state.
To save you from incurring additional costs, Terra will automatically save the notebook and pause your notebook Cloud Environment after a period of inactivity.
Time when your computer is asleep counts as inactivity, unless the notebook kernel is still running operations. If your kernel is active, Terra will not pause the Cloud Environment (to prevent long-running jobs from aborting). Note that autopause will resume after 24 hours, even if your kernel is active.
You can explicitly pause your notebook by selecting the pause icon in the Cloud Environment widget (top right):
What is kept when you pause a Cloud Environment (boot disk, persistent disk, software)
When a notebook Cloud Environment is paused, its Compute Engine VM goes away, but the boot disk and Persistent Disk do not. When you re-open your notebook, the notebook VM is created more quickly because the disks do not need to be recreated. You do not need to reinstall your software.
What is lost when you pause a Cloud Environment (notebook state)
Any running Jupyter kernel processes are gone, so the notebook state is lost. This includes calculated and other variables, including environment variables. To restore notebook state you will need to open the notebook and re-run the relevant cells of the notebook.
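If re-running cells is expensive, one option is to write key variables to a file before pausing and read them back afterwards, since files on the Persistent Disk are kept. The sketch below uses Python's pickle module; the helper names and the checkpoint path are hypothetical, and it only covers picklable Python objects, not environment variables or running processes.

```python
import pickle
from pathlib import Path
from typing import Any, Dict


def save_checkpoint(variables: Dict[str, Any], path: Path) -> None:
    """Write a dictionary of picklable variables to a file on disk."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as handle:
        pickle.dump(variables, handle)


def load_checkpoint(path: Path) -> Dict[str, Any]:
    """Read back a dictionary written by save_checkpoint."""
    # Only load pickle files you created yourself; unpickling untrusted
    # data can execute arbitrary code.
    with path.open("rb") as handle:
        return pickle.load(handle)


if __name__ == "__main__":
    # Hypothetical location under HOME, assumed to be on the Persistent Disk.
    checkpoint = Path.home() / "checkpoints" / "analysis.pkl"
    save_checkpoint({"threshold": 0.05, "samples": ["s1", "s2"]}, checkpoint)
    restored = load_checkpoint(checkpoint)
    print(sorted(restored))
```

After resuming, calling load_checkpoint on the same path in a fresh kernel returns the saved dictionary, so only the cells that depend on unsaved state need to be re-run.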
Deleting a Cloud Environment (also the boot disk)
If you do not need your notebook VM or cluster and want to save on the cost of the boot disk, or if you want to pick up a new feature that requires you to rebuild your notebook VM, you can delete the Cloud Environment by doing the following:
1. Click on the gear icon at the top right of the screen:
2. Select "Delete Environment Options" at the bottom of the form.
3. Choose whether to keep or delete the Persistent Disk and click the "Delete" button.
What is deleted along with the Cloud Environment
- Boot disk
- Software that is part of the Cloud Environment (not installed on the PD)
What is deleted with the Persistent Disk
- Installed software on the Persistent Disk
- Generated data not explicitly saved to the Workspace bucket or other external file
If you delete your persistent disk, you will need to reinstall any additional libraries or tools you had installed previously as well as recreate any generated data.
What is kept when deleting the Cloud Environment (saved notebook files and data)
Your notebooks and any data explicitly saved to your Workspace bucket are still in long term storage in the Workspace bucket as described in the Copy notebook output to the Workspace bucket article.
Opening notebook (read-only)
Often you only want to read a notebook, rather than edit or run it. Opening a notebook read-only does not require creation of a VM, so it is much faster than opening a notebook for editing.
1. In the Notebooks tab, click on the three vertical dots icon for your notebook
2. Select "Open read-only":
A server-side process will render the notebook file from Cloud Storage and display it in your browser:
Collaboration in a shared Terra workspace
Your notebook VM is specific to you. Each individual user will have a separate notebook cluster. As a result, any work collaborators do in the notebook will not affect the state of your own Cloud Environment.
However, the Workspace bucket and the notebooks in the bucket are shared. The system will automatically save any changes collaborators make to the shared document in the workspace, so it's important to set expectations clearly with your collaborators about whether it's okay for them to modify the notebook or whether they should work in a separate copy.
These two conditions also mean that you need to share your workspace in order for a collaborator to be able to see your notebook.
Terra will "lock" the notebook document in the workspace whenever someone is actively working with it, to avoid having multiple people making conflicting changes at the same time. When this happens, your collaborator can open the notebook in the read-only preview mode, or they can open it in a special "playground" mode that allows them to make changes and run code in their own Cloud Environment, but does not save any changes to the original notebook file. This falls a bit short of the ideal collaborative experience that you could envision based on Google Docs, for example, but it provides a reasonable compromise given the constraints at play. To learn more about "Edit" versus "Playground" modes in Terra, see this article.
Opening notebooks (playground mode)
If you try to open a notebook (in your Cloud Environment) while a collaborator in the same workspace opens the same notebook (in their own Cloud Environment), Terra will only allow you to open in "Playground" mode. While in playground mode, you can run cells, but cannot save the modified notebook. Any outputs generated while in playground mode are also not saved.