Since Terra uses a standard Jupyter Notebook server implementation, the interface and core capabilities are largely the same as in any other setting, so you can take advantage of the wealth of documentation and tutorials available online. Here you'll learn about one thing that's truly different about how Jupyter Notebooks work in Terra compared with typical local installations: how the computing environment is set up.
For more details about the key components of a Jupyter Notebook, see Part I here.
Content for this article was contributed by Matt Bookman from Verily Life Sciences, based on work done in Terra for AMP PD, a public/private partnership collaborating toward biomarker discovery to advance the development of Parkinson’s Disease therapies.
Creating a notebook
When you create a notebook, the Notebook Service creates a notebook (.ipynb) file and saves it in your workspace storage (Google bucket).
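Under the hood, an .ipynb file is a JSON document in the Jupyter notebook format (nbformat). As a minimal sketch of what an empty notebook file contains, the Python snippet below writes a version 4 notebook with no cells to local disk. It does not call any Terra or Google Cloud service, and the function name create_empty_notebook is hypothetical.

```python
import json
from pathlib import Path


def create_empty_notebook(path):
    """Write a minimal nbformat v4 notebook (no cells) to path and return it."""
    notebook = {
        "cells": [],          # code and markdown cells go here
        "metadata": {},       # kernel and language info, once known
        "nbformat": 4,        # major version of the notebook format
        "nbformat_minor": 5,  # minor version of the notebook format
    }
    Path(path).write_text(json.dumps(notebook, indent=1))
    return notebook
```

For example, create_empty_notebook("analysis.ipynb") would write the file to the current directory; in Terra, the Notebook Service handles creation and storage in the bucket for you.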
Opening a notebook
When you open a notebook, the Notebook Service executes several steps in order.
- Create a Cloud Environment (also called a cluster or notebook virtual machine, VM) if one doesn't already exist for your Cloud project. This includes a boot disk and a detachable Persistent Disk (PD).
- Start a Docker container
- Localize the notebook to your VM or cluster
- Open the notebook in a Jupyter kernel
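Only the first step is conditional; the others run every time you open a notebook. The sketch below models that sequence in plain Python. The step labels are descriptive strings, not calls to any Terra API, and the function name open_notebook_steps is hypothetical.

```python
def open_notebook_steps(environment_exists):
    """Return the ordered steps for opening a notebook (simplified model)."""
    steps = []
    if not environment_exists:
        # Step 1 runs only when there is no Cloud Environment yet.
        steps.append("create Cloud Environment (boot disk + detachable PD)")
    steps.append("start Docker container")
    steps.append("localize notebook to the Persistent Disk")
    steps.append("open notebook in a Jupyter kernel")
    return steps


print(open_notebook_steps(environment_exists=False))  # four steps
print(open_notebook_steps(environment_exists=True))   # three steps
```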
1. Create a Cloud Environment
If a Cloud Environment (the notebook VM) does not already exist in the workspace, the Notebook Service creates one, along with an associated boot disk that persists until you delete the environment. It also creates a detachable Persistent Disk that can be reattached to a new Cloud Environment after you delete the existing one.
Billing continues as long as the notebook is running
Billing for your Cloud Environment begins when your Cloud Environment is created and continues until you pause or delete it, regardless of whether the VM is doing any calculations. Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) in the same Cloud Environment.
Billing for Detachable Persistent Disks
When you delete your Cloud Environment, you can choose to keep your detachable Persistent Disk. If you do, you will incur a storage charge of about $2.00/month for a 50 GB disk.
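To estimate the cost of keeping a larger disk, you can scale the figure above. This rough sketch assumes the charge grows linearly with disk size at the implied rate of $0.04 per GB per month ($2.00 / 50 GB). Actual Google Cloud disk pricing varies by region and disk type, so check current pricing before relying on these numbers.

```python
# Rate implied by the figure above: $2.00 per month for a 50 GB disk.
# Assumption: cost scales linearly with size; real pricing varies by region/disk type.
RATE_PER_GB_MONTH = 2.00 / 50  # 0.04 USD per GB per month


def monthly_disk_cost(size_gb, months=1.0):
    """Estimate the storage charge (USD) for keeping a detached Persistent Disk."""
    return round(size_gb * RATE_PER_GB_MONTH * months, 2)


print(monthly_disk_cost(50))      # 2.0
print(monthly_disk_cost(200))     # 8.0
print(monthly_disk_cost(50, 6))   # 12.0
```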
Note on region/zone of notebook VMs
Notebook VMs are created in one of the
Your Cloud Environment on Terra is yours and yours alone
No one else can view or access your Cloud Environment (a billing project owner can delete it but not open it). The reason is security: we store your Google credentials on the VM, so it cannot be shared with other users.
2. Start the Docker container
After setting up the Cloud Environment, the Notebook Service starts a Docker container with all the core software your notebook will run. Because it is a Docker container, not a full VM, you can't do certain things (such as running Docker inside the container). Files you create inside your notebook's container persist as long as you don't delete the detachable Persistent Disk (or the Cloud Environment, if you don't have a detachable Persistent Disk). Inside the Docker container, you work as the Jupyter user (named jupyter-user in older images), and your notebook files on the detachable Persistent Disk are created in a directory that depends on whether you open the notebook in "edit" or "playground" mode. See "Edit" and "Playground" notebook modes for more information.
3. Localize notebook
A Jupyter extension managed by the Notebook Service copies your notebook from the workspace bucket to the Persistent Disk attached to your Cloud Environment (Notebook VM).
Note: The notebook file is copied to a workspace-specific directory in the HOME directory of the Jupyter user inside the Docker container.
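Localization is essentially a file copy. In the minimal sketch below, an ordinary local folder stands in for the workspace bucket (the real extension reads from Cloud Storage). The HOME/<workspace name>/ layout and the function name localize are assumptions for illustration; the actual directory also depends on the notebook mode.

```python
import shutil
from pathlib import Path


def localize(bucket_dir, home_dir, workspace_name, notebook_name):
    """Copy a notebook from a simulated bucket folder into a workspace-specific
    directory under the Jupyter user's HOME, and return the local path."""
    target_dir = Path(home_dir) / workspace_name  # hypothetical directory layout
    target_dir.mkdir(parents=True, exist_ok=True)
    local_path = target_dir / notebook_name
    shutil.copyfile(Path(bucket_dir) / notebook_name, local_path)
    return local_path
```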
4. Open notebook in Jupyter kernel
The Notebook Server loads your notebook file from the persistent disk and starts the Jupyter kernel.
Saving a notebook
If you open a notebook in "Edit" mode, Terra autosaves every five seconds. When you (or Terra) save a notebook, the current copy (in the Jupyter kernel) is first saved to the notebook file on your Persistent Disk.
Then, the file on disk is delocalized (copied) to your Workspace bucket by a Jupyter extension.
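The two-step save can be sketched the same way. Here a local folder again stands in for the workspace bucket, and the function name save_and_delocalize is hypothetical; in Terra, this work is done by the Jupyter extension, not by code you write.

```python
import json
import shutil
from pathlib import Path


def save_and_delocalize(notebook, local_path, bucket_dir):
    """Save notebook contents to the PD file, then copy that file to the bucket."""
    local_path = Path(local_path)
    # Step 1: save the current notebook contents to the file on the Persistent Disk.
    local_path.write_text(json.dumps(notebook, indent=1))
    # Step 2: delocalize, i.e. copy the saved file to the (simulated) workspace bucket.
    bucket_path = Path(bucket_dir) / local_path.name
    shutil.copyfile(local_path, bucket_path)
    return bucket_path
```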
Editing a notebook (i.e., ipynb file)
When you edit a notebook, the changes are initially reflected only in the Jupyter kernel. Changes are saved to disk and copied to Cloud Storage only if you are in "Edit" mode and either you explicitly save them or the Jupyter autosave process runs. For Jupyter on Terra, autosave runs every 5 seconds.
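The conditions for a change to persist can be written as a small predicate. This is a simplified model, not Terra's implementation: it treats any edit older than the 5-second autosave interval as already autosaved, and it assumes Playground mode never writes back, as described in the section on Playground mode.

```python
AUTOSAVE_INTERVAL_SECONDS = 5  # Terra's autosave frequency, per the text above


def will_be_saved(mode, explicitly_saved, seconds_since_last_edit):
    """Return True if an edit reaches the Persistent Disk and the bucket (simplified)."""
    if mode != "edit":
        return False  # Playground mode does not save to the original notebook file
    return explicitly_saved or seconds_since_last_edit >= AUTOSAVE_INTERVAL_SECONDS
```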
While there are unsaved changes, Jupyter displays a notification.
When Terra autosaves changes, you will see a notification.
Running notebook code
When you run a code cell in a notebook, the cell execution produces output associated with that code. The output is stored only in the Jupyter kernel runtime until you save the notebook. When you save, the output is written to the notebook file on your detachable Persistent Disk, if you have one.
Analysis outputs generated in a Jupyter Notebook are not copied to Workspace Cloud Storage (Google Bucket) until you explicitly save them.
For step-by-step instructions on saving outputs to the Workspace bucket, see Copying notebook output to a Google bucket.
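As a rough sketch of copying a single output file yourself, the function below shells out to gsutil cp. It assumes that gsutil is installed in the notebook environment and that the WORKSPACE_BUCKET environment variable holds your workspace bucket URL, which Terra notebook environments typically provide; the outputs/ destination folder and the function name are hypothetical choices. The run parameter lets you substitute a dry-run function for testing.

```python
import os
import subprocess


def copy_to_workspace_bucket(local_file, dest_subdir="outputs", run=subprocess.run):
    """Copy a local output file to the workspace bucket using gsutil cp.

    Assumes WORKSPACE_BUCKET holds a gs:// URL for the workspace bucket.
    """
    bucket = os.environ["WORKSPACE_BUCKET"].rstrip("/")
    destination = f"{bucket}/{dest_subdir}/{os.path.basename(local_file)}"
    run(["gsutil", "cp", local_file, destination], check=True)
    return destination
```

For example, copy_to_workspace_bucket("results.csv") would copy the file to an outputs/ folder in the bucket, where it survives deletion of the Persistent Disk.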
Installing software and dependencies
You may need to install libraries or tools on your notebook VM to extend the basic functionality of the kernel. Both pip install and install.packages() place packages in the $HOME/packages/ directory of the detachable Persistent Disk in the Docker container. Installing libraries may take a long time the first time. However, because they are installed on the detachable PD, they will be available automatically when you rerun a notebook and when you recreate the Cloud Environment (assuming you keep the PD).
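Because installed packages persist on the PD, a notebook can skip reinstalling them on later runs. The sketch below installs a package with pip only if it cannot already be imported. It runs pip with the kernel's own Python interpreter and does not set an install location itself; the helper name ensure_installed is hypothetical, and the run parameter exists only so the pip call can be replaced in testing.

```python
import importlib.util
import subprocess
import sys


def ensure_installed(module_name, pip_name=None, run=subprocess.check_call):
    """Install a package with pip only if its top-level module cannot be found.

    Returns True if an install was attempted, False if the module was present.
    """
    if importlib.util.find_spec(module_name) is not None:
        return False  # already available, e.g. from an earlier install on the PD
    run([sys.executable, "-m", "pip", "install", pip_name or module_name])
    return True
```

For packages whose import name differs from the pip name, pass pip_name explicitly; for example, ensure_installed("yaml", pip_name="PyYAML") installs PyYAML only on the first run.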
Availability of software across workspaces and Terra Billing Projects
Each workspace has its own notebook VM and its own installed software. You need to install libraries and packages (either from a notebook or the workspace terminal) separately for each workspace. You don't have to reinstall libraries or packages for different notebooks in the same workspace.
Note: If you pause or delete the Cloud Environment in one notebook or terminal in a workspace, it will also affect every other notebook and terminal in the same workspace.
However, pausing or updating the Cloud Environment in one workspace will not affect any other workspace Cloud Environments.
Workspaces created before September 26, 2021
Note: This is different for workspaces created before September 26, 2021, when workspaces in the same Terra Billing project shared a Cloud Environment VM (per user).
Pausing a Cloud Environment
When you're done working and close the notebook, Terra tells Google Cloud to pause the Cloud Environment (VM) but save its state. The saved state includes the Jupyter Notebooks container, with any modifications you may have made (by installing packages, for example), and any files present on its local storage partition. You can resume working at any time with minimal effort: when you reopen the notebook, Terra restarts the VM and restores the notebook Cloud Environment to its saved state.
Saving money with autopause
To save you from incurring additional costs, Terra automatically saves the notebook and pauses your notebook Cloud Environment after a period of inactivity.
Inactivity includes periods when your computer is asleep, unless the notebook kernel is running operations. If your kernel is active, Terra will not pause the Cloud Environment (to prevent long-running jobs from aborting). Note: After 24 hours, autopause applies even if your kernel is still active.
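The autopause rule can be summarized as a small decision function. This is a simplified model rather than Terra's code: the text does not state the inactivity threshold, so idle_threshold_minutes is a hypothetical parameter, while the 24-hour limit comes from the note above.

```python
MAX_ACTIVE_KERNEL_HOURS = 24  # after this, autopause applies even to busy kernels


def should_autopause(idle_minutes, kernel_busy, kernel_busy_hours,
                     idle_threshold_minutes):
    """Simplified model of the autopause rule.

    idle_threshold_minutes is hypothetical; Terra's actual threshold is not given here.
    """
    if idle_minutes < idle_threshold_minutes:
        return False  # you are still considered active
    if kernel_busy and kernel_busy_hours < MAX_ACTIVE_KERNEL_HOURS:
        return False  # protect long-running jobs, for up to 24 hours
    return True
```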
You can explicitly pause your notebook by clicking the cloud icon in the right sidebar and then the pause icon under the Jupyter logo.
What is kept when you pause a Cloud Environment (boot disk, persistent disk, software)?
When a notebook Cloud Environment is paused, its Compute Engine VM stops running, but the boot disk and Persistent Disk are kept. When you reopen your notebook, the notebook VM starts more quickly because the disks don't need to be recreated, and you don't need to reinstall your software.
What is lost when you pause a Cloud Environment (notebook state)?
Any running Jupyter kernel processes are gone, so the notebook state is lost. This includes computed variables and any other in-memory state, such as environment variables set in the kernel. To restore the notebook state, open the notebook and rerun the relevant cells.
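One way to make restoring state easy is to keep setup code in a single cell that you rerun after every resume. The sketch below shows such a cell; the environment variable ANALYSIS_STAGE, the sample IDs, and the function name are hypothetical placeholders for whatever your analysis actually needs.

```python
import os


def restore_session_state():
    """Re-create kernel state after a pause (hypothetical names and values)."""
    # Environment variables set in the kernel are lost on pause, so set them again.
    os.environ["ANALYSIS_STAGE"] = "qc"  # hypothetical variable
    # Recompute in-memory variables instead of assuming they still exist.
    sample_ids = [f"sample_{i}" for i in range(3)]
    return {"sample_ids": sample_ids}


state = restore_session_state()
```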
How to delete a Cloud Environment (also the boot disk)
If you don't need your notebook VM or cluster and want to save on the cost of the boot disk, or if you're going to pick up a new feature that requires you to rebuild your notebook VM, you can delete the Cloud Environment by doing the following:
1. Click on the gear icon at the top right of the screen.
2. Select "Delete Environment Options" at the bottom of the form.
3. Choose whether to keep or delete the Persistent Disk and click the "Delete" button.
What is deleted along with the Cloud Environment?
- Boot disk
- Software that is part of the Cloud Environment (not installed on the PD)
What is deleted with the Persistent Disk?
- Installed software on the Persistent Disk
- Generated data not explicitly saved to the Workspace bucket or other external file
If you delete your persistent disk, you need to reinstall any additional libraries or tools you installed previously and re-create any generated data.
What is kept when deleting the Cloud Environment (saved notebook files and data)?
Your notebooks and any data explicitly saved to your Workspace storage are still in long-term storage in the Workspace Google bucket as described in the Copy notebook output to the Workspace bucket article.
How to open a read-only notebook
What if you want to read a notebook rather than edit or run it? Opening a notebook read-only does not require the creation of a VM. It's much faster than opening a notebook for editing.
1. In the Notebooks tab, click on the three vertical dots icon for your notebook.
2. Select Open read-only.
A server-side process will render the notebook file from Cloud Storage and display it in your browser.
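Rendering a notebook for reading requires only parsing its JSON; no kernel has to run. The sketch below illustrates that idea by turning notebook JSON into plain text. It is not Terra's renderer (which displays the notebook in your browser), and render_read_only is a hypothetical name.

```python
import json


def render_read_only(ipynb_text):
    """Return a plain-text view of a notebook without starting any kernel."""
    notebook = json.loads(ipynb_text)
    parts = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows source as a list of lines
            source = "".join(source)
        parts.append(f"[{cell.get('cell_type', '?')}]\n{source}")
    return "\n\n".join(parts)
```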
Collaboration in a shared Terra workspace
Your notebook VM is specific to you since each individual user has a separate notebook cluster. Any work collaborators do in the notebook will not affect the state of your own Cloud Environment.
However, the notebook files in the Workspace storage (Google Bucket) are shared. The system will automatically save any changes collaborators make to the shared document in the workspace, so it's important to set expectations clearly with your collaborators about whether it's okay for them to modify the notebook or whether they should work in a separate copy.
Because your Cloud Environment is private while notebook files live in the shared workspace storage, a collaborator can see your notebook only if you share the workspace with them.
To avoid having multiple people making conflicting changes simultaneously, Terra will "lock" the notebook document in the workspace whenever someone is actively working with it. When this happens, your collaborator can open the notebook in the read-only preview mode or in a special "Playground" mode that allows them to make changes and run code in their own Cloud Environment but does not save changes to the original notebook file. This falls a bit short of the ideal collaborative experience of Google Docs, for example, but provides a reasonable compromise given the constraints at play. To learn more about "Edit" versus "Playground" modes in Terra, see "Edit" and "Playground" notebook modes.
Opening notebooks (playground mode)
If you try to open a notebook (in your Cloud Environment) while a collaborator in the same workspace has the same notebook open (in their own Cloud Environment), Terra only allows you to open it in "Playground" mode. In Playground mode, you can run cells but not save the modified notebook. Note: Any outputs generated in Playground mode are not saved.
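The locking behavior can be sketched as a choice of mode. This is a simplified model under stated assumptions: lock_holder is a hypothetical value naming the user who currently has the notebook open for editing (or None), and the read-only preview option is not modeled.

```python
def choose_open_mode(user, lock_holder):
    """Pick the mode offered when opening a shared notebook (simplified model)."""
    if lock_holder is None or lock_holder == user:
        return "edit"        # changes can be saved back to the workspace bucket
    return "playground"      # run cells in your own environment, without saving


def can_save(mode):
    """Only Edit mode writes changes back to the original notebook file."""
    return mode == "edit"
```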