Terra's Jupyter Notebooks environment Part II: Key operations

Allie Hajian

Since Terra uses a standard Jupyter Notebooks server implementation, the interface and core capabilities are more or less what you would see in any other setting. You can take advantage of the wealth of documentation and tutorials available on the Internet. 

This article walks through the key notebook operations and highlights the one thing that's truly different about how Jupyter Notebooks work in Terra versus a typical local installation: the way the computing environment is set up.

For more details about key components of a Jupyter notebook, see Part I here.

Content for this article was contributed by Matt Bookman from Verily Life Sciences based on work done in Terra for AMP PD, a public/private partnership collaborating toward biomarker discovery to advance the development of Parkinson’s Disease therapies.

Creating a notebook

When you create a notebook, the Notebook Service creates a (.ipynb) file and saves it in your workspace bucket:

O8a_May31_2019.png

Opening a notebook

When you open a notebook, the Notebook Service executes several steps in order:

  1. Create a Cloud Environment (also called a cluster or notebook virtual machine (VM)) if one doesn't already exist in the workspace. This includes a Boot Disk and a Detachable Persistent Disk (PD)
  2. Start a Docker container
  3. Localize the notebook to your VM or cluster
  4. Open the notebook in a Jupyter kernel

1. Create a Cloud Environment

If a Cloud Environment (the notebook VM) does not already exist in the workspace, the Notebook Service will create one, along with an associated boot disk that persists until you delete the environment. The environment also includes a Detachable Persistent Disk, which can be reattached to a new Cloud Environment after you delete the existing one.

Note on billing when running a notebook: Billing for your Cloud Environment begins when your Cloud Environment is created and continues until you pause or delete it, regardless of whether the VM is doing any calculations. Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) on the same Cloud Environment.

Note on billing for Detachable Persistent Disks: When you delete your Cloud Environment, you can choose to keep your Detachable Persistent Disk. If you do, you will incur a charge of $2.00/month (50 GB disk). 
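The disk charge quoted above works out to $0.04 per GB per month. As a rough sketch, assuming the charge scales linearly with disk size (the article only states the 50 GB figure, so the linear rate and the function name below are assumptions, not published Terra pricing), you could estimate the cost of keeping a larger disk like this:

```python
# Figures taken from the note above: $2.00/month for a 50 GB disk.
# Assumption: the charge scales linearly with disk size.
QUOTED_MONTHLY_COST_USD = 2.00
QUOTED_DISK_SIZE_GB = 50


def estimate_pd_monthly_cost(size_gb: float) -> float:
    """Estimate the monthly cost (USD) of keeping a persistent disk of size_gb."""
    if size_gb < 0:
        raise ValueError("size_gb must be non-negative")
    return size_gb * QUOTED_MONTHLY_COST_USD / QUOTED_DISK_SIZE_GB


if __name__ == "__main__":
    for size in (50, 100, 500):
        print(f"{size} GB: ${estimate_pd_monthly_cost(size):.2f}/month")
```

Treat the result as an estimate only; check current Google Cloud pricing before relying on it.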

Note on region/zone of notebook VMs: Notebook VMs are created in one of the us-central1 zones.

Note: Your Cloud Environment on Terra is yours and yours alone. No one else can view or access your notebook (a billing project owner can delete it, but not open it). The reason for this is security. We store your Google credentials on the Google VM, which cannot be shared with other users:

O8b_May31_2019.png

2. Start Docker container

After setting up the application compute, the Notebook Service starts a Docker container with all the core software that your notebook will run. Because it is a Docker container, not a true VM, you can't do certain things (such as run a Docker container within the container). Items you create inside your notebook's Docker container exist as long as you don't delete the detachable Persistent Disk (or the Cloud Environment, if you don't have a detachable Persistent Disk). Inside the Docker container, your user ID is jupyter (or jupyter-user for older images), and your notebook files on the detachable Persistent Disk are created in one of the following directories, depending on whether you open the notebook in "edit" or "playground" mode. See "Edit" and "playground" notebook modes for more information.

/home/jupyter/WORKSPACE_NAME/edit/NOTEBOOK_NAME.ipynb
or
/home/jupyter/WORKSPACE_NAME/safe/NOTEBOOK_NAME.ipynb

Jupyter-Notebook-Location_2022-02-11_-_Page_1.png
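If you need to refer to a notebook file from code or from the terminal, the path pattern above can be built programmatically. The following is a minimal sketch; the function name and the mode_dir parameter are hypothetical, and the home directory assumes the jupyter user (older images that use jupyter-user may have a different home directory):

```python
from pathlib import PurePosixPath

# Assumption: home directory of the "jupyter" user inside the container.
JUPYTER_HOME = PurePosixPath("/home/jupyter")


def notebook_path(workspace_name: str, notebook_name: str, mode_dir: str = "edit") -> str:
    """Build the in-container path of a localized notebook file.

    mode_dir is the mode-specific subdirectory shown in the paths above.
    """
    return str(JUPYTER_HOME / workspace_name / mode_dir / f"{notebook_name}.ipynb")


if __name__ == "__main__":
    print(notebook_path("my-workspace", "analysis"))
```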

3. Localize notebook

A Jupyter extension managed by the Notebook Service copies your notebook from the workspace bucket to the Persistent Disk attached to your Cloud Environment (Notebook VM).

Jupyter-Notebook-Location_2022-02-11.png

Note: The notebook file is copied to a workspace-specific directory in the HOME directory of the Jupyter user inside the Docker container. 

4. Open notebook in Jupyter kernel

The Notebook Server loads your notebook file from the persistent disk and starts the Jupyter kernel.

Opening-Notebook.png

Saving a notebook

If you open a notebook in "Edit" mode, Terra autosaves every five seconds. When you (or Terra) save a notebook, the current copy (in the Jupyter kernel) is first saved to the notebook file on your Persistent Disk:

Save-to-PD.png

Then, the file on disk is delocalized (copied) to your Workspace bucket by a Jupyter extension. 

PD-to-WS-Bucket.png

Editing a notebook (i.e., ipynb file)

When you edit a notebook, the changes are initially only reflected in the Jupyter kernel. Changes are not saved to disk or copied to Cloud Storage unless 1) you are in "Edit" mode and 2) you explicitly save them or the Jupyter autosave process kicks in. For Jupyter on Terra, the autosave frequency is every 5 seconds.

While there are unsaved changes, Jupyter displays a notification:

S20a_May31_2019.png

When Terra autosaves changes, you will see this notification:

S20b_May31_2019.png

Running notebook code

When you run a code cell in a notebook, the cell execution creates output associated with that code. The output is only stored in the Jupyter kernel runtime until you save the notebook. If you have a detachable Persistent Disk, the output is saved there. 

Outputs generated by an analysis in a Jupyter notebook are not copied to Cloud Storage until you explicitly save them.

For step-by-step instructions on saving outputs to the Workspace bucket, see Copying notebook output to a Google bucket.
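As an illustration of what an explicit copy might look like from a notebook cell, here is a hedged sketch. It assumes that the workspace bucket URL is available in an environment variable named WORKSPACE_BUCKET and that the gsutil command-line tool is installed in the container; neither is stated in this article. The "notebook-outputs" folder and both function names are hypothetical.

```python
import os
import subprocess


def build_copy_command(local_file: str, bucket: str, prefix: str = "notebook-outputs") -> list:
    """Build a gsutil command that copies local_file into bucket/prefix/."""
    destination = f"{bucket.rstrip('/')}/{prefix}/{os.path.basename(local_file)}"
    return ["gsutil", "cp", local_file, destination]


def copy_to_workspace_bucket(local_file: str) -> None:
    """Copy one output file from the Persistent Disk to the workspace bucket."""
    # Assumption: WORKSPACE_BUCKET holds a URL such as gs://my-bucket.
    bucket = os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError("WORKSPACE_BUCKET is not set in this environment")
    subprocess.run(build_copy_command(local_file, bucket), check=True)
```

In a notebook cell you would call copy_to_workspace_bucket("results.csv") after the file has been written to disk. The linked article remains the authoritative set of instructions.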

Installing software and dependencies

You may need to install libraries or tools on your notebook VM to extend the basic functionality of the kernel. Both "pip install" and "install.packages()" place packages in the $HOME/packages/ directory on the detachable persistent disk, as seen from inside the Docker container. Installing libraries may take a long time the first time, but because they are installed on the detachable PD, they will be available automatically when you rerun a notebook (i.e., resume - or even re-create - the Cloud Environment).
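Because installed packages persist on the disk, a notebook can check whether a library is already present before installing it, which avoids repeating a slow install on every run. The sketch below shows one way to do that in Python; the helper names are hypothetical, and it simply calls pip for the current interpreter rather than anything Terra-specific:

```python
import importlib
import importlib.util
import subprocess
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found by the current interpreter."""
    return importlib.util.find_spec(module_name) is not None


def ensure_installed(module_name: str, pip_name: str = "") -> bool:
    """Install a package only if its module is missing.

    Returns True if an install was run, False if the module was already present.
    pip_name is the name on the package index when it differs from the module name.
    """
    if is_installed(module_name):
        return False
    subprocess.check_call([sys.executable, "-m", "pip", "install", pip_name or module_name])
    # Make the freshly installed package visible to the import system.
    importlib.invalidate_caches()
    return True
```

For example, ensure_installed("yaml", pip_name="PyYAML") would install PyYAML only on the first run.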

Availability of software across workspaces and Terra Billing Projects

Each workspace has its own notebook VM and its available software. You need to install libraries and packages - either from a notebook or the workspace terminal - separately for each workspace. You don't have to reinstall libraries or packages for different notebooks in the same workspace.

Note: If you pause or delete the Cloud Environment in one notebook or terminal in a workspace, it will also affect every other notebook and terminal in the same workspace.

However, pausing or updating the Cloud Environment in one workspace will not affect any other workspace Cloud Environments.  

Note: This is different for workspaces created before September 26, 2021, when workspaces in the same Terra Billing project shared a Cloud Environment VM (per user).

Pausing a Cloud Environment

When you're done working and close the notebook, Terra tells Google Cloud to pause the Cloud Environment but save its state. The saved portion includes the state of the Jupyter Notebooks container, with any modifications you may have made - by installing packages, for example - and any files present on its local storage partition. You can resume working at any time with minimal effort: when you reopen the notebook, Terra restarts the VM and restores the notebook Cloud Environment to its saved state.

To save you from incurring additional costs, Terra automatically saves the notebook and pauses your notebook Cloud Environment after a period of inactivity.

Inactivity includes periods when your computer is asleep, unless the notebook kernel is still running operations. If your kernel is active, Terra will not pause the Cloud Environment (to prevent long-running jobs from aborting). Note: Autopause will resume after 24 hours, even if your kernel is active.
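The autopause rules described above can be summarized as a small decision function. This is a simplified model for illustration only, not Terra's implementation: the function and parameter names are hypothetical, and the idle threshold is passed in as an argument because this article does not state its value.

```python
# From the note above: an active kernel stops blocking autopause after 24 hours.
AUTOPAUSE_OVERRIDE_LIMIT_HOURS = 24


def should_autopause(
    idle_minutes: float,
    idle_threshold_minutes: float,
    kernel_active: bool,
    kernel_active_hours: float = 0.0,
) -> bool:
    """Decide whether the Cloud Environment would be paused under the rules above."""
    # An active kernel blocks autopause, but only for the first 24 hours.
    if kernel_active and kernel_active_hours < AUTOPAUSE_OVERRIDE_LIMIT_HOURS:
        return False
    # Otherwise pause once the inactivity period has elapsed.
    return idle_minutes >= idle_threshold_minutes
```

For long-running jobs, the practical takeaway is to plan for the environment pausing once the 24-hour limit is reached, for example by writing intermediate results to disk.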

You can explicitly pause your notebook by selecting the pause icon in the Cloud Environment widget (top right):

Terras-Jupyter-Notebook-environment-Part-2_Delete-cloud-environment_Screen_Shot.png

 

What is kept when you pause a Cloud Environment (boot disk, persistent disk, software)?

When a notebook Cloud Environment is paused, its Compute Engine VM goes away, but the boot disk and Persistent Disk remain. When you reopen your notebook, the notebook VM is created more quickly because the disks don't need to be re-created. You don't need to reinstall your software.

O8h_May31_2019.png

What is lost when you pause a Cloud Environment (notebook state)?

Any running Jupyter kernel processes are gone, so the notebook state is lost. This includes computed values and any other variables, including environment variables. To restore notebook state, open the notebook and rerun the relevant cells of the notebook.
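If rerunning cells is expensive, one option is to write important intermediate values to a file on disk before pausing, since files on the disk are kept while kernel memory is not. The sketch below uses Python's standard pickle module; the function names and the example file location are hypothetical, and it only works for objects that pickle can serialize. Environment variables still need to be set again after a restart.

```python
import pickle
from pathlib import Path


def save_checkpoint(path, variables: dict) -> None:
    """Write a dictionary of variables to disk so it outlives the kernel."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted save cannot
    # leave a corrupted checkpoint in place of a good one.
    tmp = path.with_suffix(path.suffix + ".tmp")
    with open(tmp, "wb") as fh:
        pickle.dump(variables, fh)
    tmp.replace(path)


def load_checkpoint(path) -> dict:
    """Read back a dictionary previously written by save_checkpoint."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

A typical use would be save_checkpoint("checkpoints/state.pkl", {"counts": counts}) at the end of a long cell, followed by load_checkpoint("checkpoints/state.pkl") after the environment resumes. Only load checkpoint files you created yourself, because unpickling untrusted data is unsafe.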

Deleting a Cloud Environment (also the boot disk)

If you don't need your notebook VM or cluster and want to save on the cost of the boot disk, or if you want to pick up a new feature that requires you to rebuild your notebook VM, you can delete the Cloud Environment by doing the following:

1. Click on the gear icon at the top right of the screen:

Terras-Jupyter-Notebook-environment-Part-2_Delete-cloud-environment_Screen_Shot.png

2. Select "Delete Environment Options" at the bottom of the form.

Terras-Jupyter-Notebook-environment-Part2_Select-delete-environment-options_Screen_shot.png

3. Choose whether to keep or delete the Persistent Disk and click the "Delete" button.

Terras-Jupyter-notebook-environment-Part-2_Delete-environment-options_Screen_shot.png


What is deleted along with the Cloud Environment?

      • Boot disk
      • Software that is part of the Cloud Environment (not installed on the PD)

What is deleted with the Persistent Disk?

      • Installed software on the Persistent Disk
      • Generated data not explicitly saved to the Workspace bucket or other external file

If you delete your persistent disk, you need to reinstall any additional libraries or tools you installed previously and re-create any generated data.

What is kept when deleting the Cloud Environment (saved notebook files and data)?

Your notebooks and any data explicitly saved to your Workspace bucket are still in long-term storage in the Workspace bucket as described in the Copy notebook output to the Workspace bucket article. 

Opening notebook (read-only)

What if you want to read a notebook, rather than edit or run it? Opening a notebook read-only does not require the creation of a VM. It's much faster than opening a notebook for editing.

1. In the Notebooks tab, click on the three vertical dots icon for your notebook

2. Select "Open read-only":

S22_May31_2019.png

A server-side process renders the notebook file from Cloud Storage and displays it in your browser:
O8i_May31_2019.png 

Collaboration in a shared Terra workspace

Your notebook VM is specific to you. Each individual user has a separate notebook cluster. Any work collaborators do in the notebook will not affect the state of your own Cloud Environment.

However, the Workspace bucket and the notebooks in the bucket are shared. The system will automatically save any changes collaborators make to the shared document in the workspace, so it's important to set expectations clearly with your collaborators about whether it's okay for them to modify the notebook or whether they should work in a separate copy.

Together, these two points mean that a collaborator can see your notebook only if you share your workspace with them.

Terra will "lock" the notebook document in the workspace whenever someone is actively working with it, to avoid having multiple people make conflicting changes at the same time. When this happens, your collaborator can open the notebook in the read-only preview mode, or they can open it in a special "playground" mode that allows them to make changes and run code in their own Cloud Environment, but does not save changes to the original notebook file. This falls a bit short of the ideal collaborative experience of Google Docs, for example, but it provides a reasonable compromise given the constraints at play. To learn more about "Edit" versus "Playground" modes in Terra, see "Edit" and "playground" notebook modes.

Opening notebooks (playground mode)

If you try to open a notebook (in your Cloud Environment) while a collaborator in the same workspace has the same notebook open (in their own Cloud Environment), Terra only allows you to open it in "Playground" mode. While in playground mode, you can run cells, but cannot save the modified notebook. Note: Any outputs generated during playground mode are not saved.


Comments

2 comments

  • Comment author
    Thon de Boer

    What happens if you navigate away from the notebook page with any long running kernels? I think it simply kills the kernel running the notebook, no? So, I need to keep the window open as long as the kernel is running?

  • Comment author
    Allie Hajian

    Thon de Boer If the kernel is active, the Autopause function gets overridden. So the expected behavior is that notebooks with running kernels will persist, even if the browser window is idle or closed. See https://support.terra.bio/hc/en-us/articles/360029761352 .
