Terra's Jupyter Notebooks environment Part I: Key components

Allie Hajian

Terra provides infrastructure for running interactive analyses with Jupyter Notebooks. These are files that contain analysis code and embedded documentation. Read on for a deeper understanding of key components in Terra's notebook environment.

See Terra's Jupyter Notebooks environment Part II: Key operations for a companion article about key operations in Terra's Notebooks environment. It explains how those operations affect your work, including how much flexibility you have to customize the environment, how you access data, and how to save your analysis results.

Content for this article was contributed by Matt Bookman from Verily Life Sciences based on work done in Terra for AMP PD, a public/private partnership collaborating toward biomarker discovery to advance the development of Parkinson’s Disease therapies.

Where notebook files and their output live in Terra

This section defines terms and explains where notebook files and their output live, both while you are working with the files and while you are not.

Diagram of the notebook VM, Docker container, and Persistent Disk storage inside the workspace Google project. Also in the project are the Cloud Storage bucket, which contains the notebook .ipynb files, and three Compute Engine VMs that are part of a Dataproc cluster running workflows. The web browser and notebook service are linked to but separate from the workspace Google project.

Jupyter kernel

The kernel is the computer program that runs while you have a Jupyter Notebook open. The kernel process maintains the runtime state of the Jupyter Notebook.

Terra supports R and Python kernels. When cells in the notebook are executed, they are interpreted by this language-specific kernel. Note: Terra generally selects the kernel automatically, although checking is always a good idea.
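For a Python kernel, one quick way to check which interpreter is interpreting your cells is to ask the running process directly. The snippet below is a minimal sketch that uses only the Python standard library; the helper name describe_kernel is hypothetical and is not part of Terra.

```python
import platform
import sys


def describe_kernel():
    """Return basic facts about the Python interpreter running this cell."""
    return {
        "implementation": platform.python_implementation(),  # e.g. CPython
        "version": platform.python_version(),
        "executable": sys.executable,  # path to the interpreter binary
    }


print(describe_kernel())
```

If the reported version or executable is not what you expect, switch kernels from the notebook's kernel menu before running the rest of the analysis.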

Notebook Service (aka Leonardo)

The Notebook Service manages the cloud environment (Compute Engine instance) you use to edit and run your notebook. In Terra, the notebook service is called "Leonardo," and the two terms are often used interchangeably.

Cloud Environment (aka "Cluster"; aka "Compute Engine VM")

When you interact with your notebook in a web browser on your computer, the characters you type and the code you execute are all sent to the Jupyter kernel process running on a Google Compute Engine virtual machine (VM) or Cloud Environment. Much of the discussion in this document involves understanding the Cloud Environment as a host for your notebooks.

In the rest of this article, your Cloud Environment refers to the Compute Engine VM running your notebooks and the associated VM boot disk and Persistent Disk. 

When you create your Cloud Environment, you create a single VM by default. However, the Terra environment supports more powerful clusters of VMs using Google Cloud Dataproc. The use of a VM cluster is an advanced topic that is outside this document's scope.

VM boot disk

A virtual machine has a disk (the boot disk) for storing data files, the operating system, and other software. Information on the boot disk is lost if you delete or update the Cloud Environment, since the boot disk is deleted as well. Terra's Persistent Disk, however, is not deleted along with the Cloud Environment unless you explicitly request it.

Detachable Persistent Disk

When you create a Cloud Environment, Terra automatically creates a Persistent Disk to store libraries and packages, input files, and generated outputs. The contents of this disk are kept even if you delete the Cloud Environment: the Persistent Disk is automatically detached from the VM before the VM is deleted, and it is reattached to a newly created VM.

Accessing and saving notebook analysis output

Save any files you want to keep to your mounted Persistent Disk at /home/jupyter or /home/jupyter-user, depending on the age of the Persistent Disk. To determine the name of the mount point, run !echo $HOME from within your notebook.
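In a Python kernel, the same lookup can be done without a shell command. This is a minimal sketch that assumes, as described above, that the HOME environment variable points at the Persistent Disk mount point; the helper name persistent_disk_home is hypothetical.

```python
import os
from pathlib import Path


def persistent_disk_home():
    """Return the directory named by $HOME, falling back to the user's home."""
    home = os.environ.get("HOME")
    return Path(home) if home else Path.home()


print(persistent_disk_home())
```

Building output paths from this value, rather than hard-coding /home/jupyter or /home/jupyter-user, keeps a notebook working on both older and newer Persistent Disks.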

Note: Generated files are stored in this directory by default. Actively check that files are saved there before deleting your Cloud Environment (it's unnecessary to check when you pause and resume).
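One way to do that check from a Python kernel is to list what is actually under the directory, with sizes and modification times. The function below is a sketch using only the standard library; the name list_saved_files is hypothetical, and it assumes that $HOME is the Persistent Disk mount point.

```python
import os
from datetime import datetime
from pathlib import Path


def list_saved_files(root=None):
    """Return (relative_path, size_in_bytes, modified_time) for each file under root.

    When root is omitted, $HOME is used (assumed to be the Persistent Disk mount).
    """
    root = Path(root) if root is not None else Path(os.environ.get("HOME", "."))
    entries = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            stat = path.stat()
            modified = datetime.fromtimestamp(stat.st_mtime)
            entries.append((str(path.relative_to(root)), stat.st_size, modified))
    return entries
```

Calling list_saved_files() with no argument walks the whole home directory, which can be slow if many packages are installed there; passing a specific output folder narrows the listing to the files you care about.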

To learn how and when to copy data from a notebook to your workspace bucket, see Copying notebook output to a Google bucket.

Docker containers

Docker is a (branded) container technology for packaging software for rapid deployment and reproducibility. Like a sandboxed virtual machine, Docker containers exist wholly inside the Compute Engine Virtual Machine. Docker images include all the software and tools needed for an analysis and can be quickly deployed on a Cloud Environment. To learn more about using Docker in Terra, including custom Docker images, see Docker tutorial: Custom cloud environments for Jupyter notebooks.

Workspace storage (Google Bucket)

Every Terra Workspace has an associated Google Cloud Storage bucket for storing notebooks and other files long-term. In "edit" mode, notebook files (i.e., only ".ipynb" files) are automatically saved to your workspace bucket. You can save other generated files to your workspace bucket manually. See this article on Copying notebook output to a Google bucket for more details on how to do this.
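As a rough illustration of a manual copy from a Python kernel, the sketch below builds a gsutil cp command and only runs it when asked. Several things here are assumptions rather than details from this article: that the gsutil command-line tool is available in the Cloud Environment, that the bucket address is exposed in an environment variable named WORKSPACE_BUCKET, and the destination folder name notebook-output, which is purely illustrative. The helper names are hypothetical.

```python
import os
import subprocess


def build_copy_command(local_path, bucket, dest_prefix="notebook-output"):
    """Build a `gsutil cp` command that copies one local file into the bucket."""
    bucket = bucket.rstrip("/")
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    destination = f"{bucket}/{dest_prefix}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_bucket(local_path, bucket=None, dry_run=True):
    """Return the copy command; execute it only when dry_run is False."""
    # Assumption: WORKSPACE_BUCKET holds the workspace bucket address.
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise ValueError("No bucket given and WORKSPACE_BUCKET is not set")
    command = build_copy_command(local_path, bucket)
    if not dry_run:
        subprocess.run(command, check=True)  # raises if gsutil reports an error
    return command
```

Keeping dry_run=True by default lets you inspect the exact command before anything is written to the bucket. The linked article remains the authoritative description of the recommended copy procedure.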
