Terra's Jupyter Notebooks environment Part I: Key components
Terra provides infrastructure for running interactive analyses with Jupyter Notebooks, which are files that contain analysis code and embedded documentation. This article aims to help you run interactive analyses more effectively by giving you a deeper understanding of the key components (e.g. Billing Projects) of Terra's notebooks environment.
A second article addresses key operations and how they impact your work, for example the amount of flexibility you have to customize the environment, how you will access data, and how to then save your analysis results.
Content for this article was contributed by Matt Bookman from Verily Life Sciences based on work done in Terra for AMP PD, a public/private partnership collaborating toward biomarker discovery to advance the development of Parkinson's disease therapies.
Contents
Glossary of Key Notebook Components
This section defines terms, and explains where notebook files and their output live -- both when you are working with the files and when you are not.
- Jupyter kernel
- Notebook service (Leonardo)
- Notebook Cloud Environment (aka "Cluster" aka "VM")
- Cloud Environment cluster boot disk
- Detachable Persistent Disk
- Docker containers
- Cloud Storage
Jupyter kernel
The kernel is the computer program that runs while you have a Jupyter notebook open. The kernel process maintains the runtime state of the Jupyter notebook.
Terra supports R and Python kernels. When cells in the notebook are executed, they are interpreted by this language-specific kernel. Note that Terra generally selects the kernel automatically, though it is always a good idea to check.
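One quick way to check which kernel is interpreting your cells is to ask the kernel itself. The following is a minimal sketch for Python kernels only. The helper name `describe_kernel` is hypothetical and is not part of Terra or Jupyter; it relies only on the Python standard library.

```python
import platform
import sys


def describe_kernel():
    """Return basic facts about the interpreter running this cell."""
    return {
        "language": "python",
        "version": platform.python_version(),
        "executable": sys.executable,
    }


# Run this in a notebook cell to see what is executing your code.
info = describe_kernel()
print(f"Kernel language: {info['language']} {info['version']}")
print(f"Interpreter path: {info['executable']}")
```

If the cell fails with a syntax error, the notebook is probably attached to a different kernel (for example, R), which is itself a useful signal.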
Unless you include a Persistent Disk in your Cloud Environment, the output is stored in the kernel only and will be deleted if you stop the kernel before saving. To learn more about the Persistent Disks option, see this article. To learn more about how to copy data generated in an interactive analysis to permanent storage, see this article.
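To illustrate the difference between a value held in the kernel and a value saved to a file, here is a minimal Python sketch. The helper `save_results`, the file name, and the example values are all hypothetical. The sketch writes to a temporary directory so that it runs anywhere; in practice you would pass a directory of your choosing, and the file only outlives the Cloud Environment if that directory is on a Persistent Disk or the file is later copied to permanent storage.

```python
import json
import tempfile
from pathlib import Path


def save_results(results, output_dir, filename="results.json"):
    """Write in-memory results to a JSON file and return the file path."""
    output_path = Path(output_dir) / filename
    output_path.parent.mkdir(parents=True, exist_ok=True)
    with open(output_path, "w") as handle:
        json.dump(results, handle, indent=2)
    return output_path


# Hypothetical values that exist only in the kernel's memory until saved.
results = {"sample_count": 3, "mean_depth": 31.5}

# A temporary directory stands in for your real output directory here.
with tempfile.TemporaryDirectory() as scratch_dir:
    saved_path = save_results(results, scratch_dir)
    print(f"Wrote {saved_path.name}")
```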
Notebook Service (aka Leonardo)
The Notebook Service manages the compute environment and engine you use to edit and run your notebook. In Terra, the notebook service is called "Leonardo," and the two terms are often used interchangeably.
Notebook Cloud Environment (aka "Cluster"; aka "Compute Engine VM")
When you interact with your notebook in a web browser on your own computer, the characters you type and the code you execute are all sent to the Jupyter kernel process running on a Google Compute Engine virtual machine (VM), also called a Cloud Environment. Much of the discussion in this document involves understanding the Cloud Environment as a host for your notebooks.
In the rest of this article, your "Cloud Environment" refers to the Compute Engine VM that hosts your notebooks.
When you create your Cloud Environment, by default you create a single VM. However, the Terra environment supports more powerful clusters of VMs using Google Cloud Dataproc. Use of a VM cluster is an advanced topic that is outside of the scope of this document.
Cluster boot disk
A virtual machine needs a disk to store the operating system, other software, and data files. Google Compute Engine's block storage is named Persistent Disk. It is called "persistent" because the disk itself can persist even when the VM to which it is attached is stopped or paused. However, information on the boot disk is lost if you delete or update the Cloud Environment. For this reason, we try to avoid the (somewhat misleading) term "persistent disk" when referring to the boot disk.
Detachable Persistent Disk
When you create a Cloud Environment, Terra also sets aside a portion of the Cloud Environment's disk storage as a Detachable Persistent Disk. This storage can be detached from a VM prior to its deletion, so that it can persist and be reattached to a newly created VM. The Detachable Persistent Disk option lets you keep the packages and input files your analysis needs, along with generated outputs, even if you delete the Cloud Environment.
Docker containers
Docker is a (branded) container technology for packaging software for rapid deployment and reproducibility. Like a sandboxed virtual machine, a Docker container exists wholly inside the Compute Engine VM. Docker containers include all the software and tools needed for an analysis and can be quickly deployed on a Cloud Environment. To learn more about using Docker containers, including custom ones, in Terra, see this article.
Cloud Storage
Every Terra Workspace has an associated Google Cloud Storage bucket for long-term storage of notebooks and other files. Notebook files (i.e. only ".ipynb" files) are automatically saved to your workspace bucket (see section on Saving Notebooks below). You can save other generated files to your workspace bucket manually (see this article for more details on how to do this).
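As a rough illustration of saving a generated file to the workspace bucket manually, the sketch below builds a `gsutil cp` command from Python. Several things here are assumptions rather than facts from this article: that the bucket URL is available in an environment variable named `WORKSPACE_BUCKET`, that `gsutil` is installed in your Cloud Environment, and the `notebook-outputs` destination folder, which is purely illustrative. The helper names are hypothetical. By default the sketch only prints the command instead of running it.

```python
import os
import subprocess


def build_copy_command(local_path, bucket_url, prefix="notebook-outputs"):
    """Build a gsutil command that copies one local file into a bucket."""
    filename = os.path.basename(local_path)
    destination = f"{bucket_url.rstrip('/')}/{prefix}/{filename}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_bucket(local_path, bucket_url, dry_run=True):
    """Print the copy command; run it only when dry_run is False."""
    command = build_copy_command(local_path, bucket_url)
    print("Command:", " ".join(command))
    if not dry_run:
        # Requires gsutil on the PATH and write access to the bucket.
        subprocess.run(command, check=True)
    return command


# ASSUMPTION: the environment exposes the bucket URL as WORKSPACE_BUCKET.
bucket_url = os.environ.get("WORKSPACE_BUCKET")
if bucket_url:
    copy_to_bucket("results.json", bucket_url)
else:
    print("WORKSPACE_BUCKET is not set; pass the bucket URL explicitly.")
```

Once you have confirmed that the printed command points at the right bucket and file, call `copy_to_bucket(..., dry_run=False)` to perform the copy.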