Cloud environment (Persistent Disk) storage

Allie Cliffe
  • Updated

If you're interested in using Terra on Azure, please email terra-enterprise@broadinstitute.org.

Terra attaches a persistent disk (PD) to your JupyterLab Cloud Environment VM. Use it to save generated data, installed libraries, and any other files you want to retain even if you delete or update the Cloud Environment VM.

PDs also act as a safeguard to protect your data in case something goes wrong with the VM. If you need to delete or recreate your Cloud Environment, Terra reattaches the PD when you create the next Cloud Environment.

You choose the PD size when you create the Azure Cloud Environment. A small monthly cost is associated with maintaining the disk (see Azure disk pricing). You pay this cost for as long as the disk exists, even when the Cloud Environment is paused or deleted.

Terra data storage overview

Two ways to store large data files in Terra

  • The workspace blob storage container
  • The Cloud Environment Persistent Disk (PD)

The workspace blob storage container is your workspace's hard disk

It is automatically created when you create the workspace and remains until the workspace is deleted. You pay only for the amount of data stored in the blob container.

The cloud environment PD is like a USB drive

It is automatically created when you launch a Cloud Environment VM. You can choose to keep it after deleting your Cloud Environment VM. If you keep it, Terra will automatically attach that same PD to the same mount point when you create a new Cloud Environment. You pay for the disk size (in GB) that you specify when you create your Cloud Environment.

Colleagues in a shared workspace cannot access your PD

Since the PD is unique to your Cloud Environment, and your Cloud Environment is unique to you, you will need to copy data to workspace storage to share it with colleagues (even those working in a shared workspace) or to use it as input for a workflow. See How to move data from an interactive analysis to workspace storage.

Persistent disks save time and reduce error

The PD lets you keep the packages your notebook code is built upon, input files necessary for your analysis, and outputs you’ve generated in your Cloud Environment (JupyterLab analysis).

You don't have to reinstall large packages or reload input files. These, along with generated data and other files stored on the persistent disk, remain available even if you delete or recreate the Cloud Environment.

Maintaining important data and files when deleting or re-creating the Cloud Environment makes the Terra experience more similar to working on a local machine.

What is stored on the PD?

  • Packages installed on the Cloud Environment VM
  • Input data (such as genomics files, or tabular data in CSV format) for JupyterLab analyses
  • Generated data files (JupyterLab)
  • Figures (including PDFs, PNGs, and JPEGs)

When persistence is especially useful

  • When running an analysis that requires a lengthy initialization (package installation).
  • When running a JupyterLab analysis that expects a certain input from the Cloud Environment. For example, this Encode tutorial downloads results created by one of its workflows into the Jupyter Cloud Environment for further analysis.
  • Whenever you want to be able to save your outputs/results, either to keep them organized or because some outputs are used in other parts of your interactive analysis.
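The "lengthy initialization" case above can be sketched as a small caching helper: run an expensive step only if its result is not already on the PD. This is a minimal illustration, not part of Terra. The helper name `cached_on_pd` and the `build` callback are hypothetical, and the default path assumes the JupyterLab mount point given in this article.

```python
from pathlib import Path
from typing import Callable

# Mount point as stated in this article; adjust if your environment differs.
PD_ROOT = Path("/home/jupyter/persistent_disk")


def cached_on_pd(relative_name: str,
                 build: Callable[[Path], None],
                 pd_root: Path = PD_ROOT) -> Path:
    """Return a file under pd_root, calling build(target) only if it is missing.

    `build` is a hypothetical callback standing in for any expensive step
    (a download, a preprocessing run, ...) that writes its result to `target`.
    """
    target = Path(pd_root) / relative_name
    if not target.exists():
        # Create intermediate folders on the PD before producing the file.
        target.parent.mkdir(parents=True, exist_ok=True)
        build(target)
    return target
```

After the Cloud Environment is recreated with the same PD attached, a second call with the same `relative_name` should find the existing file and skip the `build` step.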

Persistent disk (PD) details

When you create a Cloud Environment virtual machine, you automatically get 50 GB of persistent disk storage attached to the VM. You can change the default size when you first create your Cloud Environment, and you can increase the size at any time.

[Screenshot: Terra on Azure Cloud Environment configuration pane]
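To judge whether the disk needs to be enlarged, you can check how full it is from a notebook. The sketch below uses Python's standard `shutil.disk_usage`. The function name `disk_usage_summary` is illustrative, and the default path assumes the JupyterLab mount point given in this article.

```python
import shutil


def disk_usage_summary(path: str = "/home/jupyter/persistent_disk") -> dict:
    """Summarize the space on the filesystem that contains `path`, in GiB."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    gib = 1024 ** 3
    percent = 100 * usage.used / usage.total if usage.total else 0.0
    return {
        "total_gib": usage.total / gib,
        "used_gib": usage.used / gib,
        "free_gib": usage.free / gib,
        "percent_used": percent,
    }
```

Calling `disk_usage_summary()` in a notebook cell returns the figures for the PD. If `percent_used` is consistently high, increasing the disk size in the Cloud Environment configuration is probably worth considering.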

You can access PD files in many ways

  • Via the command line in the terminal
    Note that the terminal will open in a new browser tab
  • Via code snippets or Terra Notebook Utils (TNU) in a notebook
  • Directly from the built-in JupyterLab file manager
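As an example of the code-snippet route, the sketch below lists the files on the PD with their sizes using only the Python standard library. The function name `list_pd_files` is illustrative, and the default path assumes the JupyterLab mount point given in this article.

```python
from pathlib import Path


def list_pd_files(pd_root="/home/jupyter/persistent_disk", pattern="*"):
    """Return sorted (relative_path, size_in_bytes) pairs for files under pd_root.

    `pattern` is a glob applied recursively, e.g. "*.csv" for CSV files only.
    """
    root = Path(pd_root)
    return sorted(
        (p.relative_to(root).as_posix(), p.stat().st_size)
        for p in root.rglob(pattern)
        if p.is_file()
    )
```

For instance, `list_pd_files(pattern="*.png")` would return only the saved PNG figures.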

Types of persistent disks

Currently, only Standard HDD disks are available in Terra on Azure. Future releases will enable solid-state disks (SSDs) and a balanced option (a combination of standard disks and SSDs).

The PD file directory

In the terminal, you can navigate the PD’s file structure using bash commands, just as you would on a local machine. Any files you would like to save (persist) must be saved to the directory where the PD is mounted. Anything saved outside this directory is not saved to the persistent disk and will be lost when the Cloud Environment is deleted.

  • JupyterLab PDs are mounted to /home/jupyter/persistent_disk.

To avoid losing data, make sure to store your files in the persistent_disk directory!

To check the mount point for your Azure Cloud Environment PD, run !echo $HOME from your notebook.
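Before writing results, you can also confirm in code that a path actually falls inside the PD directory. The following is a minimal sketch. The function name `is_on_persistent_disk` is illustrative, and the default root assumes the JupyterLab mount point given in this article.

```python
from pathlib import Path


def is_on_persistent_disk(path, pd_root="/home/jupyter/persistent_disk") -> bool:
    """Return True if `path` is the PD directory or lies somewhere beneath it."""
    resolved = Path(path).expanduser().resolve()
    root = Path(pd_root).resolve()
    # A path is "on the PD" if it is the root itself or has the root as an ancestor.
    return resolved == root or root in resolved.parents
```

A check such as `is_on_persistent_disk("results/plot.png")`, run from a notebook, resolves the relative path against the current working directory, so it also catches the case where the notebook is running from a folder outside the PD.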

When you might need to delete or recreate the Cloud Environment

  • To make certain types of changes to the environment (changing the cloud compute profile, for example)
  • If the Cloud Environment enters an error state or becomes unresponsive
  • To run with the latest updates described in our release notes (note that our interactive analysis best practices guidelines suggest regularly recreating your Cloud Environments)
  • If the Cloud Environment is automatically deleted to ensure it has the latest updates

 
