Overview: Cloud environment storage (detachable persistent disks)

Anton Kovalsky

Terra attaches a persistent disk (PD) to your Cloud Environment VM. Data stored on the PD (such as generated data and installed libraries) is kept even if you delete or update your Cloud Environment.

PDs also act as a safeguard to protect your data in case something goes wrong with the VM. If you delete your Cloud Environment but keep your PD, the PD will be reattached when you create your next Cloud Environment.

You are charged a small hourly cost for maintaining the disk, even when the Cloud Environment is paused or deleted.

Terra storage overview

Terra workspaces have two dedicated storage locations: the workspace bucket and the Cloud Environment (VM) Persistent Disk (PD). Like a USB drive, the PD can be detached from the VM before you delete or recreate the Cloud Environment, and then attached to a new one. The PD lets you keep the packages your notebook code is built upon, the input files necessary for your analysis, and the outputs you've generated, without having to move anything to workspace storage (i.e., the Google bucket).

Sharing data with colleagues

Since the PD is unique to your Cloud Environment, and your Cloud Environment is unique to you, you will need to copy data to workspace storage to share it with colleagues (even those working in a shared workspace) or to use it as input for a workflow.
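As a minimal sketch of that copy step, the commands below move one file from the PD to workspace storage. Several things here are assumptions rather than details from this article: that the WORKSPACE_BUCKET environment variable holds the bucket URL, that gsutil is installed on the VM, and the file name results/summary.tsv and the helper copy_to_workspace, which are purely illustrative. If the bucket variable or gsutil is missing, the helper only prints the command it would have run.

```shell
#!/usr/bin/env bash
# Sketch: copy one file from the persistent disk to workspace storage.
# Assumptions: WORKSPACE_BUCKET holds the bucket URL (gs://...) and gsutil is
# installed. copy_to_workspace and the file name are hypothetical.

copy_to_workspace() {
  local src="$1"
  local dest="${WORKSPACE_BUCKET:-gs://your-workspace-bucket}/shared/$(basename "$src")"
  if command -v gsutil >/dev/null 2>&1 && [ -n "${WORKSPACE_BUCKET:-}" ] && [ -f "$src" ]; then
    gsutil cp "$src" "$dest"
  else
    # Dry run: show the command that would be executed.
    echo "gsutil cp $src $dest"
  fi
}

copy_to_workspace "$HOME/results/summary.tsv"
```

Colleagues with access to the workspace could then read the file from the bucket rather than from your PD.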

What is a “persistent disk” (PD) and how does it work?

When you create a Cloud Environment virtual machine using the default options, you automatically get 50 GB of VM storage (the persistent disk) attached.

You can access files on the PD by launching a Jupyter Cloud Environment and opening a VM terminal. To launch a terminal, click the terminal icon in the right sidebar of any workspace page while a Jupyter Cloud Environment is running.

Note that the terminal will open in a new browser tab.

[Screenshot: sidebar icons]
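Once the terminal is open, you may want to confirm how large the disk is and how much space remains. The sketch below assumes the PD is mounted at your home directory; the variable names target and avail_kb are illustrative only.

```shell
# Assumption: the persistent disk is mounted at the home directory.
target="${HOME:-/}"

# Human-readable size, used space, and free space for that file system.
df -h "$target"

# Capture only the available space, in kilobytes (POSIX output format:
# line 2, column 4 is the "Available" value).
avail_kb="$(df -Pk "$target" | awk 'NR==2 {print $4}')"
echo "Free space: ${avail_kb} KB"
```

If the reported size does not match the disk size you selected, the directory is probably not on the PD.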

Types of persistent disks

  • Standard: standard hard disk drives
  • Solid State Drive: solid state drives (SSD)
  • Balanced: also backed by solid state drives. They are an alternative to SSD persistent disks that balances performance and cost.

Which disk type is right for you?
Solid state drives are more expensive, but run faster and are more power efficient than standard hard disk drives.

The PD file directory

In the terminal, you can navigate the PD's file structure using bash commands, just as you would on a local machine. Any files you would like to save, or persist, must be saved to the directory where the PD is mounted. Anything saved outside this directory is not stored on the persistent disk and will be lost when the Cloud Environment is deleted.

  • RStudio PDs are mounted to the directory /home/rstudio
  • Jupyter PDs are mounted to /home/jupyter-user or /home/jupyter, depending on the age of the PD. 

To determine the name of the mount point for your Jupyter Cloud Environment PD, run !echo $HOME from within your notebook.
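To make the "inside or outside the mount point" rule concrete, here is a small sketch that tests whether an absolute path lies under the PD directory. It assumes the mount point equals $HOME, as the check above suggests; is_on_pd and the example paths are hypothetical.

```shell
# Assumption: the PD mount point is the home directory (trailing slash removed).
PD_MOUNT="${HOME%/}"

# Succeeds if the given absolute path is the mount point or lies beneath it.
# is_on_pd is a hypothetical helper, not a Terra command.
is_on_pd() {
  case "$1" in
    "$PD_MOUNT"|"$PD_MOUNT"/*) return 0 ;;
    *) return 1 ;;
  esac
}

for path in "$HOME/notebooks/results.csv" "/tmp/scratch.csv"; do
  if is_on_pd "$path"; then
    echo "$path: on the persistent disk"
  else
    echo "$path: NOT persisted"
  fi
done
```

The helper compares path strings only, so it expects absolute paths and does not resolve symbolic links.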

Persistent disks save time and reduce error

The persistent storage option saves time and reduces error because you don't have to reinitialize the Cloud Environment when using apps that require time-consuming package installation or input files. The following kinds of data and files are safe on the persistent disk even if you delete or recreate the Cloud Environment:

  • Input data (e.g. genomics files, tabular data)
  • Generated data files (from an interactive analysis)
  • Figures (e.g. PDFs, PNGs, JPEGs, etc)
  • Packages installed on the cloud environment

Maintaining important data and files when deleting or re-creating the Cloud Environment makes the Terra experience more analogous to working on a local machine. 

When persistence is especially useful

  • When running an analysis that requires a very lengthy initialization (e.g. package installation).
  • When running a notebook that expects a certain input from the Cloud Environment. For example, the ENCODE tutorial downloads results created by one of its workflows into the Jupyter Cloud Environment for further analysis.
  • Whenever you want to be able to save your outputs/results, either to keep them organized or because some outputs are used in other parts of your analysis.

Scenarios when you might need to delete or recreate the Cloud Environment

  • To make certain types of changes to the environment (e.g. changing the cloud compute profile)
  • If the Cloud Environment enters an error state or becomes unresponsive
  • To run with the latest updates described in our release notes (Notebooks Best Practices guidelines suggest recreating Cloud Environments regularly)
  • If the Cloud Environment is automatically deleted to ensure it has the latest updates

How do I set up/use the Persistent Disk?

 


Comments

  • Nicole Deflaux

    Sometimes `pip install --upgrade <pkg>` does not work successfully and people need to troubleshoot.

    Now that package installs are written to the Terra detachable persistent disk, one approach is to delete and recreate that disk to troubleshoot BUT it's easy to forget that you have some important files on that disk, then delete it during a troubleshooting session and regret the deletion.

    An alternative is to troubleshoot by starting from an empty `packages/` directory. For example, open a terminal and then run the following commands to move all currently installed packages to another directory so that they are no longer visible to pip, Python, and Jupyter:

    cd "$HOME/notebooks"

    export PKG_STASH_DIR="packages-as-of-$(date +"%Y%m%d")"

    # -p avoids an error if the stash directory already exists from an earlier attempt today.
    mkdir -p "$PKG_STASH_DIR"

    # Move all the currently installed packages out of the existing destination directory for package installations.
    mv packages/* "$PKG_STASH_DIR"/

    Now you can retry the `pip` commands to install the packages again!

  • Tiffany Miller

    Note that RStudio Rmd files do not auto-sync to the Google bucket. This feature should be released by the end of 2021.
