Detachable Persistent Disks

Anton Kovalsky

Terra attaches a persistent disk (PD) to your Cloud Environment VM. Data you store there (such as generated data and installed libraries) persists even if you delete or update your Cloud Environment.

PDs also act as a safeguard to protect your data in case something goes wrong with the VM. If you delete your Cloud Environment but keep your PD, the PD will be reattached when you create your next Cloud Environment.

Maintaining the disk incurs a small hourly cost, even when the Cloud Environment is paused or deleted.

Terra storage overview

Terra workspaces have two dedicated storage locations: the workspace bucket and the Cloud Environment (VM) persistent disk. Like a USB drive, the PD can be detached from the VM before you delete or recreate the Cloud Environment, and then attached to a new one. The PD lets you keep the packages your notebook code is built on, the input files your analysis needs, and the outputs you've generated, without having to move anything to the workspace bucket for permanent storage.

TIP: Because the PD belongs to a single Cloud Environment, and each Cloud Environment belongs to a single user, you will need to copy data to the workspace bucket to share it with colleagues (even those working in a shared workspace) or to use it as input for a workflow.

What is “persistent disk” and how does it work?

When you create a Cloud Environment using the default options, you automatically get 50 GB of VM storage (the persistent disk) attached. You can access files on the PD by launching a Cloud Environment and opening a VM terminal (click the terminal icon on the Cloud Environment button to start the terminal).

[Screenshot: Cloud Environment button with the terminal icon]

In the terminal, you can navigate the PD's file structure using bash commands, just as you would on a local machine. The PD is mounted at /home/jupyter-user/notebooks (Jupyter) or /home/rstudio (RStudio), so any files you want to persist must be saved there. Anything saved outside this directory is not on the persistent disk and will be lost when the Cloud Environment is deleted.
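A quick way to check whether a file will survive deletion is to compare its path with the PD mount point. The sketch below is a hypothetical helper, not a Terra command; it assumes the Jupyter mount path given above, lets you override it through a PD_DIR variable (for example for /home/rstudio), and relies on GNU `realpath -m`, which also resolves paths that don't exist yet.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Terra command): succeeds if a path is on the PD.
# PD_DIR defaults to the Jupyter mount point; set PD_DIR=/home/rstudio for RStudio.
PD_DIR="${PD_DIR:-/home/jupyter-user/notebooks}"

on_persistent_disk() {
  local pd target
  # Normalize both paths (resolving relative paths, "..", and symlinks).
  # GNU realpath's -m flag also accepts paths that do not exist yet.
  pd="$(realpath -m -- "$PD_DIR")"
  target="$(realpath -m -- "$1")"
  # Match the mount point itself or anything beneath it, but not look-alike
  # siblings such as notebooks-old/.
  case "$target/" in
    "$pd"/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `on_persistent_disk results/table.csv && echo "kept on the PD"` run from your current directory. In the same terminal, `df -h /home/jupyter-user/notebooks` shows the size and free space of the filesystem that holds that directory.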

When updating/replacing a Cloud Environment, you’ll be prompted to select whether to keep or delete the PD. 

Why is the persistent disk so helpful?

The persistent disk saves time and reduces errors because you don't have to reinitialize the Cloud Environment when using apps that require time-consuming package installations or input files. The PD keeps the following safe even if you delete or recreate the Cloud Environment:

  • Input data (e.g., genomics files, tabular data)
  • Output data files
  • Figures (e.g., PDFs, PNGs, JPEGs)
  • Packages installed on the Cloud Environment

Maintaining important data and files even after deleting or re-creating the Cloud Environment makes the Terra experience more analogous to working on a local machine. 

Scenarios where persistence is especially useful

  1. When running an analysis that requires a very lengthy initialization (e.g., package installation).
  2. When running a notebook that expects specific input in the Cloud Environment. For example, this ENCODE tutorial downloads results created by one of its workflows into the notebook Cloud Environment for further analysis.
  3. Whenever you want to be able to save your outputs/results, either to keep them organized, or because some outputs are used in other parts of your analysis.
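Scenario 3 mostly comes down to writing outputs under the PD mount point. As a minimal sketch, assuming the Jupyter mount path described above, a hypothetical save_results function could copy finished outputs into one dated folder per day; the "results" folder name is an arbitrary choice.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy output files into a dated folder on the PD.
# PD_DIR defaults to the Jupyter mount point; "results" is an arbitrary folder name.
PD_DIR="${PD_DIR:-/home/jupyter-user/notebooks}"

save_results() {
  if [ "$#" -eq 0 ]; then
    echo "usage: save_results FILE..." >&2
    return 2
  fi
  # One folder per day, e.g. results/2021-09-02 (date +%F prints YYYY-MM-DD).
  local dest
  dest="$PD_DIR/results/$(date +%F)"
  mkdir -p -- "$dest" || return 1
  # -p keeps the original timestamps, which helps when organizing outputs later.
  cp -p -- "$@" "$dest"/ || return 1
  # Print where the files went so the caller can reuse the path.
  printf '%s\n' "$dest"
}
```

For example, `save_results figure1.png summary.tsv` copies both files from a scratch directory onto the PD. Copying rather than moving leaves the originals in place until you have confirmed the copies.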

Examples of when you might need to delete or recreate the Cloud Environment include:

  • To make certain changes to the environment (e.g. changing cloud compute profile)
  • If the Cloud Environment enters an error state or becomes unresponsive
  • To run with the latest updates described in our release notes (Notebooks Best Practices guidelines suggest recreating Cloud Environments regularly)
  • If your Cloud Environment is automatically deleted to ensure it has the latest updates (this happens in some specific cases)

How do I set up/use the Persistent Disk?

When you click the Cloud Environment button, a popup shows the configuration options for your environment. At the bottom is a box for entering the size of your persistent disk.

[Screenshot: Cloud Environment configuration popup, with the persistent disk size box at the bottom]

If you modify the configuration of an existing Cloud Environment, you'll see the "Update" button turn blue (active). Clicking this button will reveal this message, letting you know that your work will be preserved through deletion and recreation.

[Screenshot: message shown after clicking "Update"]

Warning when decreasing your PD size
Decreasing the size of your persistent disk will remove active code and any files on the PD, so you could lose work in progress if you decrease the PD size in the middle of an analysis. Updating the PD to a smaller size triggers a warning message to this effect:

[Screenshot: warning shown when decreasing the persistent disk size]

You can click "Delete Environment Options" to see the options shown below.

[Screenshot: "Delete Environment Options" menu]

If you don't want to save the contents of your detachable Persistent Disk, select "Delete everything, including persistent disk". Just make sure you've first moved anything you wish to keep from the Cloud Environment VM to another location, such as your workspace bucket.
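Before choosing that option, it can help to see what is actually on the disk. A minimal sketch, assuming the Jupyter mount point described earlier (override PD_DIR or pass a directory); pd_inventory is a hypothetical name, and it only reads the disk, it does not copy or delete anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize what is on the PD before you delete it.
PD_DIR="${PD_DIR:-/home/jupyter-user/notebooks}"

pd_inventory() {
  local dir="${1:-$PD_DIR}"
  # First line: total size of the whole tree, human-readable.
  du -sh -- "$dir"
  # Then the ten largest top-level items, including hidden ones, largest first
  # (sizes in KiB). Errors from unmatched globs are discarded.
  du -sk -- "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n 10
}
```

Running `pd_inventory` with no arguments summarizes the default mount point; anything large or unfamiliar in the list is a candidate to copy to the workspace bucket first.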

Selecting the default option, "Keep persistent disk, delete application and compute profile", will delete the VM after detaching the Persistent Disk. This disk will be automatically reattached the next time you spin up a cloud environment, assuming you select the standard VM.

When you click “Delete” here, you should see the popup below, where you can select a configuration before creating a new VM.

[Screenshot: popup for selecting a configuration before creating a new VM]

If you choose the standard VM, it will automatically reattach the saved disk. If you choose a Spark mode (click the “Customize” button to see additional options), this storage will NOT be reattached to that Cloud Environment, because Spark and Hail application configurations don't support the persistent disk feature.

The PD will, however, be saved until the next time you choose the standard VM option and click “Create”.

You can also click “Delete Persistent Disk” if you no longer need the files and data stored there. You’ll see a menu similar to the previous one, but with only the option to delete the persistent disk.

Note that you can't delete a persistent disk that's attached to a Cloud Environment without first deleting that environment. On your Cloud Environments page (from the main menu in the top left corner of any Terra page, select your name > "Cloud Environment"), you'll see separate items for the Cloud Environment application and the detachable Persistent Disk. You can delete either one there, but the option to delete the detachable disk only becomes active after you've detached the disk by deleting the environment.

[Screenshot: Cloud Environments page listing the Cloud Environment and the detachable Persistent Disk]

To identify your persistent disk in the Google Cloud Platform console, click on the link in the "Details" column. 

To avoid losing work when deleting or modifying your persistent disk, you may want to copy data from your Cloud Environment to another location. For detailed instructions on copying files from your interactive environment to your workspace bucket, see this article.
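As a rough illustration of such a copy, the sketch below wraps `gsutil cp`. It assumes gsutil is installed on the VM and that a WORKSPACE_BUCKET environment variable holds the workspace bucket's gs:// URL (set it yourself if it is missing); backup_to_bucket and its DRY_RUN switch are hypothetical names, and the linked article remains the authoritative guide.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy files or folders from the PD to the workspace bucket.
# Assumes WORKSPACE_BUCKET holds the bucket's gs:// URL and gsutil is installed.
# Set DRY_RUN=1 to print the gsutil command instead of running it.

backup_to_bucket() {
  if [ -z "${WORKSPACE_BUCKET:-}" ]; then
    echo "WORKSPACE_BUCKET is not set" >&2
    return 1
  fi
  if [ "$#" -lt 2 ]; then
    echo "usage: backup_to_bucket DEST_PREFIX SOURCE..." >&2
    return 2
  fi
  # Build gs://<bucket>/<prefix>/ without doubling any trailing slashes.
  local dest="${WORKSPACE_BUCKET%/}/${1%/}/"
  shift
  # -m copies in parallel; -r descends into directories.
  local cmd=(gsutil -m cp -r "$@" "$dest")
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, running `backup_to_bucket pd-backup results figures` from /home/jupyter-user/notebooks would copy both folders into pd-backup/ in the bucket. Because gsutil's destination naming rules vary with the number of sources, it is worth confirming the result with `gsutil ls -r "$WORKSPACE_BUCKET/pd-backup/"` before deleting the disk.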

A note about auto-syncing behavior

The feature that enables Terra to frequently auto-save your notebook back to the Workspace bucket may affect files stored on the VM's persistent disk. When you use a notebook in a Terra workspace, the VM creates subdirectories named after the workspace in the /notebooks/ location, and Terra's auto-syncing feature regularly interacts with the notebooks in these subdirectories.

If you're storing anything on the VM's persistent disk that you don't want auto-syncing to affect (for example, notebooks you'd like to keep private), we recommend keeping it in a specifically named subdirectory under /notebooks/ that is not named after a workspace (such as /notebooks/no-sync/).
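Following that recommendation from a terminal could look like the sketch below. It assumes /notebooks/ refers to the PD's notebooks folder at the Jupyter mount point (override NOTEBOOKS_DIR otherwise); move_to_no_sync is a hypothetical helper, and "no-sync" is just the example name used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move notebooks into a subdirectory that is not named
# after a workspace, so auto-syncing should leave them alone.
# NOTEBOOKS_DIR is assumed to be the PD's notebooks/ folder (Jupyter mount point);
# "no-sync" is just an example name.
NOTEBOOKS_DIR="${NOTEBOOKS_DIR:-/home/jupyter-user/notebooks}"

move_to_no_sync() {
  if [ "$#" -eq 0 ]; then
    echo "usage: move_to_no_sync NOTEBOOK..." >&2
    return 2
  fi
  local dest="$NOTEBOOKS_DIR/no-sync"
  mkdir -p -- "$dest" || return 1
  # -n never overwrites a notebook that already exists in no-sync/.
  mv -n -- "$@" "$dest"/
}
```

For example, `move_to_no_sync my-workspace/private.ipynb`, run from the notebooks folder, moves that notebook out of the workspace-named subdirectory.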


Comments


  • Nicole Deflaux

    Sometimes `pip install --upgrade <pkg>` does not succeed and people need to troubleshoot.

    Now that package installs are written to the Terra detachable persistent disk, one approach is to delete and recreate that disk to troubleshoot BUT it's easy to forget that you have some important files on that disk, then delete it during a troubleshooting session and regret the deletion.

    An alternative is to troubleshoot by starting from an empty `packages/` directory. For example, open a terminal and then run the following commands to move all currently installed packages to another directory so that they are no longer visible to pip, Python, and Jupyter:

    cd "$HOME/notebooks"

    # Name the stash directory after today's date, e.g. packages-as-of-20210902.
    PKG_STASH_DIR="packages-as-of-$(date +"%Y%m%d")"

    mkdir -p "$PKG_STASH_DIR"

    # Move all the currently installed packages out of the existing destination directory for package installations.
    mv packages/* "$PKG_STASH_DIR"/



    Now you can retry the `pip` commands to install the packages again!

  • Tiffany Miller

    Note that RStudio .Rmd files are not auto-synced to the Google bucket. This feature should be released by the end of 2021.

