Cloud Environment FAQs

Tiffany Miller

This article answers frequently asked questions about the interactive analysis Cloud Environment functionality in Terra. If you have more questions, please leave a comment on the article.

1. Do I have a different Cloud Environment for each workspace? 

Yes. Terra creates a separate Cloud Environment for each user in each workspace. This means data files stored on a Cloud Environment's Persistent Disk are not available in other workspaces.

Note: Because of the way Terra creates Cloud Environments, two users collaborating in the same workspace each have their own Cloud Environment for running applications such as Jupyter notebooks, Galaxy, and RStudio.

You can keep tabs on your Cloud Environments at https://app.terra.bio/#clusters.

2. Can my collaborator and I use the same Cloud Environment?

No. Cloud Environments are unique to individual users, so collaborators cannot use the same Cloud Environment or the associated Persistent Disk. However, collaborators can contribute changes to the same workspace notebooks and R Markdown files, because these files live in the workspace bucket. Changes are written directly to the file and synced to the workspace bucket.

Note: To share data generated from an interactive analysis (for example, in a Jupyter notebook), you need to copy that data to the workspace bucket. See Copying notebook output to a Google bucket for more details.
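
As a minimal sketch of that copy step from a Python notebook, the helper below builds a `gsutil cp` command for one output file. It assumes the `gsutil` CLI is installed in the Cloud Environment and that the workspace bucket URL is available in a `WORKSPACE_BUCKET` environment variable; the function name, the `notebook-output` destination folder, and the fallback bucket URL are hypothetical.

```python
import os
import shlex
from pathlib import PurePosixPath


def build_copy_command(local_path: str, bucket: str, dest_dir: str = "notebook-output") -> list[str]:
    """Build (but do not run) a gsutil command that copies one file to the bucket."""
    if not bucket.startswith("gs://"):
        raise ValueError(f"expected a gs:// bucket URL, got {bucket!r}")
    file_name = PurePosixPath(local_path).name
    destination = f"{bucket.rstrip('/')}/{dest_dir.strip('/')}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


if __name__ == "__main__":
    # Assumption: the bucket URL is exposed as WORKSPACE_BUCKET;
    # the fallback URL is a hypothetical placeholder.
    bucket = os.environ.get("WORKSPACE_BUCKET", "gs://example-workspace-bucket")
    command = build_copy_command("results/summary.csv", bucket)
    # Review the command first; subprocess.run(command, check=True) would execute it.
    print(shlex.join(command))
```

Building the command separately from running it lets you confirm the destination before anything is copied.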

Overwriting notebooks when collaborating

Terra prevents multiple users from editing a notebook simultaneously and automatically avoids collisions where two people overwrite each other's changes. If you try to edit a notebook that's currently in use, you'll see a message like this:

Pop-up message that lets a user know that the Notebook Is In Use, with options to Make a Copy or Run in Playground Mode

3. When are files saved and when are they not? What about packages?

The answer depends on a few factors; find the case that matches yours below. To learn more, see How and why to save data generated in a notebook to workspace storage.

When pausing a Cloud Environment

All generated files stored in the Cloud Environment's persistent disk will be preserved while your machine is paused.

When using Cloud Environments with a Persistent Disk (PD)

If your Cloud Environment includes a Persistent Disk, any files saved to your mounted Persistent Disk at /home/jupyter (Jupyter) or /home/rstudio (RStudio) will be preserved.

The location will be /home/jupyter-user for older Jupyter images.

To ensure data isn't deleted

  • Actively check that files are saved before deleting your Cloud Environment.
    You can do this in a terminal by navigating to the mounted PD directory and using the ls bash command.
  • Choose the "keep persistent disk" option.
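
To check from a Python notebook instead of a terminal, a sketch like the following lists every file under the Persistent Disk mount point, similar to a recursive `ls -a`. The mount paths are assumptions taken from this article (`/home/jupyter`, `/home/jupyter-user` for older Jupyter images, and `/home/rstudio` from question 15), and `list_saved_files` is a hypothetical helper.

```python
from pathlib import Path

# Assumed Persistent Disk mount points, taken from this article.
PD_MOUNTS = {
    "jupyter": Path("/home/jupyter"),
    "jupyter-legacy": Path("/home/jupyter-user"),  # older Jupyter images
    "rstudio": Path("/home/rstudio"),
}


def list_saved_files(root: Path) -> list[str]:
    """Return every regular file under root (hidden files included), relative to root."""
    return sorted(str(p.relative_to(root)) for p in root.rglob("*") if p.is_file())


if __name__ == "__main__":
    root = PD_MOUNTS["jupyter"]
    if root.is_dir():
        for relative_path in list_saved_files(root):
            print(relative_path)
    else:
        print(f"{root} not found; check which application this Cloud Environment runs.")
```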

Not all environments have Persistent Disks

Any environment that uses Spark single nodes or clusters cannot have a Persistent Disk.

To learn more about the key components of Terra's notebook environment and how they interact, see Terra's Jupyter Notebooks environment Part I: Key components.

To learn how to copy data from the notebook memory to workspace bucket storage, see Copying notebook output to a Google bucket.

Packages you install in a notebook are stored on your Cloud Environment Persistent Disk. As long as you don't delete the Persistent Disk, no active saving is required.

4. If I have a Persistent Disk (PD), what happens if I restart the Cloud Environment? What is saved?

If you restart (resume) a paused Cloud Environment, all generated data, as well as installed libraries and packages, are preserved. Even if you don't have a PD, your files will still exist on the VM when you pause and resume your Cloud Environment.

If you delete both your Cloud Environment VM and the detachable PD, you will lose your installed packages and generated files: they will not be there when you re-create the Cloud Environment unless you saved them to your workspace bucket (see Detachable Persistent Disks). When you delete your environment, a prompt asks whether you want to:

  1. Keep persistent disk, delete application configuration and compute profile, or
  2. Delete everything, including persistent disk

To make sure you keep files in your PD, select the first option.
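
The rules in questions 3 and 4 can be condensed into a small lookup. This is a simplified, hypothetical summary for illustration (the `Action` names and result labels are not Terra terms or APIs), assuming that files you copied to the workspace bucket are unaffected by anything you do to the Cloud Environment.

```python
from enum import Enum


class Action(Enum):
    PAUSE = "pause"
    DELETE_KEEP_PD = "keep persistent disk"
    DELETE_EVERYTHING = "delete everything, including persistent disk"


def what_survives(action: Action, has_pd: bool) -> set[str]:
    """Summarize what remains after an action, per questions 3 and 4."""
    # The workspace bucket is separate from the Cloud Environment.
    survives = {"workspace bucket files"}
    if action is Action.PAUSE:
        # Pausing preserves files and packages, with or without a PD.
        survives |= {"generated files", "installed packages"}
    elif action is Action.DELETE_KEEP_PD and has_pd:
        # Packages are installed on the PD, so they survive with it.
        survives |= {"generated files on PD", "installed packages"}
    # Deleting everything (or deleting without a PD) leaves only the bucket copy.
    return survives


print(sorted(what_survives(Action.DELETE_EVERYTHING, has_pd=True)))
# ['workspace bucket files']
```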

Warning if you decrease the size of the PD

If you decrease the size of the PD, your data is at high risk of being deleted. This is because the only way to reduce the disk size is to delete the existing PD and create a new one (unlike increasing the size of the disk). Don't decrease the size of a PD without first saving the data you care about to an external location, such as your workspace bucket.
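
The difference between growing and shrinking a disk can be expressed as a pre-flight check. This is a hypothetical helper, not part of Terra; sizes are in GB and the example values are illustrative.

```python
def plan_pd_resize(current_gb: int, requested_gb: int) -> str:
    """Summarize what a Persistent Disk size change implies, per the warning above."""
    if requested_gb <= 0:
        raise ValueError("requested size must be positive")
    if requested_gb > current_gb:
        return "increase: the existing disk is resized, not recreated"
    if requested_gb < current_gb:
        return "decrease: the PD is deleted and recreated; back up data to the workspace bucket first"
    return "no change"


print(plan_pd_resize(100, 50))
```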

5. How does the “Application configuration” impact the “Cloud compute profile” field options?

Depending on the application configuration you select, the cloud compute profile fields will update. In this screenshot, the R/Bioconductor configuration defaults to a standard VM with these default values for CPU, memory, and Persistent Disk. 

Application configuration panel with the R/Bioconductor application configuration selected, and associated default compute options.

The Hail application configuration requires Spark, so the Persistent Disk option is unavailable.

Application configuration panel with the Spark application configuration selected, and associated default compute options. There is no persistent disk option.

6. Is the Memory (GB) configuration the amount of memory per CPU, or the amount of memory overall?

The Memory (GB) configuration is how much memory you want overall. The more CPUs you use in your Cloud Environment, the more memory you can request.
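
Because the Memory (GB) value is a total for the VM, the memory available per CPU is the total divided by the CPU count. The configuration below (4 CPUs, 15 GB) is a hypothetical example, not a Terra default.

```python
def memory_per_cpu(total_memory_gb: float, cpus: int) -> float:
    """Memory (GB) is an overall total, so per-CPU memory is total / CPUs."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return total_memory_gb / cpus


# Hypothetical configuration: 4 CPUs with 15 GB of memory overall.
print(memory_per_cpu(15, 4))  # 3.75
```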

7. How do I know if the VM is running or not?

A colored circle on the respective Cloud Environment widget in the right-side panel of your workspace indicates the status. If you hover over the widget, you will see a pop-up with more details.

A green circle on the Jupyter icon in the right-side panel indicating that the Jupyter cloud environment is running.
  • No circle = no Cloud Environment exists
  • Blue = Creating/Pausing
  • Green = Running
  • Orange = Paused
  • Red = Error
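
The legend above is effectively a lookup table. As a sketch (the dictionary and function names are hypothetical, and the colors come only from the list above):

```python
from typing import Optional

# Widget circle colors from the legend above; None means no circle is shown.
STATUS_BY_COLOR = {
    None: "No Cloud Environment exists",
    "blue": "Creating/Pausing",
    "green": "Running",
    "orange": "Paused",
    "red": "Error",
}


def describe_indicator(color: Optional[str]) -> str:
    """Translate the widget's circle color into a Cloud Environment status."""
    key = color.lower() if color is not None else None
    return STATUS_BY_COLOR.get(key, "Unknown indicator")


print(describe_indicator("Orange"))  # Paused
```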

 

8. How do I know if the kernel is running? (Jupyter notebooks)

The top of a Jupyter notebook interface, where a non-filled circle can be seen next to Python 3.

The circle icon at the far right (next to Python 3 in the screenshot above) is filled while the kernel is processing, and the cell that is running shows In [*]. If the circle is open, as shown above, the kernel is idle.

You can also hover your pointer over this circle to see the status. 

9. How do I tell the status at any moment in time? (Jupyter notebooks)

Combine the answers to the two preceding questions. Note: The "Last Checkpoint" note at the top of the notebook shows when the notebook was last saved back to the workspace bucket. You can save the notebook at any time by clicking the disk icon.

The top of a Jupyter notebook interface where one can see when the notebook was last saved

10. How do I pause my Cloud Environment VM?

To pause your Cloud Environment VM, follow the steps below.

Screenshot of the Jupyter Cloud Environment Details pane highlighting the Pause Environment button at the top middle

1. Select the Environment Configuration (cloud icon) button on the sidebar. This will open the Cloud Environment Details pane (left).

2. Select the Pause Environment button for any active VM in your workspace.

What to expect

When a Cloud Environment is paused, this button changes to Resume Environment.

11. How do I delete my Cloud Environment VM?

To delete your Cloud Environment VM, follow the steps below:

1. Select the Environment Configuration (cloud icon) button on the sidebar. This will open the Cloud Environment Details pane for whichever app you are running (below).

2. Select the Settings button under the app logo.

Screenshot of the Jupyter Cloud Environment Details pane

3. Scroll down to select Delete environment.

Screenshot of the Delete environment option for a Jupyter Cloud Environment

Galaxy app deletion

Galaxy VMs must be running to be deleted. If your Galaxy VM is paused, it needs to be resumed before it can be deleted.

13. Can I run Jupyter and RStudio at the same time in a single workspace?

Because Jupyter Notebook and RStudio environments use the same Google Cloud virtual machine (VM), creating an RStudio environment will delete an existing Jupyter Notebook environment and vice versa. If you try to do so, you’ll see a warning message:

Screenshot of Downtime required warning message - By continuing you will be changing the application of your cloud environment from Jupyter to RStudio. This change will require temporarily shutting down your cloud environment. You will be unable to perform any analysis for a few minutes. Your existing data will be preserved during this update.

However, you can have a Galaxy environment in the same workspace as either a Jupyter or RStudio environment.

14. I tried to upload my file, analysis.rmd, to Terra in the Analyses tab, but nothing happened. Why?

RStudio requires that R Markdown file extensions be exactly .Rmd. The “R” in the file extension needs to be capitalized.
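
One way to fix a file name before uploading is to normalize the extension. This sketch assumes you only want to change the capitalization of an `.rmd`-style suffix and leave all other files alone; `fix_rmd_extension` is a hypothetical helper.

```python
from pathlib import PurePosixPath


def fix_rmd_extension(filename: str) -> str:
    """Return filename with an R Markdown extension normalized to exactly '.Rmd'.

    Files whose extension is not some capitalization of '.rmd' are returned unchanged.
    """
    path = PurePosixPath(filename)
    if path.suffix.lower() == ".rmd" and path.suffix != ".Rmd":
        return str(path.with_suffix(".Rmd"))
    return filename


print(fix_rmd_extension("analysis.rmd"))  # analysis.Rmd
```

Renaming the local file (for example with `os.rename`) before uploading it to the Analyses tab would then apply the fix.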

15. What is the directory structure of the virtual machine and why is it structured this way?

RStudio

/home/rstudio

Jupyter Environments

/home/jupyter/your-workspace-name/[edit|safe]/my-notebook-name.ipynb

If you forget the directory structure

You can always check it by running the ls command in the terminal or by clicking the Jupyter icon on the left-hand side of the notebook.

The workspace name is embedded in the path (note: this is just the workspace name and doesn't include the billing project). The next level down is the edit or safe directory. These modes protect against overwriting when you collaborate on a notebook with others in your workspace: the edit directory saves to the workspace bucket, while safe represents playground mode. You can still execute code in playground mode, but the edited notebook won't be saved to the workspace bucket.
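
The layout above can be captured in a couple of path helpers. This is a sketch assuming the `/home/jupyter` root shown in this answer (older images use `/home/jupyter-user`); the function names and the example workspace name are hypothetical.

```python
from pathlib import PurePosixPath

# Assumption: the Jupyter home directory; older images use /home/jupyter-user.
JUPYTER_HOME = PurePosixPath("/home/jupyter")


def notebook_path(workspace: str, notebook: str, playground: bool = False) -> PurePosixPath:
    """Build the VM path of a workspace notebook.

    The 'edit' directory syncs to the workspace bucket; 'safe' holds
    playground-mode copies that are not saved back to the bucket.
    """
    mode = "safe" if playground else "edit"
    return JUPYTER_HOME / workspace / mode / f"{notebook}.ipynb"


def mode_of(path: PurePosixPath) -> str:
    """Return 'edit' or 'safe' for a path laid out the way notebook_path() builds it."""
    parts = path.relative_to(JUPYTER_HOME).parts  # (workspace, mode, file)
    if len(parts) < 3 or parts[1] not in ("edit", "safe"):
        raise ValueError(f"unexpected notebook path: {path}")
    return parts[1]


print(notebook_path("my-workspace", "my-notebook-name"))
# /home/jupyter/my-workspace/edit/my-notebook-name.ipynb
```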

16. When should I request a preemptible machine?

Preemptible machines are only available for Spark clusters. Google provides best practices for using preemptible VMs in its Google Cloud article Create and use preemptible VMs.

17. How do preemptible machines work for Jupyter notebooks?

Spark will try to find another worker for the task when your machine is preempted.

In its documentation about preemptible VMs, Google states that preemptible instances can't "live migrate" to a regular VM. So if your preemptible machine is reclaimed, another preemptible machine will replace it. Preemptible machines can't run for longer than 24 hours, and Google lists other limitations in the article linked in the previous answer.

18. What are the parameter recommendations for the VM?

From Understanding and adjusting your Cloud Environment:

"Featured and template workspaces and notebooks will include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can estimate an initial power to try and then dial it up or down as needed. Just be sure to be careful to save any generated data you want to keep when recreating the Cloud Environment."

Note: If you select the default environment or another application configuration, the compute fields are auto-populated with initial recommendations.

19. Which Docker image registries can I use to build custom cloud environments?

Terra allows you to use custom Docker images from

  • Google Container Registry (GCR)
  • GitHub Container Registry (GHCR)
  • DockerHub

Custom Docker images must be derived from a Terra base image

For more details, see the Docker tutorial: Custom Cloud Environments for Jupyter Notebooks.
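
To sanity-check an image reference against the three registries listed above, a sketch like this classifies it by registry host. It follows the usual Docker reference convention (the first path component is a registry host only if it contains a `.` or `:` or is `localhost`; otherwise the image is on Docker Hub) and treats any `*.gcr.io` host as GCR. It is a hypothetical helper, not Terra's own validation, and it does not check that the image derives from a Terra base image.

```python
def registry_of(image: str) -> str:
    """Classify a Docker image reference as GCR, GHCR, DockerHub, or unsupported."""
    first, sep, _ = image.partition("/")
    is_host = bool(sep) and ("." in first or ":" in first or first == "localhost")
    host = first.lower() if is_host else "docker.io"  # no host means Docker Hub
    if host == "gcr.io" or host.endswith(".gcr.io"):
        return "GCR"
    if host == "ghcr.io":
        return "GHCR"
    if host in ("docker.io", "index.docker.io", "registry-1.docker.io"):
        return "DockerHub"
    return "unsupported"


print(registry_of("ghcr.io/my-org/my-image:1.0"))  # GHCR
```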

To learn more, see the Docker section of the Terra Support documentation.


Comments


  • Matt Bookman

    With the release of Project Per Workspace (PPW), question 1 above can be updated:

    1. Do I have a different cloud environment for each workspace? 

    Cloud environments are created for an individual user at the project level. This means you have a unique cloud environment for each of your billing projects, and you share the environment across all workspaces in the same billing project. You can keep tabs of the cloud environments you've created for each billing project at https://app.terra.bio/#clusters.

    This should now say something like:

    For workspaces created on or after September 27, 2021:

    Cloud Environments are created for an individual user at the workspace level.

    For workspaces created before September 27, 2021:

    Cloud Environments are created for an individual user at the Terra Billing project level. This means you have a unique Cloud Environment for each of your Terra Billing projects, and you share the environment across all workspaces in the same Terra Billing project created prior to September 27, 2021.

    You can keep tabs on the Cloud Environments at https://app.terra.bio/#clusters.

    I don't know the exact time of day that PPW was pushed to production; noting it above would make these statements more precise.

     

     

