Cloud Environment FAQs

Tiffany Miller

This article answers some frequently asked questions about the interactive analysis Cloud Environment functionality in Terra. Please leave a comment on the article if you have more questions.

Frequently Asked Questions

  1. Do I have a different Cloud Environment for each workspace? 
  2. Can my collaborator and I use the same Cloud Environment?
  3. When using a virtual machine (VM), when are files saved and when are they not? What about packages?
  4. If I have a persistent disk (PD), what happens if I restart the cloud environment? What is saved?
  5. How does the “Application configuration” I select impact the “Compute type” field options?
  6. How do I know if the VM is running or not?
  7. How do I know if the kernel is on or not?
  8. How do I tell the status at any moment in time?
  9. What is the directory structure of the virtual machine and why is it structured this way?
  10. When using the virtual machine, when should I request a preemptible machine?
  11. How do preemptible machines work in the context of Jupyter notebooks?
  12. What are the parameter recommendations for the VM?
  13. Which registries can I use for my Docker images to build custom cloud environments?

 

1. Do I have a different Cloud Environment for each workspace? 

[workspaces created on or after September 27, 2021 at 4:00 pm EDT]
Yes. Terra creates a separate cloud environment for each individual user and each workspace. This means data stored on the Cloud Environment Persistent Disk is not available across workspaces. Note that this is a change from workspaces created before September 27, 2021. To learn more about these changes, see Moving to a project per workspace model for improved resource management.

Note that the way Terra generates Cloud Environments also means two users collaborating in the same workspace will have separate cloud environments running applications like notebooks, Galaxy and RStudio. 

[workspaces created before September 27, 2021 at 4:00 pm EDT]
Cloud Environments are created for an individual user at the Terra Billing project level. This means you have a unique Cloud Environment for each of your Terra Billing projects, and you share the environment across all workspaces in the same Terra Billing project created prior to September 27, 2021.

TIP: You can keep tabs on your cloud environments at https://app.terra.bio/#clusters.
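
The two scoping rules above can be sketched as a small lookup key. This is only an illustration of the reasoning, not Terra code: the function name, its parameters, and the example user and project names are hypothetical, and the cutoff is the date and time stated in this answer.

```python
from datetime import datetime, timedelta, timezone

# Cutoff stated in this article: September 27, 2021 at 4:00 pm EDT (UTC-4).
EDT = timezone(timedelta(hours=-4))
PPW_CUTOFF = datetime(2021, 9, 27, 16, 0, tzinfo=EDT)


def environment_scope(user, billing_project, workspace, workspace_created):
    """Return a tuple identifying which Cloud Environment a user would get.

    Hypothetical helper for illustration only; not part of any Terra API.
    """
    if workspace_created >= PPW_CUTOFF:
        # Newer workspaces: one environment per user per workspace.
        return (user, billing_project, workspace)
    # Older workspaces: one environment per user per billing project.
    return (user, billing_project)


old = datetime(2021, 6, 1, tzinfo=EDT)
new = datetime(2022, 1, 15, tzinfo=EDT)

# Two older workspaces in the same billing project share an environment.
shared_old = environment_scope("ana", "proj", "ws-a", old) == environment_scope("ana", "proj", "ws-b", old)
# Two newer workspaces never do.
shared_new = environment_scope("ana", "proj", "ws-a", new) == environment_scope("ana", "proj", "ws-b", new)
```

In both cases the user is part of the key, which is why two collaborators never resolve to the same environment.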

 

2. Can my collaborator and I use the same Cloud Environment?

No. Cloud Environments are unique to individual users, so collaborators cannot use the same Cloud Environment or the associated Persistent Disk. Collaborators can work in the same workspace notebook, however, since notebooks live in the workspace bucket, and changes are written directly to the file.

Note that to share data generated from an interactive (notebook, for example) analysis, you will need to copy that data to the Workspace bucket. See this article for more information on how to copy notebook data to the Workspace bucket.
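
As a rough sketch of what such a copy can look like from a Python notebook, the snippet below builds a `gsutil cp` command for one file. It assumes the environment exposes the Workspace bucket in a `WORKSPACE_BUCKET` environment variable and that `gsutil` is installed; the `notebook-outputs` folder, the fallback bucket name, and the file path are placeholders invented for the example.

```python
import os
import shlex


def build_copy_command(local_path, bucket, dest_prefix="notebook-outputs"):
    """Build a gsutil command that copies one local file into a bucket folder.

    `dest_prefix` is a hypothetical folder name chosen for this example.
    """
    bucket = bucket.rstrip("/")
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    filename = os.path.basename(local_path)
    destination = f"{bucket}/{dest_prefix}/{filename}"
    return ["gsutil", "cp", local_path, destination]


# Assumption: the environment exposes the bucket as WORKSPACE_BUCKET.
# The fallback below is a placeholder, not a real bucket.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://example-workspace-bucket")
command = build_copy_command("results/summary.csv", bucket)
print(shlex.join(command))
```

The snippet only prints the command. To run the copy, pass the list to `subprocess.run(command, check=True)`.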

Note about overwriting notebooks when collaborating
Terra prevents multiple users from editing a notebook simultaneously and automatically avoids collisions where two people overwrite each other's changes. If you try to edit a notebook that's currently in use, you'll see a message like this:

image__65_.png

 

3. When using a virtual machine (VM), when are files saved and when are they not? What about packages?

The answer depends on a few factors. See which corresponds to your case below. To learn more, see this article about What real-time changes you can make to your Cloud Environment without losing data.  

When pausing a Cloud Environment
All generated files and data remain available in the Cloud Environment memory while your machine is running or paused, whether or not you have a persistent disk.

When using Cloud Environments with a Persistent Disk (default) option
If your Cloud Environment includes a persistent disk, any files saved to your mounted persistent disk at /home/jupyter-user/notebooks or /home/rstudio will be saved.

Before you delete your cloud environment, check that your files are saved there, and choose the "keep persistent disk" option. To learn more about key components of Terra's notebook environment and how they interact, we recommend this article. To learn how to copy data from the notebook memory to workspace bucket storage, refer to this article.

Packages you install in a notebook are stored on your Cloud Environment Persistent Disk. As long as you don't delete the persistent disk, no active saving is required.
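
One way to check where a file lives before deleting an environment is to test its path against the two mount points named above. The helper below is a minimal sketch under that assumption. Its name is made up for this example, and it only compares path strings, so it does not resolve symlinks or `..` segments.

```python
from pathlib import PurePosixPath

# Mount points named in this article.
PERSISTENT_DISK_MOUNTS = (
    PurePosixPath("/home/jupyter-user/notebooks"),
    PurePosixPath("/home/rstudio"),
)


def is_on_persistent_disk(path):
    """Return True if an absolute path sits at or below a persistent disk mount."""
    candidate = PurePosixPath(path)
    return any(
        candidate == mount or mount in candidate.parents
        for mount in PERSISTENT_DISK_MOUNTS
    )


for example in ("/home/jupyter-user/notebooks/my-workspace/edit/out.csv", "/tmp/out.csv"):
    print(example, is_on_persistent_disk(example))
```

Files that fail this check, such as the `/tmp/out.csv` example, are not on the persistent disk and would need to be moved or copied before the environment is deleted.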

4. If I have a persistent disk (PD), what happens if I restart the Cloud Environment? What is saved?

If you restart the Cloud Environment and keep the Persistent Disk, all generated data, libraries, and packages are preserved.

If you update or recreate the cloud environment and delete the detachable PD, you will lose installed packages and generated files unless you saved them to your workspace bucket (see this article). Note that you will receive a prompt asking if you want to 1) keep persistent disk, delete application and compute profile, or 2) delete everything, including persistent disk. To keep the files on your PD, select the first option.

Even if you don’t have a PD, your files will still exist in the VM memory if you pause and resume your Cloud Environment.

Warning if you decrease the size of the PD
If you decrease the size of the PD, some of your data may be lost if it is on the part of the PD that is deleted. 

5. How does the “Application configuration” impact the “Compute type” field options?

Depending on the application configuration you select, the compute-type dropdown field will update and recommend the type you should use. In this screenshot, the R/Bioconductor configuration recommends using a standard VM with these default values for CPU, memory, and persistent disk. The Hail application configuration requires Spark, so the persistent disk option is unavailable.

Screen_Shot_2021-02-18_at_5.13.40_PM.png

6. How do I know if the VM is running or not?

You will see the status in the Cloud Environment widget. It will display CREATING when it is getting started, RUNNING when it is running, etc. 

7. How do I know if the kernel is on or not? (notebooks)

Screen_Shot_2021-02-22_at_2.49.21_PM.png

The circle icon at the far right (next to Python 3 in the screenshot above) will be filled if the kernel is processing, and the cell that is running will show In [*]. If the circle is open, as shown above, the kernel is idle.

You can also hover your pointer over this circle to see the status. 

8. How do I tell the status at any moment in time? (notebooks)

Use a combination of the answers to the two preceding questions. The "Last Checkpoint" note at the top of the notebook shows when the notebook was last saved back to the workspace bucket. You can save the notebook at any time by clicking the disk icon.

Screen_Shot_2021-02-22_at_2.53.46_PM.png

9. What is the directory structure of the virtual machine and why is it structured this way?

RStudio
/home/rstudio

Jupyter Environments
/home/jupyter-user/notebooks/your-workspace-name/[edit|safe]/my-notebook-name.ipynb

Hint: If you forget the structure, you can always check it by running the ‘ls’ command in the terminal or clicking the Jupyter icon on the left-hand side of the notebook.

The workspace name is embedded in the structure (note this is just the name and doesn’t include the billing project). The next level down is the edit or safe mode. These modes protect against overwriting when you collaborate on a notebook with others in your workspace. The edit directory saves to the workspace bucket, and safe represents playground mode. You can still execute in playground mode, but the edited notebook won’t save to the workspace bucket.
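
The Jupyter layout described above can be sketched as a pair of helpers that build a notebook path and split one back into its parts. The function names, the workspace name `my-workspace`, and the notebook name `analysis` are all hypothetical. The only facts taken from this answer are the root directory, the two mode names, and that only edit mode saves to the workspace bucket.

```python
from pathlib import PurePosixPath

NOTEBOOK_ROOT = PurePosixPath("/home/jupyter-user/notebooks")
MODES = ("edit", "safe")


def notebook_path(workspace_name, mode, notebook_name):
    """Build the path of a notebook for a given workspace and mode."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    return NOTEBOOK_ROOT / workspace_name / mode / f"{notebook_name}.ipynb"


def describe(path):
    """Split a notebook path back into workspace, mode, and notebook name."""
    relative = PurePosixPath(path).relative_to(NOTEBOOK_ROOT)
    workspace_name, mode, filename = relative.parts
    return {
        "workspace": workspace_name,
        "mode": mode,
        "notebook": PurePosixPath(filename).stem,
        # Only the edit directory saves back to the workspace bucket.
        "saves_to_bucket": mode == "edit",
    }


path = notebook_path("my-workspace", "safe", "analysis")
info = describe(path)
print(path, info)
```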

10. When using the virtual machine, when should I request a preemptible machine?

Preemptible machines are only available for Spark clusters. Google provides best practices for using preemptibles in this section of the Google Cloud guides. 

Note: This means you cannot have a PD and use preemptible machines.

11. How do preemptible machines work for Jupyter notebooks?

Spark will try to find another worker for the task when your machine is preempted. 

Google states that preemptibles can’t "live migrate" to a regular VM. So if your preemptible machine gets taken, another preemptible machine will replace it. These machines can’t run longer than 24 hours, and Google lists other limitations in the document linked earlier in this answer.

12. What are the parameter recommendations for the VM?

From this article:

"Featured and template workspaces and notebooks will include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can estimate an initial power to try and then dial it up or down as needed. Just be sure to be careful to save any generated data you want to keep when recreating the cloud environment."

Note: If you select the default environment or another application configuration, the fields are auto-populated with recommended values.

13. Which registries can I use for my Docker images to build custom cloud environments?

Terra allows you to use custom Docker images from:

  • Google Container Registry (GCR)
  • GitHub Container Registry (GHCR)
  • DockerHub
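
To check an image reference against this list before entering it, a sketch like the one below may help. It relies on Docker's convention that the first path component is a registry host only when it contains a "." or ":" (or is "localhost"), with Docker Hub implied otherwise. Treating regional hosts such as us.gcr.io as GCR is an assumption, the function names are made up, and Terra's own validation may differ.

```python
def registry_of(image):
    """Return the registry host implied by a Docker image reference."""
    first, _, rest = image.partition("/")
    # Docker's convention: the first component is a registry host only if it
    # contains a "." or ":" or is "localhost"; otherwise Docker Hub is implied.
    if rest and ("." in first or ":" in first or first == "localhost"):
        return first.lower()
    return "docker.io"


def is_supported_registry(image):
    """Return True if the image appears to come from GCR, GHCR, or Docker Hub."""
    host = registry_of(image)
    if host in ("docker.io", "ghcr.io", "gcr.io"):
        return True
    # Assumption: regional GCR hosts such as us.gcr.io also count as GCR.
    return host.endswith(".gcr.io")


# Hypothetical image references used only to exercise the helper.
for image in ("us.gcr.io/my-project/my-image:1.0", "ghcr.io/my-org/my-image", "my-org/my-image:1.0", "quay.io/my-org/my-image"):
    print(image, is_supported_registry(image))
```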


Comments

1 comment

  • Comment author
    Matt Bookman

    With the release of Project Per Workspace (PPW), question 1 above can be updated:

    1. Do I have a different cloud environment for each workspace? 

    Cloud environments are created for an individual user at the project level. This means you have a unique cloud environment for each of your billing projects, and you share the environment across all workspaces in the same billing project. You can keep tabs of the cloud environments you've created for each billing project at https://app.terra.bio/#clusters.

    This should now say something like:

    For workspaces created on or after September 27, 2021:

    Cloud Environments are created for an individual user at the workspace level.

    For workspaces created before September 27, 2021:

    Cloud Environments are created for an individual user at the Terra Billing project level. This means you have a unique Cloud Environment for each of your Terra Billing projects, and you share the environment across all workspaces in the same Terra Billing project created prior to September 27, 2021.

    You can keep tabs of the Cloud Environments at https://app.terra.bio/#clusters.

    I don't know the exact time of day that PPW was pushed to production which could be noted above to make the statements more precise.

     

     
