This article answers frequently asked questions about Cloud Environments for interactive analysis in Terra. Leave a comment on the article if you have more questions.
1. Do I have a different Cloud Environment for each workspace?
[workspaces created on or after September 27, 2021 at 4:00 pm EDT]
Yes. Terra creates a separate Cloud Environment for each individual user and each workspace. This means data files stored in the Cloud Environment Persistent Disk are not available across workspaces. Note: This is a change from workspaces created before September 27, 2021. To learn more about these changes, see Moving to a project per workspace model for improved resource management.
Note: The way Terra generates Cloud Environments means two users collaborating in the same workspace will have separate Cloud Environments running applications like notebooks, Galaxy and RStudio.
[workspaces created before September 27, 2021 at 4:00 pm EDT]
Cloud Environments are created for an individual user at the Terra Billing project level. This means you have a unique Cloud Environment for each of your Terra Billing projects, and you share the environment across all workspaces in the same Terra Billing project created before September 27, 2021.
You can keep tabs on your Cloud Environments at https://app.terra.bio/#clusters.
2. Can my collaborator and I use the same Cloud Environment?
No. Cloud Environments are unique to individual users, so collaborators cannot use the same Cloud Environment or the associated Persistent Disk. However, collaborators can work in the same workspace notebook, since notebooks live in the workspace bucket and changes are written directly to the file.
Note: To share data generated from an interactive (notebook, for example) analysis, you need to copy that data to the Workspace bucket. See Copying notebook output to a Google bucket for more details.
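As a rough illustration of the copy step described in the note above, the sketch below builds a `gsutil cp` command that places a file from the Cloud Environment under a folder in a bucket. The helper name `build_copy_command`, the `notebook-outputs` folder, and the example bucket are hypothetical; it also assumes the workspace bucket URL is exposed to the notebook in a `WORKSPACE_BUCKET` environment variable, which you should confirm in your own environment.

```python
import os
import shlex


def build_copy_command(local_path, bucket, dest_prefix="notebook-outputs"):
    """Return the gsutil argument list that copies local_path into the bucket.

    build_copy_command and the default dest_prefix are illustrative names,
    not part of Terra.
    """
    if not bucket.startswith("gs://"):
        raise ValueError("bucket must look like gs://bucket-name")
    filename = os.path.basename(local_path)
    destination = f"{bucket.rstrip('/')}/{dest_prefix.strip('/')}/{filename}"
    return ["gsutil", "cp", local_path, destination]


# Assumption: inside a notebook the bucket would come from the environment,
# e.g. bucket = os.environ["WORKSPACE_BUCKET"]. A placeholder is used here.
cmd = build_copy_command("results/summary.tsv", "gs://example-bucket")
print(shlex.join(cmd))
```

To actually run the copy, pass the returned list to `subprocess.run(cmd, check=True)` from a notebook cell or terminal where `gsutil` is available.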
Note: Overwriting notebooks when collaborating
Terra prevents multiple users from editing a notebook simultaneously and automatically avoids collisions where two people overwrite each other's changes. If you try to edit a notebook that's currently in use, Terra displays a message to let you know.
3. When using a virtual machine (VM), when are files saved and when are they not? What about packages?
The answer depends on a few factors. See which corresponds to your case below. To learn more, see How and why to save data generated in a notebook to workspace storage.
When pausing a Cloud Environment
All generated files or data remain available in the Cloud Environment memory while your machine is running or paused, whether or not you elected to have a persistent disk.
When using Cloud Environments with a Persistent Disk (default) option
If your Cloud Environment includes a persistent disk, any files saved to your mounted persistent disk at /home/jupyter (or /home/jupyter-user for older images) or /home/rstudio will be saved.
Before you delete your Cloud Environment, check that your files are saved there, and choose the "keep persistent disk" option. To learn more about key components of Terra's notebook environment and how they interact, we recommend Terra's Jupyter Notebooks environment Part I: Key components. To learn how to copy data from the notebook memory to workspace bucket storage, see Copying notebook output to a Google bucket.
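One way to make that check less error-prone is to test a file's path against the persistent disk mount points before deleting anything. The sketch below is only an illustration: the helper `is_on_persistent_disk` is hypothetical, and the mount points are the three named in this article (/home/jupyter, /home/jupyter-user, and /home/rstudio). It compares path strings only and does not follow symbolic links.

```python
import posixpath
from pathlib import PurePosixPath

# Mount points named in this article; adjust if your image differs.
PERSISTENT_DISK_MOUNTS = ("/home/jupyter", "/home/jupyter-user", "/home/rstudio")


def is_on_persistent_disk(path, mounts=PERSISTENT_DISK_MOUNTS):
    """Return True if an absolute path sits at or below one of the mounts."""
    if not posixpath.isabs(path):
        raise ValueError("expected an absolute path")
    # Collapse '.' and '..' segments so '/home/jupyter/../tmp' is not a match.
    normalized = PurePosixPath(posixpath.normpath(path))
    for mount in mounts:
        mount_path = PurePosixPath(mount)
        if normalized == mount_path or mount_path in normalized.parents:
            return True
    return False


for candidate in ("/home/jupyter/my-workspace/edit/results.csv", "/tmp/results.csv"):
    print(candidate, is_on_persistent_disk(candidate))
```

Files for which the check returns False would need to be moved onto the persistent disk or copied to the workspace bucket before the Cloud Environment is deleted.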
Packages you install in a notebook are stored on your Cloud Environment Persistent Disk. As long as you don't delete the persistent disk, no active saving is required.
4. If I have a persistent disk (PD), what happens if I restart the Cloud Environment? What is saved?
If you restart the Cloud Environment and keep the Persistent Disk, all generated data, as well as libraries and packages, are protected.
If you update or re-create the Cloud Environment and delete the detachable PD, you will lose installed packages plus your generated files unless you saved them to your workspace bucket (see Detachable Persistent Disks). Note: You will receive a prompt asking if you want to 1) keep persistent disk, delete application and compute profile, or 2) delete everything, including persistent disk. To make sure you keep files in your PD, select the first option.
Even if you don’t have a PD, your files will still exist in the VM memory if you pause and resume your Cloud Environment.
Warning: If you decrease the size of the PD, you may lose any data stored on the part of the PD that is deleted.
5. How does the “Application configuration” impact the “Compute type” field options?
Depending on the application configuration you select, the compute-type dropdown field will update and recommend the type you should use. For example, the R/Bioconductor configuration recommends using a standard VM with default values for CPU, memory, and persistent disk. The Hail application configuration requires Spark, so the persistent disk option is unavailable.
6. Is the Memory (GB) configuration the amount of memory per CPU, or the amount of memory overall?
The Memory (GB) configuration is how much memory you want overall. The more CPUs you use in your Cloud Environment, the more memory you can request.
7. How do I know if the VM is running or not?
You will see the status in the Cloud Environment widget. It will display CREATING when it is getting started, RUNNING when it is running, etc.
8. How do I know if the kernel is on or not? (notebooks)
The circle icon at the far right (next to the kernel name, such as Python 3) will be filled if the kernel is processing, and the cell that is running will show In [*]. If the circle is open, the kernel is idle.
You can also hover your pointer over this circle to see the status.
9. How do I tell the status at any moment in time? (notebooks)
Use a combination of the answers to the two preceding questions. Note: The "Last Checkpoint" note at the top of the notebook shows when the notebook was last saved back to the workspace bucket. You can save the notebook at any time by clicking the disk icon.
10. What is the directory structure of the virtual machine and why is it structured this way?
RStudio
/home/rstudio
Jupyter Environments
/home/jupyter/your-workspace-name/[edit|safe]/my-notebook-name.ipynb
Hint: If you forget the structure, you can check it by running the ls command in the terminal or by clicking the Jupyter icon on the left-hand side of the notebook.
The workspace name is embedded in the structure. (Note: This is just the name and doesn’t include the billing project.) The next level down is the edit or safe mode. These modes protect against overwriting when collaborating on a notebook with others in your workspace. The edit directory saves to the workspace bucket, and safe represents playground mode. You can still execute in playground mode, but the edited notebook won’t save to the workspace bucket.
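To make the layout concrete, the sketch below splits a notebook path of the form /home/jupyter/your-workspace-name/[edit|safe]/my-notebook-name.ipynb into its parts and reports whether edits would be saved to the workspace bucket. The names `NotebookLocation`, `parse_notebook_path`, and `saves_to_workspace_bucket` are hypothetical, and the example path is made up; the sketch assumes the exact structure shown above and nothing more.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class NotebookLocation(NamedTuple):
    workspace: str
    mode: str
    notebook: str


def parse_notebook_path(path):
    """Split /home/jupyter/<workspace>/<edit|safe>/<notebook> into its parts."""
    parts = PurePosixPath(path).parts
    # Expected parts: ('/', 'home', 'jupyter', workspace, mode, notebook)
    if len(parts) != 6 or parts[:3] != ("/", "home", "jupyter"):
        raise ValueError(f"unexpected notebook path layout: {path}")
    workspace, mode, notebook = parts[3:]
    if mode not in ("edit", "safe"):
        raise ValueError(f"unknown mode directory: {mode}")
    return NotebookLocation(workspace, mode, notebook)


def saves_to_workspace_bucket(location):
    """Edit mode saves to the workspace bucket; safe (playground) mode does not."""
    return location.mode == "edit"


location = parse_notebook_path("/home/jupyter/my-workspace/safe/analysis.ipynb")
print(location, saves_to_workspace_bucket(location))
```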
11. When using the virtual machine, when should I request a preemptible machine?
Preemptible machines are only available for Spark clusters. Google provides best practices for using preemptibles in this section of the Google Cloud guides.
Note: This means you cannot have a PD and use preemptible machines at the same time.
12. How do preemptible machines work for Jupyter notebooks?
Spark will try to find another worker for the task when your machine is preempted.
Google states that preemptibles can’t "live migrate" to a regular VM. So if your preemptible machine gets taken, another preemptible machine will replace it. These machines can’t run longer than 24 hours, and Google lists other limitations in the document linked earlier in this answer.
13. What are the parameter recommendations for the VM?
From Understanding and adjusting your Cloud Environment:
"Featured and template workspaces and notebooks will include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can estimate an initial power to try and then dial it up or down as needed. Just be sure to be careful to save any generated data you want to keep when recreating the Cloud Environment."
Note: If you select the default environment or other application configuration, it is auto-populated with recommendations.
14. Which registries can I use for my Docker images to build custom cloud environments?
Terra allows you to use custom Docker images from:
- Google Container Registry (GCR)
- GitHub Container Registry (GHCR)
- DockerHub
To learn more, see the Docker section in Terra Support.
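If you want a quick check of which of the three registries an image reference points to before entering it, something like the following sketch may help. It is not Terra's own validation logic: the function `classify_registry` is hypothetical, the example image names are made up, and the host names used (gcr.io and its regional variants, ghcr.io, docker.io) are the commonly used ones for these registries. It follows the usual Docker convention that a reference without a registry host refers to DockerHub.

```python
def classify_registry(image):
    """Return 'GCR', 'GHCR', 'DockerHub', or None for any other registry."""
    first = image.split("/", 1)[0]
    # Docker convention: the first component is a registry host only if it
    # contains a dot or a port, or is 'localhost'.
    has_host = "/" in image and ("." in first or ":" in first or first == "localhost")
    if not has_host:
        return "DockerHub"
    host = first.lower()
    if host == "gcr.io" or host.endswith(".gcr.io"):
        return "GCR"
    if host == "ghcr.io":
        return "GHCR"
    if host in ("docker.io", "index.docker.io"):
        return "DockerHub"
    return None


# Example image names are placeholders for illustration only.
for image in (
    "us.gcr.io/example-project/my-image:1.0",
    "ghcr.io/example-org/my-image:1.0",
    "example-user/my-image:1.0",
    "quay.io/example-org/my-image:1.0",
):
    print(image, "->", classify_registry(image))
```

A result of None means the image lives somewhere other than the three registries listed above.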