This article answers some frequently asked questions about the interactive analysis Cloud Environment functionality in Terra. Please leave a comment on the article if you have more questions.
1. Do I have a different Cloud Environment for each workspace?
Yes. Terra creates a separate Cloud Environment for each individual user and each workspace. This means data files stored in the Cloud Environment Persistent Disk are not available across workspaces.
Note: The way Terra generates Cloud Environments means two users collaborating in the same workspace will have separate Cloud Environments running applications like Jupyter notebooks, Galaxy and RStudio.
You can keep tabs on your Cloud Environments at https://app.terra.bio/#clusters.
2. Can my collaborator and I use the same Cloud Environment?
No. Cloud Environments are unique to individual users, so collaborators cannot use the same Cloud Environment or the associated Persistent Disk. However, collaborators can contribute changes to the same workspace notebooks and R markdown files since these files live in the workspace bucket. Changes are written directly to the file and synced to the workspace bucket.
Note: To share data generated from an interactive analysis (for example, a Jupyter notebook), you need to copy that data to the Workspace bucket. See Copying notebook output to a Google bucket for more details.
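As a minimal sketch of that copy step, the Python below builds and runs a gsutil cp command that copies one file from the Cloud Environment to the workspace bucket. It assumes the notebook runtime exposes the bucket URL in a WORKSPACE_BUCKET environment variable and that gsutil is installed; the function names and the notebook-outputs folder are hypothetical, chosen only for illustration.

```python
import os
import subprocess


def build_copy_command(local_path, bucket_url, prefix="notebook-outputs"):
    """Return the gsutil command that copies local_path into the bucket."""
    file_name = os.path.basename(local_path)
    destination = f"{bucket_url.rstrip('/')}/{prefix}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


def copy_to_workspace_bucket(local_path, prefix="notebook-outputs"):
    """Copy one file to the workspace bucket and return its destination URL."""
    # Assumption: the runtime sets WORKSPACE_BUCKET to a gs:// URL.
    bucket_url = os.environ["WORKSPACE_BUCKET"]
    command = build_copy_command(local_path, bucket_url, prefix)
    # check=True raises an error if gsutil reports a failure.
    subprocess.run(command, check=True)
    return command[-1]
```

Calling copy_to_workspace_bucket("results.tsv") from a notebook cell would then place the file under the hypothetical notebook-outputs folder in the bucket, where collaborators can read it.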
Overwriting notebooks when collaborating
Terra prevents multiple users from editing a notebook simultaneously and automatically avoids collisions where two people overwrite each other's changes. If you try to edit a notebook that's currently in use, you'll see a warning message.
3. When are files saved and when are they not? What about packages?
The answer depends on a few factors. Find the case below that matches yours. To learn more, see How and why to save data generated in a notebook to workspace storage.
When pausing a Cloud Environment
All generated files stored in the Cloud Environment's persistent disk will be preserved while your machine is paused.
When using Cloud Environments with a Persistent Disk (PD)
If your Cloud Environment includes a Persistent Disk, any files saved to your mounted Persistent Disk at /home/studio will be saved. The location will be /home/jupyter-user for older Jupyter images.
To ensure data isn't deleted
- Actively check that files are saved before deleting your Cloud Environment.
You can do this in a terminal by navigating to the mounted PD directory and using the
- Choose the "keep persistent disk" option.
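One way to do the first check from a notebook cell rather than a terminal is sketched below. It walks a directory and reports each file's relative path, size, and last-modified time so you can confirm your outputs are on the disk. The function names are hypothetical, and the mount path you pass in is whichever Persistent Disk location applies to your image.

```python
import os
import time


def list_saved_files(root):
    """Return (relative_path, size_in_bytes, modified_time) for each file under root."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full_path = os.path.join(dirpath, name)
            info = os.stat(full_path)
            modified = time.strftime("%Y-%m-%d %H:%M", time.localtime(info.st_mtime))
            entries.append((os.path.relpath(full_path, root), info.st_size, modified))
    return sorted(entries)


def print_saved_files(root):
    """Print one line per file found under root."""
    for path, size, modified in list_saved_files(root):
        print(f"{modified}  {size:>12}  {path}")
```

For example, print_saved_files("/home/jupyter-user") would list the files on an older Jupyter image, assuming the disk is mounted at that path.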
Not all environments have Persistent Disks
Any environment that uses Spark single nodes or clusters cannot have a Persistent Disk.
To learn more about key components of Terra's notebook environment and how they interact, we recommend Terra's Jupyter Notebooks environment Part I: Key components
To learn how to copy data from the notebook memory to workspace bucket storage, see Copying notebook output to a Google bucket.
Packages you install in a notebook are stored on your Cloud Environment Persistent Disk. As long as you don't delete the Persistent Disk, no active saving is required.
4. If I have a Persistent Disk (PD), what happens if I restart the Cloud Environment? What is saved?
If you restart a paused Cloud Environment, all generated data, as well as libraries and packages are protected. Even if you don’t have a PD, your files will still exist in the VM memory if you pause and resume your Cloud Environment.
If you delete your Cloud Environment VM and the detachable PD, you will lose installed packages plus your generated files when you re-create the Cloud Environment unless you saved them to your workspace bucket (see Detachable Persistent Disks). When deleting your environment, you will receive a prompt asking if you want to
- Keep persistent disk, delete application configuration and compute profile, or
- Delete everything, including persistent disk
To make sure you keep files in your PD, select the first option.
Warning if you decrease the size of the PD
If you decrease the size of the PD, your data is at high risk of being deleted. This is because the only way to reduce the disk size is to delete the existing PD and create a new one (which is different from what happens when you increase the size of the disk). Don't decrease the size of a PD without first saving the data you care about to an external location (such as your workspace bucket).
5. How does the “Application configuration” impact the “Cloud compute profile” field options?
Depending on the application configuration you select, the cloud compute profile fields will update. In this screenshot, the R/Bioconductor configuration defaults to a standard VM with these default values for CPU, memory, and Persistent Disk.
The Hail application configuration requires Spark, so the Persistent Disk option is unavailable.
6. Is the Memory (GB) configuration the amount of memory per CPU, or the amount of memory overall?
The Memory (GB) configuration is how much memory you want overall. The more CPUs you use in your Cloud Environment, the more memory you can request.
7. How do I know if the VM is running or not?
In the respective Cloud Environment widget on the right-side panel of your workspace, you will see a circle whose color indicates the status. If you hover over the widget, you will see a pop-up with more details.
8. How do I know if the kernel is running? (Jupyter notebooks)
The circle icon at the far right (next to Python 3 in the screenshot above) will be filled if the kernel is processing, and the cell that is running will show In [*]. If the circle is open, as shown above, it means the kernel is idle.
You can also hover your pointer over this circle to see the status.
9. How do I tell the status at any moment in time? (Jupyter notebooks)
Use a combination of the answers to the two preceding questions. Note: The "Last Checkpoint" note at the top of the notebook shows when the notebook was last saved back to the workspace bucket. You can save the notebook at any time by clicking the disk icon.
10. How do I pause my Cloud Environment VM?
To pause your Cloud Environment VM, follow the steps below.
1. Select the Environment Configuration (cloud icon) button on the sidebar. This will open the Cloud Environment Details pane (left).
2. Select the Pause Environment button for any active VM in your workspace.
What to expect
When a Cloud Environment is paused, this button changes to Resume Environment.
11. How do I delete my Cloud Environment VM?
To delete your Cloud Environment VM, follow the steps below:
1. Select the Environment Configuration (cloud icon) button on the sidebar. This will open the Cloud Environment Details pane for whichever app you are running (below).
2. Select the Settings button under the app logo.
3. Scroll down to select Delete environment.
Galaxy app deletion
Galaxy VMs must be running to be deleted. If your Galaxy VM is paused, it needs to be resumed before it can be deleted.
13. Can I run Jupyter and RStudio at the same time in a single workspace?
Because Jupyter Notebook and RStudio environments use the same Google Cloud virtual machine (VM), creating an RStudio environment will delete an existing Jupyter Notebook environment and vice versa. If you try to do so, you’ll see a warning message:
However, you can have a Galaxy environment in the same workspace as either a Jupyter or RStudio environment.
14. I tried to upload my file, analysis.rmd, to Terra in the Analyses tab, but nothing happened. Why?
RStudio requires that R Markdown file extensions be exactly .Rmd. The “R” in the file extension needs to be capitalized.
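If you have several files to fix before uploading, a small helper can rewrite the extension for you. The sketch below is illustrative only; the function name is hypothetical, and it returns the corrected name without renaming anything on disk.

```python
from pathlib import Path


def normalize_rmd_extension(filename):
    """Return filename with any case variant of '.rmd' rewritten as '.Rmd'."""
    path = Path(filename)
    if path.suffix.lower() == ".rmd":
        return str(path.with_suffix(".Rmd"))
    # Leave every other file name untouched.
    return filename
```

You could pair the returned name with os.rename to rename the file itself before uploading it in the Analyses tab.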
15. What is the directory structure of the virtual machine and why is it structured this way?
If you forget the directory structure
You can always check what it is by running the ls command in the terminal or clicking the Jupyter icon on the left-hand side of the notebook.
The workspace name is embedded in the structure. (Note: This is just the name and doesn’t include the billing project). The next level down is edit or safe modes. These modes protect against overwriting when collaborating on a notebook with others in your workspace. The edit directory saves to the workspace bucket and safe represents playground mode. You can still execute in playground mode, but the edited notebook won’t save to the workspace bucket.
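To illustrate how the mode is encoded in the path, the sketch below inspects a notebook path and reports whether it sits under the edit or safe directory of a given workspace. The function name and the example paths are hypothetical; only the layout of workspace name followed by edit or safe comes from the description above.

```python
from pathlib import PurePosixPath


def notebook_mode(notebook_path, workspace_name):
    """Return 'edit' or 'safe' for a notebook path, or None if neither applies."""
    parts = PurePosixPath(notebook_path).parts
    for index, part in enumerate(parts[:-1]):
        # The mode directory sits one level below the workspace directory.
        if part == workspace_name and parts[index + 1] in ("edit", "safe"):
            return parts[index + 1]
    return None
```

A result of "safe" would mean the notebook is open in playground mode, so edits made there are not saved to the workspace bucket.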
16. When should I request a preemptible machine?
Preemptible machines are only available for Spark clusters. Google provides best practices for using preemptibles in their Google Cloud article Create and use preemptible VMs.
17. How do preemptible machines work for Jupyter notebooks?
Spark will try to find another worker for the task when your machine is preempted.
In Google's documentation about preemptible VMs, Google states that preemptibles can’t "live migrate" to a regular VM. So if your preemptible machine gets taken, another preemptible machine will replace it. These machines can’t run longer than 24 hours, and Google lists other limitations in the same documentation.
18. What are the parameter recommendations for the VM?
Featured and template workspaces and notebooks will include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can estimate an initial power to try and then dial it up or down as needed. Be sure to save any generated data you want to keep before recreating the Cloud Environment.
Note: If you select the default environment or other application configuration, it is auto-populated with initial recommendations.
19. Which Docker image registries can I use to build custom cloud environments?
Terra allows you to use custom Docker images from
- Google Container Registry (GCR)
- GitHub Container Registry (GHCR)
Custom Docker images must be derived from a Terra base image
For more details, see the Docker tutorial: Custom Cloud Environments for Jupyter Notebooks.
To learn more, see the Docker section of the Terra Support documentation.
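Before entering a custom image, you can check which registry its reference points to. The sketch below classifies an image reference by its host name. The function name and example images are hypothetical, and treating the regional GCR hosts (us.gcr.io, eu.gcr.io, asia.gcr.io) as acceptable is an assumption rather than something this article confirms.

```python
# Host names for Google Container Registry, including its regional hosts.
GCR_HOSTS = {"gcr.io", "us.gcr.io", "eu.gcr.io", "asia.gcr.io"}
# Host name for GitHub Container Registry.
GHCR_HOSTS = {"ghcr.io"}


def image_registry(image):
    """Return 'GCR', 'GHCR', or None based on the image reference's host."""
    # The registry host is everything before the first slash.
    host = image.split("/", 1)[0].lower()
    if host in GCR_HOSTS:
        return "GCR"
    if host in GHCR_HOSTS:
        return "GHCR"
    return None
```

A result of None suggests the image lives in a registry other than the two listed above. This check says nothing about whether the image is derived from a Terra base image.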