This article answers frequently asked questions about how cloud environments work in Terra. If you have a question that isn't covered here, please leave a comment on the article.
Frequently Asked Questions
- Do I have a different cloud environment for each workspace?
- Can my collaborator and I use the same cloud environment?
- When using a virtual machine (VM), when are files saved and when are they not? What about packages?
- If I have a persistent disk (PD), what happens if I restart the cloud environment? What is saved?
- How does the “Application configuration” I select impact the “Compute type” field options?
- How do I know if the VM is running or not?
- How do I know if the kernel is on or not?
- How do I tell the status at any moment in time?
- What is the directory structure of the virtual machine and why is it structured this way?
- When using the virtual machine, when should I request a preemptible machine?
- How do preemptible machines work in the context of Jupyter notebooks?
- What are the parameter recommendations for the VM?
1. Do I have a different cloud environment for each workspace?
Cloud environments are created for an individual user at the project level. This means you have a unique cloud environment for each of your billing projects, and you share that environment across all workspaces in the same billing project. You can keep tabs on the cloud environments you've created for each billing project at https://app.terra.bio/#clusters.
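The scoping rule above can be sketched in a few lines of Python. This is only an illustrative model, not Terra's implementation: the Workspace type, the environment_key function, and the example names are all hypothetical.

```python
from typing import NamedTuple, Tuple


class Workspace(NamedTuple):
    """Hypothetical stand-in for a Terra workspace."""
    billing_project: str
    name: str


def environment_key(user: str, workspace: Workspace) -> Tuple[str, str]:
    """Return the (user, billing project) pair that identifies an environment.

    The workspace name is deliberately ignored: workspaces in the same
    billing project share one environment per user.
    """
    return (user, workspace.billing_project)


# Example names below are made up for illustration.
ws_a = Workspace("my-billing-project", "analysis-a")
ws_b = Workspace("my-billing-project", "analysis-b")
ws_c = Workspace("other-billing-project", "analysis-c")

# Same user, same billing project: one shared environment.
print(environment_key("alice@example.org", ws_a) == environment_key("alice@example.org", ws_b))
# Same user, different billing project: a separate environment.
print(environment_key("alice@example.org", ws_a) == environment_key("alice@example.org", ws_c))
```

Because the key includes the user, two collaborators in the same workspace would still map to two different environments under this model.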
2. Can my collaborator and I use the same cloud environment?
Since Google creates cloud environments at the individual level, collaborators cannot use the same cloud environment. Collaborators can work in the same workspace notebook, however. Even though both users have their distinct cloud environments, notebooks live in the workspace's bucket, and changes are written directly to the file.
Terra automatically protects the file from being used by multiple people simultaneously. This helps avoid collisions where two people are overwriting each other's changes. If you try to edit a notebook that's currently in use, you'll see a message telling you that the notebook is in use.
3. When using a virtual machine (VM), when are files saved and when are they not? What about packages?
Files or data that you generate will remain available while your machine is running or paused. Save any files that you want to keep to your mounted persistent disk at /home/jupyter-user/notebooks or /home/rstudio. You only need to actively check that files are saved there when you are about to delete your cloud environment, not when you pause and resume. To learn more, we recommend this article. To learn about how to copy data from a notebook to cloud bucket storage, refer to this article.
Packages you install in a notebook are stored on your persistent disk. No active saving is required.
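As a rough illustration of copying a file from the persistent disk to a cloud bucket, the sketch below builds a gsutil cp command in Python. It rests on several assumptions: that gsutil is available in the environment, that the workspace bucket is exposed through a WORKSPACE_BUCKET environment variable, and that results.csv and the notebook-outputs folder exist only as made-up examples. The build_copy_command helper is hypothetical, and the command is printed rather than executed.

```python
import os
import shlex


def build_copy_command(local_path, bucket, dest_dir="notebook-outputs"):
    """Build a `gsutil cp` command that copies one local file into a bucket folder."""
    bucket = bucket.rstrip("/")
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    file_name = os.path.basename(local_path)
    destination = "{}/{}/{}".format(bucket, dest_dir.strip("/"), file_name)
    return ["gsutil", "cp", local_path, destination]


# Assumption: the bucket is available as WORKSPACE_BUCKET; otherwise fall back
# to a placeholder name so the sketch still runs.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://example-workspace-bucket")

# Hypothetical file saved on the persistent disk.
command = build_copy_command("/home/jupyter-user/notebooks/results.csv", bucket)

# Print the command for review instead of running it.
print(" ".join(shlex.quote(part) for part in command))
```

To actually run the copy, the list could be passed to subprocess.run(command, check=True) once the paths have been checked.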
4. If I have a persistent disk (PD), what happens if I restart the cloud environment? What is saved?
Even if you don’t have a PD, your files will still be there after you pause and resume. You only need to actively save files to your detachable PD, as described in this article, when you update the cloud environment. When you opt to update, you will be prompted to choose between 1) Keep persistent disk, delete application and compute profile, or 2) Delete everything, including persistent disk. Select the first option to keep the files on your detachable PD.
5. How does the “Application configuration” I select impact the “Compute type” field options?
Depending on the application configuration you select, the compute-type dropdown field updates to recommend the type you should use. For example, the R/Bioconductor configuration recommends a standard VM with default values for CPU, memory, and persistent disk. The Hail application configuration requires Spark, so the persistent disk option is unavailable when you select it.
6. How do I know if the VM is running or not?
The status will be displayed in the Cloud Environment widget. It will display CREATING when it is getting started, RUNNING when it is running, etc.
7. How do I know if the kernel is on or not?
The circle icon at the far right (next to Python 3) is filled when the kernel is processing, and the cell being processed shows In [*]. If the circle is open, the kernel is idle. You can also hover your pointer over this circle to see the status.
8. How do I tell the status at any moment in time?
It's a combination of the answers to questions 6 and 7 above. In addition, the "Last Checkpoint" note at the top of the notebook shows when the notebook was last saved to the workspace bucket (you can save the notebook at any time by clicking the disk icon).
9. What is the directory structure of the virtual machine and why is it structured this way?
Hint: If you forget the structure, you can always check it by running the ‘ls’ command in the terminal or by clicking the Jupyter icon on the left-hand side of the notebook.
The workspace name is embedded in the structure (note this is just the name and doesn’t include the billing project) because a billing project can contain multiple workspaces in which you have used the cloud environment. The next level down is the edit or safe mode. These modes protect against overwriting when you collaborate on a notebook with others in your workspace. The edit directory saves to the workspace bucket, and safe represents playground mode. You can still execute in playground mode, but the results won’t save to the workspace bucket.
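The layout described above can be sketched as a small path helper. This is an assumption-laden illustration: it takes /home/jupyter-user/notebooks (the mount point mentioned in question 3) as the root and places the workspace name and then the mode beneath it. The notebook_dir function and the example workspace name are hypothetical, so check the real layout with ls before relying on it.

```python
from pathlib import PurePosixPath

# Assumed root: the persistent disk mount point for Jupyter environments.
NOTEBOOKS_ROOT = PurePosixPath("/home/jupyter-user/notebooks")

# "edit" saves to the workspace bucket; "safe" is playground mode.
MODES = ("edit", "safe")


def notebook_dir(workspace_name, mode):
    """Return the assumed notebook directory for a workspace and mode."""
    if mode not in MODES:
        raise ValueError("mode must be one of {}".format(MODES))
    # Only the workspace name is used; the billing project is not part of the path.
    return NOTEBOOKS_ROOT / workspace_name / mode


# Hypothetical workspace name.
print(notebook_dir("my-workspace", "edit"))
print(notebook_dir("my-workspace", "safe"))
```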
10. When using the virtual machine, when should I request a preemptible machine?
Preemptible machines are only available for Spark clusters. Google provides best practices for using preemptibles in this section of the Google Cloud guides.
Note: This means you cannot have a PD and use preemptible machines.
11. How do preemptible machines work in the context of Jupyter notebooks?
Spark will try to find another worker for the task when your machine is preempted.
Google states that preemptibles can’t "live migrate" to a regular VM. So if your preemptible machine is preempted, another preemptible machine will replace it. These machines can’t run longer than 24 hours, and Google lists other limitations in the document linked in the previous answer.
12. What are the parameter recommendations for the VM?
"Featured and template workspaces and notebooks will include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can estimate an initial power to try and then dial it up or down as needed. Just be sure to be careful to save any generated data you want to keep when recreating the cloud environment."
Note: If you select the default environment or other application configuration, it is auto-populated with recommendations.