Understanding and Customizing your Cloud Environment
Interactive applications - such as Jupyter notebooks - run on virtual machines or clusters of machines. When running an interactive application in Terra, you can adjust the configuration of your VM or cluster Cloud Environment to fit your computational needs. This article gives an overview of the components that make up your Cloud Environment and step-by-step instructions for customizing them.
Note: The term "Notebook Runtime" has been replaced with the term "Cloud Environment" in the UI.
Contents
Cloud Environment Components Overview
- Application configuration (packages and dependencies)
- Compute power
- Persistent disk
Customizing your Cloud Environment
- Warning about loss of data when recreating cloud environments
- Customizing with a Docker image - Step-by-step instructions
- Customizing with a startup script - Step-by-step instructions
- Setting a custom compute power - Step-by-step instructions
How to determine the best configuration for your interactive analysis
How to avoid data loss when changing cloud environment parameters
Cost-saving recommendations
Cloud Environment Components Overview
The Cloud Environment comprises the components of the virtual machine or cluster of machines that runs your interactive analysis application. Cloud Environments consist of 1) an application configuration, 2) cloud compute, and 3) a persistent disk.
To see your Cloud Environment configuration, click the gear icon at the top right of any workspace page to reveal the form below. To customize your VM (including clusters!), select the "Customize" button at the bottom right of the Cloud Environment form.
Application configuration
The application configuration includes the software and dependencies that are pre-installed in the cloud environment container. Terra includes several pre-configured environments for popular use-cases like Bioconductor or Hail analyses. You can customize the configuration with a Docker image or startup script.
If your analysis requires software packages that are not part of the default or pre-configured configurations, you could start your interactive application by installing the ones you need. This approach can turn into a maintenance headache if you have multiple notebooks that require the same configuration commands. Moving those software installation steps into the application configuration proper has several advantages:

- Efficiency
- Simplified setup
- Reproducibility
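As a concrete illustration, the sketch below shows the kind of setup cell that gets copied to the top of every notebook when packages aren't baked into the application configuration. The package names and version pins are purely illustrative:

```python
# Hypothetical per-notebook setup that must re-run whenever the cloud
# environment is recreated. Moving these installs into a custom Docker image
# (or startup script) makes them part of the application configuration instead.
import subprocess
import sys

for pkg in ["scikit-learn==1.4.2", "seaborn==0.13.2"]:  # illustrative pins
    subprocess.check_call([sys.executable, "-m", "pip", "install", pkg])
```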
Pre-configured options
Terra has four varieties of pre-configured application configurations - plus a custom option - available in the drop-down menu (see screenshot). The versions and libraries included in each pre-configured option are also listed in the drop-down.
- Terra-maintained Jupyter environments
- Community-maintained Jupyter environments (verified partners)
- Community-maintained RStudio environments (verified partners)
- Custom environments
- Project-specific environments
Compute power
The compute power is the CPU and RAM available to your application, which determine how much processing can be done at a time. Customizing the compute power allows you to balance cost and functionality. For example, if your analysis is running slowly, the CPUs and memory allotted may be insufficient for the computations you're doing. It may be worth the cost of increasing the compute power so your analysis completes faster.
Note that more compute power costs more, and you don't want to request (and pay for) more than your analysis needs. Featured and template workspaces and notebooks include recommended settings. To learn more about controlling cloud costs, see this documentation.
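To check what your current environment actually provides, you can inspect the VM from a notebook cell. A minimal sketch, assuming a Linux VM (the `/proc/meminfo` path is Linux-specific):

```python
# Report the CPUs and total memory visible to this cloud environment.
import os

print("CPUs:", os.cpu_count())

# Total memory, parsed from /proc/meminfo (Linux-specific).
with open("/proc/meminfo") as f:
    for line in f:
        if line.startswith("MemTotal:"):
            kb = int(line.split()[1])  # value is reported in kB
            print(f"Memory: {kb / 1024**2:.1f} GB")
            break
```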
Setting a custom compute power - Step-by-step instructions
1. Select the "Custom" profile from the drop-down menu
2. In the new form that appears, choose the specification of your primary machine. For example:
| CPUs | Memory (GB) | Disk size (GB) |
|---|---|---|
| 8 | 30 | 100 |
If you only want one virtual machine, you're done!
3. To configure as a Spark cluster (for parallel processing), first check "Configure as a Spark cluster" and fill in the values for the worker machines:
| Workers | Preemptible workers |
|---|---|
| 120 | 100 |

| CPUs | Memory (GB) | Disk size (GB) |
|---|---|---|
| 4 | 15 | 500 |
The cost of the requested compute power will show at the bottom of the form. For example, when requesting a Spark cluster, your screen will look like this:
[Screenshot: Spark cluster configuration panel showing the estimated cost]

Size your compute power appropriately. Note that Terra automatically pauses a notebook after twenty minutes of inactivity. To learn more about controlling cloud costs in a notebook, see this article.
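Once the cluster is up, you can sanity-check it from a notebook cell. A minimal sketch, assuming a Spark-enabled environment (such as the Hail configuration) where `pyspark` is available:

```python
# Confirm the notebook is attached to a Spark cluster and report its size.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext
print("Master:", sc.master)
# defaultParallelism roughly reflects the total cores across the workers.
print("Default parallelism:", sc.defaultParallelism)
```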
Persistent Disk
Your cloud environment comes with Persistent Disk (PD) storage that lets you keep files stored in your Cloud Environment even after you delete the VM or cluster. The PD can be detached prior to deletion and reattached to a newly created VM. Using it as storage lets you keep the packages your notebook code is built on, the input files necessary for your analysis, and the outputs you've generated (without having to move anything to permanent cloud storage).
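To see how much of the persistent disk you're actually using, you can check the file system from a notebook cell. A minimal sketch; `/home/jupyter` is the typical home directory in Terra Jupyter environments, but treat that path as an assumption:

```python
# Report persistent disk usage for the notebook's home directory.
import shutil

total, used, free = shutil.disk_usage("/home/jupyter")  # path is an assumption
gb = 1024 ** 3
print(f"{used / gb:.1f} GB used of {total / gb:.1f} GB ({free / gb:.1f} GB free)")
```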
Overview: Customizing your Cloud Environment
Warning: Changing the cloud environment can mean files generated or stored in the application are lost. To understand what changes you can make without losing generated data, see the section on avoiding data loss below.

Remember that your cloud environment is unique to you, and works at the Billing Project level. This means that you use the same singular cloud environment in all workspaces under the same Billing Project. If you recreate the cloud environment in one of your workspaces, you will see that change reflected when you use it in any other workspace you have under the same Billing Project.
How to customize your cloud environment
If the default or project-specific environments don't fit your needs, you can use a custom Docker image or include a startup script. Anyone using the same Docker image or startup script will have the exact same environment, which is critical for reproducibility. To learn more about developing and using custom Docker images in Terra, see these articles. Note that you can also use a custom environment to revert to a previous version of a pre-configured environment.
To adjust the virtual environment and/or compute power of your application, first click on the gear icon in the widget at the top right of your workspace:
[Screenshot: Cloud Environment widget with gear icon]

This will reveal the form with the current values of your cloud environment (see default values in the screenshot below). To make changes, click the "Customize" button at the bottom right. You can modify the cloud environment at any time, even if you've already started working in an application (i.e., a notebook). You'll see Terra's Cloud Environment configuration panel (screenshot below). Note that it is much simpler than the equivalent Google Cloud Platform interface!
You'll pull up the configuration panel, specify what you want and let the system regenerate your cloud environment with the new specifications. It's not necessary to guess up front the resources you're going to need to do your work. You can start with minimal settings, then dial them up if you run into limitations.
[Screenshot: Cloud Environment configuration panel]

The panel has three sections:
1. Application configuration
2. Cloud compute
3. Detachable Persistent Disk

Don't forget to save the configuration after changing any values. This will recreate the application compute with the new values, which can take five to ten minutes.
Setting a custom environment with a Custom Docker Image
- First, select "Custom" from the Environment drop-down menu
- Input the container image, using the format <image name>:<tag> - for example, one of Terra's public base images follows the pattern `us.gcr.io/broad-dsp-gcr-public/terra-jupyter-python:<tag>`. Note that custom environments must be based on one of the Terra Jupyter Notebook base images or a Project-Specific image
Setting a custom environment with a startup script
- Look a little further down at the "Compute Power" box, which allows you to modify the VM resource allocations
- Choose the "Custom" option from the drop-down menu
- Input the path to the startup script in the field labeled "URI" (Uniform Resource Identifier, a close cousin of the URL, or Uniform Resource Locator)
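The startup script must live at a URI the environment can reach; a common pattern is to stage it in the workspace bucket first. A minimal sketch, assuming a local `startup.sh` (hypothetical) and the Terra-provided `WORKSPACE_BUCKET` environment variable:

```python
# Stage a startup script in the workspace bucket so its gs:// URI can be
# pasted into the configuration panel's URI field.
import os
import subprocess

bucket = os.environ["WORKSPACE_BUCKET"]   # set by Terra in notebook VMs
uri = f"{bucket}/startup.sh"              # "startup.sh" is hypothetical
subprocess.check_call(["gsutil", "cp", "startup.sh", uri])
print("Paste this URI into the startup script field:", uri)
```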
How to avoid data loss when changing the interactive analysis runtime
Notebooks are wonderful for interactive data analysis, but a few quirks can cause you to lose work if you're not careful! The key issue is that files generated by the notebook are not automatically saved to the workspace bucket. Because the disk associated with the notebook runtime is deleted when you delete (or make certain changes to) a cluster, you will lose installed packages and output data generated in a notebook unless you explicitly save your output to the workspace bucket first.
You will not lose your data if you pause (stop) a cluster, since the cluster goes away but the cloud environment disk does not. In fact, when you re-open your notebook, the cluster is created more quickly because the disk does not need to be recreated. As an added bonus, you do not need to reinstall your software.
What real-time updates can you make to Notebook compute resources without losing data?
- You can increase or decrease the number of CPUs or the memory. During this update, the Notebook runtime will stop, update, and then restart. The update will take a couple of minutes to complete, and you will not be able to continue editing or running the Notebook while it's in progress.
- You can increase the disk size or change the number of workers (when the number of workers is > 2). During this update, you can continue to work in your Notebook without stopping your runtime. When the update is finished, you will see a confirmation banner.
Note that if you want to simultaneously change both the workers and CPU/memory, we advise doing this sequentially by first updating the CPUs/memory, waiting for the Notebook runtime to restart, and then adjusting the workers.
Any other runtime changes (e.g., decreasing the disk size or changing the environment type) require deleting the existing runtime and creating a new one; any generated data files and installed packages will be lost. Please back up files as appropriate.
How to save interactive analysis outputs to workspace bucket
To avoid losing your data, make sure to explicitly save your outputs to the workspace bucket. You can find step-by-step instructions and the exact code to run within a notebook below.
Python kernel instructions
1. Set the environment variables

```python
import os

BILLING_PROJECT_ID = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE = os.environ['WORKSPACE_NAME']
bucket = os.environ['WORKSPACE_BUCKET']
```
2. Copy all files in the notebook into the workspace bucket

```python
!gsutil cp ./* $bucket

# Run list command to verify the files are in the bucket
!gsutil ls $bucket
```
Note: the bucket is a Google Cloud Storage bucket, so commands that interact with it must go through `gsutil` (run from a notebook cell with the `!` prefix). These commands will only work if you have run the cell above to set the environment variables. Once you execute these cells, the data files should be visible in the workspace bucket. If you want to copy individual files, replace `*` with the file name.
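For example, to copy a single output file rather than everything in the directory (the file name here is hypothetical):

```python
# Copy one specific file to the workspace bucket.
# "results.csv" is a hypothetical name - substitute your own output file.
!gsutil cp results.csv $bucket
```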
R kernel instructions
1. Set the environment variables

```r
project <- Sys.getenv('WORKSPACE_NAMESPACE')
workspace <- Sys.getenv('WORKSPACE_NAME')
bucket <- Sys.getenv('WORKSPACE_BUCKET')
```
2. Copy all files in the notebook into the workspace bucket

```r
# Copy all files generated in the notebook into the bucket
system(paste0("gsutil cp ./* ", bucket), intern=TRUE)

# Run list command to see if the files are in the bucket
system(paste0("gsutil ls ", bucket), intern=TRUE)
```
Note: the bucket is a Google Cloud Storage bucket, so commands that interact with it must go through `gsutil` (called from R via `system()`). These commands will only work if you have run the cell above to set the environment variables. Once you execute these cells, the data files should be visible in the workspace bucket. If you want to copy individual files, replace `*` with the file name.