Large-scale analyses, such as training a machine learning model, are often more efficient when run on a Cloud Environment that uses a graphics processing unit (GPU). Terra supports GPUs for Jupyter Notebook Cloud Environments. This feature is currently in beta; this article outlines general limitations and breaks down cost estimates for the available configurations.
To learn more about GPUs on Terra, check out Speed up your machine learning work with GPUs.
Background
Graphics Processing Units (GPUs) are a useful tool when running large-scale machine learning analyses in a Terra notebook. All of Terra's interactive cloud environment Docker images support GPUs because they extend a base Docker image built on Google's Deep Learning platform. This platform includes packages for the CUDA parallel computing platform, the TensorFlow machine learning platform, and the PyTorch machine learning framework, and the image also installs the NVIDIA drivers necessary for GPU support.
How to add GPUs to your cloud environment
To take advantage of GPUs in Terra, add them to your cloud environment's compute configuration.
1. Navigate to a workspace where you have Compute access.
2. Open your workspace's Cloud Environment Configuration menu.
If you're starting up a new Cloud Environment, do this by clicking the cloud icon with a lightning bolt inside it on the right-hand panel, then selecting Settings in the Jupyter section. GPUs are available for use with Jupyter notebooks, but not with RStudio or Galaxy analyses.
If you're modifying an existing environment, click on the Jupyter icon on the right-hand panel.
3. Select your desired cloud environment settings. To learn more about these settings, read Your Interactive Analysis VM (Cloud Environment).
4. Enable GPUs: if you're starting up a new Cloud Environment (the cloud icon), click the Enable GPUs checkbox in the "Cloud compute profile" section.
To add GPUs to an existing environment, delete the environment first. If you already have an environment, the checkbox will be unavailable until you delete that environment manually, either by clicking the "delete" button at the bottom of the Cloud Environment widget or from the section of your profile that lists your cloud environments. Once the environment is deleted, the GPU checkbox becomes available when you create a new one.
This also applies if you want to modify the GPU configuration of an existing environment: to increase GPU power or change the GPU type, for instance, you must delete the existing environment and re-create it with the desired configuration.
How to check that you successfully enabled GPUs
You can check that you successfully enabled GPUs, and that the libraries relevant to many machine learning analyses are installed, by running the code snippets below in a Jupyter notebook. Note that these commands will only succeed if you checked the Enable GPUs checkbox.
Note: this code will only work if you've set up your Cloud Environment using an Application Configuration that includes PyTorch (e.g., Pegasus). Check whether PyTorch is included in your Application Configuration by clicking What's installed on this environment? under the Application Configuration drop-down menu in the Cloud Environment setup menu.
import torch
print(torch.cuda.is_available())    # True if a GPU is available
print(torch.version.cuda)           # CUDA version PyTorch was built with
print(torch.cuda.current_device())  # index of the active GPU (usually 0)
print(torch.cuda.get_device_name()) # name of the GPU device
If PyTorch is installed and GPUs are enabled, the output shows True, followed by the CUDA version, the device index, and the device name.
Note: this code will only work if you've set up your Cloud Environment using an Application Configuration that includes TensorFlow. Check whether TensorFlow is included in your Application Configuration by clicking What's installed on this environment? under the Application Configuration drop-down menu in the Cloud Environment setup menu.
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))  # lists detected GPU devices
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())         # detailed list of all local devices
If TensorFlow is installed and GPUs are enabled, the first print shows a non-empty list of physical GPU devices, and the second lists all local devices, including the GPU.
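If you're not sure which framework your Application Configuration includes, the two checks above can be combined into one framework-agnostic snippet that reports GPU availability for whichever libraries are importable. This is a minimal sketch; the function name check_gpu_support is illustrative, not part of Terra or either library.

```python
def check_gpu_support():
    """Report GPU availability for PyTorch and TensorFlow, if installed.

    Returns a dict mapping framework name to True/False (GPU visible or not),
    or None if that framework is not installed in this environment.
    """
    results = {}
    try:
        import torch
        results["torch"] = torch.cuda.is_available()
    except ImportError:
        results["torch"] = None  # PyTorch not installed in this environment
    try:
        import tensorflow as tf
        results["tensorflow"] = len(tf.config.list_physical_devices("GPU")) > 0
    except ImportError:
        results["tensorflow"] = None  # TensorFlow not installed
    return results

print(check_gpu_support())
```

On a correctly configured GPU environment, both installed frameworks should report True; a False for an installed framework usually means the Enable GPUs checkbox was not selected when the environment was created.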
GPU Limitations
- GPUs can be used on Terra with Jupyter Notebooks (e.g., for TensorFlow), but not with Galaxy or RStudio.
- As with other interactive analysis compute resources in Terra, only the n1 family of machines is supported.
- Terra does not support updating an existing machine's GPU configuration. If you need to modify your GPU-enabled machine, you need to delete and recreate the Cloud Environment.
- You may experience a runtime creation failure in one of the following circumstances:
- You run up against your quota limit. To find out how to check and increase your quotas to fix the issue, see How to troubleshoot and fix stalled workflows.
- You see a ZONE_RESOURCE_POOL_EXHAUSTED error. You can either wait a day or two and try again, or use Swagger to create a GPU-enabled virtual machine in another zone within the us-central1 region via the API. The default zone is "us-central1-a", so change the "zone" parameter to one of the other available zones.
- Terra only supports GPU use with the standard VM, so make sure you don't select a Spark compute type (or a Hail image).
How to estimate the cost of your GPU configuration
Each GPU type limits how many GPUs are available for a given number of CPUs and amount of memory. As a result, each GPU configuration has a different cost. To estimate the cost of your configuration, read more about GPUs on Compute Engine and GPU pricing in the Google Cloud Compute Engine documentation.
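As a back-of-the-envelope illustration, the hourly cost of a configuration is roughly the sum of its per-resource rates: vCPUs, memory, and GPUs. The rates below are placeholders invented for this sketch, not current Google Cloud prices; always check the GPU pricing page for real numbers.

```python
# Hypothetical per-hour rates in USD (illustrative only -- check GCP pricing
# documentation for actual, region-specific values).
CPU_PER_VCPU_HOUR = 0.035
RAM_PER_GB_HOUR = 0.005
GPU_PER_HOUR = {
    "nvidia-tesla-t4": 0.35,
    "nvidia-tesla-v100": 2.48,
}

def hourly_cost(vcpus, ram_gb, gpu_type, gpu_count):
    """Estimate the hourly cost of a GPU-enabled cloud environment."""
    return (vcpus * CPU_PER_VCPU_HOUR
            + ram_gb * RAM_PER_GB_HOUR
            + gpu_count * GPU_PER_HOUR[gpu_type])

# Example: 8 vCPUs, 30 GB RAM, and one T4 GPU
print(round(hourly_cost(8, 30, "nvidia-tesla-t4", 1), 3))  # → 0.78
```

Note that this only estimates the VM's runtime cost; persistent disk storage and egress are billed separately.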