Compute environments allow you to launch different Terra-supported applications like Jupyter Notebooks or RStudio from your workspace. From your workspace cloud environments menu, you can select one of multiple pre-configured compute environments, including Terra-maintained Jupyter environments, Community-maintained environments, or Project-specific environments. This document describes project-specific environments and how to use them in Terra.
- What are project-specific environments?
- Current project-specific environments
- Accessing a project-specific environment in Terra
- Saving work generated in a project-specific environment
What are project-specific environments?
Terra supports multiple genomic research projects developed by different consortia (e.g., AnVIL, BioData Catalyst, and Firecloud). Each of these consortia requires different computational environments to analyze its genomic datasets. Project-specific environments allow the consortia to run specific applications (Jupyter Notebooks, Bioconductor, RStudio, etc.) in their Terra workspaces.
Current project-specific environments
The following section lists project-specific environments for Terra-supported applications; this list will be updated as new project-specific environments become available.
- AnVIL RStudio (see the AnVIL Docker documentation for details).
- Pegasus, a Python package for analyzing and visualizing large single-cell transcriptomes (find the Pegasus Jupyter Image here).
Accessing a project-specific environment in Terra
You can access a project-specific environment by selecting "Project-Specific Environment" from the Environment drop-down menu of the workspace Cloud Environment box. The following steps will guide you through this process.
1. Go to your Terra workspace, select the Cloud Environment widget in the upper right corner of the workspace, and click the "Customize" button near the bottom.
2. From the Application Configuration drop-down menu, select "Project-specific Environment".
3. In the URL text box of the same Cloud Environment box, paste the path to the project-specific image. For example, for the Bioconductor image, paste "us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:0.0.7".
4. Select "Create" at the bottom of the Cloud Environment window.
5. A warning box will appear to inform you that the Docker image is unverified. Select "Create". The application will then take a few minutes to load.
6. When your application is ready, a screen will appear stating that your new cloud environment is ready to use. Select "Apply".
7. A message will appear in the upper right corner asking if you would like to launch. Select "Launch Application" to open the application.
8. The application will open in the workspace. Your Notebook's Cloud Environment box will indicate that the application is running.
9. To stop the application, press the pause button on the Notebook's Cloud Environment box.
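Before pasting an image path in step 3, it can help to check that the string has the usual Docker shape of registry, repository, and tag. The following is a minimal Python sketch of such a check, using the example path from step 3. The helper name `parse_image_path` and the requirement that a tag be present are assumptions made for illustration; the check only inspects the string and does not contact the registry or confirm that Terra will accept the image.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    repository: str
    tag: str


def parse_image_path(path: str) -> ImageRef:
    """Split 'registry/repository:tag' into its three parts (hypothetical helper)."""
    path = path.strip().strip('"')
    # The tag is whatever follows the last colon; a slash there means the
    # colon belonged to a registry port, so no tag was given.
    name, sep, tag = path.rpartition(":")
    if not sep or not tag or "/" in tag:
        raise ValueError(f"image path has no tag: {path!r}")
    # The registry is everything before the first slash.
    registry, slash, repository = name.partition("/")
    if not slash or not registry or not repository:
        raise ValueError(f"image path has no registry: {path!r}")
    return ImageRef(registry, repository, tag)


ref = parse_image_path("us.gcr.io/broad-dsp-gcr-public/terra-jupyter-r:0.0.7")
print(ref.registry)    # us.gcr.io
print(ref.repository)  # broad-dsp-gcr-public/terra-jupyter-r
print(ref.tag)         # 0.0.7
```

A path that is missing its tag or registry raises `ValueError`, which is a cue to re-copy the path from the image's documentation.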
Saving work generated in a project-specific environment
If you launch any application other than Jupyter Notebooks, your work will not be saved to your workspace. Your code is saved on the cloud environment, but if you delete the cloud environment (or if your runtime becomes unresponsive), you may lose that code.
You can save files and code generated using a project-specific environment by 1) copying them to your workspace Google bucket, 2) downloading them from your workspace bucket to your local computer, and 3) checking the code into GitHub.
1. Copying work to a workspace Google bucket
Use the gsutil tool to copy files to your workspace Google bucket. Before copying files, you must first identify the URL for the workspace Google bucket. This information is found in the Dashboard's Workspace Information panel.
Next, to copy all files generated after completing your work in any application, use the bash command shown below in either the application console or the terminal. To copy an individual file, replace `*` with the name of that file. If you are using the workspace terminal, first navigate to the folder containing the files.
gsutil cp ./* gs://<WORKSPACE_BUCKET>
For example, if the Google bucket ID is 'fc-7da2e5e7-d21f-4f78-90c2-27fb2414086e', type the following command:
gsutil cp ./* gs://fc-7da2e5e7-d21f-4f78-90c2-27fb2414086e
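If you would rather script the copy, the same gsutil command can be assembled and run from Python. This is a minimal sketch, not the method from the article referenced in this section: the helper names (`build_copy_command`, `list_files`, `copy_to_bucket`) and the file names `results.csv` and `plot.png` are hypothetical, and it assumes gsutil is installed and on the PATH of the cloud environment. Because no shell is involved, the sketch expands the `*` pattern itself and passes only regular files to gsutil.

```python
import glob
import os
import subprocess


def build_copy_command(bucket_id: str, files: list[str]) -> list[str]:
    """Return the gsutil argument list that copies `files` to the bucket."""
    if not files:
        raise ValueError("no files to copy")
    # Accept either a bare bucket ID or a full gs:// URL.
    if bucket_id.startswith("gs://"):
        bucket_id = bucket_id[len("gs://"):]
    bucket = bucket_id.rstrip("/")
    return ["gsutil", "cp", *files, f"gs://{bucket}/"]


def list_files(folder: str = ".") -> list[str]:
    """Expand folder/* in Python, keeping regular files and skipping directories."""
    pattern = os.path.join(folder, "*")
    return sorted(p for p in glob.glob(pattern) if os.path.isfile(p))


def copy_to_bucket(bucket_id: str, folder: str = ".") -> None:
    """Run the copy; raises CalledProcessError if gsutil exits with an error."""
    subprocess.run(build_copy_command(bucket_id, list_files(folder)), check=True)


# Show the command for two hypothetical files without running it.
cmd = build_copy_command(
    "fc-7da2e5e7-d21f-4f78-90c2-27fb2414086e",
    ["./results.csv", "./plot.png"],
)
print(" ".join(cmd))
```

Calling `copy_to_bucket("fc-7da2e5e7-d21f-4f78-90c2-27fb2414086e")` from the folder that holds your files would then perform the actual copy.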
More details on how to copy files into a Google bucket using Python or R commands can be found in the article "Copying notebook output to a Google bucket".
2. Downloading files from a workspace Google bucket
Once your files are copied to a workspace Google bucket, you can access them by selecting the Data tab of the workspace and choosing the Files option on the bottom left.
This will display the files available in your Google bucket. Selecting a file downloads it directly. Additionally, this Terra support document details alternative techniques you can use to download data files.
3. Checking code into GitHub
If you are a Git-savvy researcher using the AnVIL RStudio application, you can also save your work by installing Git in RStudio and checking your code into GitHub.
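A typical check-in stages the changed files, commits them, and pushes the commit to GitHub. The Python sketch below assembles those three standard git commands. It assumes that Git is installed, that the repository has already been cloned into the cloud environment with GitHub credentials configured, and that the remote is named `origin` with a branch named `main`. The helper names and the file `analysis.R` are hypothetical.

```python
import subprocess


def build_git_commands(files: list[str], message: str, branch: str = "main") -> list[list[str]]:
    """Return the git commands that stage, commit, and push `files`."""
    if not files:
        raise ValueError("no files to commit")
    return [
        ["git", "add", "--", *files],       # stage the listed files
        ["git", "commit", "-m", message],   # record them with a message
        ["git", "push", "origin", branch],  # send the commit to GitHub
    ]


def check_in(files: list[str], message: str, branch: str = "main", repo_dir: str = ".") -> None:
    """Run each command inside the repository, stopping at the first failure."""
    for command in build_git_commands(files, message, branch):
        subprocess.run(command, cwd=repo_dir, check=True)


# Show the commands for a hypothetical script without running them.
for command in build_git_commands(["analysis.R"], "Add analysis script"):
    print(" ".join(command))
```

Because `check_in` uses `check=True`, a failed commit (for example, when nothing has changed) stops the sequence before the push.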
Want to learn more about Docker environments?
If you are a developer or a researcher interested in learning more about building Docker environments, we have additional documentation on Working with Dockers, including (but not limited to!):
- Docker/container overview
- Docker tutorial: Custom cloud environments for Jupyter Notebooks
- Install Docker and test that it works