With Terra's terminal interface you can run UNIX command-line code within the cloud environment that runs notebooks in a workspace. This lets you list files, move files between the notebook disk and a Google bucket, and install packages quickly, without having to put these actions into code cells.
This article covers the basics of what you can do with Terra's command-line terminal interface and how to do it.
Note that we recommend using this capability sparingly, largely because it runs counter to the underlying purpose of Jupyter Notebooks, which is to capture every meaningful action taken during an analysis. Listing directory contents may not be meaningful in that context, but installing a package or importing data may play an essential role. Omitting such actions from the notebook record could create missing links that break the reproducibility of the work.
Why use the terminal in Terra? Some use cases
Moving data, including controlled-access data, into a Terra workspace
There are times when you need to get large datasets into your workspace but can't use the platform interface.
- The data you need is not available with an "Export to Terra" option (for example, the most recent BioData Catalyst snapshots are in the cloud, but cannot be accessed through the Terra or Gen3 interface)
- Data are stored on an external FTP server
- Data are in local (on-prem) storage
- Data are in a Google bucket not associated with a Terra workspace
In these cases and more, Terra's terminal is the perfect way to move files into your workspace. You can use gsutil in the terminal to move files between the Cloud Environment and your workspace bucket as well as to move files from other Google Cloud buckets into your workspace bucket or Cloud Environment directly.
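As a minimal sketch of the bucket-to-bucket case described above: the function name and both bucket paths are hypothetical placeholders, and the example assumes gsutil is on the PATH, as it is in the Terra terminal.

```shell
# Copy a folder from another Google bucket into the workspace bucket.
# $1: source gs:// URI, $2: destination gs:// URI
copy_into_workspace() {
  # -m runs transfers in parallel; -r recurses into the source "folder"
  gsutil -m cp -r "$1" "$2"
}

# Example call (placeholder bucket names; replace with your own):
# copy_into_workspace gs://some-external-bucket/dataset gs://fc-your-bucket-name-here/
```

To copy into the Cloud Environment disk instead, pass a local directory such as ./data/ as the destination.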
Speedy testing using the terminal
When you want to test code quickly, you can run Python, Spark, or R scripts directly from the terminal. You can do this from any page in your workspace by selecting the terminal icon (>_) from the widget at the top right, as long as the Cloud Environment is running.
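For instance, a quick throwaway check might look like the sketch below. The file name quick_test.py is made up for illustration, and the example assumes python3 is on the PATH; an R script could be run the same way with Rscript.

```shell
# Write a tiny script to the current directory, then run it from the terminal
cat > quick_test.py <<'EOF'
values = [1, 2, 3, 4]
print("sum:", sum(values))
EOF

python3 quick_test.py
```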
How to access the terminal from a workspace
Look at the top right corner of any workspace page to find the terminal icon (>_), to the left of the play or pause button. Click it to open what resembles a UNIX terminal in a separate browser tab.
You will need to launch a Cloud Environment first if one is not already running, as this is the virtual machine the terminal runs on as well. Note that since you are already authenticated in the terminal as your pet service account, there is no additional authentication required.
From here, you can perform command-line tasks as you would in a Jupyter notebook. Unlike in a notebook, commands run here are not recorded, so the work is not reproducible, but the terminal may be quicker depending on your needs.
Each user has a unique Cloud Environment in each workspace, and that environment hosts your terminal instance. To learn more about your Cloud Environment and how to customize it, see this article.
Terminal/bash shell basics
How to change your terminal prompt
The default terminal prompt can be long. You can shorten it to something more manageable (such as "userX") by setting the shell's prompt variable, PS1.
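The original example is not reproduced here, so the following is only a sketch. It assumes the terminal runs bash, where the prompt is controlled by the PS1 variable; "userX" is the placeholder name from the text.

```shell
# Replace the prompt with a short label, the current directory, and "$"
# \w expands to the working directory; \$ prints "$" (or "#" for root)
PS1='userX:\w\$ '
```

The change lasts only for the current session. Appending the same line to ~/.bashrc is one common way to make it persist, assuming that file is read when the terminal starts.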
Common bash commands
- Standard UN*X commands: pwd, ls, cd, grep, sed, awk, …
- Edit files: vi/vim, nano
- Copy files to/from Google storage buckets: gsutil
- Add structured data to BigQuery: bq
- Add/update/download Terra data tables: fissfc # FireCloud/FISS
- Download files using HTTPS: wget
- Perform HTTP web operations: curl
- Git/GitHub access: git clone https://github.com/my_org/my_repo.git
- FTP: ftp # Not installed by default
- SSH: ssh # Not installed by default
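Because some of these tools are not installed by default, it can help to check what is available before relying on one. A small sketch (the helper name check_tool is hypothetical):

```shell
# Report whether a command is available on the PATH
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: available"
  else
    echo "$1: not installed"
  fi
}

# Check a few of the tools from the list above
for tool in gsutil wget curl git ftp ssh; do
  check_tool "$tool"
done
```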
Useful environment variables defined by default in the terminal
Represents the parent folder of the workspace (the Terra Billing project). Use this variable for making Terra API calls via AnVIL or FISS.
Represents the actual Google project assigned to the workspace. Use this variable for direct cloud calls such as GCS or BigQuery.
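A short sketch of how these variables might be used. It assumes they are exported as WORKSPACE_NAMESPACE (the Terra Billing project) and GOOGLE_PROJECT (the Google project); run printenv to confirm the names in your own environment. The describe helper is hypothetical.

```shell
# Print a label and a value, flagging variables that are not set
describe() {
  echo "$1: ${2:-<not set>}"
}

describe "Terra billing project" "${WORKSPACE_NAMESPACE:-}"
describe "Google project" "${GOOGLE_PROJECT:-}"

# Direct cloud calls can then reference the Google project, for example:
# gsutil ls -p "$GOOGLE_PROJECT"            # list buckets in the project
# bq --project_id="$GOOGLE_PROJECT" ls      # list BigQuery datasets
```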
Moving and copying data using the terminal: detailed instructions
The notebook cloud environment has detachable persistent storage (which you can learn about here), but you'll want to know how to copy files from your notebook's virtual machine (VM).
For example, any time you “delete environment options” you could lose all data you saved to that cloud environment VM disk, if you haven't selected the option to keep your persistent disk. Additionally, workflows cannot call data that is within the notebook VM. Therefore, you should move files you want to use for downstream analyses, or keep for an extended period of time, from the VM to your Workspace bucket.
Because Terra’s terminal shares a VM with Jupyter notebooks, you can use the workspace terminal to access any files generated by commands in your Jupyter notebook. To move files to or from the workspace Google bucket, follow the steps below:
1. Find the path for your workspace Google bucket: check the right-hand column of the workspace dashboard. You will see a link under the heading Google Bucket that you can copy by clicking on the clipboard icon.
2. Open Terra’s terminal by clicking the terminal icon (>_) from the widget at the top right while the Cloud Environment is running.
3. Type in ls and hit enter to verify that your files are indeed in your current working directory.
4. To move a file in your terminal VM called README.txt to your workspace Google Bucket (fc-your-bucket-name-here), use the following command:
gsutil cp README.txt gs://fc-your-bucket-name-here/
5. To move all files, you can substitute '*' for the file name in the above command.
6. To create a pseudofolder with README.txt in it, append the desired pseudofolder name to the gs URI of your bucket. If this pseudofolder already exists, this command will place the file in the existing pseudofolder.
gsutil cp README.txt gs://fc-your-bucket-name-here/pseudofolder/
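The steps above copy files from the VM to the bucket. Going the other direction uses the same command with the arguments swapped. A sketch, reusing the placeholder bucket name from the steps; README_copy.txt is a made-up local file name:

```shell
BUCKET="gs://fc-your-bucket-name-here"   # placeholder; use your workspace bucket

if command -v gsutil >/dev/null 2>&1; then
  # List the bucket contents to confirm the upload arrived
  gsutil ls "$BUCKET/"
  # Copy the file from the bucket back to the current directory on the VM
  gsutil cp "$BUCKET/README.txt" ./README_copy.txt
else
  echo "gsutil not found; run this inside the Terra terminal"
fi
```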
To learn how to use the terminal to avoid losing data stored or generated in a notebook, see this article.
For additional instructions on moving or copying data from a notebook cloud environment to a Google bucket using gsutil or ftp, see this article.
Terminal limitations and troubleshooting
Moving files from public, non-gs servers
Terra's terminal does not support wget or ftp. If you are trying to move files from an online resource that hosts files publicly, such as an NIH FTP server, you will need to download the files using rsync.
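A rough sketch of the rsync approach, which only applies if the remote host publishes an rsync endpoint. The source URL, destination folder, and function name below are hypothetical placeholders, not a real dataset location.

```shell
# Placeholder source; substitute the rsync URL published by the data host
SRC="rsync://example.org/pub/dataset/"
DEST="./dataset/"

# $1: remote rsync URL, $2: local destination directory
fetch_public() {
  # -a preserves the directory tree and timestamps, -v lists files,
  # -P keeps partially transferred files and shows progress
  rsync -avP "$1" "$2"
}

# Example call, once SRC points at a real rsync endpoint:
# fetch_public "$SRC" "$DEST"
```

Once the download finishes, the files sit on the VM disk and can be copied to the workspace bucket with gsutil as described earlier.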
sudo/root access is currently not supported
We are constantly improving functions in Terra. We will let you know if this becomes an option.
How to copy-paste in a notebook terminal
You will not be able to use keyboard shortcuts to paste into Terra’s terminal. Instead, use right-click and select paste.
What to do when the terminal stops responding
Terra’s terminal will sometimes hiccup if you change browser tabs, due to the way that some browsers handle tab-switching and manage memory. It is best to avoid switching tabs away from the terminal window if it’s currently doing something. If it does hang, refresh the page.