In Terra, the terminal interface lets you execute UNIX command-line code quickly within the cloud environment that runs Jupyter Notebooks in a workspace. The terminal lets you list and move files to and from the notebook disk and a Google bucket, and install packages quickly - without having to put these actions into code cells.
This article covers the basics of what you can do with Terra's command-line terminal interface and how to do it.
Optimal use of the terminal in Terra
Note: We recommend using this capability sparingly, because it runs counter to the underlying purpose of Jupyter Notebooks, which is to capture every meaningful action taken during an analysis. Listing directory contents may not be meaningful in that context, but installing a package or importing data may play an essential role. Omitting such actions from the notebook record could create missing links that break the reproducibility of the work.
Why use the terminal in Terra?
Below are examples of actions that can be more quickly or easily done using command line operations in the workspace terminal.
Moving data, including controlled-access data, into a Terra workspace
Sometimes you need to get large datasets into your workspace storage (Google bucket or Cloud Environment persistent disk (PD)) but can't use the platform interface (e.g., the Upload button in the Data tab). Some examples include:
- The data you need is not available with an "Export to Terra" option (for example, the most recent BioData Catalyst snapshots are in the cloud, but cannot be accessed through the Terra or Gen3 interface).
- Data are stored on an external FTP server.
- Data are in local (on-premises) storage.
- Data are in a Google bucket not associated with a Terra workspace.
The terminal in Terra is the perfect way to move files into your workspace - for all of these cases and more. For example, you can use gsutil in the terminal to do the following:
- Move files between the Cloud Environment (Persistent Disk) and your workspace bucket.
- Move files from an external Google Cloud bucket into your workspace bucket.
- Move files from an external Google Cloud bucket into your Cloud Environment directly.
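As a minimal sketch of those three cases, the commands below use gsutil cp. The bucket names and file paths are placeholders (assumptions, not real locations), and the run helper is a hypothetical convenience that only prints each command unless DRY_RUN=0 is set, so you can review the commands before anything is copied.

```shell
# Placeholder names (assumptions): replace with your own buckets and files.
BUCKET="gs://fc-your-bucket-name-here"      # your workspace bucket
EXTERNAL="gs://some-external-bucket"        # hypothetical external bucket

# Hypothetical helper: print each command, and execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# 1. Persistent disk -> workspace bucket
run gsutil cp results.tsv "$BUCKET/results/"

# 2. External bucket -> workspace bucket
run gsutil cp "$EXTERNAL/data/sample.txt" "$BUCKET/data/"

# 3. External bucket -> persistent disk (current directory)
run gsutil cp "$EXTERNAL/data/sample.txt" .
```

Note that gsutil cp leaves the source file in place. If you want the source removed after the transfer, gsutil mv takes the same arguments.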
Speedy testing using the terminal
When you want to test quickly, run Python, Spark, or R scripts directly from the terminal if you already have a Jupyter Environment VM running (see step-by-step instructions below).
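For instance, a quick test might look like the sketch below. The script name hello_terra.py is hypothetical, and the sketch assumes python3 is on the PATH of your Cloud Environment.

```shell
# Write a tiny throwaway script (hypothetical file name).
cat > hello_terra.py <<'EOF'
import sys
print("Python", sys.version_info.major, "is working")
EOF

# Run it directly from the terminal, without opening a notebook.
python3 hello_terra.py

# An R script would be run the same way, assuming Rscript is installed:
#   Rscript my_analysis.R
```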
How to access the terminal from a workspace
To access the workspace terminal, you must launch a Jupyter Cloud Environment first if one is not already running, as the terminal runs on this virtual machine. Note: Because you are already authenticated in the terminal as your pet service account, no additional authentication is required.
In the top right corner of any workspace page, click the (>_) icon (to the left of the play or pause button) to start the terminal.
It is similar to a Unix terminal and will open in a separate browser tab.
Go to the Analyses tab of your workspace and follow the instructions below. If you already have a Jupyter Environment running, start with step 4.
1. Click the cloud icon in the sidebar (right-hand side) to access the Cloud Environment Details pane.
2. Click the gear icon under the Jupyter logo to start a notebook VM.
3. Click the create button to spin up a default Jupyter Environment.
4. When the VM is running (2-3 minutes), the dot under the Jupyter logo in the right sidebar will turn green.
Once that happens, you can click on the >_ icon in the sidebar to start the terminal. It is similar to a Unix terminal and will open in a separate browser tab.
To access the built-in terminal from RStudio, go to Tools > Terminal > New Terminal from the top menu.
You'll see your terminal shell right in RStudio.
From the terminal prompt, you can perform the same command-line tasks you would run in a Jupyter Notebook. Unlike commands in a notebook, tasks performed here are not recorded, so they are harder to reproduce, but the terminal may be quicker depending on your specific needs.
Each user in each workspace has a unique Cloud Environment, which hosts the terminal instance. To learn more about your Cloud Environment and how to customize it, see Your workspace Cloud Environment.
Terminal/bash shell basics
How to change your terminal prompt
The shell prompt defaults to user-name@hostname.
You can shorten this to a more reasonable length (like "userX") by setting the PS1 variable in bash.
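A minimal sketch of that change, assuming the default bash shell: PS1 is bash's primary prompt variable, and userX is a placeholder name. Whether ~/.bashrc is read by new terminal sessions in your Cloud Environment is an assumption to verify.

```shell
# Show only a short, fixed name followed by "$ " (userX is a placeholder).
PS1='userX$ '

# To keep the change in new terminal sessions, you could append it to ~/.bashrc
# (assumes your environment reads that file on startup):
#   echo "PS1='userX\$ '" >> ~/.bashrc
```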
Common bash commands
The default shell in Terra is bash. Some common bash commands include:
- Standard UNIX commands: pwd, ls, cd, grep, sed, awk, …
- Edit files: vi/vim, nano
- Copy files to/from Google storage buckets: gsutil
- Add structured data to BigQuery: bq
- Add/update/download Terra data tables: fissfc # FireCloud/FISS
- Download files using HTTPS: wget
- Perform HTTP web operations: curl
- Git/GitHub access: git clone https://github.com/my_org/my_repo.git
- FTP: ftp # Not installed by default
- SSH: ssh # Not installed by default
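Because some of these tools are not installed by default, it can help to check what is available before relying on one. The sketch below uses the shell built-in command -v; the check_tools function name is hypothetical.

```shell
# Report whether each named tool is on the PATH (check_tools is a hypothetical helper).
check_tools() {
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: available"
    else
      echo "$tool: not found"
    fi
  done
}

check_tools gsutil fissfc wget curl git ftp ssh
```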
Useful environment variables defined by default in the terminal
To see the full list of environment variables defined by default in the terminal, use the env command.
Represents the parent folder of the workspace (the Terra Billing project). Use this variable for making Terra API calls via AnVIL or FISS.
Represents the actual Google project assigned to the workspace. Use this variable for direct cloud calls such as GCS or BigQuery.
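To inspect these values, a minimal sketch follows. env is the standard command for printing the environment. The variable names WORKSPACE_NAMESPACE and GOOGLE_PROJECT are assumptions based on the two descriptions above; confirm them against the env output in your own terminal.

```shell
# List every environment variable, sorted by name.
env | sort

# Print the two variables described above, or a fallback if one is not set.
# The names below are assumptions; check them against the list printed above.
echo "Billing project: ${WORKSPACE_NAMESPACE:-<not set>}"
echo "Google project:  ${GOOGLE_PROJECT:-<not set>}"
```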
How to move and copy data using the terminal
Jupyter and RStudio Cloud Environments have dedicated persistent storage (learn more in Detachable Persistent Disks). However, there are times you may want to copy files from your app's virtual machine (VM) to the workspace bucket (or external bucket).
Three reasons to copy data from the PD to your workspace bucket
- To share with collaborators
Since your Cloud Environment is unique to each user, if you want collaborators to have access to data generated in an interactive analysis (for further analysis), you need to copy it to the workspace bucket.
- To use as input for a workflow
Workflows cannot access data in the PD. Therefore, move files you want to use for downstream analyses, or keep for an extended period of time, from the VM to your Workspace bucket.
- To preserve generated data
If you “delete all environment options including the Persistent Disk” you will lose all generated data you saved. Note: You can always select the option to keep your persistent disk. Or, you may want to move generated data to less expensive Nearline or Coldline storage in an external bucket.
Because the terminal in Terra shares a VM with Jupyter Notebooks and RStudio, you can easily access data files generated in a notebook or RStudio analysis with the workspace terminal. To move to or from the workspace Google bucket, follow the steps below.
1. Find the path for your workspace Google bucket: check the right-hand column of the workspace dashboard. You will see a link under the heading Google Bucket. Copy it by clicking on the clipboard icon.
2. Open Terra’s terminal by clicking the terminal icon (>_) from the right-hand sidebar while a Jupyter Environment is running.
3. Type in ls and hit enter to verify that your files are indeed in your current working directory.
4. To copy a file in your terminal VM called README.txt to your workspace Google Bucket (fc-your-bucket-name-here), use the following command:
gsutil cp README.txt gs://fc-your-bucket-name-here/
5. To copy all files in the current directory, substitute '*' for the file name in the above command.
6. To create a pseudofolder with README.txt in it, append the desired pseudofolder name to the gs URI of your bucket. If this pseudofolder already exists, the command places the file in the existing pseudofolder.
gsutil cp README.txt gs://fc-your-bucket-name-here/pseudofolder/
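Steps 4 through 6 can be wrapped in a small helper, sketched here under stated assumptions. The function name copy_to_bucket and the GSUTIL override are hypothetical, and the bucket name is the placeholder used above. The helper checks that each file exists before calling gsutil cp.

```shell
BUCKET="gs://fc-your-bucket-name-here"   # placeholder: paste your bucket path here

# copy_to_bucket FOLDER FILE... : copy local files into a pseudofolder of the bucket.
# Set GSUTIL=echo to preview the command instead of running it.
copy_to_bucket() {
  local folder="$1"
  shift
  local f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "No such file: $f" >&2
      return 1
    fi
  done
  "${GSUTIL:-gsutil}" cp "$@" "$BUCKET/$folder/"
}

# Example: preview copying README.txt into the pseudofolder without running gsutil.
# GSUTIL=echo copy_to_bucket pseudofolder README.txt
```

Checking for the files first gives a clear local error message instead of a failure partway through a multi-file copy.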
Learn more! To learn how to use the terminal to avoid losing data stored or generated in a notebook or RStudio analysis, see How (and why) to save data generated in an interactive analysis to a Workspace bucket.
For more instructions on moving or copying data from an analysis app cloud environment to a Google bucket using gsutil or ftp, see Copying notebook or RStudio output to a Google bucket.
Cannot move files from public, non-gs servers
The terminal in Terra does not support wget or ftp. If you try to move files from an online resource that hosts files publicly, such as an NIH FTP server, you will need to download the files using curl.
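curl, listed among the common commands above, can download from both HTTPS and FTP URLs. A minimal sketch follows; the fetch helper name and the URL in the comment are hypothetical placeholders.

```shell
# fetch URL OUTPUT_FILE : download one file, failing on server errors.
fetch() {
  # --fail: exit non-zero on HTTP errors; --location: follow redirects.
  curl --fail --location --silent --show-error --output "$2" "$1"
}

# Example with a placeholder URL (not a real location):
#   fetch "ftp://ftp.example.org/pub/data/sample.txt" sample.txt
# Then copy the file to the workspace bucket if needed:
#   gsutil cp sample.txt gs://fc-your-bucket-name-here/
```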
Currently, sudo/root access is not supported
We constantly improve functions in Terra. We will let you know if this becomes an option.
How to copy-paste in a notebook terminal
You cannot use keyboard shortcuts to paste into the terminal in Terra. Instead, right-click and select Paste.
What to do when the terminal stops responding
The terminal in Terra will sometimes hiccup if you change browser tabs, due to the way that some browsers handle tab-switching and manage memory. It is best to avoid switching tabs away from the terminal window if it’s currently doing something. If it does hang, refresh the page.