Using the terminal and interactive analysis shell in Terra

Allie Hajian

In Terra, the terminal interface lets you execute UNIX command-line code quickly within the cloud environment that runs Jupyter Notebooks in a workspace. The terminal lets you list and move files to and from the notebook disk and a Google bucket, and install packages quickly - without having to put these actions into code cells. 

This article covers the basics of what you can do with Terra's command-line terminal interface and how to do it.

Use the terminal sparingly
We recommend using this capability sparingly, because it runs counter to the underlying purpose of Jupyter Notebooks, which is to capture every meaningful action taken during an analysis. Omitting some actions - like installing a package or importing data - from the notebook record could make it harder to reproduce an analysis.

Why use the terminal in Terra? 

Below are examples of actions that can be done more quickly or easily in the workspace terminal, rather than from inside a notebook.

Moving data, including controlled-access data, into a Terra workspace

Sometimes you need to get large datasets into your workspace storage, but can't use the platform interface (e.g., the Upload button in the Data tab). Some examples include:

  • The data is in the cloud, but it isn't possible to export it straight into Terra.
  • The data is stored on an external FTP server.
  • The data is in local (on-premises) storage.
  • The data is in a Google bucket not associated with a Terra workspace.

The terminal in Terra is the perfect way to move files into your workspace - for all of these cases and more. For example, you can use gcloud storage in the terminal to do the following:

  • Move files between the Cloud Environment (Persistent Disk) and your workspace bucket.
  • Move files from an external Google Cloud bucket into your workspace bucket.
  • Move files from an external Google Cloud bucket into your Cloud Environment directly. 
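As a rough sketch, the three transfers above might look like the following. The external bucket name, file names, and folder names are hypothetical placeholders, and the sketch assumes the WORKSPACE_BUCKET variable holds your workspace bucket's full gs:// path. By default it only prints each command so you can check it first; set RUN to an empty string to run the commands for real.

```shell
# Placeholders (assumptions, not real resources): replace with your own values.
WORKSPACE_BUCKET="${WORKSPACE_BUCKET:-gs://fc-your-bucket-name-here}"
EXTERNAL_BUCKET="gs://example-external-bucket"

# Dry run by default: each command is printed instead of executed.
# Use RUN="" to execute the commands.
RUN="${RUN-echo}"

# 1. Persistent disk -> workspace bucket
$RUN gcloud storage cp results.tsv "$WORKSPACE_BUCKET/results/"

# 2. External bucket -> workspace bucket
$RUN gcloud storage cp "$EXTERNAL_BUCKET/data/sample.vcf" "$WORKSPACE_BUCKET/data/"

# 3. External bucket -> persistent disk (current directory)
$RUN gcloud storage cp "$EXTERNAL_BUCKET/data/sample.vcf" .
```

Copying from an external bucket only works if your account has permission to read that bucket.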

Speedy testing

When you want to quickly test out your code, you can run Python, Spark, or R scripts directly from the terminal. Note: you must already have a Jupyter Environment VM running to do this (see step-by-step instructions below). 
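For instance, a quick check could look like this minimal sketch. The script names and contents are made up for illustration, and it assumes python3 is on the PATH. The R line is skipped unless Rscript is available and a file called analysis.R exists.

```shell
# Write a tiny throwaway Python script (hypothetical example).
cat > quick_test.py <<'EOF'
values = [2, 4, 6]
print(sum(values) / len(values))
EOF

# Run it directly from the terminal.
python3 quick_test.py

# An R script runs the same way, if Rscript is available and the file exists.
if command -v Rscript >/dev/null 2>&1 && [ -f analysis.R ]; then
  Rscript analysis.R
fi
```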

How to access the terminal from a workspace

You can access the terminal from the Analyses tab or from within RStudio.

  • To open the terminal from a Jupyter environment, go to the Analyses tab of your workspace and follow the instructions below. If you already have a Jupyter Environment running, start with step 4. 

    1. Click the cloud icon in the sidebar (right-hand side) to access the Cloud Environment Details pane.

    2. Click the gear icon under the Jupyter logo to start a notebook VM.

    3. Click the create button to spin up a default Jupyter Environment.

    4. When the VM is running (2-3 minutes), the dot under the Jupyter logo in the right sidebar will turn green.
    Screenshot showing the Analyses tab for an example workspace. An orange rectangle and arrow highlight three icons on the right-hand panel of the screen: the Cloud Environment icon, the Jupyter environment icon, and the terminal icon.
    Once that happens, you can click on the >_ icon in the sidebar to start the terminal. It is similar to a Unix terminal and will open in a separate browser tab.

  • To open the terminal from within RStudio, go to the Analyses tab of your workspace and follow the instructions below. If you already have an RStudio Environment running, skip to step 5.

    1. Click the cloud icon in the sidebar (right-hand side) to access the Cloud Environment Details pane.

    2. Click the gear icon under the RStudio logo to start a notebook VM.

    3. Click the create button to spin up a default RStudio Environment.

    4. When the VM is running (2-3 minutes), the dot under the RStudio logo in the right sidebar will turn green. Once this happens, you can open RStudio by clicking on the RStudio icon > open.
    5. To access the built-in terminal from RStudio, go to Tools > Terminal > New Terminal from the top menu.
    Screenshot showing the menu selection used to open a new terminal from within RStudio on Terra.

    You'll see your terminal shell right in RStudio.
    Screenshot showing a new terminal window open in the Rstudio app on Terra.

From the terminal prompt, you can perform the same command-line tasks you would run from a Jupyter Notebook. Unlike commands in a notebook, tasks performed in the terminal are not recorded, which makes them harder to reproduce, but the terminal may be quicker depending on your specific needs.

Each user gets a unique Cloud Environment in each workspace, and that environment hosts your terminal instance. To learn more about your Cloud Environment and how to customize it, see Your interactive analysis VM (Cloud Environment).

Terminal/bash shell basics

How to change your terminal prompt

The shell prompt defaults to user-name@hostname, for example: “Jupyter-user987983y751y97500566”

You can shorten this to a more reasonable length (like "userX") with the following command: export PS1="userX$ "
The trailing "$ " keeps a visible separator between the prompt and the commands you type.
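Here is a slightly fuller sketch, assuming the default bash shell. It saves the current prompt so you can restore it, then sets a short prompt that also shows the current folder. The name userX is only a placeholder.

```shell
# Keep a copy of the current prompt so it can be restored later.
ORIGINAL_PS1="${PS1-}"

# Short prompt: placeholder name, current folder name (\W), then "$ ".
export PS1='userX:\W\$ '

# To go back to the original prompt, run:
# export PS1="$ORIGINAL_PS1"
```

The change lasts only for the current terminal session. Adding the export line to ~/.bashrc should make it persist, assuming that file is read when your terminal starts.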

Common bash commands

The default shell in Terra is bash. Some common bash commands include:

  • Standard UNIX commands: pwd, ls, cd, grep, sed, awk, …
  • Edit files: vi/vim, nano
  • Copy files to/from Google storage buckets: gcloud storage cp
  • Add/update/download Terra data tables: fissfc (the FireCloud/FISS command-line tool)
  • Download files using HTTPS: wget
  • Perform HTTP web operations: curl
  • Git/GitHub access: git clone https://github.com/my_org/my_repo.git
  • FTP: ftp (not installed by default)
  • SSH: ssh (not installed by default)
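As a short illustration, here are a few of the standard commands working together on a small, made-up file. The file name and its contents are hypothetical.

```shell
# Create a small tab-separated sample file to work with.
printf 'sample_id\tstatus\nS1\tpass\nS2\tfail\nS3\tpass\n' > samples.tsv

# Show the current directory.
pwd

# List the file with a human-readable size.
ls -lh samples.tsv

# Count the lines that contain "pass".
grep -c 'pass' samples.tsv

# Print the IDs of samples whose status column is exactly "pass".
awk -F'\t' 'NR > 1 && $2 == "pass" { print $1 }' samples.tsv

# Print the file with "fail" replaced by "FAIL" (the file itself is unchanged).
sed 's/fail/FAIL/' samples.tsv
```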

Useful environment variables defined by default in the terminal

To see the full list of environment variables defined by default in the terminal, use the command env.

  • WORKSPACE_NAME
  • WORKSPACE_NAMESPACE: Represents the parent folder of the workspace (the Terra Billing project). Use this variable for making Terra API calls via AnVIL or FISS.
  • GOOGLE_PROJECT: Represents the actual Google project assigned to the workspace. Use this variable for direct cloud calls such as GCS or BigQuery.
  • WORKSPACE_BUCKET
  • OWNER_EMAIL
  • RUNTIME_NAME
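To check these variables quickly, a small helper like the one below can print each one. The function name show_terra_env is made up for this sketch. Outside a Terra Cloud Environment, the variables will simply show as not set.

```shell
# Print each Terra-defined variable, or a note if it is not set.
show_terra_env() {
  for name in WORKSPACE_NAME WORKSPACE_NAMESPACE GOOGLE_PROJECT \
              WORKSPACE_BUCKET OWNER_EMAIL RUNTIME_NAME; do
    # printenv returns a non-zero status for unset variables, hence "|| true".
    value="$(printenv "$name" || true)"
    printf '%s=%s\n' "$name" "${value:-<not set>}"
  done
}

show_terra_env
```

Inside a Cloud Environment, "$WORKSPACE_BUCKET" can stand in for the bucket path in gcloud storage commands, assuming it holds the full gs:// path.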

How to move and copy data using the terminal

Jupyter and RStudio Cloud Environments have dedicated persistent storage (learn more in Detachable Persistent Disks). However, there are times you may want to copy files from your app's virtual machine (VM) to the workspace bucket (or an external bucket).

Three reasons you might want to copy data from the persistent disk (PD) to your workspace bucket

  • To share with collaborators
    Since your Cloud Environment is unique to each user, if you want collaborators to have access to data generated in an interactive analysis (for further analysis), you need to copy it to the workspace bucket.
  • To use as input for a workflow
    Workflows cannot access data in the PD. Therefore, move files you want to use for downstream analyses, or keep for an extended period of time, from the VM to your workspace bucket.
  • To preserve generated data
    If you delete your Cloud Environment, including the persistent disk, you will lose all the generated data saved on that disk. Note: You can always select the option to keep your persistent disk. Alternatively, you may want to move generated data to less expensive Nearline or Coldline storage in an external bucket. 

Because the terminal in Terra shares a VM with Jupyter Notebooks and RStudio, you can easily access data files generated in a notebook or RStudio analysis with the workspace terminal. To move files to or from the workspace Google bucket, follow the steps below.

Step-by-step instructions

1. Find the path for your workspace Google bucket, listed under the Cloud Information section on the right-hand side of your workspace dashboard. Copy it by clicking on the clipboard icon.
Screenshot showing the Cloud Information section of an example workspace dashboard. An orange rectangle highlights the google bucket name for the workspace.

2. Open Terra’s terminal by clicking the terminal icon (>_) from the right-hand sidebar while a Jupyter Environment is running. 

3. Type ls and press Enter to verify that your files are in your current working directory.

4. To copy a file called README.txt from your Cloud Environment to your workspace Google bucket, use the following command:
gcloud storage cp README.txt gs://fc-your-bucket-name-here/

5. To copy all files in the current directory, substitute '*' for the file name in the above command.

6. To create a pseudofolder with README.txt in it, append the desired pseudofolder name to the gs URI of your bucket. If the pseudofolder already exists, the command places the file in the existing pseudofolder.
gcloud storage cp README.txt gs://fc-your-bucket-name-here/pseudofolder/
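Taken together, steps 3 through 6 could be sketched as follows, with a listing to confirm the upload and a copy in the other direction. This assumes WORKSPACE_BUCKET holds your bucket's full gs:// path (otherwise the placeholder from the steps is used), and the folder name docs is hypothetical. The gcloud commands are only printed by default; set RUN to an empty string to run them.

```shell
# Use the Terra-provided variable if present, otherwise the placeholder path.
BUCKET="${WORKSPACE_BUCKET:-gs://fc-your-bucket-name-here}"

# Dry run by default: gcloud commands are printed, not executed.
RUN="${RUN-echo}"

# Step 3: check which files are in the current directory.
ls

# Step 4: copy one file to the top level of the bucket.
$RUN gcloud storage cp README.txt "$BUCKET/"

# Step 5: copy every file in the current directory
# (subdirectories would also need the --recursive flag).
$RUN gcloud storage cp ./* "$BUCKET/"

# Step 6: copy the file into a folder-like prefix in the bucket.
$RUN gcloud storage cp README.txt "$BUCKET/docs/"

# Confirm what is now in the bucket.
$RUN gcloud storage ls "$BUCKET/"

# The reverse direction: copy a file from the bucket to the current directory.
$RUN gcloud storage cp "$BUCKET/README.txt" .
```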

Learn more!
To learn how to use the terminal to avoid losing data stored or generated in a notebook or RStudio analysis, see How (and why) to save data generated in an interactive analysis to a Workspace bucket.

For more instructions on moving or copying data from an analysis app cloud environment to a Google bucket using gcloud or ftp, see Copying notebook or RStudio output to a Google bucket.

Terminal limitations

Cannot move files from public, non-gs servers

The terminal in Terra does not support wget or ftp. To move files from an online resource that hosts files publicly, such as an NIH FTP server, download the files using rsync instead.
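A minimal sketch of an rsync download followed by a copy to the workspace bucket is shown below. The server address and folder names are hypothetical, and whether a given public server offers rsync access is something to confirm with the data provider. The transfer commands are only printed by default; set RUN to an empty string to run them.

```shell
# Hypothetical public rsync endpoint and local folder (assumptions).
SOURCE="rsync://rsync.example.org/pub/dataset/"
DEST_DIR="downloads/dataset"
BUCKET="${WORKSPACE_BUCKET:-gs://fc-your-bucket-name-here}"

# Dry run by default: transfer commands are printed, not executed.
RUN="${RUN-echo}"

# Create the local download folder on the persistent disk.
mkdir -p "$DEST_DIR"

# -a recurses and preserves file attributes; -v lists files as they transfer;
# --partial keeps partly transferred files so an interrupted run can resume.
$RUN rsync -av --partial "$SOURCE" "$DEST_DIR/"

# Then copy the downloaded folder to the workspace bucket.
$RUN gcloud storage cp --recursive "$DEST_DIR" "$BUCKET/"
```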

Currently, sudo/root access is not supported

We constantly improve functions in Terra. We will let you know if this becomes an option.

How to copy-paste in a notebook terminal

You cannot use keyboard shortcuts to paste into the terminal in Terra. Instead, right-click and select Paste.

What to do when the terminal stops responding

The terminal in Terra will sometimes hiccup if you change browser tabs, due to the way that some browsers handle tab-switching and manage memory. It is best to avoid switching tabs away from the terminal window if it’s currently doing something. If it does hang, refresh the page. 
