How to transfer data between your Cloud Environment PD and workspace storage

Allie Hajian
  • Updated

Learn how to move files between the two workspace storage options - the workspace bucket and Cloud Environment storage (Persistent Disk) - using gcloud command-line tools in the workspace terminal.

Diagram illustrating how data can move between Persistent Disk storage, Cloud storage, and local storage. A green double-ended arrow connects Cloud storage and Persistent Disk storage. This arrow is labeled 'Cloud Environment tools'. A black double-ended arrow connects Cloud storage and local storage. This arrow is labeled 'Terminal (local instance)'.

Workspace storage: Workspace bucket versus PD

Your workspaces contain two kinds of storage: a workspace bucket ("cloud storage") and a Persistent Disk.

The Workspace bucket is dedicated Google storage created when you create a workspace.

  • It is independent of Compute Engine instances and persistent disks. 
  • It's accessible to anyone with workspace access permissions.
  • It's deleted when the workspace is deleted.

The Cloud Environment Persistent Disk (PD) is storage mounted to the virtual system that runs an interactive analysis (i.e., Galaxy, Jupyter Notebooks, or RStudio) on Terra.

  • Each user has a unique PD.
  • The PD is not accessible to collaborators, even in a shared workspace.
  • It's created when the Cloud Environment is created (e.g., to run a notebook).
  • Similar to USB storage on a personal computer, the PD can be detached and attached to another virtual machine (VM). Unless you intentionally delete it, it exists as long as you are a member of the Billing project. If you intentionally delete it when you delete a cloud environment, a different PD is created when you create a cloud environment again. Any data that used to be on the persistent disk before deletion are lost.
  • The PD exists in the cloud, but for security reasons is isolated from the rest of the cloud: no inbound communication is possible from the internet into the VM or associated PD storage. 

To learn more, see Terra architecture and where your files live in it.

Reasons to copy/move data from the PD to the workspace bucket

1. To run a workflow on data generated in an interactive analysis. Workflows cannot access data stored on a Persistent Disk; therefore, you must move data out of the PD into the workspace bucket to use the data in a workflow.

2. To allow a colleague to access generated data from an interactive analysis. Because your Cloud Environment VM is unique to you, you need to move generated data out of your PD and into the workspace bucket for your collaborators to access it.

3. To back up data generated in an interactive analysis in Terra. This includes backing up your data to local storage or to an external bucket. Note: To back up to local storage, you need to move the data from the workspace bucket to local storage using the gcloud storage command-line interface (CLI) in a local terminal instance (see Moving data between local storage and the workspace bucket).

Reasons to copy/move data from the workspace bucket to the PD

1. To analyze data generated by a workflow (or uploaded from local storage) in an interactive app (Galaxy, Jupyter Notebook, or RStudio).

Overview: How to move/copy data

What to do

Because of the one-way nature of communication between the VM and the rest of the cloud, you need to work from the VM terminal. You can push data from the PD to a workspace or external bucket or pull data from the workspace or external bucket into the PD.

How to do it

You'll use a command-line tool in the Cloud Environment terminal instance. There are many tools built into the workspace terminal instance, including gcloud storage, ssh, and ftp. See the step-by-step instructions below to copy/move files using gcloud storage.

You must be an Owner or Writer to upload to a Google bucket! This includes the workspace bucket (you must be a workspace Owner or Writer).

Step-by-step instructions

1. Start a Jupyter Cloud Environment

If you haven't already started a Jupyter Cloud Environment, scroll to the right of any workspace page and click the Environment Configuration button, shaped like a cloud. Configure your environment by clicking the gear-shaped Settings button under Jupyter, then click Create to create the environment. 

Screenshot showing the Environment Configuration icon, which is shaped like a cloud with a lightning bolt inside it.

2. Start the workspace terminal

Once Terra has created the Cloud Environment (2-3 minutes), a new >_ icon will appear underneath the Jupyter Cloud Environment icon. Click the >_ icon to open the terminal in a separate browser tab. 
Screenshot showing the button used to open a terminal window on Terra. An orange square highlights this button, which is below the Environment Configuration and Jupyter Cloud icons on the right-hand panel of any workspace page.

3. Run gcloud storage commands

From here, you can perform command-line tasks including gcloud storage, ssh, and ftp. The basic structure for the gcloud storage cp command to copy data is below.

To copy from the PD (home directory) to the Workspace bucket, use the command below.

gcloud storage cp example.file gs://my-bucket

To copy from the Workspace bucket to the PD, use:

gcloud storage cp gs://my-bucket/*.txt .

Use the full path to the workspace or external bucket, even if it is in the same workspace. You can find a file's full path by locating the file in your workspace's Data tab, hovering your mouse over the file name, and clicking on the clipboard icon that appears to the right of the file name. This will copy the file's full path to your clipboard.
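As an alternative to pasting the bucket path by hand, the sketch below builds a full gs:// path from a relative file path. It assumes the Cloud Environment exposes the workspace bucket path in a WORKSPACE_BUCKET environment variable; run echo "$WORKSPACE_BUCKET" in the terminal to confirm this before relying on it. The helper name bucket_path is hypothetical and is not part of Terra or gcloud.

```shell
#!/usr/bin/env bash
# Build a full gs:// path for a file in the workspace bucket.
# Assumption: WORKSPACE_BUCKET holds the bucket path (for example gs://fc-...).
# bucket_path is an illustrative helper, not a Terra or gcloud command.
bucket_path() {
  if [ -z "${WORKSPACE_BUCKET:-}" ]; then
    echo "WORKSPACE_BUCKET is not set" >&2
    return 1
  fi
  local rel="${1#/}"   # drop a leading slash from the relative path
  # Strip a trailing slash from the bucket so the result has exactly one separator
  printf '%s/%s\n' "${WORKSPACE_BUCKET%/}" "$rel"
}
```

With the function defined, a copy to the bucket could look like: gcloud storage cp example.file "$(bucket_path results/example.file)"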

For additional details on the gcloud storage cp command, see the Google documentation.
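The commands above copy individual files. To copy a whole directory from the PD, gcloud storage cp accepts a --recursive flag. The sketch below wraps that call in a small function that first checks the source is a directory. The function name copy_dir_to_bucket and the example paths are placeholders chosen for illustration.

```shell
#!/usr/bin/env bash
# Copy a whole directory from the PD to a bucket location.
# copy_dir_to_bucket is an illustrative helper name, not a gcloud command.
copy_dir_to_bucket() {
  local src_dir="$1"    # local directory on the PD, e.g. ~/results
  local dest_uri="$2"   # destination, e.g. gs://my-bucket/backups/
  if [ ! -d "$src_dir" ]; then
    echo "Not a directory: $src_dir" >&2
    return 1
  fi
  # --recursive copies the directory and everything beneath it
  gcloud storage cp --recursive "$src_dir" "$dest_uri"
}
```

Example call, using a placeholder bucket: copy_dir_to_bucket ~/results gs://my-bucket/backups/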

Example code (gcloud storage cp)

  • To upload the file "Example.bam" from the detachable Persistent Disk to the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7":

    gcloud storage cp Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/

    Finding the full path to workspace bucket
    To find the full path to the workspace bucket, expand the Cloud Information section on your workspace's dashboard, then click the Clipboard icon on the right side of the Bucket Name.
    Screenshot showing the Cloud Information section on an example workspace's dashboard. An orange rectangle highlights the Bucket Name, which provides the full path to the workspace's bucket storage.

    Where will the copied files be stored?

    Once uploaded, you can see the file by clicking the "Files" icon in the workspace Data tab.

  • To copy the file "Example.bam" from the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7" to the detachable Persistent Disk:

    gcloud storage cp gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam .

    Where will the copied files be stored?

    By default, this will copy the files to the notebook home directory in the Cloud Environment detachable Persistent Disk at /home/jupyter or /home/jupyter-user, depending on the age of the Persistent Disk. To find the name of the mount point, run !echo $HOME from within your notebook, or echo $HOME in the terminal.
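The two examples above can be wrapped in small shell functions, sketched here under a few assumptions. The first uploads a file and then lists it with gcloud storage ls to confirm it reached the bucket; the second downloads a file into a chosen directory, defaulting to the home directory. The function names and the paths in the usage note are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative helpers; the function names are hypothetical.

# Upload one file, then list it to confirm it reached the bucket.
upload_and_verify() {
  local file="$1"         # file on the PD, e.g. Example.bam
  local bucket="${2%/}"   # bucket path with any trailing slash removed
  gcloud storage cp "$file" "$bucket/" || return 1
  # ls reports an error if the object is not found
  gcloud storage ls "$bucket/$(basename "$file")"
}

# Download one file into a directory on the PD.
download_to_dir() {
  local uri="$1"             # e.g. gs://my-bucket/Example.bam
  local dest="${2:-$HOME}"   # defaults to the home directory
  mkdir -p "$dest" || return 1
  gcloud storage cp "$uri" "$dest/"
}
```

For example, upload_and_verify Example.bam gs://my-bucket copies the file and then lists it, while download_to_dir gs://my-bucket/Example.bam "$HOME/inputs" creates the inputs directory if needed and copies the file into it.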
