How to transfer data between your Cloud Environment PD and workspace storage

Allie Hajian
  • Updated

Learn how to move files between the two workspace storage options, the workspace bucket and Cloud Environment storage (Persistent Disk), using gsutil command-line tools in the workspace terminal.

Data-transfer_Bucket-PD-local-storage_Diagram.png

Workspace storage: Workspace bucket versus PD

The Workspace bucket is dedicated Google storage created when you create a workspace.

  • It exists as a storage location independent of Compute Engine instances and persistent disks. 
  • It's accessible to anyone with workspace access permissions.
  • It's deleted when the workspace is deleted.

The Cloud Environment Persistent Disk (PD) is storage mounted to the virtual system that runs an interactive analysis (i.e., Galaxy, Jupyter Notebooks, or RStudio) on Terra.

  • Each user has a unique PD.
  • The PD is not accessible to collaborators, even in a shared workspace.
  • It's created when the cloud environment is created.
  • Similar to USB storage on a personal computer, the PD can be detached and attached to another virtual machine (VM). Unless you intentionally delete it, it exists as long as you are a member of the Billing project. If you intentionally delete it when you delete a cloud environment, a different PD is created when you create a cloud environment again. Any data that were on the deleted persistent disk are lost.
  • The PD exists in the cloud but, for security reasons, is isolated from the rest of the cloud: no inbound communication is possible from the internet into the VM or associated PD storage.

To learn more, see Terra architecture and where your files live in it.

Reasons to copy/move data from the PD to the workspace bucket

1. To run a workflow on data generated in an interactive analysis

2. To allow a colleague to access generated data from an interactive analysis 
Because your Cloud Environment VM is unique to you, you need to move generated data out of your PD and into the workspace bucket for your collaborators to access.

3. To back up generated data from an interactive analysis in Terra 
This includes backing up on local storage or in an external bucket. Note: To back up to local storage, you need to move the data from the workspace bucket to local storage using gsutil in a local terminal instance (see Moving data between local storage and the workspace bucket).  
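
The two-hop backup described in the note above can be sketched roughly as follows. This is a minimal illustration, not an official procedure: the bucket path is the example path used later in this article, `results.tsv` and the `terra-backup` folder are hypothetical names, the two function names are made up for this sketch, and the second hop assumes gsutil is installed and authenticated on your own computer.

```shell
# Example bucket path from this article; replace it with your own workspace bucket.
BUCKET="gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7"

# Hop 1: run in the Cloud Environment terminal.
# Pushes one file from the PD to the workspace bucket.
push_to_bucket() {
  gsutil cp "$1" "${BUCKET}/"
}

# Hop 2: run in a terminal on your own computer.
# Pulls the same file from the bucket into a local backup folder.
# The terra-backup folder name is a hypothetical choice.
pull_to_local() {
  BACKUP_DIR="${HOME}/terra-backup"
  mkdir -p "${BACKUP_DIR}"
  gsutil cp "${BUCKET}/$1" "${BACKUP_DIR}/"
}

# Usage (hypothetical file name):
#   push_to_bucket results.tsv    # in the Cloud Environment terminal
#   pull_to_local results.tsv     # in your local terminal
```

The two functions are kept separate because each hop has to run on a different machine: the VM can push to the bucket, but nothing outside can reach into the VM.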

Reasons to copy/move data from the workspace bucket to the PD

To analyze data generated by a workflow (or uploaded from local storage) in an interactive app (Galaxy, Jupyter Notebook, or RStudio).

Overview: How to move/copy data

What to do
Because of the one-way nature of communication between the VM and the rest of the cloud, you need to work from the VM terminal. You can push data from the PD to a workspace or external bucket or pull data from the workspace or external bucket into the PD.

How to do it
You'll use a command-line tool in the Cloud Environment terminal instance. Several tools are built into the workspace terminal instance, including gsutil, ssh, and ftp. See the step-by-step instructions below to copy/move files using gsutil.

You must be an Owner or Writer to upload to a Google bucket!
This includes the workspace bucket (you must be a workspace Owner or Writer).

Step-by-step instructions

1. Start the workspace terminal
Scroll to the right of any workspace page and click the (>_) icon to open what resembles a UNIX terminal. The terminal instance opens in a separate browser tab.
Screen_Shot_2022-07-29_at_10.53.45_AM.png

Note: You need to start a Jupyter Notebook Cloud Environment first if one is not already running, as this is the virtual machine the terminal runs on.

2. Run gsutil commands
From here, you can run command-line tools, including gsutil, ssh, and ftp. The basic structure for the gsutil cp command is below.

To copy from the PD (home directory) to the Workspace bucket, use the following command.

gsutil cp example.file gs://my-bucket

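To copy from the workspace bucket to the PD (here, every .txt file at the top level of the bucket, into the current directory), use the following command.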

gsutil cp gs://my-bucket/*.txt .

Note: You need to use the full path to the workspace or external bucket, even if it is in the same workspace.
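
To make the full-path requirement concrete, here is a minimal sketch of a wrapper that refuses to copy unless the destination starts with `gs://`. The function name `copy_to_bucket` is hypothetical and is not part of gsutil; the only real command it calls is `gsutil cp`.

```shell
# Hypothetical helper (not part of gsutil): copy a file from the PD to a bucket,
# but only when the destination is a full bucket path that starts with gs://.
copy_to_bucket() {
  src="$1"
  dest="$2"
  case "${dest}" in
    gs://*)
      gsutil cp "${src}" "${dest}"
      ;;
    *)
      echo "Destination must be a full bucket path starting with gs://" >&2
      return 1
      ;;
  esac
}

# Usage (my-bucket is the placeholder name from the commands above):
#   copy_to_bucket example.file gs://my-bucket/
#   copy_to_bucket example.file my-bucket/     # rejected: not a full path
```

A check like this is optional; its only purpose is to catch a short bucket name before gsutil treats it as a local file path.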

For additional details on the gsutil cp command, see the Google documentation.

Example code (gsutil)

  • To upload the file "Example.bam" from the detachable Persistent Disk to the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7":

    gsutil cp Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/

    Finding the full path to workspace bucket
    To find the full path to the workspace bucket, click the Clipboard icon on the right side of the workspace Dashboard.
    Moving-data_Google-bucket_Screen_Shot.png

    Where will the copied files be stored?
    Once uploaded, you can see the file by clicking the "Files" icon in the workspace Data tab.

  • To copy the file "Example.bam" from the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7" to the detachable Persistent Disk:

    gsutil cp gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam .

    Where will the copied files be stored?
    By default, this will copy the files to the notebook home directory in the Cloud Environment detachable Persistent Disk at /home/jupyter or /home/jupyter-user, depending on the age of the Persistent Disk. To find the name of the mount point, run !echo $HOME from within your notebook.
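
The same pattern extends from single files to whole folders, and the result can be checked from the terminal. The following is a hedged sketch rather than an official recipe: `results` is a hypothetical folder name, the three function names are made up for illustration, and the bucket is the example path used above. The `-m` (parallel transfer) and `-r` (recursive copy) options and the `gsutil ls` command are standard gsutil features.

```shell
# Example bucket path from this article; replace it with your own workspace bucket.
BUCKET="gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7"

# Upload a whole folder from the PD to the bucket.
# -m runs transfers in parallel; -r copies the folder recursively.
upload_folder() {
  gsutil -m cp -r "$1" "${BUCKET}/"
}

# List the bucket contents under a given folder, to confirm the upload.
list_folder() {
  gsutil ls "${BUCKET}/$1/"
}

# Download a folder from the bucket into the home directory on the PD.
# ${HOME} is the directory that `echo $HOME` reports.
download_folder() {
  gsutil -m cp -r "${BUCKET}/$1" "${HOME}/"
}

# Usage (hypothetical folder name):
#   upload_folder results
#   list_folder results
#   download_folder results
```

Writing the download destination as `${HOME}/` instead of `.` makes the target explicit, which helps when the terminal's current directory is not the home directory.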
