Moving data between your Cloud Environment PD and workspace bucket

Allie Hajian

Learn how to move files between the two workspace storage options, the workspace bucket and Cloud Environment storage (Persistent Disk, or PD), using the gsutil command-line tool in the workspace terminal.

Diagram of data transfer between the workspace bucket, PD, and local storage

Workspace storage: Workspace bucket versus PD

The Workspace bucket is dedicated Google storage created when you create a workspace.

  • It exists as a storage location independent of Compute Engine instances and persistent disks. 
  • It is accessible to anyone with workspace access permissions.
  • It is deleted when the workspace is deleted.

The Cloud Environment Persistent Disk (PD) is storage mounted to the virtual machine (VM) that runs an interactive analysis (e.g., Galaxy, Jupyter notebooks, or RStudio) on Terra.

  • Each user has a unique PD.
  • It is not accessible to collaborators, even in a shared workspace.
  • It is created when the Cloud Environment is created.
  • Similar to USB storage on a personal computer, the PD can be detached and attached to another VM. Unless you intentionally delete it, it exists as long as you are a member of the billing project. If you intentionally delete it when you delete a Cloud Environment, a different PD is created when you create a Cloud Environment again. Any data that was on the persistent disk before deletion is lost.
  • The PD exists in the cloud but, for security reasons, is isolated from the rest of the cloud: no inbound communication is possible from the internet into the VM or associated PD storage.

To learn more, see Terra architecture and where your files live in it.

Reasons to copy/move data from the PD to the workspace bucket

1. To run a workflow on data generated in an interactive analysis

2. To allow a colleague to access generated data from an interactive analysis 
Because your Cloud Environment VM is unique to you, you will need to move generated data out of your PD and into the workspace bucket in order for your collaborators to have access.

3. To back up generated data from an interactive analysis in Terra 
This includes backing up to local storage or to an external bucket. Note that to back up to local storage, you will need to move the data from the workspace bucket to local storage using gsutil in a local terminal instance (see Moving data between local storage and the workspace bucket).

Reasons to copy/move data from the workspace bucket to the PD

To analyze generated data from a workflow (or uploaded from local storage) in an interactive app (Galaxy, Jupyter notebook, or RStudio).

Overview: How to move/copy data

What to do
Because of the one-way nature of communication between the VM and the rest of the cloud, you will need to work from the VM terminal. You can push data from the PD to a workspace or external bucket or pull data from the workspace or external bucket into the PD.

How to do it
You'll use a command-line tool in the Cloud Environment terminal instance. Several tools are built into the workspace terminal instance, including gsutil, ssh, and ftp. See the step-by-step instructions below to copy/move files using gsutil.

You must be an owner or writer to upload to a Google bucket!
This includes the workspace bucket (you must be a workspace owner or writer).

Step-by-step instructions

1. Start the workspace terminal
Scroll to the top right corner of any workspace page and click on the (>_) icon (to the left of the play or pause button - see screenshot below) and you'll be able to access what resembles a UNIX terminal. The terminal instance will open in a separate browser tab.
Screenshot of runtime icon with terminal icon

Note that you will need to start the Cloud Environment first if one is not already running, because the terminal runs on the same virtual machine.
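Once the terminal opens, a quick check can confirm that gsutil is on the path and show which directory relative file names will resolve against. This is a minimal sketch; check_tool is a hypothetical helper name introduced here for illustration, not something Terra provides.

```shell
# Hypothetical helper: report whether a command-line tool is on the PATH.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: available"
  else
    echo "$1: not found"
  fi
}

check_tool gsutil   # the tool used in the rest of this article
pwd                 # the directory that relative paths resolve against
```

If gsutil is reported as not found, the commands in the next step will not run from this terminal.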


2. Run gsutil commands
From here, you can run command-line tools including gsutil, ssh, and ftp. The basic structure of the gsutil cp command is shown below.

To copy from the PD (home directory) to the Workspace bucket, use the following command.

gsutil cp example.file gs://my-bucket

or to copy from the Workspace bucket to the PD, use

gsutil cp gs://my-bucket/*.txt .

Note that you need to use the full path to the workspace or external bucket, even if it is in the same workspace.
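Because the full bucket path is long, it can help to store it in a shell variable once and build object paths from it. The sketch below assumes the path may already be exported as WORKSPACE_BUCKET in your Cloud Environment (an assumption; confirm with echo "$WORKSPACE_BUCKET") and otherwise falls back to the placeholder gs://my-bucket, which you would replace with your own bucket path. bucket_path is a hypothetical helper name.

```shell
# Assumption: WORKSPACE_BUCKET may already be set by the Cloud Environment.
# If it is empty or unset, fall back to a placeholder that you replace with
# the full path to your own bucket.
BUCKET="${WORKSPACE_BUCKET:-gs://my-bucket}"

# Hypothetical helper: build a full object path from the bucket and a file name.
# A single trailing slash on BUCKET is dropped so the result has exactly one.
bucket_path() {
  printf '%s/%s\n' "${BUCKET%/}" "$1"
}

# Print the full gs:// path that example.file would have in the bucket.
bucket_path example.file
```

The printed path can then be used as the source or destination argument of gsutil cp in place of a hand-typed bucket path.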

For additional details on the gsutil cp command, see the Google documentation.
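The same command shape extends to folders and to large batches of files. The sketch below uses the -r option of gsutil cp for recursive copies and the top-level -m option for parallel transfers. The bucket name gs://my-bucket and the folder name results are placeholders, and run is a hypothetical wrapper that only prints each command unless RUN=1 is set, so the sketch can be previewed before anything is transferred.

```shell
BUCKET="gs://my-bucket"   # placeholder: replace with your full bucket path

# Hypothetical helper: preview commands by default, execute them when RUN=1.
run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "+ $*"; fi
}

# Copy a whole folder from the PD to the bucket (-r recurses into subfolders).
run gsutil cp -r results "$BUCKET/"

# Same copy with parallel transfers (-m goes before the cp subcommand).
run gsutil -m cp -r results "$BUCKET/"

# Pull the folder back from the bucket into the current directory.
run gsutil -m cp -r "$BUCKET/results" .
```

Previewing first makes it easier to catch a mistyped bucket path before a long transfer starts.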

Example code (gsutil)

  • To upload the file "Example.bam" from the detachable Persistent Disk to the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7":

    gsutil cp Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/

    Finding the full path to workspace bucket
    To find the full path to the workspace bucket, click the Clipboard icon on the right side of the workspace Dashboard.
    Moving-data_Google-bucket_Screen_Shot.png

    Where will the copied files be stored?
    Once uploaded, you should be able to see the file by clicking the "Files" icon in the workspace Data tab.

  • To copy the file "Example.bam" from the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7" to the detachable Persistent Disk:

    gsutil cp gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam .

    Finding the full path to workspace bucket
    To find the full path to the workspace bucket, click the Clipboard icon on the right side of the workspace Dashboard.
    Moving-data_Google-bucket_Screen_Shot.png

    Where will the copied files be stored?
    By default, this will copy the files to the notebook home directory in the Cloud Environment detachable Persistent Disk at /home/jupyter or /home/jupyter-user, depending on the age of the Persistent Disk. To determine the name of the mount point, run !echo $HOME from within your notebook.
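After a transfer in either direction, it can be reassuring to confirm that the file arrived. The sketch below wraps each copy in a small function that lists the result afterwards, and it names the destination directory explicitly with $HOME rather than relying on the current directory. push_and_check and pull_and_check are hypothetical helper names; the block only defines them, so nothing is transferred until you call one.

```shell
# Hypothetical helper: upload one file to a bucket, then list the uploaded object.
# Arguments: local file, full bucket path (with or without a trailing slash).
push_and_check() {
  local_file="$1"
  bucket="${2%/}"
  gsutil cp "$local_file" "$bucket/" &&
    gsutil ls -l "$bucket/$(basename "$local_file")"
}

# Hypothetical helper: copy one object from a bucket into the home directory
# on the PD, then list the local copy.
# Argument: full gs:// path to the object.
pull_and_check() {
  bucket_object="$1"
  gsutil cp "$bucket_object" "$HOME/" &&
    ls -lh "$HOME/$(basename "$bucket_object")"
}

# Usage, with the bucket from the examples above:
#   push_and_check Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7
#   pull_and_check gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam
```

Because each listing runs only if the copy succeeded, a missing listing is a sign that the copy itself failed.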
