Moving data to/from the Cloud Environment detachable Persistent Disk

Allie Hajian

Learn how to move files between your workspace VM/PD and your workspace bucket using the gsutil command-line tool in the workspace terminal.

[Diagram: data transfer between the workspace bucket, the Persistent Disk, and local storage]

What is the Cloud Environment "Persistent Disk"?

The Cloud Environment is the virtual system (compute and storage) that runs an interactive analysis on Terra. The detachable Persistent Disk (PD) is storage mounted to the VM that can be detached and attached to another VM, much like USB storage on a personal computer. The PD exists in the cloud but, for security reasons, is isolated from the rest of the cloud: no inbound communication is possible from the internet into the VM or its associated PD storage.


Reasons to copy/move data to/from the PD and workspace bucket

To run a workflow on data generated in an interactive (notebook) analysis
(copy/move data from the PD to the workspace bucket)

To back up data generated in an interactive analysis in Terra
(copy/move data from the PD to the workspace bucket, and then to local storage)


To learn more about the Terra ecosystem (infrastructure) and where your data live in it, see this article.

Overview: How to move/copy data

What to do
Because communication between the VM and the rest of the cloud is one-way (nothing outside can connect into the VM), you will work from the VM terminal. From there, you can push data from the PD to a workspace or external bucket, or pull data from a workspace or external bucket into the PD.

How to do it
You'll use a command-line tool in the Cloud Environment terminal instance. Several tools are built into the workspace terminal, including gsutil, ssh, and ftp. See the step-by-step instructions below to copy/move files using gsutil.

Step-by-step instructions

1. Start the workspace terminal
Go to the top right corner of any workspace page and click the terminal (>_) icon, to the left of the play or pause button (see screenshot below). This opens what resembles a UNIX terminal.

[Screenshot: Cloud Environment runtime icon with the terminal (>_) icon]

The terminal instance will open in a separate browser tab. If a Cloud Environment is not already running, you will need to start one first, because the terminal runs on the Cloud Environment's virtual machine.

[Screenshot: the Cloud Environment terminal]

2. Run gsutil commands
From here, you can run command-line tools such as gsutil, ssh, and ftp. The basic structure of the gsutil cp command looks like this. Note that you must use the full gs:// path to the bucket (workspace or external), even when it is your own workspace bucket:

gsutil cp <where_to_copy_from>/<file_name> <where_to_copy_data_to>
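Because the bucket side of the command must be a full gs:// path, it can help to guard against a missing prefix. The sketch below is hypothetical (the helper names are ours, not part of gsutil): it refuses to run if neither argument starts with gs:// and otherwise passes both arguments unchanged to gsutil cp.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of gsutil).

# Succeeds only if the argument is a full bucket URL such as gs://fc-.../
is_gs_url() {
  case "$1" in
    gs://*) return 0 ;;
    *) return 1 ;;
  esac
}

# Copy between the PD and a bucket, refusing to run unless at least one
# side is a full gs:// URL. Without the gs:// prefix, gsutil would treat a
# bucket name such as "fc-7ac2..." as a local path on the PD.
copy_with_full_path() {
  local src="$1"   # <where_to_copy_from>/<file_name>
  local dest="$2"  # <where_to_copy_data_to>
  if ! is_gs_url "$src" && ! is_gs_url "$dest"; then
    echo "error: give the full gs:// path to the bucket" >&2
    return 1
  fi
  gsutil cp "$src" "$dest"
}
```

For example:
copy_with_full_path Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/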

Warning: You must be an owner or writer to upload to a Google bucket. This includes the workspace bucket, where you must be a workspace owner or writer.



Tip: For additional details on the gsutil cp command, click here.

Example 1: Copy file from the PD to the workspace bucket

To upload the file "Example.bam" from the detachable Persistent Disk to the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7":
gsutil cp Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/
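The same pattern works for a whole folder. The sketch below assumes a hypothetical results directory on the PD and a helper name of our own. It uses gsutil's cp option -r (copy directories recursively) together with the top-level -m option (run transfers in parallel, which helps when there are many files).

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload a whole directory from the PD to a bucket.
#   -m  (top-level gsutil option) runs the copies in parallel
#   -r  (cp option) copies the directory and everything under it
upload_directory() {
  local local_dir="$1"   # e.g. results (a directory on the PD)
  local bucket_dir="$2"  # e.g. gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/
  gsutil -m cp -r "$local_dir" "$bucket_dir"
}
```

For example, the following should place the folder's contents under gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/results/:
upload_directory results gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/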

Finding the full path to workspace bucket
To find the full path to the workspace bucket, click the Clipboard icon on the right side of the workspace Dashboard:

[Screenshot: the workspace Google bucket path and Clipboard icon on the Dashboard]
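As an alternative to pasting the path, Terra Jupyter environments typically expose the workspace bucket to notebooks through a WORKSPACE_BUCKET environment variable. That the terminal sees the same variable is an assumption here, so check it with echo "$WORKSPACE_BUCKET" first. The sketch below (hypothetical helper name) stops with an error if the variable is unset, in which case copy the path from the Dashboard instead.

```shell
#!/usr/bin/env bash
# Assumption: the Cloud Environment exports WORKSPACE_BUCKET,
# e.g. gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7
# Hypothetical helper: upload one file to the workspace bucket
# without typing the bucket path by hand.
upload_to_workspace_bucket() {
  local file="$1"
  if [ -z "${WORKSPACE_BUCKET:-}" ]; then
    echo "error: WORKSPACE_BUCKET is not set; copy the path from the Dashboard" >&2
    return 1
  fi
  # ${WORKSPACE_BUCKET%/} drops a trailing slash so we never get "//"
  gsutil cp "$file" "${WORKSPACE_BUCKET%/}/"
}
```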

Where will the copied files be stored?
Once uploaded, you should be able to see the file by clicking the "Files" icon in the workspace Data tab.
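To confirm the upload from the terminal instead, gsutil ls on the object's full path lists it if it exists and exits with an error status if nothing matches. A minimal sketch with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether an object exists in a bucket.
# gsutil ls exits with a non-zero status when the URL matches no objects.
confirm_in_bucket() {
  local object_url="$1"  # e.g. gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam
  if gsutil ls "$object_url" >/dev/null 2>&1; then
    echo "found: $object_url"
  else
    echo "not found: $object_url" >&2
    return 1
  fi
}
```

For example:
confirm_in_bucket gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam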

Example 2: Copy file from the workspace bucket to the Persistent Disk

To copy the file "Example.bam" from the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7" to the detachable Persistent Disk:
gsutil cp gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam .
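gsutil also accepts wildcards in gs:// URLs, so several files can be pulled in one command. Keep the URL quoted so the terminal's own shell does not try to expand the * against local files; gsutil then expands it against the bucket. A minimal sketch; the helper name and the *.bam pattern are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical helper: download every object matching a pattern
# from a bucket into a directory on the PD.
download_matching() {
  local bucket="$1"            # e.g. gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7
  local pattern="$2"           # e.g. '*.bam' (quote it when calling)
  local destination="${3:-.}"  # defaults to the current directory
  # The quotes stop the local shell from expanding the wildcard.
  gsutil -m cp "${bucket%/}/${pattern}" "$destination"
}
```

For example:
download_matching gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7 '*.bam'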


Where will the copied files be stored?
The trailing dot (.) in the command stands for the current working directory, so the file is copied there. By default, this is your notebooks directory on the Cloud Environment detachable Persistent Disk:

/home/jupyter-user/notebooks/
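If you are unsure which directory the terminal is in, run pwd, or give gsutil an explicit destination instead of the trailing dot. The sketch below uses the notebooks directory named above as its default; the path may differ between Cloud Environment images, so treat it as an assumption and override the hypothetical PD_DIR variable if needed.

```shell
#!/usr/bin/env bash
# Assumption: the PD is mounted at the notebooks directory named in this
# article; set PD_DIR yourself if your Cloud Environment uses another path.
PD_DIR="${PD_DIR:-/home/jupyter-user/notebooks}"

# Hypothetical helper: copy one object from a bucket into PD_DIR and
# print the local path it was copied to.
download_to_pd() {
  local object_url="$1"  # e.g. gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam
  gsutil cp "$object_url" "$PD_DIR/" && echo "$PD_DIR/$(basename "$object_url")"
}
```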

Copy/move to local storage (institutional HPC or personal computer)

When would you do this?
This is useful if you want to back up data generated in an interactive analysis (notebook) to your local storage.

How do you do this?
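Because the Cloud Environment VM accepts no inbound connections and neither it nor your local system has a cloud-native identification the other can use, the two cannot connect directly. You will need to do this in two steps, using the workspace bucket in between: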

1. First, move from the VM/PD to the workspace bucket
Follow the directions in Example 1 above to copy/move the data to the workspace bucket. As in Example 1, the PD path comes first in the gsutil command and the workspace bucket path second.
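Since this step can either copy or move, note that gsutil has both: gsutil cp leaves the original on the PD, while gsutil mv copies the file and then deletes the source. A minimal sketch of step 1 that makes the choice explicit (the helper name and its mode argument are hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper for step 1: send a file from the PD to a bucket.
#   mode "copy" -> gsutil cp (the file stays on the PD)
#   mode "move" -> gsutil mv (the file is removed from the PD afterwards)
pd_to_bucket() {
  local mode="$1" file="$2" bucket="$3"
  case "$mode" in
    copy) gsutil cp "$file" "${bucket%/}/" ;;
    move) gsutil mv "$file" "${bucket%/}/" ;;
    *) echo "error: mode must be copy or move" >&2; return 2 ;;
  esac
}
```

For example:
pd_to_bucket copy Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7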

2. Move from the workspace bucket to local storage
You will use gsutil running on your local system to pull data from the workspace bucket. Follow the directions in this article.
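On the local side, the pull is an ordinary gsutil cp run on your own machine. The sketch below assumes gsutil is installed locally (for example, through the Google Cloud SDK) and authenticated, e.g. with gcloud auth login, as the Google account you use in Terra. The helper name and the default backup directory are hypothetical.

```shell
#!/usr/bin/env bash
# Run on your LOCAL machine (laptop or HPC login node), not in Terra.
# Assumes gsutil is installed and authenticated (e.g. `gcloud auth login`)
# with the Google account you use in Terra.
# Hypothetical helper: pull a folder from the workspace bucket into a
# local backup directory, creating that directory first if needed.
backup_from_bucket() {
  local bucket_path="$1"                      # e.g. gs://fc-.../results
  local backup_dir="${2:-$HOME/terra-backup}"  # local destination
  mkdir -p "$backup_dir" || return 1
  gsutil -m cp -r "$bucket_path" "$backup_dir/"
}
```

For example:
backup_from_bucket gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/results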
