How to move data between local and workspace storage (Google bucket)

Allie Hajian

Learn how to move files between local storage and your workspace bucket using the gsutil command-line tool in a terminal running on your local system. For small numbers of small files, you can do this directly in Terra; see the instructions for small numbers of small files below.

What is "local storage"?

Local storage can be your institutional HPC (high-performance computing) system or your personal computer. It is connected to the cloud, but (usually) cannot be accessed from outside the local system.

Move-data-from-local-storage-to-bucket_Diagram.png

Reasons to copy/move data between local storage and your workspace bucket

  • To analyze private data stored on a local system in Terra (copy/move data from local storage to your workspace bucket)

  • To back up generated data from an analysis in Terra (copy/move from workspace bucket to local storage)

To learn more about the Terra ecosystem, see Terra architecture and where your files live in it.

How to move/copy small numbers of small files (in Terra) 

Uploading in Terra: recommended for small numbers (one to ten) of small files

Only for transfers between your workspace bucket and local storage (e.g., a laptop)

This is the same kind of transfer as uploading or downloading a file through a web browser. Because your local storage has no cloud-native "path", you can only transfer files stored on the system running your browser.

  • Upload files (local storage to workspace bucket)

    1. Start from the workspace Data page.

    2. Select the "Files" icon on the left side of the screen:
    Screen_Shot_2022-07-29_at_11.19.07_AM.png

    3. In the "Files" section, click the UPLOAD icon.
    Screen_Shot_2022-07-29_at_11.22.52_AM.png

  • Download files (workspace bucket to local storage)

    1. Start from the workspace Data page.

    2. Select the "Files" icon on the bottom of the left column (underneath "Other Data").

    3. Find the file you want to download (note that you may have to navigate down many levels of file folders to reach it).
    Moving-data_File-icon-Screen_Shot.png

    4. Click on the file to download. This opens a popup window (screenshot below) that offers multiple choices for copying the data and shows the cost.
    Moving-data_File-details-modal.png

  • Download files (workspace data table to local storage)

    1. Click on one of the table tabs on the left side of the screen. The example below uses a tab labeled "sample".

    2. Any files available for download will be shown as a link in the relevant sample row.
    Files_to_download_Screen_Shot.png

    3. Click on a file link to open a pop-up window describing the size and cost of the download.
    Download_costs_Screen_Shot.png

    4. Click the download link (“Download for $0.56” in this example) to initiate the download. Note: This link starts the download immediately; you do not get another opportunity to confirm before it starts. However, you can cancel the download at any time during the process.

    5. Repeat for any additional files you would like to download.

How to move/copy large numbers of large data files (gsutil)

To move (or copy) large numbers of large data files between local storage and your Workspace bucket, you will use the command-line Python application gsutil in a terminal running on your local storage system.

See the step-by-step instructions below for installing gsutil locally and running commands to copy/move files. Note that you can move files in either direction (from local storage to the workspace bucket or from the workspace bucket to local storage).

You must have permission to upload to/download from a Google bucket! This includes the workspace bucket (you must be a workspace owner or writer).

For additional details on the gsutil cp command, see the Google documentation.

Step 1. Install gsutil locally 

First, open a terminal running on your local system. Then, follow Google’s installation instructions for Cloud SDK - or the directions below - to install Google Cloud SDK, which includes gsutil.

1.1. Run the following command in a bash shell in your terminal.

curl https://sdk.cloud.google.com | bash

Note: the command is only supported in bash shells.      

 Alternative: download google-cloud-sdk.zip or google-cloud-sdk.tar.gz and unpack it. 

1.2. Open a new bash shell in your terminal, or restart your shell using this command:

exec -l $SHELL

1.3. Authenticate by running the following:

gcloud init

Verify gsutil installation

Before downloading data using gsutil, you can use the ls command to look at the buckets you can access.

Run gsutil ls to see all of the Cloud Storage buckets under your default project ID.

Run gsutil ls -p [project ID] to list buckets for a specific project.

Once gsutil is installed, you can perform command-line gsutil tasks to move/copy files between local storage and your workspace bucket (or any external bucket).
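Before running any transfers, it can help to confirm that the installed tools are actually on your PATH. The following is a minimal sketch, not an official check: check_tools is a hypothetical helper name introduced here for illustration, and it only tests whether each command can be found, not whether you are authenticated.

```shell
# check_tools: hypothetical helper (not part of the Cloud SDK).
# Prints one line per tool and returns non-zero if any tool is missing.
check_tools() {
  ct_missing=0
  for ct_tool in "$@"; do
    if command -v "$ct_tool" >/dev/null 2>&1; then
      echo "found: $ct_tool"
    else
      echo "missing: $ct_tool"
      ct_missing=1
    fi
  done
  return "$ct_missing"
}

# Typical use after installing the Cloud SDK (left commented out so that
# nothing contacts Google Cloud until you are ready):
#   check_tools gcloud gsutil && gsutil ls
```

If either tool is reported as missing, opening a new shell (step 1.2) is usually the first thing to try.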

Step 2. Run gsutil to copy/move files 

The basic structure for the gsutil cp command looks like this (note that you need to use the full path to the Workspace or external bucket, i.e. gs://<bucket-path>):

gsutil cp <where_to_copy_data_from>/<file_name> <where_to_copy_data_to>

  • To find the path to your workspace bucket

    1. Click the clipboard icon on the right side of the Dashboard.

    Moving-data_Google-bucket_Screen_Shot.png

    2. Add gs:// to the front of the clipboard content to use in the gsutil command.

  • To find the path to an external bucket

    1. Go to the Cloud Storage browser in the GCP console.

    2. Click on the bucket link.

    Find-external-bucket-path-in-GCP-console_Screen_shot.png

    3. Click the clipboard icon (to the right of the bucket name) to copy the bucket path.

    Copy-path-to-bucket-in-GCP-console_Screen_shot.png

    4. Add gs:// to the front of the clipboard content to use in the gsutil command.
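The "add gs:// to the front" step above can be sketched in a shell as follows. This is a minimal sketch under stated assumptions: to_gs_url is a hypothetical helper name (not part of gsutil), and the bucket name is the example bucket used elsewhere in this article, so substitute whatever you copied to your clipboard.

```shell
# to_gs_url: hypothetical helper that turns a pasted bucket name into a gs:// path.
to_gs_url() {
  case "$1" in
    gs://*) printf '%s\n' "$1" ;;        # already a full path; leave unchanged
    *)      printf 'gs://%s\n' "$1" ;;   # prepend the gs:// scheme
  esac
}

# Placeholder: the bucket name as copied from the Dashboard or GCP console.
BUCKET=$(to_gs_url "fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7")
echo "$BUCKET"
```

Storing the full path in a variable such as BUCKET means later gsutil commands can refer to "$BUCKET" instead of repeating the long bucket name.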

Example code

Select the use case below for the exact code you will run to upload or download.

  • Upload (local storage to workspace bucket)

    To copy a single file
    To copy the file "Example.bam" from your local storage system to the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7":
    gsutil cp <local_folder>/Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7

    If you are in the directory/folder where the data are stored
    Note that you can find the local directory you are in by running the command pwd, which stands for "print working directory".

    gsutil cp Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7

    To copy all the files in the directory 
    Use the wild card *.

    gsutil cp * gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7

    Where will the copied files be?
    This will copy the files into the top level of the workspace bucket. You can see the files by clicking the "Files" icon at the bottom left of the Data page, or the "Open in browser" link on the right side of the Dashboard.

    In the Data page
    Screen_Shot_2022-07-29_at_11.19.07_AM.png

    In the Dashboard
    Move-data-local-storage_Open-in-browser-Dashboard-tab_Screen_Shot.png

  • Download (workspace bucket to local storage)

    To copy a single file
    To download the file "Example.bam" from the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7" into your local system:
    gsutil cp gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam .

    The "." at the end of the command means "here" (i.e., the current working directory of the terminal, which is not necessarily your home directory). So the command is saying "copy Example.bam from Google bucket fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7 to the directory I am currently in in my terminal."

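    To copy all the files at the top level of the bucket
    Use the wildcard * after the bucket path (quoted so that your local shell passes it to gsutil unchanged).

    gsutil cp "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/*" .

    Where will the files be?
    The files are copied into the directory where you ran the command. You can find that directory by running the command pwd, which stands for "print working directory".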
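For large numbers of files or whole folders, the single-file commands above can be extended with gsutil's -m option (parallel transfers) and cp -r (recursive copy); gsutil mv moves rather than copies. The following is a minimal sketch, assuming a hypothetical local folder named my_data and the example bucket from this article. The run wrapper and DRY_RUN variable are illustrative names introduced here, not gsutil features: by default the script only prints each command so you can review it before anything is transferred.

```shell
# Placeholder values; replace with your own bucket path and local folder.
BUCKET="gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7"
LOCAL_DIR="./my_data"   # hypothetical local folder

# run: hypothetical wrapper that prints a command and executes it
# only when DRY_RUN is set to 0. The default (1) is print-only.
DRY_RUN="${DRY_RUN:-1}"
run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Upload a folder and everything in it (-r), using parallel transfers (-m).
run gsutil -m cp -r "$LOCAL_DIR" "$BUCKET/"

# Download that folder from the bucket into the current directory.
run gsutil -m cp -r "$BUCKET/my_data" .

# Move (copy, then delete the source) a single file into the bucket.
run gsutil mv "$LOCAL_DIR/Example.bam" "$BUCKET/"
```

To perform the transfers for real, set DRY_RUN=0 before running the script. Keep in mind that mv removes the source file after a successful copy, so cp is the safer choice when you want a backup.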
