How to move data between local and workspace storage

Allie Hajian

Learn how to move files between local storage and your workspace bucket using gcloud storage command-line tools in a terminal running on your local system. Note: For small numbers of small files, you can do this in Terra. See How to move data to/from a Google bucket (workspace or external).

What is "local storage"?

Local storage can be your institutional High-Performance Server (HPS) or your personal computer. It can connect to the cloud, but usually cannot be accessed from outside the local system.

Diagram illustrating the relationships between local storage, cloud storage, and the cloud environment. A bidirectional arrow labeled 'terminal (local instance)' connects local storage and cloud storage. Another bidirectional arrow, labeled 'cloud environment tools', connects cloud storage and the cloud environment.

Reasons to copy/move data between local storage and your workspace bucket

  • To analyze private data stored on a local system in Terra
    (copy/move data from local storage to your workspace bucket)

  • To back up generated data from an analysis in Terra
    (copy/move data from your workspace bucket to local storage)

To learn more about the Terra ecosystem, see Terra architecture and where your files live in it.

How to move/copy small numbers of small files (in Terra) 

If you're moving small numbers (1-10) of small files, you can do so directly in Terra. This involves uploading a file from your local system (e.g., a laptop) to a Terra workspace, or downloading a file from a Terra workspace to your local system.

  • Upload files (local storage to workspace bucket)

    1. Start from the workspace Data page.

    2. Select the "Files" icon on the left side of the screen:
    Screenshot of the Data tab for an example Terra workspace. An orange rectangle highlights the Data tab and the Files section on the left-hand panel of the Data tab.

    3. In the Files section, click the Upload button.
    Screenshot showing the Upload button in an example Terra workspace.

  • Download files (workspace bucket to local storage)

    1. Start from the workspace Data page.

    2. Select the "Files" icon on the bottom of the left panel (underneath "Other Data").

    3. Find the file you want to download. You may have to navigate down several levels of folders to reach it.

    4. Click on the file name; this will open a popup window, where you can click to download the file or copy a gcloud storage command to download it from your workspace's Terminal. You'll also see the cost associated with downloading the file.
    Screenshot of the File details popup.

  • Download files (workspace data table to local storage)

    1. Navigate to the Data section of your workspace.

    2. Under the Tables section on the left-hand panel, click on the table that contains links to a file that you want to download.

    Any files available for download will be underlined and shown in blue.
    Screenshot of a data table in which the files available for download are underlined in blue.

    3. Click on the file link; this will open a popup window, where you can click to download the file or copy a gcloud storage command to download it from your workspace's Terminal. You'll also see the cost associated with downloading the file.
    Screenshot of the File details popup.

How to move/copy large numbers of large data files (gcloud storage CLI)

To move (or copy) large numbers of large data files between local storage and your workspace bucket, use the gcloud storage cp command (part of the Google Cloud SDK) in a terminal running on your local system.

See the step-by-step instructions below for installing the Google Cloud SDK (which includes gcloud storage) locally and running commands to copy/move files.

You must have permission to upload to/download from a Google bucket. This includes the workspace bucket (you must be a workspace Owner or Writer).

For additional details on the gcloud storage cp command, see the Google documentation.

Step 1. Install the Google Cloud SDK locally

The instructions in this section rely on bash shells, which are most straightforward to run on Mac or Linux systems. See Google's documentation to learn how to install gcloud storage on other systems.     

1.1. Open a terminal on your local system. Then follow Google's installation instructions for the Cloud SDK (or the directions below) to install the Google Cloud SDK, which includes gcloud storage.

1.2. Run the following command in a bash shell in your terminal.

curl https://sdk.cloud.google.com | bash

1.3. Open a new bash shell in your terminal, or restart your shell using this command:

exec -l $SHELL

1.4. Initialize the SDK and authenticate by running the following:

gcloud init

Verify the gcloud storage installation

Before downloading data with gcloud storage, use the ls command to look at the buckets you can access.

Run gcloud storage ls to see all of the Cloud Storage buckets in your default project (the project you selected when you ran gcloud init).

Run gcloud storage ls --project=[project ID] to list buckets for a specific project.

Once gcloud storage is installed, you can perform command-line gcloud storage tasks to move/copy files between local storage and your workspace bucket (or any external bucket).
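If you want a quick scripted version of this check, the shell function below is one way to sketch it, assuming a bash shell. The function name check_gcloud and its optional project-ID argument are hypothetical, not part of the SDK; it only wraps the commands described above (gcloud --version and gcloud storage ls) and stops early with a message if gcloud is not yet on your PATH.

```shell
#!/usr/bin/env bash
# Sketch only: check_gcloud is a hypothetical helper, not an SDK command.
# Usage: check_gcloud            -> list buckets in your default project
#        check_gcloud PROJECT_ID -> list buckets in that project
check_gcloud() {
  # Stop early if the installer's PATH changes have not taken effect yet
  if ! command -v gcloud >/dev/null 2>&1; then
    echo "gcloud not found on PATH; open a new shell or rerun the installer" >&2
    return 1
  fi
  gcloud --version                      # show installed SDK components
  if [ -n "${1:-}" ]; then
    gcloud storage ls --project="$1"    # buckets in the given project
  else
    gcloud storage ls                   # buckets in your default project
  fi
}
```

After pasting or sourcing the function, running check_gcloud with no arguments lists the buckets in the project you selected during gcloud init.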

Step 2. Run gcloud storage to copy/move files 

The basic structure for the gcloud storage cp command looks like this (Note: You need to use the full path to the Workspace or external bucket, i.e. gs://<bucket-path>):

gcloud storage cp <where_to_copy_data_from>/<file_name> <where_to_copy_data_to>  
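To move files rather than copy them, gcloud storage also has an mv command that takes the same source and destination arguments; it copies the data and then removes the source. The sketch below wraps it in a hypothetical move_to_bucket function (not a gcloud command), with an illustrative DRY_RUN variable that prints the command instead of running it, so you can check the paths before anything is deleted.

```shell
#!/usr/bin/env bash
# Sketch only: move_to_bucket and DRY_RUN are hypothetical names.
# gcloud storage mv copies SOURCE to DESTINATION, then removes SOURCE.
move_to_bucket() {
  local src="$1" bucket="${2%/}"        # drop a trailing slash from the bucket
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo gcloud storage mv "$src" "$bucket/"   # preview only
    return 0
  fi
  gcloud storage mv "$src" "$bucket/"
}
```

For example, DRY_RUN=1 move_to_bucket Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7 prints the mv command; run it again without DRY_RUN=1 to perform the move.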
  • Find your workspace bucket path

    1. Locate the bucket name listed in the Cloud Information section on the right-hand side of your workspace's Dashboard.

    Screenshot showing the Cloud Information section on an example workspace's Dashboard. An orange rectangle highlights the bucket name.

    2. Click the clipboard icon to the right of the bucket name to copy it to your clipboard.

    3. Add gs:// to the front of the clipboard content to use in the gcloud storage command.

  • Find an external bucket path

    1. Go to the Cloud Storage browser in the Google Cloud console.

    2. Click on the bucket link.

    Screenshot showing the bucket link to click on the Google Cloud Console.

    3. Click the clipboard icon (to the right of the bucket name) to copy the bucket path.

    Screenshot showing the icon used to copy the bucket path on the Google Cloud Console.

    4. Add gs:// to the front of the clipboard content to use in the gcloud storage command.
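Adding the gs:// prefix by hand is easy to get wrong when you paste bucket names into scripts. A minimal sketch, assuming a bash shell: the hypothetical helper to_gs_url (not a gcloud command) adds the prefix to a bucket name copied from the Dashboard or the Cloud console, leaves names that already have it unchanged, and removes a trailing slash.

```shell
#!/usr/bin/env bash
# Sketch only: to_gs_url is a hypothetical helper, not a gcloud command.
to_gs_url() {
  local name="${1#gs://}"   # drop a gs:// prefix if one is already there
  name="${name%/}"          # drop a trailing slash
  printf 'gs://%s\n' "$name"
}

# Example use (bucket name taken from the examples in this article):
#   BUCKET=$(to_gs_url fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7)
#   gcloud storage cp Example.bam "$BUCKET/"
```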

Example code

The examples below show what the command looks like when you upload a file from your local system to the cloud, or download a file from the cloud to your local system.

  • Upload (local storage to workspace bucket)

    Copy a single file

    To copy the file "Example.bam" from your local storage system to the Workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7":

    gcloud storage cp <local_folder>/Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7

    If you are in the directory/folder where the data are stored

    Note: You can find the local directory you are in by running the command pwd (short for "print working directory").

    gcloud storage cp Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7

    To copy all the files in the directory 

    Use the wild card *.

    gcloud storage cp <local_folder>/* gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7

    Where will the copied files be?

    This copies the files to the top level of the workspace bucket. You can see the files by clicking the "Files" icon at the bottom left of the Data page, or the "Open in browser" link on the right side of the Dashboard.

    In the Data page:
    Screenshot showing the Files section of the Data tab on an example workspace.

    In the Dashboard:
    Screenshot showing the link to open the workspace bucket in the Google Cloud console from the Cloud Information section of the workspace's Dashboard.
  • Download (workspace bucket to local storage)

    Copy a single file

    To download the file "Example.bam" from the workspace bucket "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7" into your local system:

    gcloud storage cp gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/Example.bam .

    The . at the end of the command means "here": the directory your terminal is currently in, which is not necessarily your home directory. So the command says "copy Example.bam from the Google bucket fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7 to my terminal's current directory."

    To copy all the files in the directory

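    Use the wildcard *, and put quotes around the bucket path so that your local shell passes the wildcard to gcloud storage instead of trying to expand it against local files.

    gcloud storage cp "gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7/*" .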

    Where will the files be?

    The files are saved in the directory your terminal is currently in. You can find that directory by running the command pwd (short for "print working directory").
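If you run these transfers repeatedly, you might wrap the upload and download patterns above in small bash functions. The sketch below is one way to do that; upload_folder, download_all, and the DRY_RUN switch (which prints each gcloud storage command instead of running it) are hypothetical names used for illustration. upload_folder copies only the regular files at the top level of a folder, and download_all quotes the wildcard so that gcloud storage, not your local shell, matches it against objects at the top level of the bucket.

```shell
#!/usr/bin/env bash
# Sketch only: upload_folder, download_all, and DRY_RUN are hypothetical names.
# Set DRY_RUN=1 to print each gcloud storage command instead of running it.

# Upload the regular files at the top level of a local folder, then list the
# bucket so you can confirm they arrived.
upload_folder() {
  local folder="${1%/}" bucket="${2%/}"
  local files=() f
  for f in "$folder"/*; do
    if [ -f "$f" ]; then
      files+=("$f")                       # skip subfolders
    fi
  done
  if [ "${#files[@]}" -eq 0 ]; then
    echo "No files found in $folder" >&2
    return 1
  fi
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo gcloud storage cp "${files[@]}" "$bucket/"
    return 0
  fi
  gcloud storage cp "${files[@]}" "$bucket/" && gcloud storage ls "$bucket/"
}

# Download the objects at the top level of a bucket into a local folder
# (default: the current directory), creating the folder if needed.
download_all() {
  local bucket="${1%/}" dest="${2:-.}"
  mkdir -p "$dest"
  # Quoted so the local shell passes * through to gcloud storage
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo gcloud storage cp "$bucket/*" "$dest"
    return 0
  fi
  gcloud storage cp "$bucket/*" "$dest" &&
    echo "Files saved in: $(cd "$dest" && pwd)"
}
```

For example, DRY_RUN=1 download_all gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7 ./bams prints the download command; run it again without DRY_RUN=1 to perform the copy.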
