gcloud storage tutorial

Yashasvika Duggal

 

Learn how to use gcloud storage to manage buckets and objects in Terra and Google Cloud Storage. gcloud storage is the Cloud Storage command group of the gcloud command-line tool, which is part of the Google Cloud SDK and under active development.

For hands-on practice, see the gcloud storage tutorial workspace.

Overview: gcloud storage in a nutshell

gcloud storage is a command-line tool for navigating and managing Google Cloud Storage, including dedicated Terra workspace buckets. It lets you interact with Cloud Storage from a terminal on your local machine or in a workspace.

Tasks in Google Cloud you can do with gcloud storage

  • Create and delete buckets
  • Upload, download, and delete objects
  • List buckets and objects
  • Move, copy, and rename objects.

1. Install/open gcloud SDK in a terminal

To use gcloud storage, you'll need to start a Jupyter Cloud Environment in a Terra workspace or open a terminal on your local machine with the gcloud SDK installed.

Note that for some tasks you may need to use a particular instance of the terminal. For example, when moving files from local storage to the cloud, you need to use a local terminal instance. 

Step-by-step instructions to set up gcloud
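
Before moving on, it can help to confirm that gcloud is actually available in the terminal you opened. The check below is a minimal sketch, assuming a POSIX-style shell; GCLOUD_STATUS is a variable name made up for this example and is not read by gcloud.

```shell
# Check whether the gcloud CLI is on the PATH of this terminal.
# GCLOUD_STATUS is a hypothetical helper variable for this example only.
if command -v gcloud >/dev/null 2>&1; then
  GCLOUD_STATUS='installed'
  # Print the versions of the installed SDK components.
  gcloud --version
else
  GCLOUD_STATUS='missing'
  echo 'gcloud was not found on your PATH; follow the setup instructions first.'
fi
echo "gcloud status: $GCLOUD_STATUS"
```

If the status is "missing" in a local terminal, finish the setup instructions before continuing.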

2. Set environment variables

Oftentimes, the code is cleaner and easier to work with if you set environment variables prior to running commands. Since the IDs for things like the workspace Google Project and Google Bucket are long and non-human-friendly, setting these variables will help avoid errors when executing gcloud storage commands. 

Variables you will need for this tutorial

  • BUCKET - the URI for workspace storage
  • PROJECT_ID - the workspace Google Project ID. 
  • 1. To find the workspace storage (i.e., Google bucket) ID, click on cloud information on the right hand side of the workspace dashboard. 

    bucket_URI.png

    2. Next to the Bucket Name is the full bucket URI. 

    Syntax for accessing resources: gcloud uses the prefix gs:// to indicate a resource in Cloud Storage.

    To make the bucket name functional as an address, you will need to add gs:// to the start of the bucket name. For example, gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6.

  • 1. To find the Google Project ID of your workspace, click on Cloud Information in the workspace dashboard (right side).

    project-to-bill.png

    2. Copy your workspace Google Project ID. This is the billing ID associated with the workspace (used for accessing Requester Pays buckets in GCS).

Run the following code to set the BUCKET and PROJECT_ID variables on the appropriate terminal (either in a Terra workspace or local machine).

BUCKET='gs://your-bucket-address'
PROJECT_ID='your_Google_project_ID'

Example: Setting environment variables (Python)

BUCKET='gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6'
PROJECT_ID='terra-47b6f28c'
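
If you copied only the bucket name from the dashboard, the gs:// prefix is easy to forget. The following is a minimal sketch, assuming a POSIX-style shell, that adds the prefix only when it is missing. BUCKET_NAME is a helper variable introduced for this example.

```shell
# Bucket name as copied from the workspace dashboard (no prefix).
# BUCKET_NAME is a helper variable for this sketch only.
BUCKET_NAME='fc-392080b2-a7b1-40c8-9550-5c971be3f7e6'

# Add the gs:// prefix only if it is not already there.
case "$BUCKET_NAME" in
  gs://*) BUCKET="$BUCKET_NAME" ;;
  *)      BUCKET="gs://$BUCKET_NAME" ;;
esac

# Strip one trailing slash so "$BUCKET/file" never contains "//".
BUCKET="${BUCKET%/}"

export BUCKET
echo "$BUCKET"
```

Running the same lines on a value that already starts with gs:// leaves it unchanged, so the snippet is safe to re-run.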

R syntax: R uses a different assignment operator. You can assign a value to a variable in R by using <- instead of =.

Built in gcloud help

If you ever need help while working with gcloud, you can run the following command in your terminal.

gcloud help

This will open up a list of all available commands as well as a brief description of their function. 

Reference this article for more help using your local machine. 

3. Run commands in cloud environment terminal

There are many possible gcloud storage commands. To practice the ones most used in Terra, follow the instructions below.

For a comprehensive list, see Google Cloud documentation

  • Step 1. Open a terminal configured to run gcloud SDK. This can be either your Terra workspace or local machine.

    Step 2. List the files in your workspace bucket with the following command.

    gcloud storage ls $BUCKET

    Using environment variables

    Remember that you set the BUCKET variable to the bucket URI in the section above.

  • You can copy a file to your workspace Google bucket using the copy command. This command works in both Python and R environments.

    Copying files in Python

    Step 1. To copy a file to your workspace Google bucket from your interactive analysis, run the following gcloud storage cp command.

    gcloud storage cp [file name] $BUCKET

    Copying files in RStudio

    When using R, you will need to adjust the code slightly to save and load R objects from the workspace bucket.

    Step 1. Run the following command on your R terminal.

    system('gcloud storage cp [file name] [destination]')

    Example code

    system('gcloud storage cp ubams.list gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6 2>&1', intern = TRUE)

    Step 2. Verify that the file was uploaded to the bucket by running the following.

    gcloud storage ls $BUCKET
  • Requester pays is a useful setting in Google Cloud Storage (i.e. Google buckets) that allows dataset owners to make data available without incurring data transfer fees when someone reads or copies data from a different region.

    To learn more, see Requester Pays buckets.

    Step 1. Use the Project ID of the workspace you want to bill, which you stored in the PROJECT_ID variable above.

    Step 2. In the workspace notebook, workflow, or on the local command line run the following code.

    gcloud storage cp --billing-project=$PROJECT_ID <gs://path/to/file> <destination>

    Requester Pays caveats

    A note of caution: when you use the --billing-project flag for data transfer, you are charging a workspace/project for the data transfer out, which is likely different from the workspace you are transferring data from. If you are mistaken about the bucket in question being a "Requester Pays" bucket and use this command, you may inadvertently charge the bucket owner for the data transfer.

    If Requester Pays is turned on and you do not provide the --billing-project flag, the command will fail.

  • You can upload images and tables to your workspace bucket using gcloud storage.

    For this command, you can use an image file you already have on your local machine or you can download and use the following image:
    cute_cat.png

    Step 1. Find the files you want to upload to your workspace Google bucket. If you're using the above photo, download to your local machine before moving onto step 2.

    Step 2. Run the following command in the terminal on your local machine.

    gcloud storage cp [file-name.png] $BUCKET

    Note: To save all images within a folder use the wildcard *.png.

  • The gcloud storage objects update command allows you to set or remove metadata on objects.

    For this command, you can use an image file you already have on your local machine or you can download and use the following image:
    cute_cat.png

    Step 1: Find the file path for the image you want to upload to your workspace bucket. If you are using the above photo, download it to your local machine before moving on to step 2.

    Step 2: Upload the image to your workspace bucket using the gcloud storage cp command in your local terminal.

    gcloud storage cp [file path] $BUCKET

    Step 3: Move to your workspace terminal and enter the following command. gcloud storage runs operations in parallel by default, so no extra flag is needed when you update a large number of objects:

    gcloud storage objects update --content-type=image/png $BUCKET/[file name]
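  • Step 1. gcloud storage copies files in parallel by default, so no extra flag is needed. Use -R to copy a bucket or folder and its contents.

    gcloud storage cp -R $BUCKET [local file path]

    Step 2. To tune parallelization, set the thread count and the maximum number of sliced download components with gcloud config set. These settings persist in your gcloud configuration.

    gcloud config set storage/thread_count 1
    gcloud config set storage/sliced_object_download_max_components 8
    gcloud storage cp gs://[bucket URL]/[file name] [local file path]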
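
The commands in this section can be rehearsed before they touch any data. The script below is a minimal sketch, assuming a POSIX-style shell: run is a hypothetical helper (not part of gcloud) that only prints each command unless you set DRY_RUN=0. The bucket and project values are the examples from this tutorial, and gs://path/to/file is the placeholder from the text, not a real object.

```shell
# Example values from this tutorial; replace them with your own.
BUCKET='gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6'
PROJECT_ID='terra-47b6f28c'

# run is a hypothetical helper: it prints the command by default
# and only executes it when DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# List the contents of the workspace bucket.
run gcloud storage ls "$BUCKET"

# Copy a file into the bucket (ubams.list is the example file used above).
run gcloud storage cp ubams.list "$BUCKET"

# Copy from a Requester Pays bucket, billing the workspace project.
# gs://path/to/file is a placeholder; substitute a real object URI.
run gcloud storage cp --billing-project="$PROJECT_ID" gs://path/to/file .

# Set the content type of an image that was uploaded earlier.
run gcloud storage objects update --content-type=image/png "$BUCKET/cute_cat.png"
```

Reading the printed commands first makes it easier to catch a wrong bucket or billing project before anything is transferred or charged.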

How to find the URI for individual files in workspace Data tab

1. Click on the workspace Data page.

2. Go to the files icon in the bottom left. This will open a list of files in workspace storage (i.e., Google Bucket).

3. Double click any file to open file details which include gsutil information. Note that you will need to replace gsutil with gcloud storage to use the modern gcloud CLI.  

Demo walk-through

Large_GIF__1494x586_.gif

Copying to a different directory

The text box contains the full cp command: the file URL followed by a period ("."), which stands for the current directory. To copy the file somewhere else, change directories before running the command or replace the period with a destination path.
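
For a simple copy command, swapping gsutil for gcloud storage is a plain text substitution. The lines below are a minimal sketch, assuming a POSIX-style shell; GSUTIL_CMD and GCLOUD_CMD are variable names made up for this example, and the command shown is only an illustration of what the file details box might contain.

```shell
# A gsutil command as it might be copied from the file details box.
GSUTIL_CMD='gsutil cp gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6/ubams.list .'

# Drop the leading "gsutil " and put "gcloud storage " in its place.
# The rest of the command (cp, the file URI, the destination) is unchanged.
GCLOUD_CMD="gcloud storage ${GSUTIL_CMD#gsutil }"

echo "$GCLOUD_CMD"
# prints: gcloud storage cp gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6/ubams.list .
```

This only covers plain commands such as cp and ls. gsutil-specific options such as -m or -o do not carry over and need to be removed or translated by hand.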

4. Move local files (run commands in local terminal)

Transferring data to or from local storage always requires running gcloud storage in a local terminal instance.


  • gcloud storage is well suited to moving large files or large numbers of files. For smaller files, it is much easier to upload through the Data tab in Terra.

    Step 1. Run the following command on the terminal of your local machine.

    Permissions requirements: You must be an Owner or Writer of the workspace to upload data to the workspace.

    gcloud storage cp [local file path] $BUCKET

    Example code

    The following command uploads the file "Example.bam.tsv" from a local machine to the workspace bucket.

    gcloud storage cp /Users/yduggal/Documents/Example.bam.tsv gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6

    Note: If you want to copy all files in the directory, you can use the wildcard * instead of a specific file.

    Step 2. Verify that the file was uploaded to the bucket by running the following.

    gcloud storage ls $BUCKET
  • Often you will need to download data from a bucket to your local machine.

    Step 1. Run the following code in your local terminal. This code is the reverse of uploading from local machine to the Terra Platform.

    gcloud storage cp $BUCKET/[file name] [local file path]

    Make sure to leave a space before entering the [local file path].

    gcloud storage cp gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6/ubams.list Users/Documents

    downloading_from-Terra-to-localmachine.png

    If you're downloading folders, you'll need to use the -R flag to copy the folder and its contents.

    gcloud storage cp -R $BUCKET [local file path]
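
The upload and download steps above can be tried out on throwaway files first. This is a minimal sketch, assuming a POSIX-style shell with mktemp available. The run helper, the sample file names, and the "results" folder are all hypothetical and exist only for illustration; by default the helper prints each command instead of running it.

```shell
# Example bucket from this tutorial; replace it with your own.
BUCKET='gs://fc-392080b2-a7b1-40c8-9550-5c971be3f7e6'

# run is a hypothetical helper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Make a scratch folder with two small files that stand in for real data.
WORKDIR=$(mktemp -d)
printf 'a\n' > "$WORKDIR/sample1.tsv"
printf 'b\n' > "$WORKDIR/sample2.tsv"

# Upload: the shell expands *.tsv to every matching file before gcloud sees it.
run gcloud storage cp "$WORKDIR"/*.tsv "$BUCKET"

# Download: -R copies a folder and everything inside it.
# "results" is a made-up folder name used only for this example.
run gcloud storage cp -R "$BUCKET/results" "$WORKDIR"

# Clean up the scratch folder.
rm -rf "$WORKDIR"
```

The printed upload command lists both sample files, which shows what the wildcard will match before any data is moved.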

Practice workspace

Getting comfortable with gcloud storage can take some time and practice. To practice moving around buckets and data within the workspace and local machine, try the gcloud storage tutorial workspace.
