Moving data to/from a workspace Google bucket
There are several ways to add data to, or download data from, your workspace Google bucket. This article outlines four. Which one you use depends on how many files you are working with and how large they are, whether you are moving data to or from local storage, and how familiar you are with the different options.
Contents
Move files through the Terra interface
(for 1 to 10 files; only between local storage and the workspace bucket)
Upload and download from a terminal with gsutil
(recommended for all transfers, but ideal for large file sizes or 1000s of files)
Move files from a Broad server to Terra using gsutil in a terminal
(recommended for all transfers, but ideal for large file sizes or 1000s of files)
Engage BITS to help upload large amounts of data
(Broad Institute community members only)
Upload and download through the Terra interface
- Recommended for small numbers (one to ten) of files
- Only for transferring between the workspace bucket and local storage
You'll start from the workspace Data page. Click on the File icon on the left side and follow the instructions below.
Upload files from local storage to the workspace bucket
1. Select the file icon on the left side of the screen.
2. In the "Files" section, click the "+" button in the lower right corner to upload files to the dedicated workspace bucket.
Download files from the workspace bucket to local storage
1. Find the file you want to download (note that you may have to navigate down many levels of file folders).
2. Click the file. The popup window offers multiple choices for copying the data, as well as the cost.
Upload and download data files in a terminal using gsutil
- Works well for all transfers
- Ideal for large file sizes or 1000s of files
- Can be used for local transfers as well as between Google buckets
What is gsutil?
gsutil is a Python application that lets you access Cloud Storage from the command line.
To move files between Google buckets using gsutil
- Open either the built-in workspace terminal or a terminal on your local machine
- Run gsutil commands
To move files to/from your local machine using gsutil
- Open a terminal on a local machine and set up gsutil (if not already installed)
- Run gsutil commands
1. Set up terminal
Start built-in workspace terminal (for transfers between Google buckets only)
Click the terminal (>_) icon (to the left of the play or pause button) to open what resembles a UNIX terminal.
If a cloud environment is not already running, you will need to start one first, since the terminal runs on that virtual machine.
From here, you can perform command-line tasks, including gsutil.
Set up gsutil locally
- Run the following command in a bash shell in your terminal:
curl https://sdk.cloud.google.com | bash
Or download google-cloud-sdk.zip or google-cloud-sdk.tar.gz and unpack it. Note: the curl command is only supported in bash shells.
- Restart your shell:
exec -l $SHELL
or open a new bash shell in your terminal.
- Run the following command to authenticate:
gcloud init
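Before running any transfers, it can help to confirm which account gcloud (and therefore gsutil) will authenticate as; a missing or unexpected account can point to permission errors such as the `401 Anonymous caller` message in the comments below. The sketch below is a small bash helper built on `gcloud config get-value account`; the function name `check_gcloud_account` is hypothetical and is not part of the Cloud SDK.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Cloud SDK): report the account
# that gcloud, and therefore gsutil, will use for requests.
check_gcloud_account() {
  local account
  # Prints the active account; prints nothing or "(unset)" if none is set.
  account="$(gcloud config get-value account 2>/dev/null)"
  if [[ -z "$account" || "$account" == "(unset)" ]]; then
    echo "No active account: run 'gcloud init' or 'gcloud auth login'" >&2
    return 1
  fi
  echo "gsutil will authenticate as: $account"
}
```

Run `check_gcloud_account` after `gcloud init`. The address it prints should be the email you registered with Terra, or another account that has been granted access to the bucket.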
Before uploading or downloading data with gsutil, you can use the ls command to look at the buckets you have access to:
- Run `gsutil ls` to see all of the Cloud Storage buckets under your default project ID
- Run `gsutil ls -p [project name]` to list buckets for a specific project
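Because the right transfer method depends on how many files you have and how large they are, it can be useful to check a path before copying it. This sketch combines `gsutil ls -l`, which lists what is directly under a path along with object sizes, and `gsutil du -s -h`, which prints a single human-readable total. The name `inspect_bucket_path` is hypothetical, and the bucket path in the usage line is a placeholder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: show what is stored under a bucket path before a transfer.
inspect_bucket_path() {
  local url="${1:?usage: inspect_bucket_path gs://bucket/folder}"
  # ls -l: objects (and subfolder prefixes) directly under the path, with sizes.
  # du -s -h: one human-readable total for everything under the path.
  gsutil ls -l "$url" && gsutil du -s -h "$url"
}
```

For example: `inspect_bucket_path gs://WorkspaceBucket/gene_files`. If the listing itself fails (for example, with a permission error), the size summary is skipped.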
2. Run gsutil: syntax and examples
Once in a terminal (either on your local machine or in a Terra workspace), you can copy data from one place to another using the cp command:
gsutil cp where_to_copy_data_from/filename where_to_copy_data_to
Note: you must be an Owner or Writer to upload to a Google bucket, including the workspace bucket!
Example: Copy from external bucket to workspace bucket
gsutil cp gs://My_GCP_bucket/Example.bam gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7
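The example above copies a single file. To copy a whole folder between buckets, the `-r` (recursive) option of `gsutil cp` copies every object under the path, and the top-level `-m` option runs the copies in parallel, which helps with thousands of files. Here is a minimal sketch; the helper name is hypothetical and the bucket paths are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a folder (every object under a prefix)
# from one bucket to another.
copy_between_buckets() {
  local src="${1:?source gs:// URL required}"
  local dest="${2:?destination gs:// URL required}"
  # Both ends must be Cloud Storage URLs for a bucket-to-bucket copy.
  if [[ "$src" != gs://* || "$dest" != gs://* ]]; then
    echo "error: both arguments must start with gs://" >&2
    return 1
  fi
  # -m: run transfers in parallel; -r: copy everything under the source path.
  gsutil -m cp -r "$src" "$dest"
}
```

For example, `copy_between_buckets gs://My_GCP_bucket/gene_files gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7` should create a gene_files folder in the workspace bucket. As with a single file, you need Owner or Writer access to the workspace to write to its bucket.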
Finding the full path to the workspace bucket
In Terra, you can find the full path to the workspace bucket by clicking the Clipboard icon on the right side of the workspace Dashboard.
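Rather than pasting the bucket path into every command, you can store it once. In Terra cloud environments, the `WORKSPACE_BUCKET` environment variable is usually set to the workspace bucket's gs:// path; treat that as an assumption and check it with `echo $WORKSPACE_BUCKET` in the workspace terminal. The hypothetical helper below uses an explicit argument (the path copied from the Dashboard) if you give one, and otherwise falls back to that variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the workspace bucket URL without a trailing slash.
# ASSUMPTION: Terra cloud environments usually set WORKSPACE_BUCKET; anywhere
# else, pass the path copied from the workspace Dashboard as the argument.
workspace_bucket() {
  local bucket="${1:-${WORKSPACE_BUCKET:-}}"
  if [[ "$bucket" != gs://* ]]; then
    echo "error: expected a gs:// path (argument or WORKSPACE_BUCKET)" >&2
    return 1
  fi
  # Drop one trailing slash so paths can be joined as "$bucket/folder/".
  printf '%s\n' "${bucket%/}"
}
```

For example, on your local machine: `gsutil cp /Users/Documents/Example.bam "$(workspace_bucket gs://fc-7ac2cfe6-4ac5-4a00-add1-c9b3c84a36b7)/gene_files/"`.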
Example: Download data from workspace bucket to local storage
To download data from a bucket, put the bucket URL first and the local file path second:
gsutil cp [bucket URL]/[file name] [local file path]
Make sure to leave a space between the bucket URL and the file path:
gsutil cp gs://WorkspaceBucket/gene_files/example.bam /Users/Documents
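To download a whole folder rather than a single file, add `-r` (and optionally `-m` for parallel transfers). This sketch, using the hypothetical helper name `download_folder`, also creates the local directory first. gsutil names copied folders differently depending on whether the destination already exists, so creating it up front should make the folder land consistently inside it. The paths in the usage line are the placeholders from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: download every object under a bucket folder.
download_folder() {
  local src="${1:?gs:// folder URL required}"
  local dest="${2:?local directory required}"
  # Make sure the local destination exists before copying into it.
  mkdir -p "$dest" || return 1
  # -m: parallel transfers; -r: copy everything under the source folder.
  gsutil -m cp -r "$src" "$dest"
}
```

For example, `download_folder gs://WorkspaceBucket/gene_files /Users/Documents` should place the files under /Users/Documents/gene_files/.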
Example: Download data from a requester pays bucket
gsutil -u [google-billing-project] cp gs://[bucket name]/[file name] [local file path]
To learn more about accessing files from a requester-pays enabled Google bucket, see this article.
Move files from a Broad server to Terra using gsutil in a terminal
We recommend this option for all transfers, but it's ideal for large file sizes or 1000s of files.
To initialize gsutil (for moving data from a Broad server only), first use the commands below:
- Run `ssh gsa5`
- Enter your password
- Run `use Google-Cloud-SDK` (you may see out-of-date messages; you can ignore them)
- Run `gcloud init` to initialize
Before uploading or downloading data with gsutil, you can use the ls command to look at the buckets you have access to:
- Run `gsutil ls` to see all of the Cloud Storage buckets under your default project ID
- Run `gsutil ls -p [project name]` to list buckets for a specific project
Upload
To upload data to a bucket, run gsutil cp [local file path] [bucket URL] (you must be an Owner or Writer of the workspace to upload).
The bucket URL is the Cloud Storage path to your file or folder. It has the format gs://[bucket name] or, for folders within a bucket, gs://[bucket name]/[folder name].
For example, to upload a file "Example.bam" into the folder "gene_files" in a bucket (the trailing slash tells gsutil that gene_files is a folder):
gsutil cp /Users/Documents/Example.bam gs://WorkspaceBucket/gene_files/
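To upload many files at once, list them (or use a shell wildcard) before the destination and add `-m` to transfer them in parallel. The hypothetical helper below also appends the trailing slash to the destination if it is missing, so that each file keeps its own name inside the folder rather than being written to an object named after the folder.

```shell
#!/usr/bin/env bash
# Hypothetical helper: upload one or more local files into a bucket folder.
# Usage: upload_files gs://bucket/folder FILE [FILE ...]
upload_files() {
  local dest="${1:?destination gs:// folder required}"
  shift
  if (( $# == 0 )); then
    echo "error: no local files given" >&2
    return 1
  fi
  # A trailing slash makes gsutil treat the destination as a folder.
  [[ "$dest" == */ ]] || dest="$dest/"
  # -m: upload the files in parallel.
  gsutil -m cp "$@" "$dest"
}
```

For example, `upload_files gs://WorkspaceBucket/gene_files /Users/Documents/*.bam` uploads every .bam file in that directory; the shell expands `*.bam` into individual file names before gsutil runs.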
Download
To download data from a bucket, reverse the order of the bucket URL and local file path:
gsutil cp [bucket URL]/[file name] [local file path]
Make sure to leave a space between the bucket URL and the file path:
gsutil cp gs://WorkspaceBucket/gene_files/example.bam /Users/Documents
Contact BITS for help with moving large amounts of data in and out of the cloud (Broad Institute community members only)
If you are a member of the Broad Institute community, BITS is a great resource to help migrate large amounts of on-prem data to the cloud in a cost-effective way! You can read more about their support offerings here.
Comments
I got this error message: ServiceException: 401 Anonymous caller does not have storage.objects.list access to <my bucket link> while I was following the instructions for gsutil uploading.
I did authenticate my account and am able to see my buckets in the Google Cloud console. Is this related to a service account?
Hi xiao li, thanks for posting your question. May I ask, are you sure that the email you used to register for Terra has the necessary access to the bucket(s) in question? If it is not a bucket you created, you may need to contact the bucket's owner for them to add permission for you.
If you are certain that the permissions should all line up, you should submit a question to customer support (use the "contact us" link at the bottom of the main menu in the Terra interface) specifying the bucket and the email address you used to authenticate.
Hi Anton, the bucket was created by Terra when the workspace was created (I suppose it was created by the corresponding service account). I could see that bucket from my Google Cloud console, and I was able to upload my file directly from the Google Cloud console. I used `gcloud auth login` from my console, followed the instructions for authentication, and switched to the correct project ID. Is there something I might have missed?
Hi xiao li, I would be happy to take a closer look at your case here. Can you create a new support request through the UI, or email support@terra.bio with details about which steps you are following on this article, as well as information about the workspace and bucket you are working with, and the email address you are using to authenticate? Please also share the workspace with GROUP_FireCloud-Support@firecloud.org if possible.
Sounds good! I will do that.