Broad Genomics: Downloading data from a Terra workspace

Allie Hajian
  • Updated

This document describes ways Broad users can retrieve data from Terra. Please refer to the documents for setting up a Google App Account (setting up a Google account with a non-Gmail address) and Terra Account if you have not yet registered on Terra. Your Broad Project Manager will notify you when your data is ready for download and will provide you with the name of the Terra workspace and backing Google bucket where you can find the data.

Moving data from Terra to an on-premises location using the Terra interface

Recommended for small numbers of files (1 to 10)

Follow the instructions in Moving data to/from a workspace Google bucket, in the section labeled Upload/download through the Terra UI. If this is a Broad-owned Data Delivery workspace, all egress charges are covered by the Genomics Platform. If you have any concerns, please discuss them with your Project Manager.

Moving files between a Broad server and Terra using gsutil in a terminal

We recommend this option for all transfers, and it is especially well suited to large files or thousands of files.

Upload commands

To upload (copy) data to a bucket, run gsutil cp [local file path] [bucket URL]. You must be an Owner or Writer of the workspace to upload.

The bucket URL is the path to your file or folder in Google Cloud Storage. It will have the format gs://[bucket name] or, for folders within a bucket, gs://[bucket name]/[folder name]

For example, to upload a file "Example.bam" into the folder "gene_files" in a bucket:

gsutil cp /Users/Documents/Example.bam gs://WorkspaceBucket/gene_files

Download commands

To download (copy) data from a bucket, reverse the order of the bucket URL and local file path:

gsutil cp [bucket URL]/[file name] [local file path]

Make sure to leave a space between the bucket URL and the local file path:

gsutil cp gs://WorkspaceBucket/gene_files/Example.bam /Users/Documents
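
For downloads that involve many files, a recursive, parallel copy may be more convenient than copying files one at a time. The sketch below is an illustration only: it reuses the placeholder bucket and folder names from the examples above, and download_folder is a hypothetical helper name, not part of gsutil. It relies on gsutil's -m (parallel) top-level option and the -r (recursive) option of cp.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy everything under a bucket folder to a local directory.
download_folder() {
  local src="$1"    # bucket URL of the folder to download
  local dest="$2"   # local directory that will receive the files

  # Refuse anything that is not a gs:// URL, so swapped arguments fail early.
  case "$src" in
    gs://*) ;;
    *) echo "source must be a gs:// bucket URL: $src" >&2; return 1 ;;
  esac

  # Create the destination directory if it does not exist yet.
  mkdir -p "$dest" || return 1

  # -m runs the copy in parallel; -r recurses into the folder.
  gsutil -m cp -r "$src" "$dest"
}
```

Under these assumptions, running download_folder gs://WorkspaceBucket/gene_files /Users/Documents should recreate the gene_files folder inside /Users/Documents.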

 

DataShuttle is no longer supported

 

Please note that this application is not actively supported. As such, we are unable to troubleshoot any bugs or error messages associated with its use.

 

Contact BITS for help with moving large amounts of data in and out of the cloud (Broad Institute community members only)

If you are a member of the Broad Institute community, BITS is a great resource to help migrate large amounts of on-prem data to the cloud in a cost-effective way! You can read more about their support offerings here.

File validation / checksum generation

Per Google: At the end of every upload or download the gsutil cp command validates that the checksum it computes for the source file/object matches the checksum the service computes. If the checksums do not match, gsutil will delete the corrupted object and print a warning message. This very rarely happens, but if it does, please contact gs-team@google.com.
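
If you want an additional manual check after a transfer, one option is to compare the MD5 hash of the local file with the hash stored for the object in the bucket. The following is a rough sketch, not an official procedure: extract_md5 and md5_matches are hypothetical helper names, and the parsing assumes that gsutil hash -m and gsutil ls -L each print a line of the form "Hash (md5): [base64 value]". Composite objects may report only a CRC32C hash, in which case this check would not apply.

```shell
#!/usr/bin/env bash
# Print the value that follows "Hash (md5):" in text read from stdin.
extract_md5() {
  awk '/Hash \(md5\):/ { print $NF; exit }'
}

# Hypothetical helper: succeed only if the local file and the bucket object
# report the same MD5 hash.
md5_matches() {
  local local_file="$1"   # path to the file on disk
  local object_url="$2"   # gs:// URL of the object in the bucket
  local local_md5 remote_md5

  local_md5="$(gsutil hash -m "$local_file" | extract_md5)"
  remote_md5="$(gsutil ls -L "$object_url" | extract_md5)"

  # An empty hash means it could not be read, so treat that as a failure.
  [ -n "$local_md5" ] && [ "$local_md5" = "$remote_md5" ]
}
```

For example, md5_matches /Users/Documents/Example.bam gs://WorkspaceBucket/gene_files/Example.bam would return success only when both hashes are present and identical.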

Troubleshooting

gcloud authorization error 

You may have trouble accessing your Terra workspaces if you have authorized your gcloud SDK installation with a Google Account that is not registered in Terra and added to your workspace. You can verify which Google Account you’ve authorized with gcloud by running the following command: gcloud auth list

    1. If the Google ID returned matches the one on your Terra workspace, you should be able to access your workspace. If you still cannot, please contact your Project Manager.
    2. If the Google ID returned does not match the one on your Terra workspace, run the following command to sign in with the correct account:
      gcloud auth login [Google account]
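
The check described above can also be scripted. The sketch below is an illustration under stated assumptions: active_gcloud_account and check_terra_account are hypothetical helper names, and the address you pass in stands for whichever Google Account is registered on your Terra workspace. It uses the --filter and --format options of gcloud auth list to print only the active account.

```shell
#!/usr/bin/env bash
# Print the account that gcloud is currently using.
active_gcloud_account() {
  gcloud auth list --filter=status:ACTIVE --format="value(account)"
}

# Hypothetical helper: compare the active gcloud account with the expected one.
check_terra_account() {
  local expected="$1"   # the Google Account registered on the Terra workspace
  local active
  active="$(active_gcloud_account)"

  if [ "$active" = "$expected" ]; then
    echo "gcloud is using $expected"
  else
    # Report the mismatch and suggest the login command from the steps above.
    echo "gcloud is using '${active}', expected '${expected}'" >&2
    echo "run: gcloud auth login $expected" >&2
    return 1
  fi
}
```

Calling check_terra_account with your Terra-registered address returns success when the accounts match and otherwise prints the login command to run.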

gsutil errors on Unix

When working on a Unix system, you will need to tell gcloud not to try to start a browser. It will then give you a URL that you can paste into your desktop browser.

To tell the system not to start a browser, use the command

gcloud auth login --no-launch-browser

 
