Broad Genomics: Downloading data from a Terra workspace

Allie Hajian

This article describes ways Broad users can retrieve data stored in a Workspace bucket in Terra.

If you have not yet registered for a Terra account, find step-by-step instructions in How to register for a Terra Account and Setting up a Google account with a non-Google email.

Your Broad Project Manager will notify you when your data are ready for download and will provide you with the name of the Terra workspace and Google bucket where you can find the data.

Transfer from Terra to an on-premises location (in Terra)

When to use the Terra interface for moving files
Recommended for small numbers of files (1 to 10), as you may experience timeouts or delays with large files or large numbers of files.

What to do
Follow the instructions in Moving data to/from a workspace Google bucket (see the section labeled Upload/download through the Terra UI). 

A note about egress charges: If this is a Broad-owned Data Delivery workspace, all egress charges are covered by the Genomics Platform. If you have any concerns, please discuss with your Project Manager.

Transfer from Broad server to Terra (gsutil in a terminal)

When to use gsutil to move/copy files
We recommend this option for all transfers; it is especially well suited to large files or thousands of files.

What to do
1. Over VPN, log in via ssh to UGER, the on-premises Broad cluster, following BITS instructions.

2. Once you start an interactive session, copy data from the cluster directory to the destination Google bucket following BITS instructions.

  • To upload (copy) data to a bucket, run gsutil cp [local file path] [bucket URL] (you must be an Owner or Writer of the workspace to upload).

    The bucket URL is the path to your file or folder in Google Cloud Storage. It will have the format:

    gs://[bucket name]

    or, for folders within a bucket:

    gs://[bucket name]/[folder name]

    For example, to upload a file "Example.bam" into the folder "gene_files" in a bucket:

    gsutil cp /Users/Documents/Example.bam gs://WorkspaceBucket/gene_files

  • To download (copy) data from a bucket, reverse the order of the bucket URL and local file path:

    gsutil cp [bucket URL]/[file name] [local file path]

    Make sure to leave a space between the bucket URL and the local file path. Object names are case-sensitive, so the file name must match the name in the bucket exactly:

    gsutil cp gs://WorkspaceBucket/gene_files/Example.bam /Users/Documents
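
    For a whole folder or a large batch of files, the same cp command can be run recursively and in parallel. The following is a minimal sketch, assuming gsutil is installed and authorized. The function name download_folder, the DRY_RUN switch, and the example paths are illustrative and are not part of the BITS instructions. It relies on gsutil's -m option (parallel operations) and cp -r (recursive copy).

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a bucket folder into a local directory.
# Set DRY_RUN=1 to print the gsutil command instead of running it.
download_folder() {
  local bucket_url="$1"   # e.g. gs://WorkspaceBucket/gene_files
  local dest_dir="$2"     # e.g. /Users/Documents

  if [ -z "$bucket_url" ] || [ -z "$dest_dir" ]; then
    echo "usage: download_folder <bucket URL> <local directory>" >&2
    return 2
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "gsutil -m cp -r $bucket_url $dest_dir"
    return 0
  fi

  # Create the destination first so the folder is copied into it.
  mkdir -p "$dest_dir" || return 1
  gsutil -m cp -r "$bucket_url" "$dest_dir"
}
```

    For example, running DRY_RUN=1 download_folder gs://WorkspaceBucket/gene_files /Users/Documents prints the command that would be executed; omitting DRY_RUN=1 performs the copy.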

DataShuttle is no longer supported

Please note that this application is not actively supported. As such, we cannot troubleshoot any bugs or error messages associated with its use.

Contact BITS for help with moving large amounts of data in and out of the cloud (Broad Institute community members only)

If you are a member of the Broad Institute community, BITS is a great resource to help migrate large amounts of on-premises data to the cloud in a cost-effective way. Read more about their support offerings here.

File validation / checksum generation

Per Google: At the end of every upload or download, the gsutil cp command validates that the checksum it computes for the source file/object matches the checksum the service computes. If the checksums do not match, gsutil will delete the corrupted object and print a warning message. This very rarely happens, but if it does, please contact gs-team@google.com.
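
If you want to spot-check a transfer yourself, you can compare a local file's MD5 with the value Cloud Storage reports for the object. The following is a sketch under stated assumptions: it assumes openssl and awk are available, the function names are hypothetical, and it parses the "Hash (md5)" line printed by gsutil ls -L. Some objects (for example, composite objects) carry only a CRC32C hash; in that case the remote lookup prints nothing and the check reports a failure rather than a match.

```shell
# Print the base64-encoded MD5 of a local file (the encoding gsutil displays).
local_md5_b64() {
  openssl dgst -md5 -binary "$1" | openssl base64
}

# Print the MD5 that Cloud Storage reports for an object, if it has one.
remote_md5_b64() {
  gsutil ls -L "$1" | awk '/Hash \(md5\):/ {print $3}'
}

# Hypothetical helper: compare a local file against an object in a bucket.
verify_transfer() {
  local local_file="$1" object_url="$2"
  local local_hash remote_hash
  local_hash="$(local_md5_b64 "$local_file")"
  remote_hash="$(remote_md5_b64 "$object_url")"

  if [ -n "$remote_hash" ] && [ "$local_hash" = "$remote_hash" ]; then
    echo "OK: checksums match"
  else
    echo "Mismatch or no MD5 available: local=$local_hash remote=$remote_hash" >&2
    return 1
  fi
}
```

For example: verify_transfer /Users/Documents/Example.bam gs://WorkspaceBucket/gene_files/Example.bam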

Troubleshooting

See the tips below for help with common sources of error. 

gcloud authorization error 

You may have trouble accessing your Terra workspaces if you authorized your gcloud SDK installation with a Google Account that is not registered in Terra or has not been added to your workspace.

You can verify which Google Account you’ve authorized with gcloud by running the following command: gcloud auth list

1. If the Google ID returned matches the one on your Terra workspace, you should be able to access your workspace. If you still cannot, please contact your Project Manager.

2. If the Google ID returned does not match the one on your Terra workspace, run the following command to specify the correct account:

gcloud auth login [Google account]

When working on a Unix system, tell gcloud not to try to start a browser. It will instead print a URL that you can paste into your desktop browser.

To tell gcloud not to start a browser, use the command

gcloud auth login --no-launch-browser
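
The account check described above can also be scripted. The following is a minimal sketch, assuming the gcloud CLI is installed; the function names and the example address are hypothetical. It uses gcloud auth list with a filter and a format to print only the active account, then compares that account with the one registered in Terra.

```shell
# Print the account gcloud is currently using.
active_gcloud_account() {
  gcloud auth list --filter=status:ACTIVE --format="value(account)"
}

# Succeed if two account names match, ignoring letter case.
accounts_match() {
  local a b
  a="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
  b="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Hypothetical helper: report whether gcloud uses the expected Terra account.
check_terra_account() {
  local expected="$1" active
  active="$(active_gcloud_account)"
  if accounts_match "$active" "$expected"; then
    echo "gcloud is using $active"
  else
    echo "gcloud is using '${active}', expected '${expected}'" >&2
    echo "Run: gcloud auth login $expected" >&2
    return 1
  fi
}
```

For example: check_terra_account user@example.org, where user@example.org stands in for the Google account on your Terra workspace.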

 
