This article describes ways Broad users can retrieve data stored in a Workspace bucket in Terra.
If you have not yet registered for a Terra account, find step-by-step instructions in How to register for a Terra Account and Setting up a Google account with a non-Google email.
Your Broad Project Manager will notify you when your data are ready for download and will provide you with the name of the Terra workspace and Google bucket where you can find the data.
Transfer from Terra to an on-premises location (in Terra)
When to use the Terra interface for moving files
Recommended for small numbers of files (1 to 10), as you may experience timeouts or delays with large files or large numbers of files.
What to do
Follow the instructions in Moving data to/from a workspace Google bucket (see the section labeled Upload/download through the Terra UI).
A caution about data transfer charges
If this is a Broad-owned Data Delivery workspace, all data transfer out charges are covered by the Genomics Platform. If you have any concerns, please discuss them with your Project Manager.
Transfer from Broad server to Terra (gcloud storage in a terminal)
When to use gcloud storage to move/copy files
We recommend this option for all transfers; it is especially well suited to large files or thousands of files.
What to do
1. Log in via ssh to UGER, the on-premises Broad cluster, over VPN, following BITS instructions.
2. Once you start an interactive session, copy data from the cluster directory to the destination Google bucket following BITS instructions.
- To upload (copy) data to a bucket, run
gcloud storage cp [local file path] [bucket URL]
(you must be an Owner or Writer of the workspace to upload). The bucket URL is the path to your file or folder in Google Cloud Storage. It will have the format:
gs://[bucket name]
or, for folders within a bucket:
gs://[bucket name]/[folder name]
For example, to upload a file "Example.bam" into the folder "gene_files" in a bucket:
gcloud storage cp /Users/Documents/Example.bam gs://WorkspaceBucket/gene_files
- To download (copy) data from a bucket, reverse the order of the bucket URL and local file path:
gcloud storage cp [bucket URL]/[file name] [local file path]
Make sure to leave a space between the bucket URL and the file path:
gcloud storage cp gs://WorkspaceBucket/gene_files/Example.bam /Users/Documents
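To copy a whole folder rather than a single file, the same command can be wrapped in a small helper. The sketch below is illustrative only and is not an official BITS or Terra procedure: the function name `copy_folder`, the `DRY_RUN` variable, and the example paths are assumptions made for this example. It relies on the `--recursive` flag of `gcloud storage cp` to copy directories.

```shell
#!/usr/bin/env bash
# Sketch: copy a whole folder to or from a workspace bucket.
# copy_folder and DRY_RUN are hypothetical names used for illustration.

copy_folder() {
  local src="$1" dest="$2"
  if [ -z "$src" ] || [ -z "$dest" ]; then
    echo "usage: copy_folder SOURCE DESTINATION" >&2
    return 2
  fi
  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the command instead of running it, so paths can be checked first.
    echo "gcloud storage cp --recursive $src $dest"
  else
    gcloud storage cp --recursive "$src" "$dest"
  fi
}

# Example (placeholder paths, commented out so nothing is transferred):
# DRY_RUN=1 copy_folder /Users/Documents/gene_files gs://WorkspaceBucket/
```

Running with `DRY_RUN=1` first is a cheap way to confirm the source and destination are in the intended order before any data moves.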
DataShuttle is no longer supported
Please note that this application is not actively supported. As such, we cannot troubleshoot any bugs or error messages associated with its use.
Contact BITS (large amounts of data; Broad Institute members only)
If you are a member of the Broad Institute community, BITS is a great resource to help migrate large amounts of on-premises data to the cloud in a cost-effective way! Read more about their support offerings here.
File validation / checksum generation
Per Google: At the end of every upload or download, the gcloud storage cp command validates that the checksum it computes for the source file/object matches the checksum the service computes. If the checksums do not match, gcloud will delete the corrupted object and print a warning message. This very rarely happens, but if it does, please contact gs-team@google.com.
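If you want an independent check in addition to the built-in validation, one option is to compare a locally computed MD5 with the hash recorded for the object. Cloud Storage reports MD5 hashes base64-encoded, so the sketch below computes the local hash in the same form. The helper names are hypothetical, and the sketch assumes that `openssl` and `base64` are installed, that `gcloud storage objects describe` exposes the hash as `md5_hash`, and that the object has an MD5 hash at all (composite objects carry only a CRC32C).

```shell
#!/usr/bin/env bash
# Sketch: compare a local file's MD5 with the hash stored for a bucket object.
# All function names here are hypothetical helpers, not gcloud commands.

local_md5_b64() {
  # Binary MD5 digest of the file, base64-encoded (the form Cloud Storage uses).
  openssl md5 -binary "$1" | base64
}

remote_md5_b64() {
  # Assumes the describe output exposes the hash under the key md5_hash.
  gcloud storage objects describe "$1" --format="value(md5_hash)"
}

hashes_match() {
  # Succeeds only when both hashes are non-empty and identical.
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example (placeholder paths, commented out so nothing runs against a bucket):
# if hashes_match "$(local_md5_b64 /Users/Documents/Example.bam)" \
#                 "$(remote_md5_b64 gs://WorkspaceBucket/gene_files/Example.bam)"; then
#   echo "checksums match"
# else
#   echo "checksum mismatch" >&2
# fi
```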
Troubleshooting
See the tips below for help with common sources of error.
gcloud authorization error
You may have trouble accessing your Terra workspaces if you authorized your gcloud SDK installation with a Google Account that is not both registered in Terra and added to your workspace.
1. You can verify which Google Account you’ve authorized with gcloud by running the following command.
gcloud auth list
2. If the Google ID returned matches the one on your Terra workspace, you should be able to access your workspace. If you still cannot, please contact your Project Manager.
3. If the Google ID returned does not match the one on your Terra workspace, run the following command to specify the correct account.
gcloud auth login [Google account]
When working on a Unix system, tell gcloud not to try to start a browser by using the following command.
gcloud auth login --no-launch-browser
What to expect
It will return a URL you can paste into your desktop browser.
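The account check in the steps above can also be scripted. This is a minimal sketch, not part of the official instructions: `active_account` and `check_account` are hypothetical helper names, and the email address in the example is a placeholder. It uses `gcloud auth list` with a filter and format to print only the active account.

```shell
#!/usr/bin/env bash
# Sketch: confirm that gcloud is using the Google Account registered in Terra.
# active_account and check_account are hypothetical helper names.

active_account() {
  # Print only the currently active gcloud account.
  gcloud auth list --filter=status:ACTIVE --format="value(account)"
}

check_account() {
  local expected="$1" actual="$2"
  if [ "$expected" = "$actual" ]; then
    echo "OK: gcloud is using $actual"
  else
    echo "Mismatch: gcloud is using '${actual:-none}', expected '$expected'" >&2
    return 1
  fi
}

# Example (placeholder address, commented out so gcloud is not invoked):
# check_account "you@example.org" "$(active_account)" \
#   || gcloud auth login --no-launch-browser
```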