Broad Genomics: Downloading data from a Terra workspace
This document describes ways to retrieve data from Terra. If you have not yet registered for an account, please refer to the documents for setting up a Google App Account (a Google account with a non-Gmail address) and a Terra Account. Your Broad Project Manager will notify you when your data is ready for download and will provide you with the name of the Terra workspace and the backing Google bucket where it is stored.
Contents
Downloading data to an on-premises location
- Via browser download (for 1 to 10 files)
- Using Terra DataShuttle (10s to 100s of files)
- Using Google’s gsutil command line tool
(recommended for all transfers, but ideal for large file sizes or 1000s of files)
Copying data to another Google bucket
- Using Terra DataShuttle (for 10s to 100s of files) Coming soon!
- Using Google’s gsutil command line tool
(recommended for all transfers, but ideal for large file sizes or 1000s of files)
File validation/checksum generation
- Terra DataShuttle
- Using gsutil
Troubleshooting - gcloud authorization error
Versioning
Downloading data to an on-premises location
Via Browser Download (for 1 to 10 files)
1. Log in to Terra and navigate to your workspace.
2. Click on the “Data” tab and then the “sample” tab.
3. Any files available for download will be shown as a link in the relevant sample row.
4. Clicking on a file link will open a pop-up window describing the size and cost of the download. If this is a Broad-owned Data Delivery workspace, all egress charges are covered by the Genomics Platform. If you have any concerns, please discuss them with your Project Manager.
5. Clicking on the download link (for example, “Download for $2.18”) will initiate the download. Note: This button starts the download immediately. You do not have another opportunity to verify before the download starts. However, you can cancel the download at any time during the process.
6. Repeat for any additional files you would like to download.
Via Terra DataShuttle (10s to 100s of files)
Note: This application is not actively supported. As such, we are unable to troubleshoot any bugs or error messages associated with its use.
Terra DataShuttle is a GUI-based application that reads your workspace data from Terra to simplify the transfer of moderate numbers of files.
- Download and install the DataShuttle application for your preferred operating system: Windows (.exe), macOS (.dmg), or Linux (.deb).
- Open the Terra DataShuttle application and log in with the Google Account that you registered with Terra.
- Once authorized, the application will display a list of your Terra workspaces. Terra DataShuttle will default to the “Download” tab when first opened.
- Navigating into a sample’s result directory will allow you to select individual files for download or transfer. You may also choose to select one or more workspaces or workspace sub-directories. NOTE: The application remembers your selections when you navigate into a directory, select a file or sub-directory, and then navigate back out to a higher level.
- You can initiate a download or clear your selections at any time using the buttons at the bottom right of the application window.
- Clicking “Download selection” will prompt you to choose an output directory on your local computer. You also have the option to preserve the existing folder structure. Unchecking this box will download all selected files to a single directory.
- Selecting a download directory and clicking “Start Download!” will take you to the “Status” tab and begin transferring the selected files. A per-sample transfer status is displayed, along with an overall job status at the top of the application window.
Using Google’s gsutil command line tool
We recommend this approach for all transfers. It's ideal for large file sizes or 1000s of files.
1. Follow Google’s installation instructions for the Cloud SDK (which includes gsutil).
2. Complete the gcloud authorization process, specifying the Google Account that was registered with Terra.
3. If your Google Account is already associated with one or more Google Projects, you may be prompted to choose one or create a new one. Creating a Google Project is not required for accessing your data. If you are prompted to create a project, respond with “n”. If your Google Account is not associated with a Google Project, you should not be asked to specify a project and can proceed to Step 4.
4. Once you have completed the setup process, you can begin transferring data to your on-premises location using the following command:
gsutil -m cp -r [gs://sourcegooglebucket] [destination dir]
(example Terra bucket path: gs://fc-29b0585f-1010-484d-b4a4-ba07b924ab88)
This will start a multithreaded, recursive copy of your data to your specified location.
Additional details on the gsutil cp command can be found here.
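For long transfers it can help to wrap the copy command in a small script that checks the source path first. The following is a minimal sketch under stated assumptions, not a supported tool: the function name `download_bucket`, the `DRY_RUN` variable, and the `./terra_data` directory are hypothetical names chosen for illustration. With `DRY_RUN=1` the function only prints the command it would run.

```shell
#!/bin/sh
# Sketch: download a Terra workspace bucket to a local directory.
# download_bucket and DRY_RUN are hypothetical names, not part of gsutil.
download_bucket() {
    src="$1"    # source bucket, e.g. gs://fc-29b0585f-1010-484d-b4a4-ba07b924ab88
    dest="$2"   # local destination directory

    # Refuse anything that is not a gs:// path.
    case "$src" in
        gs://?*) ;;
        *) echo "source must start with gs://" >&2; return 1 ;;
    esac

    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Print the command instead of running it.
        echo "gsutil -m cp -r $src $dest"
        return 0
    fi

    # Create the destination directory if it is missing.
    mkdir -p "$dest" || return 1
    # -m: parallel (multithreaded) transfer; -r: recursive copy.
    gsutil -m cp -r "$src" "$dest"
}

# Example (prints the command only):
#   DRY_RUN=1 download_bucket gs://fc-29b0585f-1010-484d-b4a4-ba07b924ab88 ./terra_data
```

Running the dry run first is a cheap way to confirm the bucket path from your Project Manager was pasted correctly before starting a large download.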
Copying data to another Google bucket
Using Google’s gsutil command line tool
(recommended for all transfers, but ideal for large file sizes or 1000s of files).
First, install gsutil on your local computer. The Google Cloud SDK installation includes gsutil. To install the Google Cloud SDK:
- Follow Google’s instructions for installing the Cloud SDK. The SDK contains Google’s gsutil tool, which allows you to interact with data stored in Terra.
- Complete the gcloud authorization process, specifying the Google Account that was registered with Terra.
- If your Google Account is already associated with one or more Google Projects, you may be prompted to choose one or create a new one. Creating a Google Project is not required for accessing your data. If you are prompted to create a project, respond with “n”. If your Google Account is not associated with a Google Project, you should not be asked to specify a project and can continue with the steps below.
- You can run the following command using a bash shell in your Terminal:
curl https://sdk.cloud.google.com | bash
Or download google-cloud-sdk.zip or google-cloud-sdk.tar.gz and unpack it. Note: The command is only supported in bash shells.
- Restart your shell:
exec -l $SHELL
or open a new bash shell in your Terminal.
- Run
gcloud init
to authenticate.
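After installing and restarting the shell, a quick check that both tools are on your PATH can save time before authenticating. This is a minimal sketch; `require_tool` is a hypothetical helper name, and the check relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# Sketch: confirm the Cloud SDK tools are on PATH after installation.
# require_tool is a hypothetical helper name.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1 (restart your shell or revisit the installation steps)" >&2
        return 1
    fi
}

# Example:
#   require_tool gcloud && require_tool gsutil && gcloud init
```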
Before copying data using gsutil, you can use the ls command to look at the buckets you have access to:
- Run
gsutil ls
to see all of the Cloud Storage buckets under your default project ID.
- Run
gsutil ls -p [project ID]
to list buckets for a specific project.
- Once you have completed the setup process, you can begin transferring data to your Google bucket using the following command:
gsutil -m cp -r [gs://sourcegooglebucket] [gs://destinationgooglebucket]
(example Terra bucket path: gs://fc-29b0585f-1010-484d-b4a4-ba07b924ab88)
This will start a multithreaded, recursive copy of your data to your specified location.
Additional details and options for the gsutil cp command can be found here.
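The listing and copy steps above can be combined into one pre-flight check followed by the copy. This is a sketch only: `is_gs_url` and `copy_bucket` are hypothetical helper names, and the listing check assumes the authorized account is allowed to list both the source and the destination bucket.

```shell
#!/bin/sh
# Sketch: bucket-to-bucket copy with basic pre-flight checks.
# is_gs_url and copy_bucket are hypothetical helper names.
is_gs_url() {
    case "$1" in
        gs://?*) return 0 ;;
        *) return 1 ;;
    esac
}

copy_bucket() {
    src="$1"
    dest="$2"
    is_gs_url "$src"  || { echo "source must start with gs://" >&2; return 1; }
    is_gs_url "$dest" || { echo "destination must start with gs://" >&2; return 1; }

    # Confirm the authorized account can list both buckets before copying.
    gsutil ls "$src"  >/dev/null || return 1
    gsutil ls "$dest" >/dev/null || return 1

    # -m: parallel (multithreaded) transfer; -r: recursive copy.
    gsutil -m cp -r "$src" "$dest"
}

# Example:
#   copy_bucket gs://fc-29b0585f-1010-484d-b4a4-ba07b924ab88 [gs://destinationgooglebucket]
```

Failing early on a mistyped or inaccessible bucket avoids discovering the problem partway through a large transfer.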
File validation / checksum generation
Terra DataShuttle
The DataShuttle application does not include checksum validation.
Using gsutil
At the end of every upload or download, the gsutil cp command validates that the checksum it computes for the source file/object matches the checksum the service computes. If the checksums do not match, gsutil will delete the corrupted object and print a warning message. This very rarely happens, but if it does, please contact gs-team@google.com.
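If you want an independent check after a download, one option is to compare a locally computed MD5 with the hash Cloud Storage reports for the object. The sketch below is illustrative, not part of the delivery process: the helper names are hypothetical, it assumes `openssl` is installed, and it assumes the `Hash (md5):` line printed by `gsutil ls -L`. Some objects (for example, composite objects) report only a CRC32C hash, in which case this MD5 comparison does not apply.

```shell
#!/bin/sh
# Sketch: compare a local file's MD5 with the MD5 reported for a bucket object.
# local_md5_b64, remote_md5_b64 and verify_download are hypothetical names.

# MD5 of a local file, base64-encoded (the encoding Cloud Storage reports).
local_md5_b64() {
    openssl dgst -md5 -binary "$1" | openssl base64
}

# MD5 reported for an object; assumes the "Hash (md5):" line of `gsutil ls -L`.
remote_md5_b64() {
    gsutil ls -L "$1" | awk '/Hash \(md5\)/ {print $3}'
}

verify_download() {
    local_file="$1"
    remote_obj="$2"
    want="$(remote_md5_b64 "$remote_obj")"
    have="$(local_md5_b64 "$local_file")"
    if [ -n "$want" ] && [ "$want" = "$have" ]; then
        echo "OK $local_file"
    else
        echo "MISMATCH or no MD5 available: $local_file" >&2
        return 1
    fi
}

# Example (the object path is a placeholder):
#   verify_download ./sample.cram gs://[sourcegooglebucket]/sample.cram
```

The `gsutil hash -m [file]` command computes the same base64 MD5 for a local file if you prefer not to use `openssl`.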
Troubleshooting - gcloud authorization error
You may have trouble accessing your Terra workspaces if you authorized your Cloud SDK installation with a Google Account that is not registered in Terra and added to your workspace. You can verify which Google Account you’ve authorized with gcloud by running the following command:
gcloud auth list
- If the Google ID returned matches the one on your Terra workspace, you should be able to access your workspace. If you still cannot, please contact your Project Manager.
- If the Google ID returned does not match the one on your Terra workspace, run the following command to specify the correct account:
gcloud auth login [Google Account]
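The account check can also be scripted. This is a minimal sketch: `same_account` and `check_gcloud_account` are hypothetical helper names, and it assumes the active account is the one reported by `gcloud config get-value account`. The comparison ignores letter case, since email addresses are often typed with mixed case.

```shell
#!/bin/sh
# Sketch: compare the active gcloud account with the account registered in Terra.
# same_account and check_gcloud_account are hypothetical helper names.
same_account() {
    a="$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
    b="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')"
    [ "$a" = "$b" ]
}

check_gcloud_account() {
    expected="$1"   # the Google Account registered with Terra
    active="$(gcloud config get-value account 2>/dev/null)"
    if same_account "$active" "$expected"; then
        echo "gcloud is using $expected"
    else
        echo "gcloud is using '$active'; run: gcloud auth login $expected" >&2
        return 1
    fi
}

# Example:
#   check_gcloud_account [Google Account]
```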
Versioning
Version | Date | Description
---|---|---
1.4 | 11/24/2020 | Added note about DataShuttle
1.3 | 05/19/2020 | Reformatted
1.2 | 05/31/2018 | Clarified DataShuttle instructions
1.1 | 04/27/2018 | Various edits, added troubleshooting section
1.0 | 03/12/2018 | New document