Step-by-step instructions for accessing GTEx data for analysis in a Terra workspace or downloading it locally. These instructions apply to AnVIL and BioData Catalyst researchers working in Terra.
Overview
GTEx data will soon be available through the AnVIL Data Explorer (for AnVIL users) or Seven Bridges (for BioData Catalyst users), with bulk downloads supported by a CLI download tool. These access routes are still being prepared.
In the interim, you can access GTEx data for analysis in Terra using DUOS (see instructions below). Additionally, the GTEx v10 data is available to download from the Terra workspace AnVIL_GTEx_v10_hg38 (for GTEx v8 data, see AnVIL_GTEx_v8_hg38).
Access GTEx data for analysis in Terra using DUOS
Step 1: Request access on DUOS
1.1. Go to duos.org and sign up or sign in using Microsoft or Google single sign-on (SSO).
1.2. Accept the Terms of Service.
1.3. In the Researcher Console, select Your Profile under your name (top right).
1.4. On the profile page, add your full name and select your institution from the dropdown (start typing the name to narrow the list).
1.5. Link your eRA Commons ID. This will allow you to request access to controlled data.
You’ll be taken to the external NIH page to sign into your eRA Commons account.
1.6. Invite your signing official to register at duos.org and issue you a Library Card. See How to pre-authorize researchers to submit Data Access Requests (DARs) in DUOS for instructions.
1.7. Once you receive a Library Card, log in to the DUOS Researcher Console. You should see your Library Card on the Profile page.
1.8. Go to the Data Library (https://duos.org/datalibrary) and filter/search for the dataset you’re looking for.
1.9. Select the dataset of interest (circled in screenshot) and click apply for access at the bottom right.
1.10. Fill out your data access request form with as much detail as possible.
1.11. Make sure to attest to all terms in the Library Card agreements (screenshot below) by clicking the blue Attest button.
1.12. Submit your DAR by clicking the blue Submit button (bottom left) beneath the addendum summarizing the requested datasets grouped by data use terms.
What to expect
Approval of your DAR can take anywhere from a few days to a few weeks. You will receive an email notification when you have access to the dataset.
Step 2: Export snapshot to Terra from the DUOS Library
2.1. Once your DAR is approved, return to the DUOS Data Library, where you should now see a blue Export link in the View by Dataset tab under Export to Terra (right column).
2.2. Clicking the link lets you export to an existing Terra workspace or create a new one.
2.3. Note that if you're accessing controlled data, DUOS will automatically enable additional security on your new or existing workspace.
What to expect
DUOS will import the data snapshot to a new or existing workspace.
It will take a few minutes to export the snapshot. You’ll get a green popup (upper right) when data is in your workspace.
Once you refresh your page, you’ll see the data tables containing all the snapshot data and metadata in the Data tab of your workspace. Note the security shield at the top right indicating additional security monitoring.
Download GTEx data to local machine
GTEx data, including controlled access data, can be downloaded from the Terra workspace GCP bucket using the CLI commands provided by Google (gsutil or gcloud storage).
Download caveats
- The bucket has requester pays enabled.
- You must be in the appropriate Authorization Domain to access these workspaces.
GTEx v10 details
- Workspace: AnVIL_GTEx_v10_hg38
- Bucket: fc-secure-e0503432-75b9-4674-8e6d-2597dc529c4c
GTEx v8 details
- Workspace: AnVIL_GTEx_v8_hg38
- Bucket: fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108
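The bucket names above become Cloud Storage URLs when prefixed with gs://. As a small convenience sketch, the helper below maps a GTEx release to its bucket URL; the function name gtex_bucket_url is hypothetical and used here only for illustration.

```shell
# Map a GTEx release to its workspace bucket URL.
# Bucket names are copied from the details listed above.
gtex_bucket_url() {
  case "$1" in
    v10) echo "gs://fc-secure-e0503432-75b9-4674-8e6d-2597dc529c4c" ;;
    v8)  echo "gs://fc-secure-ff8156a3-ddf3-42e4-9211-0fd89da62108" ;;
    *)   echo "unknown GTEx release: $1" >&2; return 1 ;;
  esac
}

# Example: print the URL for the v10 bucket.
gtex_bucket_url v10
```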
Step-by-step instructions
1. Install the gcloud CLI
For detailed instructions, see How to install gcloud on a local machine.
2. Authenticate with Google
Set up user credentials with the Google user identity you use when logging in to Terra (described in the article above).
3. Select and download the desired files
- See How to move data to/from a Google bucket for detailed instructions.
- Because the bucket has requester pays enabled, you must provide a Google project to be billed (using the "--billing-project" option with gcloud storage or the "-u" option with gsutil), as described in How to access Requester Pays data/resources in Terra.
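The three steps above can be sketched as one short script. This is a minimal illustration rather than an official procedure: BILLING_PROJECT, SRC_PATH, and DEST_DIR are placeholder values that do not come from this article, and the helper names (have_gcloud, active_account, run) are made up for the example. By default the script only prints the gcloud storage commands it would run; set RUN=1 to execute them.

```shell
# Sketch of steps 1-3 for the GTEx v10 bucket.
BUCKET="fc-secure-e0503432-75b9-4674-8e6d-2597dc529c4c"   # GTEx v10 bucket from this article
BILLING_PROJECT="${BILLING_PROJECT:-my-billing-project}"  # placeholder: Google project to bill
SRC_PATH="${SRC_PATH:-path/to/files}"                     # placeholder: hypothetical path in the bucket
DEST_DIR="${DEST_DIR:-./gtex_v10}"                        # placeholder: local download folder

# Step 1: succeed only if the gcloud CLI is available.
have_gcloud() {
  command -v gcloud >/dev/null 2>&1
}

# Step 2: print the account gcloud is currently signed in as (empty if none).
# To sign in interactively, run `gcloud auth login` in a terminal first.
active_account() {
  gcloud auth list --filter="status:ACTIVE" --format="value(account)" 2>/dev/null
}

# Step 3 helper: echo each command; execute it only when RUN=1 is set.
run() {
  echo "+ $*"
  if [ "${RUN:-0}" = "1" ]; then "$@"; fi
}

if have_gcloud; then
  echo "Active gcloud account: $(active_account)"
else
  echo "gcloud not found; install it first (step 1)" >&2
fi

# List the bucket contents, then copy the chosen path, billing the given project.
run gcloud storage ls --billing-project="$BILLING_PROJECT" "gs://$BUCKET/"
run mkdir -p "$DEST_DIR"
run gcloud storage cp --recursive --billing-project="$BILLING_PROJECT" \
  "gs://$BUCKET/$SRC_PATH" "$DEST_DIR/"
```

Check that the printed account is the one you use to log in to Terra before running with RUN=1. With gsutil, the billing project is passed with the top-level -u option instead, for example gsutil -u my-billing-project ls gs://BUCKET/.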