Part 3: Export AnVIL data to Terra for analysis

Allie Cliffe

Learn how to analyze data you've found through the AnVIL Data Explorer in your Terra workspace by exporting the data from the AnVIL Data Explorer to Terra.

For an overview of the process, see Finding and using AnVIL data.

Before you can work with AnVIL data on Terra, make sure you have set up billing and data access. See Part 1: Set up billing and data access in Terra.

To learn how to find the data you need, see Part 2: Search and select data in the AnVIL Data Explorer.

Data exported from the AnVIL Data Explorer to a Terra workspace includes:

  • The dataset tabular data (e.g., phenotypic data, sample and file metadata)
  • GA4GH DRS URIs that refer to the file data

The contents of large data files are not transferred. Instead, the DRS URIs stored in the tables are used to access the file data on demand while you perform analysis in Terra.
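To make the role of the DRS URIs more concrete, here is a minimal Python sketch that scans an exported table, saved as tab-separated text, and reports which columns hold DRS URIs. The column names and rows in EXAMPLE_TSV are hypothetical, and the sketch only inspects strings. It does not resolve a URI to file contents, which Terra handles during analysis.

```python
import csv
import io

# Hypothetical table contents, for illustration only.
EXAMPLE_TSV = (
    "file_id\tfile_format\tfile_ref\n"
    "f1\t.cram\tdrs://example.org/object-1\n"
    "f2\t.crai\tdrs://example.org/object-2\n"
)

def drs_columns(tsv_text):
    """Return {column name: number of cells whose value starts with 'drs://'}."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    counts = {}
    for row in reader:
        for column, value in row.items():
            # Skip empty cells and the list csv produces for surplus fields.
            if isinstance(value, str) and value.startswith("drs://"):
                counts[column] = counts.get(column, 0) + 1
    return counts

print(drs_columns(EXAMPLE_TSV))
```

A column that shows up in the result refers to file data that stays in cloud storage until an analysis requests it.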

Support for exporting finer-grained selections is being developed

Currently, finer-grained export selections may result in exports that do not include all of the expected/necessary tabular data.

Requirements to export all the tabular data for the desired datasets

1. Select only Dataset filter facets (on the main page).
2. During the export process, select all Organism Types and all File Formats (on the Analyze in Terra page).

This ensures that all available tabular data for the dataset is exported to Terra, including information such as a pedigree table and other information that may be important for analyzing the data.
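The two requirements can also be written down as a simple check. The sketch below is illustrative only: the function name, its arguments, and the dictionary layout are hypothetical and do not correspond to any AnVIL Data Explorer API.

```python
def missing_for_full_export(selected_facets, chosen, available):
    """Return a list of problems; an empty list means both requirements hold.

    All arguments are hypothetical structures used only for illustration:
      selected_facets: names of the filter facets selected on the main page
      chosen, available: dicts mapping "Organism Type" and "File Format"
                         to the set of checked / offered values
    """
    problems = []
    if "Dataset" not in selected_facets:
        problems.append("no Dataset facet selected")
    extra = sorted(set(selected_facets) - {"Dataset"})
    if extra:
        problems.append("non-Dataset facets selected: " + ", ".join(extra))
    for group in ("Organism Type", "File Format"):
        unchecked = sorted(available[group] - chosen.get(group, set()))
        if unchecked:
            problems.append(f"{group} not fully selected: " + ", ".join(unchecked))
    return problems
```

An empty result corresponds to a selection that meets both requirements. Any entry in the list names a choice that could leave tabular data out of the export.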

3.1. Export to a Terra workspace 

1. When you are ready to export the data, click the Export button at the top right of your screen from either the AnVIL Data Explorer’s main page or from within the dataset description page.

To export all tabular data, ensure only the Dataset filter facet is selected.

Screenshot of Explore data page with the AnVIL-1kg_high-coverage dataset selected and an arrow pointing to the Export button at the top right

2. Clicking this button will take you to a window where you can export to a Terra workspace by clicking the Analyze in Terra button (circled in orange in the screenshot below).

Screenshot of Choose export method screen with option to export study data and metadata to Terra workspace circled

What to expect

After selecting Analyze in Terra, the Export to Terra page will appear.

3. To export all tabular data for the selected dataset(s), select all the Organism Type and File Format checkboxes (circled in the screenshot below), then click the Request Link button (indicated with an orange arrow).

Screenshot of Export to Terra page with circle around Organism Type and File Format checkboxes all checked and arrow pointing to the Request Link button

What to expect

It may take a few minutes while the Terra Data Repository (TDR) prepares the data for export. When the exported data is ready, you'll see a page with the Open Terra button. Clicking this button displays a workspace selection screen in Terra, where you can select the workspace to import the data into.

The AnVIL Data Explorer export process will transition to the Terra import screen (screenshot below). You can choose to import into an existing workspace or create a new one. When importing from an NIH data repository, you'll see security-related information on the left side of the window.

Screenshot of import data to a workspace screen

3.2. Caveats for working with NIH data in Terra

Workspace requirements for protected data

When working with NIH data in Terra, we require users to import data to workspaces with additional security monitoring enabled. Additionally, an Authorization Domain may be applied and is highly recommended when working with controlled-access data.

Note that an Authorization Domain can be set only at the time the workspace is created. It cannot be added to or removed from the workspace later. For more information, see Overview: Managing access to controlled data with Authorization Domains.

Selecting the workspace

On the right side, you'll see a workspace selection screen where you can either choose to use an existing workspace or create a new workspace to receive the data.

Starting with an existing workspace

If you choose to “Start with an existing workspace”, you can only select workspaces for which you have write access and which have the required security settings. If a workspace of interest is not listed, it means this workspace is non-compliant with the required settings. You may either select an existing workspace that is listed or create a new workspace with the required settings.

Because the structure of tabular data varies between AnVIL datasets, you should carefully consider the compatibility of the exported data with any data already in the workspace.

Screenshot of popup to choose an existing workspace

Creating a new workspace

If you choose to “Create a new workspace”, you'll see that the import was recognized as coming from an NIH repository and that the “Enable additional security monitoring” option is already checked.

You can add an optional Authorization Domain, which is recommended for controlled data.

Screenshot of Create a new workspace popup with enable additional security monitoring checked and blank fields for the workspace name, billing project, description, and optional authorization domain

Avoid data access costs by using the us-central1 region

To avoid data access costs, ensure the Bucket location is us-central1 and subsequent Interactive Analysis and Workflow analysis is performed in the us-central1 region. For additional information, see Customizing where your data are stored and analyzed.
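As a quick sanity check on the region advice, the following Python sketch compares a bucket location and a list of compute regions against us-central1. The function name and its inputs are hypothetical: you would copy the values in by hand from your workspace and cloud environment settings, since the sketch does not query Terra or Google Cloud.

```python
REQUIRED_REGION = "us-central1"

def locations_outside_required_region(bucket_location, compute_regions):
    """Return the sorted locations that differ from us-central1.

    Location strings are compared case-insensitively, so "US-CENTRAL1"
    and "us-central1" are treated as the same region.
    """
    locations = {bucket_location, *compute_regions}
    return sorted(loc for loc in locations if loc.lower() != REQUIRED_REGION)

# Hypothetical values: a multi-region bucket and one out-of-region notebook.
print(locations_outside_required_region("US", ["us-central1", "europe-west1"]))
```

An empty list means every location you entered matches us-central1. Anything listed is a location where data access costs may apply.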

Once you've completed this step, your workspace will spin up, and the import processing will begin. This may take several minutes or longer, depending on the volume of data being imported.
You can go to the Data tab of your workspace to see the status of the import. To view the data as it is imported or when the import has finished, refresh the page in your browser.

Alternate Part 3: Download data for local analysis

Alternate instructions: Local analysis

AnVIL strongly recommends analyzing data in a Terra workspace. If you must perform your analysis on local/institutional systems, you will need to download the data. Functionality for this is coming soon. Note that you will be charged for the Google data download costs.
