How to use the TDR support articles

Leyla Tarhan

The Terra Data Repository is designed to make it easier to share large datasets. You can choose between multiple tools to interact with TDR, depending upon your needs and background. This article guides you through which tools to use when, and where to find instructions for each step in your TDR journey.

Overview: using TDR

There are four main stages to using TDR, outlined below.

Different team members might work on each of these stages. For example, an administrator might set up the billing profile, a data manager might create the dataset and upload the data, a project lead might share the data, and an outside researcher might analyze the data on Terra. Therefore, it's useful to have multiple tools available when navigating through these steps.

Tools for using TDR

You can use three tools to interact with TDR:

1. TDR's user interface (UI). Logging into https://data.terra.bio/ brings you to a graphical interface where you can create a dataset, view your data, create snapshots to share data, and more. This makes it easy to examine your data, and it's especially useful for those who don't have a background using API endpoints. However, the website doesn't support as many functions as the Swagger APIs or Zebrafish.

2. Swagger API endpoints. TDR's full functionality is accessible through its Swagger API endpoints. This includes creating datasets, uploading data, creating and sharing snapshots, creating TDR billing profiles, managing permissions, creating assets, and checking the status of jobs launched from other interfaces. However, Swagger can be difficult to navigate if you're not already familiar with API endpoints.

3. Zebrafish. Zebrafish is a web-based graphical user interface that interacts with the Swagger APIs so that you don't have to. It offers richer functionality than the TDR UI, but less than the Swagger APIs: you can create a dataset, configure file references in your data, upload and modify data, and create snapshots. Zebrafish also handles file references in tabular data better than the Swagger APIs.

Zebrafish is currently available only for Terra-on-GCP, but we're working on supporting Azure as well!

Deciding which tool to use

Which of these tools should you use to manage your TDR data? The answer may change depending upon what you’re trying to do in TDR – for example, whether you’re setting up billing or creating a dataset. 

The best tool for you will also depend upon the version of Terra that you’re using (backed by GCP or Azure) and your familiarity with APIs.

Why does API familiarity matter?

Some steps can be completed using a Swagger API endpoint or a user interface. In general, the Swagger APIs allow you to do more things in TDR; however, if you’re not already familiar with APIs, these can be a bit opaque. While some functions are only available through the Swagger endpoints, we recommend using the TDR UI or Zebrafish for functions that they support, unless you've worked with APIs before.

The rest of this article will break down how to choose your tool for each stage of working in TDR.

Step 1. Setting up a TDR billing profile

The process of setting up a TDR billing profile depends upon whether you're using Terra-on-Google or Terra-on-Azure. See How to create a TDR Billing Profile (Azure) or How to create a TDR Billing Profile (GCP) for step-by-step instructions. 

Step 2. Creating a dataset and uploading data

Flow chart depicting how to decide which tool(s) to use to create a TDR dataset and upload data. The flow chart begins with the question, 'are you comfortable working with API endpoints?' If the answer is yes, the chart's next step is to write your dataset's schema in JSON, then create dataset with Swagger, then ingest data into the dataset with Swagger, then update the data with Swagger. If the answer is no, the chart asks another question: 'which version of Terra are you using?' If the answer is Azure, the chart's next step is to create a dataset and define schema through the TDR UI, then ingest data into the dataset with Swagger, then update the data with Swagger. If the answer is GCP, the next step is write your dataset's schema in JSON (including assets), then create a dataset and ingest data with Zebrafish, then update the data with Zebrafish.
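The decision flow in the chart above can be sketched as a small Python function. This is only an illustration of the logic: the function name `dataset_tool_plan` and its parameters are hypothetical and are not part of TDR, Swagger, or Zebrafish.

```python
from typing import List


def dataset_tool_plan(comfortable_with_apis: bool, cloud: str) -> List[str]:
    """Return the ordered steps from the flow chart (hypothetical helper)."""
    if comfortable_with_apis:
        # API-comfortable users do everything through the Swagger endpoints.
        return [
            "Write your dataset's schema in JSON",
            "Create the dataset with Swagger",
            "Ingest data into the dataset with Swagger",
            "Update the data with Swagger",
        ]

    platform = cloud.strip().lower()
    if platform == "azure":
        # The TDR UI can define the schema, but data still goes through Swagger.
        return [
            "Create the dataset and define its schema through the TDR UI",
            "Ingest data into the dataset with Swagger",
            "Update the data with Swagger",
        ]
    if platform == "gcp":
        # Zebrafish covers creation, ingest, and updates on Terra-on-GCP.
        return [
            "Write your dataset's schema in JSON (including assets)",
            "Create the dataset and ingest data with Zebrafish",
            "Update the data with Zebrafish",
        ]
    raise ValueError(f"Unknown cloud platform: {cloud!r} (expected 'GCP' or 'Azure')")


# Example: someone new to APIs who is working on Terra-on-GCP.
for step in dataset_tool_plan(comfortable_with_apis=False, cloud="GCP"):
    print(step)
```

The function returns the steps as plain strings, so it can serve as a checklist when planning who on your team handles each stage.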

Creating a dataset

Once you’ve set up billing and are ready to upload data to TDR, the next step is to define your dataset’s schema. The schema sets up your tables, their columns and primary keys, and the relationships between tables. Setting up your schema is crucial for updating the tables later on. Learn more about schemas in Overview: Defining your TDR dataset schema.
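To make the idea of a schema concrete, here is a rough sketch of one written as a Python dictionary and serialized to JSON. The table and column names are invented, and the key names (`tables`, `columns`, `datatype`, `primaryKey`, `relationships`, `from`, `to`) are assumptions about the JSON layout; check them against Overview: Defining your TDR dataset schema before using them. The `check_schema` helper is hypothetical and only catches simple consistency mistakes before you submit anything.

```python
import json

# Illustrative schema: two tables linked by participant_id.
# Key names are assumptions; confirm them against the TDR schema documentation.
schema = {
    "tables": [
        {
            "name": "sample",
            "columns": [
                {"name": "sample_id", "datatype": "string"},
                {"name": "participant_id", "datatype": "string"},
            ],
            "primaryKey": ["sample_id"],
        },
        {
            "name": "participant",
            "columns": [
                {"name": "participant_id", "datatype": "string"},
                {"name": "age", "datatype": "integer"},
            ],
            "primaryKey": ["participant_id"],
        },
    ],
    "relationships": [
        {
            "name": "sample_to_participant",
            "from": {"table": "sample", "column": "participant_id"},
            "to": {"table": "participant", "column": "participant_id"},
        }
    ],
}


def check_schema(schema):
    """Return a list of problems found in the schema (hypothetical helper)."""
    problems = []
    columns_by_table = {
        table["name"]: {column["name"] for column in table["columns"]}
        for table in schema["tables"]
    }
    # Every primary key must name a column that exists in its own table.
    for table in schema["tables"]:
        for key in table.get("primaryKey", []):
            if key not in columns_by_table[table["name"]]:
                problems.append(f"{table['name']}: primary key '{key}' is not a column")
    # Both ends of every relationship must point at an existing table and column.
    for rel in schema.get("relationships", []):
        for side in ("from", "to"):
            ref = rel[side]
            if ref["column"] not in columns_by_table.get(ref["table"], set()):
                problems.append(
                    f"{rel['name']}: '{side}' refers to unknown {ref['table']}.{ref['column']}"
                )
    return problems


print(json.dumps(schema, indent=2))
print(check_schema(schema))
```

Running a check like this locally is optional, but it can catch a mistyped column name before you create the dataset.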

Creating a dataset through the UI vs. Zebrafish

The benefit of creating your dataset in the TDR UI is that you can define the schema using a GUI, rather than working in JSON: you’ll type the names of your tables and their columns, and define their types with drop-down menus. The downside is that you can’t upload or edit your data through the TDR UI, so you’ll then have to use the Swagger API endpoints to complete those steps. In contrast, if you use Zebrafish you’ll have to write your schema in JSON. But then you can create your dataset, upload your data, and update your data through a GUI.

Ingesting and updating data

Step 3. Sharing data

To share TDR data, create a snapshot — a subset of the rows in your dataset that you want to share with a particular researcher or group.

Flow chart illustrating how to decide which tool(s) to use to share TDR data. The diagram starts with the question, 'are you comfortable working with API endpoints?'. If the answer is yes, the next step is to add assets with Swagger, then create a snapshot with Swagger. If the answer is no, the next step is to include assets in your schema when creating your dataset, then create a snapshot through the TDR web interface.

Assets are a prerequisite for creating a snapshot. An asset is a subset of the columns in your data that you want to be able to include in snapshots. Learn more about assets in How to create dataset assets in TDR.
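The relationship between assets (columns) and snapshots (rows) can be pictured with plain Python data. This is a conceptual sketch only: it does not call TDR, the sample rows are invented, and `make_snapshot` is a hypothetical helper that simplifies how assets and snapshots are actually defined.

```python
# Invented example rows standing in for one table of a dataset.
rows = [
    {"sample_id": "s1", "participant_id": "p1", "consent": "open"},
    {"sample_id": "s2", "participant_id": "p2", "consent": "restricted"},
    {"sample_id": "s3", "participant_id": "p1", "consent": "open"},
]

# Hypothetical asset: the columns that are allowed to appear in a snapshot.
asset_columns = ["sample_id", "participant_id"]


def make_snapshot(rows, asset_columns, keep_row):
    """Keep only the asset's columns, and only the rows that pass keep_row."""
    return [
        {column: row[column] for column in asset_columns}
        for row in rows
        if keep_row(row)
    ]


# Share only the open-consent rows, without exposing the consent column itself.
snapshot = make_snapshot(rows, asset_columns, lambda row: row["consent"] == "open")
print(snapshot)
# [{'sample_id': 's1', 'participant_id': 'p1'}, {'sample_id': 's3', 'participant_id': 'p1'}]
```

In this picture, the asset decides which columns can be shared and the snapshot decides which rows are shared, which is why the asset has to exist first.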

Once you’ve created a snapshot, how do you decide who can access the data? See Streamlining access for approved requestors with DUOS & TDR to learn how to screen which researchers can access your TDR dataset.

Step 4. Analyzing data

See How to export a TDR snapshot and How to use TDR snapshots with workflows to learn how to import data from TDR into a Terra workspace and analyze it in a workflow.
