How to create snapshots in TDR

Anton Kovalsky

Step-by-step instructions for selecting a subset of data (a Snapshot) from a TDR dataset for analysis.

Snapshots overview

Snapshots are Terra's new way to streamline data delivery and sharing. Previously, data that was not accessible through either Terra's data library or a featured workspace could be shared only if you had access to a workspace where that data was already staged, or by uploading metadata-containing spreadsheets from your hard drive to a workspace data tab.

The Terra Data Repository (data.terra.bio) allows data custodians (e.g., PIs, sequencing centers, and research organizations) to grant access to large datasets from which you can create Snapshots, which are subsets of the data in those datasets. For this to work, custodians must set up each dataset with assets that specify which data columns may be included in Snapshots.

Screen_Shot_2021-09-22_at_10.29.29_PM.png

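Two ways to create Snapshots

You can create Snapshots either in the Data Repository UI (Option 1) or with the Swagger API (Option 2).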

Three advantages to creating snapshots using the Swagger API

  1. The API approach gives you more granular control over how your Snapshot is created
  2. You can use API calls programmatically to automate regular/periodic Snapshot creation
  3. Because the Terra Data Repo's UI is a beta-stage product, using the API may help avoid downtime as we work out the kinks 

Option 1. Create Snapshots in the Data Repository

To create Snapshots, you must first have access to a dataset in the Data Repo. The Data Repo homepage (data.terra.bio) shows some of the datasets to which you've recently been granted access. For a full list of the datasets you can access, navigate to the "Datasets" tab.

1.1. Browse datasets

Click on the name of a dataset to view its contents. You can switch between the tables the dataset contains using the dropdown menu near the top left of your screen:

Screen_Shot_2021-09-22_at_11.29.30_PM.png

1.2. Filter data for your Snapshot

On the right edge of the screen are buttons that open dataset information, Snapshot creation, and sharing widgets.

2021-09-22_23-36-53.png

Clicking the dashed triangle icon opens the Snapshot creation widget, where you can select your desired data using the available filters. Note that the search bar is case-sensitive and searches only the column names, not the metadata in the rest of the table.

Screen_Shot_2021-09-23_at_12.08.09_AM.png

1.3. Select an asset for your Snapshot

The asset specifies which columns from the dataset to include in your Snapshot.

Once you've filtered for your desired rows, click "Create Snapshot" at the bottom of the widget. You'll name your Snapshot and select an asset in the Add Details pane (screenshot below).

Screen_Shot_2021-09-23_at_12.12.53_AM.png

Checking which assets include which data

If you're not sure which assets include which data, you can look it up using the retrieveDataset API endpoint in Swagger.

Remember to authorize Swagger every time you use it

This article includes instructions on using API commands through the Swagger UI. All instructions related to Swagger require you to first authenticate yourself whenever you’ve opened a window with the Swagger UI.

Instructions
Click “Authorize” near the top of the page, check all of the boxes in the pop up and hit “Authorize” again, and then input the appropriate credentials to authenticate. Make sure you close the subsequent pop up without clicking the “Sign Out” button.

For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

To use this endpoint, authenticate yourself on the Swagger page, then click "Try it out" in the top right corner of the endpoint to activate it. Once it's active, enter the UUID of the dataset you're interested in into the UUID field (you can find this UUID in the URL bar when you're viewing the dataset in the Data Repo UI), and select SCHEMA from the menu just beneath it:

2021-09-23_00-23-56.png

Scroll down and click "Execute", then scroll through the response body until you reach the "assets" section of the schema. It lists each asset, and each asset in turn lists the names of the columns it includes.

2021-09-23_00-36-15.png
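If the response body is long, it can be easier to copy it out of Swagger and summarize the "assets" section with a few lines of Python. The sketch below is illustrative only: the helper name `columns_by_asset` and the sample fragment are hypothetical, and the assumed layout (each asset holding a "tables" list whose entries carry "name" and "columns") should be checked against the response you actually receive.

```python
import json


def columns_by_asset(dataset_response):
    """Map each asset name to {table name: [column names]}.

    Assumes the response has a "schema" -> "assets" list, as described
    above; adjust the keys if your response is shaped differently.
    """
    assets = dataset_response.get("schema", {}).get("assets", [])
    summary = {}
    for asset in assets:
        tables = {}
        for table in asset.get("tables", []):
            tables[table["name"]] = list(table.get("columns", []))
        summary[asset["name"]] = tables
    return summary


# Hypothetical fragment standing in for a response pasted from Swagger.
sample_response = json.loads("""
{
  "schema": {
    "assets": [
      {
        "name": "default",
        "tables": [
          {"name": "my_first_table", "columns": ["column1", "column2"]}
        ]
      }
    ]
  }
}
""")

for asset_name, tables in columns_by_asset(sample_response).items():
    for table_name, columns in tables.items():
        print(f"{asset_name}: {table_name} -> {', '.join(columns)}")
```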

1.4. Create the Snapshot

Once you've selected your asset and named your Snapshot, clicking "Next" will take you to the Data Release view, where you can add other Terra users (including groups) so that they'll be able to see the Snapshot under the "Snapshots" tab when they go to the Data Repo UI. Clicking "Release Dataset" will create the Snapshot.

What to expect

Once you've successfully completed this step, the Snapshot has been created, and it should appear under the "Snapshots" tab (https://data.terra.bio/snapshots).

Option 2. Create a Snapshot using the Swagger API

The UI is convenient, especially for exploration, but the API can be a more efficient way to build exactly the Snapshot you want. There are three ways to create a Snapshot with API requests.

All three use the createSnapshot API endpoint.

Remember to authorize Swagger every time you use it

This article includes instructions on using API commands through the Swagger UI. All instructions related to Swagger require you to first authenticate yourself whenever you’ve opened a window with the Swagger UI.

Instructions
Click “Authorize” near the top of the page, check all of the boxes in the pop up and hit “Authorize” again, and then input the appropriate credentials to authenticate. Make sure you close the subsequent pop up without clicking the “Sign Out” button.

You should now be able to execute the commands below by clicking the “Try it out” button next to the command of your choice. For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

Prerequisites

  • You need a Profile ID, which you get by generating a Spend Profile.
  • You need a list of "readers": an array of the Terra identities (emails) that should be granted read access to the Snapshot. The array can contain just your own email; you can always add more readers later.

Option 2.1. Create a "full view" Snapshot

Often, the simplest and most convenient Snapshot to share is one that contains the entire dataset. This is known as "full view" mode. To create a Snapshot in this mode, use the JSON request body below.

createSnapshot API request body ("full view" snapshot)

{
   "name":"full_view_example_snapshot",
   "description":"full view snapshot of example DR Dataset",
   "profileId":"/*your Spend Profile ID*/",
   "readers":["<reader-email>"],
   "contents":[
      {
         "datasetName":"tdr_example_dataset",
         "mode":"byFullView"
      }
   ]
}

Whoever was listed under the "readers" parameter should be able to see that Snapshot under the "Snapshots" tab.
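If you create full view Snapshots regularly, you may prefer to generate the request body rather than edit it by hand. The following is a minimal sketch, not part of any Terra tooling: the function name `build_full_view_request` is hypothetical, and the only check it performs (rejecting a bare string for "readers") reflects the prerequisite above that readers be an array of emails.

```python
import json


def build_full_view_request(name, description, profile_id, readers, dataset_name):
    """Return a createSnapshot request body for "full view" mode."""
    if isinstance(readers, str):
        # A single email must still be wrapped in a list, e.g. ["me@example.org"].
        raise TypeError("readers must be a list of emails, not a single string")
    return {
        "name": name,
        "description": description,
        "profileId": profile_id,
        "readers": list(readers),
        "contents": [{"datasetName": dataset_name, "mode": "byFullView"}],
    }


# Placeholder values; substitute your own before pasting into Swagger.
body = build_full_view_request(
    name="full_view_example_snapshot",
    description="full view snapshot of example DR Dataset",
    profile_id="<your Spend Profile ID>",
    readers=["<reader-email>"],
    dataset_name="tdr_example_dataset",
)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the request body field of the createSnapshot endpoint.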

Option 2.2. Create a Snapshot by row ID

Another way to create a Snapshot is to provide, for every table that should be included, the Data Repo row IDs and column names to include in the Snapshot.

The row IDs can be obtained by querying BigQuery:

SELECT datarepo_row_id
FROM `my-project.my-dataset.my-table`
WHERE column1 = "value"

createSnapshot API request body (by row ID)

{
  "name": "my_row_id_snapshot",
  "profileId": "/*your Spend Profile ID*/",
  "contents": [
    {
      "mode": "byRowId",
      "datasetName": "my_dataset",
      "rowIdSpec": {
        "tables": [
          {
            "tableName": "my_first_table",
            "columns": ["column1", "column2"],
            "rowIds": ["1111-2222-3333", "333-2222-1111"]
          },
          {
            "tableName": "my_second_table",
            "columns": ["column_a", "column_b"],
            "rowIds": ["AAAA-BBBB-CCCC", "CCCC-BBBB-AAAA"]
          }
        ]
      }
    }
  ]
}
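When the row IDs come from query results, assembling the "rowIdSpec" section by hand is error-prone. The sketch below shows one possible way to build the "contents" entry from a plain mapping of table names to columns and row IDs. The helper `build_row_id_contents` is hypothetical, and dropping duplicate row IDs is a convenience chosen for this example, not a documented requirement of the API.

```python
import json


def build_row_id_contents(dataset_name, tables):
    """Build one "contents" entry for a byRowId createSnapshot request.

    tables maps each table name to {"columns": [...], "rowIds": [...]}.
    """
    table_specs = []
    for table_name, spec in tables.items():
        table_specs.append({
            "tableName": table_name,
            "columns": list(spec["columns"]),
            # dict.fromkeys drops repeated IDs while keeping their order.
            "rowIds": list(dict.fromkeys(spec["rowIds"])),
        })
    return {
        "mode": "byRowId",
        "datasetName": dataset_name,
        "rowIdSpec": {"tables": table_specs},
    }


# Placeholder values taken from the example request body above.
example_contents = build_row_id_contents(
    "my_dataset",
    {
        "my_first_table": {
            "columns": ["column1", "column2"],
            "rowIds": ["1111-2222-3333", "333-2222-1111"],
        },
        "my_second_table": {
            "columns": ["column_a", "column_b"],
            "rowIds": ["AAAA-BBBB-CCCC", "CCCC-BBBB-AAAA"],
        },
    },
)
print(json.dumps({"contents": [example_contents]}, indent=2))
```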

Option 2.3. Create a Snapshot by SQL query

Defining a Snapshot by inclusion criteria is often the most convenient approach. In this mode, a BigQuery-supported SQL query is converted directly into a Snapshot.

createSnapshot API request body (SQL query)

{
  "contents":[
     {
        "datasetName":"encode",
        "mode":"byQuery",
        "querySpec":{
           "assetName":"default",
           "query":"SELECT encode.bams.datarepo_row_id FROM encode.bams WHERE encode.bams.create_date > '2021-05-06T04:00:00'"
        }
     }
  ],
  "description":"Encode Aug 2021 release",
  "name":"encode_Aug2021_release",
  "profileId":"<uuid>",
  "readers":[
    "email1", "email2"
  ]
}
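Any of the request bodies in Option 2 can also be submitted from a script instead of the Swagger page, which is what makes regular or periodic Snapshot creation possible. The sketch below only prepares the HTTP request and does not send it. Several details are assumptions to verify before use: the URL path is taken to be the one the Swagger page shows for createSnapshot, the access token is assumed to be an OAuth bearer token for your Terra identity, and the dataset, table, and column names in the query are placeholders.

```python
import json
import urllib.request

# Assumption: createSnapshot is a POST to this path; confirm it in Swagger.
CREATE_SNAPSHOT_URL = "https://data.terra.bio/api/repository/v1/snapshots"


def prepare_create_snapshot(body, access_token, url=CREATE_SNAPSHOT_URL):
    """Return an unsent POST request carrying the JSON request body."""
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder byQuery body with the same keys as the example above.
query_body = {
    "name": "my_query_snapshot",
    "description": "Snapshot defined by a SQL query",
    "profileId": "<uuid>",
    "readers": ["<reader-email>"],
    "contents": [
        {
            "datasetName": "my_dataset",
            "mode": "byQuery",
            "querySpec": {
                "assetName": "default",
                "query": (
                    "SELECT my_dataset.my_table.datarepo_row_id "
                    "FROM my_dataset.my_table "
                    "WHERE my_dataset.my_table.column1 = 'value'"
                ),
            },
        }
    ],
}

request = prepare_create_snapshot(query_body, "<access-token>")
print(request.get_method(), request.full_url)
# To actually submit: urllib.request.urlopen(request)
```

Keeping request preparation separate from sending makes it possible to inspect the body and headers before anything is created in the Data Repo.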
