Creating and importing Snapshots

Anton Kovalsky
  • Updated

This article includes instructions on using API commands through the Swagger UI. All instructions related to Swagger require you to authenticate each time you open a window with the Swagger UI. Click “Authorize” near the top of the page, check all of the boxes in the pop-up and hit “Authorize” again, and then input the appropriate credentials to authenticate. Make sure you close the subsequent pop-up without clicking the “Sign Out” button. You should now be able to execute the commands below by clicking the “Try it out” button next to the command of your choice. For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

Screen_Shot_2021-09-23_at_9.16.08_PM.png


 

Snapshots are Terra's new way to streamline data delivery and sharing. Previously, data that wasn't accessible through Terra's data library or through one of our featured workspaces could only be shared in two ways: by giving you access to a workspace where that data was already staged, or by uploading metadata-containing spreadsheets from your hard drive to a workspace's data tab.

 

Now you can go to the Terra Data Repository (data.terra.bio), where data custodians (e.g. PIs, sequencing centers, research organizations) can grant you access to large datasets from which you can create Snapshots. A Snapshot is a subset of the data in a dataset. Users with access to a dataset can create Snapshots from it, provided the custodians have set up the dataset to include assets, which specify the data columns that may be included in Snapshots.

 

Screen_Shot_2021-09-22_at_10.29.29_PM.png

 

Broadly speaking, there are two ways of creating Snapshots:

  • There is a straightforward way to do this through the Terra Data Repo UI
  • You can also create Snapshots using the Terra Data Repo's Swagger API. This option has three advantages:
    • It gives you more granular control over how your Snapshot is created
    • You can use API calls programmatically if you want to automate regular/periodic Snapshot creation
    • Because the Terra Data Repo's UI is a beta-stage product, using the API may help you avoid any downtime as we work out the kinks

 

1. Creating Snapshots through the Data Repo user interface

In order to create Snapshots, you must first be granted access to a dataset in the Data Repo. If you go to the Data Repo homepage (data.terra.bio), you can see right up front some datasets to which you've been recently granted access. You can see a full list of datasets to which you have access by navigating to the "Datasets" tab.

 

1.1. Browsing datasets

Click on the name of a dataset to view the contents. You can toggle between whatever separate tables the dataset contains by using the dropdown menu bar near the top left of your screen:

 

Screen_Shot_2021-09-22_at_11.29.30_PM.png

 

1.2. Filtering data for your Snapshot

On the right edge of the screen, you can find the buttons that open widgets for dataset information, Snapshot creation, and sharing.

 

2021-09-22_23-36-53.png

 

 

Clicking the dashed triangle icon will open the widget for Snapshot creation, where you'll be able to select your desired data based on the available filters. Note that the search bar you see here is case-sensitive, and it only searches the column names, not the metadata contained in the rest of the table.

 

Screen_Shot_2021-09-23_at_12.08.09_AM.png

 

1.3. Selecting an asset for your Snapshot

Once you've filtered your data for your desired rows, click "Create Snapshot" at the bottom of the widget. This opens a view that requires you to name your Snapshot and select an asset. The asset specifies a set of columns from the dataset that will be included in your Snapshot.

 

Screen_Shot_2021-09-23_at_12.12.53_AM.png

 

If you're not sure which assets include which data, you could look it up using the retrieveDataset API endpoint. To use this API, you'll need to authenticate yourself on the Swagger page as described at the top of this article, and then click "Try it out" in the top right corner of the API endpoint to activate it. Once it's active, use the UUID for the dataset you're interested in as the input for the UUID field - you can find this UUID in the URL bar when you're looking at the dataset in the Data Repo UI - and select SCHEMA from the menu right beneath:

 

2021-09-23_00-23-56.png

 

Scroll down and click "Execute", then scroll through the response body until you see the "assets" section of the schema. This section lists the assets, and each asset in turn lists the names of the columns it includes.
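If you save the response body, a few lines of Python can summarize which columns each asset includes. This is a minimal sketch, not an official client: the nested key names ("schema", "assets", "tables", "columns") and the sample response are assumptions about the response shape, so compare them with what you actually see in Swagger.

```python
from typing import Dict, List


def columns_by_asset(dataset: dict) -> Dict[str, List[str]]:
    """Map each asset name to the 'table.column' names it includes."""
    result: Dict[str, List[str]] = {}
    # The "assets" section sits inside the schema; the nested
    # "tables"/"columns" key names are assumptions about the response shape.
    for asset in dataset.get("schema", {}).get("assets", []):
        names = []
        for table in asset.get("tables", []):
            for column in table.get("columns", []):
                names.append(f"{table['name']}.{column}")
        result[asset["name"]] = names
    return result


# Hypothetical, trimmed-down response body used only for illustration.
example_response = {
    "name": "tdr_example_dataset",
    "schema": {
        "assets": [
            {
                "name": "default",
                "tables": [
                    {"name": "sample", "columns": ["sample_id", "bam_path"]},
                    {"name": "subject", "columns": ["subject_id"]},
                ],
            }
        ]
    },
}

for asset_name, cols in columns_by_asset(example_response).items():
    print(asset_name, cols)
```

The function returns an empty mapping when the response has no schema section, which can happen if SCHEMA was not selected in the menu.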

 

2021-09-23_00-36-15.png

 

1.4. Creating the Snapshot

Once you've selected your asset and named your Snapshot, clicking "Next" will take you to the Data Release view, where you can add other Terra users (including groups) so that they'll be able to see the Snapshot under the "Snapshots" tab when they go to the Data Repo UI. Clicking "Release Dataset" will create the Snapshot. Once you've successfully completed this step, the Snapshot has been created, and you should be able to see it populated under the "Snapshots" tab (https://data.terra.bio/snapshots).

 

1.5. Exporting the Snapshot using the UI

Go to your "Snapshots" tab and click on the Snapshot you want to export. This will take you to the page for the Snapshot, where you can click on "Export to Workspace". If you're listed as a Steward for the Snapshot, you'll be able to manage access to it from this page as well.

 

Screen_Shot_2021-09-23_at_1.02.17_AM.png

 

If you are the creator of the Snapshot, you're automatically granted the role of Steward, which allows you to add other users as either Stewards or Readers. A Reader will see the Snapshot listed in their "Snapshots" tab and can export it to their Terra workspaces, but can't use any of the access management options. Clicking "Export to Workspace" will take you to the Import Snapshot page, where you can either select a Terra workspace to which you have Owner or Writer access, or create a new workspace containing your Snapshot.

 

1.6. Exporting the Snapshot using the API

You can use this Rawls API endpoint to import your Snapshot into a Terra workspace. You'll need the following three pieces of information:

  • Your workspace name
  • Your workspace namespace - This is the Billing Project used to create the workspace. It appears to the left of the workspace name, separated by a slash "/", at the top of your screen when you're viewing your workspace
  • The Snapshot's UUID - The snapshotId is the UUID string at the end of the URL you see in the address bar when you're viewing that Snapshot on the Data Repo UI

Put the first two into the appropriate input fields in the API, and the Snapshot UUID into the snapshotId parameter in the request body of that API, as shown in this image:

 

2021-09-23_03-54-51.png
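If you collect these three values in a script, a small helper can pull them out of what you see on screen. The sketch below is illustrative only: the function names, the example URL, and the example workspace are hypothetical, and the request body shows only the snapshotId field described above (the endpoint may expect other fields that are not shown here).

```python
import uuid
from urllib.parse import urlparse


def snapshot_id_from_url(url: str) -> str:
    """Return the UUID that ends a Data Repo Snapshot URL."""
    last_segment = urlparse(url).path.rstrip("/").split("/")[-1]
    # uuid.UUID raises ValueError if the segment is not a well-formed UUID.
    return str(uuid.UUID(last_segment))


def split_workspace(full_name: str) -> tuple:
    """Split 'namespace/workspace name' at the first slash."""
    namespace, _, name = full_name.partition("/")
    if not namespace.strip() or not name.strip():
        raise ValueError("expected '<namespace>/<workspace name>'")
    return namespace.strip(), name.strip()


# Hypothetical values, for illustration only.
snapshot_url = "https://data.terra.bio/snapshots/12345678-1234-5678-1234-567812345678"
namespace, workspace = split_workspace("my-billing-project/my workspace")
request_body = {"snapshotId": snapshot_id_from_url(snapshot_url)}

print(namespace, workspace, request_body)
```

Validating the UUID before sending the request catches a common copy-and-paste mistake, such as copying the URL of the Snapshots list instead of a single Snapshot.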

 

2. Creating a Snapshot using the Swagger API

There are several flavors of Snapshot creation when using API requests. While the UI option is nice especially for exploratory purposes, using the API can be an efficient way to build exactly the Snapshot you want. All of the instructions below apply to the use of the createSnapshot API endpoint, and all of these have the following prerequisites in common:

  • You'll need to have authenticated yourself in the Swagger UI as described at the top of this article
  • You need a Profile ID, which you get by generating a Spend Profile
  • You need to have a list of "readers" ready when you create the Snapshot - this should be an array of Terra identities (emails) that should be granted read access to the Snapshot (the array can contain just your own email; you can always add more readers later)

 

2.1. Creating a "full view" Snapshot

Often, the simplest and most convenient Snapshot to share is one that contains the entire dataset. This is known as "full view" mode. To create a Snapshot in this mode, use the JSON below as an example for the createSnapshot API request body:

 

{
   "name":"full_view_example_snapshot",
   "description":"full view snapshot of example DR Dataset",
   "profileId":"/*your Spend Profile ID*/",
   "readers":["<reader-email>"],
   "contents":[
      {
         "datasetName":"tdr_example_dataset",
         "mode":"byFullView"
      }
   ]
}

 

Whoever was listed under the "readers" parameter should be able to see that Snapshot under the "Snapshots" tab.
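If you want to automate this step, the same request body can be assembled and wrapped in an HTTP request from a script. The following is a rough sketch using only the Python standard library. The endpoint URL, the Bearer-token header, and the function names are assumptions rather than details taken from this article, so check the createSnapshot entry in the Swagger UI before relying on them. The request is prepared but not sent.

```python
import json
import urllib.request

# Assumed endpoint path for createSnapshot; confirm it in the Swagger UI.
CREATE_SNAPSHOT_URL = "https://data.terra.bio/api/repository/v1/snapshots"


def full_view_body(name, description, profile_id, dataset_name, readers):
    """Build a createSnapshot request body in "byFullView" mode."""
    return {
        "name": name,
        "description": description,
        "profileId": profile_id,
        "readers": list(readers),  # always an array, even for one email
        "contents": [{"datasetName": dataset_name, "mode": "byFullView"}],
    }


def build_create_request(body, access_token):
    """Prepare (but do not send) the POST request for createSnapshot."""
    return urllib.request.Request(
        CREATE_SNAPSHOT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values, for illustration only.
body = full_view_body(
    name="full_view_example_snapshot",
    description="full view snapshot of example DR Dataset",
    profile_id="<your Spend Profile ID>",
    dataset_name="tdr_example_dataset",
    readers=["<reader-email>"],
)
request = build_create_request(body, access_token="<access-token>")
print(request.get_method(), request.full_url)
```

To actually submit the request you would pass it to urllib.request.urlopen with a valid access token and real values in place of the placeholders.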

 

2.2. Creating a Snapshot by row ID

Another way to create a Snapshot is to provide the Data Repo row IDs and column names that should be included in the Snapshot, for every table that should be included. The row IDs can be obtained by querying BigQuery, for example:

SELECT datarepo_row_id
FROM `my-project.my-dataset.my-table`
WHERE column1 = "value"

 

To create a Snapshot in this mode, use the JSON below as an example for the createSnapshot API request body:

{
  "name": "my_row_id_snapshot",
  "profileId": "/*your Spend Profile ID*/",
  "contents": [
    {
      "mode": "byRowId",
      "datasetName": "my_dataset",
      "rowIdSpec": {
        "tables": [
          {
            "tableName": "my_first_table",
            "columns": ["column1", "column2"],
            "rowIds": ["1111-2222-3333", "3333-2222-1111"]
          },
          {
            "tableName": "my_second_table",
            "columns": ["column_a", "column_b"],
            "rowIds": ["AAAA-BBBB-CCCC", "CCCC-BBBB-AAAA"]
          }
        ]
      }
    }
  ]
}
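
When the row IDs come from query results, it can be easier to assemble this request body in code than by hand. The sketch below assumes you have already collected the datarepo_row_id values per table. The function name and the shape of its input are hypothetical, and the output mirrors the fields of the example above.

```python
import json


def build_row_id_snapshot(name, profile_id, dataset_name, rows_by_table):
    """Build a createSnapshot request body in "byRowId" mode.

    rows_by_table maps a table name to a dict holding the "columns" to keep
    and the "rowIds" (datarepo_row_id values) to include.
    """
    tables = []
    for table_name, spec in rows_by_table.items():
        if not spec["rowIds"]:
            raise ValueError(f"no row IDs given for table {table_name!r}")
        tables.append(
            {
                "tableName": table_name,
                "columns": list(spec["columns"]),
                # Drop duplicate IDs while keeping their original order.
                "rowIds": list(dict.fromkeys(spec["rowIds"])),
            }
        )
    return {
        "name": name,
        "profileId": profile_id,
        "contents": [
            {
                "mode": "byRowId",
                "datasetName": dataset_name,
                "rowIdSpec": {"tables": tables},
            }
        ],
    }


# Placeholder values, for illustration only.
row_id_body = build_row_id_snapshot(
    name="my_row_id_snapshot",
    profile_id="<your Spend Profile ID>",
    dataset_name="my_dataset",
    rows_by_table={
        "my_first_table": {
            "columns": ["column1", "column2"],
            "rowIds": ["1111-2222-3333", "1111-2222-3333", "3333-2222-1111"],
        }
    },
)
print(json.dumps(row_id_body, indent=2))
```

Raising an error for a table with no row IDs is a design choice in this sketch; it avoids submitting a request for a table that would contribute nothing to the Snapshot.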

 

2.3. Creating a Snapshot by SQL query

Defining a Snapshot by inclusion criteria is another useful way of creating one. You can do this by converting a BigQuery-supported SQL query directly into a Snapshot. To create a Snapshot in this mode, use the JSON below as an example for the createSnapshot API request body:

 

{
  "contents":[
     {
        "datasetName":"encode",
        "mode":"byQuery",
        "querySpec":{
           "assetName":"default",
           "query":"SELECT encode.read_groups.datarepo_row_id FROM encode.bams WHERE encode.bams.create_date > '2021-05-06T04:00:00'"
        }
     }
  ],
  "description":"Encode Aug 2021 release",
  "name":"encode_Aug2021_release",
  "profileId":"<uuid>",
  "readers":[
    "email1", "email2"
  ]
}
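
Request bodies pasted from documents or chat tools sometimes pick up typographic ("curly") quotation marks, which are not valid JSON. One way to avoid this is to build the body as a Python dictionary and let the json module serialize it. This is a minimal sketch with a hypothetical function name and a hypothetical query and table name; the field names follow the example above.

```python
import json


def build_query_snapshot(name, description, profile_id, dataset_name,
                         asset_name, query, readers):
    """Build a createSnapshot request body in "byQuery" mode."""
    return {
        "name": name,
        "description": description,
        "profileId": profile_id,
        "readers": list(readers),
        "contents": [
            {
                "datasetName": dataset_name,
                "mode": "byQuery",
                "querySpec": {"assetName": asset_name, "query": query},
            }
        ],
    }


# Placeholder values and a hypothetical query, for illustration only.
query_body = build_query_snapshot(
    name="my_query_snapshot",
    description="Rows created after a cutoff date",
    profile_id="<uuid>",
    dataset_name="my_dataset",
    asset_name="default",
    query=(
        "SELECT my_dataset.my_table.datarepo_row_id FROM my_dataset.my_table "
        "WHERE my_dataset.my_table.create_date > '2021-05-06T04:00:00'"
    ),
    readers=["email1", "email2"],
)

# json.dumps always emits straight double quotes, so the text is valid JSON.
query_body_text = json.dumps(query_body, indent=2)
print(query_body_text)
```

You can paste the printed text into the request body field in Swagger, or send it from a script.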
