How to create dataset assets in TDR

Anton Kovalsky

Step-by-step instructions for creating dataset assets, the last step before subsetting data into snapshots. 

Overview

Once a dataset has been created and populated with ingested data, the last step before the data can be subsetted into snapshots is asset creation. The asset creation step is an access control that enables the data custodian to specify which columns from their dataset are available to individuals creating snapshots.

Snapshots can only be created from datasets with at least one asset, and each snapshot contains one asset.

Diagram schematizing a TDR dataset asset. The dataset contains one data table with 5 columns. Blue rectangles highlight three of the columns, which make up the asset.

What to expect after creating assets

You'll be able to see a list of existing assets in the Terra Data Repo UI by going to a dataset, clicking the three-dash triangle logo near the top right of the screen, and then clicking "Create Snapshot". This will call up an "Assets" dropdown that lists all the assets available for that dataset. If a dataset has no assets, the "Create Snapshot" button will be greyed out.

Screen recording showing how to begin creating a snapshot. Once you've created an asset, it will be visible from the 'assets' dropdown menu in the snapshot creation process.

When creating snapshots (i.e., subsets of data), anyone added to a dataset will see all of the assets that they are authorized to access.

Sharing data with snapshots

Once a dataset has an asset, a user who has access to that dataset can create a snapshot by selecting an asset and specifying which rows they want included in their snapshot.

Diagram schematizing a TDR dataset with an asset and a snapshot. The dataset consists of one data table with 5 columns. Blue rectangles highlight 3 of these columns, which comprise an asset. Orange rectangles highlight individual rows (queries). Green rectangles highlight individual cells at the overlap between the asset and the queries; these represent the contents of the snapshot.

To give someone access to one asset but not another, create a snapshot from the asset you want to share. Including all of the rows shares the full asset without sharing any other assets. The only columns included in the snapshot will be those specified in the asset used to create the snapshot.
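The relationship between a dataset, an asset, and a snapshot can be pictured with a small Python model. This is only an illustration of the idea described above, not TDR code: the function name, the sample table, and the row selection are all hypothetical.

```python
# Illustrative model only: this is not TDR code, and every name here is hypothetical.
from typing import Dict, List

Row = Dict[str, str]


def make_snapshot(table: List[Row], asset_columns: List[str], row_indices: List[int]) -> List[Row]:
    """Keep only the chosen rows and, within them, only the asset's columns."""
    return [{col: table[i][col] for col in asset_columns} for i in row_indices]


# A toy "dataset" with one table and three columns.
table = [
    {"column_1": "a", "column_2": "b", "column_3": "c"},
    {"column_1": "d", "column_2": "e", "column_3": "f"},
    {"column_1": "g", "column_2": "h", "column_3": "i"},
]

# The "asset" is the set of columns the data custodian exposes.
asset1 = ["column_1", "column_2"]

# The "snapshot" is the overlap between the asset's columns and the selected rows.
snapshot = make_snapshot(table, asset1, [0, 2])
print(snapshot)
```

In this model, column_3 never appears in the snapshot because it is not part of the asset, no matter which rows are selected.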

Two ways to create assets

Option 1. Use the addDatasetAssetSpecification API endpoint

Use the addDatasetAssetSpecification API endpoint with the JSON below as the request body. The UUID of the dataset to which you want to add the asset is specified in a separate field of this endpoint.

Remember to authorize Swagger every time you use it. See How to authenticate/troubleshoot Swagger for TDR for step-by-step instructions.

addDatasetAssetSpecification request body

{
  "name": "asset3",
  "rootTable": "table1",
  "rootColumn": "column_1",
  "follow": [],
  "tables": [
    {
      "columns": [],
      "name": "table1"
    }
  ]
}

addDatasetAssetSpecification required parameters

  • rootTable: You must select a table from your dataset as the root table, even if you include multiple tables in the asset.
  • rootColumn: You must select a root column from the root table.
  • follow: This parameter lists any relationships between tables that the asset should follow. It can be an empty list ([]), as in the example above, where no relationships are listed, but the parameter must still be present.
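If you prefer to generate the request body rather than type it by hand, a minimal Python sketch might look like the following. The helper name build_asset_request is hypothetical, and the check that the root table appears in the asset's table list is an assumption of this sketch rather than a documented TDR rule. The sketch only builds the JSON; the dataset UUID is still supplied separately when you call the endpoint.

```python
import json
from typing import Dict, List, Optional


def build_asset_request(
    name: str,
    root_table: str,
    root_column: str,
    tables: Dict[str, List[str]],
    follow: Optional[List[str]] = None,
) -> dict:
    """Build a request body shaped like the example above (hypothetical helper)."""
    if not root_table or not root_column:
        raise ValueError("rootTable and rootColumn must both be set")
    # Assumption of this sketch: the root table is one of the asset's tables.
    if root_table not in tables:
        raise ValueError(f"root table {root_table!r} is not listed in tables")
    return {
        "name": name,
        "rootTable": root_table,
        "rootColumn": root_column,
        # "follow" is always emitted, as an empty list when there is nothing to follow.
        "follow": list(follow or []),
        "tables": [{"name": t, "columns": list(cols)} for t, cols in tables.items()],
    }


# Reproduces the example request body shown above.
body = build_asset_request("asset3", "table1", "column_1", {"table1": []})
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted into the request body field in Swagger.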

Option 2. Include assets in your schema

You can create your datasets with assets already present. The article How to create a dataset in TDR outlines how to use the createDataset API, and the article How to create a dataset schema in TDR shows what the JSON for a schema looks like.

To create your dataset with the assets already present, include an "assets" array in your schema, at the same level as the "tables" and "relationships" arrays, as in the example below.

Example schema JSON

"schema": {
  "tables": [
    {
      "name": "table1",
      "columns": [
        {
          "name": "column_1",
          "datatype": "string"
        },
        {
          "name": "column_2",
          "datatype": "fileref"
        },
        {
          "name": "column_3",
          "datatype": "fileref"
        }
      ]
    }
  ],
  "assets": [
    {
      "name": "asset1",
      "tables": [
        {
          "name": "table1",
          "columns": [
            "column_1",
            "column_2"
          ]
        }
      ],
      "rootTable": "table1",
      "rootColumn": "column_1"
    },
    {
      "name": "asset2",
      "tables": [
        {
          "name": "table1",
          "columns": [
            "column_1",
            "column_3"
          ]
        }
      ],
      "rootTable": "table1",
      "rootColumn": "column_1"
    }
  ]
}
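Before submitting a schema like the one above, it can help to check locally that every asset names a root table and root column and only refers to tables and columns that the schema defines. The following Python sketch shows one way to do that. The function check_assets and the sample data are hypothetical, and these are local sanity checks rather than the validation that TDR itself performs.

```python
from typing import List


def check_assets(schema: dict) -> List[str]:
    """Return a list of problems found; an empty list means these checks passed."""
    columns_by_table = {
        t["name"]: {c["name"] for c in t["columns"]} for t in schema.get("tables", [])
    }
    problems = []
    for asset in schema.get("assets", []):
        label = asset.get("name", "<unnamed>")
        # Each asset needs a non-null rootTable and rootColumn.
        for key in ("rootTable", "rootColumn"):
            if not asset.get(key):
                problems.append(f"{label}: missing {key}")
        # Every table and column named by the asset should exist in the schema.
        for table in asset.get("tables", []):
            known = columns_by_table.get(table["name"])
            if known is None:
                problems.append(f"{label}: unknown table {table['name']}")
                continue
            for col in table["columns"]:
                if col not in known:
                    problems.append(f"{label}: unknown column {table['name']}.{col}")
    return problems


schema = {
    "tables": [{
        "name": "table1",
        "columns": [
            {"name": "column_1", "datatype": "string"},
            {"name": "column_2", "datatype": "fileref"},
        ],
    }],
    "assets": [
        {
            "name": "asset1",
            "tables": [{"name": "table1", "columns": ["column_1", "column_2"]}],
            "rootTable": "table1",
            "rootColumn": "column_1",
        },
        {
            # Deliberately broken: no rootColumn, and column_9 does not exist.
            "name": "broken_asset",
            "tables": [{"name": "table1", "columns": ["column_9"]}],
            "rootTable": "table1",
        },
    ],
}

problems = check_assets(schema)
for problem in problems:
    print(problem)
```

Here asset1 passes, while broken_asset is reported twice: once for the missing root column and once for the unknown column.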

When you create a dataset with assets defined in the schema, each asset needs a non-null value for the "rootTable" and "rootColumn" parameters. The "follow" parameter is not required in this case, but if your schema includes relationships, you'll want each asset to follow any relationships between the tables it includes.

To do that, add the "follow" parameter at the same level as the "rootTable" parameter, and set it to a list of relationship names in square brackets. Each name must exactly match the "name" of an entry in "relationships":

"assets": [
  {
    "name": "asset1",
    "tables": [
      {
        "name": "table1",
        "columns": []
      }
    ],
    "rootTable": "table1",
    "rootColumn": "col1",
    "follow": ["relation1", "relation2"]
  }
],
"relationships": [
  {
    "name": "relation1",
    "from": {
      "table": "table1",
      "column": "col1"
    },
    "to": {
      "table": "table2",
      "column": "col1"
    }
  },
  {
    "name": "relation2",
    "from": {
      "table": "table1",
      "column": "col2"
    },
    "to": {
      "table": "table2",
      "column": "col2"
    }
  }
]
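Because "follow" refers to relationships by name, a small spelling difference (such as a stray space) can leave an asset pointing at a relationship that the schema does not define. A short Python sketch for catching this locally is shown below. The function unresolved_follows and the sample schema are hypothetical; this is a convenience check, not part of the TDR API.

```python
from typing import Dict, List


def unresolved_follows(schema: dict) -> Dict[str, List[str]]:
    """Map each asset name to the "follow" entries that match no relationship name."""
    defined = {rel["name"] for rel in schema.get("relationships", [])}
    unresolved = {}
    for asset in schema.get("assets", []):
        missing = [name for name in asset.get("follow", []) if name not in defined]
        if missing:
            unresolved[asset["name"]] = missing
    return unresolved


schema = {
    "assets": [{
        "name": "asset1",
        "tables": [{"name": "table1", "columns": []}],
        "rootTable": "table1",
        "rootColumn": "col1",
        "follow": ["relation1", "relation2"],
    }],
    "relationships": [
        {"name": "relation1",
         "from": {"table": "table1", "column": "col1"},
         "to": {"table": "table2", "column": "col1"}},
        # Deliberate mismatch for the demo: "relation 2" contains a space.
        {"name": "relation 2",
         "from": {"table": "table1", "column": "col2"},
         "to": {"table": "table2", "column": "col2"}},
    ],
}

result = unresolved_follows(schema)
print(result)
```

An empty result means every name in every "follow" list matches a defined relationship.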
