How to create dataset assets in TDR

Anton Kovalsky

Step-by-step instructions for creating dataset assets, the last step before subsetting data into snapshots. 

Overview

Once a dataset has been created and populated with ingested data, the last step before the data can be subsetted into snapshots is asset creation. The asset creation step is an access control that enables the data custodian to specify which columns from their dataset are available to individuals creating snapshots. Snapshots can only be created from datasets with at least one asset.


What to expect after creating assets

You'll be able to see a list of existing assets in the Terra Data Repo UI by going to a dataset, clicking the three-dash triangle logo near the top right of the screen, and then clicking "Create Snapshot". This will call up an "Assets" dropdown that lists all the assets available for that dataset. If a dataset has no assets, the "Create Snapshot" button will be greyed out.


When creating snapshots (i.e., subsets of data), anyone added to a dataset will always see all of the assets they are authorized to access.

Sharing data with snapshots

Once a dataset has an asset, a user who has access to that dataset can create a snapshot by selecting an asset and specifying which rows they want included in their snapshot.


To allow access to one asset but not another, create a snapshot from the asset you want to share. Including all of the rows shares the full asset without sharing any other assets. The only columns included in the snapshot will be those specified in the asset used to create the snapshot.
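The effect described above can be pictured with a small toy model. This is only a sketch of the idea (rows are selected, then only the asset's columns are exposed), not how TDR builds snapshots internally. The function name `snapshot_view` and the sample rows are hypothetical; the table and column names are borrowed from the examples later in this article.

```python
from typing import Any, Callable

Row = dict[str, Any]


def snapshot_view(
    rows: list[Row],
    asset_columns: list[str],
    keep_row: Callable[[Row], bool],
) -> list[Row]:
    """Toy model: keep the selected rows, then expose only the asset's columns."""
    return [
        {column: row[column] for column in asset_columns}
        for row in rows
        if keep_row(row)
    ]


# Hypothetical contents of "table1" (illustrative values only).
table1 = [
    {"column_1": "sample_a", "column_2": "file_a", "column_3": "other_a"},
    {"column_1": "sample_b", "column_2": "file_b", "column_3": "other_b"},
]

# An asset that exposes column_1 and column_2 but not column_3.
asset1_columns = ["column_1", "column_2"]

# Including every row shares the full asset and nothing outside it.
full_share = snapshot_view(table1, asset1_columns, lambda row: True)

# Selecting a subset of rows narrows the snapshot further.
one_row = snapshot_view(table1, asset1_columns, lambda row: row["column_1"] == "sample_a")

print(full_share)
print(one_row)
```

In this model, column_3 never appears in either result, which mirrors how a snapshot built from one asset does not expose columns that belong only to another asset.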

Two ways to create assets

Option 1. Use the addDatasetAssetSpecification API endpoint

Use the addDatasetAssetSpecification API endpoint with the JSON below as the request body. The UUID of the dataset to which you wish to add this asset is specified in a separate field of the endpoint, not in the request body.

Remember to authorize Swagger every time you use it

This article includes instructions on using API commands through the Swagger UI. All instructions related to Swagger require you to first authenticate yourself whenever you’ve opened a window with the Swagger UI.

Instructions
Click “Authorize” near the top of the page, check all of the boxes in the pop-up, click “Authorize” again, and then enter the appropriate credentials to authenticate. Make sure you close the subsequent pop-up without clicking the “Sign Out” button.

You should now be able to execute the commands below by clicking the “Try it out” button. For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

addDatasetAssetSpecification request body

{
  "name": "asset3",
  "rootTable": "table1",
  "rootColumn": "column_1",
  "follow": [],
  "tables": [
    {
      "columns": [],
      "name": "table1"
    }
  ]
}

addDatasetAssetSpecification required parameters

  • rootTable: You must select a table from your dataset as the root table, even if you include multiple tables in the asset.
  • rootColumn: You must select a root column from the root table.
  • follow: This parameter lists the relationships between tables that the asset should follow. It must always be included, but it can be an empty list ([]), as in the example above, where no relationships are listed.
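If you prefer to script this step rather than use the Swagger UI, the request body and required parameters above can be assembled roughly as follows. This is a minimal sketch under stated assumptions: the helper names `build_asset_spec` and `build_request` are hypothetical, the base URL and token are placeholders, and the URL path used here is an assumption that you should confirm against the addDatasetAssetSpecification entry in the Swagger UI. The request is only constructed, not sent.

```python
import json
import urllib.request


def build_asset_spec(name, root_table, root_column, tables, follow=None):
    """Build the request body; `tables` maps a table name to its list of columns."""
    if not root_table or not root_column:
        raise ValueError("rootTable and rootColumn are required")
    return {
        "name": name,
        "rootTable": root_table,
        "rootColumn": root_column,
        # "follow" is always included, even when there are no relationships.
        "follow": list(follow or []),
        "tables": [
            {"name": table, "columns": list(columns)}
            for table, columns in tables.items()
        ],
    }


def build_request(base_url, dataset_id, body, token):
    """Prepare (but do not send) the POST request.

    ASSUMPTION: the path below is not taken from this article; check it in Swagger.
    The dataset UUID goes in the URL, separate from the JSON body.
    """
    url = f"{base_url}/api/repository/v1/datasets/{dataset_id}/assets"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


# Same values as the example request body above.
body = build_asset_spec("asset3", "table1", "column_1", {"table1": []})

# Placeholder base URL, dataset UUID, and token; substitute your own.
request = build_request("https://tdr.example.org", "your-dataset-uuid", body, "your-token")

print(request.full_url)
print(json.dumps(body, indent=2))
```

Building the body with a helper makes it harder to leave out "follow" by accident, since the helper always emits it as a list.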

Option 2. Include assets in your schema

You can create your datasets with assets already present. The article How to create a dataset in TDR outlines how to use the createDataset API, and the article How to create a dataset schema in TDR shows what the JSON for a schema looks like.

To create your dataset with the assets already present, include an "assets" array (as in the example below) as part of your schema, at the same level as your "tables" and "relationships" objects.

Example schema JSON

"schema": {
  "tables": [
    {
      "name": "table1",
      "columns": [
        {
          "name": "column_1",
          "datatype": "string"
        },
        {
          "name": "column_2",
          "datatype": "fileref"
        },
        {
          "name": "column_3",
          "datatype": "fileref"
        }
      ]
    }
  ],
  "assets": [
    {
      "name": "asset1",
      "tables": [
        {
          "name": "table1",
          "columns": [
            "column_1",
            "column_2"
          ]
        }
      ],
      "rootTable": "table1",
      "rootColumn": "column_1"
    },
    {
      "name": "asset2",
      "tables": [
        {
          "name": "table1",
          "columns": [
            "column_1",
            "column_3"
          ]
        }
      ],
      "rootTable": "table1",
      "rootColumn": "column_1"
    }
  ]
}
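Before submitting a schema like the one above, it can help to check the assets against the tables locally. The following is a rough sketch of such a check, not an official TDR validator. The function name `check_assets` is hypothetical, and the rules it applies (each asset has a root table and root column, and every table and column it names exists in the schema) are assumptions drawn from this article; the service may apply additional rules.

```python
def check_assets(schema: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means none were found."""
    columns_by_table = {
        table["name"]: {column["name"] for column in table["columns"]}
        for table in schema.get("tables", [])
    }
    problems = []
    for asset in schema.get("assets", []):
        label = asset.get("name", "<unnamed>")
        root_table = asset.get("rootTable")
        root_column = asset.get("rootColumn")
        if not root_table or not root_column:
            problems.append(f"{label}: rootTable and rootColumn must both be set")
        elif root_column not in columns_by_table.get(root_table, set()):
            problems.append(f"{label}: {root_table}.{root_column} is not in the schema")
        for table in asset.get("tables", []):
            known_columns = columns_by_table.get(table["name"])
            if known_columns is None:
                problems.append(f"{label}: unknown table {table['name']}")
                continue
            for column in table.get("columns", []):
                if column not in known_columns:
                    problems.append(f"{label}: unknown column {table['name']}.{column}")
    return problems


# A cut-down schema: "asset1" matches the example above, "broken" is deliberately wrong.
schema = {
    "tables": [
        {
            "name": "table1",
            "columns": [
                {"name": "column_1", "datatype": "string"},
                {"name": "column_2", "datatype": "fileref"},
                {"name": "column_3", "datatype": "fileref"},
            ],
        }
    ],
    "assets": [
        {
            "name": "asset1",
            "tables": [{"name": "table1", "columns": ["column_1", "column_2"]}],
            "rootTable": "table1",
            "rootColumn": "column_1",
        },
        {
            "name": "broken",
            "tables": [{"name": "table1", "columns": ["column_9"]}],
            "rootTable": "table1",
            "rootColumn": None,
        },
    ],
}

problems = check_assets(schema)
for problem in problems:
    print(problem)
```

Here only the deliberately broken asset is reported: once for its missing root column and once for the column that does not exist in table1.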

When creating datasets with assets already included, remember that each asset needs a non-null value for the "rootTable" and "rootColumn" parameters. The "follow" parameter is not required when assets are defined in the schema, but if your schema includes relationships, you'll want your assets to follow any relationships between the tables included in those assets.

To do that, add the "follow" parameter at the same level as the "rootTable" parameter, and set it to a list of relationship names in square brackets, as in the example below:

"assets": [
  {
    "name": "asset1",
    "tables": [
      {
        "name": "table1",
        "columns": []
      }
    ],
    "rootTable": "table1",
    "rootColumn": "col1",
    "follow": ["relation1", "relation2"]
  }
],
"relationships": [
  {
    "name": "relation1",
    "from": {
      "table": "table1",
      "column": "col1"
    },
    "to": {
      "table": "table2",
      "column": "col1"
    }
  },
  {
    "name": "relation2",
    "from": {
      "table": "table1",
      "column": "col2"
    },
    "to": {
      "table": "table2",
      "column": "col2"
    }
  }
]
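Because "follow" refers to relationships by name, a stray space or typo in a name is easy to miss. A small local check along these lines can catch that before you submit the schema. This is a sketch only: the function name `unknown_follows` is hypothetical, and it assumes that names in "follow" must match relationship names exactly.

```python
def unknown_follows(schema: dict) -> dict[str, list[str]]:
    """Map each asset name to the "follow" entries that match no relationship name."""
    defined = {relationship["name"] for relationship in schema.get("relationships", [])}
    missing = {}
    for asset in schema.get("assets", []):
        unmatched = [name for name in asset.get("follow", []) if name not in defined]
        if unmatched:
            missing[asset["name"]] = unmatched
    return missing


# Illustrative schema fragment: the second relationship name contains a space,
# so it does not match the "relation2" entry in the asset's "follow" list.
schema = {
    "assets": [
        {
            "name": "asset1",
            "tables": [{"name": "table1", "columns": []}],
            "rootTable": "table1",
            "rootColumn": "col1",
            "follow": ["relation1", "relation2"],
        }
    ],
    "relationships": [
        {
            "name": "relation1",
            "from": {"table": "table1", "column": "col1"},
            "to": {"table": "table2", "column": "col1"},
        },
        {
            "name": "relation 2",
            "from": {"table": "table1", "column": "col2"},
            "to": {"table": "table2", "column": "col2"},
        },
    ],
}

mismatches = unknown_follows(schema)
print(mismatches)
```

An empty result means every name in every "follow" list corresponds to a relationship defined in the schema.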
