How to create assets for a dataset

Anton Kovalsky

Step-by-step instructions for creating dataset assets, the last step before subsetting data into snapshots. 

Overview

Once a dataset has been created and populated with ingested data, one more step remains before the data can be subsetted into snapshots: asset creation. Asset creation is a form of access control. It lets the data custodian specify which columns from their dataset are available for snapshot creation.


You can create snapshots from a dataset only after at least one asset has been created for it. Once a dataset has assets, you can see them in the Terra Data Repo UI: go to the dataset, click the three-dash triangle icon near the top right of the screen, and then click "Create Snapshot". This opens a view with an "Assets" dropdown that lists every asset available for that dataset. If a dataset has no assets, the "Create Snapshot" button is greyed out.


Any user added to a dataset will always see all of that dataset's assets, and can make snapshots from any of them. If you want someone to have access to one asset but not another, the best approach is to create a snapshot from that asset and share the snapshot instead of the dataset. You can include all of the rows, so that you share the full asset without sharing any other assets.

There are two ways to create assets

Once a dataset has an asset, a user with access to that dataset can create a snapshot by selecting an asset and specifying which rows to include. The only columns included in the snapshot are those specified in the asset used to create it.
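The split of responsibilities (the asset picks the columns, the snapshot picks the rows) can be illustrated with a small local model. This is a conceptual sketch only, not how Terra Data Repo implements snapshots: it looks at the root table alone, ignores "follow" relationships, assumes the asset lists its columns explicitly, and picks rows by matching values in the asset's rootColumn. The function name and the sample rows are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def preview_snapshot(asset: dict[str, Any], rows: list[Row],
                     selected_root_values: set[Any]) -> list[Row]:
    """Return the root-table rows a snapshot would contain (conceptual model only).

    Rows are kept when their rootColumn value is one the user selected; each kept
    row is cut down to the columns the asset lists for its root table.
    """
    root_table = asset["rootTable"]
    root_column = asset["rootColumn"]
    # Column list that the asset defines for its root table.
    columns = next(t["columns"] for t in asset["tables"] if t["name"] == root_table)
    return [
        {col: row.get(col) for col in columns}
        for row in rows
        if row.get(root_column) in selected_root_values
    ]


# Hypothetical asset and sample data mirroring "asset1" from the schema example.
asset1 = {
    "name": "asset1",
    "tables": [{"name": "table1", "columns": ["column_1", "column_2"]}],
    "rootTable": "table1",
    "rootColumn": "column_1",
}
table1_rows = [
    {"column_1": "a", "column_2": "fileref-a2", "column_3": "fileref-a3"},
    {"column_1": "b", "column_2": "fileref-b2", "column_3": "fileref-b3"},
]

print(preview_snapshot(asset1, table1_rows, {"a"}))
# → [{'column_1': 'a', 'column_2': 'fileref-a2'}]
```

Even when every row is selected, column_3 never appears, because asset1 does not list it.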


Option 1. Using the addDatasetAssetSpecification API endpoint

To use the addDatasetAssetSpecification API endpoint, use the JSON below as the request body. The UUID of the dataset you want to add the asset to is entered in a separate field of this API.

Remember to authorize Swagger every time you use it

This article includes instructions for using API commands through the Swagger UI. All Swagger instructions require you to authenticate first, every time you open a new window with the Swagger UI.

Instructions
Click “Authorize” near the top of the page, check all of the boxes in the pop-up, click “Authorize” again, and then enter the appropriate credentials to authenticate. Make sure you close the next pop-up without clicking the “Sign Out” button.

You should now be able to execute the commands below by clicking the “Try it out” button next to the command of your choice. For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

addDatasetAssetSpecification request body

{
  "name": "asset3",
  "rootTable": "table1",
  "rootColumn": "column_1",
  "follow": [],
  "tables": [
    {
      "columns": [],
      "name": "table1"
    }
  ]
}
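The same request body can also be sent outside Swagger. The sketch below only builds the HTTP request with Python's standard library and does not send it. Several details are assumptions not stated in this article: the host (https://data.terra.bio), the route (/api/repository/v1/datasets/{id}/assets), and bearer-token authentication, for example with a token from `gcloud auth print-access-token`. Confirm the route against the addDatasetAssetSpecification entry in your Swagger page before relying on it. The dataset UUID and token below are placeholders.

```python
import json
import urllib.request

# Assumptions: verify both against the Swagger page of your Data Repo instance.
TDR_HOST = "https://data.terra.bio"
ASSET_PATH = "/api/repository/v1/datasets/{dataset_id}/assets"


def build_add_asset_request(dataset_id: str, asset_spec: dict,
                            token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that adds one asset to a dataset."""
    url = TDR_HOST + ASSET_PATH.format(dataset_id=dataset_id)
    body = json.dumps(asset_spec).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",  # assumed bearer-token auth
            "Content-Type": "application/json",
        },
        method="POST",
    )


# The request body from above, as a Python dict.
asset3 = {
    "name": "asset3",
    "rootTable": "table1",
    "rootColumn": "column_1",
    "follow": [],
    "tables": [{"columns": [], "name": "table1"}],
}

req = build_add_asset_request("00000000-0000-0000-0000-000000000000", asset3, "example-token")
print(req.get_method(), req.full_url)
# To actually send it (network access and a real token required):
#     with urllib.request.urlopen(req) as resp:
#         print(resp.status, resp.read().decode())
```

The response format is not described in this article, so check the Swagger page for what the endpoint returns.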

addDatasetAssetSpecification required parameters

  • rootTable: You must select a table present in your dataset as the root table, even if you include multiple tables in the asset.
  • rootColumn: You must select a root column from the root table.
  • follow: This parameter lists any relationships between tables that the asset should follow. The example above lists no relationships, so "follow" is set to an empty list ([]). Even when it is empty, the parameter must still be included.
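The three rules above can be checked locally before you submit a request body. This is a minimal sketch, assuming you already know your dataset's table and column names and pass them in as a plain dictionary. The function name and the dataset_tables mapping are hypothetical, and the Data Repo may enforce further rules that are not checked here.

```python
from typing import Any


def check_asset_spec(spec: dict[str, Any],
                     schema_tables: dict[str, list[str]]) -> list[str]:
    """Return problems found in an asset spec; an empty list means none were found.

    schema_tables maps each table name in the dataset to its column names.
    """
    problems = []
    root_table = spec.get("rootTable")
    root_column = spec.get("rootColumn")
    # rootTable must be a table that exists in the dataset.
    if root_table not in schema_tables:
        problems.append(f"rootTable {root_table!r} is not a table in the dataset")
    # rootColumn must be a column of that root table.
    elif root_column not in schema_tables[root_table]:
        problems.append(f"rootColumn {root_column!r} is not a column of {root_table!r}")
    # "follow" must be present as a list, even when it is empty.
    if not isinstance(spec.get("follow"), list):
        problems.append('"follow" must be present as a list, even if empty ([])')
    return problems


# Hypothetical dataset layout matching the examples in this article.
dataset_tables = {"table1": ["column_1", "column_2", "column_3"]}
asset3 = {
    "name": "asset3",
    "rootTable": "table1",
    "rootColumn": "column_1",
    "follow": [],
    "tables": [{"columns": [], "name": "table1"}],
}

print(check_asset_spec(asset3, dataset_tables))  # → []
```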

Option 2. Including assets in your schema

You can create datasets with assets already present. The article on dataset creation shows how to use the createDataset API, and the article on setting up your schema shows what the JSON for a schema looks like. To create your dataset with assets already present, include an "assets" array in your schema at the same level as your "tables" and "relationships" objects, as in the example below:

"schema": {
  "tables": [{
    "name": "table1",
    "columns": [
      {
        "name": "column_1",
        "datatype": "string"
      },
      {
        "name": "column_2",
        "datatype": "fileref"
      },
      {
        "name": "column_3",
        "datatype": "fileref"
      }
    ]
  }],
  "assets": [
    {
      "name": "asset1",
      "tables": [{
        "name": "table1",
        "columns": [
          "column_1",
          "column_2"
        ]
      }],
      "rootTable": "table1",
      "rootColumn": "column_1"
    },
    {
      "name": "asset2",
      "tables": [{
        "name": "table1",
        "columns": [
          "column_1",
          "column_3"
        ]
      }],
      "rootTable": "table1",
      "rootColumn": "column_1"
    }
  ]
}
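Before calling createDataset, it can help to check that each asset in the schema has non-null "rootTable" and "rootColumn" values and only names tables and columns that the schema actually defines. The sketch below assumes the schema has been loaded into a Python dict shaped like the example above; the function name is hypothetical, and it is a local sanity check rather than the Data Repo's own validation.

```python
from typing import Any


def check_schema_assets(schema: dict[str, Any]) -> list[str]:
    """Return problems found in the "assets" of a schema; empty means none were found."""
    problems = []
    # Map each table name to the set of column names defined for it in the schema.
    table_columns = {
        t["name"]: {c["name"] for c in t["columns"]}
        for t in schema.get("tables", [])
    }
    for asset in schema.get("assets", []):
        name = asset.get("name", "<unnamed>")
        # Every asset needs non-null rootTable and rootColumn values.
        for key in ("rootTable", "rootColumn"):
            if asset.get(key) is None:
                problems.append(f"{name}: {key} must not be null")
        # Every table and column an asset names must exist in the schema.
        for table in asset.get("tables", []):
            known = table_columns.get(table["name"])
            if known is None:
                problems.append(f"{name}: unknown table {table['name']!r}")
                continue
            for col in table.get("columns", []):
                if col not in known:
                    problems.append(f"{name}: unknown column {table['name']}.{col}")
    return problems


# The schema example above, as a Python dict.
schema = {
    "tables": [{
        "name": "table1",
        "columns": [
            {"name": "column_1", "datatype": "string"},
            {"name": "column_2", "datatype": "fileref"},
            {"name": "column_3", "datatype": "fileref"},
        ],
    }],
    "assets": [
        {"name": "asset1", "tables": [{"name": "table1", "columns": ["column_1", "column_2"]}],
         "rootTable": "table1", "rootColumn": "column_1"},
        {"name": "asset2", "tables": [{"name": "table1", "columns": ["column_1", "column_3"]}],
         "rootTable": "table1", "rootColumn": "column_1"},
    ],
}

print(check_schema_assets(schema))  # → []
```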

When you create a dataset with predefined assets like this, remember that each asset needs a non-null value for both "rootTable" and "rootColumn". The "follow" parameter is optional here, but if your schema includes relationships, you'll want each asset to follow the relationships between the tables it includes. To do that, add the "follow" parameter at the same level as "rootTable" and set it to a list of relationship names in square brackets:

"assets": [{
  "name": "asset1",
  "tables": [{
    "name": "table1",
    "columns": []
  }],
  "rootTable": "table1",
  "rootColumn": "col1",
  "follow": ["relation1", "relation2"]
}],
"relationships": [
  {
    "name": "relation1",
    "from": {
      "table": "table1",
      "column": "col1"
    },
    "to": {
      "table": "table2",
      "column": "col1"
    }
  },
  {
    "name": "relation2",
    "from": {
      "table": "table1",
      "column": "col2"
    },
    "to": {
      "table": "table2",
      "column": "col2"
    }
  }
]
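Because "follow" refers to relationships by name, a small spelling difference, such as "relation2" in "follow" versus "relation 2" in "relationships", leaves a reference that matches nothing. The sketch below, assuming the assets and relationships have been loaded as Python lists, reports any "follow" entry that does not match a defined relationship name. The function name is hypothetical and this is only a local consistency check.

```python
from typing import Any


def unknown_follow_names(assets: list[dict[str, Any]],
                         relationships: list[dict[str, Any]]) -> dict[str, list[str]]:
    """Map each asset name to the "follow" entries that match no relationship name."""
    defined = {rel["name"] for rel in relationships}
    problems: dict[str, list[str]] = {}
    for asset in assets:
        # Assets without "follow" have nothing to check.
        missing = [name for name in asset.get("follow", []) if name not in defined]
        if missing:
            problems[asset["name"]] = missing
    return problems


assets = [{
    "name": "asset1",
    "tables": [{"name": "table1", "columns": []}],
    "rootTable": "table1",
    "rootColumn": "col1",
    "follow": ["relation1", "relation2"],
}]
# Hypothetical relationships list with a typo in the second name.
relationships_with_typo = [
    {"name": "relation1",
     "from": {"table": "table1", "column": "col1"},
     "to": {"table": "table2", "column": "col1"}},
    {"name": "relation 2",
     "from": {"table": "table1", "column": "col2"},
     "to": {"table": "table2", "column": "col2"}},
]

print(unknown_follow_names(assets, relationships_with_typo))
# → {'asset1': ['relation2']}
```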
