How to create a TDR dataset

Anton Kovalsky

Before you can ingest data into the data repo, you’ll need to create a Dataset into which you’ll ingest the data. Learn how in the step-by-step instructions below.

Remember to authorize Swagger every time you use it. Click “Authorize” near the top of the page, check all of the boxes in the pop-up, and hit “Authorize” again. Then input the appropriate credentials to authenticate. Make sure you close the subsequent pop-up without clicking the “Sign Out” button.

You should now be able to execute the commands below by clicking the “Try it out” button next to the command of your choice. For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

All Swagger-based instructions require you to authenticate first whenever you open a new window with the Swagger UI.
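If you prefer to call the Data Repo API from a script rather than the Swagger UI, you can supply the same credentials as a bearer token. The sketch below shows one way to fetch such a token; it assumes the gcloud CLI is installed and logged in as the same user you would authorize in Swagger, and is an illustration rather than the only supported method.

  # Minimal sketch: fetch a bearer token for TDR API calls made outside the Swagger UI.
  # Assumes the gcloud CLI is installed and you have run `gcloud auth login` as the
  # same user you would authorize in Swagger.
  import subprocess

  def get_access_token() -> str:
      """Return a Google OAuth access token for the current gcloud user."""
      result = subprocess.run(
          ["gcloud", "auth", "print-access-token"],
          capture_output=True, text=True, check=True,
      )
      return result.stdout.strip()

  if __name__ == "__main__":
      token = get_access_token()
      print(f"Authorization: Bearer {token[:12]}...")  # truncated for display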

Dataset creation - step-by-step instructions

Use the createDataset API endpoint.

createDataset parameters

  • You'll need at least one billing profile ID, but you can include additional billing profile IDs if you want to allow sharding file storage across multiple billing accounts.
  • You can include the storage region for the dataset if you want the data and metadata stored somewhere other than the default region.
  • You'll need to work out the code for your schema so that you can nest it in the "schema" parameter. See How to configure a dataset schema for detailed instructions and a simple example of a schema JSON.

createDataset request body

{
  "cloudPlatform": "gcp",
  "name": "dataset_name",
  "region": "us-central1",
  "description": "string",
  "defaultProfileId": "/* the profile id you generated when you created your billing profile */",
  "schema": { /* A schema model such as the schema shown in this article */ }
}
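
For reference, here is a minimal Python sketch that submits this request body outside of Swagger. The endpoint path /api/repository/v1/datasets should be confirmed against your Swagger page, and the token, profile ID, and schema values are placeholders you must replace.

  # Minimal sketch: submit a createDataset request programmatically.
  # The endpoint path and field values below are assumptions/placeholders;
  # confirm them against the createDataset entry on your Swagger page.
  import requests

  TDR_URL = "https://data.terra.bio"   # TDR production host
  TOKEN = "REPLACE-WITH-BEARER-TOKEN"  # see the authentication sketch above

  request_body = {
      "cloudPlatform": "gcp",
      "name": "dataset_name",
      "region": "us-central1",
      "description": "My first TDR dataset",
      "defaultProfileId": "REPLACE-WITH-YOUR-BILLING-PROFILE-UUID",
      "schema": {"tables": []},        # replace with your real schema model
  }

  response = requests.post(
      f"{TDR_URL}/api/repository/v1/datasets",
      headers={"Authorization": f"Bearer {TOKEN}"},
      json=request_body,
  )
  response.raise_for_status()
  job_id = response.json()["id"]       # job ID used to track this request
  print("Submitted createDataset job:", job_id)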

Tracking your Dataset creation and retrieving its information

Successfully submitting your request to create the dataset is also known as successfully submitting a "job".

Successful submissions: What to expect

You'll see a response code below the "Execute" button (successful response codes are 200-202), and the response body will contain an "id" field. This is the job's ID, and you can use it to track the completion of this API request. The same is true for many other types of tasks done via the API: they launch jobs, and those jobs have their own job IDs. The progress of any such job can be tracked using the retrieveJob API endpoint in the Jobs section of the Swagger page.

[Screenshot: Swagger response showing the "id" field returned by createDataset]
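
If you are tracking jobs from a script, a polling loop against retrieveJob might look like the sketch below. It assumes the endpoint path /api/repository/v1/jobs/{id} and a "job_status" field in the response; confirm both on your Swagger page before relying on them.

  # Minimal sketch: poll the retrieveJob endpoint until the job finishes.
  # Endpoint path and response field names are assumptions; verify them in Swagger.
  import time
  import requests

  TDR_URL = "https://data.terra.bio"   # TDR production host
  TOKEN = "REPLACE-WITH-BEARER-TOKEN"  # see the authentication sketch above

  def wait_for_job(job_id: str) -> dict:
      """Poll the job status every 10 seconds until it is no longer running."""
      while True:
          resp = requests.get(
              f"{TDR_URL}/api/repository/v1/jobs/{job_id}",
              headers={"Authorization": f"Bearer {TOKEN}"},
          )
          resp.raise_for_status()
          job = resp.json()
          # "job_status" is assumed to report values like "running", "succeeded", "failed".
          if job.get("job_status") != "running":
              return job
          time.sleep(10)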

Once the job has finished running, you can use the retrieveJobResult endpoint in the repository section to retrieve the job's information. If the job failed, the returned result will describe the errors that caused the failure. If the job succeeded, the result will describe the new TDR dataset. The "id" field of this result is the dataset's UUID, which is a required parameter in all future API calls affecting the new dataset.
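
Continuing the same sketch, retrieving the result and reading the dataset UUID might look like the following, assuming the endpoint path /api/repository/v1/jobs/{id}/result and a placeholder job ID from the earlier createDataset call.

  # Minimal sketch: fetch the job result and read the new dataset's UUID.
  # The endpoint path is an assumption; confirm it against retrieveJobResult in Swagger.
  import requests

  TDR_URL = "https://data.terra.bio"   # TDR production host
  TOKEN = "REPLACE-WITH-BEARER-TOKEN"  # see the authentication sketch above
  job_id = "REPLACE-WITH-JOB-ID"       # returned by the createDataset request

  resp = requests.get(
      f"{TDR_URL}/api/repository/v1/jobs/{job_id}/result",
      headers={"Authorization": f"Bearer {TOKEN}"},
  )
  resp.raise_for_status()
  result = resp.json()

  # On success the result describes the new TDR dataset; on failure it
  # describes the errors that caused the job to fail.
  dataset_uuid = result.get("id")
  print("New dataset UUID:", dataset_uuid)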

Finding the dataset's UUID

Conveniently, the dataset's UUID also appears in the URL bar when you view the dataset in the Data Repo UI at data.terra.bio:

[Screenshot: dataset UUID shown in the browser URL bar of the Data Repo UI]
