Creating a Dataset

Anton Kovalsky

This article includes instructions on using API commands through the Swagger UI. Each time you open the Swagger UI in a new window, you must authenticate before running any command. Click “Authorize” near the top of the page, check all of the boxes in the pop-up, click “Authorize” again, and then enter the appropriate credentials. Close the subsequent pop-up without clicking the “Sign Out” button. You should now be able to execute the commands below by clicking the “Try it out” button next to the command of your choice. For a more detailed description of this authentication step, see this article on Authenticating in Swagger.

Screen_Shot_2021-09-23_at_9.16.08_PM.png


 

1. Dataset creation

Before you can ingest data into the data repo, you’ll first need to create a Dataset into which you’ll ingest the data. This first step entails defining the structure of the data you’ll be ingesting by specifying the schema of the data.

The schema’s function is to outline the following:

  • The tables present in the dataset
  • The columns present in each table
  • Any relationships between those tables
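
The three elements above can be sketched as a small Python dictionary. This is only an illustration: the field names used here ("tables", "columns", "datatype", "relationships", "from", "to") and the helper dangling_relationships are assumptions, not taken from this article, so check the schema article for the exact format. The helper simply confirms that every relationship points at a table and column that the schema actually defines.

```python
# Hypothetical schema: two tables linked by a shared patient_id column.
# All field names below are assumptions made for illustration.
schema = {
    "tables": [
        {
            "name": "patient",
            "columns": [
                {"name": "patient_id", "datatype": "string"},
                {"name": "age", "datatype": "integer"},
            ],
        },
        {
            "name": "sample",
            "columns": [
                {"name": "sample_id", "datatype": "string"},
                {"name": "patient_id", "datatype": "string"},
            ],
        },
    ],
    "relationships": [
        {
            "name": "sample_to_patient",
            "from": {"table": "sample", "column": "patient_id"},
            "to": {"table": "patient", "column": "patient_id"},
        }
    ],
}


def dangling_relationships(schema):
    """Return names of relationships that reference an undefined table or column."""
    known = {
        (table["name"], column["name"])
        for table in schema["tables"]
        for column in table["columns"]
    }
    return [
        rel["name"]
        for rel in schema.get("relationships", [])
        if any((rel[end]["table"], rel[end]["column"]) not in known for end in ("from", "to"))
    ]


print(dangling_relationships(schema))  # [] because both ends exist in the schema
```

Running a check like this before submitting can catch a misspelled table or column name early.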

Think of this step as creating a blank set of tables as a template for the data you’ll ingest later. In this step, you’ll specify the number and names of the data categories, meaning the tables and the columns within them. You’ll also specify associations between columns in separate tables if multiple tables contain the same data category (for instance, if you have a “patient” table and a “sample” table, and both tables contain a column with the same set of patient IDs). This is the schema. It’s defined in JSON format and inserted into the request body of the createDataset API endpoint as the value of the "schema" parameter.

 

{
  "cloudPlatform": "gcp",
  "name": "dataset_name",
  "region": "us-central1",
  "description": "string",
  "defaultProfileId": "/* the profile id you generated when you created your billing profile */",
  "schema": { /* A schema model such as the schema shown in this article */ }
}

 

Some notes on the parameters for this API:

  • You'll need at least one billing profile ID, but you can include additional billing profile IDs if you want to shard file storage across billing accounts.
  • You can include the storage region for the dataset if you want the data and metadata stored somewhere other than the default region.
  • You'll need to work out the code for your schema so that you can nest that code in the "schema" parameter. See this article on setting up your schema for detailed instructions and a simple example of a schema in JSON.
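
If you prefer to assemble the request body in code before pasting it into Swagger, the following minimal Python sketch shows one way to do it. The helper name build_create_dataset_request and all placeholder values are hypothetical. The top-level keys mirror the request body template shown earlier in this section, and "region" is only added when you supply one, in line with the note that it is optional.

```python
import json


def build_create_dataset_request(name, description, default_profile_id, schema,
                                 region=None, cloud_platform="gcp"):
    """Assemble a createDataset request body as a plain dictionary."""
    body = {
        "cloudPlatform": cloud_platform,
        "name": name,
        "description": description,
        "defaultProfileId": default_profile_id,
        "schema": schema,
    }
    # Only send a region when overriding the default storage region.
    if region is not None:
        body["region"] = region
    return body


request_body = build_create_dataset_request(
    name="dataset_name",
    description="Example dataset",
    default_profile_id="your-billing-profile-id",  # placeholder, not a real ID
    schema={},  # placeholder: nest your schema here
    region="us-central1",
)

# json.dumps produces valid JSON (commas and quoting handled for you).
print(json.dumps(request_body, indent=2))
```

Serializing with json.dumps avoids hand-editing mistakes such as a missing comma between parameters.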

 

2. Tracking your Dataset creation and retrieving its information

Submitting your request to create the dataset launches a "job". You'll see a response code below the "Execute" button (successful response codes are 200-202), and the response will contain an "id" field. This is the job's ID, and you can use it to track the completion of this API request. The same is true for many other types of tasks done via the API: they launch jobs, and those jobs have their own job IDs. The progress of any such job can be tracked using the retrieveJob API endpoint in the Jobs section of the Swagger page.

 

2021-09-21_06-41-12.png
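
Tracking a job amounts to calling retrieveJob repeatedly until the job is no longer running. The sketch below shows that polling loop in Python under some loudly stated assumptions: the status field name "job_status" and its values ("running", "succeeded", "failed") are assumed rather than taken from this article, and fetch_job is a hypothetical function that you would write to call retrieveJob with your credentials. A fake fetcher stands in for the real call so the example runs offline.

```python
import time


def wait_for_job(job_id, fetch_job, poll_seconds=5.0, max_polls=60):
    """Poll fetch_job(job_id) until the job leaves the assumed "running" state."""
    for _ in range(max_polls):
        job = fetch_job(job_id)
        # "job_status" and "running" are assumed names; adjust to the real response.
        if job.get("job_status") != "running":
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


# Stand-in for a real retrieveJob call: reports "running" once, then "succeeded".
_fake_responses = iter([
    {"id": "job-123", "job_status": "running"},
    {"id": "job-123", "job_status": "succeeded"},
])


def fake_fetch_job(job_id):
    return next(_fake_responses)


final = wait_for_job("job-123", fake_fetch_job, poll_seconds=0)
print(final["job_status"])  # succeeded
```

Passing the fetcher in as an argument keeps the loop independent of how you authenticate and which HTTP library you use.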

 

Once the job has finished running, you can use the retrieveJobResult endpoint in the repository section to retrieve the job’s information. If the job failed, the returned result will describe the errors that caused the failure. If the job succeeded, the result will describe the new TDR dataset. The “id” field of this result is the UUID of the dataset, and it is a required parameter in all future API calls affecting the new dataset. Note: This UUID, which is unique to each dataset, also appears in the URL bar when you're viewing the dataset through the Data Repo UI at data.terra.bio:

2021-09-21_06-51-01.png
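
Either source of the dataset UUID can be handled with a few lines of Python. This is a rough sketch: the function names are hypothetical, the example URL path ("/datasets/...") and the example UUID are made up for illustration, and the only thing taken from this article is that a successful job result carries the dataset UUID in its "id" field.

```python
import re

# Standard 8-4-4-4-12 hexadecimal UUID layout.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def dataset_id_from_result(job_result):
    """Return the dataset UUID from a successful job result, or raise ValueError."""
    dataset_id = job_result.get("id")
    if not isinstance(dataset_id, str) or not UUID_PATTERN.fullmatch(dataset_id):
        raise ValueError("job result does not contain a dataset UUID in 'id'")
    return dataset_id


def dataset_id_from_url(url):
    """Return the first UUID found in a URL, or None if there is none."""
    match = UUID_PATTERN.search(url)
    return match.group(0) if match else None


# Made-up values for illustration only.
example_result = {"id": "3fa85f64-5717-4562-b3fc-2c963f66afa6", "name": "dataset_name"}
example_url = "https://data.terra.bio/datasets/3fa85f64-5717-4562-b3fc-2c963f66afa6"

print(dataset_id_from_result(example_result))
print(dataset_id_from_url(example_url))
```

Validating the value before reusing it helps avoid passing a job ID where a dataset UUID is expected in later API calls.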
