Before you can ingest data into the data repo, you’ll need to create a Dataset into which you’ll ingest the data. Learn how in the step-by-step instructions below.
If you prefer to use Swagger, see (option 2) Create a TDR dataset with APIs. This may be a good option if you are comfortable with APIs and complex JSON.
Start by logging into data.terra.bio and clicking the Create Dataset button.
Step 1. Submit dataset information
In the intro form, complete the required fields and drop-downs for the dataset.
- Name (note that the name can only include letters, numbers, and underscores)
- Cloud Platform
- Billing Profile (this is the name of the billing profile you created in the previous step)
- Region (note that Terra's default region is us-central1)
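To see how these form fields map to the request that TDR ultimately receives, here is a sketch of the dataset-information portion of the JSON as a Python dict. The field names and values below are illustrative assumptions; confirm the exact request model on the TDR Swagger page.

```python
# Illustrative dataset-information fields (names are assumptions,
# not the authoritative TDR request model).
dataset_info = {
    "name": "my_dataset_2023",  # letters, numbers, and underscores only
    "cloudPlatform": "gcp",
    "defaultProfileId": "<billing-profile-uuid>",  # billing profile from the previous step
    "region": "us-central1",  # Terra's default region
    "description": "Optional free-text description",
}
```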
You can also add a description, designate stewards and custodians, and choose secure monitoring.
Step 2. Build schema in TDR
Instructions in the browser walk through creating the dataset schema, table-by-table.
2.1. Use the blue button on the left to create a table. Repeat for each table in your schema.
2.2. Use the second blue button to add columns to each table. You will select the column name and datatype. Repeat for each attribute (column) in each table.
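The table-by-table schema you build in steps 2.1 and 2.2 corresponds to a nested structure in the dataset JSON. The fragment below is a hedged sketch of that shape, with a hypothetical table and columns; the real schema appears in the JSON view described in step 2.3.

```python
# Hypothetical schema fragment: one table ("sample") with two columns.
# Table and column names here are examples, not required values.
schema = {
    "tables": [
        {
            "name": "sample",  # created with the first blue button
            "columns": [
                # each column added with the second blue button has a
                # name and a datatype
                {"name": "sample_id", "datatype": "string"},
                {"name": "collection_date", "datatype": "date"},
            ],
        }
    ]
}
```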
Data types (TDR, BigQuery, Azure Synapse)
When creating a dataset in TDR, you will need to supply the data type for each column. The table below (click to expand) will help guide your choices.
Most TDR types pass through to BigQuery types of the same name. A few additional types are supported by TDR, either as a convenience or to add more semantic information to the table metadata.
TRUE and FALSE
Variable-length binary data
4-digit year, 1- or 2-digit month, and 1- or 2-digit day
Note: The Datetime and Time data types are timezone-naive. BQ stores and returns them in the format provided.
Note: TDR currently accepts timestamps only in UTC. BQ stores this value as a long. The TDR UI converts it back to a UTC timestamp, but the preview-data endpoint returns the raw long value, so if you call the endpoint directly you will need to perform this conversion yourself to get a readable value.
Format: YYYY-[M]M-[D]D[( |T)[H]H:[M]M:[S]S[.F]][time zone]
Float and Float64 point to the same underlying data type, so they are equivalent.
For very large float data, or for data on which calculations will be performed.
Stores UUIDs that map to an ingested file. These are translated to DRS URLs on snapshot creation.
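Since the preview-data endpoint returns timestamps as raw long values, here is a minimal sketch of the conversion, assuming the long counts microseconds since the Unix epoch (which is how BigQuery represents TIMESTAMP values internally).

```python
from datetime import datetime, timezone

def long_to_utc(ts_long):
    """Convert a timestamp long from the data endpoint into a readable
    UTC datetime. Assumes the long is microseconds since the Unix epoch."""
    return datetime.fromtimestamp(ts_long / 1_000_000, tz=timezone.utc)

# Example: 1_700_000_000_000_000 microseconds -> 2023-11-14 22:13:20 UTC
readable = long_to_utc(1_700_000_000_000_000)
```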
2.3. Scroll down to the JSON view. It may be useful to copy the entire content (screenshot below) somewhere safe, for record-keeping.
You can also use this JSON as-is to create your dataset using APIs. See Create a TDR dataset (Swagger/API option).
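As a sketch of the API route, the copied JSON can be POSTed to the TDR dataset-creation endpoint. The endpoint path and auth flow below are assumptions to be confirmed against the TDR Swagger page, and the bearer token is a placeholder.

```python
import json
import urllib.request

def build_create_request(payload, token):
    """Build (but do not send) a POST request that submits the dataset
    JSON to the assumed TDR dataset-creation endpoint."""
    return urllib.request.Request(
        "https://data.terra.bio/api/repository/v1/datasets",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {token}",  # placeholder token
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending with urllib.request.urlopen(req) returns a job to poll,
# not the finished dataset.
req = build_create_request({"name": "my_dataset_2023"}, "<access-token>")
```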
2.4. Click the blue Submit button to generate your dataset.
What to expect
You will see a notification if your dataset creation fails for any reason. The most common cause of failure is an invalid attribute name (attributes can only contain lowercase letters and underscores). If your dataset is created successfully, you can move on to the next step: ingestion!