Data tables: Additional resources

Allie Cliffe

If you're interested in using Terra on Azure, please email terra-enterprise@broadinstitute.org.

Read on for more technical details and back-end information about data tables in Terra.

Workspace Data Service (Terra on Azure) details

A database for organizing your tabular data

Data tables are managed by the Workspace Data Service. When you create a workspace, a new Workspace Data Service instance is created with its own database. This infrastructure provides an isolated and scalable solution for working with tabular data. 

Data table terminology: Terra on Azure versus Google Cloud

| Feature | Terra on Azure (WDS) | Terra on GCP (Entity Service) |
|---|---|---|
| Table unit | Record | Entity |
| Data table | Record type | Entity table |
| Primary ID in a data table | Left-most column of TSV or explicitly defined by titling a column with "sys_name" | Left-most column of TSV, defined with "entity:entity_id" |
| Schema / Data Model | Unopinionated | FireCloud, Flexible |
| Table relations | terra-wds:/table_name/row_name | {"entityType":"table_name", "entityName":"row_name"} |
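To make the difference in the "Table relations" row concrete, here is a minimal Python sketch that converts between the two reference formats listed in the table. The function names are hypothetical and are not part of any Terra API; the sketch only rearranges the table name and row name shown above.

```python
import json


def entity_ref_to_wds_relation(entity_ref_json: str) -> str:
    """Turn a GCP-style entity reference (JSON text) into a WDS-style relation string."""
    ref = json.loads(entity_ref_json)
    # "entityType" and "entityName" are the keys shown in the table above.
    return f"terra-wds:/{ref['entityType']}/{ref['entityName']}"


def wds_relation_to_entity_ref(relation: str) -> dict:
    """Turn a WDS-style relation string into a GCP-style entity reference."""
    prefix = "terra-wds:/"
    if not relation.startswith(prefix):
        raise ValueError(f"not a WDS relation: {relation!r}")
    # Everything after the prefix is expected to be table_name/row_name.
    table_name, separator, row_name = relation[len(prefix):].partition("/")
    if not separator or not table_name or not row_name:
        raise ValueError(f"expected terra-wds:/table_name/row_name, got {relation!r}")
    return {"entityType": table_name, "entityName": row_name}


if __name__ == "__main__":
    print(entity_ref_to_wds_relation('{"entityType":"sample", "entityName":"sample1"}'))
    print(wds_relation_to_entity_ref("terra-wds:/sample/sample1"))
```

Converting a reference string this way does not by itself make a whole table portable between the two services.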

Data types

A data type is an attribute associated with a piece of data that tells a piece of software how to interpret its value. Assigning and understanding data types ensures that data is collected in the preferred format and that the value is what you expect.

Columns of data in data tables in Terra on Azure are automatically assigned one of the supported data types below.

| Data type | Definition | Format in WDS | Array format |
|---|---|---|---|
| Record ID | The primary key of the record. Used to retrieve and update records. | You may specify a column using "sys_name" as the header; otherwise, WDS will default to using the first column of the TSV. All cells in this column must be unique. | Set table |
| String | A sequence of characters. The most commonly used data type to store text. | To explicitly create a string in WDS with a value that would otherwise resolve to a relation, number, boolean, date, or datetime, wrap your string in double quotes. | [foo,bar,"\"baz\" is the best"] |
| Number | An integer | Unquoted numbers | [2,4,6] |
| Float | A number that has a decimal place | 0.2 | [0.2, 0.3, 0.4] |
| Boolean | Binary values (e.g., true or false) | WDS can also interpret different casings such as "TRUE", "fALSE", "True" | "[true, false, true]" |
| Date | A date value in ISO-8601 format | YYYY-MM-DD. For instance: 2011-12-03 | [YYYY-MM-DD, YYYY-MM-DD, YYYY-MM-DD] |
| Datetime | A combined date and time value in ISO-8601 format | YYYY-MM-DDTHH:MM:SS. For instance: 2011-12-03T10:15:30 | [YYYY-MM-DDTHH:MM:SS, YYYY-MM-DDTHH:MM:SS] |
| Relation | To relate one record to another | terra-wds:/{table-name}/{row-id}. For example: terra-wds:/sample/sample1 | Arrays of relations must all relate to the same record type. For example, ["terra-wds:/type/1", "terra-wds:/type/2"] |
| Array | A list, or collection of similar types of data, in a specific order. | Represented as an array using JSON syntax. That is, include all array values as a comma-delimited list inside square brackets []. Values inside an array can be any of the data types above: relation, number, boolean, date, datetime, or string. | n/a |
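As a rough illustration of how the formats in the table map onto data types, the Python sketch below guesses the type of a single TSV cell. It is not the Workspace Data Service's actual implementation: the function name infer_wds_type, the order of the checks, and the exact patterns are assumptions made for this example.

```python
import re
from datetime import datetime

# Patterns for the ISO-8601 layouts listed in the table (assumed to be strict).
DATE_RE = re.compile(r"\d{4}-\d{2}-\d{2}")
DATETIME_RE = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def _parses(text: str, fmt: str) -> bool:
    """Return True if text is a valid calendar value in the given strptime format."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


def infer_wds_type(value: str) -> str:
    """Guess the data type of one cell, following the formats in the table above."""
    text = value.strip()
    # A value wrapped in double quotes is explicitly a string.
    if len(text) >= 2 and text.startswith('"') and text.endswith('"'):
        return "string"
    # Arrays are written inside square brackets; the elements are not inspected here.
    if text.startswith("[") and text.endswith("]"):
        return "array"
    if text.startswith("terra-wds:/"):
        return "relation"
    # Booleans are matched regardless of casing ("TRUE", "fALSE", "True").
    if text.lower() in ("true", "false"):
        return "boolean"
    if re.fullmatch(r"[+-]?\d+", text):
        return "number"
    if re.fullmatch(r"[+-]?\d+\.\d+", text):
        return "float"
    if DATETIME_RE.fullmatch(text) and _parses(text, "%Y-%m-%dT%H:%M:%S"):
        return "datetime"
    if DATE_RE.fullmatch(text) and _parses(text, "%Y-%m-%d"):
        return "date"
    # Anything else is treated as plain text.
    return "string"


if __name__ == "__main__":
    for cell in ["sample1", "42", '"42"', "0.2", "fALSE", "2011-12-03",
                 "2011-12-03T10:15:30", "terra-wds:/sample/sample1", "[2,4,6]"]:
        print(f"{cell!r:32} {infer_wds_type(cell)}")
```

Note how wrapping 42 in double quotes changes the guess from number to string, which mirrors the String row of the table.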

Where are data types used?

Data types are important when using data tables for inputs to workflows in Terra. Often, WDL workflow authors will specify an input with a specific data type.

For example, in the workflow below, the SRA_ID variable expects a string data type, and the machine_mem_gb variable (which lets the user set the memory size of a machine) expects an integer.

[Screenshot: workflow configuration pane listing the SRA_ID variable with its string type circled and the machine_mem_gb variable with its integer type circled]

Dealing with ambiguous data

Generating large and complex datasets, as well as combining datasets from various sources, can create ambiguity in datasets, such as a column of data in a single table with two different data types.

Workspace Data Service will import ambiguous data without error

On the first import and on subsequent imports, if a column contains two or more data types, WDS will generally default to assigning that column the string data type.
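The fallback described above can be sketched in a few lines of Python. This is a simplified model, not the service's code: the function name resolve_column_type is hypothetical, and the sketch assumes that every mix of types falls back to string, whereas the real service only "generally" does so and may treat some combinations differently.

```python
def resolve_column_type(cell_types: list[str]) -> str:
    """Pick one data type for a column, given the type guessed for each of its cells."""
    if not cell_types:
        raise ValueError("a column needs at least one cell")
    distinct = set(cell_types)
    # A column whose cells all share one type keeps that type.
    if len(distinct) == 1:
        return distinct.pop()
    # Assumption for this sketch: any mix of types falls back to string.
    return "string"


if __name__ == "__main__":
    print(resolve_column_type(["number", "number", "number"]))
    print(resolve_column_type(["number", "string", "number"]))
```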

Importing a TSV into an existing table can change column data types

Currently, you can’t edit a column's data type to change the default behavior (though support for changing a data type is coming).

How to resolve ambiguous data

Ambiguous data is a common occurrence. You can clean up or harmonize your data outside of Terra in your favorite spreadsheet program, or within Terra using custom notebooks or workflows.
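If you prefer to script the cleanup, the Python sketch below shows one way to find the cells that keep a column from being all integers before you upload a TSV. The function name and the sample column names (sample_id, read_count) are made up for this example, and the sketch assumes the left-most column holds the record ID.

```python
import csv
import io
import re


def find_non_integer_cells(tsv_text: str, column: str) -> list[tuple[str, str]]:
    """Return (record ID, value) pairs whose value in `column` is not an integer."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if not reader.fieldnames or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in TSV header")
    # Assumption: the left-most column holds the record ID.
    id_column = reader.fieldnames[0]
    offenders = []
    for row in reader:
        value = row[column] or ""  # short rows yield None for missing cells
        if not re.fullmatch(r"[+-]?\d+", value.strip()):
            offenders.append((row[id_column], value))
    return offenders


# Hypothetical table: one cell holds text in an otherwise numeric column.
SAMPLE_TSV = "sample_id\tread_count\ns1\t100\ns2\tn/a\ns3\t250\n"

if __name__ == "__main__":
    for record_id, value in find_non_integer_cells(SAMPLE_TSV, "read_count"):
        print(f"{record_id}: {value!r} is not an integer")
```

Once the offending cells are listed, you can correct or blank them in the TSV so the column holds a single data type.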

Data tables in Terra on Azure versus GCP

Because the Workspace Data Service is a new service, data tables in Terra on GCP are not compatible with Terra on Azure. Therefore, you cannot currently download a TSV from Terra on Google and simply upload it to Terra on Azure. Learn how to create and upload a TSV in Terra on Azure.

You may notice that data tables in Terra on Azure currently do not have all of the features available on GCP. Never fear! We are still iteratively building the new Workspace Data Service, working towards parity (and more!) between this new service and everything currently available in Google.

 
