Read on for more technical details and back-end information about data tables in Terra.
Introduction to the Workspace Data Service (Terra on Azure)
Data tables are managed by the Workspace Data Service. When you create a workspace, a new Workspace Data Service instance is created with its own database. This infrastructure provides an isolated and scalable solution for working with tabular data.
A database for organizing your tabular data
Data table terminology: Terra on Azure versus Google Cloud
| Term | Terra on Azure (WDS) | Terra on GCP (Entity Service) |
| Primary ID in a data table | Left-most column of TSV or explicitly defined by titling a column with "sys_name" | Left-most column of TSV, defined with "entity:entity_id" |
| Schema / Data Model | | |
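To make the Azure-side primary ID rule concrete, here is a minimal Python sketch that picks the column that would act as the primary key (the column titled "sys_name" if there is one, otherwise the left-most column) and reports any repeated IDs, since WDS requires the values in that column to be unique. The function names and sample data are hypothetical, and the code only mirrors the rule as described in this article; it is not WDS's own implementation.

```python
import csv
import io


def primary_key_column(tsv_text):
    """Return (column_name, values) for the column that would act as the primary key.

    Assumption: follows the rule described in this article (a column titled
    "sys_name" wins; otherwise the left-most column is used).
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    header = next(reader)
    rows = [row for row in reader if row]  # skip blank lines
    index = header.index("sys_name") if "sys_name" in header else 0
    return header[index], [row[index] for row in rows]


def duplicate_ids(values):
    """Return the sorted list of values that occur more than once."""
    seen, dupes = set(), set()
    for value in values:
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


# Hypothetical sample table: "sys_name" is not the left-most column here.
SAMPLE_TSV = "tissue\tsys_name\nblood\tsample1\nliver\tsample2\nblood\tsample1\n"

name, ids = primary_key_column(SAMPLE_TSV)
print(name, duplicate_ids(ids))  # sys_name ['sample1']
```

Running a check like this before uploading makes duplicate IDs visible early, while the TSV is still easy to edit.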
A data type is an attribute associated with a piece of data that tells a piece of software how to interpret its value. Assigning and understanding data types ensures that data is collected in the preferred format and that the value is what you expect.
Columns of data in data tables in Terra on Azure are automatically assigned one of the supported data types below.
| Data type | Description | Format in WDS | Example |
| Primary key | The primary key of the record. Used to retrieve and update records. | You may specify a column using "sys_name" as the header; otherwise, WDS will default to using the first column of the TSV. All cells in this column must be unique. | |
| String | A sequence of characters. The most commonly used data type to store text. | To explicitly create a string in WDS with a value that would otherwise resolve to a relation, number, boolean, date, or datetime, wrap your string in double quotes. | [foo,bar,"\"baz\" is the best"] |
| Number | A number that has a decimal place | | [0.2, 0.3, 0.4] |
| Boolean | Binary values (e.g., true or false) | WDS can also interpret different casings such as "TRUE", "fALSE", "True" | "[true, false, true]" |
| Date | A date value in ISO-8601 format. For instance: 2011-12-03 | | |
| Datetime | A combined date and time value in ISO-8601 format. For instance: 2011-12-03T10:15:30 | | |
| Relation | To relate one record to another. For example: terra-wds:/sample/sample1. Arrays of relations must all relate to the same record type. | | |
| Array | A list, or collection of similar types of data, in a specific order. Represented as an array using JSON syntax. That is, include all array values as a comma-delimited list inside square brackets. Values inside an array can be any of the data types above: relation, number, boolean, date, datetime, or string. | | |
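The rules above can be approximated in a few lines of Python. The sketch below guesses a data type for a single cell value: a value wrapped in double quotes is forced to a string, booleans are matched in any casing, and dates and datetimes must be in ISO-8601 format. The function name, the check order, and the simplified number pattern are assumptions made for illustration; arrays are left out, and WDS's real inference logic may differ.

```python
import re
from datetime import date, datetime

# Simplified patterns (assumption): plain integers/decimals and YYYY-MM-DD dates.
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")


def _parses(parser, text):
    """Return True if parser(text) succeeds without raising ValueError."""
    try:
        parser(text)
        return True
    except ValueError:
        return False


def guess_type(cell):
    """Guess a data type name for one scalar cell value (arrays not handled)."""
    text = cell.strip()
    if len(text) >= 2 and text[0] == '"' and text[-1] == '"':
        return "string"  # explicit double quotes force a string
    if text.startswith("terra-wds:/"):
        return "relation"
    if text.lower() in ("true", "false"):
        return "boolean"  # covers TRUE, fALSE, True, ...
    if NUMBER_RE.match(text):
        return "number"
    if DATE_RE.match(text) and _parses(date.fromisoformat, text):
        return "date"
    if "T" in text and _parses(datetime.fromisoformat, text):
        return "datetime"
    return "string"


for value in ["fALSE", "0.2", "2011-12-03", '"2011-12-03"',
              "2011-12-03T10:15:30", "terra-wds:/sample/sample1", "liver"]:
    print(value, "->", guess_type(value))
```

Note how the quoted `"2011-12-03"` comes back as a string while the unquoted form is read as a date, which is the behavior the double-quote rule is meant to provide.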
Where are data types used?
Data types are important when using data tables for inputs to workflows in Terra. Often, WDL workflow authors will specify an input with a specific data type.
For example, in the workflow below, the SRA_ID variable expects a string data type, and the machine_mem_gb variable (which lets the user set the memory size of the machine) expects an integer.
Screenshot of integer- and string-type variables in the workflow configuration form
Dealing with Ambiguous Data
Generating large and complex datasets, as well as combining datasets from various sources, can create ambiguity in datasets, such as a column of data in a single table with two different data types.
Workspace Data Service will import ambiguous data without error
On the first and any subsequent import, if a column contains two or more data types, WDS will generally assign that column the string data type.
Currently, you can't edit a column's data type to override this default (though support for changing a data type is coming).
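The fallback can be pictured with a small Python sketch: each value in a column gets a guessed type, and a column whose values disagree resolves to a string. The helper names and the sample columns are hypothetical, only three types are modeled to keep the example short, and the article says WDS "generally" behaves this way, so treat this as an approximation rather than the service's exact logic.

```python
import re

# Simplified number pattern (assumption): plain integers and decimals only.
NUMBER_RE = re.compile(r"^-?\d+(\.\d+)?$")


def infer_value_type(cell):
    """Guess boolean, number, or string for a single cell value."""
    text = cell.strip()
    if text.lower() in ("true", "false"):
        return "boolean"
    if NUMBER_RE.match(text):
        return "number"
    return "string"


def resolve_column_type(cells):
    """Return the shared type of a column, or "string" when the values disagree."""
    distinct = {infer_value_type(cell) for cell in cells}
    if len(distinct) == 1:
        return distinct.pop()
    return "string"  # mixed (or empty) columns fall back to string


# Hypothetical columns: "age" mixes numbers with free text.
columns = {
    "age": ["34", "41", "unknown"],
    "is_tumor": ["TRUE", "false"],
    "reads": ["10", "12.5"],
}
for name, cells in columns.items():
    print(name, "->", resolve_column_type(cells))
```

Here the single "unknown" entry is enough to turn the whole age column into a string column, which matters if a workflow input expects a number.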
How to resolve ambiguous data
Ambiguous data is common. You can clean up or harmonize it in your favorite spreadsheet program, or with custom notebooks or workflows in Terra.
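As one possible starting point for a cleanup notebook, the Python sketch below scans a TSV column that is meant to be numeric and lists the cells that would not parse as numbers, so they can be corrected before upload. The function name, column names, and sample data are hypothetical; how you fix the flagged cells depends on your dataset.

```python
import csv
import io


def non_numeric_cells(tsv_text, column):
    """Return (line_number, value) pairs for cells that do not parse as numbers."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    problems = []
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        value = row.get(column) or ""  # treat a missing cell as empty
        try:
            float(value)
        except ValueError:
            problems.append((line_number, value))
    return problems


# Hypothetical sample table with one free-text entry in a numeric column.
SAMPLE_TSV = "sys_name\tage\nsample1\t34\nsample2\tunknown\nsample3\t41\n"

print(non_numeric_cells(SAMPLE_TSV, "age"))  # [(3, 'unknown')]
```

The reported line numbers match the lines of the TSV file as long as it contains no blank lines, which makes the offending cells easy to find in a spreadsheet or text editor.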
Data tables in Terra on Azure versus GCP
This new service means that data tables in Terra on GCP are not compatible with Terra on Azure. Therefore, you cannot currently download a TSV from Terra on Google and simply upload it to Terra on Azure. Learn how to create and upload a TSV in Terra on Azure.
You may notice that data tables in Terra on Azure currently do not have all of the features available on GCP. Never fear! We are still iteratively building the new Workspace Data Service, working towards parity (and more!) between this new service and everything currently available in Google.