Once you have brought your own data into your Terra on Azure workspace blob storage or have URLs to Azure-hosted open-access data, you can organize it in a data table by generating and uploading a TSV. Read on for step-by-step instructions.
Step 1: Define your data model
Before importing tabular data into your workspace, it's helpful to think about how it is organized. This is especially true if you have several tables that may be related to each other. If you have only one type of data record and all your data can be kept in a single table, you can skip this step.
What is a data model?
The data model defines all the tables that hold the data, the types of data or metadata in each table, and the relationship between tables. You’ll design a data model that is most helpful for your unique analysis.
“Un-opinionated” data model
To accommodate diverse datasets in Terra, the Workspace Data Service uses an un-opinionated data model. This means you can name and organize your tables and data within tables however makes sense to you.
Data model structure
- One TSV row per data table row
- One TSV column per data table column
|In the example data model at left, data is organized into three tables that are related to one another through a unique ID. Below, we share some examples of what types of data might be in each table (i.e., column headers).|
- Phenotype (i.e., DiseaseType)
- Demographics (i.e., Age, SexAtBirth, etc.)
- ParticipantID (connects the sample to phenotypic data in the subject table)
- FASTAfile (link to a file in your workspace cloud storage)
Quality Control table
- Quality ControlID
- SampleID (connects the QC info to the right sample)
Step 2: Generate a TSV file
Data tables are like spreadsheets built into your workspace, so it’s not a surprise that you'll make a TSV in a spreadsheet editor to create a data table.
2.1. Start in your favorite spreadsheet editor.
2.2. Define column headers (data and metadata)
The first column in your spreadsheet will be your primary record identifier (for example, Sample). All rows in this column must be unique values.
Other columns can include metadata and links to where data live in your Azure storage blob.
2.3. Fill in data for each record (row)
Each row must have a unique record ID.
Large data files are referenced, rather than stored, in a table. To find the uniform resource identifier (URI) for files in cloud storage, follow the steps in Bring Your Own Data.
Metadata formatting for files in blob storage Azure file locations (URIs) are formatted as
Do not include the SAS token in the URI because it is temporary
Analyses in Terra will append SAS tokens on your behalf, as necessary.
Template TSV spreadsheet
Click to download an example TSV spreadsheet with the proper formatting.
A reproduction of the data is below, if you would like to copy-and-paste it into your own spreadsheet.
2.4. Once your spreadsheet is set up in your preferred editor, save your spreadsheet file in tab-separated values (TSV) or tab-delimited text format to your local machine. You can check that the TSV looks correct by opening it in a text editor (for example, BBEdit).
Step 3: Upload the TSV to your workspace to create a table
Once you clone a featured workspace or create a new workspace, Terra will automatically launch infrastructure for data tables.
Please note that it may take several minutes to requisition and set up the cloud resources for these. If you think something has gone wrong when launching your data table, you can use the “Troubleshoot” feature under the data tab.
See Data tables: Additional resources for more details.
3.1. Once data tables are launched, you’ll see the active “import button” in the top left section of the Data page.
3.2. To upload the TSV, you can either click Import Data and select Upload TSV, or click the Upload TSV link below the Tables tab.
3.3. In the popup window, you'll create a Table name and drag and drop a TSV file, or copy and paste the contents of a TSV.
3.4. Once you click the Start Import Job button, your data table should appear.
You can use the data tables functionality to organize data, use data as an input to a workflow, and organize outputs from workflows.
When you upload a TSV file to a data table, that TSV is not stored in your workspace blob storageData tables in Terra on Azure are hosted by a private database that allows you to scale to tables of any size, and provides helpful features such as sort and search. See Data tables: Additional resources for more details on data tables and the Workspace Data Services infrastructure that powers them.
Please sign in to leave a comment.