How to create a table from scratch

Derek Caetano-Anolles
  • Updated

Once you have brought your own data into your Terra on Azure workspace blob storage or have URLs to Azure-hosted open-access data, you can organize it in a data table by generating and uploading a TSV. Read on for step-by-step instructions.

Step 1: Define your data model

Before importing tabular data into your workspace, it's helpful to think about how it is organized. This is especially true if you have several tables that may be related to each other. If you have only one type of data record and all your data can be kept in a single table, you can skip this step.

What is a data model?

The data model defines all the tables that hold the data, the types of data or metadata in each table, and the relationship between tables. You’ll design a data model that is most helpful for your unique analysis.

“Un-opinionated” data model

To accommodate diverse datasets in Terra, the Workspace Data Service uses an un-opinionated data model. This means you can name and organize your tables and data within tables however makes sense to you. 

Data model structure

  • One TSV row per data table row
  • One TSV column per data table column
erra-on-Azure_Data-model_diagram.png In the example data model at left, data is organized into three tables that are related to one another through a unique ID. Below, we share some examples of what types of data might be in each table (i.e., column headers).

Participant table

  • ParticipantID
  • Phenotype (i.e., DiseaseType)
  • Demographics (i.e., Age, SexAtBirth, etc.)

Sample table

  • SampleID
  • ParticipantID (connects the sample to phenotypic data in the subject table)
  • SequencingMethod
  • ReadDepth
  • FASTAfile (link to a file in your workspace cloud storage)

Quality Control table

  • Quality ControlID
  • SampleID (connects the QC info to the right sample)
  • QCMetrics

Step 2: Generate a TSV file

Data tables are like spreadsheets built into your workspace, so it’s not a surprise that you'll make a TSV in a spreadsheet editor to create a data table.

2.1. Start in your favorite spreadsheet editor.

2.2. Define column headers (data and metadata)
The first column in your spreadsheet will be your primary record identifier (for example, Sample). All rows in this column must be unique values.

Other columns can include metadata and links to where data live in your Azure storage blob.

2.3. Fill in data for each record (row)
Each row must have a unique record ID.

Large data files are referenced, rather than stored, in a table. To find the uniform resource identifier (URI) for files in cloud storage, follow the steps in Bring Your Own Data.

Metadata formatting for files in blob storage Azure file locations (URIs) are formatted as https://{storage-account}.blob.core.windows.net/{storage-container}/path/to/file.

Do not include the workspace storage SAS token in the URI because it is temporary 
Analyses in Terra will append SAS tokens on your behalf, as necessary. 

Template TSV spreadsheet

Click to download an example TSV spreadsheet with the proper formatting.

A reproduction of the data is below, if you would like to copy-and-paste it into your own spreadsheet.

SampleID Accession Database uBAM
SRR11059940 SRR11059940 SRA https://azurefeaturedworkspace.blob.core.windows.net/featuredworkspacedata/SRR11059940.bam

2.4. Once your spreadsheet is set up in your preferred editor, save your spreadsheet file in tab-separated values (TSV) or tab-delimited text format to your local machine. You can check that the TSV looks correct by opening it in a text editor (for example, BBEdit).

Step 3: Upload the TSV to your workspace to create a table

Once you clone a featured workspace or create a new workspace, Terra will automatically launch infrastructure for data tables.

Please note that it may take several minutes to requisition and set up the cloud resources for these. If you think something has gone wrong when launching your data table, you can use the “Troubleshoot” feature under the data tab.

Screenshot of the Data tab before any data has been uploaded, with a message stating that data tables are unavailable.

See Data tables: Additional resources for more details.

3.1. Once data tables are launched, you’ll see the active “import button” in the top left section of the Data page.

Screenshot of the Data tab before any data has been uploaded.

3.2. To upload the TSV, you can either click Import Data and select Upload TSV, or click the Upload TSV link below the Tables tab. 
Screenshot of the Data tab once the 'Import Data' button on the left hand side has been clicked. A dropdown menu will appear with the option to upload a TSV file.

3.3. In the popup window, you'll create a Table name and drag and drop a TSV file, or copy and paste the contents of a TSV.
A screenshot of the 'Import Data Tables' pop-up window in the center of the screen. The text fields include 'Table Name', the option to import a TSV file or paste in the text, and a button to 'Start Import Job'.

3.4. Once you click the Start Import Job button, your data table should appear.

You can use the data tables functionality to organize data, use data as an input to a workflow, and organize outputs from workflows.

Screenshot of the Data tab once a TSV has been imported. The Tables section on the left hand side of the screen is now populated with various tables called 'arrays', 'participant' and 'sample'. The 'participant' table is selected and shown.

When you upload a TSV file to a data table, that TSV is not stored in your workspace blob storageData tables in Terra on Azure are hosted by a private database that allows you to scale to tables of any size, and provides helpful features such as sort and search. See Data tables: Additional resources for more details on data tables and the Workspace Data Services infrastructure that powers them. 

Was this article helpful?

0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.