Set tables are a useful way to organize data when you want to group files for (repeat) analysis or when your analysis requires multiple input files to produce a single output. Learn when to use a set table, and how to create one, below.
Customizing your data tables: Records and sets
When you create a data table, the data you have, how it is currently organized, and how you plan to analyze it downstream all affect the type and formatting of the tables you set up.
Below are examples of custom tables and when you might use them.
Records and sets
There are two primary types of data tables in Terra: Record tables and set tables (sets of records).
Record tables versus set tables
- Record table
Each row includes a piece of data to analyze (samples, files, participants, specimens, etc.).
- Set table
Groups specific records from a record table together. Each row corresponds to one set and includes the unique IDs for the records in the set (from the record table).
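To illustrate how the two table types relate, here is a minimal Python sketch that models a record table and a set table as plain dictionaries. The table contents, file names, and the resolve_set helper are hypothetical and are not part of any Terra API; the sketch only shows that each set row points back to unique IDs in the record table.

```python
# Hypothetical record table: one row per sample, keyed by its unique ID.
sample_table = {
    "sample_1": {"cram": "sample_1.cram"},
    "sample_2": {"cram": "sample_2.cram"},
    "sample_3": {"cram": "sample_3.cram"},
}

# Hypothetical set table: one row per set, listing unique IDs from sample_table.
sample_set_table = {
    "cohort_a": ["sample_1", "sample_3"],
}


def resolve_set(set_id, set_table, record_table):
    """Return the record rows a set refers to, failing on unknown IDs."""
    missing = [rid for rid in set_table[set_id] if rid not in record_table]
    if missing:
        raise KeyError(f"Set {set_id!r} references unknown records: {missing}")
    return {rid: record_table[rid] for rid in set_table[set_id]}


members = resolve_set("cohort_a", sample_set_table, sample_table)
print(sorted(members))
```

The check for missing IDs mirrors the requirement that every record in a set must already exist in the record table.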
When to use a record table
- When your data are logically organized by single records. Note: A cell can hold an array if you have multiple data files of the same kind that are all associated with a single record.
- When you can run your analysis on single records (e.g., samples)
When to use a set table
- To organize and save a group of specific records for (repeat) downstream analysis
- To keep track of data files generated for a subset of samples
- When your workflow requires many data files (an array) to generate a single output
Example: How to create a set table (Terra on Azure)
Set tables always refer to records in a record table (e.g., a sample_set table references samples in a sample table; a specimen_set table references specimens in a specimen table). This means you can only create a set table after you've made and uploaded the record table it references.
The example below shows how to make a sample_set table, assuming a sample table already exists. The first column in the sample table is “sample_id”, which is the table’s primary key: it holds a unique identifier for each row of the table.
In this example, the set table sample_set specifies all samples used to create a single jointly called Variant Call Format (VCF) file. The first column is the unique ID for the set; the second column lists the records in the set (from the record table).
1. Fill in the first two column headers in a spreadsheet editor: record_set and records. Note the "s" at the end of the second column header.
2. Give the first set a unique ID (first column).
3. Fill in the second column with an array of the records in the set, using the following format: ["terra-wds:/RecordTable/UniqueID1", "terra-wds:/RecordTable/UniqueID2", "terra-wds:/RecordTable/UniqueID3"]
A set table’s relation must reference the primary key of a record table.
4. Export/save the spreadsheet in “tab-separated values” (TSV) or “tab-delimited text” (TXT) format.
5. Open the data tab of your workspace. Remember that you must already have a record data table in your workspace that your set table will reference.
6. Click Import data and upload your sample_set.tsv file. You can name your table whatever you prefer; here, we append “_set” to the name of the record table it references.
7. You'll see your set data table in the Tables section. The column with the array of relations will show the unique IDs of the records in the set from the record table.
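If you would rather generate the file with a script than a spreadsheet editor, steps 1 through 4 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the column headers record_set and records and the terra-wds:/RecordTable/UniqueID format come from the steps above, while the set ID joint_call_1, the sample IDs, and the helper names relation and build_set_tsv are hypothetical.

```python
import json


def relation(record_table, record_id):
    """Build one relation string in the terra-wds:/RecordTable/UniqueID format."""
    return f"terra-wds:/{record_table}/{record_id}"


def build_set_tsv(sets, record_table,
                  id_header="record_set", members_header="records"):
    """Return TSV text with one row per set.

    `sets` maps each set's unique ID to the list of record IDs it contains.
    The second column is written as a JSON-style array of relation strings.
    """
    lines = [f"{id_header}\t{members_header}"]
    for set_id, record_ids in sets.items():
        relations = [relation(record_table, rid) for rid in record_ids]
        lines.append(f"{set_id}\t{json.dumps(relations)}")
    return "\n".join(lines) + "\n"


# Hypothetical set and sample IDs; "sample" is the assumed record table name.
tsv_text = build_set_tsv(
    {"joint_call_1": ["sample_1", "sample_2", "sample_3"]},
    record_table="sample",
)
print(tsv_text)
```

Save the printed text to a file such as sample_set.tsv and upload it as described in step 6. Using json.dumps keeps the array quoted with straight double quotes, which avoids the curly quotes that some editors insert automatically.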