How to make a data table from scratch or a template

Allie Cliffe

Workspace tables are much like a spreadsheet built into the data page. So it's no surprise that you can use a spreadsheet editor to create and upload a file to generate a new table in your workspace. This "load file" is in "tab separated values" or "tab delimited text" format and is called a TSV file in Terra. 

Step 1. Make a TSV file in a spreadsheet editor

Open your favorite spreadsheet editor and follow the formatting examples shown below to make an entity table or set table.

The sections below give templates, examples, and required formatting for each table type.

  • Download a template load file (sample.tsv) here.

    What's in an entity table?

    Entity tables keep track of data, historically the input data for a workflow, such as samples, participants, specimens, or files. The minimum sample table includes an ID column and a column for metadata or data files (e.g., FASTQ, BAM, CRAM, or whatever form your data take).

    When creating a table, you can give it any name you like. Terra takes the table name from the first column header (for example, a first column named sample_id produces a table named sample).

    Creating a "sample" entity table

    The following shows an example data table that will have the name "sample" when uploaded to the Terra Data page. Each table row represents a different sample. As with all Terra tables, the first column contains entity IDs. The second column shows example cloud paths to BAM files in a Google bucket. 

    Sample TSV in a spreadsheet

    sample_id             BAM
    participant1-blood    gs://your-bucket-name/blood_sample_P1.bam
    participant1-spit     gs://your-bucket-name/spit_sample_P1.bam
    participant2-blood    gs://your-bucket-name/blood_sample_P2.bam
    participant2-spit     gs://your-bucket-name/spit_sample_P2.bam

    Formatting requirements: The "_id" suffix on the first column header is optional. If it's missing, Terra appends `_id` to the end of the first column header when importing a TSV.

    In the rows, use your own sample IDs (e.g., "your-participant1-blood") and the complete paths of the data files.

    Sample table in Terra

    Screenshot showing an example of an entity table in Terra, where each row represents a single sample. The first column contains the samples' IDs. The second column contains the path to each sample's BAM file in a Google bucket.
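If you'd rather generate the load file with a script than type it into a spreadsheet, this minimal sketch writes the example sample table as a tab-separated file using Python's standard csv module. The sample IDs, the placeholder bucket name, and the helper name write_entity_tsv are illustrative assumptions; swap in your own values.

```python
import csv

# Hypothetical example rows: sample ID -> BAM path. "your-bucket-name" is a
# placeholder bucket, as in the example table above.
SAMPLES = {
    "participant1-blood": "gs://your-bucket-name/blood_sample_P1.bam",
    "participant1-spit": "gs://your-bucket-name/spit_sample_P1.bam",
    "participant2-blood": "gs://your-bucket-name/blood_sample_P2.bam",
    "participant2-spit": "gs://your-bucket-name/spit_sample_P2.bam",
}


def write_entity_tsv(path, table_name, data_column, rows):
    """Write a tab-separated load file with '<table_name>_id' as the first header."""
    with open(path, "w", newline="") as f:
        # Tab delimiter and plain "\n" line endings produce a TSV load file.
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow([f"{table_name}_id", data_column])
        for entity_id, value in rows.items():
            writer.writerow([entity_id, value])


write_entity_tsv("sample.tsv", "sample", "BAM", SAMPLES)
```

Because the first header is sample_id, uploading sample.tsv would produce a table named sample.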

  • Why use an entity_set table?

    A single data table might hold a mix of different samples you want to analyze. Samples may differ in species, developmental stages, sequencing methods, or other criteria. If you only want to run a workflow analysis on a subset of the samples in your data table, consider making a set table.

    A set table lets you organize and save sets of samples for (repeat) downstream analysis and keep track of data files generated for a subset of samples. Set tables always refer to entities in an entity table (e.g., a sample_set table references samples in a sample table; a specimen_set table references specimens in a specimen table). Therefore, you can only create a set table after you've made and uploaded the entity table it references.

    The example below shows you how to make a sample_set table, assuming you already have a sample table.

    Option 1: Generate a set in Terra

    1. Select the rows from the entity table to be included in the set.

    2. Click Edit (pencil icon) at left above the table. 

    3. Choose Save selection as set from the menu.

    Screenshot showing how to create a set from the samples stored in an existing entity table in Terra. First, use the checkboxes to select the rows of the samples you want to include in the set. Then, click the edit button with the pencil icon and select 'save selection as set'.

    Option 2: Create a sample_set table in a spreadsheet

    Download a template TSV file (membership.tsv) here.

    The first column is the unique ID for each set and the second column is the sample_id of the sample in that set (from the sample table).

    There is a row for every member of a set. In the example below, the sample_set table contains two sets: spit (contains the samples participant1-spit and participant2-spit) and blood (contains the samples participant1-blood and participant2-blood).

    Sample_set TSV in a spreadsheet editor

    membership:sample_set_id    sample
    spit                        participant1-spit
    spit                        participant2-spit
    blood                       participant1-blood
    blood                       participant2-blood

    Formatting requirements: The "membership:" prefix and "_set_id" suffix in the first column header must be entered exactly as shown! You can replace "sample" (in both column headers) with the name of the entity table for which you're making sets.

    You can customize the set IDs with your own values.

    Note: You must have a corresponding entity table in the workspace (in the example above, a sample table). It contains links to the input data files for the samples in the set.

    Sample_set table in Terra

    The samples in each set are listed in the samples column, separated by commas.

    Screenshot showing an example entity set table in Terra. Each row corresponds to one set of samples from the 'sample' entity table. The first column contains the sets' IDs. The second column lists the samples that belong to each set.

    Other ways to create sets: In addition to creating set tables manually, you can create a set table on the fly when you set up a workflow analysis. Learn more in When to use a set table for a workflow.

    For hands-on practice using set tables for workflow input, see the Data Tables QuickStart Part 3 and Part 4.
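If you'd rather script the membership file, this minimal sketch (Python standard library only) writes one row per set member under the membership:sample_set_id header, then reads the file back and joins each set's members with commas, roughly mirroring how Terra displays the samples column. The set assignments and the helper names write_membership_tsv and group_members are hypothetical.

```python
import csv
from collections import defaultdict

# Hypothetical set assignments: set ID -> sample IDs from the sample table.
SETS = {
    "spit": ["participant1-spit", "participant2-spit"],
    "blood": ["participant1-blood", "participant2-blood"],
}


def write_membership_tsv(path, entity_table, sets):
    """Write one row per set member, using the membership:<table>_set_id header."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerow([f"membership:{entity_table}_set_id", entity_table])
        for set_id, members in sets.items():
            for member in members:
                writer.writerow([set_id, member])


def group_members(path):
    """Read a membership file back into a dict of set ID -> list of members."""
    grouped = defaultdict(list)
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        for set_id, member in reader:
            grouped[set_id].append(member)
    return dict(grouped)


write_membership_tsv("membership.tsv", "sample", SETS)
for set_id, members in group_members("membership.tsv").items():
    print(set_id, ", ".join(members))
```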

Step 2. Save file as tab-delimited text or tab-separated values

A load file has to be in "tab-separated values" or "tab-delimited text" format (Terra recognizes both). 

Your editor may warn you about saving in this format, but we assure you, it's fine!

Screenshot showing one of the acceptable formats for a load file. The two acceptable formats are tab-delimited text (.txt) and tab-separated values (.tsv).
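Before uploading, you can sanity-check that a file really is tab-delimited. The hypothetical check_load_file helper below is a heuristic sketch, not a Terra validator: it flags a header with fewer than two tab-separated columns (a common sign the file was saved as comma-separated) and any row whose field count differs from the header's.

```python
import csv


def check_load_file(path):
    """Return a list of formatting problems in a load file (empty list if none found)."""
    problems = []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        header = next(reader, None)
        if header is None:
            return ["file is empty"]
        # Heuristic: an ID column plus at least one data column is expected.
        if len(header) < 2:
            problems.append(
                "header has fewer than 2 tab-separated columns; "
                "was the file saved as comma-separated?"
            )
        for row in reader:
            if not row:
                continue  # ignore blank lines
            if len(row) != len(header):
                problems.append(
                    f"line {reader.line_num}: expected {len(header)} fields, found {len(row)}"
                )
    return problems
```

An empty list means the check found nothing; it doesn't guarantee Terra will accept the file.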

What's the name of the table in your workspace?

It's worth noting that Terra ignores the actual file name; it's the "root entity" (in the first column header) that determines the name of the data table. For example, if you save your table as table1.txt but the table's first column is named entity:bam_id, the table will be named bam in Terra.
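The naming rule can be sketched in a few lines of Python. This is an illustration of the behavior described in this article (strip an entity: or membership: prefix, then a trailing _id), not Terra's actual implementation, so edge cases may differ. The function name table_name_from_header is hypothetical.

```python
def table_name_from_header(header):
    """Sketch of the naming rule: drop an 'entity:'/'membership:' prefix and a trailing '_id'."""
    for prefix in ("entity:", "membership:"):
        if header.startswith(prefix):
            header = header[len(prefix):]
            break
    if header.endswith("_id"):
        header = header[: -len("_id")]
    return header


for h in ("entity:bam_id", "sample_id", "membership:sample_set_id"):
    print(h, "->", table_name_from_header(h))
```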

Step 3. Upload TSV to workspace

3.1. Click the Import Data button at the top of the TABLES column on the left (highlighted with an orange arrow in the screenshot below).
Screenshot of the Data tab of an example workspace. The image is annotated with an orange arrow to highlight the Import Data button and an orange box to highlight the Upload TSV option from the menu.

3.2. Select the file import tab.

3.3. Drag or click to select your TSV file (highlighted with an orange box).

Screenshot of the window used to upload a load file to create a table. The image is annotated with an orange box to highlight the section used to upload a file and an orange arrow to highlight the 'start import job' button.

3.4. Click the Start import job button at the bottom right (it turns blue once you select a file).

When the upload is complete, you'll see the new data table listed in the left-hand panel of the Data tab. Click on the table's name to expand the table.

Here's what an imported table (in this case, a participant table) looks like in Terra:
Screenshot of the participant table in the Data tab of an example workspace.

Uploading and deleting set tables

Upload order: entities first, then sets

Because set tables reference entities in an entity table, you must upload the entity table first. For example, a sample_set table references a sample table. If you try to upload the sample_set table before the sample table, Terra will give you an error message.
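To catch missing references before you upload, you can compare a membership file against your local entity load file. This sketch (hypothetical helper missing_members, standard library only) returns any set member whose ID isn't in the entity table's first column. It only checks local files; Terra still requires the entity table to be uploaded to the workspace first.

```python
import csv


def missing_members(entity_tsv, membership_tsv):
    """Return set members that don't appear as IDs in the entity load file's first column."""
    with open(entity_tsv, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the header row
        entity_ids = {row[0] for row in reader if row}
    missing = []
    with open(membership_tsv, newline="") as f:
        reader = csv.reader(f, delimiter="\t")
        next(reader)  # skip the membership:<table>_set_id header
        for row in reader:
            # Second column holds the member entity ID.
            if row and row[1] not in entity_ids:
                missing.append(row[1])
    return missing
```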

Deleting entity tables deletes sets that reference the entities

Similarly, if you delete an entity table, Terra will automatically delete any set table that references it. In the example above, deleting the sample table will automatically delete the sample_set table.

Special tables: Pair (tumor-normal analysis)

If you're analyzing cancer data, you're likely familiar with tumor-normal pairs, where a given participant has one sample from tumor tissue and one from normal tissue. To facilitate this type of analysis, Terra has predefined associations for participant, sample, and pair data tables. If you upload these tables in order (participant, then sample, then pair) and specify that Terra should associate them, Terra will automatically link the tables together for use in somatic analysis workflows.

Learn more in Adding pair tables to a workspace for tumor-normal analysis.

Download a template TSV file (pair.tsv) here.

Making tables programmatically 

You can automate making and modifying tables with FISS, a Python library and command-line tool for working with the Terra (FireCloud) API. Learn how in How to manage data with the FISS API.

Next steps

If you already have a data table in your workspace, you can modify it to meet your analysis needs. Learn more in How to modify and edit data tables.

Maybe you're ready to perform an analysis but you need some workspace-level metadata like reference files. Read Creating Workspace Data tables to learn how to make a Workspace Data table that can be used in downstream WDL workflows.

