Workspace data tables (in the "DATA" tab) are a convenient way to reference and organize data in the cloud from different sources (including output files from previous analyses). You can populate a table with metadata (the type and location of your data) directly in the Terra interface or by uploading a tab-delimited load file.
- Understanding where your data are... and are not
- Why use workspace data tables?
- Data table structure and format
- How to edit data table entries directly in Terra (small numbers of inputs)
- How to add a data table to the workspace (large numbers of inputs)
1. Understanding where your data are... and are not
The diagram below shows how data can exist in the cloud - in the Workspace storage bucket, a data library bucket or other storage - but be separate from your workspace. Data tables contain metadata to connect the files in the cloud to the rest of your workspace.
You will know your data are NOT linked to the workspace data table if the Tables column is empty:
2. Why use workspace data tables?
Organizing large numbers of samples
When running a workflow analysis, you can manually enter the complete path to each input file in the workflow code, but that approach doesn't work well if you have more than a handful of files. Keeping all metadata in a workspace data table can save time and headaches in the long run. Data tables also enable automation of back-to-back pipelines that are configured to read from and write to the table.
Tables can also be useful for organizing data as studies get more complex. The load file contains all you need to keep track of data, including intermediate outputs: what types (or "entities") of data you are working with, where the data are, and how the entities relate to each other. This can be useful if you have many samples from one participant, and perhaps many participants in a study, for example.
Side note on including additional metadata in a data table
Data tables aren't limited to data inputs for your workflows. They are flexible and are intended to help organize all the relevant data you might need in the course of your study. You can include other useful information in your data table, such as phenotype data or links to other genomic data; the in-app table keeps it organized.
It works much like adding columns to a spreadsheet row corresponding to a particular entity. The column header describes what goes in the column. In the screenshot below, we've included a column labeled "Participant" that links each sample to a particular individual in a study:
3. Data table structure and format
The data table has two parts: column headers that identify what's in that column, and rows of metadata. Each row corresponds to a different entity (a sample, or a participant, or a lane, for example).
Note: Your data table can contain as much information as you need, but at minimum it needs two columns: an ID column and a data column (containing links to the input data files). You can include additional columns (to reference phenotype information, for example), and the data table will keep it all organized in one place. You can also configure workflows to write output metadata to the workspace table, which is useful for downstream analysis.
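As a rough illustration of this structure, the Python sketch below models a table as a list of column headers plus one dictionary per row. The sample IDs, the `gs://example-bucket` paths, the `participant` column, and both helper functions are hypothetical placeholders chosen for this example; they are not part of Terra.

```python
# Hypothetical table contents: IDs, bucket paths, and the extra
# "participant" column are placeholders, not values from a real workspace.
header = ["sample_id", "cram", "participant"]
rows = [
    {"sample_id": "sample_01", "cram": "gs://example-bucket/sample_01.cram", "participant": "participant_A"},
    {"sample_id": "sample_02", "cram": "gs://example-bucket/sample_02.cram", "participant": "participant_A"},
    {"sample_id": "sample_03", "cram": "gs://example-bucket/sample_03.cram", "participant": "participant_B"},
]


def has_minimum_columns(header):
    """Return True if there is an ID column plus at least one data column."""
    return len(header) >= 2


def samples_by_participant(rows):
    """Group sample IDs under the participant named in each row."""
    groups = {}
    for row in rows:
        groups.setdefault(row["participant"], []).append(row["sample_id"])
    return groups


print(has_minimum_columns(header))
print(samples_by_participant(rows))
```

Each dictionary plays the role of one row (one entity), and the extra `participant` key shows how an additional column can relate several samples to the same individual.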
4. How to edit data table entries directly
Edit individual cells
If your workspace already includes a data table, you can edit the individual cells by clicking on the pencil icon in the cell you want to change:
5. How to add a data table to the workspace
5.1. Make a "load" file of metadata
To add table rows, or an entirely new table, you will need to generate a "load file" in tab-separated format and upload it in the Data tab.
Helpful hint: You may find it easiest to use a spreadsheet editor to generate a load file. It keeps the values in columns instead of all jammed together in one line.
The most basic load file will look like this:
- Each row is one entity (sample, or participant)
- Each column is a particular variable (could be sample data in a Google bucket, phenotype data, participant ID)
- Each cell contains metadata such as the sample ID, the file path to data in a Google bucket, or phenotype data.
- The first column header has to have the format `entity:your_entity_id`. You can call your_entity whatever you want. For example, valid first column headers would be `entity:sample_id` or `entity:lane_number_id`.
- The next column header is the input variable name, such as `cram`.
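If you prefer a script to a spreadsheet editor, the rules above can be sketched in Python with the standard `csv` module. This is a minimal sketch, not an official Terra tool: the function name `build_load_file`, the sample IDs, and the `gs://example-bucket` paths are all hypothetical.

```python
import csv
import io


def build_load_file(entity_name, data_column, rows):
    """Return load-file text: one header line, then one line per entity."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # The first header follows the entity:your_entity_id format.
    writer.writerow([f"entity:{entity_name}_id", data_column])
    for entity_id, value in rows:
        writer.writerow([entity_id, value])
    return buffer.getvalue()


# Hypothetical sample IDs and bucket paths, for illustration only.
example_rows = [
    ("sample_01", "gs://example-bucket/sample_01.cram"),
    ("sample_02", "gs://example-bucket/sample_02.cram"),
]
load_text = build_load_file("sample", "cram", example_rows)
print(load_text)
```

The resulting text has one tab between the two values on every line, so it can be saved as a tab-delimited file or pasted as-is.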
5.2. Save file in "tab-delimited format" on your local machine
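Before uploading, it can help to confirm that the saved file really is tab-delimited and that its first header has the expected format. The sketch below checks only the rules stated in this article, plus one assumed consistency check (every row has as many cells as the header); Terra may apply other rules on upload. The function name `check_load_file`, the file name, and the example values are hypothetical.

```python
import csv
import os
import tempfile


def check_load_file(path):
    """Return a list of problems found in a tab-separated load file."""
    with open(path, newline="") as handle:
        table = list(csv.reader(handle, delimiter="\t"))
    if not table:
        return ["file is empty"]

    problems = []
    header = table[0]
    first = header[0] if header else ""
    # Expect entity:<name>_id with a non-empty <name>.
    well_formed = (
        first.startswith("entity:")
        and first.endswith("_id")
        and len(first) > len("entity:_id")
    )
    if not well_formed:
        problems.append("first column header should look like entity:your_entity_id")
    if len(header) < 2:
        problems.append("expected an ID column plus at least one data column")
    # Assumed check: each row should have the same number of cells as the header.
    for number, row in enumerate(table[1:], start=2):
        if len(row) != len(header):
            problems.append(f"line {number} has {len(row)} cells, expected {len(header)}")
    return problems


# Demonstration on a temporary file with hypothetical contents.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "sample_load_file.tsv")
    with open(path, "w", newline="") as handle:
        handle.write("entity:sample_id\tcram\n")
        handle.write("sample_01\tgs://example-bucket/sample_01.cram\n")
    result = check_load_file(path)

print(result)
```

An empty list means none of these checks failed. A file saved with commas instead of tabs would be reported as having a malformed first header and too few columns.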
5.3. Upload to the workspace Data tab
- Click on the "+" sign in the blue circle at the top of the left TABLES column
- Select either the "upload" or "paste" option
Once you've uploaded a load file to your workspace, you should see your data right away in the Data tab, under the Tables header. When uploaded, the data table will look like this: