Automatically generate data tables with file pointers in a workspace

Adam Mullen

What we're solving

Users who are new to the cloud or to Terra have to learn several new concepts before they can get their data into workspaces and run best-practice workflows. These include cloud buckets, cloud file paths (which we call metadata), and data table TSV requirements. Setting up data tables and aligning metadata to file paths in a cloud bucket often requires a lot of manual effort, invites copy/paste errors, and leads to general frustration with getting data into Terra.

Our goal is to improve common data upload patterns by automatically generating data tables in a workspace from the files that have been uploaded to the workspace’s bucket, starting with single-end and paired-end Illumina sequencing samples.

What's changing for you

When using the Terra Data Uploader Tool, you will have the option to autogenerate a data table instead of uploading your own TSV. Terra will attempt to match known file patterns (see below) for single-end and paired-end sequencing and generate a data table with sample identifiers linking to those files. We create the sample identifier by removing the file ending.

Examples that are currently supported

  • Single-end sequencing (e.g. Sample1_01.fastq.gz)
    Results in a table with two columns: sample_id, read1

  • Paired-end sequencing (e.g. Sample1_01.fastq.gz and Sample1_02.fastq.gz)
    Results in a table with three columns: sample_id, read1, read2

Other supported file name formats, based on known patterns from Illumina sequencing runs

  • sample01_1.fastq.gz and sample01_2.fastq.gz
  • sample01_R1.fastq.gz and sample01_R2.fastq.gz 
  • sample01_F.fastq.gz and sample01_R.fastq.gz 
  • sample01_R1.fastq and sample01_R2.fastq
  • SampleName_S1_L001_R1_001.fastq.gz and SampleName_S1_L001_R2_001.fastq.gz

We would love feedback on other file formats to support as well!
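
To make the matching idea concrete, here is a minimal Python sketch of how file names like the ones listed above could be grouped into table rows. It is an illustration under stated assumptions, not Terra's actual implementation: the regular expression, the function names (autogenerate_table, to_tsv), and the example bucket path are all hypothetical. In particular, the sketch assumes the "file ending" that gets removed is the read marker plus the extension (so Sample1_01.fastq.gz becomes Sample1, and SampleName_S1_L001_R1_001.fastq.gz becomes SampleName_S1_L001); Terra may derive the identifier differently.

```python
import re
from collections import defaultdict

# Hypothetical pattern covering the file names listed above:
#   <sample>_<read marker>[_001].fastq[.gz]
READ_PATTERN = re.compile(
    r"^(?P<sample>.+?)_(?P<read>R1|R2|01|02|1|2|F|R)(?:_001)?\.fastq(?:\.gz)?$"
)

# Which table column each read marker maps to (an assumption of this sketch).
READ_COLUMN = {
    "R1": "read1", "01": "read1", "1": "read1", "F": "read1",
    "R2": "read2", "02": "read2", "2": "read2", "R": "read2",
}


def autogenerate_table(paths):
    """Group FASTQ paths by sample and return (columns, rows, unmatched)."""
    samples = defaultdict(dict)
    unmatched = []
    for path in paths:
        filename = path.rsplit("/", 1)[-1]  # match on the file name only
        match = READ_PATTERN.match(filename)
        if match is None:
            unmatched.append(path)
            continue
        column = READ_COLUMN[match.group("read")]
        samples[match.group("sample")][column] = path

    # Only include read2 when at least one sample is paired-end.
    columns = ["sample_id", "read1"]
    if any("read2" in reads for reads in samples.values()):
        columns.append("read2")

    rows = []
    for sample_id in sorted(samples):
        row = {"sample_id": sample_id}
        for column in columns[1:]:
            row[column] = samples[sample_id].get(column, "")
        rows.append(row)
    return columns, rows, unmatched


def to_tsv(columns, rows):
    """Render the rows as tab-separated text with a plain header line."""
    lines = ["\t".join(columns)]
    lines += ["\t".join(row[c] for c in columns) for row in rows]
    return "\n".join(lines) + "\n"


# Example with a made-up bucket path.
columns, rows, unmatched = autogenerate_table([
    "gs://example-bucket/sample01_R1.fastq.gz",
    "gs://example-bucket/sample01_R2.fastq.gz",
    "gs://example-bucket/notes.txt",
])
print(to_tsv(columns, rows))
print("Unmatched files:", unmatched)
```

Files that do not fit the pattern are returned separately rather than silently dropped, so they can be reviewed by hand. The header written by to_tsv is just the column names used in this article; it does not attempt to reproduce Terra's data table TSV requirements.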

Benefits to the user experience

Automatically matching known file patterns and producing a data table makes it much easier to align metadata rows in a data table with file paths in a bucket, saves time, and reduces the potential for copy/paste mistakes during editing.

Try it out

Go to https://app.terra.bio/#feature-preview to enable the “Autogenerate data table for single and paired-end sequencing” feature and let us know what you think!
