Automatically generate data tables with file pointers in a workspace - Released 3/28/25

Adam Mullen

What we're solving

Users who are new to the cloud or to Terra have to learn several new concepts to get their data into workspaces and take advantage of best-practice workflows. These include cloud buckets, cloud file paths (which we call metadata), and data table TSV requirements. Setting up data tables and aligning metadata with file paths in a cloud bucket often requires a lot of manual effort, can introduce copy/paste errors, and makes getting data into Terra frustrating.

Our goal is to streamline common data upload patterns by automatically generating data tables in a workspace from the files that have been uploaded to the workspace’s bucket, starting with single-end and paired-end Illumina and ONT sequencing samples.

What's changing for you

When using the Terra Data Uploader Tool, you will have the option to autogenerate a data table instead of uploading your own TSV. Terra will attempt to match known file patterns (see below) for single-end and paired-end sequencing and autogenerate a data table with sample identifiers linked to those files. We create the sample identifier by removing the file ending.

Examples that are currently supported

  • Single-end sequencing (e.g. Sample1_01.fastq.gz)
    Results in a table with two columns: sample_id, read1
  • Paired-end sequencing (e.g. Sample1_01.fastq.gz and Sample1_02.fastq.gz)
    Results in a table with three columns: sample_id, read1, read2
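
To make the table shapes above concrete, here is a minimal Python sketch that renders rows as tab-separated text using the column names listed above. It is only an illustration, not Terra's implementation: the bucket path `gs://example-bucket/uploads`, the helper `render_table`, and the choice of `Sample1` as the identifier for the example files are all assumptions.

```python
import csv
import io


def render_table(rows, paired):
    """Render sample rows as TSV text with the columns described above."""
    columns = ["sample_id", "read1"] + (["read2"] if paired else [])
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row[c] for c in columns])
    return buf.getvalue()


# Hypothetical bucket path, for illustration only.
bucket = "gs://example-bucket/uploads"

# Assumption: the identifier for these example files is "Sample1".
paired_rows = [{
    "sample_id": "Sample1",
    "read1": f"{bucket}/Sample1_01.fastq.gz",
    "read2": f"{bucket}/Sample1_02.fastq.gz",
}]
single_rows = [{
    "sample_id": "Sample1",
    "read1": f"{bucket}/Sample1_01.fastq.gz",
}]

paired_tsv = render_table(paired_rows, paired=True)
single_tsv = render_table(single_rows, paired=False)
```

Each row pairs one sample identifier with the cloud path of each of its read files, which is the alignment that would otherwise be done by hand in a TSV.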

Supported formatting, based on known patterns from Illumina sequencing runs:

  • sample01_1.fastq.gz and sample01_2.fastq.gz
  • sample01_R1.fastq.gz and sample01_R2.fastq.gz 
  • sample01_F.fastq.gz and sample01_R.fastq.gz 
  • sample01_R1.fastq and sample01_R2.fastq
  • SampleName_S1_L001_R1_001.fastq.gz and SampleName_S1_L001_R2_001.fastq.gz

Supported formatting, based on known patterns from ONT sequencing runs:

  • barcode01.fastq.gz
  • Complete_barcode01.fastq.gz
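
As a rough illustration of how file names like the ones listed above could be grouped into table rows, here is a Python sketch. The regular expressions, the `build_rows` helper, and the decision to drop the `_S1_L001` portion from the sample identifier are assumptions made for this example; Terra's actual matching rules may differ.

```python
import re
from collections import defaultdict

# Assumed extension handling: .fastq with an optional .gz suffix.
EXT = r"\.fastq(?:\.gz)?"

# Illustrative patterns for the paired-end naming styles listed above.
READ_PATTERNS = [
    # SampleName_S1_L001_R1_001.fastq.gz
    re.compile(rf"^(?P<sample>.+)_S\d+_L\d{{3}}_R(?P<read>[12])_001{EXT}$"),
    # sample01_R1 / sample01_R2 and sample01_1 / sample01_2
    re.compile(rf"^(?P<sample>.+)_R?(?P<read>[12]){EXT}$"),
    # sample01_F / sample01_R
    re.compile(rf"^(?P<sample>.+)_(?P<read>[FR]){EXT}$"),
]
# Fallback for single-file names such as barcode01.fastq.gz.
SINGLE_PATTERN = re.compile(rf"^(?P<sample>.+){EXT}$")


def build_rows(filenames):
    """Group file names into rows keyed by an inferred sample identifier."""
    reads = defaultdict(dict)
    for name in filenames:
        for pattern in READ_PATTERNS:
            m = pattern.match(name)
            if m:
                # "2" and "R" mark the second read; "1" and "F" the first.
                key = "read2" if m["read"] in ("2", "R") else "read1"
                reads[m["sample"]][key] = name
                break
        else:
            # No paired-end suffix: treat the file as a single read and
            # strip only the extension. Non-FASTQ files are skipped.
            single = SINGLE_PATTERN.match(name)
            if single:
                reads[single["sample"]]["read1"] = name
    return [{"sample_id": s, **cols} for s, cols in sorted(reads.items())]


rows = build_rows([
    "sample01_R1.fastq.gz",
    "sample01_R2.fastq.gz",
    "SampleName_S1_L001_R1_001.fastq.gz",
    "SampleName_S1_L001_R2_001.fastq.gz",
    "sample02_F.fastq.gz",
    "sample02_R.fastq.gz",
    "barcode01.fastq.gz",
    "Complete_barcode01.fastq.gz",
    "notes.txt",
])
```

In this sketch, files that share an inferred identifier land in the same row, and files without a recognized paired-end suffix become single-read rows.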

We would love feedback on other file formats to support as well!

Benefits to the user experience

Automatically matching known file patterns and producing a data table makes it much easier to align metadata rows in a data table with file paths in a bucket, saves time, and reduces the potential for copy/paste mistakes during editing.

Try it out

Go to the Terra Data Uploader Tool and use the option to autogenerate a data table instead of uploading your own TSV. Let us know what you think!

Comments

1 comment

  • Adam Mullen

    We added some additional support for identified ONT sequencing runs and made the feature generally available for all users as of 3/28/25! We are continuing to collect feedback about sequencing technologies and file patterns to support, so please submit a feature request or support ticket if something seems missing!
