Upload data and populate the table with linked file paths

Anton Kovalsky

Learn how to upload data files to workspace storage and create a corresponding data table all at once, right in a Terra workspace. If a value in your table matches the name of an uploaded file, the table will be updated with the full gs:// path to that file in workspace storage (a Google bucket). This matters because workflows whose inputs are defined by a data table need the gs:// path in the table to find the file.

Why use the Data Uploader?

The Data Uploader automates and simplifies the process of uploading data and a corresponding data table to a workspace. You can upload files directly in a workspace, but you then usually need to add the URLs that point to the uploaded files (the "gs://bucket-id/filename" file paths) to the table yourself, either manually or with a script. The Data Uploader does this for you, so you don't have to generate the list of bucket URLs programmatically.

How to use the Data Uploader

Diagram showing the steps for using the Data Uploader. In order, the steps are (1) Access data uploader, (2) Create/choose data collection, (3) Upload files, and (4) Upload data table.

Steps: 1. Access data uploader 2. Create/choose data collection 3. Upload files 4. Upload data table (TSV)

You still need to create a TSV file to generate a table for the data. To learn how, see How to make a data table from scratch or a template. Note: The TSV you upload to generate a table for your data needs to include only the file name, not the full path. In other words, the Data Uploader will replace your-data-file.vcf with gs://fc-3cf82dc8-62a0-4518-ac35-89e03debe3d7/your-data-file.vcf.
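To make the replacement concrete, here is a minimal sketch in Python of the kind of substitution described above. It is an illustration only, not the Data Uploader's actual code: the function name link_file_paths, its parameters, and the column names are hypothetical, and the storage prefix is the example bucket from the note above.

```python
def link_file_paths(rows, uploaded_names, storage_prefix):
    """Return copies of the rows with uploaded file names swapped for full paths.

    Hypothetical helper for illustration; not part of Terra.
    """
    uploaded = set(uploaded_names)
    prefix = storage_prefix.rstrip("/")
    linked = []
    for row in rows:
        # Only cells that exactly match an uploaded file name are rewritten.
        linked.append({
            column: f"{prefix}/{value}" if value in uploaded else value
            for column, value in row.items()
        })
    return linked


# One table row, as it would appear in the TSV (file name only).
rows = [{"sample_id": "sample1", "vcf": "your-data-file.vcf"}]
linked = link_file_paths(
    rows,
    uploaded_names=["your-data-file.vcf"],
    storage_prefix="gs://fc-3cf82dc8-62a0-4518-ac35-89e03debe3d7",
)
print(linked[0]["vcf"])
# gs://fc-3cf82dc8-62a0-4518-ac35-89e03debe3d7/your-data-file.vcf
```

Cells that don't match an uploaded file name, such as the sample ID, are left unchanged.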


Video data uploader tutorial: Loading Illumina paired-end sequencing data to Terra

Step 1. Access Data Uploader and select a target workspace

You can access the Data Uploader directly, or from within a workspace.

To access directly

If you don't want to start from a workspace, go to https://app.terra.bio/#upload

To access from within a workspace

Click the Import Data button at the top of the Data page and select Open data uploader.
Screenshot of the Data Tab in a Terra workspace. The Import Data button at the top left of the page is highlighted.

When you first arrive at the Data Uploader, you'll see a home screen showing the workspace you were in.
Screenshot of the Data Uploader page in Terra.

To change the destination workspace, click the Change link in the workspace card.

In the selection screen, you can search for workspaces by workspace name, tag, or billing project. Find the workspace to which you'd like to add data, and click on it to select it.
Screenshot of the Data Uploader page in Terra, with some data selected.

Step 2. Create/choose a data collection

Once you've selected the target workspace, you'll be prompted to either create a "collection" or select an existing one.

What is a data collection?

Collections are a way to organize your data files into groups, like files in local storage. You can use distinct collections if you add data for different organisms, different experimental methodologies, or different sequencing technologies to the same workspace. Each collection has its own associated data table with metadata such as the file ID, the URL of the data file in workspace storage, and any other useful details.

Creating versus adding to a collection

If a new set of files is similar to an existing collection (i.e., its associated table includes roughly the same columns of metadata), you can just add to that collection. Otherwise, you can create a new collection. In the example shown in the screenshot below, the workspace doesn't have any existing collections to choose from, so you would create a new one.

2.1. Click Create a new collection (or select an existing collection, if you have any).
Screenshot of the Data Uploader page in Terra with the 'Create a new collection' option highlighted.

2.2. Name your collection and click Create collection.
An image of the 'Create a New Collection' popup window with a field to input the Collection Name.

How to delete a collection

Note: This will delete the data uploaded to the workspace bucket! 

1. In the workspace dashboard, click the Open bucket in browser link (right column, under Cloud Information).

2. In the Google Cloud console storage browser, click the uploads folder.

3. Select the data collection to delete.

4. Click the blue DELETE link and follow the instructions.

A moving gif image of someone selecting the 'Cloud Information' sidebar tab, and then clicking the 'Open bucket in browser' link.

Step 3. Upload files

Once you have a collection selected, if you scroll down, you'll see an area prompting you to upload your files. 

3.1. Upload your files, either by dragging and dropping them onto the page or by clicking the blue plus button at the bottom right to browse the files on your computer:
An image highlighting the area of the screen where someone can drag and drop their files, as well as the plus sign icon in the bottom right hand side that they can select to browse the files on their computer for upload.

3.2. Once the upload starts, you'll see a progress bar, along with an option to abort the upload:
An image of the progress bar of an upload in progress.

Step 4. Upload data table (TSV)

The final step is to create a table in the workspace that lists the data and any associated metadata. You'll upload a TSV file that includes, at minimum, a unique ID for each file and the file names of the data you just uploaded.

To learn more, see How to make a data table from scratch or a template.
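If you have many files, you could generate the minimal TSV with a short script rather than by hand. The sketch below uses only the Python standard library. It assumes the "entity:<table name>_id" header that Terra load files use for the first column (confirm the exact format in the article linked above). The function name build_table_tsv, the column name "file", and the rule of deriving each ID from the file name are illustrative assumptions.

```python
import csv
import io
from pathlib import PurePath


def build_table_tsv(table_name, file_names, file_column="file"):
    """Build TSV text with one row per file: a unique ID and the file name.

    Hypothetical helper; the ID is the file name up to its first dot.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    # Assumed header convention: the first column is entity:<table>_id.
    writer.writerow([f"entity:{table_name}_id", file_column])
    seen_ids = set()
    for name in file_names:
        base = PurePath(name).name      # file name only, not the full path
        row_id = base.split(".")[0]
        if row_id in seen_ids:
            raise ValueError(f"duplicate ID {row_id!r}; IDs must be unique")
        seen_ids.add(row_id)
        writer.writerow([row_id, base])
    return buffer.getvalue()


tsv_text = build_table_tsv("sample", ["sampleA.vcf", "sampleB.vcf"])
print(tsv_text)
```

Save the printed text to a file ending in .tsv, then upload it in step 4.1.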

4.1. After clicking NEXT > in the data file upload step, you'll see a prompt to upload your metadata TSV. You'll have the option to either drag-and-drop, or click the blue upload button to select files from your local machine. Note: Data Uploader will only accept .TSV or .TXT files.
A screenshot highlighting the area of the Data Uploader page stating it will only accept .TSV or .TXT files.

4.2. Once the TSV file has finished uploading, click Create table (or Update table) to complete the process.

A screenshot of the blue Create Table button.
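Because the table is only linked to files whose names match, it can be worth checking your TSV locally before you upload it. The following sketch is one way to do that in Python. The helper names, the "file" column, and the sample file names are hypothetical; the accepted extensions come from the note in step 4.1.

```python
import csv
import io
from pathlib import PurePath

ACCEPTED_EXTENSIONS = {".tsv", ".txt"}  # from the note in step 4.1


def has_accepted_extension(path):
    """Return True if the file name ends in .tsv or .txt (any letter case)."""
    return PurePath(path).suffix.lower() in ACCEPTED_EXTENSIONS


def unmatched_file_names(tsv_text, file_column, uploaded_names):
    """List file names in the TSV that match none of the uploaded files."""
    uploaded = set(uploaded_names)
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row[file_column] for row in reader
            if row[file_column] not in uploaded]


tsv_text = "entity:sample_id\tfile\ns1\treads_1.fastq\ns2\treads_2.fastq\n"
missing = unmatched_file_names(tsv_text, "file", ["reads_1.fastq"])
print(has_accepted_extension("samples.TSV"))  # True
print(missing)                                # ['reads_2.fastq']
```

An empty list means every file name in the table has a matching uploaded file.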

What to expect in the target workspace Data page

The data files will be in workspace storage, in a directory called uploads, inside a folder with the same name as the collection. You can see these files by clicking the Files icon under Other data (left-hand side).

The metadata will be in a data table. 

Finding data and metadata in the workspace
A moving gif image showing someone navigating through the Data Tables and Files sections of the Data Page within Terra.

Where are the files?

  • Data files are in workspace storage.
  • Corresponding metadata (links to data files) are in the data table. 
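To check that a path in the data table points into the expected collection folder, you could parse it as in the sketch below. This assumes the layout described above (a directory called uploads containing a folder named after the collection); the exact directory name and casing may differ in your bucket, so it is a parameter. The function name, bucket name, and collection name are all hypothetical.

```python
from urllib.parse import urlparse


def in_collection(gs_url, collection, uploads_dir="uploads"):
    """Return True if gs_url looks like gs://<bucket>/<uploads_dir>/<collection>/<file>.

    Hypothetical helper; the folder layout is an assumption based on the
    description above, not a documented guarantee.
    """
    parsed = urlparse(gs_url)
    if parsed.scheme != "gs" or not parsed.netloc:
        return False  # not a well-formed gs://bucket/... path
    parts = parsed.path.lstrip("/").split("/")
    return len(parts) >= 3 and parts[0] == uploads_dir and parts[1] == collection


# Placeholder bucket and collection names for illustration.
url = "gs://fc-example-bucket/uploads/my-collection/your-data-file.vcf"
print(in_collection(url, "my-collection"))     # True
print(in_collection(url, "other-collection"))  # False
```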

 
