Learn how to upload data files to workspace storage, along with a corresponding data table, all at once and right in a Terra workspace. If a value in your table matches the name of an uploaded file, the table will be updated with the full gs:// path to that file in workspace storage (Google bucket). This is critical because workflows whose inputs are defined by a data table need the gs:// path in the table in order to run.
Why use the Data Uploader?
The Data Uploader automates and simplifies the process of uploading data and a corresponding data table to a workspace. Although you can upload files directly in a workspace, you usually then need to add all of the URLs (the "gs://bucket-id/filename" file paths) that point to the uploaded files to the table, either manually or with scripting. The Data Uploader does this for you, so you don't have to worry about programmatically generating the list of bucket URLs.
How to use the Data Uploader
Steps:
1. Access the Data Uploader
2. Create/choose a data collection
3. Upload files
4. Upload data table (TSV)
You still need to create a TSV file to generate a table for the data. To learn how, see How to make a data table from scratch or a template.
Note: The TSV you upload to generate a table for your data needs to include each file name, but not the full path. In other words, the Data Uploader will replace your-data-file.vcf with gs://fc-3cf82dc8-62a0-4518-ac35-89e03debe3d7/your-data-file.vcf.
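To make the replacement concrete, here is a minimal Python sketch of the kind of scripting the Data Uploader saves you from. It is not the Data Uploader's actual implementation: the function name `fill_gs_paths`, the column names, and the assumption that files sit directly under the bucket root are all illustrative.

```python
import csv
import io


def fill_gs_paths(tsv_text, bucket, uploaded_names):
    """Replace any cell that exactly matches an uploaded file name with a gs:// URL.

    Hypothetical helper for illustration; assumes files sit at the bucket root.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for index, row in enumerate(reader):
        if index == 0:
            writer.writerow(row)  # leave the header row untouched
            continue
        writer.writerow(
            [f"gs://{bucket}/{cell}" if cell in uploaded_names else cell for cell in row]
        )
    return out.getvalue()


if __name__ == "__main__":
    table = "sample_id\tvcf_file\nsample1\tyour-data-file.vcf\n"
    print(fill_gs_paths(
        table,
        "fc-3cf82dc8-62a0-4518-ac35-89e03debe3d7",
        {"your-data-file.vcf"},
    ))
```

Cells that do not match an uploaded file name, such as the ID column, pass through unchanged.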
Video data uploader tutorial: Loading Illumina paired-end sequencing data to Terra
Step 1. Access Data Uploader and select a target workspace
You can access the Data Uploader directly, or from within a workspace.
To access directly
If you don't want to start from a workspace, go to https://app.terra.bio/#upload.
To access from within a workspace
Click the Import Data button at the top of the Data page and select Open data uploader.
When you first arrive at the Data Uploader, you'll see a home screen showing the workspace you were in.
To change the destination workspace, click the Change link in the workspace card.
In the selection screen, you can search for workspaces based on workspace names, tags, or billing projects. Find the workspace to which you'd like to add data, and click on it to select it.
Step 2. Create/choose a data collection
Once you've selected the target workspace, you'll be prompted to either create a "collection" or select an existing one.
What is a data collection?
Collections are a way to organize your data files into groups, much as you would organize files in local storage. You can use distinct collections if you add data for different organisms, different experimental methodologies, or different sequencing technologies to the same workspace. Each collection will have its own associated data table with metadata like the file ID, the URL of the data file in workspace storage, and any other useful details.
Creating versus adding to a collection
If a new set of files is similar to an existing collection (i.e., its associated table includes roughly the same columns of metadata), you can just add to that collection. Otherwise, you can create a new collection. In the example shown in the screenshot below, the workspace doesn't have any existing collections to choose from, so you would create a new one.
2.1. Click Create a new collection (or select an existing collection, if you have any).
2.2. Name your collection and click Create collection.
How to delete a collection
Note: This will delete the data uploaded to the workspace bucket!
1. In the workspace dashboard, click the Open bucket in browser link (right column under Cloud Information).
2. In the Google Cloud console storage browser, click the uploads folder.
3. Select the data collection to delete.
4. Click the blue DELETE link and follow instructions.
Step 3. Upload files
Once you have a collection selected, if you scroll down, you'll see an area prompting you to upload your files.
3.1. Upload your files, either by dragging and dropping them on the page, or by clicking the blue plus button at the bottom right to browse the files on your computer.
3.2. Once the upload starts, you'll see a progress bar, along with an option to abort the upload.
Step 4. Upload data table (TSV)
The final step is to create a table in the workspace that lists the data and any associated metadata. You'll upload a TSV file that includes, at minimum, a unique ID for each file and the file names of the data you just uploaded.
To learn more, see How to make a data table from scratch or a template.
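As a rough illustration of the minimum such a TSV contains, the Python sketch below writes one row per uploaded file with an ID and a file name. The column names `sample_id` and `vcf_file`, the file names, and the helper `write_tsv` are hypothetical; check the linked article for the header format Terra expects.

```python
import csv


def write_tsv(path, rows, fieldnames):
    """Write a list of dicts to a tab-separated file with a header row."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(
            handle, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    # Illustrative rows: a unique ID plus the bare file name (no gs:// path).
    rows = [
        {"sample_id": "sample1", "vcf_file": "sample1.vcf"},
        {"sample_id": "sample2", "vcf_file": "sample2.vcf"},
    ]
    write_tsv("my-collection.tsv", rows, ["sample_id", "vcf_file"])
```

Only the bare file names go in the table; the full paths are filled in after upload.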
4.1. After clicking NEXT > in the data file upload step, you'll see a prompt to upload your metadata TSV. You can either drag and drop the file, or click the blue upload button to select it from your local machine. Note: The Data Uploader will only accept .TSV or .TXT files.
4.2. Once the TSV file upload is complete, click Create table (or Update table) to finish.
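Before uploading, it can help to sanity-check the TSV locally. The sketch below is one possible pre-upload check, not something the Data Uploader provides: it assumes the first column holds the unique ID, treats the extension check as case-insensitive, and uses a hypothetical `check_table` helper and column name.

```python
import csv
from pathlib import Path

# Extensions the article says the Data Uploader accepts (compared in lower case here).
ALLOWED_SUFFIXES = {".tsv", ".txt"}


def check_table(tsv_path, file_column, uploaded_names):
    """Return a list of human-readable problems found in the TSV (empty if none)."""
    problems = []
    path = Path(tsv_path)
    if path.suffix.lower() not in ALLOWED_SUFFIXES:
        problems.append(f"{path.name}: extension is not .tsv or .txt")
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        columns = reader.fieldnames or []
        if file_column not in columns:
            problems.append(f"missing column: {file_column}")
            return problems
        id_column = columns[0]  # assumption: the first column is the unique ID
        seen_ids = set()
        for row in reader:
            row_id = row[id_column]
            if row_id in seen_ids:
                problems.append(f"duplicate ID: {row_id}")
            seen_ids.add(row_id)
            if row[file_column] not in uploaded_names:
                problems.append(f"no uploaded file named: {row[file_column]}")
    return problems
```

For example, `check_table("my-collection.tsv", "vcf_file", {"sample1.vcf", "sample2.vcf"})` returns an empty list when every row has a distinct ID and names a file you uploaded.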
What to expect in the target workspace Data page
The data files will be in workspace storage, in a directory called uploads, inside a folder named after the collection. You can see these files by clicking the Files icon under Other data (left-hand side).
The metadata will be in a data table.
Finding data and metadata in the workspace
Where are the files?
- Data files are in workspace storage.
- Corresponding metadata (links to data files) are in the data table.