How to use files from Terra library data in a workflow?



  • Sushma Chaluvadi

    Hi Oliver,

    When you import data from 1000 Genomes, for example, no physical copies of those files are made, which is why you do not see them in your Google bucket. Rather, what is copied is metadata pointing to the location where the actual files live.
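    As a rough illustration (a minimal Python sketch, not Terra's actual data format or API), an imported row might hold only a gs:// URI that points at the file's real location. The sample ID, bucket, path, and the split_gcs_uri helper below are all hypothetical.

```python
from typing import NamedTuple


class GcsLocation(NamedTuple):
    """Bucket and object path extracted from a gs:// URI."""
    bucket: str
    object_path: str


def split_gcs_uri(uri: str) -> GcsLocation:
    """Split a gs://bucket/path URI into its bucket and object path."""
    prefix = "gs://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {uri!r}")
    bucket, _, object_path = uri[len(prefix):].partition("/")
    if not bucket or not object_path:
        raise ValueError(f"URI must name a bucket and an object: {uri!r}")
    return GcsLocation(bucket, object_path)


# Hypothetical imported row: the table stores a pointer (URI), not the file itself.
row = {"sample_id": "SAMPLE_01", "cram": "gs://example-public-bucket/data/SAMPLE_01.cram"}

loc = split_gcs_uri(row["cram"])
print(loc.bucket)       # example-public-bucket
print(loc.object_path)  # data/SAMPLE_01.cram
```

    The bucket named in the URI is wherever the data host keeps the file, not your workspace bucket.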

    To use the files, you would use the Attributes column of your Workflow. For example, my Workflow requires a sample input, as set in the "Process multiple workflows from: Sample" section:

    If I continue to scroll down, I will see that there are some Attributes that I need to fill out. This column is where you tell your Workflow which Data table to look at and which column to read from (for each row of that table):

    In the above screenshot, you will see that there are 2 variables that this Workflow looks for (among others that are not shown in the screenshot). One is input_bam and the other is input_bam_index. Which files should be assigned to each of these variables? That is what the Attributes column specifies.

    this.analysis_ready_bam tells the Workflow to read, for each row (i.e., each Sample) in the Sample table, the value in the analysis_ready_bam column.

    this.analysis_ready_bam_index tells the Workflow to read, for each row (i.e., each Sample) in the Sample table, the value in the analysis_ready_bam_index column.

    "this." points to the Sample table because that is what was chosen in the "Process multiple workflows from" drop-down menu. Had "Participant" been selected instead, this.analysis_ready_bam would point to the Participant table, and the Workflow may fail if the Participant table does not contain that column.
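    To make the "this." lookup concrete, here is a toy Python model of the resolution step, assuming the data table is a list of dictionaries (one per row). It is a sketch of the idea, not how Terra implements it; resolve_inputs, the rows, and the bucket paths are hypothetical.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for a data table: one dict per row.
Table = List[Dict[str, str]]


def resolve_inputs(attributes: Dict[str, str], table: Table) -> List[Dict[str, str]]:
    """Resolve "this.<column>" expressions against every row of the chosen
    root table, producing one set of workflow inputs per row (one run per row)."""
    runs = []
    for row in table:
        inputs = {}
        for input_name, expression in attributes.items():
            if not expression.startswith("this."):
                raise ValueError(f"only this.<column> is modeled here: {expression!r}")
            column = expression[len("this."):]
            if column not in row:
                # The chosen root table lacks the column the attribute refers to.
                raise KeyError(f"{input_name}: root table has no column {column!r}")
            inputs[input_name] = row[column]
        runs.append(inputs)
    return runs


sample_table = [
    {"sample_id": "S1", "analysis_ready_bam": "gs://bucket/S1.bam",
     "analysis_ready_bam_index": "gs://bucket/S1.bai"},
    {"sample_id": "S2", "analysis_ready_bam": "gs://bucket/S2.bam",
     "analysis_ready_bam_index": "gs://bucket/S2.bai"},
]
attributes = {
    "input_bam": "this.analysis_ready_bam",
    "input_bam_index": "this.analysis_ready_bam_index",
}

for run in resolve_inputs(attributes, sample_table):
    print(run["input_bam"], run["input_bam_index"])
```

    Passing a Participant table that lacks these columns raises a KeyError, which corresponds to the failure case described above.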

    Below is a screenshot of the Sample table in this Workspace:

    You can see that the columns are named analysis_ready_bam and analysis_ready_bam_index.

    So, depending on the column names in your Sample data table, you would enter those names in your Workflow Attributes (as this.<column name>).
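    One way to catch a mismatch before launching is to compare your Attributes against the header row of the table. This is a minimal Python sketch assuming a tab-separated copy of the Sample table; missing_columns and the table contents are hypothetical.

```python
import csv
import io
from typing import Dict, List


def missing_columns(attributes: Dict[str, str], tsv_text: str) -> List[str]:
    """Return the workflow inputs whose "this.<column>" expression names a
    column that is not in the header row of a tab-separated table."""
    header = next(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
    prefix = "this."
    missing = []
    for input_name, expression in attributes.items():
        column = expression[len(prefix):] if expression.startswith(prefix) else expression
        if column not in header:
            missing.append(input_name)
    return missing


# Hypothetical copy of a Sample table (header row plus one data row).
sample_tsv = (
    "entity:sample_id\tanalysis_ready_bam\tanalysis_ready_bam_index\n"
    "S1\tgs://bucket/S1.bam\tgs://bucket/S1.bai\n"
)

# A one-letter typo in a column name is enough to break the mapping.
attributes = {
    "input_bam": "this.analysis_read_bam",
    "input_bam_index": "this.analysis_ready_bam_index",
}
print(missing_columns(attributes, sample_tsv))  # ['input_bam']
```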

    Please note that this is a very general overview. If you would like to learn more about how to set up your Workflow and your Data Model, there is detailed documentation here:


  • Oliver Ruebenacker


    Thank you for the response! It makes sense that the files are not copied to my workspace, since a reference to another workspace should be sufficient if that workspace is public.

    However, my problem is that I cannot find the data files anywhere and don't know how to refer to them.

    For example, the data referred to here:

    Where are the files, and how can I refer to them?


    Best, Oliver

  • Sushma Chaluvadi

    Hi Oliver,

    I see now that when you export data from 1000 Genomes, what gets added to your Data Model is a BigQuery query. Using this data is slightly different. Here is a video of a talk that walks through the steps. The talk is based on an example Workspace from the workshop, which you are welcome to clone and use to follow along with the video:

    I hope this helps,


  • Oliver Ruebenacker

    Basically, there are no files from the 1000 Genomes project available in Terra's public library?

  • Sushma Chaluvadi


    Yes, that is correct: Terra does not physically store any files from the 1000 Genomes project. Terra's Data Explorer allows users to explore the data hosted by 1000 Genomes and then export the metadata that points to the location of the physical files.
