How to import metadata to a workspace data table

Allie Hajian
You can import metadata into your workspace data table by either copying from an existing workspace or importing a file. This article walks through both options.

Copying from an existing workspace

  • Go to the workspace you want to import metadata from. Note that you can only import data from workspaces whose Authorization Domain is compatible with the one set on your destination workspace.
  • In the DATA tab, select the tables (participants, samples, pairs, or sets) you want. Importing sets will bring over all the data required for the set. For example, if you import a sample set, the sample and participant data linked to the set will also be copied over.
  • Select "Download tsv" to save the entire table to your local machine.
  • Go to the DATA tab of the workspace where you want the table to be.
  • Click the "+" sign in the left TABLES column and follow the prompts to upload the TSV file from your local machine.

    Import conflicts

    Note that import conflicts can occur if you already have an entity table in your workspace that matches what you are importing. Terra will notify you that the entity already exists in the workspace.

Note about data files

Copying tables (metadata) from another workspace will not import any linked files into your workspace bucket. Instead, the table will reference file paths in the bucket of the workspace you copied from. If that bucket is deleted, the paths in your data model will point to files that no longer exist.

Importing a file

To import metadata corresponding to a particular entity type, upload text files in tab-separated-value (or tab-delimited) format. Use a separate file for each entity type.

The first row of each file must contain the appropriate field names as column headers. For more information on required file formats, see this article.
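As a rough illustration of the expected shape, here is a minimal sketch in Python that writes a sample-table TSV. The column names, sample IDs, and bucket paths are made up for this example; Terra expects the first header to name the entity type (e.g. "entity:sample_id"), and your own table will have different columns.

```python
import csv

# Hypothetical sample table: first header names the entity type,
# remaining headers are attribute columns (all values here are made up).
rows = [
    ["entity:sample_id", "participant", "bam_path"],
    ["sample_001", "participant_001", "gs://my-bucket/sample_001.bam"],
    ["sample_002", "participant_002", "gs://my-bucket/sample_002.bam"],
]

# Write the rows as a tab-delimited file.
with open("samples.tsv", "w", newline="") as f:
    writer = csv.writer(f, delimiter="\t")
    writer.writerows(rows)
```

Each entity type (participants, samples, pairs, sets) would get its own file like this.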

Uploading an array of files or strings

You may have multiple files (or strings of metadata) that belong to one participant, sample, pair, or set. For example, say you have been given genotyping files in VCF format for a collection of samples, with a total of twenty-two files per sample set. Creating a new column in the sample set data table for each file would be time-consuming, and you would also have to launch the analysis in Terra repeatedly to run on each file. To run on every item in the array without manual intervention, you want to build a WDL that takes an array of VCF files as input.

To get the array into your data table, you can write WDL code that outputs a list of file paths or strings as an array. The input is a file containing one file path or string per line. A task in your WDL reads the lines of that file and outputs them as an array; you can then use the method configuration to assign the array to a workspace attribute ("workspace.X") or to an attribute of the participant, sample, pair, or set you are running on ("this.X").

Here are two examples you can alter to fit your use case. In both, the input is a file containing a list of VCF file paths, one per line, in "gs://" format.
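WDL's read_lines() simply splits a file into an array, one element per line. The sketch below shows what such a file of files might contain and the equivalent operation in Python; the bucket paths are made up for illustration.

```python
# A hypothetical file of files: one gs:// VCF path per line (paths are made up).
fof_contents = """gs://my-bucket/chr1.vcf
gs://my-bucket/chr2.vcf
gs://my-bucket/chr3.vcf
"""

with open("file_of_files.txt", "w") as f:
    f.write(fof_contents)

# Rough equivalent of WDL's read_lines(): read the file into an array of strings.
with open("file_of_files.txt") as f:
    array_of_files = f.read().splitlines()

print(array_of_files)
```

In the WDL examples that follow, read_lines() performs this step for you inside the workflow or task.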

Example 1: manipulating the array

The code below leaves the command portion blank so that you can manipulate the array if you desire. This WDL copies your files to the virtual machine the task spins up, which makes sense if you are manipulating the array of files further. The 50 GB disk size accounts for copying those files to the virtual machine; adjust it for your use case.

Example 1’s Method and Method Configuration are published in the Methods Repository.

workflow fof_usage_wf {
  File file_of_files

  call fof_usage_task {
    input:
      fof = file_of_files
  }

  output {
    Array[File] array_output = fof_usage_task.array_of_files
  }
}

task fof_usage_task {
  File fof
  Array[File] my_files = read_lines(fof)

  command {
    # do stuff with the array of files here
    # ...
  }
  runtime {
    docker: "ubuntu:16.04"
    disks: "local-disk 50 HDD"
    memory: "2 GB"
  }
  output {
    Array[File] array_of_files = my_files
  }
}

Example 2: without manipulating the array

workflow fof_usage_wf {
   File file_of_files
   Array[File] array_of_files = read_lines(file_of_files)

   output {
    Array[File] array_output = array_of_files
   }
}

Importing arrays into your data model directly with a TSV is not currently available. We are working on functionality to make this easier to do in the web interface.
