How to add an array of files to a data table

Allie Cliffe

Explore options for creating arrays of files or strings in a table cell. 

When to use arrays as attributes in table cells

You may want to use an array of files in a data table cell if you have multiple data files of the same type that belong to one attribute of a single entity. Arrays are especially useful if your workflow takes an array as input.

That's a mouthful! For example, if you have genotyping files in VCF format for a collection of samples, each sample might have a total of twenty-two VCF files, one for each chromosome. Your workflow may take in an array (all twenty-two files) as input and generate a single output file for each sample.

Example: Sample with multiple files (an array) as input to a workflow

diagram of popup to add arrays as inputs in Terra data page with list type attributes

You don’t want a separate column in the sample data table for each file: that's time-consuming to set up, and it requires launching the workflow repeatedly, once for each file.

Read on to learn how to set up your sample table to include an array. 

Option 1: Create an array in Terra (small numbers of files)

If you only have a small number of input files for a small number of samples, you can create an array in the sample table in Terra using the "list" attribute type. 

1. Upload your entity table, or copy it from another workspace.

2. Add an attribute column with a single file (as a placeholder).

3. Click on the pencil icon to edit the cell that will contain the array of input files.

4. Select type string and check Value is a list.
[Screenshot: Edit attribute dialog, adding a list of input files]

5. Add all files in the array, one at a time, using Add item.

6. When done, select Save changes.

Option 2: Upload a TSV file with arrays

Follow the directions here to generate a sample.tsv file. 

Array formatting requirements

The array in your spreadsheet must have the format:

["gs://file-directory/file1-name","gs://file-directory/file2-name","etc."]
  • Array values must be between []
  • Each file URL must be in double quotes
  • File URLs must be separated by a comma
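
The three rules above describe a compact JSON array, so a cell value can be produced with a JSON serializer rather than typed by hand. The following is a minimal Python sketch, not an official Terra tool; the function name format_array_cell and the file names are hypothetical.

```python
import json


def format_array_cell(urls):
    """Return a list of URLs as one bracketed, double-quoted, comma-separated string."""
    # json.dumps puts a space after each comma by default; compact separators
    # produce the same spacing as the format shown above.
    return json.dumps(list(urls), separators=(",", ":"))


# Hypothetical file names, for illustration only.
cell = format_array_cell([
    "gs://file-directory/sample1.chr1.vcf",
    "gs://file-directory/sample1.chr2.vcf",
])
print(cell)
# ["gs://file-directory/sample1.chr1.vcf","gs://file-directory/sample1.chr2.vcf"]
```

Parsing the string back with json.loads is a quick way to check that a hand-edited cell still follows the rules.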

Upload the TSV file by clicking the Import data button at the top left of the Data page and choosing Upload TSV from the menu. 
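
If you have many samples, you may prefer to generate the TSV with a script before uploading it. The sketch below is one possible approach in Python. It assumes the first column header follows the "entity:sample_id" convention for a sample table. The column name vcf_files, the sample ID, and the file paths are hypothetical. Lines are written directly rather than through a CSV library, so the double quotes inside each array are left untouched.

```python
import json


def write_sample_tsv(path, column, samples):
    """Write a tab-separated table with one array-valued column.

    samples maps each sample ID to the list of file URLs for that sample.
    """
    with open(path, "w", newline="") as handle:
        # Assumption: "entity:sample_id" is the header of the ID column.
        handle.write("entity:sample_id\t" + column + "\n")
        for sample_id, urls in samples.items():
            # Compact JSON gives ["url1","url2"] with no spaces after commas.
            cell = json.dumps(list(urls), separators=(",", ":"))
            handle.write(sample_id + "\t" + cell + "\n")


if __name__ == "__main__":
    # Hypothetical sample and file names, for illustration only.
    write_sample_tsv("sample.tsv", "vcf_files", {
        "sample1": [
            "gs://file-directory/sample1.chr1.vcf",
            "gs://file-directory/sample1.chr2.vcf",
        ],
    })
```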

Option 3: Import an array with a WDL

To process every file without manual intervention, you need 1) a WDL that takes an array of VCF files as input and 2) a way to get that array of files into your data table.

To get the array into your data table, you can write a WDL that reads a file of file paths or strings and outputs the contents as an array. The input is a file that lists the file paths or strings, one per line. A task in your WDL reads the lines of the file and outputs them as an array. You can then use the method configuration to assign that output to a workspace attribute (“workspace.X”) or to an attribute of the participant, sample, pair, or set that you are running on (“this.X”).

For the VCF example above, the input would be a file that lists the VCF file paths, one per line, in “gs://” format.
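
A file like this can be written by hand or with a short script. Here is a minimal Python sketch that builds the twenty-two per-chromosome paths from the VCF example and writes them one per line. The function name write_file_of_files, the output file name, and the path naming scheme are hypothetical.

```python
def write_file_of_files(path, urls):
    """Write one gs:// path per line."""
    urls = list(urls)
    for url in urls:
        # Fail early on anything that is not a gs:// path.
        if not url.startswith("gs://"):
            raise ValueError("expected a gs:// path, got: " + url)
    with open(path, "w", newline="") as handle:
        for url in urls:
            handle.write(url + "\n")


# Hypothetical naming scheme: one VCF per chromosome, 1 through 22.
vcf_paths = [
    "gs://file-directory/sample1.chr%d.vcf" % n for n in range(1, 23)
]

if __name__ == "__main__":
    write_file_of_files("sample1_vcfs.txt", vcf_paths)
```

The resulting text file would then need to be stored somewhere the workflow can read it, such as a cloud bucket, before it is used as the workflow input.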

Example 1: Manipulating the array with a WDL

The code below leaves the command section blank so that you can add your own commands to manipulate the array. This WDL copies your files to the virtual machine that the task spins up, which makes sense if you are manipulating the files further. The 50 GB disk size accounts for copying those files to the virtual machine; change it to suit your use case.

Example 1’s WDL and configuration (JSON) are published in the Methods Repository.

workflow fof_usage_wf {
  # Text file that lists one file path per line
  File file_of_files

  call fof_usage_task {
    input:
      fof = file_of_files
  }

  output {
    Array[File] array_output = fof_usage_task.array_of_files
  }
}

task fof_usage_task {
  File fof
  # read_lines turns each line of fof into one element of the array
  Array[File] my_files = read_lines(fof)

  command {
    # do stuff with arrays below
    # ....
  }

  runtime {
    docker: "ubuntu:16.04"
    # Disk sized to hold the files copied to the virtual machine
    disks: "local-disk 50 HDD"
    memory: "2 GB"
  }

  output {
    Array[File] array_of_files = my_files
  }
}

Example 2: Without manipulating the array

workflow fof_usage_wf {
  File file_of_files
  Array[File] array_of_files = read_lines(file_of_files)

  output {
    Array[File] array_output = array_of_files
  }
}
