Raw genomics data comes off the sequencer as many reads, spread across many data files. Since it would be messy and time-consuming to type in the location of every one of these files as input for a WDL, the input is often a 'list' file. This article is a step-by-step guide to creating a list file of reads to use as input to a workflow.
What is a list file?
A list file is just a list of all the data, where each row is a link to an unmapped BAM file in the cloud. It is the expected input to 1_Processing-For-Variant-Discovery, for example.
If you open a list file in a text editor, you will see one row per data file, each row holding the cloud path (gs://...) to a single unmapped BAM file.
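As an illustration only, the sketch below writes a three-row list file and prints it. The file name `example_ubams.list` and the sample file names are placeholders assumed for this example, not real data.

```shell
# Write a small example list file; every path below is a hypothetical placeholder.
cat > example_ubams.list <<'EOF'
gs://your_data_Google_bucket_id/sample_A.unmapped.bam
gs://your_data_Google_bucket_id/sample_B.unmapped.bam
gs://your_data_Google_bucket_id/sample_C.unmapped.bam
EOF

# Print the file: one gs:// path per row, nothing else.
cat example_ubams.list
```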
Make a list file of data in your Google bucket using gsutil
1. Open a terminal configured to run gsutil.
For detailed instructions on how to run gsutil in your terminal, see Moving data to/from a Google bucket.
2. Output a list of the BAM files (in a Google bucket) to a local file.
To save the listing to a file named `ubams.list`, use the following command:
gsutil ls gs://your_data_Google_bucket_id/ > ubams.list
Replace `your_data_Google_bucket_id` with the path to your workspace Google bucket (or wherever your data are). You can copy your workspace bucket path to your clipboard by clicking the clipboard icon at the far right of your dashboard tab under `Google bucket`.
To save to a different list file name, replace `ubams.list` in the command above with the filename of your choice. Just remember to use that filename in the commands below.
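A plain `gsutil ls` of a bucket returns everything at that level, which can include subfolders and, if the command is re-run after step 3, the list file itself. The sketch below shows one way to keep only BAM paths. The helper name `keep_bams` is hypothetical, and the filter assumes your unmapped BAM files end in `.bam`.

```shell
# keep_bams (hypothetical helper): read a listing on stdin and
# print only the lines that end in ".bam".
keep_bams() {
  grep '\.bam$'
}

# Intended use, once gsutil is configured and the bucket is readable:
#   gsutil ls gs://your_data_Google_bucket_id/ | keep_bams > ubams.list
```

Piping the listing through the filter, rather than redirecting it straight to the file, leaves the remaining steps unchanged.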
3. Copy ubams.list to your workspace Google bucket from your local machine.
gsutil cp ubams.list gs://your_data_Google_bucket_id/
You can verify that the list file is in your workspace bucket by opening your Google bucket in a browser from the dashboard page (right column).
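Because the workflow expects every row to be a path to a file in the cloud, it can also help to check the contents of the list itself. The sketch below is a minimal check: the function name `check_list` is hypothetical, and it assumes that a valid row is simply any line beginning with `gs://`.

```shell
# check_list FILE (hypothetical helper): succeed only if FILE is non-empty
# and every line in it starts with gs://
check_list() {
  list_file="$1"

  # Fail if the file is missing or has no content.
  if [ ! -s "$list_file" ]; then
    echo "empty or missing: $list_file" >&2
    return 1
  fi

  # grep -v prints lines that do NOT start with gs:// (blank lines included)
  # and exits 0 when it finds any, which here means the list is invalid.
  if grep -v '^gs://' "$list_file" >&2; then
    echo "the lines above are not gs:// paths" >&2
    return 1
  fi

  # grep -c '' counts every line in the file.
  echo "$(grep -c '' "$list_file") entries look OK"
}
```

Running `check_list ubams.list` before step 3 catches blank or local-path rows early. To confirm the upload from the terminal instead of the browser, `gsutil ls gs://your_data_Google_bucket_id/ubams.list` should print the file's path if it is in the bucket.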