Raw genomics data comes off the sequencer as many reads, typically spread across many files. Since it would be messy and time-consuming to type in the location of every one of these data files as input for a WDL, the input is often a 'list' file. This article is a step-by-step guide to creating a list file of reads for input to a workflow.
What is a list file?
A list file is a plain list of all the data, where each row is a link to an unmapped BAM file in the cloud. For example, it is the expected input to 1_Processing-For-Variant-Discovery.
If you open a list file in a text editor, you see one `gs://` path per line.
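As a rough illustration only, the sketch below writes a small list file and prints it. The bucket name `example-bucket`, the file names, and the output name `example_ubams.list` are all hypothetical; a real list file would contain the paths to your own unmapped BAM files.

```shell
# Hypothetical example: the bucket and file names below are made up.
cat > example_ubams.list <<'EOF'
gs://example-bucket/sample1.unmapped.bam
gs://example-bucket/sample2.unmapped.bam
EOF

# Print the file: one cloud path per line, nothing else.
cat example_ubams.list
```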
Make a list file of data in a Google bucket using gcloud storage
1. Open a terminal configured to run gcloud storage.
For detailed instructions on how to run gcloud storage in your terminal, see Moving data to/from a Google bucket.
2. Output a list of the BAM files (in a Google bucket) to a local file.
To save the list to a file named `ubams.list`, use the following command:
gcloud storage ls gs://your_data_Google_bucket_id/ > ubams.list
Note: You need to replace `your_data_Google_bucket_id` with the path to your workspace Google bucket (or wherever your data are). You can copy your workspace bucket path to your clipboard by clicking the clipboard icon at the far right of your dashboard tab under `Google bucket`.
To save to a different list file name, replace `ubams.list` in the command above with the filename of your choice. Remember to use that filename in the commands below.
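The listing command in step 2 writes every object in the bucket to the list file, which may include folders, index files, or other data. If your bucket holds more than BAM files, one possible approach is to filter the listing before saving it. The helper name `keep_bams` below is hypothetical, and the sketch assumes your unmapped BAM files end in `.bam`.

```shell
# Keep only lines that end in .bam (drops folders, index files, and anything else).
# keep_bams is a hypothetical helper name, not part of gcloud.
keep_bams() {
  grep -E '\.bam$'
}

# Usage, assuming gcloud is configured as in step 1:
#   gcloud storage ls gs://your_data_Google_bucket_id/ | keep_bams > ubams.list
```

The function reads the listing on standard input, so it can sit between the `gcloud storage ls` command and the redirect to your list file.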
3. Copy `ubams.list` to your workspace Google bucket from your local machine.
gcloud storage cp ubams.list gs://your_data_Google_bucket_id/
You can verify that the list file is in your workspace bucket by opening your Google bucket in a browser from the dashboard page (right column).
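If you prefer to check from the terminal instead of the browser, a minimal sketch is to stream the uploaded file back with `gcloud storage cat` and compare it with your local copy. The function name `verify_list_upload` is hypothetical, and the sketch assumes the list file was copied to the top level of the bucket, as in step 3.

```shell
# Compare the uploaded list file with the local copy.
# Prints "match" if they are identical and "mismatch" otherwise.
# verify_list_upload is a hypothetical helper name, not part of gcloud.
verify_list_upload() {
  bucket="$1"                   # e.g. your_data_Google_bucket_id
  list_file="${2:-ubams.list}"  # local list file; defaults to ubams.list
  if gcloud storage cat "gs://${bucket}/${list_file}" | cmp -s - "${list_file}"; then
    echo "match"
  else
    echo "mismatch"
  fi
}

# Usage:
#   verify_list_upload your_data_Google_bucket_id ubams.list
```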