I decided to finally give cloud computing a try using the $300 "trial use" credit that Google offers for new users. I wanted to see how far I could get with the GATK4 joint genotyping workflow. This is a summary of what I did (all workflows run via the Terra GUI):
1. Created a Google storage bucket.
2. Uploaded 469 BAM files (totaling ~3 TB) and my reference genome (mosquito) to the bucket.
3. Launched a few small tests of HaplotypeCaller on some tiny test intervals of the genome.
4. Ran HaplotypeCaller successfully on one BAM file (completed in ~14 hours).
5. Ran HaplotypeCaller successfully on a batch of 10 more BAM files (completed in <24 hours).
6. Downloaded the resulting g.vcf.gz files from the Google bucket to my local file system.
Now I find that I have used $263 of my $300 credit. That rather surprised me. My question is: did I do something wrong, or is that really how much cloud computing costs? Since my ultimate goal was to do joint genotyping on the full set of 469 genomes, this doesn't scale very well for me. Extrapolating linearly, the full analysis would end up costing me over $10,000. Comments? Suggestions?
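For reference, here is a rough sketch of the linear extrapolation behind that figure. It assumes the entire $263 is attributable to the 11 completed HaplotypeCaller runs, which likely overstates the per-sample compute cost, since the same credit also paid for the small interval tests, bucket storage, and the download of the g.vcf.gz files. The function name `extrapolate_cost` is only illustrative.

```python
def extrapolate_cost(spent_usd, samples_done, samples_total):
    """Linearly scale an observed spend to a larger number of samples.

    Assumption: all of spent_usd is attributed to the completed samples,
    ignoring storage, download (egress), and test-run charges.
    """
    per_sample = spent_usd / samples_done
    return per_sample, per_sample * samples_total


# Figures from the post: $263 spent, 1 + 10 = 11 BAM files processed, 469 in total.
per_sample, total = extrapolate_cost(263.0, 11, 469)
print(f"~${per_sample:.2f} per sample, ~${total:,.0f} for all 469 samples")
```

This only covers the per-sample HaplotypeCaller step as billed so far; it says nothing about the cost of the joint genotyping step itself, which has not been run yet.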