This may be related to the earlier bug where tasks were hanging during localization of BAMs. In my case the tasks are not hanging but reporting a failure. The failure is intermittent, but when a job is scattered and every shard takes the same BAM as input, there is a high probability that at least one shard will fail. For example, when scattering mutect1 over 10 VMs, the first time I launched the workflow I got 2 failures. I then relaunched and got 3 failures. I wanted to run the workflow WITHOUT call caching in order to get an estimate of the cost of the entire workflow. I find I cannot do that, because the only way to get through the workflow successfully is to run it repeatedly with call caching.
The BAMs are whole exomes, but still large, between 30 and 50 GB.
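To put a rough number on why a single run without call caching is so unlikely to complete, here is a back-of-the-envelope sketch in Python. It uses only the two launches described above (2 and 3 failures out of 10 shards each). It assumes that shards fail independently and with the same probability, which is my assumption and may not hold if the failures are caused by contention on the shared BAM. The function names are made up for illustration.

```python
# Observed counts from the two launches described above.
shards_per_run = 10
failures = [2, 3]

def estimate_failure_rate(failures, shards_per_run):
    """Pooled per-shard failure rate across all launches."""
    total_shards = shards_per_run * len(failures)
    return sum(failures) / total_shards

def prob_all_succeed(p_fail, n_shards):
    """Chance that every shard succeeds, ASSUMING independent failures."""
    return (1 - p_fail) ** n_shards

p_fail = estimate_failure_rate(failures, shards_per_run)
p_clean = prob_all_succeed(p_fail, shards_per_run)

print(f"estimated per-shard failure rate: {p_fail:.2f}")
print(f"chance a 10-way scatter finishes with no failures: {p_clean:.3f}")
```

Under those assumptions the per-shard failure rate comes out to 0.25 and the chance of a clean 10-way scatter is roughly 6%, so most single-pass runs would be expected to fail somewhere. With only two launches to go on, this is a very coarse estimate.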