Custom Machine type does not exist
We are getting the following error in the workflow with ID: 38793b59-f230-44f3-8b7c-29199d7c69b5.
message: Task workflowBiobakery.strainphlanTree:7:3 failed. The job was stopped before the command finished. PAPI error code 3. Execution failed: creating instance: inserting instance: Invalid value for field 'resource.machineType': 'zones/us-central1-f/machineTypes/custom-102-675840'. Custom Machine type with name 'custom-102-675840' does not exist.
What's the solution?
Damian
Comments
8 comments
Hmm, interesting. The task requests cpu: 32 and memory: "660GB". What could be causing the glitch?
Damian
Can you share the workflow or workspace? It might be helpful to see what is happening.
See here: https://portal.firecloud.org/#workspaces/rjxmicrobiome/rjxmicrobiome/monitor/1ee8c41a-b787-4881-84c6-ddfe15b4130e
Thank you!
Damian,
From the Google Custom VM documentation, I found that the reason you are seeing the custom machine error is that the memory you allocated, 660GB, is not a multiple of 256MB.
Here is the document that describes all the limitations/rules for custom machine types:
https://cloud.google.com/compute/docs/instances/creating-instance-with-custom-machine-type#create
Hi Sushma,
The 660GB that we requested is an exact multiple of 256MB. My understanding is that Cromwell / Firecloud should be able to adjust the requested machine settings to the closest and cheapest available machine type.
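For reference, the machine type name in the error can be decoded to check this arithmetic. The snippet below is only a sketch: the helper name `parse_custom_machine_type` is made up for illustration, and the 6.5 GB-per-vCPU ceiling and the even-vCPU rule are my assumptions about the custom machine limits, not something confirmed in this thread. If those assumptions hold, they might explain why a request for 32 CPUs turned into 102.

```python
import math
import re


def parse_custom_machine_type(name):
    """Split 'custom-<vCPUs>-<memory in MB>' into (vcpus, memory_mb).

    Hypothetical helper for illustration only.
    """
    match = re.fullmatch(r"custom-(\d+)-(\d+)", name)
    if match is None:
        raise ValueError(f"not a custom machine type name: {name!r}")
    return int(match.group(1)), int(match.group(2))


# Name taken from the error message in the original post.
vcpus, memory_mb = parse_custom_machine_type("custom-102-675840")

# Memory in GB, treating 1 GB as 1024 MB.
memory_gb = memory_mb / 1024

# The rule discussed above: memory must be a multiple of 256 MB.
is_multiple_of_256 = memory_mb % 256 == 0

# ASSUMPTION: at most 6.5 GB of memory per vCPU, and an even vCPU count.
MAX_GB_PER_VCPU = 6.5
min_vcpus = math.ceil(memory_gb / MAX_GB_PER_VCPU)
if min_vcpus > 1 and min_vcpus % 2 == 1:
    min_vcpus += 1  # round up to the next even count

print(f"vCPUs requested from Google: {vcpus}")
print(f"memory: {memory_mb} MB = {memory_gb:g} GB")
print(f"multiple of 256 MB: {is_multiple_of_256}")
print(f"minimum vCPUs for this memory under the assumed limit: {min_vcpus}")
```

Under these assumptions the memory passes the 256MB check, so the failure would have to come from something else, such as the vCPU count needed to carry that much memory.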
The error we observed is disturbing because we asked for X GB of memory and were offered X TB, which could have a dramatic impact on cost. We don't have the capacity to monitor the thousands to tens of thousands of instances that we often start in Firecloud to check whether the allocated resources are within an acceptable range of the requested resources.
Could someone investigate what happened?
Damian
Damian,
We were able to see in the operations log for one of the failed calls that us-central1-f was being used for the VM. That zone does not seem to be able to support 102 CPUs, whereas us-central1-c and us-central1-b seem to have the capacity to support your request.
Here is a screenshot of the operations log which shows that it is looking in zones/us-central1-f:
This could be why the machine type cannot be created (or does not "exist"). Also, a quick note about the TB: we were mistaken about that, and I apologize for the confusion.
Sushma
Hi Sushma,
Thank you for investigating this further. It also makes me less anxious to know that the TB diagnosis was a mistake :)
I checked our task specification and it doesn't list any zones. Can Cromwell not figure out by itself which zones to choose when requesting machines?
Damian
Damian,
Cromwell does choose defaults, and I believe it defaults to us-central1-b according to the Cromwell documentation. I think the log shows some backup zones that are cycled through; perhaps the first two zones were unavailable and it went to the third, where it failed because that zone specifically did not have the custom machine capability. It might be worth explicitly setting a zone that has the resources you need for the custom machine listed above.
Here is a link to regions and zones: https://cloud.google.com/compute/docs/regions-zones/
It looks like us-central1-a, us-central1-b, and us-central1-c might be the ones to try.
Sushma