Support Regional workspace buckets Completed

Post author
Matt Bookman

This is related to https://broadinstitute.zendesk.com/hc/en-us/community/posts/360041486831-Specify-your-region-for-compute-storage, but is a more targeted request.

It would be great to have support for Regional workspace buckets. Workspace buckets today are strictly US Multi-regional.

For life sciences projects, we recommend that people store large data (such as WGS) in GCS Regional buckets (and by convention use us-central1). This drops storage costs significantly compared with Multi-Regional buckets, especially when the data is large.

The per-month cost of storing a GB of data is $0.026 in multi-regional and approximately $0.02 in regional. So if you are storing 100 TB, you pay either $2,662.40 or $2,007.04 per month (per the Google Cloud Pricing Calculator). 100 TB is easy to accumulate with genomic data, for example with intermediate files from running the 5 dollar genome and the joint genotyping workflows.
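The arithmetic behind these figures can be sketched roughly as follows. This assumes 1 TB = 1,024 GB, which is what reproduces the $2,662.40 total; the function name is only illustrative. The quoted regional total of $2,007.04 works out to an effective rate of $0.0196 per GB, so the $0.02 rate appears to be rounded. That effective rate is inferred from the totals here, not taken from a price list.

```python
GB_PER_TB = 1024  # assumption: binary terabytes, which reproduces the $2,662.40 figure


def monthly_storage_cost(terabytes: float, usd_per_gb_month: float) -> float:
    """Return the monthly storage cost in USD for a given volume and per-GB rate."""
    return terabytes * GB_PER_TB * usd_per_gb_month


multi_regional = monthly_storage_cost(100, 0.026)
regional_at_quoted_rate = monthly_storage_cost(100, 0.02)

# Effective per-GB rate implied by the quoted regional total of $2,007.04.
implied_regional_rate = 2007.04 / (100 * GB_PER_TB)

print(f"Multi-regional:           ${multi_regional:,.2f}")
print(f"Regional at $0.02/GB:     ${regional_at_quoted_rate:,.2f}")
print(f"Implied regional rate:    ${implied_regional_rate:.4f}/GB")
```

At a flat $0.02 per GB the regional total would be $2,048.00, so either way the difference is in the range of $600 to $650 per month for 100 TB.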

One more thing to note is that if you have your own Regional Cloud Storage bucket, but use Terra for data processing, you'll pay to move data (egress charges of $0.01/GB) from the Multi-regional workspace bucket to your Regional bucket. Extending the 100 TB example above, you'll pay ~$1,000 to move your data, but then immediately start saving ~$650 per month.
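As a rough sketch of the break-even point, using the egress rate and monthly totals quoted above and again assuming 1 TB = 1,024 GB (the function name is illustrative):

```python
GB_PER_TB = 1024  # assumption: binary terabytes, consistent with the quoted monthly totals


def breakeven_months(terabytes: float, egress_usd_per_gb: float, monthly_saving_usd: float):
    """Return (one-time egress cost, months until the savings cover that cost)."""
    one_time_cost = terabytes * GB_PER_TB * egress_usd_per_gb
    return one_time_cost, one_time_cost / monthly_saving_usd


# Monthly saving is the difference between the two quoted storage totals.
monthly_saving = 2662.40 - 2007.04
egress_cost, months = breakeven_months(100, 0.01, monthly_saving)

print(f"One-time egress cost: ${egress_cost:,.2f}")
print(f"Monthly saving:       ${monthly_saving:,.2f}")
print(f"Break-even after:     {months:.2f} months")
```

With these inputs the one-time move costs about $1,024 and is paid back in under two months of reduced storage charges.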

If Terra supported Regional workspace buckets, then you could move data from your workspace bucket to your "permanent" bucket without charges (assuming they are in the same region).

 

Note that if your data and compute are both in the same region, such as us-central1, there is no practical loss of functionality versus multi-regional storage. These are the use cases documented by Google Cloud:

Multi-Regional Storage

Storing data that is frequently accessed ("hot" objects) around the world, such as serving website content, streaming videos, or gaming and mobile applications.

Regional

Storing frequently accessed data in the same region as your Google Cloud DataProc or Google Compute Engine instances that use it, such as for data analytics.

Most life sciences data is the latter.

 

Thanks!
