Learn how to share data with a large group while avoiding data transfer out (formerly "egress") charges.
Source material for this article was contributed by Willy Nojopranoto and the Verily Life Sciences solutions team as part of the design and engineering rollout of Terra support for data regionality.
Overview: How to share your data without transfer charges
Making a Google Cloud Storage bucket public or sharing it with a group is an easy way to increase your data's impact. Unfortunately, if other users copy that data out of the bucket's Cloud Storage region, you may have to pay a data transfer out network charge.
Enabling Requester Pays on the bucket protects you from these charges by requiring the end user to pay instead. However, this is not always the ideal solution because end users may not be aware of data transfer charges until after they've incurred them.
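For reference, Requester Pays is enabled per bucket, and requesters must then name a billing project on every request. A minimal sketch (the bucket and project names here are placeholders):
# Owner: enable Requester Pays on the bucket.
$ gsutil requesterpays set on gs://example-shared-bucket
# End user: every request must now name a project to bill (the -u flag).
$ gsutil -u my-billing-project cp gs://example-shared-bucket/data.csv .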
Fortunately, it is possible to avoid data transfer charges altogether by using Google Cloud's VPC Service Controls. This document provides instructions on how to create a service perimeter around the cloud project that contains your bucket.
Service perimeters may not work with Requester Pays buckets
There is a limitation on service perimeters when using the Requester Pays feature: when a Requester Pays bucket is inside a service perimeter that protects the Cloud Storage service, you cannot identify a project to pay that is outside the perimeter. The project to be billed must be in the same perimeter as the storage bucket or in a perimeter bridge with the bucket's project.
How to configure the VPC
Before you begin
Creating the Access Level and Perimeter requires an access policy for your organization. Organizations can have only one access policy; if you attempt to create an access policy and one already exists, you'll receive an error. If your organization doesn't have a policy yet, create one as shown below.
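A command along these lines should create the policy (the title is an arbitrary placeholder):
$ gcloud access-context-manager policies create \
    --organization=organizations/<ORGANIZATION_ID> \
    --title="default_policy"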
Recommended environment variables for the configuration process
$ export PROJECT_NUMBER=<The project number>
$ export PROJECT_ID=<The project ID>
$ export ORGANIZATION_ID=<The organization ID>
$ export POLICY_ID=<The project access policy ID>
$ export PROJECT_ADMIN_EMAIL=<Project administrator email>
# You can retrieve your ORGANIZATION_ID with this command:
$ curl -X POST \
    -H "Authorization: Bearer $(gcloud auth application-default print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    "https://cloudresourcemanager.googleapis.com/v1/projects/${PROJECT_NUMBER}:getAncestry"
# This will return:
#{
# "ancestor": [
# {
# "resourceId": {
# "type": "project",
# "id": <PROJECT_ID>
# }
# },
# {
# "resourceId": {
# "type": "organization",
# "id": <ORGANIZATION_ID>
# }
# }
# ]
#}
# You can retrieve your POLICY_ID with this command:
$ gcloud access-context-manager policies list \
--organization=${ORGANIZATION_ID}
# This will return:
# NAME ORGANIZATION TITLE ETAG
# <POLICY_ID> <ORGANIZATION_ID> <POLICY_TITLE> <POLICY_ETAG>
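If you know the project ID but not the project number, gcloud can look it up:
$ gcloud projects describe ${PROJECT_ID} --format="value(projectNumber)"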
Step 1: Create an Access Level
First, you'll create an Access Level that allows access from the IP ranges of VMs in us-central1. The IP ranges are publicly available from https://www.gstatic.com/ipranges/cloud.json.
Note: Restricting access to only these IP ranges will block the use of the Cloud Console to view the bucket. To continue using the Cloud Console, we also grant our individual account (PROJECT_ADMIN_EMAIL) access.
Create a file named us_central.yaml that contains the following.
$ head us_central.yaml
- members:
  - user:${PROJECT_ADMIN_EMAIL}
- ipSubnetworks:
  - 8.34.210.0/24
  - 8.34.212.0/22
  - 8.34.216.0/22
  - 8.35.192.0/21
<snip>
You can get the full list of us-central1 IP ranges with something like this (the // empty filter skips entries that only list an IPv6 prefix):
$ curl -s https://www.gstatic.com/ipranges/cloud.json | \
    jq -r '.prefixes[] | select(.scope == "us-central1") | .ipv4Prefix // empty'
Or if you prefer to use Python instead of jq, use the code below.
$ curl -s https://www.gstatic.com/ipranges/cloud.json | \
python3 -c '
import sys, json

# Print the IPv4 prefix of every us-central1 entry; skip IPv6-only entries.
for p in json.load(sys.stdin)["prefixes"]:
    if p["scope"] == "us-central1" and "ipv4Prefix" in p:
        print(p["ipv4Prefix"])
'
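Either way, you can assemble the full us_central.yaml in one pass. A sketch using the jq variant; generating the file from the shell also expands ${PROJECT_ADMIN_EMAIL}, which gcloud would otherwise read literally:
$ {
    echo "- members:"
    echo "  - user:${PROJECT_ADMIN_EMAIL}"
    echo "- ipSubnetworks:"
    curl -s https://www.gstatic.com/ipranges/cloud.json | \
      jq -r '.prefixes[] | select(.scope == "us-central1") | .ipv4Prefix // empty' | \
      sed "s/^/  - /"
  } > us_central.yaml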
Finally, use gcloud to create the access level.
$ gcloud access-context-manager levels create us_central1_only \
--title=us_central1_only \
--basic-level-spec=us_central.yaml \
--policy=${POLICY_ID} \
--combine-function="or"
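To confirm the access level was created as expected, you can describe it and check that the title and basic-level conditions are echoed back:
$ gcloud access-context-manager levels describe us_central1_only \
    --policy=${POLICY_ID}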
Step 2: Create a Perimeter
Next, create a perimeter that uses the above access level. This perimeter will be placed around your project (test-project in the example below) and enforced on the Google Cloud Storage service.
$ gcloud access-context-manager perimeters create new_perimeter \
--title=new_perimeter \
--resources=projects/${PROJECT_NUMBER} \
--access-levels=us_central1_only \
--restricted-services=storage.googleapis.com \
--policy=${POLICY_ID}
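As a quick sanity check, describe the new perimeter and verify the resources, access levels, and restricted services:
$ gcloud access-context-manager perimeters describe new_perimeter \
    --policy=${POLICY_ID}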
Example: GCS Configuration
The following example demonstrates the configuration for Google Cloud Storage. For concreteness, this example is for data stored in the us-central1 region. VPC Service Controls are added to prevent data transfer outside of this region.
Example Overview
In this example, we have an organization named testorg.net. In it, there is a project named test-project.
When you put a project into a service perimeter, you can restrict the usage of Google Cloud services such as Cloud Storage, which prevents data in Cloud Storage from leaving the perimeter. However, we also apply an Access Level, which grants specific access to services inside the perimeter. The Access Level created in this example allows inbound (data transfer in) requests from specific IP ranges. We don't specify any data transfer out rules, so only the virtual machines (VMs) allowed in through the access level can download the Cloud Storage data.
Google Cloud Storage resources
In test-project, there is a bucket named test-data-bucket. Our goal in this example is to create a perimeter that restricts access to test-data-bucket to only VMs in us-central1.
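The bucket setup itself isn't shown in the original example; a sketch of how such a bucket might be created and shared publicly:
# Create a regional bucket in us-central1 inside test-project.
$ gsutil mb -p test-project -l us-central1 gs://test-data-bucket/
# Grant public read access to the objects; the perimeter still limits
# where they can actually be read from.
$ gsutil iam ch allUsers:objectViewer gs://test-data-bucket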
Test examples
These tests are run from VMs (and a workstation) outside of the organization.
From a us-central1 VM (success)
willyn@willyn-test:~$ curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H "Metadata-Flavor: Google"
projects/426023965843/zones/us-central1-a
willyn@willyn-test:~$ gsutil cp gs://test-data-bucket/test.log .
Copying gs://test-data-bucket/test.log...
/ [1 files][534.3 KiB/534.3 KiB]
Operation completed over 1 objects/534.3 KiB.
From a European VM (fail)
willyn@europe-west2-london-instance:~$ curl http://metadata.google.internal/computeMetadata/v1/instance/zone -H "Metadata-Flavor: Google"
projects/426023965843/zones/europe-west2-c
willyn@europe-west2-london-instance:~$ gsutil cp gs://test-data-bucket/test.log .
AccessDeniedException: 403 Request is prohibited by organization's policy. vpcServiceControlsUniqueIdentifier: 2n91jQ3T3Rh1jjZe4GWlMHJdNPB0QMg8fi14q44_v5OZut6mkRnFeQ
From a workstation when NOT logged in as PROJECT_ADMIN_EMAIL (fail)
$ gsutil cp gs://test-data-bucket/test.log .
AccessDeniedException: 403 Request is prohibited by organization's policy.
vpcServiceControlsUniqueIdentifier: 2n91jQ3T3Rh1jjZe4GWlMHJdNPB0QMg8fi14q44_v5OZut6mkRnFeQ
Notes & Caveats
VPC service perimeters are available only to projects that belong to a Cloud Organization. See Google's documentation on Creating and managing organizations to learn more.
Management of VPC service perimeters requires organization-level permissions. If you don't have permissions at this level, consult with your organization's IT administrators to set up VPC service perimeters around a dedicated data-sharing project and work with them to configure it.
Putting a project in a service perimeter places all Cloud Storage buckets and Artifact Registry repositories in that project inside the perimeter. Thus, you should create a dedicated project (without other Cloud services enabled) for buckets and repositories in the same location with the same restrictions.
The configuration settings in this article restrict direct copies from bucket to bucket, even if the destination bucket is in the same region. The same applies to Artifact Registry: to copy an image from one repository to another in the above example, you can pull the image to a VM in us-central1 and then push it to any target repository to which you have access.
Note that the Storage Transfer Service API itself is not restricted, because the Storage Transfer Service ultimately calls Cloud Storage APIs, and those calls are checked against the perimeter as usual.
It is also possible to configure Artifact Registry to prevent data transfer charges. The process is similar to the steps outlined in this article, but specific to Artifact Registry access.
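For example, extending the perimeter from Step 2 to also restrict Artifact Registry might look like this (a sketch, assuming the same perimeter and policy):
$ gcloud access-context-manager perimeters update new_perimeter \
    --add-restricted-services=artifactregistry.googleapis.com \
    --policy=${POLICY_ID}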