Here you'll learn how to use Blob Storage, Azure's equivalent of a Google Bucket. A Blob is a cloud-based storage location to which you can upload data in one of two ways, depending on the amount of data you're uploading. If you're uploading less than 5 GB at a time, you can use the file manager (option 1). For uploads exceeding 5 GB, you should use the command line (option 2).
What is Blob Storage?
Azure Blob Storage is Microsoft's object storage solution for the cloud. Blob Storage is optimized for storing massive amounts of unstructured data. Unstructured data is data that doesn't adhere to a particular data model or definition, such as text or binary data.
Shared Access Signature (SAS) Tokens are used in Azure to access storage containers’ blobs. The dashboard of your Terra on Azure workspace generates a SAS token that you can use to read from and write to your storage blob (“Storage SAS URL”). The tokens available in your dashboard expire after eight hours. You can also use multiple valid tokens concurrently. You should not share your SAS token with others.
Storage container URL format
The storage SAS URL consists of the storage container URL (in blue) followed by a temporary SAS token, which is appended after the ? symbol (in red).
SAS URL example
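As a minimal sketch of this format, the shell snippet below splits a Storage SAS URL at the ? symbol and then rebuilds it with a blob path inserted before the token. The account name, container name, token value, and blob path are all made-up placeholders, not values from a real workspace.

```shell
# Hypothetical Storage SAS URL; replace it with the one copied from your dashboard.
SAS_URL='https://exampleaccount.blob.core.windows.net/sc-example?sv=2021-12-02&sp=racwdl&sig=EXAMPLE'

# Everything before the first "?" is the storage container URL.
CONTAINER_URL="${SAS_URL%%[?]*}"
# Everything after the first "?" is the temporary SAS token.
SAS_TOKEN="${SAS_URL#*[?]}"

# A path inside the container goes between the container URL and the token.
BLOB_PATH='SRR17259545/SRR17259545_1.fastq.gz'
BLOB_SAS_URL="${CONTAINER_URL}/${BLOB_PATH}?${SAS_TOKEN}"

echo "Container URL: ${CONTAINER_URL}"
echo "Blob URL with token: ${BLOB_SAS_URL}"
```

The `[?]` bracket pattern matches a literal question mark, which avoids the shell treating `?` as a single-character wildcard.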
Option 1: Upload using Terra’s File Manager
The file manager is designed as a user-friendly interface for the convenience of all types of users.
This method supports uploads up to 5 GB per file. If you need to upload a larger file, see AzCopy Command Line Interface or the Microsoft Storage Explorer App.
1. Select the folder icon on the right-hand panel from anywhere in your Terra on Azure workspace.
2. Selecting the folder will open up a new screen where you can upload your files to your workspace’s Azure storage container.
3. Select Upload to choose a file to upload from your local machine.
4. After uploading your file, you should see it appear in the file list.
5. Click the file's link to open a pop-up window that shows additional details (such as the file size and Azure storage location) and provides an option to download the file to your local machine.
Option 2: Upload using AzCopy Command Line Interface (CLI)
Using the AzCopy command line interface (CLI) tool may be less comfortable for those with less experience using terminal commands, but is necessary when uploading more than 5 GB of data at a time.
1. Download AzCopy and set it up on your local machine.
2. Copy files from your local machine to your Azure blob storage using the command line interface (CLI). The destination is your workspace's Storage SAS URL.
You can find this URL in your Workspace Dashboard under Cloud Information (copy the “Storage SAS URL”). Remember that your SAS token expires after 8 hours.
azcopy copy [source] [destination] [flags]
azcopy copy /Users/user/Downloads/SRR17259545/SRR17259545_1.fastq.gz "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"
3. Once your upload is complete, you can check what is available in your storage blob with this command:
azcopy list "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"
What to expect
This outputs the filenames, but does not include the full path URL. For example:
INFO: SRR17259545/SRR17259545_1.fastq.gz; Content Length: 27.67 MiB
To reference this data in an analysis in your workspace, you will need to concatenate the Storage Container URL and the path to your file, without the SAS token.
You can also get the path to the file by going to the Workspace File Manager shown in the section above.
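The concatenation described above could be scripted roughly as follows. This sketch strips the SAS token from a Storage SAS URL, pulls the file path out of one line of `azcopy list` output in the format shown above, and joins the two. The SAS URL is a made-up placeholder, and the parsing assumes the output matches the example line exactly; other AzCopy versions may format it differently.

```shell
# Hypothetical Storage SAS URL; replace it with the one from your dashboard.
SAS_URL='https://exampleaccount.blob.core.windows.net/sc-example?sv=2021-12-02&sig=EXAMPLE'

# One line of azcopy list output, in the format shown above.
LIST_LINE='INFO: SRR17259545/SRR17259545_1.fastq.gz; Content Length: 27.67 MiB'

# Drop the "?" and everything after it to remove the SAS token.
CONTAINER_URL="${SAS_URL%%[?]*}"

# Keep only the text between "INFO: " and "; Content Length".
FILE_PATH=$(printf '%s\n' "$LIST_LINE" | sed -n 's/^INFO: \(.*\); Content Length:.*$/\1/p')

# Token-free URL to use when referencing the file in an analysis.
FILE_URL="${CONTAINER_URL}/${FILE_PATH}"
echo "$FILE_URL"
```

Because the resulting URL carries no token, it does not expire after eight hours the way the Storage SAS URL does.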
Upload with Microsoft Azure Storage Explorer App
1. Download and set up Microsoft Azure Storage Explorer locally (No need to sign in!).
2. Create a workspace on Terra on Azure.
3. Go to the Workspace Dashboard.
4. Click Cloud information (in the right column).
5. Click `copy to clipboard` for “Storage SAS URL”.
- SAS tokens currently expire after 8 hours.
- You can have more than one valid SAS token for a storage blob at a time.
- Don’t share your SAS token with anyone, because anyone who has it can access your storage container!
6. Go to your local Storage Explorer app.
7. Under `Storage accounts`, select `attach a resource`.
8. In the pop-up, select `blob container`.
9. When asked how you will connect, select `Shared access signature URL (SAS)`.
10. In the display name field, enter any name you like.
11. Paste the Storage SAS URL from your Terra on Azure Workspace.
This URL includes:
- Your permissions on the blob (Read, Add, Create, Write, Delete, List)
- The blob storage ID/“resource name”
- SAS token
12. Select “Connect”.
What to expect
A new pop-up should confirm that you successfully added the new connection.
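If you want to check what a Storage SAS URL grants before pasting it in step 11, you can inspect its query parameters. The sketch below assumes the standard Azure SAS parameter names, where `sp` holds the permission letters and `se` holds the expiry time. The URL itself, including its expiry date, is a made-up placeholder.

```shell
# Hypothetical Storage SAS URL; replace it with the one from your dashboard.
SAS_URL='https://exampleaccount.blob.core.windows.net/sc-example?sv=2021-12-02&se=2024-01-01T08%3A00%3A00Z&sr=c&sp=racwdl&sig=EXAMPLE'

# The SAS token is everything after the first "?".
SAS_TOKEN="${SAS_URL#*[?]}"

# Print the value of one query parameter from the token.
get_param() {
  printf '%s\n' "$SAS_TOKEN" | tr '&' '\n' | sed -n "s/^$1=//p"
}

# Permission letters: r=Read, a=Add, c=Create, w=Write, d=Delete, l=List.
PERMISSIONS=$(get_param sp)
# The expiry time is URL-encoded, so turn %3A back into ":".
EXPIRY=$(get_param se | sed 's/%3A/:/g')

echo "Permissions: ${PERMISSIONS}"
echo "Expires: ${EXPIRY}"
```

Avoid printing the `sig` parameter or the full URL in shared logs, since anyone holding the token can access the container until it expires.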
Copy data from a Google bucket to Azure storage container (AzCopy)
Step 1: Set up AzCopy on your local machine
See Microsoft documentation on AzCopy for step-by-step instructions.
Step 2: Set up authentication with Google Cloud:
2.1. Create a Service Account in Google Cloud Console.
2.2. Select the billing project you will use.
2.3. Create a secure key to use for authentication. This key will generally be downloaded locally in JSON format. Don’t share this key with anyone.
2.4. After you have a service account key, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the absolute path of the service account key file (citation).
2.5. In your Terra on Google workspace, use the file management system or data table to find the URL for the data you wish to copy to Azure.
2.6. Select “View this file in the Google Cloud Storage Browser”.
2.7. Click on the file and copy the Authenticated URL. It will start with ‘https://’.
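Step 2.4 above might look like this in a shell session. This is only a sketch: the key file location is a hypothetical placeholder, so substitute the path where your own key was downloaded.

```shell
# Hypothetical location of the downloaded service account key.
KEY_FILE="$HOME/keys/service-account-key.json"

# AzCopy reads the key location from this environment variable.
export GOOGLE_APPLICATION_CREDENTIALS="$KEY_FILE"

# The path should be absolute, so confirm that it starts with "/".
case "$GOOGLE_APPLICATION_CREDENTIALS" in
  /*) echo "Using key path: $GOOGLE_APPLICATION_CREDENTIALS" ;;
  *)  echo "Warning: key path is not absolute" >&2 ;;
esac

# Warn early if the key file is not where the variable points.
if [ ! -f "$GOOGLE_APPLICATION_CREDENTIALS" ]; then
  echo "Note: no key file found at that path yet" >&2
fi
```

The variable only lasts for the current terminal session, so set it in the same session where you later run azcopy.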
Use AzCopy to move data from Google Cloud to Azure:
1. Log into your Terra on Azure workspace where you would like your data to be placed.
2. On the dashboard of this workspace, under “Cloud information” copy the temporary SAS token associated with your workspace’s cloud storage.
3. Now that you can authenticate into both cloud storage containers, open your local terminal and use azcopy to perform the data copy. Be mindful that this will incur egress charges. Azure generally charges $0.08 and Google charges $0.11 per 1 GB of data egress (citation).
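4. Using azcopy, run a command of the following form, where the second URL is your workspace's Storage SAS URL with the destination directory inserted before the SAS token:
azcopy copy 'https://storage.cloud.google.com/<bucket-name>/<directory-name>' 'https://<storage-account-name>.blob.core.windows.net/<container-name>/<directory-name>?<SAS-token>' --recursive=true
Note: You must use a signed URL from Google Cloud (that is, one starting with https://). A gs:// URL will not work.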
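Putting the steps above together, a copy script could look roughly like the sketch below. The bucket, directory, account, container, and token values are hypothetical placeholders. The script only calls azcopy when the source uses https://, azcopy is on your PATH, and GOOGLE_APPLICATION_CREDENTIALS is set; otherwise it prints a message and does nothing.

```shell
# Hypothetical values; replace each with your own.
GCS_URL='https://storage.cloud.google.com/example-bucket/example-directory'
SAS_URL='https://exampleaccount.blob.core.windows.net/sc-example?sv=2021-12-02&sig=EXAMPLE'
DEST_DIR='example-directory'

# Insert the destination directory between the container URL and the SAS token.
DEST_URL="${SAS_URL%%[?]*}/${DEST_DIR}?${SAS_URL#*[?]}"

# A gs:// source will not work, so check the scheme first.
case "$GCS_URL" in
  https://*) SOURCE_OK=yes ;;
  *)         SOURCE_OK=no ;;
esac

if [ "$SOURCE_OK" = yes ] && command -v azcopy >/dev/null 2>&1 \
   && [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
  # --recursive=true copies everything under the source directory.
  azcopy copy "$GCS_URL" "$DEST_URL" --recursive=true
else
  echo "Skipping copy: need an https:// source, azcopy on PATH, and GOOGLE_APPLICATION_CREDENTIALS set"
fi
```

Remember that the SAS token in the destination expires after 8 hours, so a very large copy may need to be restarted with a fresh Storage SAS URL.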