Learn how to use Blob storage, Azure's equivalent of a Google Bucket. A Blob storage container is a cloud-based storage location to which you can upload unstructured data in one of three ways: 1) using Terra's file manager (uploading less than 5 GB at a time), 2) using the Microsoft Azure Storage Explorer app, or 3) using the AzCopy command line tool (for uploads exceeding 5 GB).
What is Blob Storage?
Azure Blob (Binary Large OBject) Storage is Microsoft's cloud-based object storage solution. Blob Storage is optimized for storing massive amounts of unstructured data (data that doesn't adhere to a particular data model or definition, such as text or binary data).
Blob Storage access
You'll use Shared Access Signature (SAS) tokens to access the Blobs in your storage container. The dashboard of your Terra on Azure workspace generates a SAS token (in the form of a Storage SAS URL) that you can use to read from and write to your storage container.
SAS token caveats
- The tokens available in your dashboard expire after eight hours.
- You can have more than one valid token at a time and use them concurrently.
- You should not share your SAS token with others.
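Because tokens expire, it can help to check a token before starting a long transfer. The following is a minimal sketch, assuming the Storage SAS URL carries the standard se (signed expiry) parameter in its query string. The URL shown is a placeholder, and sas_expiry is a hypothetical helper name, not part of Terra or AzCopy.

```shell
# Print the expiry timestamp carried in a Storage SAS URL.
# $1: Storage SAS URL
sas_expiry() {
  # Keep the query string, put each parameter on its own line,
  # pick out se=..., and decode any URL-encoded colons (%3A).
  printf '%s\n' "${1#*[?]}" | tr '&' '\n' | sed -n 's/^se=//p' | sed 's/%3[Aa]/:/g'
}

# Placeholder URL: the account, container, and signature are not real.
example_url='https://myaccount.blob.core.windows.net/sc-1234?sv=2021-12-02&se=2024-01-01T08%3A00%3A00Z&sp=racwdl&sig=PLACEHOLDER'

echo "Token expires (UTC): $(sas_expiry "$example_url")"
echo "Current time (UTC):  $(date -u +%Y-%m-%dT%H:%M:%SZ)"
```

If the expiry time has passed, copy a fresh Storage SAS URL from the workspace dashboard.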
Storage container URL format
The storage SAS URL consists of the storage container URL (shown in blue) followed by a temporary SAS token (everything after the ? symbol, shown in red).
SAS URL example
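To make the two parts concrete, the sketch below splits a Storage SAS URL at the first ? symbol. The URL is a made-up placeholder (the account name, container name, and signature are assumptions for illustration only); substitute the URL copied from your own dashboard.

```shell
# Hypothetical Storage SAS URL; every value in it is a placeholder.
sas_url='https://myaccount.blob.core.windows.net/sc-1234?sv=2021-12-02&se=2024-01-01T08%3A00%3A00Z&sp=racwdl&sig=PLACEHOLDER'

# Everything before the first "?" is the storage container URL.
container_url="${sas_url%%[?]*}"

# Everything after the first "?" is the temporary SAS token.
sas_token="${sas_url#*[?]}"

echo "Container URL: $container_url"
echo "SAS token:     $sas_token"
```

Keep the URL in single or double quotes whenever you use it in a terminal, since the ? and & characters otherwise have special meaning to the shell.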
Option 1: Upload using Terra’s File Manager (up to 5 GB per file)
The file manager is designed as a user-friendly interface for the convenience of all users.
1. Select the folder icon on the right side panel from anywhere in your Terra on Azure workspace.
2. Selecting the folder will open up a new screen where you can upload your files to your workspace’s Azure storage container.
3. Click Upload to choose a file from your local machine.
4. After uploading your file, you should see it listed under the Name column.
5. Click the link to open a pop-up window with additional details (such as the file size and Azure storage location), and an option to download the file to your local machine.
Option 2: Microsoft Azure Storage Explorer App
2.1. Download and set up Microsoft Azure Storage Explorer locally.
No need to sign in!
2.2. Go to the Dashboard of your Terra on Azure workspace.
2.3. Click Cloud info (on the right-hand side).
2.4. Click copy to clipboard (file icon) to the right of Storage SAS URL.
SAS token caveats
- SAS tokens currently expire after 8 hours.
- You can have more than 1 valid SAS token for a storage blob at a time.
- Don’t share your SAS token with anyone, because anyone would be able to access your workspace storage!
2.5. Go to your local Storage Explorer app.
2.6. Under Storage accounts, choose attach a resource.
2.7. In the pop-up, select Blob container.
2.8. How will you connect? Select Shared access signature URL (SAS).
2.9. In the display name field, enter any name you like.
2.10. Paste the Storage SAS URL from your Terra on Azure Workspace in the Blob container SAS URL field.
What the SAS URL includes
- Your permissions on the blob (Read, Add, Create, Write, Delete, List)
- Blob storage ID/"resource name"
- SAS token
2.11. Select Connect.
What to expect/do next
A new pop-up should show you successfully added the new connection. You can use the built-in file directory to transfer large files to your workspace cloud storage.
Option 3: Upload using AzCopy Command Line Interface (CLI)
Using the AzCopy command line interface (CLI) tool may be less comfortable if you have less experience using terminal commands, but is necessary when uploading more than 5 GB of data at a time.
1. Download azcopy and set it up on your local machine.
2. Copy files from your local machine to your Azure blob storage using the command line interface (CLI).
azcopy copy [source] [destination] [flags]
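The source is the path to the file on your local machine. The destination is your Storage SAS URL, in the form "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]" (remember the double quotes!)
You can find this URL in your Workspace Dashboard under Cloud Information (copy the Storage SAS URL by clicking the copy icon). Remember that your SAS token expires after 8 hours.
azcopy copy /Users/user/Downloads/SRR17259545/SRR17259545_1.fastq.gz "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"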
3. Once your upload is complete, you can check what is available in your storage blob with this command:
azcopy list "https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]"
What to expect
This outputs the filenames but does not include the full path URL. See the example below.
INFO: SRR17259545/SRR17259545_1.fastq.gz; Content Length: 27.67 MiB
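Steps 2 and 3 above can be combined into one small shell function. This is a sketch rather than an official script: upload_to_workspace is a hypothetical name, and it assumes azcopy is already installed and on your PATH. The --recursive flag, one of the [flags] accepted by azcopy copy, is added when the source is a directory so that its contents are copied too.

```shell
# Upload a local file or directory, then list the container contents.
# $1: local path to upload
# $2: Storage SAS URL from the workspace dashboard
upload_to_workspace() {
  if [ -d "$1" ]; then
    # Directories need --recursive so that their contents are copied too.
    azcopy copy "$1" "$2" --recursive || return 1
  else
    azcopy copy "$1" "$2" || return 1
  fi
  # List what is now in the container to confirm the upload.
  azcopy list "$2"
}
```

For example, after storing the pasted Storage SAS URL in a variable named sas_url, you could run: upload_to_workspace /Users/user/Downloads/SRR17259545 "$sas_url"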
To reference this data in an analysis in your workspace
You will need to concatenate the Storage Container URL and the path to your file, without the SAS token.
You can also get the path to the file by going to the Workspace File Manager shown in the section above.
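The concatenation can be sketched as follows. The helper name blob_url and the example URL are assumptions for illustration; the file path is the one printed by azcopy list in the example above.

```shell
# Build a token-free URL for a file in the workspace storage container.
# $1: Storage SAS URL (or the bare storage container URL)
# $2: path to the file inside the container, as printed by azcopy list
blob_url() {
  # Drop the "?" and everything after it (the SAS token), then append the path.
  printf '%s/%s\n' "${1%%[?]*}" "$2"
}

# Placeholder SAS URL: the account, container, and signature are not real.
blob_url 'https://myaccount.blob.core.windows.net/sc-1234?sv=2021-12-02&sig=PLACEHOLDER' \
         'SRR17259545/SRR17259545_1.fastq.gz'
```

Stripping the token this way also avoids accidentally saving a SAS token in a data table or workflow input.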
Copy data from a Google bucket to an Azure storage container (AzCopy)
Step 1: Set up AzCopy on your local machine
For step-by-step instructions, see Microsoft documentation on AzCopy.
Step 2: Set up authentication with Google Cloud
2.2. Select the Terra Billing project you will use.
2.3. Create a secure key to use for authentication. This key will generally be downloaded locally in JSON format. Don’t share this key with anyone.
2.4. After you have a service key, set the GOOGLE_APPLICATION_CREDENTIALS environment variable to the absolute path to the service account key file (citation)
2.5. In your Terra on Google Workspace, use the file management system or data table to find the URL for the data you wish to copy to Azure.
2.6. Select View this file in the Google Cloud Storage Browser.
2.7. Click on the file and copy the Authenticated URL. It will start with ‘https://’
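The terminal side of these steps can be sketched as below. The key file location and the bucket path are hypothetical placeholders. The gs_to_https helper is likewise an illustrative name; it assumes the Authenticated URL takes the form https://storage.cloud.google.com/<bucket-name>/<path>, so compare its output with the URL shown in the Google Cloud Storage Browser.

```shell
# Hypothetical location of the downloaded service account key (JSON).
key_file="$HOME/keys/terra-service-account.json"

# Point GOOGLE_APPLICATION_CREDENTIALS at the key; the path must be absolute.
export GOOGLE_APPLICATION_CREDENTIALS="$key_file"
[ -f "$GOOGLE_APPLICATION_CREDENTIALS" ] || echo "Key file not found: $GOOGLE_APPLICATION_CREDENTIALS" >&2

# Convert a gs:// path (as shown in a Terra data table) to the https:// form.
gs_to_https() {
  printf 'https://storage.cloud.google.com/%s\n' "${1#gs://}"
}

# Placeholder bucket and file names.
gs_to_https 'gs://my-bucket/my-directory/sample.fastq.gz'
```

The export only lasts for the current terminal session, so run it in the same session in which you later run azcopy.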
Use AzCopy to move data from Google Cloud to Azure
1. Log into your Terra on Azure Workspace where you would like your data to be placed.
2. On the dashboard of this workspace, under Cloud Information, copy the temporary SAS token associated with your workspace’s cloud storage.
3. Now that you can authenticate into both cloud storage containers, use azcopy in your local terminal to copy the data. Be mindful that this will incur egress charges. Azure generally charges $0.08, and Google charges $0.11 to egress 1 GB of data (citation).
4. You can perform the command below using azcopy.
azcopy copy 'https://storage.cloud.google.com/<bucket-name>/<directory-name>' 'https://<account>.blob.core.windows.net/<container>/<directory-name>?<SAS>' --recursive
Note: You must use the Authenticated URL from Google Cloud (i.e., one starting with https://). A gs:// URL will not work.
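To gauge the egress charge mentioned in step 3 before starting a copy, a rough estimate can be computed from the data size. This sketch assumes the per-GB figures quoted above and that the source cloud's rate applies (Google's, when copying from Google to Azure). Actual pricing varies by region and tier, and egress_cost is a hypothetical helper name.

```shell
# Estimate the egress charge for copying data out of a cloud provider.
# $1: data size in GB
# $2: rate in USD per GB (defaults to the $0.11 Google figure quoted above)
egress_cost() {
  awk -v gb="$1" -v rate="${2:-0.11}" 'BEGIN { printf "%.2f\n", gb * rate }'
}

# Example: estimated charge for copying 250 GB out of Google Cloud.
echo "Estimated egress for 250 GB: \$$(egress_cost 250)"
```

Pass a second argument to use a different rate, for example egress_cost 250 0.08 for the Azure figure.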