For longer-term storage of files generated in an interactive Cloud Environment (i.e., JupyterLab), you can copy them from your local storage disk to the workspace blob storage. This prevents data loss if you delete or recreate your virtual machine (VM). This article provides step-by-step instructions on how to copy files to blob storage using the AzCopy command-line tool.
Why copy generated data to workspace storage?
Below are the primary reasons you might want to copy data generated in a notebook analysis to workspace storage (or to external blob storage).
Save files for longer-term storage
Files on the VM disk (including generated data from JupyterLab) will be deleted if you recreate or delete your Jupyter Cloud Environment. Copying them to workspace storage preserves them, and also allows you to move them to less-expensive long-term cloud storage.
Share generated data with collaborators
Virtual Machines on Terra are single-tenant, which means that each user has their own Cloud Environment that is inaccessible to others. In order to share your files with collaborators, the data needs to reside in your workspace blob storage.
Notebook (i.e., .ipynb) files are autosaved to workspace storage
When working in JupyterLab on Terra, your notebook files are regularly autosaved and synced to your workspace blob storage. That means you do not normally need to worry about manually copying those notebook files to your workspace storage. Notebook files are autosaved every 120 seconds, and auto-synced to your workspace storage any time an autosave or manual save occurs.
However, output files (e.g., matrices) that reside in your local VM storage disk are not synced to your workspace blob storage. To preserve those files, you will want to copy or transfer them using a tool like AzCopy.
Using AzCopy on the command-line
AzCopy is a command-line tool used to move data to and from Azure storage. More information can be found in the Microsoft documentation on using AzCopy. Below we provide instructions to save files from your Cloud Environment local storage disk to your workspace blob storage.
Step 1. Locate your workspace blob storage destination URL
1.1. The standard format for your workspace storage destination URL is:
https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]
You can find your exact workspace storage URL on your workspace Dashboard, under Cloud Information, in the Storage SAS URL field (image below).
1.2. Click on the “Copy to Clipboard” icon next to the URL, and then you are ready for Step 2.
Your SAS token expires after 8 hours. Once it has expired, you will need to copy a new SAS URL and update it in your code.
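Because the token is only valid for a limited time, it can help to check the expiry before starting a long copy. The following is a minimal sketch, not part of AzCopy or Terra: it assumes the SAS URL carries its expiry in an "se" query parameter formatted as a UTC timestamp, as in the example URL in this article. The helper names sas_expiry and sas_is_valid and the placeholder URL are hypothetical.

```shell
#!/bin/sh
# sas_expiry URL: print the value of the "se" (expiry) query parameter.
# Hypothetical helper; the body runs in a subshell so variables stay local.
sas_expiry() (
  query="&${1#*\?}"                 # query string, prefixed with "&"
  case $query in
    *"&se="*) ;;                    # an se= parameter is present
    *) exit 1 ;;                    # no expiry found
  esac
  value=${query#*"&se="}            # drop everything up to "se="
  value=${value%%"&"*}              # drop the following parameters
  # Decode the URL-encoded colons (%3A) in the timestamp.
  printf '%s\n' "$value" | sed 's/%3[Aa]/:/g'
)

# sas_is_valid URL: succeed if the current UTC time is before the expiry.
# Timestamps in the same ISO 8601 layout can be compared as plain strings.
sas_is_valid() (
  expiry=$(sas_expiry "$1") || exit 2
  now=$(date -u +%Y-%m-%dT%H:%M:%SZ)
  expr "X$now" \< "X$expiry" >/dev/null
)

# Placeholder URL (assumption): paste your own Storage SAS URL here.
SAS_URL='https://example.blob.core.windows.net/sc-example?sv=2021-06-08&se=2023-01-20T04%3A07%3A55Z&sr=c&sig=PLACEHOLDER'

echo "Token expires at: $(sas_expiry "$SAS_URL")"
if sas_is_valid "$SAS_URL"; then
  echo "SAS token is still valid"
else
  echo "SAS token has expired; copy a new Storage SAS URL from the Dashboard"
fi
```

Keep the URL in single quotes so the shell does not interpret the "&" characters in the SAS token.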
Step 2. Copy from VM local storage to workspace blob storage (AzCopy)
The azcopy command uses the following format:
azcopy [command] [source] [destination] [flag-name]
Parameters
- The [command] is the AzCopy copy command.
- The [source] describes the file location in your VM storage.
- The [destination] is the URL for the workspace blob storage.
- The [flag-name] refers to optional flags that modify the operation.
A real-world example, with an "example_file.txt" file as the source, might look like this:
azcopy copy "/home/jupyter/example_file.txt" "https://lz1f6e31c08b3141f0842d12.blob.core.windows.net/sc-a097890e-fff8-468f-9ac6-9c74a5bec47f?sv=2021-06-08&spr=https&st=2023-01-19T19%3A52%3A55Z&se=2023-01-20T04%3A07%3A55Z&sr=c&sp=racwdl&sig=f%2B2wHMEjiO2IZGFaoYVw5oLhnHBuW%2F0J%2B74zOoKpgK0%3D"
Note that you will need to add an exclamation mark (!) in front of the command if you are running AzCopy outside of the console (for example, in a notebook cell). So, you would write !azcopy instead of azcopy.
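To copy into a subfolder of the workspace container rather than its top level, the blob path goes between the container URL and the "?" that starts the SAS token, following the destination format shown in Step 1. The sketch below illustrates one way to build such a URL in the shell. The helper name dest_with_path, the placeholder URL, and the "results/run1" path are hypothetical and only serve as an illustration.

```shell
#!/bin/sh
# dest_with_path SAS_URL BLOB_PATH: insert BLOB_PATH before the SAS query.
# Hypothetical helper (not part of AzCopy); the body runs in a subshell.
dest_with_path() (
  sas_url=$1
  blob_path=$2
  case $sas_url in
    *\?*) ;;                                   # URL contains a SAS query
    *) echo "error: URL has no SAS token" >&2; exit 1 ;;
  esac
  base=${sas_url%%\?*}        # container URL, before the first "?"
  query=${sas_url#*\?}        # SAS token, after the first "?"
  base=${base%/}              # avoid a double slash
  blob_path=${blob_path#/}    # avoid a double slash
  printf '%s/%s?%s\n' "$base" "$blob_path" "$query"
)

# Placeholder URL (assumption): paste your own Storage SAS URL here.
SAS_URL='https://example.blob.core.windows.net/sc-example?sv=2021-06-08&sig=PLACEHOLDER'

DEST=$(dest_with_path "$SAS_URL" "results/run1")
echo "$DEST"

# Upload a single file to that location:
#   azcopy copy "/home/jupyter/example_file.txt" "$DEST/example_file.txt"
# Upload a whole directory and its contents with the --recursive flag:
#   azcopy copy "/home/jupyter/results" "$DEST" --recursive
```

The azcopy lines are left as comments so the sketch can be run without transferring anything. Note that "$DEST/example_file.txt" would not be a valid URL here, because the SAS token must stay at the end; in practice, call dest_with_path once per destination path.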
Step 3. Verify that your file was copied successfully to workspace blob storage
If you've followed the previous steps, then AzCopy will have copied the file of interest to workspace storage. However, to make sure you've written out the commands, directories, and destination URLs properly, you will want to verify that the file was actually copied over.
3.1. Use the AzCopy list command, with the following format:
azcopy list [destination]
3.2. Using the same destination URL for the workspace blob in the previous step, run the following command to list out the contents of the workspace.
azcopy list "https://lz1f6e31c08b3141f0842d12.blob.core.windows.net/sc-a097890e-fff8-468f-9ac6-9c74a5bec47f?sv=2021-06-08&spr=https&st=2023-01-19T19%3A52%3A55Z&se=2023-01-20T04%3A07%3A55Z&sr=c&sp=racwdl&sig=f%2B2wHMEjiO2IZGFaoYVw5oLhnHBuW%2F0J%2B74zOoKpgK0%3D"
What to expect
Once executed, this command will output the file names, but will not include the full path URL. For example, it might look like this:
INFO: example_file.txt; Content Length: 270.67 KiB
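If you copy many files, scanning the listing by eye becomes tedious. As a rough sketch, the check can be scripted by searching the listing for a file name. This assumes each entry is printed in the "INFO: name; Content Length: size" form shown above, which may differ between AzCopy versions. The helper name listing_has_file is hypothetical.

```shell
#!/bin/sh
# listing_has_file NAME: read `azcopy list` output on stdin and succeed
# if an entry for NAME is present. Hypothetical helper; assumes lines of
# the form "INFO: <name>; Content Length: <size>".
listing_has_file() {
  grep -F -q -- "INFO: $1;"
}

# Sample listing taken from the example output in this article.
sample='INFO: example_file.txt; Content Length: 270.67 KiB'

if printf '%s\n' "$sample" | listing_has_file "example_file.txt"; then
  echo "example_file.txt found in workspace storage"
else
  echo "example_file.txt NOT found in workspace storage"
fi

# With a real SAS URL stored in SAS_URL, the same check would be:
#   azcopy list "$SAS_URL" | listing_has_file "example_file.txt"
```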
You should now have a functional copy of your interactive analysis data in your workspace (blob) storage!