How to save data from an interactive analysis to workspace storage

Derek Caetano-Anolles

For longer-term storage of files generated from an interactive Cloud Environment (i.e. JupyterLab), you can copy them from your Cloud Environment's persistent disk to the workspace blob storage. This will prevent any data loss if you delete or recreate your Cloud Environment. This article provides step-by-step instructions on how to copy files to blob storage using the AzCopy command-line tool.

Why copy generated data to workspace storage?

Terra's Cloud Environments come with a detachable persistent disk as the default storage location for all data generated during your interactive analysis. The disk is persistent and detachable because when you delete a Cloud Environment, you are given the option to detach that disk so that it may be reattached to another Cloud Environment later on. However, there are several reasons to regularly copy any data you don't wish to lose from that persistent disk to another location.

Below are the primary reasons you might want to copy data generated in a notebook analysis to workspace storage (or to external blob storage).  

Save files for longer-term storage

Files on the persistent disk (including generated data from JupyterLab) will be permanently deleted if you delete that disk, so transferring them is a best practice we highly recommend. Copying to workspace storage preserves them, and also allows you to move to less-expensive long-term cloud storage. 

Share generated data with collaborators

Virtual Machines on Terra are single-tenant, which means that each user has their own Cloud Environment that is inaccessible to others. In order to share your files with collaborators, the data needs to reside in your workspace blob storage.

Notebook files (such as .ipynb files) are autosaved to workspace storage

When working in JupyterLab on Terra, your notebook files are regularly autosaved and synced to your workspace blob storage. That means you do not normally need to worry about manually copying those notebook files to your workspace storage. Notebook files are autosaved every 120 seconds, and auto-synced to your workspace storage any time an autosave or manual save occurs.

However, output files (e.g., matrices) that reside in your Cloud Environment's persistent disk are not synced to your workspace blob storage. To preserve those files, you will want to copy or transfer them using a tool like AzCopy.

Using AzCopy on the command-line

AzCopy is a command-line tool used to move data to and from Azure storage. More information can be found in the Microsoft documentation on using AzCopy. Below we provide instructions for saving files from your Cloud Environment's persistent disk to your workspace blob storage.

Step 1. Locate your workspace blob storage destination URL

1.1. The standard format for your workspace storage destination URL is:

https://[account].blob.core.windows.net/[container]/[path/to/blob]?[SAS]
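To make the parts of this format concrete, here is a minimal Python sketch that splits a storage URL into its account, container, and SAS components, and inserts a blob path in front of the SAS query string. The account name, container name, signature, and the helper name add_blob_path are all placeholders invented for illustration; substitute the Storage SAS URL from your own workspace.

```python
from urllib.parse import urlsplit, urlunsplit


def add_blob_path(container_sas_url: str, blob_path: str) -> str:
    """Return the URL with blob_path inserted between the container and the SAS query."""
    parts = urlsplit(container_sas_url)
    new_path = parts.path.rstrip("/") + "/" + blob_path.lstrip("/")
    # The query string (the SAS token) is passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, new_path, parts.query, parts.fragment))


# Placeholder URL in the [account]/[container]?[SAS] format (not a real token).
example_url = "https://myaccount.blob.core.windows.net/my-container?sv=2021-06-08&sig=PLACEHOLDER"

parts = urlsplit(example_url)
account = parts.netloc.split(".")[0]              # [account]
container = parts.path.lstrip("/").split("/")[0]  # [container]
sas_token = parts.query                           # [SAS]

print(account, container)
print(add_blob_path(example_url, "results/example_file.txt"))
```

Keeping the SAS query string untouched matters because the signature is only valid for the exact token text.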

You can find your exact workspace storage URL on your workspace Dashboard: select Cloud Information, then locate the Storage SAS URL (image below).

Screenshot of the 'Cloud Information' section of the workspace dashboard. It shows fields for the Cloud Name, the Resource Group ID, the Storage Container URL, and (most pertinent for this article) the Storage SAS URL.

1.2. Click on the “Copy to Clipboard” icon next to the URL, and then you are ready for Step 2.

Your SAS token expires after 8 hours and will need to be refreshed in your code.
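Because the token is time-limited, it can help to check the expiry before starting a long copy. The sketch below reads the se (signed expiry) query parameter from a SAS URL. It assumes the expiry is written as a full UTC timestamp such as 2023-01-20T04:07:55Z; other timestamp forms would need a different format string. The function names, account, container, and signature are placeholders.

```python
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlsplit


def sas_expiry(sas_url: str) -> datetime:
    """Return the expiry time stored in the 'se' parameter of a SAS URL."""
    query = parse_qs(urlsplit(sas_url).query)  # parse_qs also decodes %3A to ':'
    expiry_text = query["se"][0]
    # Assumption: the timestamp looks like 2023-01-20T04:07:55Z (UTC).
    expiry = datetime.strptime(expiry_text, "%Y-%m-%dT%H:%M:%SZ")
    return expiry.replace(tzinfo=timezone.utc)


def is_expired(sas_url: str, now: datetime = None) -> bool:
    """Return True if the SAS token in the URL has already expired."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now >= sas_expiry(sas_url)


# Placeholder URL; only the 'se' value matters for this check.
example_url = (
    "https://myaccount.blob.core.windows.net/my-container"
    "?sv=2021-06-08&se=2023-01-20T04%3A07%3A55Z&sig=PLACEHOLDER"
)
print(sas_expiry(example_url).isoformat())
print(is_expired(example_url))
```

If the check reports an expired token, copy a fresh Storage SAS URL from the workspace Dashboard before running AzCopy.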

Step 2. Open a terminal from within JupyterLab

You can access the terminal in JupyterLab by selecting File > New > Terminal from the menu at the top. 

Screenshot showing how to open a terminal in JupyterLab.

Step 3. Use AzCopy to move data from persistent disk into workspace blob storage

AzCopy commands follow this general format:

azcopy [command] [source] [destination] [flag-name]

Parameters

  • The [command] is the AzCopy copy command.
  • The [source] is the file location on your Cloud Environment's persistent disk.
  • The [destination] is the URL for the workspace blob storage.
  • The [flag-name] is an optional flag that modifies the operation.

A real-world example, with an "example_file.txt" file as the source, might look like this:

azcopy copy "/home/jupyter/example_file.txt" "https://lz1f6e31c08b3141f0842d12.blob.core.windows.net/sc-a097890e-fff8-468f-9ac6-9c74a5bec47f?sv=2021-06-08&spr=https&st=2023-01-19T19%3A52%3A55Z&se=2023-01-20T04%3A07%3A55Z&sr=c&sp=racwdl&sig=f%2B2wHMEjiO2IZGFaoYVw5oLhnHBuW%2F0J%2B74zOoKpgK0%3D"

Note that if you run AzCopy commands from a notebook cell rather than from the terminal, you will need to prefix the command with an exclamation mark (!). That is, write !azcopy instead of azcopy.
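If you prefer to assemble the command in Python rather than typing it out, the following sketch builds the argument list in the [command] [source] [destination] [flag-name] order. The helper name build_copy_command and the destination URL are placeholders; --recursive is shown only as one example of an optional flag, useful when the source is a directory. The block prints the command instead of running it.

```python
import shlex


def build_copy_command(source: str, destination_url: str, recursive: bool = False) -> list:
    """Build an 'azcopy copy' command as a list of arguments."""
    command = ["azcopy", "copy", source, destination_url]
    if recursive:
        # Optional flag: copy a directory and everything under it.
        command.append("--recursive")
    return command


# Placeholder destination; paste your own Storage SAS URL here.
destination = "https://myaccount.blob.core.windows.net/my-container?sv=2021-06-08&sig=PLACEHOLDER"
command = build_copy_command("/home/jupyter/example_file.txt", destination)

# shlex.join quotes each argument so that '&' and '?' in the URL are safe in a shell.
print(shlex.join(command))

# To run it for real (requires azcopy on the PATH), something like:
#   import subprocess
#   subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems with the special characters in the SAS token. Be careful about printing or logging the full command, since it contains the token.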

Step 4. Verify that your file was copied successfully to workspace blob storage

If you've followed the previous steps, then AzCopy will have copied the file of interest to workspace storage. However, to make sure you've written out the commands, directories, and destination URLs properly, you will want to verify that the file was actually copied over.

4.1. Use the AzCopy list command, with the following format:

azcopy list [destination]

4.2. Using the same destination URL for the workspace blob storage as in the previous step, run the following command to list the contents of the workspace storage container.

azcopy list "https://lz1f6e31c08b3141f0842d12.blob.core.windows.net/sc-a097890e-fff8-468f-9ac6-9c74a5bec47f?sv=2021-06-08&spr=https&st=2023-01-19T19%3A52%3A55Z&se=2023-01-20T04%3A07%3A55Z&sr=c&sp=racwdl&sig=f%2B2wHMEjiO2IZGFaoYVw5oLhnHBuW%2F0J%2B74zOoKpgK0%3D"

What to expect

Once executed, this command will output the file names, but will not include the full path URL. For example, it might look like this:

INFO: example_file.txt; Content Length: 270.67 KiB
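To check for a specific file programmatically, you could parse the listing text. This is a rough sketch that assumes every entry follows the "INFO: name; Content Length: size" shape shown above; the exact output may differ between AzCopy versions, so treat the pattern as an assumption. The function name parse_listing is hypothetical.

```python
import re

# Assumed line shape: "INFO: <name>; Content Length: <size>"
LISTING_PATTERN = re.compile(r"^INFO:\s*(?P<name>.+?);\s*Content Length:\s*(?P<size>.+)$")


def parse_listing(output: str) -> dict:
    """Map each listed file name to its reported size text."""
    files = {}
    for line in output.splitlines():
        match = LISTING_PATTERN.match(line.strip())
        if match:
            files[match.group("name")] = match.group("size").strip()
    return files


# Sample text copied from the example output above.
sample_output = "INFO: example_file.txt; Content Length: 270.67 KiB"
listing = parse_listing(sample_output)

if "example_file.txt" in listing:
    print("Found example_file.txt:", listing["example_file.txt"])
else:
    print("example_file.txt is missing from the listing")
```

Lines that do not match the pattern are skipped, so other messages in the output do not cause errors.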

You should now have a functional copy of your interactive analysis data in your workspace (blob) storage!
