Learn how to move files from local storage to your workspace Cloud Environment storage (or from the Cloud Environment to local storage).
Overview: Cloud Environment "Persistent Disk" storage
The Cloud Environment is the virtual system (compute and storage) that runs an interactive analysis on Terra (i.e., Galaxy, Jupyter Notebooks, or RStudio). The Persistent Disk (PD) is storage mounted to the Cloud Environment virtual machine (VM) that can be detached and attached to another VM. It is similar to USB storage on a personal computer. The PD exists in the cloud, but for security reasons is isolated from the rest of the cloud: no inbound communication is possible from the internet into the VM or associated PD storage.
Reasons to copy data from local storage to the interactive analysis PD
If you want to run an interactive analysis (Galaxy, Jupyter Notebook, or RStudio) on data stored locally, you first need to upload the data to the Cloud Environment PD.
Reasons to copy/move from the PD to local storage
1. To back up generated data from an interactive analysis in Terra to local storage.
2. To allow a colleague to access generated data from an interactive analysis.
To learn more about the Terra ecosystem (infrastructure), see Terra architecture and where your files live in it.
Overview: How to move data between local and Cloud Environment storage
Because files in local storage and in the Cloud Environment PD have no cloud identifier, you can only move data along the paths with arrows in the diagram above. There is no way to communicate directly between the PD and local storage. Thus, transfers between local storage and the PD require two steps, with the workspace bucket as the intermediate stop in both directions.
Why does one-way communication work?
You can push data from the PD to a workspace or external bucket, or pull data from the workspace or external bucket into the PD, using the Cloud Environment terminal. Files stored on the PD are visible from that terminal because the terminal runs on the VM where the PD is mounted.
Similarly, you can push data from local storage to a workspace or external bucket or pull data from the workspace or external bucket into local storage using command line tools in a local terminal instance.
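The rule that every transfer must have a bucket on one side can be written down as a small check. The sketch below is illustrative only: the helper names `is_bucket_url` and `gsutil_cp_command`, the bucket name, and the file paths are hypothetical, and the function only builds a `gsutil cp` command line without running it.

```python
from typing import List


def is_bucket_url(path: str) -> bool:
    """Return True for Google Cloud Storage URLs such as gs://bucket/object."""
    return path.startswith("gs://")


def gsutil_cp_command(src: str, dst: str) -> List[str]:
    """Build (but do not run) a `gsutil cp` command for a single hop.

    A plain file path refers to whichever machine the terminal runs on
    (the Cloud Environment VM or your own computer), so a PD path and a
    local path can never appear in the same command.
    """
    if not (is_bucket_url(src) or is_bucket_url(dst)):
        raise ValueError("one side of every transfer must be a gs:// bucket URL")
    return ["gsutil", "cp", src, dst]


# Push: file path first, bucket second (hypothetical bucket name).
print(" ".join(gsutil_cp_command("notes.txt", "gs://fc-example-bucket/notes.txt")))

# Pull: bucket first, file path second.
print(" ".join(gsutil_cp_command("gs://fc-example-bucket/notes.txt", ".")))
```

The same two command shapes apply in both terminals; only the machine that the plain file path refers to changes.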
How to do it
You'll use a command-line tool in the Cloud Environment terminal instance. Several tools are built into the workspace terminal instance, including gsutil, ssh, and ftp. See the step-by-step instructions below to copy/move files using gsutil.
How to transfer between local storage and PD
Local storage to PD
This includes copying from your institutional HPC or personal computer to your PD. Because both local storage and the PD are isolated from the rest of the internet, this is a two-step process, with the Workspace bucket as the intermediate step.
1. Move/copy files from local storage to the Workspace bucket following directions here.
2. Move/copy files from the Workspace bucket to the Persistent Disk following directions here.
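The two steps above can be sketched as a pair of `gsutil cp` commands, each run in a different terminal. This is a minimal sketch under stated assumptions: the function name `local_to_pd_plan`, the bucket `gs://fc-example-bucket`, the file `reads/sample1.fastq`, and the PD directory `/home/jupyter/data/` are all hypothetical placeholders, so substitute your own workspace bucket URL and paths.

```python
import posixpath
import shlex
from typing import List, Tuple


def local_to_pd_plan(local_file: str, bucket: str, pd_dir: str) -> List[Tuple[str, str]]:
    """Return (where to run it, command) pairs for the two hops."""
    # The file is staged in the bucket under its own file name.
    staged = f"{bucket.rstrip('/')}/{posixpath.basename(local_file)}"
    return [
        # Hop 1: push from local storage to the workspace bucket.
        ("local terminal", shlex.join(["gsutil", "cp", local_file, staged])),
        # Hop 2: pull from the workspace bucket onto the PD.
        ("Cloud Environment terminal", shlex.join(["gsutil", "cp", staged, pd_dir])),
    ]


# All three arguments below are hypothetical examples.
plan = local_to_pd_plan("reads/sample1.fastq", "gs://fc-example-bucket", "/home/jupyter/data/")
for where, command in plan:
    print(f"[{where}] {command}")
```

For a whole directory, `gsutil cp -r` copies recursively, and `gsutil mv` moves instead of copying.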
PD to local storage
This includes copying from your PD to your institutional HPC or personal computer. Because both the PD and local storage are isolated from the rest of the internet, this is a two-step process. Copying to the Workspace bucket is an intermediate step.
When would you do this?
This is useful if you want to back up data generated in an interactive analysis (Galaxy, notebook, or RStudio) to local storage.
1. First, move/copy files from the PD to the workspace bucket
Use instructions here. Remember to reverse the order in the gsutil command so the path on the PD (local to the VM) is first and the workspace bucket is second.
2. Move/copy files from the workspace bucket to local storage
You will use gsutil running in a terminal on your local system to pull data from the workspace bucket. Follow the directions in this article.
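The reversed argument order for this direction can be sketched as follows. This is a hedged illustration, not an official Terra script: the helper `copy_one_hop`, the fallback bucket name, and the file paths are hypothetical, and the code assumes the Cloud Environment exposes the workspace bucket URL in a `WORKSPACE_BUCKET` environment variable. By default it only prints the commands; pass `dry_run=False` on a machine where gsutil is installed to actually run them.

```python
import os
import shlex
import subprocess
from typing import List


def copy_one_hop(src: str, dst: str, dry_run: bool = True) -> List[str]:
    """Build a `gsutil cp` command; run it only when dry_run is False."""
    command = ["gsutil", "cp", src, dst]
    if dry_run:
        print("would run:", shlex.join(command))
    else:
        # check=True raises CalledProcessError if gsutil exits with an error.
        subprocess.run(command, check=True)
    return command


# Assumption: WORKSPACE_BUCKET is set in the Cloud Environment. The fallback
# below is a hypothetical bucket URL for use outside that environment.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://fc-example-bucket")
staged = f"{bucket}/backup/summary.tsv"

# Hop 1, in the Cloud Environment terminal: PD path first, bucket second.
copy_one_hop("/home/jupyter/results/summary.tsv", staged)

# Hop 2, in a terminal on your own machine: bucket first, local path second.
copy_one_hop(staged, "./summary.tsv")
```

Keeping the dry run as the default makes it easy to check source and destination order before any data moves.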