How (and why) to save RStudio data to workspace storage

Allie Cliffe

Your RStudio Cloud Environment includes a detachable persistent disk (PD) that keeps generated data even when you recreate the virtual machine. Sometimes, however, you will want to copy the data to more permanent cloud storage, for example to archive it or to give collaborators or workflows access. This article describes how to move data from the RStudio persistent disk to a Google bucket (including your workspace storage) when working in RStudio in Terra.

Why copy RStudio-generated data to workspace or external storage

Below are the primary reasons you might want to copy data generated in RStudio (stored on the RStudio persistent disk by default) to workspace storage (or an external Google bucket).

To use as input for a workflow

Files generated by RStudio are not automatically saved in workspace storage (Google bucket) and are not accessible outside your personal virtual RStudio Cloud Environment.

To share data with collaborators (including in a shared workspace)

Note that you will need to copy data to workspace storage if you want colleagues to have access. This is true even if you are working in a shared workspace, since each user has their own unique workspace Cloud Environment and Persistent Disk.

To archive data

To preserve or archive valuable data, especially in less expensive Nearline or Coldline storage, you will first need to copy it to an external bucket.

To safeguard data when re-creating or deleting the Persistent Disk (PD)

If you delete your PD or reconfigure your Cloud Environment in certain ways (for example, decreasing your VM memory or persistent disk size), you can lose some or all generated data unless you explicitly save your output to workspace or external storage (i.e., a Google bucket).

Don't lose data when running both Jupyter and RStudio!

Note that you will have to recreate the Cloud Environment when swapping between RStudio and Jupyter in the same workspace. Because Jupyter and RStudio share a persistent disk, it is important to make only changes that preserve your PD data (i.e., increasing disk size and keeping the same disk type). To learn more, see Updating your RStudio Cloud Environment without losing data.

How to copy RStudio data to workspace storage

To move generated data to permanent cloud storage, follow the directions below. Note that this can be workspace storage or an external Google bucket. 

1. Work in the built-in RStudio terminal

You can access a bash terminal from the Terminal tab in the main RStudio pane.
[Screenshot: the Terminal tab in RStudio]

2. Set the variable "bucket" for the destination storage

Setting a variable lets you copy and paste the commands in this article without editing them.

To use the workspace bucket for storage, run the command bucket="$WORKSPACE_BUCKET".

To save data to an external Google bucket, run the command bucket="gs://<your-bucket-name>", replacing <your-bucket-name> with the name of your bucket.

WORKSPACE_BUCKET is an environment variable that is predefined when you use the terminal in Terra. Referencing it lets you pick up the workspace Google bucket path directly, so you do not have to hardcode the bucket path in the commands that move the data.
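The two options above can be combined into one line. The following is a minimal sketch, not an official Terra recipe: it assumes WORKSPACE_BUCKET is defined as described above, and it falls back to the placeholder gs://<your-bucket-name>, which you would need to replace with a real bucket name.

```shell
# Use the workspace bucket when Terra has defined WORKSPACE_BUCKET;
# otherwise fall back to a placeholder (an assumption, replace it with
# the name of your own bucket).
bucket="${WORKSPACE_BUCKET:-gs://<your-bucket-name>}"

# Sanity check: a Google bucket path should start with gs://
case "$bucket" in
  gs://*) echo "Destination: $bucket" ;;
  *)      echo "Unexpected bucket value: $bucket" >&2 ;;
esac
```

Checking the gs:// prefix before copying is a cheap way to catch an empty or mistyped variable.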

3. Save files to "bucket" with bash commands 

Note: Workspace storage is a Google bucket, so file commands such as cp and ls must be prefixed with "gsutil" when they operate on the bucket.

To copy all files in the current working directory into the bucket, use the command:
gsutil cp * "$bucket"

To make sure the files are in the bucket, you can run the following:
gsutil ls "$bucket"
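If you would rather copy a few named files than everything `*` matches, a small wrapper can help. This is a sketch under stated assumptions: save_to_bucket is a hypothetical helper name (not part of gsutil or Terra), and it relies only on the gsutil cp command shown above.

```shell
# Hypothetical helper: copy only the named files to a bucket.
# Usage: save_to_bucket "$bucket" results.csv plot.png
save_to_bucket() {
  dest="$1"
  shift
  for f in "$@"; do
    if [ -f "$f" ]; then
      # Copy one file into the top level of the destination bucket
      gsutil cp "$f" "$dest/"
    else
      echo "Skipping $f (not a regular file)" >&2
    fi
  done
}
```

After defining the function in the terminal, running save_to_bucket "$bucket" results.csv would copy only results.csv (an example file name) and report any names that do not exist instead of failing partway through.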

Be careful when copying all files

Using `*` can mean copying many large files, which can be expensive. Be sure to check the size of the files in the bucket after copying! To copy individual files instead, replace `*` with the name of the file to copy.
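One way to act on this warning is to look at file sizes before and after copying. The sketch below makes two assumptions: preview_upload and bucket_usage are hypothetical helper names, and gsutil skips directories when cp is run without -r, so only regular files are listed.

```shell
# Hypothetical helper: show the size of each regular file that `*`
# would match in the current directory.
preview_upload() {
  for f in *; do
    if [ -f "$f" ]; then
      du -h "$f"
    fi
  done
}

# Hypothetical helper: report the total size stored in a bucket.
# -s prints a summary total; -h prints human-readable sizes.
bucket_usage() {
  gsutil du -s -h "$1"
}
```

Running preview_upload before gsutil cp shows what is about to be copied, and bucket_usage "$bucket" afterwards shows how much data the bucket now holds.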

Additional resources

To learn more about your workspace Cloud Environment storage, see Detachable Persistent Disks.

For additional bash capabilities, see Using the terminal and interactive shell in Terra.

A deeper dive: Terra's Cloud Environment

To understand what's under the hood and why RStudio and notebooks have these characteristics, see this article about key notebook components or this article about key notebook operations.
