How (and why) to save RStudio data to workspace storage

Allie Cliffe

Your RStudio Cloud Environment includes a detachable persistent disk that maintains generated data even when you re-create the virtual machine that your analysis ran on. Sometimes, though, you will want to copy data files to more permanent cloud storage. This article describes how to move data from the RStudio persistent disk storage to a Google bucket (including your workspace storage) when working in RStudio in Terra. 

Why copy RStudio-generated data to workspace or external storage?

The data that you generate when you run an analysis in RStudio are stored in the RStudio virtual machine's persistent disk (PD) by default. Here are the primary reasons why you may want to copy that data to your workspace's permanent storage (or an external Google bucket).  

To analyze the data with another Terra tool

Files saved to the persistent disk are not accessible outside of your virtual RStudio Cloud Environment, even to other Terra analysis tools in your own workspace. Your files must be saved to your workspace storage bucket or another bucket outside of Terra in order to access them from a workflow, Jupyter notebook, or Galaxy analysis.

To share data with collaborators (including in a shared workspace)

You need to copy data to workspace storage if you want colleagues to have access to it. This is true even if you are working in a shared workspace, since each user has their own unique workspace Cloud Environment and Persistent Disk.

To archive data

If you want to save valuable data for the long term -- especially if you want to copy it to less expensive Nearline or Coldline storage -- you need to copy it to an external bucket first. 

To safeguard data when re-creating or deleting the Persistent Disk (PD)

If you delete your PD or reconfigure your Cloud Environment in certain ways (for example, decreasing your virtual machine's memory or the size of your persistent disk), you can lose some or all of the data on your PD unless you explicitly save your output to workspace or external storage (that is, a Google bucket).

How to copy RStudio data to workspace storage

To move generated data to permanent cloud storage, follow the directions below. Note: the permanent cloud storage can be the workspace's storage or an external Google bucket. 

1. Open the built-in Bash terminal in the RStudio app

You can access a bash terminal from the Terminal tab in the main RStudio pane.
Screenshot of the main RStudio pane in a Terra workspace with the terminal tab on the left side at the top highlighted with an arrow.

2. Copy your file(s) to a bucket using gcloud commands

To copy all of the files in your current working directory to the workspace bucket, run the following command in the bash terminal:

gcloud storage cp * "$WORKSPACE_BUCKET"

To copy a single file, replace the * with a specific file's name; for example, gcloud storage cp helloWorld.txt "$WORKSPACE_BUCKET".
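The single-file copy can be wrapped in a small guard that checks the inputs first. This is a minimal sketch, not a Terra or gcloud feature: copy_to_workspace is a hypothetical helper name, and it assumes that WORKSPACE_BUCKET is set in the RStudio terminal, as in the commands above.

```shell
# copy_to_workspace is a hypothetical helper, not part of Terra or gcloud.
# It copies each named file to the workspace bucket, one at a time.
copy_to_workspace() {
  # Stop early if the workspace bucket variable is missing or empty.
  if [ -z "${WORKSPACE_BUCKET:-}" ]; then
    echo "WORKSPACE_BUCKET is not set" >&2
    return 1
  fi
  local f
  for f in "$@"; do
    # Skip anything that is not a regular file (typos, directories).
    if [ ! -f "$f" ]; then
      echo "skipping $f: not a regular file" >&2
      continue
    fi
    gcloud storage cp "$f" "$WORKSPACE_BUCKET"
  done
}
```

After pasting the function into the terminal, copy_to_workspace helloWorld.txt results.csv would copy those two files (results.csv is only an illustrative name) and warn about any name that does not exist.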

To copy all of the files in your current working directory to a Google bucket outside of your Terra workspace, run the following command in the bash terminal, replacing <your-bucket-name> with the name of your bucket:

gcloud storage cp * "gs://<your-bucket-name>"

You can find your bucket's name through the Google Cloud console.
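When output is organized in subdirectories, a recursive copy may be needed, because gcloud storage cp generally needs the --recursive flag to copy a directory's contents. The sketch below is one way to do this under stated assumptions: bucket_url and copy_dir_to_bucket are hypothetical helper names, and the bucket name is supplied by you.

```shell
# bucket_url and copy_dir_to_bucket are hypothetical helper names.

# Turn a bucket name (with or without a gs:// prefix) into a gs:// URL.
bucket_url() {
  local name="${1#gs://}"   # drop a leading gs:// if the caller included one
  name="${name%/}"          # drop a trailing slash
  printf 'gs://%s\n' "$name"
}

# Copy a local directory and everything beneath it to the named bucket.
copy_dir_to_bucket() {
  local dir="$1" dest
  dest=$(bucket_url "$2")
  gcloud storage cp --recursive "$dir" "$dest"
}
```

For example, copy_dir_to_bucket results my-bucket would run gcloud storage cp --recursive results gs://my-bucket, where results and my-bucket are placeholder names.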

Be careful when copying all files

Using * can copy many large files, which can be expensive. Be sure to check the size of the files in the bucket after copying! To copy individual files instead, replace * with the file name.
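Before running a wildcard copy, it can help to preview what * matches and how large those files are. The following is a minimal sketch that only inspects local files and does not contact any bucket; preview_copy is a hypothetical helper name.

```shell
# preview_copy is a hypothetical helper, not a Terra or gcloud command.
# It lists the regular files that * matches in the current directory,
# with their sizes in bytes, followed by a total.
preview_copy() {
  local total=0 size f
  for f in *; do
    [ -f "$f" ] || continue        # skip directories and unmatched globs
    size=$(wc -c < "$f")           # file size in bytes
    total=$((total + size))
    printf '%s\t%s bytes\n' "$f" "$((size))"
  done
  printf 'TOTAL\t%s bytes\n' "$total"
}
```

Run preview_copy from the directory you plan to copy. To check the size of what is stored in the bucket afterward, gcloud storage du --summarize "$WORKSPACE_BUCKET" should report the total.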

What to expect

Once the files have copied over to the new storage bucket, you should be able to list them by running the following command in the bash terminal:

gcloud storage ls "$WORKSPACE_BUCKET" (if you copied the files to your workspace bucket), or gcloud storage ls "gs://<your-bucket-name>" (if you copied the files to an external Google bucket).

If you copied your files to your workspace's storage bucket, you can also see them in the Data tab, under the Files section at the bottom of the left-hand panel.
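Checking the listing by eye gets tedious with many files. As a rough sketch, the helper below compares a list of file names against the output of gcloud storage ls; verify_copied is a hypothetical name, and the sketch assumes the files were copied to the top level of the bucket, so each one is listed as <bucket URL>/<file name>.

```shell
# verify_copied is a hypothetical helper, not part of Terra or gcloud.
# Usage: verify_copied <bucket-url> <file> [<file> ...]
# Returns non-zero if the listing fails or any file is missing.
verify_copied() {
  local bucket="$1" listing f missing=0
  shift
  listing=$(gcloud storage ls "$bucket") || return 1
  for f in "$@"; do
    # Look for an exact line match such as gs://my-bucket/helloWorld.txt
    if printf '%s\n' "$listing" | grep -qxF "${bucket%/}/$f"; then
      echo "OK       $f"
    else
      echo "MISSING  $f"
      missing=1
    fi
  done
  return "$missing"
}
```

For example, verify_copied "$WORKSPACE_BUCKET" helloWorld.txt prints one status line per file and returns a non-zero exit code if anything is missing.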

Additional resources

To learn more about your workspace Cloud Environment storage, see Overview: Cloud environment storage (detachable persistent disks)

For additional bash capabilities, see Using the terminal and interactive shell in Terra.

For complete details on all gcloud storage options, see Google's documentation

A deeper dive: Terra's Cloud Environment

To understand what's under the hood and why RStudio and notebooks have these characteristics, see this article about key notebook components or this article about key notebook operations.

 
