Starting and customizing your RStudio app

Anton Kovalsky

The integrated RStudio app runs on a virtual machine or cluster of machines in your workspace Cloud Environment. This article gives step-by-step instructions for starting your RStudio app, customizing the VM and storage, and working with RStudio in Terra.

If you use R frequently, you’ll be happy to know that Terra has integrated RStudio capabilities so you can spin up a Cloud Environment pre-configured to run the server version of RStudio right on the Terra platform. The integration is a result of our collaboration with the Bioconductor team as part of the AnVIL project.

Why use RStudio in Terra?     

      - Richer IDE experience for R development
      - Support for launching RShiny apps
      - First-class Bioconductor support
      - Includes variable explorer, R Markdown editor, debugger, terminal
      - Git integration 

For some additional context, check out our Terra Blog post on the subject, or watch this video tutorial on using RStudio on Terra for analysis. 

Starting RStudio

1. Go to the Analyses tab of your workspace.

2. Click the cloud icon in the sidebar.

3. Click the gear icon under RStudio/Bioconductor in the Cloud Environment Details pane. 

4. To use the default environment, click the Create button. 

5. To customize the cloud compute and storage, click the Customize button. 

The RStudio Cloud Environment will take a few minutes to start up. You'll see an RStudio logo in the sidebar. When the blue dot below the logo turns green, the RStudio app is ready! You'll get a popup in the top right.

RStudio-Launch_Screen_shot.png

6. Click the Launch RStudio button to access RStudio. 

RStudio-First-screen_Screen_shot.png

Customizing your RStudio app

If the default RStudio Cloud Environment doesn't fit your needs, you can customize the size of the VM  and dedicated storage in your workspace. You can modify the Cloud Environment at any time, even if you've already started working in the RStudio app.

To adjust the virtual environment and/or compute power of your application, first click the cloud icon in the sidebar, then click the gear icon under RStudio/Bioconductor.

Then, to access the RStudio configuration form, click on the Customize button at the bottom right of the (RStudio) Cloud Environment pane.
Customizing-the-cloud-environment_Default-settings_Screen_shot.png

You'll specify what you want in the configuration form and let Terra recreate your Cloud Environment with the new specifications.

Step 1. Set the compute power

In the RStudio Cloud Environment configuration pane, you'll see options for setting up the compute power of your virtual machine. If the defaults are not adequate for your needs, you can select a custom compute, where you can specify the CPUs, memory, disk size and type, and location of the primary machine. You can also spin up a Spark cluster of parallel machines and specify the number of secondary machines and their CPUs, memory, and disk sizes.

To configure a custom compute power, follow the steps below.

1.1. Select Customize from the bottom of the RStudio Cloud Environment menu.

1.2. In the Cloud Compute Profile section of the new form that appears, choose the specification of your primary machine. See the example below. 

  • CPUs: 8
  • Memory (GB): 30
  • Disk Size (GB): 100

If you only want one virtual machine and have no other customizations, you're done!

Spark VM instructions

1.3. To configure as a Spark cluster (for parallel processing), first select Spark cluster from the Compute type list.

spark-cluster.png

1.4. Fill in the values for the Worker config.

  • Workers: 120
  • Preemptibles: 100
  • CPUs: 4
  • Memory (GB): 15
  • Disk Size (GB): 500

Finding the cost

The cost of the requested compute power will be displayed in the blue section at the top of the form. For example, when requesting a Spark cluster, your screen will look something like this.
custom-compute.png

Cost-saving recommendations

Size your compute power appropriately
You pay a fixed amount while the Cloud Environment is running, whether or not you are doing active calculations (note, however, that Terra automatically pauses RStudio after 30 minutes of browser inactivity). The cost is based on the compute power of your virtual machine or cluster, not on how much computation is being done. So you want enough power to finish your computations in a reasonable amount of time, but not so much that you pay for capacity you never use.
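As a rough illustration of this billing model, the sketch below multiplies an hourly rate by the time the environment is running. The two hourly rates are hypothetical placeholders, not Terra or Google Cloud prices; the real estimate is the one displayed at the top of the configuration form.

```python
def running_cost(hourly_rate_usd: float, hours_running: float) -> float:
    """Cost accrued while the environment is running.

    Utilization is deliberately not a parameter: an idle VM and a busy VM
    of the same size cost the same per hour.
    """
    if hourly_rate_usd < 0 or hours_running < 0:
        raise ValueError("rate and hours must be non-negative")
    return round(hourly_rate_usd * hours_running, 2)


# Hypothetical hourly rates, for illustration only.
SMALL_VM_RATE = 0.20
LARGE_VM_RATE = 1.60

# The same 8-hour session on a small and on a large machine.
small_cost = running_cost(SMALL_VM_RATE, 8)
large_cost = running_cost(LARGE_VM_RATE, 8)
print(f"small VM: ${small_cost:.2f}, large VM: ${large_cost:.2f}")
```

With these made-up rates, the larger machine costs eight times as much for the same session, whether or not the extra CPUs are ever used.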

Start small and scale up
Because you generally don't lose any data if you increase resources (CPUs or disk sizes, for example), it is generally best to start small and increase as needed. 

To learn more about controlling cloud costs in a notebook, see Controlling cloud costs - sample use cases.

Step 2. Choose the Persistent Disk size and type

If the default PD is too large (and you don't want to pay for the extra) or too small, you can adjust it in the Cloud Environment setup form. See Detachable persistent disks to learn more about detachable persistent disks for notebook applications in Terra.

You can also choose from a standard or solid state disk (SSD). While solid state disks are much quicker for data retrieval, they are significantly more expensive. 

Step 3. Choose other VM options

Below are a number of additional customizations you can make to your Cloud Environment. 

GPUs

Terra supports the use of graphics processing units (GPUs) - special processing units optimized for linear algebra computations, such as matrix multiplication - when using Jupyter notebook cloud environments. To learn more, see Getting started with GPUs in a Jupyter Cloud Environment.

Autopause

RStudio Cloud Environments will automatically pause when there is no web browser activity for 30 minutes. To learn more about how autopause on Terra works by default and how and why you can manually override the default settings, see Preventing runaway costs with Cloud Environment autopause.

Long-running jobs? When to adjust the default autopause
Note that autopause will shut down your RStudio Cloud Environment if there is no browser activity, even if you are running calculations at the time. For this reason, if you are running long calculations and know you will not be at your browser, you can adjust the default autopause value. Beware, however, that extending the autopause time can leave you vulnerable to runaway costs!
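The behavior described above can be pictured with a small model. This is a simplified sketch, not Terra's implementation: the function name and the `job_running` parameter are hypothetical, and the only rule encoded is that browser inactivity, not computation, decides whether the environment pauses.

```python
from datetime import datetime, timedelta

# Default threshold stated in this article.
DEFAULT_AUTOPAUSE = timedelta(minutes=30)


def will_autopause(last_browser_activity: datetime,
                   now: datetime,
                   threshold: timedelta = DEFAULT_AUTOPAUSE,
                   job_running: bool = False) -> bool:
    """Return True if the environment would pause in this simplified model.

    job_running is accepted only to show that it is ignored: a running
    calculation does not count as activity.
    """
    return now - last_browser_activity >= threshold


now = datetime(2024, 1, 1, 12, 0)
last_seen = now - timedelta(minutes=45)

# A long calculation is running, but the browser has been idle for 45 minutes.
print(will_autopause(last_seen, now, job_running=True))

# Same situation with the threshold raised to two hours.
print(will_autopause(last_seen, now, threshold=timedelta(hours=2), job_running=True))
```

Raising the threshold keeps the job alive in the second case, at the price of paying for the environment for up to two idle hours.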

VM location (i.e. GCP compute region)

Your Cloud Environment will default to the workspace bucket region, but you can choose a different location for your Cloud Environment VM. To learn more, see Customizing where your data are stored and analyzed.

Note that if you change the location from the value proposed by the UI, you may incur egress charges if your bucket location and interactive analysis Cloud Environment location are different.

Step 4. Save, and recreate your Cloud Environment

Don't forget to save the configuration after changing any values. This will recreate the application compute with the new values, which can take up to ten minutes. 

It's not necessary to guess upfront the resources you're going to need to do your work. You can start with minimal settings, then dial them up if you run into limitations. 

RStudio Cloud Environment considerations
Changing the Cloud Environment can mean files generated or stored in the application memory will be lost when Terra recreates it. To avoid this, make sure to keep your Persistent Disk (default) and only increase resources. We also recommend copying all valuable files to Workspace storage (Google bucket).

If you don't have a Persistent Disk, see the section, What real-time updates can you make to Cloud Environment compute resources without losing data? to understand what changes you can make without losing generated data.

Updating your RStudio VM in real time (without losing data)

Your RStudio Cloud Environment comes with storage (the persistent disk, or PD) that is kept by default when you delete or re-create the Cloud Environment. As long as you don't choose to delete this storage, there are many changes you can make - even while RStudio is running or if you transition to working in Jupyter - without worrying about losing any data.

You will not lose your data if you pause the Cloud Environment
When you pause your RStudio app, the VM or cluster goes away but the Persistent Disk does not. In fact, when you re-open RStudio, the Cloud Environment VM is created more quickly because the disk does not need to be recreated.

Changes that don't put data at risk

Below are changes you can make to the virtual environment where your notebook or RStudio analysis runs without losing any data stored in the PD. 

  • Increase or decrease the # of CPUs or VM memory

    During this update, Terra will pause the RStudio Cloud Environment, update, and then restart. The update will take a couple of minutes to complete, and you will not be able to continue editing or running RStudio while it's completing.

  • Increase the disk size

    Note that decreasing the disk size can result in lost data.

  • Change the number of workers (Spark cluster only, when the number of workers is greater than 2)

    During this update, you can continue to work in RStudio without pausing your Cloud Environment. When the update is finished, you will see a confirmation banner. 

Cloud Environment changes that can cause you to lose work
Note that this applies no matter what kind of interactive analysis you are running, including RStudio, Jupyter Notebooks, or Galaxy. Please back up files as appropriate.

  • Decreasing the Persistent Disk size

  • Deleting the Persistent Disk (when re-creating or deleting the Cloud Environment)
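The rules in this section can be summarized as a small check. The sketch below is only a way of restating them: the `Compute` class and `data_risks` function are hypothetical names, not part of Terra, and the current values are the example configuration from Step 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compute:
    cpus: int
    memory_gb: float
    disk_gb: int
    keep_persistent_disk: bool = True  # the article's default


def data_risks(current: Compute, proposed: Compute) -> list[str]:
    """List the data-loss risks of moving from `current` to `proposed`.

    CPU and memory changes (up or down) and disk increases are treated
    as safe, as described in this section.
    """
    risks = []
    if proposed.disk_gb < current.disk_gb:
        risks.append("decreasing the Persistent Disk size")
    if not proposed.keep_persistent_disk:
        risks.append("deleting the Persistent Disk")
    return risks


current = Compute(cpus=8, memory_gb=30, disk_gb=100)

# Scaling up CPUs, memory, and disk: no risks reported.
print(data_risks(current, Compute(cpus=16, memory_gb=60, disk_gb=200)))

# Shrinking the disk: flagged.
print(data_risks(current, Compute(cpus=8, memory_gb=30, disk_gb=50)))
```

An empty list means the change falls under "Changes that don't put data at risk"; anything else is a cue to back up files first.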

How to change BOTH CPU/memory and number of workers (Spark VM)

Note that if you want to modify both the workers and CPU/memory, we advise doing this sequentially. 

1. First, update the CPUs/memory.

2. Wait for the Cloud Environment to restart.

3. Then adjust the workers.

Using Terminal in RStudio

You can (still) use a terminal in an RStudio instance. Since the terminal icon in the Cloud Environment widget has been replaced with the RStudio logo, you access the terminal in the way RStudio users traditionally do - by clicking the "Terminal" tab to the right of the "Console" tab in the RStudio interface itself:
2021-03-26_0941.png

Saving RStudio files

When you save files using the RStudio interface, these files are saved to the "/rstudio" subdirectory of your "/home" folder on your Persistent Disk (PD). 

For a more detailed understanding of where your files live within the Terra ecosystem, check out this article.

Autosave function

Terra automatically saves .Rmd files to the workspace bucket every 10 seconds. Background syncing between the RStudio Cloud Environment and workspace storage means changes persist even when you delete the PD. You can see the autosave status indicated by the color and content below the RStudio logo (top left of the screen). 

Copying to another location

For detailed instructions on copying files from your interactive environment to your workspace storage (or external Google bucket), see this article.
Screen_Shot_2021-03-17_at_3.18.59_PM.png
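For orientation, a copy to a Google bucket from the terminal typically takes the form of a `gsutil cp` command. The sketch below only assembles such a command and prints it without running anything. The bucket name, file name, and `backups` prefix are hypothetical, and it assumes `gsutil` is available in your environment's terminal; the linked article remains the authoritative set of instructions.

```python
import shlex


def build_copy_command(local_path: str, bucket: str, dest_prefix: str = "") -> list[str]:
    """Build a `gsutil cp` command that copies one local file into a bucket."""
    # Normalize so the destination always ends with exactly one slash.
    dest = f"gs://{bucket}/{dest_prefix}".rstrip("/") + "/"
    return ["gsutil", "cp", local_path, dest]


# Hypothetical file under the RStudio home directory and hypothetical bucket.
command = build_copy_command("/home/rstudio/analysis.Rmd", "my-workspace-bucket", "backups")
print(shlex.join(command))
```

Printing the command first makes it easy to check the source and destination before pasting it into the RStudio Terminal tab.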

Switching between RStudio and Jupyter Notebooks

The RStudio app is not compatible with the Jupyter notebooks listed in your Notebooks tab. If you try to open a Jupyter Notebook after you’ve created an RStudio environment, you’ll get a message prompting you to update your cloud environment to a Jupyter-based image.
Screen_Shot_2021-03-17_at_12.08.01_PM.png

Shared storage

However, if you've kept your PD, once you've replaced your Cloud Environment with a Jupyter-compatible configuration, you should be able to open a notebook, open the terminal view into your virtual machine through that notebook, and see that the files you saved to your persistent disk are still there:
Screen_Shot_2021-03-18_at_12.58.10_PM.png
