The integrated RStudio app runs on a virtual machine or a cluster of machines in your workspace Cloud Environment. This article gives step-by-step instructions for starting your RStudio app, customizing the VM and storage, and working with RStudio in Terra.
If you use R frequently, you’ll be happy to know that Terra has integrated RStudio capabilities so you can spin up a Cloud Environment pre-configured to run the server version of RStudio right on the Terra platform. The integration is a result of our collaboration with the Bioconductor team as part of the AnVIL project.
Why use RStudio in Terra?
- Richer IDE experience for R development
- Support for launching RShiny apps
- First-class Bioconductor support
- Includes variable explorer, R Markdown editor, debugger, terminal
- Git integration
For some additional context, check out our Terra Blog post on the subject, or watch this video tutorial on using RStudio on Terra for analysis.
Starting RStudio in a workspace
(Notebooks tab display)
1. To launch an instance of RStudio, start in a workspace where you’re able to launch your own Cloud Environments (i.e., you have “can compute” access).
2. Click on the Cloud Environment widget at the top right of your screen.
3. Click the Customize button at the bottom right of the Cloud Environment pane.
4. Click on the Application Configuration drop-down menu.
5. Scroll down to the Community-Maintained RStudio Environments section and select the latest available version of RStudio.
6. Press the Create or Update button at the bottom right of the widget.
The environment will take a few minutes to start up. Once it’s ready, the widget will have a pause button next to an RStudio logo.
7. To open your RStudio instance, click on the RStudio logo.
(Analyses tab display)
1. Go to the Analyses tab of your workspace.
2. Click the cloud icon in the sidebar.
3. Click the gear icon under RStudio/Bioconductor in the Cloud Environment Details pane.
4. To use the default environment, click the Create button.
5. To customize the cloud compute and storage, click the Customize button.
The RStudio Cloud Environment will take a few minutes to start up. You'll see an RStudio logo in the sidebar. When the blue dot below the logo turns green, the RStudio app is ready! You'll get a popup in the top right.
6. Click the Launch RStudio button to access RStudio.
Customizing your RStudio app
If the default RStudio Cloud Environment doesn't fit your needs, you can customize the size of the VM and dedicated storage in your workspace. You can modify the Cloud Environment at any time, even if you've already started working in the RStudio app.
To adjust the virtual environment and/or compute power of your application, first click on the gear icon:
- (Notebooks tab display) in the widget at the top right of your workspace
- (Analyses tab display) in the pane that opens when you click the cloud icon in the sidebar
Then, to access the RStudio configuration form, click on the Customize button at the bottom right of the (RStudio) Cloud Environment pane.
You'll specify what you want in the configuration form and let Terra recreate your Cloud Environment with the new specifications.
Step 1. Set the compute power
In the RStudio Cloud Environment configuration pane, you'll see options for setting up the compute power of your virtual machine. If the defaults are not adequate for your needs, you can select a custom compute, where you can specify the CPUs, memory, disk size, disk type, and location of your primary machine. You can also spin up a Spark cluster of parallel machines, and specify the number of secondary machines and their CPUs, memory, and disk sizes. To configure a custom compute power, follow the steps below.
1.1. Select Customize from the bottom of the RStudio Cloud Environment menu.
1.2. In the new form that appears, choose the specification of your primary machine. See the example below.
- CPUs: 8
- Memory (GB): 30
- Disk Size (GB): 100
If you only want one virtual machine and have no other customizations, you're done!
Spark VM instructions
1.3. To configure as a Spark cluster (for parallel processing), first select Spark cluster from the Compute type list.
1.4. Fill in the values for the Worker config.
- Workers: 120
- Preemptibles: 100
- CPUs: 4
- Memory (GB): 15
- Disk size (GB): 500
Finding the cost
The cost of the requested compute power will be displayed in the blue section at the top of the form. For example, when requesting a Spark cluster, your screen will look something like this.
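To make the arithmetic behind a cost figure concrete, here is a minimal Python sketch that adds up an hourly estimate for the example values above. The rates, the preemptible discount, and the function names are all hypothetical and chosen only for illustration. The sketch also assumes that preemptible workers are counted in addition to the regular workers. The figure shown in the Terra form is the one to rely on.

```python
from dataclasses import dataclass

# Hypothetical hourly rates in USD, for illustration only.
# Real prices vary by region and machine type.
RATE_PER_CPU_HOUR = 0.03
RATE_PER_GB_MEMORY_HOUR = 0.004
RATE_PER_GB_DISK_HOUR = 0.00005
# Assumption: preemptible compute is billed at 20% of the regular rate.
PREEMPTIBLE_FRACTION = 0.2


@dataclass
class Machine:
    cpus: int
    memory_gb: float
    disk_gb: float


def hourly_cost(machine, count=1, preemptible=False):
    """Estimated hourly cost of `count` identical machines."""
    compute = (machine.cpus * RATE_PER_CPU_HOUR
               + machine.memory_gb * RATE_PER_GB_MEMORY_HOUR)
    if preemptible:
        compute *= PREEMPTIBLE_FRACTION
    disk = machine.disk_gb * RATE_PER_GB_DISK_HOUR
    return count * (compute + disk)


def cluster_hourly_cost(primary, worker, workers, preemptibles):
    """Primary machine plus regular workers plus preemptible workers.

    Assumption: preemptibles are additional to the regular workers.
    """
    return (hourly_cost(primary)
            + hourly_cost(worker, workers)
            + hourly_cost(worker, preemptibles, preemptible=True))


# Example values from the configuration steps above.
primary = Machine(cpus=8, memory_gb=30, disk_gb=100)
worker = Machine(cpus=4, memory_gb=15, disk_gb=500)

estimate = cluster_hourly_cost(primary, worker, workers=120, preemptibles=100)
print(f"Estimated cluster cost: ${estimate:.2f} per hour")
```

Because the worker terms are multiplied by the worker count, even a modest per-machine rate adds up quickly on a large cluster.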
Cost-saving recommendations
Size your compute power appropriately
You pay a fixed amount while your Cloud Environment is running, whether or not you are doing active calculations (note, however, that Terra automatically pauses RStudio after 30 minutes of browser inactivity). The cost is based on the compute power of your virtual machine or cluster, not on how much computation is being done. So you want enough power to do your computations in a reasonable amount of time, but not a lot of extra power that you will pay for and not use.
Start small and scale up
Because you generally don't lose any data if you increase resources (CPUs or disk sizes, for example), it is generally best to start small and increase as needed.
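The point that you pay for running time rather than for computation can be sketched with a little arithmetic. The Python snippet below is an illustration only: the hourly rates are hypothetical and the function names are not part of Terra.

```python
def session_cost(hourly_rate_usd, hours_running):
    """Cost of a session: billed on time running, not on work done."""
    return hourly_rate_usd * hours_running


def idle_cost(hourly_rate_usd, hours_running, hours_computing):
    """Part of the bill spent while the VM was running but not computing."""
    if hours_computing > hours_running:
        raise ValueError("hours_computing cannot exceed hours_running")
    return hourly_rate_usd * (hours_running - hours_computing)


# Hypothetical rates: a small VM and a much larger one.
rates = {"small": 0.20, "large": 1.60}

# The same 8-hour session with 2 hours of actual computation on each.
for name, rate in rates.items():
    total = session_cost(rate, hours_running=8)
    idle = idle_cost(rate, hours_running=8, hours_computing=2)
    print(f"{name}: total ${total:.2f}, of which ${idle:.2f} was idle time")
```

In this sketch the idle share of the bill grows in proportion to the hourly rate, which is why an oversized VM is costly even when it does very little work.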
To learn more about controlling cloud costs in a notebook, see Controlling cloud costs - sample use cases.
Step 2. Choose the Persistent Disk size and type
If the default PD is too large (and you don't want to pay for the extra) or too small, you can adjust it in the Cloud Environment setup form. See Detachable persistent disks to learn more about detachable persistent disks for notebook applications in Terra.
You can also choose between a standard disk and a solid state disk (SSD). While solid state disks are much quicker for data retrieval, they are significantly more expensive.
Step 3. Choose other VM options
Below are a number of additional customizations you can make to your Cloud Environment.
Terra supports the use of graphics processing units (GPUs) - special processing units optimized for linear algebra computations, such as matrix multiplication - when using Jupyter notebook cloud environments. To learn more, see Getting started with GPUs in a Jupyter Cloud Environment.
RStudio Cloud Environments will automatically pause when there is no web browser activity for 30 minutes. To learn more about how autopause on Terra works by default and how and why you can manually override the default settings, see Preventing runaway costs with Cloud Environment autopause.
Long-running jobs? When to adjust the default autopause
Note that autopause will shut down your RStudio Cloud Environment if there is no browser activity, even if you are running calculations at the time. For this reason, if you are running long calculations and know you will not be at your browser, you can adjust the default autopause value. Beware, however, that expanding the autopause time can leave you vulnerable to runaway costs!
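The interaction between autopause and a long-running job can be sketched as a simple time comparison. The Python snippet below is a model of the behavior described here, not Terra's implementation. The function names are hypothetical, and the 30-minute default is the value stated in this article.

```python
from datetime import datetime, timedelta

DEFAULT_AUTOPAUSE_MINUTES = 30  # default described in this article


def autopause_time(last_browser_activity,
                   threshold_minutes=DEFAULT_AUTOPAUSE_MINUTES):
    """Time at which the environment would pause without further activity."""
    return last_browser_activity + timedelta(minutes=threshold_minutes)


def job_survives(last_browser_activity, job_end,
                 threshold_minutes=DEFAULT_AUTOPAUSE_MINUTES):
    """True if the job finishes before the environment would be paused."""
    return job_end <= autopause_time(last_browser_activity, threshold_minutes)


# You start a two-hour calculation and close the browser at 9:00.
left_browser = datetime(2024, 1, 1, 9, 0)
job_end = left_browser + timedelta(hours=2)

print("Default threshold:", job_survives(left_browser, job_end))
print("180-minute threshold:",
      job_survives(left_browser, job_end, threshold_minutes=180))
```

A longer threshold lets the job finish, but the environment then also keeps running (and billing) for that long after any session you simply forget to pause.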
VM location (i.e. GCP compute region)
Your Cloud Environment will default to the workspace bucket region, but you can choose a different location for your Cloud Environment VM. To learn more, see Customizing where your data are stored and analyzed.
Note that if you change the location from the value proposed by the UI, you may incur egress charges if your bucket location and interactive analysis Cloud Environment location are different.
Step 4. Save, and recreate your Cloud Environment
Don't forget to save the configuration after changing any values. This will recreate the application compute with the new values, which can take up to ten minutes.
It's not necessary to guess upfront the resources you're going to need to do your work. You can start with minimal settings, then dial them up if you run into limitations.
RStudio Cloud Environment considerations
Changing the Cloud Environment can mean that files generated or stored in the application memory will be lost when Terra recreates it. To avoid this, make sure to keep your Persistent Disk (default) and only increase resources. We also recommend copying all valuable files to Workspace storage (Google bucket).
If you don't have a Persistent Disk, see the section, What real-time updates can you make to Cloud Environment compute resources without losing data? to understand what changes you can make without losing generated data.
Updating your RStudio VM in real time (without losing data)
Your RStudio Cloud Environment comes with storage (the persistent disk, or PD) that is kept by default when you delete or re-create the Cloud Environment. As long as you don't choose to delete this storage, there are many changes you can make - even while RStudio is running or if you transition to working in Jupyter - without worrying about losing any data.
You will not lose your data if you pause the Cloud Environment
When you pause your RStudio app, the VM or cluster goes away but the Persistent Disk does not. In fact, when you re-open RStudio, the Cloud Environment VM is created more quickly because the disk does not need to be recreated.
Changes that don't put data at risk
Below are changes you can make to the virtual environment where your notebook or RStudio analysis runs without losing any data stored in the PD.
Increase or decrease the # of CPUs or VM memory
During this update, Terra will pause the RStudio Cloud Environment, update, and then restart. The update will take a couple of minutes to complete, and you will not be able to continue editing or running RStudio while it's completing.
Increase the disk size
Note that decreasing the disk size can result in lost data.
Change the number of workers (Spark cluster with more than 2 workers)
During this update, you can continue to work in RStudio without pausing your Cloud Environment. When the update is finished, you will see a confirmation banner.
Cloud Environment changes that can cause you to lose work
Note that this applies no matter what kind of interactive analysis you are running, including RStudio, Jupyter Notebooks, or Galaxy. Please back up files as appropriate.
Decreasing the Persistent Disk size
Deleting the Persistent Disk (when re-creating or deleting the Cloud Environment)
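The rules in this section can be summarized as a small checklist. The Python sketch below encodes them for a proposed change. The dictionary keys and the function name are hypothetical, and the sketch only restates the rules listed in this article rather than querying Terra.

```python
def review_change(current, requested):
    """Return (data_at_risk, notes) for a proposed Cloud Environment change.

    Both arguments are plain dicts with hypothetical keys:
    cpus, memory_gb, disk_gb, workers, keep_persistent_disk.
    """
    notes = []
    data_at_risk = False

    if not requested.get("keep_persistent_disk", True):
        data_at_risk = True
        notes.append("Persistent Disk will be deleted: back up files first.")

    if requested["disk_gb"] < current["disk_gb"]:
        data_at_risk = True
        notes.append("Disk size decrease: data can be lost.")
    elif requested["disk_gb"] > current["disk_gb"]:
        notes.append("Disk size increase: data is kept.")

    if (requested["cpus"], requested["memory_gb"]) != (
            current["cpus"], current["memory_gb"]):
        notes.append("CPU/memory change: environment pauses, updates, restarts.")

    if requested.get("workers", 0) != current.get("workers", 0):
        notes.append("Worker count change: you can keep working meanwhile.")

    return data_at_risk, notes


current = {"cpus": 4, "memory_gb": 15, "disk_gb": 100}
requested = {"cpus": 8, "memory_gb": 30, "disk_gb": 50}

at_risk, notes = review_change(current, requested)
print("Data at risk:", at_risk)
for note in notes:
    print("-", note)
```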
How to change BOTH CPU/memory and number of workers (Spark VM)
Note that if you want to modify both the workers and CPU/memory, we advise doing this sequentially.
1. First, update the CPUs/memory.
2. Wait for the Cloud Environment to restart.
3. Then adjust the workers.
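The sequential order above can be written as a short planning helper. This Python sketch is illustrative only: the function and dictionary keys are hypothetical, and it produces a list of manual steps rather than calling any Terra API.

```python
def plan_spark_update(current, requested):
    """Order the manual steps: CPUs/memory first, then workers."""
    steps = []
    if (current["cpus"], current["memory_gb"]) != (
            requested["cpus"], requested["memory_gb"]):
        steps.append(
            f"Update CPUs/memory to {requested['cpus']} CPUs, "
            f"{requested['memory_gb']} GB"
        )
        steps.append("Wait for the Cloud Environment to restart")
    if current["workers"] != requested["workers"]:
        steps.append(f"Adjust workers to {requested['workers']}")
    return steps


current = {"cpus": 4, "memory_gb": 15, "workers": 10}
requested = {"cpus": 8, "memory_gb": 30, "workers": 20}

for number, step in enumerate(plan_spark_update(current, requested), start=1):
    print(f"{number}. {step}")
```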
Using Terminal in RStudio
You can (still) use a terminal in an RStudio instance. Since the terminal icon in the Cloud Environment widget has been replaced with the RStudio logo, you access the terminal in the way RStudio users traditionally do - by clicking the "Terminal" tab to the right of the "Console" tab in the RStudio interface itself.
Saving RStudio files
When you save files using the RStudio interface, these files are saved to the "/rstudio" subdirectory of your "/home" folder on your Persistent Disk (PD). Make sure you use the terminal to move these files if you're going to delete or replace your PD when deleting or updating your Cloud Environment.
For a more detailed understanding of where your files live within the Terra ecosystem, check out this article.
In some cases, you may want to copy data from your interactive cloud environment to another location, to keep from losing work while deleting or modifying your persistent disk. For detailed instructions on copying files from your interactive environment to your workspace storage (Google bucket), see this article.
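As a rough illustration of such a backup, the Python sketch below builds a `gsutil` copy command for the RStudio home directory. Several things here are assumptions: that the `WORKSPACE_BUCKET` environment variable is set in your environment, that the placeholder bucket name is replaced with your own if it is not, and that `rstudio-backup` is an acceptable destination folder. The helper name is hypothetical. The sketch only prints the command so you can check it before running it in the terminal.

```python
import os
import shlex


def build_backup_command(source_dir, bucket, dest_prefix="rstudio-backup"):
    """Build a gsutil command that copies a directory to a bucket folder."""
    bucket = bucket.rstrip("/")
    if not bucket.startswith("gs://"):
        raise ValueError("bucket must start with gs://")
    # -m copies files in parallel; -r copies the directory recursively.
    return ["gsutil", "-m", "cp", "-r", source_dir, f"{bucket}/{dest_prefix}/"]


# Assumption: WORKSPACE_BUCKET holds the workspace bucket URL.
# The fallback value is a placeholder, not a real bucket.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://your-workspace-bucket")
command = build_backup_command("/home/rstudio", bucket)

# Print the command for review instead of running it.
print(shlex.join(command))
```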
Switching between RStudio and Jupyter Notebooks
The RStudio app is not compatible with the Jupyter notebooks listed in your Notebooks tab. If you try to open a Jupyter Notebook after you’ve created an RStudio environment, you’ll get a message prompting you to update your cloud environment to a Jupyter-based image.
However, if you've kept your PD, once you've replaced your Cloud Environment with a Jupyter-compatible configuration, you should be able to open a notebook, open the terminal view into your virtual machine through that notebook, and see that the files you saved to your persistent disk are still there.