The integrated RStudio app runs on a virtual machine or a cluster of machines in your workspace Cloud Environment. This article gives step-by-step instructions for starting your RStudio app, customizing the virtual machine (VM) and storage, and working with RStudio in Terra.
If you use R frequently, you’ll want to know that Terra has integrated RStudio capabilities so you can spin up a Cloud Environment preconfigured to run the server version of RStudio right on the Terra platform. The integration is a result of our collaboration with the Bioconductor team as part of the AnVIL project.
Why use RStudio in Terra?
- Richer IDE experience for R development
- Support for launching RShiny apps
- First-class Bioconductor support
- Includes variable explorer, R Markdown editor, debugger, terminal
- Git integration
For some additional context, check out our Terra Blog post on the subject, or watch this video tutorial on using RStudio on Terra for analysis.
Starting RStudio in a workspace
1. To launch an instance of RStudio, start in a workspace where you can launch your own Cloud Environments (i.e., you have “can compute” access).
2. Click on the Cloud Environment widget at the top right of your screen.
3. Click the Customize button at the bottom right of the Cloud Environment pane.
4. Click on the Application Configuration drop-down menu.
5. Scroll down to the Community-Maintained RStudio Environments section and select the latest available version of RStudio.
6. Press the Create or Update button at the bottom right of the widget.
The environment will take a few minutes to start up. Once it’s ready, the widget will have a pause button next to an RStudio logo.
7. To open your RStudio instance, click on the RStudio logo.
1. Go to the Analyses tab of your workspace.
2. Click the cloud icon in the sidebar.
3. Click the gear icon under RStudio/Bioconductor in the Cloud Environment Details pane.
4. To use the default environment, click the Create button.
5. To customize the cloud compute and storage, click the Customize button.
The RStudio Cloud Environment will take a few minutes to start up. You'll see an RStudio logo in the sidebar. When the blue dot below the logo turns green, the RStudio app is ready! You'll get a popup in the top right.
6. Click the Launch RStudio button to access RStudio.
Customizing your RStudio app
If the default RStudio Cloud Environment doesn't fit your needs, you can customize the size of the virtual machine (VM) and dedicated storage in your workspace. You can modify the Cloud Environment at any time, even if you've already started working in the RStudio app.
To adjust the virtual environment and/or compute power of your application, first click the cloud icon in the sidebar, then click the gear icon under RStudio in the Cloud Environment Details pane.
Then, to access the RStudio configuration form, click on the Customize button at the bottom right of the (RStudio) Cloud Environment pane.
Specify what you want in the configuration form and let Terra recreate your Cloud Environment with the new specifications.
Step 1. Set the compute power
In the RStudio Cloud Environment configuration pane, you'll see options for setting the compute power of your virtual machine. If the defaults are inadequate, select a custom compute, where you can specify the CPUs, memory, disk size, disk type, and location of the primary machine. You can also spin up a Spark cluster of parallel machines and specify the number of secondary machines along with their CPUs, memory, and disk sizes.
To configure a custom compute power, follow the steps below.
1.1. Select Customize from the bottom of the RStudio Cloud Environment menu.
1.2. In the new form that appears, choose the specification of your primary machine. See the example below.
- CPUs: 8
- Memory (GB): 30
- Disk Size (GB): 100
If you only want one virtual machine and have no other customizations, you're done!
Spark VM instructions
1.3. To configure as a Spark cluster (for parallel processing), first select Spark cluster from the Compute type list.
1.4. Fill in the values for the Worker config.
- Workers: 120
- Preemptibles: 100
- CPUs: 4
- Memory (GB): 15
- Disk size (GB): 500
Finding the cost
The cost of the requested compute power will be displayed in the blue section at the top of the form. For example, when requesting a Spark cluster, your screen will look something like this.
Cost-saving recommendations
Size your compute power appropriately
You pay a fixed amount while your RStudio Cloud Environment is running, whether or not you are doing active calculations. (Note: Terra automatically pauses RStudio after 30 minutes of browser inactivity.) The cost is based on the compute power of your virtual machine or cluster, not on how much computation is being done. So you want enough power to finish your computations in a reasonable amount of time, but not so much that you pay for capacity you don't use.
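As a rough illustration of this point, the cost of a session is the hourly rate of the requested compute multiplied by the hours the environment stays running, regardless of how busy it is. The sketch below can be pasted into the RStudio Terminal. The hourly rates are made-up placeholders, not Terra or Google Cloud prices, and `estimate_cost` is a hypothetical helper; use the figure shown in the configuration form for real numbers.

```shell
# Rough cost arithmetic: cost = hourly rate x hours running.
# The rates below are hypothetical placeholders, not real prices.
estimate_cost() {
  # $1 = hourly rate in USD, $2 = hours the environment is running
  awk -v rate="$1" -v hours="$2" 'BEGIN { printf "%.2f\n", rate * hours }'
}

# An 8-hour session costs the same whether the VM is busy or idle.
estimate_cost 0.50 8   # smaller VM at a hypothetical $0.50/hour: prints 4.00
estimate_cost 2.00 8   # larger VM at a hypothetical $2.00/hour: prints 16.00
```

With these placeholder rates, the larger VM only pays off if it shortens the session to less than a quarter of the time the smaller one would need.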
Start small and scale up
Generally, because you don't lose data if you increase resources (e.g., CPUs or disk sizes), it is best to start small and increase as needed.
To learn more about controlling cloud costs in a notebook, see Controlling cloud costs - sample use cases.
Step 2. Choose the Persistent Disk size and type
If the default PD is too large (and you don't want to pay for the extra space) or too small, you can adjust it in the Cloud Environment setup form. To learn more about detachable persistent disks for notebook applications in Terra, see Detachable persistent disks.
You can also choose between a standard disk and a solid-state disk (SSD). Solid-state disks retrieve data much faster, but they are significantly more expensive.
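One way to judge whether the disk is oversized or nearly full is to check its usage from the RStudio Terminal before changing the size. This is a minimal sketch that assumes your files live under `/home/rstudio` on the PD (as described in the Saving RStudio files section); `pd_usage` is a hypothetical helper name.

```shell
# Report size, used, and available space for the filesystem holding a directory.
# Assumption: RStudio files are stored under /home/rstudio on the persistent disk.
pd_usage() {
  df -h "${1:-/home/rstudio}"
}

# Only run the check if the directory exists in this environment.
if [ -d /home/rstudio ]; then
  pd_usage /home/rstudio
fi
```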
Step 3. Choose other VM options
Below are a number of additional customizations you can make to your Cloud Environment.
Terra supports the use of graphics processing units (GPUs) - special processing units optimized for linear algebra computations, such as matrix multiplication - when using Jupyter Notebook Cloud Environments. To learn more, see Getting started with GPUs in a Jupyter Cloud Environment.
RStudio Cloud Environments automatically pause when there is no web browser activity for 30 minutes. To learn more about how autopause on Terra works by default and how and why you can manually override the default settings, see Preventing runaway costs with Cloud Environment autopause.
Long-running jobs? When to adjust the default autopause
Note: Autopause will shut down your RStudio Cloud Environment if there is no browser activity, even if you are running calculations at the time. For this reason, if you are running long calculations and know you will not be at your browser, you can adjust the default autopause value. Beware - expanding the autopause time can leave you vulnerable to runaway costs!
VM location (i.e., Google Cloud compute region)
Your Cloud Environment will default to the workspace bucket region, but you can choose a different location for your Cloud Environment VM. To learn more, see Customizing where your data are stored and analyzed.
Note: If you change the location from the value proposed by the UI, you may incur egress charges if your bucket location and interactive analysis Cloud Environment location are different.
Step 4. Save and re-create your Cloud Environment
Don't forget to save the configuration after changing any values. This will re-create the application compute with the new values, which can take up to ten minutes.
You don't have to guess upfront the resources you need to do your work. You can start with minimal settings, then dial them up if you run into limitations.
RStudio Cloud Environment considerations
Changing the Cloud Environment can mean files generated or stored in the application memory will be lost when Terra re-creates it. To avoid this, make sure to keep your Persistent Disk (default) and only increase resources. We also recommend copying all valuable files to Workspace storage (Google bucket).
If you don't have a Persistent Disk, see the section Updating your RStudio VM in real time (without losing data) to understand what changes you can make without losing generated data.
Updating your RStudio VM in real time (without losing data)
Your RStudio Cloud Environment comes with storage (the persistent disk, or PD) that is kept by default when you delete or re-create the Cloud Environment. As long as you don't choose to delete this storage, there are many changes you can make - even while RStudio is running or if you transition to working in Jupyter - without worrying about losing data.
You will not lose your data if you pause the Cloud Environment
When you pause your RStudio app, the VM or cluster goes away but the Persistent Disk does not. In fact, when you reopen RStudio, the Cloud Environment VM is created more quickly because the disk does not need to be re-created.
Changes that don't put data at risk
Below are changes you can make to the virtual environment where your notebook or RStudio analysis runs without losing data stored in the PD.
Increase or decrease the number of CPUs or VM memory
During this update, Terra will pause the RStudio Cloud Environment, update, and then restart. The update will take a couple of minutes to complete; you cannot continue editing or running RStudio while it's completing.
Increase the disk size
Note: Decreasing the disk size can result in lost data.
Change the number of workers (Spark cluster with more than 2 workers)
During this update, you can continue to work in RStudio without pausing your Cloud Environment. When the update is finished, you will see a confirmation banner.
Cloud Environment changes that can cause you to lose work
Note: This applies to any kind of interactive analysis you run, including RStudio, Jupyter Notebooks, or Galaxy. Please back up files as appropriate.
Decreasing the Persistent Disk size
Deleting the Persistent Disk (when re-creating or deleting the Cloud Environment)
How to change BOTH CPU/memory and number of workers (Spark VM)
Note: If you want to modify both the workers and CPU/memory, we advise doing this sequentially.
1. First, update the CPUs/memory.
2. Wait for the Cloud Environment to restart.
3. Then adjust the workers.
Using Terminal in RStudio
You can (still) use a terminal in an RStudio instance. Since the terminal icon in the Cloud Environment widget has been replaced with the RStudio logo, you access the terminal the way RStudio users traditionally do - by clicking the "Terminal" tab to the right of the "Console" tab in the RStudio interface itself:
Saving RStudio files
When you save files using the RStudio interface, these files are saved to the "/rstudio" subdirectory of your "/home" folder on your Persistent Disk (PD).
For a more detailed understanding of where your files live within the Terra ecosystem, check out Terra architecture and where your files live in it.
Terra automatically saves .Rmd files to the workspace bucket every 10 seconds. Because of this background syncing between the RStudio Cloud Environment and workspace storage, changes persist even when you delete the PD. You can see the autosave status indicated by the color and content below the RStudio logo (top left of the screen).
Copying to another location
For detailed instructions on copying files from your interactive environment to your workspace storage (or external Google bucket), see Saving data from interactive analysis to workspace storage.
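For a quick sense of what such a copy looks like from the RStudio Terminal, the sketch below prints a `gsutil cp` command rather than running it, so you can review it first. It assumes the `WORKSPACE_BUCKET` environment variable holds your workspace bucket URL (if it is not set, a placeholder is used). The `backups/` folder, the file name, and the `backup_cmd` helper are hypothetical.

```shell
# Print (do not run) a gsutil command that copies a file from the
# persistent disk to workspace storage.
# Assumptions: WORKSPACE_BUCKET is set to the bucket's gs:// URL; the
# backups/ folder and my_analysis.Rmd are hypothetical examples.
backup_cmd() {
  src="$1"
  bucket="${WORKSPACE_BUCKET:-gs://your-workspace-bucket}"
  printf 'gsutil cp %s %s/backups/%s\n' "$src" "$bucket" "$(basename "$src")"
}

backup_cmd /home/rstudio/my_analysis.Rmd
```

Once the printed source and destination look right, you can run the command as shown. Paths that contain spaces would need quoting first.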
Switching between RStudio and Jupyter Notebooks
The RStudio app is not compatible with the Jupyter Notebooks listed in your Notebooks tab. If you try to open a Jupyter Notebook after you’ve created an RStudio environment, you’ll get a message prompting you to update your Cloud Environment to a Jupyter-based image.
However, if you've kept your PD, once you replace your Cloud Environment with a Jupyter-compatible configuration, you can open a notebook, open the terminal view into your virtual machine through that notebook, and see that the files you saved to your persistent disk are still there: