Interactive analysis apps in Terra - Jupyter Notebooks, RStudio, and Galaxy - run on a "laptop in the cloud". Each user's laptop in the cloud consists of a virtual machine (VM) or cluster of machines, plus software and disk storage (collectively called your workspace Cloud Environment). This article provides an overview of the Cloud Environment components that run your interactive analyses.
To learn how to create and customize your analysis app VM, see articles for Jupyter, RStudio, and Galaxy.
Analysis components overview
Integrated analysis apps (Galaxy, Jupyter, and RStudio) run in a workspace Cloud Environment (your "laptop in the cloud") made up of a virtual machine (VM) compute engine (CPUs or GPUs) plus software and a persistent storage disk. The diagram below illustrates the distinct components that make up your analysis app Cloud Environment. You can customize the ones in white boxes. Note: Software can only be customized for Jupyter.
Your laptop in the cloud is unique to you and inaccessible to colleagues
Each shared Workspace has its own shared infrastructure (data tables and workspace Bucket, for example), but each user configures their own laptop in the cloud resources when starting a new interactive analysis session. Your Cloud Environment resources (including persistent disk storage) are not transferable to other users, even those in a shared workspace.
Customizable components
Cloud Environment VM (compute profile) | Software (application configuration) | Storage (persistent disk)
VM (compute profile)
The compute profile is the CPU and RAM available to run your application, which determines how much processing can be done at a time. To balance cost and functionality, you can customize the compute profile in all analysis apps on Terra. You can also enable GPUs.
Estimating your computational needs
Featured and template workspaces and notebooks include recommended and project-specific compute profiles, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can start with an initial estimate and dial it up or down as needed. Just be careful to save any generated data you want to keep when re-creating the Cloud Environment (see below).
Compute power comes with a cost! Find the balance that's right for you. More compute power costs more, and you don't want to request (and pay for) significantly more than your computation needs. Running a high-powered notebook costs a certain amount per unit time no matter what computations it does. You don't need (or want to pay for) a high-performance parallel Spark cluster if you're running a simple, nonparallel computation.
To learn more, see Overview: Controlling Cloud costs on Terra.
If your analysis is running slow
This could mean the CPUs and memory allotted are insufficient for the computations you're doing. You can test this by rerunning part of the analysis with more CPUs or RAM.
Saving time versus money
It may be worth the cost (Google Cloud fees) of increasing the compute power so your analysis finishes sooner (see the blog post Understand hardware and regulatory limits and trade-offs).
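The trade-off above can be worked through with simple arithmetic: a VM costs its hourly rate for every hour it runs, whatever it is computing. The sketch below compares two compute profiles. The function names, hourly rates, and runtimes are hypothetical values chosen for illustration; use the costs shown in the Cloud Environment form for real numbers.

```python
def total_cost(hourly_rate_usd: float, runtime_hours: float) -> float:
    """Cost of keeping a VM running for runtime_hours at a fixed hourly rate."""
    if hourly_rate_usd < 0 or runtime_hours < 0:
        raise ValueError("rate and runtime must be non-negative")
    return hourly_rate_usd * runtime_hours


def compare_profiles(small_rate: float, small_hours: float,
                     large_rate: float, large_hours: float):
    """Return (extra_cost_usd, hours_saved) for choosing the larger profile."""
    extra_cost = total_cost(large_rate, large_hours) - total_cost(small_rate, small_hours)
    hours_saved = small_hours - large_hours
    return round(extra_cost, 2), hours_saved


# Hypothetical example: a small VM at $0.10/hour needs 10 hours,
# a large VM at $0.40/hour needs 3 hours for the same analysis.
extra, saved = compare_profiles(0.10, 10.0, 0.40, 3.0)
print(f"Larger profile costs ${extra:.2f} more and saves {saved:.1f} hours")
```

With these made-up numbers the larger profile costs slightly more in total but returns results much sooner. If the larger profile did not shorten the runtime at all, it would simply cost more.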
Compute Location
Your analysis app VM runs in a Google Cloud location. By default, Cloud Environments will run in the us-central1 region. If your workspace Bucket is located outside the US, you can modify the location of your Cloud Environment.
Recommended best practice is to choose the same location for your workspace bucket and Cloud Environment to minimize cross-region egress costs. To learn more, see US regional versus Multi-regional US buckets: trade-offs.
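As a minimal sketch of this check, the snippet below compares a bucket location with a Cloud Environment region before you create the environment. The function name is hypothetical, and the comparison only handles single regions (it treats any multi-regional bucket location as a mismatch); it is not part of Terra.

```python
def is_same_region(bucket_location: str, environment_region: str) -> bool:
    """True when both names refer to the same single region.

    The comparison ignores case and surrounding whitespace, since tools
    may report a location as "US-CENTRAL1" or "us-central1".
    """
    return bucket_location.strip().lower() == environment_region.strip().lower()


# Example values; substitute the locations shown for your own workspace.
for bucket, region in [("US-CENTRAL1", "us-central1"), ("EUROPE-WEST1", "us-central1")]:
    status = "same region" if is_same_region(bucket, region) else "different regions"
    print(f"bucket={bucket}, environment={region}: {status}")
```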
The location of a Cloud Environment cannot be changed once created
If you want a new location, you must create a new Cloud Environment for your laptop in the cloud.
Software (application configuration)
The application configuration is the sum total of software, packages, libraries, and dependencies that are preinstalled in the Cloud Environment container. These fall into two distinct categories: the Interactive Analysis Operating System as well as user-managed software.
Interactive Analysis (IA) operating system (OS)
This includes the preinstalled (that is, Terra-installed) sum total of software, packages, libraries, and dependencies. The IDEs (Jupyter, RStudio) fit into this category, as do base libraries like Python or R.
Note: this classification currently includes many third-party software packages, such as Bioconductor.
User-managed software
This includes all user-installed software, packages, libraries, and dependencies - what the users install to extend their “application configuration” beyond the default.
Note: this classification currently includes third-party software such as plink.
Your analysis app Cloud Environment is unique to you!
Your colleagues might not have the same software packages (if you're running a Jupyter app) or be able to access anything stored in your Cloud Environment Persistent Disk (i.e., generated data) - even in a shared workspace.
To learn how to standardize software when running Jupyter, see Starting and customizing your Jupyter app. To learn more about how to share data generated in an interactive app, see How and why to save data generated in a notebook or RStudio to your workspace bucket.
Jupyter software
Jupyter environments are fully customizable with both preconfigured and custom environment options. To learn more, see Starting and customizing your Jupyter app.
RStudio software
RStudio uses the community-maintained RStudio image (R, Bioconductor and Python). To learn more, see Starting and customizing your RStudio app.
Galaxy software
Galaxy instances use the latest version of Galaxy. To learn more, see Starting and customizing Galaxy on Terra.
Storage (Persistent Disk)
Your Cloud Environment comes with Persistent Disk (PD) storage by default, which stores files in your Cloud Environment even after you delete the VM or cluster. You can keep the PD when you delete your Cloud Environment and reattach it to a newly created VM.
What's stored in the PD
- User-managed packages like plink
- Analysis files themselves (‘.ipynb’ and ‘.R’ files, for example)
- Input files necessary for analysis
- Generated outputs
Using the persistent disk as storage lets you keep your notebook code, packages, input files necessary for analysis, and generated outputs - without having to move anything to permanent cloud storage.
Data in PD is not available outside the user-specific Cloud Environment
Because the PD is not accessible from outside each individual's laptop in the cloud (the Cloud Environment), data generated in a notebook cannot be used as input for a workflow analysis, and it's not accessible by other collaborators in a shared workspace.
The one exception is the Analysis files themselves (.ipynb and .R files, for example), which Terra automatically syncs to the workspace Bucket to make them shareable. All other PD data must be manually migrated to the workspace Bucket to be used as input for a workflow analysis.
The data in the PD is also app-specific (as well as user-specific). You cannot port data from your Jupyter PD to your RStudio PD, for example.
To learn more about saving data generated in a notebook to permanent cloud storage (for access outside the Cloud Environment or archiving), see How (and why) to save data generated in a notebook to a Workspace bucket.
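As a rough sketch of the manual migration described above, the snippet below builds a `gsutil cp` command that copies a folder from the persistent disk to the workspace Bucket. It assumes that `gsutil` is installed in the Cloud Environment and that the bucket URL is available in a `WORKSPACE_BUCKET` environment variable; the local folder `results` and the destination prefix `notebook-outputs` are hypothetical names.

```python
import os
import subprocess
from typing import List


def build_copy_command(local_path: str, bucket_url: str,
                       dest_prefix: str = "notebook-outputs") -> List[str]:
    """Build a recursive gsutil copy command from the PD to the bucket."""
    if not bucket_url.startswith("gs://"):
        raise ValueError("bucket_url must start with gs://")
    destination = f"{bucket_url.rstrip('/')}/{dest_prefix.strip('/')}/"
    return ["gsutil", "cp", "-r", local_path, destination]


if __name__ == "__main__":
    # Assumption: the environment exposes the workspace Bucket URL here.
    bucket = os.environ.get("WORKSPACE_BUCKET")
    if bucket is None:
        print("WORKSPACE_BUCKET is not set; nothing copied.")
    else:
        # "results" is a hypothetical folder of generated outputs on the PD.
        subprocess.run(build_copy_command("results", bucket), check=True)
```

Building the command separately from running it makes it easy to print and inspect the destination before any data is copied.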
To see your Cloud Environment configuration
Each user's Cloud Environment settings can be different
To ensure a consistent analysis environment across all team Cloud Environments, we strongly recommend using one of the default Cloud Environments, or using a startup script or custom Docker image to standardize your analysis environment.
1. Go to the Analyses tab and click the cloud icon in the right sidebar.
2. In the Cloud Environment Details pane, select the gear icon for the application you are running (or want to run).
Customizing your analysis app Cloud Environment
If no analysis app is running, you'll see the app's default Cloud Environment. The default environment will allow you to run many analyses without any changes.
- The default Jupyter environment includes up-to-date versions of GATK, Python and R, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed near the bottom of the form.
To customize your Jupyter VM (including using clusters!), you'll use the Customize button at the bottom right of the form. For step-by-step instructions, see Starting and customizing your Jupyter app.
- The default RStudio environment includes up-to-date versions of RStudio, Bioconductor, and Python, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed in a blue section near the bottom of the form.
You can customize your RStudio Cloud Environment by selecting the Customize button at the bottom right. For step-by-step instructions, see Creating and customizing your RStudio VM.
- The default Galaxy environment includes an up-to-date version of Galaxy, 1 node with 8 CPUs, 52 GB of RAM, and 500 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed in blue at the top of the form.
You can change the Galaxy compute profile right in this pane. For step-by-step instructions, see Starting and customizing Galaxy on Terra.
Be aware of laptop in the cloud cost!
Remember that you will be charged for Cloud Environments as long as they exist, whether or not you run calculations! Terra has an autopause feature for Jupyter and RStudio Cloud Environments that prevents your laptop in the cloud from running if you're not doing any calculations (see Preventing runaway costs with autopause for more details).
Note that Galaxy has no autopause feature! You can check the status of all your Cloud Environments and persistent disks on the Cloud Environments page.
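To see why an idle Cloud Environment still costs money, the sketch below adds up a month of charges from the three components shown in the cost section of the form: the VM while running, the VM while paused, and the persistent disk. All rates are hypothetical placeholders, and the calculation assumes the environment exists for the whole month and is paused whenever it is not running.

```python
HOURS_PER_MONTH = 730  # approximate number of hours in a month


def monthly_cost(running_rate: float, paused_rate: float,
                 disk_rate: float, running_hours: float) -> float:
    """Estimate one month of charges (USD) for an environment that is never deleted.

    All three rates are per hour. The persistent disk is billed for every
    hour of the month, whether the VM is running or paused.
    """
    if not 0 <= running_hours <= HOURS_PER_MONTH:
        raise ValueError("running_hours must be between 0 and HOURS_PER_MONTH")
    paused_hours = HOURS_PER_MONTH - running_hours
    total = (running_rate * running_hours
             + paused_rate * paused_hours
             + disk_rate * HOURS_PER_MONTH)
    return round(total, 2)


# Hypothetical rates: $0.10/hour running, $0.01/hour paused, $0.005/hour disk.
print("40 hours of use:", monthly_cost(0.10, 0.01, 0.005, 40))
print("never used:     ", monthly_cost(0.10, 0.01, 0.005, 0))
```

Even with zero running hours the estimate is above zero, which is the point of the warning above: deleting resources you no longer need is the only way to stop the charges.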