Interactive analysis apps - Jupyter Notebooks, RStudio, and Galaxy - run on virtual machines or clusters of machines in your workspace Cloud Environment. This article gives an overview of the Cloud Environment components that run your analyses. To learn how to create and customize your analysis app VM, see the articles for Jupyter, RStudio, and Galaxy.
Analysis components overview
Integrated analysis apps (Galaxy, Jupyter, and RStudio) run in a workspace Cloud Environment virtual machine (VM). The diagram below illustrates the distinct components that make up your analysis app Cloud Environment. You can customize the ones in white boxes (note that software can only be customized for Jupyter).
Read on for more details about the customizable components.
- Cloud Environment VM (compute profile)
- Software (application configuration)
- Storage (persistent disk)
Your analysis app Cloud Environment is unique to you!
Your colleagues might not have the same software packages (if you're running a Jupyter app) or be able to access anything stored in your Cloud Environment persistent disk (such as generated data), even in a shared workspace.
To learn how to standardize software when running Jupyter, see Starting and customizing your Jupyter app. To learn more about how to share data generated in an interactive app, see How and why to save data generated in a notebook or RStudio to your workspace bucket.
Cloud environment VM (compute profile)
The compute profile is the CPU and RAM available to run your application, which determines how much processing can be done at a time. To balance cost and functionality, you can customize the compute profile in all analysis apps on Terra.
Estimating your computational needs
Featured and template workspaces and notebooks include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can start with an initial estimate and then dial it up or down as needed. Be sure to save any generated data you want to keep before recreating the Cloud Environment (see below).
Compute power comes with a cost! Find the balance that's right for you.
More compute power costs more, so avoid requesting (and paying for) significantly more than your computation needs. A high-powered notebook VM costs the same amount per unit time no matter what computations it runs. You don't need (or want to pay for) a high-performance parallel Spark cluster if you're running a simple, non-parallel computation.
To learn more, see Understanding and controlling Cloud costs.
If your analysis is running slow
This could mean the CPUs and memory allotted are insufficient for the computations you're doing.
Saving time versus saving money
It may be worth the cost (GCP fees) of increasing the compute power so your analysis finishes sooner (see the blog post Understand hardware and regulatory limits and trade-offs).
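The trade-off can be put into numbers with a back-of-the-envelope comparison. The sketch below is illustrative only: the hourly rates and runtimes are made-up placeholders, not Terra or GCP prices, so substitute the cost shown in the Cloud Environment form and your own runtime estimates.

```python
# Rough comparison of two compute profiles.
# All rates and runtimes below are hypothetical placeholders.

def run_cost(hourly_rate_usd: float, runtime_hours: float) -> float:
    """Cost of keeping a VM running for the given time, whatever it computes."""
    return hourly_rate_usd * runtime_hours

# A small profile that needs 10 hours versus a large one that needs 2 hours.
small = run_cost(hourly_rate_usd=0.06, runtime_hours=10.0)
large = run_cost(hourly_rate_usd=0.40, runtime_hours=2.0)

print(f"small profile: ${small:.2f} for 10 h of waiting")
print(f"large profile: ${large:.2f} for 2 h of waiting")
```

With these placeholder numbers the larger profile costs a little more in total but returns results eight hours sooner. Whether that is worth it depends on your deadline and budget.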
Your analysis app VM runs in a Google Cloud Platform (GCP) location. By default, Cloud Environments will run in the us-central1 region. If your workspace bucket is located outside of the US, you will be able to modify the location of your Cloud Environment.
As a best practice, choose the same location for your workspace bucket and Cloud Environment to minimize cross-region egress costs. To learn more, see US regional versus Multi-regional US buckets: trade-offs.
Note that the location of a Cloud Environment cannot be changed once it is created. To use a different location, you must create a new Cloud Environment.
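Because the location is fixed at creation time, it can help to compare the two locations before you create the Cloud Environment. The helper below is a hypothetical sketch, not part of Terra: it only checks for an exact regional match (ignoring letter case) and does not try to reason about multi-regional buckets, which the trade-offs article covers.

```python
def locations_match(bucket_location: str, environment_region: str) -> bool:
    """Return True when the bucket and the Cloud Environment name the same region.

    Hypothetical helper: compares the two names ignoring case and surrounding
    whitespace. Multi-regional bucket locations are not handled here.
    """
    return bucket_location.strip().lower() == environment_region.strip().lower()

# The region names below are examples only.
print(locations_match("US-CENTRAL1", "us-central1"))
print(locations_match("europe-west1", "us-central1"))
```

A mismatch does not stop anything from running. It is only a signal that moving data between the bucket and the VM may incur egress charges.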
Software (application configuration)
The application configuration is the sum total of software, packages, libraries, and dependencies that are pre-installed in the Cloud Environment container.
Jupyter environments are fully customizable with both pre-configured and custom environment options. To learn more, see Starting and customizing your Jupyter app.
RStudio uses the community-maintained RStudio image (R, Bioconductor and Python). To learn more, see Starting and customizing your RStudio app.
Galaxy instances use the latest version of Galaxy. To learn more, see Starting and customizing Galaxy on Terra.
Storage (Persistent Disk)
By default, your Cloud Environment comes with persistent disk (PD) storage, which keeps your files even after you delete the VM or cluster. You can keep the PD when you delete your Cloud Environment and reattach it to a newly created VM.
The persistent disk lets you keep the packages your notebook code is built upon, input files necessary for your analysis, and generated outputs - without having to move anything to permanent cloud storage.
Data in PD is not available outside your individual Cloud Environment
Because the PD is not accessible from outside the Cloud Environment, data generated in a notebook cannot be used as input for a workflow analysis, and it is not accessible to collaborators using a shared workspace.
To learn more about saving data generated in a notebook to permanent cloud storage (for access outside the Cloud Environment or archiving), see How (and why) to save data generated in a notebook to a Workspace bucket.
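As a rough illustration of what that copy can look like from a notebook, the sketch below builds a gsutil cp command that sends one file from the persistent disk to the workspace bucket. It assumes the Cloud Environment exposes the bucket path in a WORKSPACE_BUCKET environment variable and that gsutil is installed; the file name, the notebook-outputs folder, and the fallback bucket name are placeholders. The linked article remains the authoritative procedure.

```python
import os
from typing import List

def build_copy_command(local_path: str, bucket: str,
                       prefix: str = "notebook-outputs") -> List[str]:
    """Build a gsutil command that copies one local file into the bucket.

    The prefix is a hypothetical folder name chosen for this example.
    """
    destination = f"{bucket.rstrip('/')}/{prefix}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]

# Assumption: WORKSPACE_BUCKET is set inside the Cloud Environment.
# The fallback value is a placeholder, not a real bucket.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://your-workspace-bucket")
command = build_copy_command("results/summary.tsv", bucket)
print(" ".join(command))

# To run the copy for real (requires gsutil and access to the bucket):
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command separately from running it lets you inspect the destination path before any data moves.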
To see your Cloud Environment configuration
1. Go to the Analyses tab and click the cloud icon in the right sidebar.
2. In the Cloud Environment Details pane, select the gear icon for the application you are running (or want to run).
Customizing your analysis app Cloud Environment
If no analysis app is running, you'll see the app's default Cloud Environment. The default environment will allow you to run many analyses without any changes.
Each collaborator's Cloud Environment settings can be different
To ensure a consistent analysis environment across all team Cloud Environments, we strongly recommend using one of the default Cloud Environments, or using a startup script or custom Docker image to standardize your analysis environment.
The default Jupyter environment includes up-to-date versions of GATK, Python and R, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed near the bottom of the form.
To customize your Jupyter VM (including using clusters!), you'll use the Customize button at the bottom right of the form. For step-by-step instructions, see Creating and customizing your Jupyter VM.
The default RStudio environment includes up-to-date versions of RStudio, Bioconductor, and Python, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed in a blue section near the bottom of the form.
You can customize your RStudio Cloud Environment by selecting the Customize button at the bottom right. For step-by-step instructions, see Creating and customizing your RStudio VM.
The default Galaxy environment includes an up-to-date version of Galaxy, 1 node with 8 CPUs, 52 GB of RAM, and 500 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed in blue at the top of the form.
You can change the Galaxy compute profile right in this pane. For step-by-step instructions, see Starting and customizing Galaxy on Terra.
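The default compute and storage figures quoted above can be collected into a small lookup to check quickly whether a default is likely to be enough. This is only a sketch: the DefaultEnvironment class and the default_is_enough function are hypothetical names, and the comparison is deliberately crude (it ignores memory and disk used by the operating system and installed software).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DefaultEnvironment:
    cpus: int
    ram_gb: float
    persistent_disk_gb: int

# Values as stated in this article.
DEFAULTS = {
    "Jupyter": DefaultEnvironment(cpus=1, ram_gb=3.75, persistent_disk_gb=50),
    "RStudio": DefaultEnvironment(cpus=1, ram_gb=3.75, persistent_disk_gb=50),
    "Galaxy": DefaultEnvironment(cpus=8, ram_gb=52, persistent_disk_gb=500),
}

def default_is_enough(app: str, cpus_needed: int, ram_gb_needed: float,
                      disk_gb_needed: int) -> bool:
    """Crude check of estimated needs against the app's default environment."""
    env = DEFAULTS[app]
    return (cpus_needed <= env.cpus
            and ram_gb_needed <= env.ram_gb
            and disk_gb_needed <= env.persistent_disk_gb)

print(default_is_enough("Jupyter", cpus_needed=1, ram_gb_needed=2.0, disk_gb_needed=20))
print(default_is_enough("Jupyter", cpus_needed=4, ram_gb_needed=16.0, disk_gb_needed=20))
```

If the check fails, that is a hint to customize the compute profile before starting the app rather than after an analysis has stalled.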
Be aware of cost!
Remember that you will be charged for Cloud Environments as long as they exist, regardless of whether you are running any calculations! You can check the status of all of your Cloud Environments and persistent disks on the Cloud Environments page.
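To see why an idle environment still adds up, consider a simple monthly estimate. The sketch below assumes three separate hourly charges (VM running, VM paused, and persistent disk), mirroring the components listed in the cost section of the form. Every rate here is a made-up placeholder, and the 730-hour month is an approximation.

```python
def monthly_cost(running_hours: float, paused_hours: float,
                 running_rate: float, paused_rate: float,
                 disk_rate: float, hours_in_month: float = 730.0) -> float:
    """Estimate one month of charges for a Cloud Environment that is never deleted.

    All rates are hypothetical hourly prices in USD. The disk is billed for
    every hour of the month because it exists whether or not the VM is running.
    """
    vm_cost = running_hours * running_rate + paused_hours * paused_rate
    disk_cost = hours_in_month * disk_rate
    return vm_cost + disk_cost

# 40 hours of actual work, paused for the rest of the month.
estimate = monthly_cost(running_hours=40, paused_hours=690,
                        running_rate=0.10, paused_rate=0.01, disk_rate=0.003)
print(f"estimated monthly cost: ${estimate:.2f}")
```

In this made-up scenario, more than half of the total comes from hours when nothing was being computed, which is why it is worth reviewing the Cloud Environments page regularly.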