Interactive analysis apps - Jupyter Notebooks, RStudio, and Galaxy - run on virtual machines (VMs) or clusters of machines in your workspace Cloud Environment. This article provides an overview of the Cloud Environment components that run your analyses. To learn how to create and customize your analysis app VM, see articles for Jupyter, RStudio and Galaxy.
Analysis components overview
Integrated analysis apps (Galaxy, Jupyter and RStudio) run in a workspace Cloud Environment virtual machine (VM). The diagram below illustrates the distinct components that make up your analysis app cloud environment. You can customize the ones in white boxes. Note: Software can only be customized for Jupyter.
Read on for more details about the customizable components.
- Cloud Environment VM (compute profile)
- Software (application configuration)
- Storage (persistent disk)
Your analysis app Cloud Environment is unique to you! Your colleagues might not have the same software packages (if you're running a Jupyter app) or be able to access anything stored in your Cloud Environment Persistent Disk (i.e., generated data) - even in a shared workspace.
To learn how to standardize software when running Jupyter, see Starting and customizing your Jupyter app. To learn more about how to share data generated in an interactive app, see How and why to save data generated in a notebook or RStudio to your workspace bucket.
Cloud environment VM (compute profile)
The compute profile is the CPU and RAM available to run your application, which determines how much processing can be done at a time. To balance cost and functionality, you can customize the compute profile in all analysis apps on Terra.
Estimating your computational needs
Featured and template workspaces and notebooks include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can start with an initial estimate and dial it up or down as needed. Just be careful to save any generated data you want to keep before re-creating the cloud environment (see below).
Compute power comes with a cost! Find the balance that's right for you. Note: More compute power costs more, and you don't want to request (and pay for) significantly more than your computation needs. Running a high-powered notebook costs a certain amount per unit time no matter what computations it does. You don't need (or want to pay for) a high-performance parallel Spark cluster if you're running a simple, nonparallel computation.
To learn more, see Overview: Controlling Cloud costs on Terra.
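As a rough illustration of the point above, the sketch below estimates a monthly bill from two billed components: VM running time and persistent disk storage. The rates are placeholder values, not actual Google Cloud or Terra prices, and the function name `estimate_monthly_cost` is hypothetical. It also ignores the (different) cost of a paused VM. Check the cost displayed in the Cloud Environment form for real figures.

```python
def estimate_monthly_cost(vm_rate_per_hour, hours_running, disk_gb, disk_rate_per_gb_month):
    """Return (compute_cost, storage_cost, total) in dollars for one month.

    The VM is billed for the time it runs, regardless of what it computes.
    The persistent disk is billed for its size for as long as it exists.
    """
    if min(vm_rate_per_hour, hours_running, disk_gb, disk_rate_per_gb_month) < 0:
        raise ValueError("rates, hours, and disk size must be non-negative")
    compute_cost = vm_rate_per_hour * hours_running
    storage_cost = disk_gb * disk_rate_per_gb_month
    return compute_cost, storage_cost, compute_cost + storage_cost


# Placeholder rates (assumptions for illustration, NOT real prices).
SMALL_VM_RATE = 0.06   # dollars per hour, hypothetical
LARGE_VM_RATE = 0.48   # dollars per hour, hypothetical
DISK_RATE = 0.04       # dollars per GB per month, hypothetical

# Same 40 hours of use and the same 50 GB disk on two compute profiles.
small = estimate_monthly_cost(SMALL_VM_RATE, 40, 50, DISK_RATE)
large = estimate_monthly_cost(LARGE_VM_RATE, 40, 50, DISK_RATE)

print(f"small VM total: ${small[2]:.2f}")
print(f"large VM total: ${large[2]:.2f}")
```

With the same hours of use, the larger profile costs more even if the computation never uses the extra CPUs, which is why it pays to size the compute profile to the work.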
If your analysis is running slow
This could mean the CPUs and memory allotted are insufficient for the computations you're doing.
Saving time versus money
It may be worth the cost (Google Cloud fees) of increasing the compute power so that your analysis finishes sooner (see the blog post Understand hardware and regulatory limits and trade-offs).
Your analysis app VM runs in a Google Cloud location. By default, Cloud Environments will run in the us-central1 region. If your workspace bucket is located outside the US, you can modify the location of your Cloud Environment.
Recommended best practice is to choose the same location for your workspace bucket and Cloud Environment to minimize cross-region egress costs. To learn more, see US regional versus Multi-regional US buckets: trade-offs.
Note: The location of a Cloud Environment cannot be changed once created. If you want a new location, you must create a new Cloud Environment.
Software (application configuration)
The application configuration is the sum total of software, packages, libraries, and dependencies that are preinstalled in the Cloud Environment container.
Jupyter environments are fully customizable with both preconfigured and custom environment options. To learn more, see Starting and customizing your Jupyter app.
RStudio uses the community-maintained RStudio image (R, Bioconductor and Python). To learn more, see Starting and customizing your RStudio app.
Galaxy instances use the latest version of Galaxy. To learn more, see Starting and customizing Galaxy on Terra.
Storage (Persistent Disk)
Your Cloud Environment comes with Persistent Disk (PD) storage by default, which keeps the files in your Cloud Environment even after you delete the VM or cluster. You can keep the PD when you delete your Cloud Environment and reattach it to a newly created VM.
The persistent disk lets you keep your notebook code packages, input files necessary for analysis, and generated outputs - without having to move anything to permanent cloud storage.
Note: Data on the PD is not accessible from outside your Cloud Environment. Data generated in a notebook therefore cannot be used as input for a workflow analysis, and it is not accessible to collaborators in a shared workspace.
To learn more about saving data generated in a notebook to permanent cloud storage (for access outside the Cloud Environment or archiving), see How (and why) to save data generated in a notebook to a Workspace bucket.
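For illustration, copying a generated file from the persistent disk to the workspace bucket might look like the sketch below. It only builds a `gsutil cp` command, so you can inspect it before running it. It assumes the bucket URL is available in a `WORKSPACE_BUCKET` environment variable inside the notebook. The helper name `build_copy_command`, the `notebook-outputs` folder, and the example paths are all hypothetical.

```python
import os


def build_copy_command(local_path, bucket=None, dest_prefix="notebook-outputs"):
    """Build a gsutil command that copies one local file into a bucket folder.

    bucket: a gs:// URL; if omitted, falls back to the WORKSPACE_BUCKET
            environment variable (an assumption about the notebook environment).
    dest_prefix: hypothetical folder name inside the bucket.
    """
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise ValueError("no bucket given and WORKSPACE_BUCKET is not set")
    if not bucket.startswith("gs://"):
        raise ValueError("bucket must be a gs:// URL")
    file_name = os.path.basename(local_path)
    destination = f"{bucket.rstrip('/')}/{dest_prefix.strip('/')}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


# Example with a made-up bucket and file; nothing is copied here.
command = build_copy_command("results/summary.tsv", bucket="gs://example-workspace-bucket")
print(" ".join(command))

# To actually run the copy from a notebook, you could pass the list to
# subprocess.run(command, check=True).
```

Once a file is in the workspace bucket, it is no longer tied to your personal persistent disk, so collaborators and workflows can reach it.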
To see your Cloud Environment configuration
Each user's Cloud Environment settings can be different. To ensure a consistent analysis environment across all team Cloud Environments, we strongly recommend using one of the default Cloud Environments, or using a startup script or custom Docker image to standardize your analysis environment.
Jupyter and RStudio
Click the gear icon at the top right of any workspace to see the current Cloud Environment details.
If none is running, it will be the default environment, which can be customized.
Galaxy
In the Notebooks tab, click the Create a Cloud Environment for Galaxy button at the left.
1. Go to the Analyses tab and click the cloud icon in the right sidebar.
2. In the Cloud Environment Details pane, select the gear icon for the application you are running (or want to run).
Customizing your analysis app Cloud Environment
If no analysis app is running, you'll see the app's default Cloud Environment. The default environment will allow you to run many analyses without any changes.
The default Jupyter environment includes up-to-date versions of GATK, Python and R, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed near the bottom of the form.
To customize your Jupyter VM (including using clusters!), you'll use the Customize button at the bottom right of the form. For step-by-step instructions, see Starting and customizing your Jupyter app.
The default RStudio environment includes up-to-date versions of RStudio, Bioconductor, and Python, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed in a blue section near the bottom of the form.
You can customize your RStudio Cloud Environment by selecting the Customize button at the bottom right. For step-by-step instructions, see Creating and customizing your RStudio VM.
The default Galaxy environment includes an up-to-date version of Galaxy, 1 node with 8 CPUs, 52 GB of RAM, and 500 GB of persistent disk storage.
The cost of the components (VM when running or paused, or the persistent disk) is displayed in blue at the top of the form.
You can change the Galaxy compute profile right in this pane. For step-by-step instructions, see Starting and customizing Galaxy on Terra.
Be aware of cost! Remember that you will be charged for Cloud Environments as long as they exist, whether or not you run calculations! You can check the status of all your Cloud Environments and persistent disks on the Cloud Environments page.
Learn more about Terra's Jupyter notebook environment
- Key components
- Key operations
- Best Practices