Your interactive analysis VM (Cloud Environment)

Allie Hajian

Interactive analysis apps - Jupyter Notebooks, RStudio, and Galaxy - run on virtual machines (VMs) or clusters of machines in your workspace Cloud Environment. This article provides an overview of the Cloud Environment components that run your analyses.

To learn how to create and customize your analysis app VM, see articles for Jupyter, RStudio, and Galaxy.

Analysis components overview

Integrated analysis apps (Galaxy, Jupyter, and RStudio) run in a workspace Cloud Environment virtual machine (VM). The diagram below illustrates the distinct components that make up your analysis app cloud environment. You can customize the ones in white boxes. Note: Software can only be customized for Jupyter.

Diagram showing two cloud environments in a workspace, each with a fixed boot disk plus customizable VM, software, and persistent disk storage. The workspace also includes shared workspace storage in the form of a Google bucket

Read on for more details about the customizable components.

Your analysis app Cloud Environment is unique to you! Your colleagues might not have the same software packages (if you're running a Jupyter app) or be able to access anything stored in your Cloud Environment Persistent Disk (i.e., generated data) - even in a shared workspace.

To learn how to standardize software when running Jupyter, see Starting and customizing your Jupyter app. To learn more about how to share data generated in an interactive app, see How and why to save data generated in a notebook or RStudio to your workspace bucket.

Cloud environment VM (compute profile)

The compute profile is the CPU and RAM available to run your application, which determines how much processing can be done at a time. To balance cost and functionality, you can customize the compute profile in all analysis apps on Terra.

Estimating your computational needs

Featured and template workspaces and notebooks include recommended and project-specific configurations, as well as estimated costs to run (where possible). Since it is fairly straightforward to adjust the compute power, you can start with an initial estimate and dial it up or down as needed. Just be careful to save any generated data you want to keep when re-creating the Cloud Environment (see below).

Compute power comes with a cost! Find the balance that's right for you. More compute power costs more, and you don't want to request (and pay for) significantly more than your computation needs. Running a high-powered notebook costs a certain amount per unit time no matter what computations it does. You don't need (or want to pay for) a high-performance parallel Spark cluster if you're running a simple, nonparallel computation.

To learn more, see Overview: Controlling Cloud costs on Terra.

If your analysis is running slowly

This could mean the CPUs and memory allotted are insufficient for the computations you're doing.

Saving time versus money

It may be worth the cost (Google Cloud fees) of increasing the compute power so that your analysis completes more quickly (see the blog post Understand hardware and regulatory limits and trade-offs).
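As a rough illustration of this trade-off, the sketch below compares the total cost of running the same job on a smaller and a larger VM. The hourly rates and run times are made-up placeholders, not Google Cloud prices, and `run_cost` is a hypothetical helper. Substitute the cost shown in the Cloud Environment form and your own timing estimates.

```python
def run_cost(hourly_rate_usd: float, hours: float) -> float:
    """Cost of keeping a VM running for the given number of hours."""
    return hourly_rate_usd * hours


# Hypothetical numbers for illustration only (not real Google Cloud prices).
small = run_cost(hourly_rate_usd=0.10, hours=10.0)  # slower, cheaper per hour
large = run_cost(hourly_rate_usd=0.40, hours=2.0)   # faster, pricier per hour

print(f"small VM: ${small:.2f} for a 10-hour run")
print(f"large VM: ${large:.2f} for a 2-hour run")
```

With these placeholder numbers the larger VM is both faster and cheaper overall, but that only holds when the analysis actually speeds up enough to offset the higher hourly rate.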

Compute Location

Your analysis app VM runs in a Google Cloud location. By default, Cloud Environments will run in the us-central1 region. If your workspace bucket is located outside the US, you can modify the location of your Cloud Environment.

Recommended best practice is to choose the same location for your workspace bucket and Cloud Environment to minimize cross-region egress costs. To learn more, see US regional versus Multi-regional US buckets: trade-offs.
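A minimal sketch of this check, assuming you already know the location name of your workspace bucket and the region of your Cloud Environment: the helper `same_region` is hypothetical and simply compares the two names. It does not handle multi-region buckets, which need the trade-off discussion linked above.

```python
def same_region(bucket_location: str, compute_region: str) -> bool:
    """True when the two location names match, ignoring case and whitespace."""
    return bucket_location.strip().lower() == compute_region.strip().lower()


# Example location names; replace them with your own bucket and compute locations.
print(same_region("US-CENTRAL1", "us-central1"))
print(same_region("EUROPE-WEST1", "us-central1"))
```

A mismatch is a prompt to review the location settings before creating the Cloud Environment, since the location cannot be changed afterwards.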

The location of a Cloud Environment cannot be changed once created

If you want a new location, you must create a new Cloud Environment.

Software (application configuration) 

The application configuration is the sum total of software, packages, libraries, and dependencies that are preinstalled in the Cloud Environment container.

Jupyter software

Jupyter environments are fully customizable with both preconfigured and custom environment options. To learn more, see Starting and customizing your Jupyter app.

RStudio software

RStudio uses the community-maintained RStudio image (R, Bioconductor and Python). To learn more, see Starting and customizing your RStudio app.

Galaxy software

Galaxy instances use the latest version of Galaxy. To learn more, see Starting and customizing Galaxy on Terra.

Storage (Persistent Disk)

Your Cloud Environment comes with Persistent Disk (PD) storage by default, which stores files in your Cloud Environment even after you delete the VM or cluster. The PD can be kept when you delete your Cloud Environment and reattached to a newly created VM.

The persistent disk lets you keep your notebook code packages, input files necessary for analysis, and generated outputs - without having to move anything to permanent cloud storage.

Data in PD is not available outside the user's Cloud Environment

Because the PD is not accessible from outside the Cloud Environment, data generated in a notebook cannot be used as input for a workflow analysis, and it is not accessible to other collaborators using a shared workspace.

To learn more about saving data generated in a notebook to permanent cloud storage (for access outside the Cloud Environment or archiving), see How (and why) to save data generated in a notebook to a Workspace bucket.
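As a hedged sketch of what such a copy might look like from a notebook, the code below builds a `gsutil cp` command that would copy one file from the persistent disk to the workspace bucket. It assumes the bucket URL is available in a `WORKSPACE_BUCKET` environment variable and that `gsutil` is installed in the environment. The helper `bucket_copy_command`, the `notebook-outputs` folder, the file path, and the fallback bucket name are all hypothetical. The linked article describes the supported procedure.

```python
import os


def bucket_copy_command(local_path: str, bucket: str,
                        folder: str = "notebook-outputs") -> list[str]:
    """Build a `gsutil cp` command that copies one local file into the bucket."""
    destination = f"{bucket.rstrip('/')}/{folder}/{os.path.basename(local_path)}"
    return ["gsutil", "cp", local_path, destination]


# Assumption: the bucket URL is exposed as WORKSPACE_BUCKET.
# The fallback value is a placeholder, not a real bucket.
bucket = os.environ.get("WORKSPACE_BUCKET", "gs://example-workspace-bucket")
command = bucket_copy_command("results/summary.csv", bucket)
print(" ".join(command))
# To actually perform the copy, pass `command` to subprocess.run(command, check=True).
```

Building the command as a list and printing it first lets you confirm the destination before anything is copied.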

To see your Cloud Environment configuration

Each user's Cloud Environment settings can be different

To ensure a consistent analysis environment across all team Cloud Environments, we strongly recommend using one of the default Cloud Environments, or using a startup script or custom Docker image to standardize your analysis environment.

1. Go to the Analyses tab and click the cloud icon in the right sidebar. 

Screenshot highlighting the analyses tab in a Terra workspace with an arrow pointing to the cloud icon in the right sidebar

2. In the Cloud Environment Details pane, select the gear icon for the application you are running (or want to run).

Screenshot of the Cloud Environment Details pane highlighting the gear icons and logos for Jupyter, RStudio, and Galaxy

Customizing your analysis app Cloud Environment

If no analysis app is running, the pane shows the app's default Cloud Environment. The default environment will allow you to run many analyses without any changes.

  • The default Jupyter environment includes up-to-date versions of GATK, Python and R, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.

    The cost of the components (VM when running or paused, or the persistent disk) is displayed near the bottom of the form.

    Screenshot of the default Jupyter cloud environment pane listing the default values for software, VM and persistent disk sizes and types, and an arrow pointing to the blue customize button at the bottom right that allows you to create a custom environment

    To customize your Jupyter VM (including using clusters!), you'll use the Customize button at the bottom right of the form. For step-by-step instructions, see Starting and customizing your Jupyter app.

  • The default RStudio environment includes up-to-date versions of RStudio, Bioconductor, and Python, 1 CPU with 3.75 GB of RAM, and 50 GB of persistent disk storage.
    Screenshot of default RStudio cloud environment pane listing the default values for software, VM and persistent disk sizes and types and location

    The cost of the components (VM when running or paused, or the persistent disk) is displayed in a blue section near the bottom of the form.

    You can customize your RStudio Cloud Environment by selecting the Customize button at the bottom right. For step-by-step instructions, see Creating and customizing your RStudio VM.

  • The default Galaxy environment includes an up-to-date version of Galaxy, 1 node with 8 CPUs, 52 GB of RAM, and 500 GB of persistent disk storage.
    Screenshot of Galaxy Cloud Environment pane with the default Galaxy software and fields to select the cloud compute profile and persistent disk size and type

    The cost of the components (VM when running or paused, or the persistent disk) is displayed in blue at the top of the form.

    You can change the Galaxy compute profile right in this pane. For step-by-step instructions, see Starting and customizing Galaxy on Terra.

Be aware of cost! Remember that you will be charged for Cloud Environments as long as they exist, whether or not you run calculations! You can check the status of all your Cloud Environments and persistent disks on the Cloud Environments page.

Additional resources

Learn more about Terra's Jupyter Notebook environment
