Jupyter Notebooks run on virtual machines (VMs) or clusters of machines in your Jupyter Cloud Environment. You can adjust the configuration of your Jupyter app to fit your computational needs. This article gives step-by-step instructions for customizing your Jupyter Cloud Environment virtual machine, installed software, and storage (i.e., Persistent Disk).
Launch a Jupyter Cloud Environment virtual machine (VM)
Follow the step-by-step instructions below.
1. Start in the Analyses tab of your workspace.
2. Click the cloud icon in the right sidebar.
3. In the Cloud Environment Details pane, click the gear icon (Environment settings) under the Jupyter logo. This will surface the Jupyter Cloud Environment default pane (below).
4. Click the Create button to start a Jupyter Cloud Environment with the default settings.
Once you click Create, it will take a few minutes for the Jupyter Cloud Environment to start.
You can also get to the (Jupyter) Cloud Environment pane by clicking the notebook name.
How to customize your Jupyter Cloud Environment
If the default or project-specific environments don't fit your needs, you can customize many aspects of your Jupyter app in Terra:
- VM size, type, and location (compute profile)
- Software (application configuration)
- Size and region of dedicated storage (persistent disk)
You'll specify what you want in the Cloud Environment customization pane (steps below) and let Terra recreate your cloud environment with the new specifications. Scroll down for more details about each customization option.
When can you change your Jupyter app? You can modify your Jupyter Cloud Environment at any time, even if you've already started working in a notebook. See Updating your Jupyter VM in real time (without losing data), below.
Most updates that involve increasing Cloud Environment resources will preserve any previous work. This is why we recommend starting with the minimum resources you think you need and scaling up if it's not enough.
Step 1: Access the Jupyter Cloud Environment customization form
1.1. Start from the Jupyter Cloud Environment pane (steps above).
1.2. If you haven't yet created or customized a cloud environment, you will see the defaults in the pane.
1.3. Select the Customize button at the bottom right.
When you select Customize - or if Jupyter is running already - you'll see the Jupyter Cloud Environment configuration form (screenshot below). Note: It has fewer options and is much simpler to adjust in Terra than the equivalent Google Cloud interface!
Step 2. Choose the software (application configuration)
Terra has several categories of preconfigured software (Jupyter application) setups - plus a custom option - available in the drop-down menu. You will find included versions and libraries in each preconfigured option by clicking the "What’s installed on this environment?" link below the dropdown.
Why use a preconfigured application configuration?
Using the same software application configurations ensures everyone has the same computational environment and gets the same results (when inputting the same data and using the same analysis tools, of course!). The software application configurations in the dropdown are curated and up to date, so if you can use one, it's an easy way to keep collaborators on the same page.
Categories of application configurations
- Terra-maintained Jupyter environments
- Community-maintained Jupyter environments (verified partners)
- Custom environments
Customizing your installed software and packages
If one of the preconfigured application options doesn't meet your needs, you can make your own custom application configuration (i.e., preinstall software and dependencies in the VM) with a Docker image or startup script.
Why use a custom Docker or startup script to install software and dependencies?
If your analysis requires software packages that are not part of the default or preconfigured configurations, you could start your interactive application by installing the ones you need on the Persistent Disk. However, this approach can turn into a maintenance headache if you have multiple notebooks that require the same configuration commands. Besides, it's much harder to make sure all collaborators working on the same project (each with their own Jupyter Cloud Environment) have the same software and dependencies.
See Standardizing a custom RStudio or Jupyter Cloud Environment for more details and step-by-step instructions.
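One lightweight way to check whether collaborators really have the same software is to record installed package versions in each Cloud Environment and compare them. The sketch below is illustrative only and is not part of Terra: `snapshot_versions` and `diff_snapshots` are hypothetical helper names, and the package list is a placeholder. It relies only on Python's standard `importlib.metadata` module.

```python
from importlib import metadata


def snapshot_versions(packages):
    """Return {package: installed version, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def diff_snapshots(mine, theirs):
    """Return {package: (my version, their version)} where the two differ."""
    names = sorted(set(mine) | set(theirs))
    return {
        name: (mine.get(name), theirs.get(name))
        for name in names
        if mine.get(name) != theirs.get(name)
    }


# Hypothetical package list; replace it with the packages your analysis uses.
print(snapshot_versions(["numpy", "pandas"]))
```

Each collaborator can run `snapshot_versions` in their own notebook and share the result; an empty dictionary from `diff_snapshots` means the listed packages match. A custom Docker image or startup script removes the need for this check by making the environments identical from the start.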
Step 3. Adjust the compute power
Continuing down the Jupyter Cloud Environment configuration form, you'll see options for setting up the compute power of your virtual machine. Default values are adequate for many typical analyses. If the defaults are not adequate for your needs, you can select a custom compute and specify the primary machine's CPUs, memory, disk size, compute type, and location. You can also spin up a Spark cluster of parallel machines and specify the number of secondary machines as well as their CPUs, memory, and disk sizes.
To configure custom compute power, follow the steps below.
3.1. In the Cloud compute profile section of the form, choose the specification of your primary machine.
Single machine example custom compute
- CPUs: 8
- Memory (GB): 30
- Disk size (GB): 100
If you only want one virtual machine and no other customizations, you're done!
Spark VM instructions
3.2. To configure as a Spark cluster (for parallel processing), first select Spark cluster from the Compute type list.
3.3. Fill in the values for the Worker config.
Spark cluster example compute values
- Workers: 120
- Preemptibles: 100
- CPUs: 4
- Memory (GB): 15
- Disk size (GB): 500
Finding the VM cost
The cost of the requested compute power will be displayed in a blue section at the top of the form. For example, when requesting a Spark cluster, your screen will look something like this:
Cost-saving recommendations
Size your compute power appropriately
You pay a fixed amount while a notebook is running, whether or not you are doing active calculations. (Note: Terra automatically pauses a notebook after 30 minutes of inactivity.) The cost is based on the compute power of your virtual machine or cluster, not on how much computation is being done. So choose enough power to do your computations in a reasonable amount of time, but not so much that you pay for power you don't use.
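As a rough illustration of why sizing matters, the arithmetic can be sketched as follows. The hourly rates here are made-up placeholders, not real Terra or Google Cloud prices, and `running_cost` is a hypothetical helper; read the actual rate from the cost section at the top of the configuration form.

```python
def running_cost(hourly_rate_usd, hours_running):
    """Cost accrues for every hour the environment runs, busy or idle."""
    return hourly_rate_usd * hours_running


# Hypothetical rates for illustration only (not real prices).
SMALL_VM_RATE = 0.20  # USD per hour
LARGE_VM_RATE = 1.60  # USD per hour

# The same 8-hour session costs the same whether the VM is computing
# or sitting idle, so the larger VM costs more even if it is underused.
small = running_cost(SMALL_VM_RATE, 8)
large = running_cost(LARGE_VM_RATE, 8)
print(f"small VM: ${small:.2f}, large VM: ${large:.2f}")
```

Under these assumed rates, the larger machine costs eight times as much for the same session, regardless of how much of that time is spent on active computation.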
Start small and scale up
Generally, you don't lose data if you increase resources (e.g., CPUs or disk sizes), so it's best to start small and increase as needed.
To learn more about controlling cloud costs in a notebook, see Controlling cloud costs - sample use cases.
Step 4 (Optional): Other Cloud Environment customizations
Below are a number of additional customizations you can make to your Jupyter Cloud Environment.
Terra supports the use of graphics processing units (GPUs) - special processing units optimized for linear algebra computations, such as matrix multiplication - when using Jupyter Notebook Cloud Environments. To learn more, see Getting started with GPUs in a Jupyter Cloud Environment.
Jupyter Cloud Environments will automatically pause when there is no web browser or kernel activity for 30 minutes. To learn more about how autopause on Terra works by default - and how and why you can manually override the default settings - see Preventing runaway costs with Cloud Environment autopause.
VM location (Google Cloud region)
Your Cloud Environment VM will default to the workspace bucket region, but you can choose a different location in the configuration pane. To learn more, see Customizing where your data are stored and analyzed.
Note: If you change the location, you may incur egress charges if your bucket location and interactive analysis Cloud Environment location are different.
Persistent Disk size and type
If the default PD is too large (and you don't want to pay for the extra) or too small (and you need more), you can adjust the size in the Jupyter Cloud Environment setup form.
You can also choose between a standard disk and a solid-state disk (SSD). SSDs cost more but read and write data faster, and the increased speed may be worth the cost. See Detachable persistent disks to learn more about detachable persistent disks for notebook applications in Terra.
Step 5. Save, and re-create your Cloud Environment
Don't forget to save the configuration after changing any values. This will re-create the application compute with the new values, which can take up to ten minutes.
You can further customize using a Docker image or startup script to standardize the environment you need. It's like having your own preconfigured environment instead of just those in the dropdown. See detailed instructions in Standardizing a custom RStudio or Jupyter environment.
It's not necessary to guess upfront the resources you're going to need to do your work. You can start with minimal settings, then dial them up if you run into limitations.
Jupyter Cloud Environment considerations
Changing the Cloud Environment can mean files generated or stored in the application memory will be lost when Terra re-creates the Cloud Environment. To avoid this, make sure to keep your Persistent Disk (default) and only increase resources.
Also, we recommend copying all valuable files to Workspace storage (Google bucket).
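A minimal sketch of such a backup from inside a notebook is shown below. It assumes that the `gsutil` command-line tool is available in the Cloud Environment and that the workspace bucket URL is exposed in a `WORKSPACE_BUCKET` environment variable; verify both in your own environment. The helper name `build_backup_command`, the `backups` destination folder, and the example bucket and file names are hypothetical.

```python
import os
import shlex


def build_backup_command(local_path, bucket=None, dest_prefix="backups"):
    """Build a `gsutil cp` command that copies one file to workspace storage."""
    # Assumption: WORKSPACE_BUCKET holds a URL such as gs://<bucket-name>.
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise ValueError("No bucket given and WORKSPACE_BUCKET is not set")
    file_name = os.path.basename(local_path)
    destination = f"{bucket.rstrip('/')}/{dest_prefix}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


# Example with a placeholder bucket; this only builds and prints the command.
cmd = build_backup_command("results/summary.tsv", bucket="gs://example-bucket")
print(shlex.join(cmd))
```

To actually perform the copy, pass the list to `subprocess.run(cmd, check=True)`. Building the command separately makes it easy to inspect the destination before anything is copied.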
Updating your Jupyter VM in real time (without losing data)
Your Jupyter Cloud Environment comes with storage (persistent disk, or PD) that is kept by default when you delete or re-create the Cloud Environment. As long as you don't choose to delete your PD storage, there are many changes you can make - even while your Jupyter Cloud Environment is running or if you transition to working in RStudio - without worrying about losing data.
Changes that don't put data at risk
Below are all changes you can make to the virtual environment where your notebook or RStudio analysis runs without losing data stored in the PD.
- Increase or decrease the # of CPUs or VM memory
During this update, the Cloud Environment will pause, update, and then restart. The update will take a couple of minutes to complete, and you cannot edit or run the notebook until it finishes.
- Increase the disk size (note that decreasing the disk size can result in lost data)
- Change the number of workers (when running a Spark cluster and the number of workers is > 2)
During this update, you can continue to work in your notebook without pausing your cloud environment. When the update is finished, you will see a confirmation banner.
Cloud Environment changes that can cause you to lose work
Note: This applies to any kind of interactive analysis you run, including RStudio, Jupyter Notebooks, or Galaxy. Please back up files as appropriate.
- Decreasing the Persistent Disk size
- Deleting the Persistent Disk (when re-creating or deleting the Cloud Environment)
Changing BOTH CPU/memory and number of workers (Spark VM)
Note: If you want to modify both the workers and CPU/memory, we advise doing this sequentially.
1. First, update the CPUs/memory.
2. Wait for the Notebook Cloud Environment to restart.
3. Then adjust the workers.
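The guidance in this section can be summarized as a small planning helper. This is an illustrative sketch only, not Terra's implementation or API: the `Compute` class and `plan_update` function are hypothetical names. It encodes two rules from this article: decreasing the disk size can lose data, and when both CPU/memory and the number of workers change, CPU/memory should be updated first.

```python
from dataclasses import dataclass


@dataclass
class Compute:
    cpus: int
    memory_gb: float
    disk_gb: int
    workers: int = 0  # 0 means a single machine rather than a Spark cluster


def plan_update(current, desired):
    """Return (ordered steps, warnings) for moving from current to desired."""
    steps, warnings = [], []
    if desired.disk_gb < current.disk_gb:
        warnings.append("Decreasing the disk size can lose data; back up files first.")
    # CPU/memory changes come first because they pause and restart the environment.
    if (desired.cpus, desired.memory_gb) != (current.cpus, current.memory_gb):
        steps.append("Update CPUs/memory, then wait for the environment to restart.")
    # Worker changes are applied only after the restart has finished.
    if desired.workers != current.workers:
        steps.append("Update the number of workers.")
    return steps, warnings


steps, warnings = plan_update(Compute(4, 15, 100, 4), Compute(8, 30, 100, 10))
for number, step in enumerate(steps, start=1):
    print(f"{number}. {step}")
```

Applying the changes in this order means each edit to the configuration form is saved and completed on its own, which mirrors the sequential approach recommended above.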
Additional resources: To learn more about your workspace Cloud Environment storage, see Detachable Persistent Disks.