Jupyter Notebooks run on virtual machines or clusters of machines in your Jupyter Cloud Environment. You can adjust the configuration of your Jupyter app to fit your computational needs. This article gives step-by-step instructions for customizing your Jupyter Cloud Environment VM, installed software, and storage (i.e. Persistent Disk).
Starting a Jupyter VM
Follow the step-by-step instructions below, depending on whether you are using the Notebooks tab display or have opted into the new Analyses tab display.
1. First click on the gear icon in the widget at the top right of your workspace.
This will reveal the default environment (below).
2. To start Jupyter with the default settings, click the Create button at the top right of the form.
1. Start in the Analyses tab of your workspace.
2. Click the cloud icon in the right sidebar.
3. In the Cloud Environment Details pane, click the gear icon (Environment settings) under the Jupyter logo. This will surface the Jupyter Cloud Environment default pane (below).
4. Click the Create button to start a Jupyter Cloud Environment with the default settings.
Once you click Create, it will take a few minutes for the Jupyter Cloud Environment to start.
You can also get to the (Jupyter) Cloud Environment pane by clicking the notebook name.
Customizing your Jupyter Cloud Environment
If the default or project-specific environments don't fit your needs, you can customize many aspects of your Jupyter app in Terra:
- VM size, type, and location (compute profile)
- Software (application configuration)
- Size and region of dedicated storage (persistent disk)
You'll specify what you want in the Cloud Environment customization pane (steps below) and let Terra recreate your cloud environment with the new specifications. Scroll down for more details about each customization option.
When can you change your Jupyter app? You can modify your Jupyter Cloud Environment at any time, even if you've already started working in a notebook. See Updating your Jupyter VM in real time (without losing data) below.
Most updates that involve increasing Cloud Environment resources will preserve any previous work. This is why we recommend starting with the minimum resources you think you will need and scaling up if it's not enough.
Step 1: Access the Cloud Environment customization form
1.1. Start from the Cloud Environment pane (above). If you haven't yet created or customized a cloud environment, you will see the defaults in the form. Select the Customize button at the bottom right.
1.2. When you select Customize, or if Jupyter is running already, you'll see the Cloud Environment configuration pane (screenshot below). Note that it has fewer options and is much simpler to adjust in Terra than the equivalent Google Cloud Platform interface!
1.1. Start from the Jupyter Cloud Environment pane (steps above).
1.2. If you haven't yet created or customized a cloud environment, you will see the defaults in the form. Select the Customize button at the bottom right.
When you select Customize - or if Jupyter is running already - you'll see the Jupyter Cloud Environment configuration pane (screenshot below). Note that it has fewer options and is much simpler to adjust in Terra than the equivalent Google Cloud Platform interface!
Step 2. Choose the software (application configuration)
Terra offers several categories of pre-configured Jupyter application setups - plus a custom option - in the drop-down menu. You can see which software versions and libraries are included in each pre-configured option by clicking the "What’s installed on this environment?" link below the drop-down.
Why use a pre-configured application configuration?
Using the same software application configurations is a way to make sure everyone has the same computational environment and gets the same results (when inputting the same data and using the same analysis tools, of course!). The software application configurations available in the dropdown are curated and up-to-date, so if you can use one, it's an easy way to keep collaborators on the same page.
Categories of application configurations
- Terra-maintained Jupyter environments
- Community-maintained Jupyter environments (verified partners)
- Custom environments
Customizing your installed software and packages
If one of the pre-configured application options doesn't meet your needs, you can make your own custom application configuration (i.e. pre-install software and dependencies in the VM) with a Docker image or startup script.
Why use a custom Docker image or startup script to install software and dependencies?
If your analysis requires software packages that are not part of the default or pre-configured options, you could install the ones you need on the Persistent Disk after starting your interactive application. However, this approach can turn into a maintenance headache if you have multiple notebooks that require the same configuration commands. It is also much harder to make sure all collaborators working on the same project (each with their own Jupyter Cloud Environment) have the same software and dependencies.
See Standardizing a custom RStudio or Jupyter Cloud Environment for more details and step-by-step instructions.
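As a concrete illustration, a startup script is simply a bash file that Terra runs on the VM at creation time, fetched from a public URL (for example, an object in a Google bucket) that you supply in the configuration pane. The sketch below writes one locally and syntax-checks it; the package names are placeholders, not recommendations.

```shell
# Save a startup script locally. Package names here are illustrative --
# substitute the dependencies your analysis actually needs.
cat > startup.sh <<'EOF'
#!/usr/bin/env bash
set -euo pipefail

# Python packages install into the environment the notebook kernel uses.
pip3 install --user scikit-learn seaborn

# System-level dependencies can be added too (the script runs with root
# privileges on the VM), e.g.:
# apt-get update && apt-get install -y samtools
EOF

chmod +x startup.sh
bash -n startup.sh   # syntax-check the script without executing it
```

Once the script is hosted at a public URL, paste that URL into the startup script field of the configuration pane.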
Step 3. Adjust the compute power
Continuing down the Jupyter Cloud Environment configuration pane, you'll see options for setting the compute power of your virtual machine. If the defaults are not adequate for your needs, you can select a custom compute profile and specify the CPUs, memory, disk size and type, and location of your primary machine. You can also spin up a Spark cluster of parallel machines and specify the number of secondary machines along with their CPUs, memory, and disk sizes. To configure custom compute power, follow the steps below.
3.1. In the Cloud Computer Profile section of the Jupyter Cloud Environment form, choose the specification of your primary machine. See the example below.
- CPUs: 8
- Memory (GB): 30
- Disk size (GB): 100
If you only want one virtual machine and no other customizations, you're done!
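Once the environment is running, you can sanity-check that the VM matches the profile you requested from a notebook terminal. This sketch assumes a Linux VM and that Terra mounts the persistent disk at /home/jupyter (the fallback covers other setups).

```shell
# Run these in a notebook terminal (or prefix each with ! in a notebook cell)
# to confirm the VM matches the profile you requested.
nproc                                          # number of CPUs
awk '/^MemTotal/ {printf "%.0f GB RAM\n", $2/1024/1024}' /proc/meminfo
df -h /home/jupyter 2>/dev/null || df -h /     # disk size and usage
```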
Spark VM instructions
3.2. To configure as a Spark cluster (for parallel processing), first select Spark cluster from the Compute type list.
3.3. Fill in the values for the Worker config.
- Workers: 120
- Preemptibles: 100
- CPUs: 4
- Memory (GB): 15
- Disk size (GB): 500
Finding the VM cost
The cost of the requested compute power will be displayed in a blue section at the top of the form. For example, when requesting a Spark cluster, your screen will look something like this:
Cost-saving recommendations
Size your compute power appropriately
You pay a fixed amount while a notebook is running, whether or not you are doing active calculations (note, however, that Terra automatically pauses a notebook after 30 minutes of inactivity). The cost is based on the compute power of your virtual machine or cluster, not on how much computation is being done. So you want enough power to finish your computations in a reasonable amount of time, but not so much extra capacity that you pay for power you never use.
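To make the fixed-rate point concrete, here is a back-of-envelope calculation using hypothetical per-unit rates. Actual prices vary by machine type and region; the blue banner at the top of the form shows the authoritative estimate.

```shell
# Back-of-envelope hourly cost for an 8-CPU / 30 GB example, using
# HYPOTHETICAL per-unit rates -- real prices vary by machine type and region.
CPUS=8; MEM_GB=30
CPU_RATE=0.033     # assumed $/CPU-hour
MEM_RATE=0.0045    # assumed $/GB-hour

awk -v c="$CPUS" -v m="$MEM_GB" -v cr="$CPU_RATE" -v mr="$MEM_RATE" \
    'BEGIN { printf "~$%.3f per hour while running\n", c*cr + m*mr }'
```

The takeaway: doubling CPUs roughly doubles this hourly rate whether or not the notebook is doing anything, which is why sizing matters.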
Start small and scale up
Because you generally don't lose any data when you increase resources (CPUs or disk size, for example), it is best to start small and increase as needed.
To learn more about controlling cloud costs in a notebook, see Controlling cloud costs - sample use cases.
Step 4 (Optional): Other Cloud Environment customizations
Below are a number of additional customizations you can make to your Jupyter Cloud Environment.
Terra supports the use of graphics processing units (GPUs) - special processing units optimized for linear algebra computations, such as matrix multiplication - when using Jupyter notebook cloud environments. To learn more, see Getting started with GPUs in a Jupyter Cloud Environment.
Jupyter Cloud Environments will automatically pause when there is no web browser or kernel activity for 30 minutes. To learn more about how autopause on Terra works by default - and how and why you can manually override the default settings - see Preventing runaway costs with Cloud Environment autopause.
VM location (GCP region)
Your Cloud Environment VM will default to the workspace bucket region, but you can choose a different location in the configuration pane. To learn more, see Customizing where your data are stored and analyzed.
Note that if you change the location from the value proposed by the UI, you may incur egress charges if your bucket location and interactive analysis Cloud Environment location are different.
Persistent Disk size and type
If the default PD is too large (and you don't want to pay for the extra) or too small (and you need more), you can adjust the size in the Jupyter Cloud Environment setup form.
You can also choose between a standard disk and a solid-state disk (SSD). SSDs cost more but read and write data faster; the increased speed may be worth the cost for your use case. See Detachable persistent disks to learn more about detachable persistent disks for notebook applications in Terra.
Step 5. Save, and recreate your Cloud Environment
Don't forget to save the configuration after changing any values. This will recreate the application compute with the new values, which can take up to ten minutes.
You can further customize using a Docker image or startup script to standardize the environment you need. It's like having your own pre-configured environment instead of just those in the dropdown. See detailed instructions in Standardizing a custom RStudio or Jupyter environment.
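For example, a custom Docker image typically extends one of the Terra-maintained base images. The image name and tag below are assumptions; check Terra's current list of supported base images before building.

```shell
# Sketch of a Dockerfile that extends a Terra-maintained base image.
cat > Dockerfile <<'EOF'
FROM us.gcr.io/broad-dsp-gcr-public/terra-jupyter-python:latest

# Layer your project's extra packages on top of the base environment.
RUN pip3 install --no-cache-dir scikit-learn seaborn
EOF

# Build and push to a registry your workspace can pull from, for example:
# docker build -t us.gcr.io/<your-project>/my-env:0.1 .
# docker push us.gcr.io/<your-project>/my-env:0.1
```

You then paste the pushed image's full path into the custom environment field of the configuration pane.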
It's not necessary to guess upfront the resources you're going to need to do your work. You can start with minimal settings, then dial them up if you run into limitations.
Jupyter Cloud Environment considerations
Changing the Cloud Environment can mean files generated or stored in the application memory will be lost when Terra recreates the Cloud Environment. To avoid this, make sure to keep your Persistent Disk (default) and only increase resources. We also recommend copying all valuable files to Workspace storage (Google bucket).
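One way to copy files to workspace storage is from a notebook terminal with gsutil. This sketch assumes Terra's WORKSPACE_BUCKET environment variable is set inside the cloud environment; the directory and bucket fallback are placeholders.

```shell
# Copy valuable files from the persistent disk to workspace bucket storage.
RESULTS_DIR="$HOME/results"                              # illustrative path
BUCKET="${WORKSPACE_BUCKET:-gs://your-workspace-bucket}" # placeholder fallback
mkdir -p "$RESULTS_DIR"

# -m parallelizes the transfer; -r copies the directory recursively.
if command -v gsutil >/dev/null 2>&1; then
  gsutil -m cp -r "$RESULTS_DIR" "$BUCKET/backups/" || echo "copy failed; check the bucket path"
else
  echo "gsutil not found; run this from a Terra cloud environment"
fi
```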
Updating your Jupyter VM in real time (without losing data)
Your Jupyter Cloud Environment comes with storage (persistent disk, or PD) that is kept by default when you delete or re-create the Cloud Environment. As long as you don't choose to delete your PD storage, there are many changes you can make - even while your Jupyter Cloud Environment is running or if you transition to working in RStudio - without worrying about losing any data.
Changes that don't put data at risk
Below are all changes you can make to the virtual environment where your notebook or RStudio analysis runs without losing any data stored in the PD.
- Increase or decrease the # of CPUs or VM memory
During this update, Terra will pause the cloud environment, apply the change, and then restart it. The update takes a couple of minutes to complete, and you will not be able to edit or run the notebook while it's in progress.
- Increase the disk size (note that decreasing the disk size can result in lost data)
- Change the number of workers (when running a Spark cluster and the number of workers is > 2)
During this update, you can continue to work in your Notebook without pausing your cloud environment. When the update is finished, you will see a confirmation banner.
Cloud Environment changes that can cause you to lose work
Note that this applies no matter what kind of interactive analysis you are running, including RStudio, Jupyter Notebooks, and Galaxy. Please back up files as appropriate.
- Decreasing the Persistent Disk size
- Deleting the Persistent Disk (when re-creating or deleting the Cloud Environment)
Changing BOTH CPU/memory and number of workers (Spark VM)
Note that if you want to modify both the workers and CPU/memory, we advise doing this sequentially.
1. First, update the CPUs/memory.
2. Wait for the Notebook Cloud Environment to restart.
3. Then adjust the workers.
Additional resources: To learn more about your workspace Cloud Environment storage, see Detachable Persistent Disks.