Preconfigure a Cloud Environment with a startup script

Anton Kovalsky
  • Updated

The preconfigured software environments for a Cloud Environment (listed in the Application Configuration dropdown menu) may not include every package you regularly need. Rather than installing each package manually every time you launch a Jupyter Notebook, RStudio, or Galaxy analysis, you can use a startup script to install them automatically. This article describes why and how to use startup scripts.

Why use a startup script?

Standardize your Cloud Environment

Using the same software application configuration ensures that everyone has the same computational environment and gets the same results (when inputting the same data and using the same analysis tools, of course!). If none of the preconfigured application options meets your needs, you can make your own custom application configuration (i.e., preinstall software and dependencies in the virtual machine [VM]) with a startup script (see the additional Docker documentation for another option).

Startup script versus custom Docker

Startup scripts are good for installing packages and making environment changes that typically require sudo. This makes them an efficient alternative to creating custom Docker images. (If you're curious about doing that, see this tutorial: Custom cloud environments for Jupyter Notebooks.)

A custom Docker image is a great way to take a snapshot of a set of package versions and keep your environment consistent. A startup script, by contrast, is a quick way to add anything you need, including updated packages, to whatever Docker image you're working with, whether that is a custom image or one of our preconfigured ones. Currently, startup scripts are supported for both Jupyter and RStudio environments.

Startup script tutorial

In this short tutorial, you'll see an example of a startup script and learn how to upload it, find the file path you'll need to provide as a link, and use that link to launch your custom Jupyter or RStudio Cloud Environment.

Below is the example startup script we'll use in this tutorial. Along with a few other tools, this startup script installs multtest, a Bioconductor package that is not included in the default R/Bioconductor environment in Terra.

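#!/usr/bin/env bash

# Python tool for converting notebooks to other formats
pip install nbconvert
# System packages (these need root): pandoc and the XeTeX engine
apt-get update
apt-get install -yq pandoc texlive-xetex
# R package from CRAN
R -e "install.packages('devtools')"
# Pinned Python packages; the quotes stop the shell from treating [louvain] as a glob pattern
pip install 'scanpy[louvain]==1.4.4.post1' anndata==0.6.22rc1 h5py==2.9.0
# Bioconductor package used in this tutorial
R -e "BiocManager::install('multtest')"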
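The script above runs its commands one after another and does not report which of them succeeded. As a minimal sketch of one way to make the outcome easier to read, the hypothetical helper below (run_step, FAILED_COUNT, and FAILED_STEPS are illustrative names, not part of Terra) runs a command, logs whether it returned a zero exit status, and keeps going so that one failed install does not hide the others. The example calls are commented out so the sketch can be run anywhere without installing anything.

```shell
#!/usr/bin/env bash

# Hypothetical bookkeeping variables for this sketch.
FAILED_COUNT=0
FAILED_STEPS=""

# run_step LABEL COMMAND [ARGS...]
# Runs COMMAND, logs the result, and records the label if it fails.
run_step() {
  label="$1"
  shift
  if "$@"; then
    echo "[startup] OK: ${label}"
  else
    echo "[startup] FAILED: ${label}" >&2
    FAILED_COUNT=$((FAILED_COUNT + 1))
    FAILED_STEPS="${FAILED_STEPS} ${label}"
  fi
}

# Example usage inside a startup script (commented out in this sketch):
# run_step "nbconvert" pip install nbconvert
# run_step "apt update" apt-get update
# run_step "pandoc and xetex" apt-get install -yq pandoc texlive-xetex
# echo "[startup] ${FAILED_COUNT} step(s) failed:${FAILED_STEPS}"
```

This only detects failures that show up as a non-zero exit status. R's install.packages typically reports a failed install as a warning rather than an error, so an R step may still need its own check.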

Step 1: Check if the package is already in a preconfigured environment

Start with a quick confidence check to make sure the package you want isn't already part of one of the preconfigured environments in Terra.

1.1. Go to the Application configuration section of the Cloud Environment customization pane.

1.2. Start by selecting the default R/Bioconductor image in the Application Configuration dropdown menu.
Jupyter-R_Bioconductor-image_Screen_shot.png

1.3. Before adding a custom startup script, try to import the package by typing library(multtest) into a code cell and running the cell.

You should see an error saying that no such package is currently on your virtual machine:
Screen_Shot_2021-03-17_at_1.46.45_PM.png
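The same check can be run from a terminal instead of a notebook cell. The sketch below assumes Rscript is on the PATH of your Cloud Environment; r_pkg_installed is a hypothetical helper name used only for illustration. It returns a zero exit status when the named R package can be loaded and a non-zero status otherwise.

```shell
#!/usr/bin/env bash

# r_pkg_installed PACKAGE
# Exit status 0 if the R package can be loaded, 1 if it cannot,
# 2 if Rscript itself is not available.
r_pkg_installed() {
  command -v Rscript >/dev/null 2>&1 || return 2
  Rscript -e 'args <- commandArgs(trailingOnly = TRUE)
              ok <- requireNamespace(args[1], quietly = TRUE)
              quit(status = if (ok) 0 else 1)' "$1" >/dev/null 2>&1
}

# Example usage:
# if r_pkg_installed multtest; then
#   echo "multtest is already installed"
# else
#   echo "multtest is missing; a startup script can install it"
# fi
```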

Step 2: Store script in Google bucket

To add a startup script to a Terra Cloud Environment, you need to give Terra the script's URI (Uniform Resource Identifier, similar to a URL) so that it can access the script.

2.1. Store the script in workspace storage (Google bucket)
You can upload the startup script to any Google bucket, provided your workspace can access that bucket. For this tutorial, upload the startup script file to the workspace storage (i.e., the workspace's Google bucket) by 1) going to the Data tab of the workspace, 2) selecting the Files icon at the bottom of the left-hand menu, and 3) clicking the Upload button in the bottom right.

Start-up-script_How-to-upload-to-workspace-storage_Screen_shot.png
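If you prefer the command line to the upload button, the file can also be copied with gsutil cp. The sketch below is one possible way to do that, not the documented Terra workflow: startup_script_uri and upload_startup_script are hypothetical helper names, the object path is made up, and it assumes gsutil is installed and authenticated and that the WORKSPACE_BUCKET environment variable holds your workspace bucket.

```shell
#!/usr/bin/env bash

# startup_script_uri BUCKET OBJECT
# Builds a gs:// URI; accepts the bucket with or without a "gs://" prefix
# or a trailing slash.
startup_script_uri() {
  local bucket="${1#gs://}"
  bucket="${bucket%/}"
  printf 'gs://%s/%s\n' "$bucket" "$2"
}

# upload_startup_script LOCAL_FILE BUCKET OBJECT
# Copies the local script to the bucket and prints its URI on success.
upload_startup_script() {
  local uri
  uri="$(startup_script_uri "$2" "$3")"
  gsutil cp "$1" "$uri" && echo "$uri"
}

# Example usage (hypothetical file and object names):
# upload_startup_script ./startup.sh "$WORKSPACE_BUCKET" scripts/startup.sh
```

The printed URI is the value to paste into the Startup Script field later in this tutorial.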

2.2. Copy the URI of the script
Once the file is in workspace storage, go to the Google Cloud console, where you can copy the URI of your script file. You can get there either with the link to your bucket (the Open in browser link at the right-hand side of your workspace dashboard) or by clicking the file link in the Data tab.

rstudio2.gif

URI in Google Cloud console (below)
Startup-script_Get-the-URI-from-GCP-console_Screen_shot.png
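Before pasting, it can help to confirm that what you copied is a bucket URI of the form gs://bucket-name/path/to/script rather than a browser URL. The following sketch shows a simple shape check; is_gs_uri is a hypothetical helper name, and the check only looks at the format, not at whether the object actually exists.

```shell
#!/usr/bin/env bash

# is_gs_uri VALUE
# Exit status 0 if VALUE looks like gs://<bucket>/<object>, 1 otherwise.
is_gs_uri() {
  case "$1" in
    gs://?*/?*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (hypothetical bucket and object names):
# is_gs_uri "gs://my-bucket/scripts/startup.sh" && echo "looks like a bucket URI"
```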

2.3. Once you have the URI, paste it into the Startup Script field under the Cloud Compute Profile section of the Jupyter Cloud Environment customization pane.
Start-up-script_Enter-URI-in-startup-script-field_Screen_shot_.png

2.4. Choose any other compute profile customizations, then click the Update or Create button to spin up your Jupyter or RStudio Cloud Environment.

What to expect

Once your environment is ready, you can launch a notebook or RStudio analysis. If you run the same command you tried during the confidence check at the beginning of this tutorial, the package now imports successfully:

Screen_Shot_2021-03-17_at_1.55.53_PM.png
