Interactive analysis with Jupyter Notebooks

Jupyter notebooks are an open-source analysis environment where you can gain real-time insight into study data with interactive analysis and visuals. You can import data, including processed genomics, phenotype, and transcriptomics data stored in the cloud, and analyze it with custom or pre-built libraries in R or Python.

Terra's integrated, open-source Jupyter Notebooks environment is accessible to newcomers and enables portability and reproducibility. Notebooks combine analysis methods and findings in a single place, in a form that's straightforward to understand and share. A logical evolution of the traditional scientific paper, Jupyter notebooks dramatically shorten the path between reading how an analysis was done and actually being able to reproduce it. It's difficult to overstate how powerful this concept is and what an impact notebooks can have on the reusability and reproducibility of findings in the computational sciences.

Read on to learn more about Jupyter notebooks and how to use them in Terra.

Try a hands-on T101 Notebooks tutorial

If you would rather learn about notebooks by running one, try the Terra (GCP) Quickstart workspace.

The Terra (GCP) Quickstart 3: Notebooks tutorial includes everything you need to get hands-on using Jupyter notebooks in Terra. Work through the tutorial steps in your copy of the Terra on GCP Quickstart workspace with your own billing.

Terra on GCP Quickstart tutorial workspace | Step-by-step guide

Jupyter notebooks 101

Jupyter Notebook is an open-source app that opens in the Analyses tab of your workspace. Notebooks contain rich text commentary as well as code cells. They can execute any Python- or R-based commands, including GATK, on real data in real time.

Reproducibility

When you send collaborators a copy of a notebook whose code cells have been run, they can view your results embedded in the .ipynb file.
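As a rough sketch of what "embedded" means here: a notebook file is JSON, and each executed code cell carries its saved outputs alongside its source. The example below builds a small notebook-shaped document by hand (the cell contents are invented for illustration and follow the nbformat version 4 layout) and pulls out the saved text outputs with the Python standard library. The helper name saved_text_outputs is hypothetical, not part of Jupyter or Terra.

```python
import json

# A hand-written, illustrative notebook in the nbformat 4 layout.
# The cell contents are invented for this example.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Sample counts"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["print('samples:', 3)"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["samples: 3\n"]}
            ],
        },
    ],
})


def saved_text_outputs(notebook_text):
    """Return the stream text saved with each executed code cell."""
    notebook = json.loads(notebook_text)
    results = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue  # Markdown and raw cells have no outputs
        for output in cell.get("outputs", []):
            if output.get("output_type") == "stream":
                # "text" may be a single string or a list of strings
                text = output["text"]
                results.append(text if isinstance(text, str) else "".join(text))
    return results


print(saved_text_outputs(notebook_json))
```

Because the outputs travel inside the file, a collaborator can read the results without rerunning any cells.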

Find a notebook in a public Terra workspace you'd like to play around with? Clone yourself a copy. Need to develop/refine an analysis in tandem with distant collaborators? You can invite them to your workspace, and work directly in the same notebook with them.

New to Jupyter notebooks?

Check out the Jupyter Notebooks 101 notebook in the Terra on GCP Quickstart workspace. The notebook covers:

Why notebooks are used in biomedical research
Jupyter Notebook basics: How to use a notebook, install packages, and import modules
The relationship between the notebook and the workspace
Common data analysis libraries and popular tutorial notebooks
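To give a flavor of the "install packages and import modules" topic, here is a minimal sketch (not taken from the tutorial notebook) that checks whether a module is available in the current environment before importing it. The helper name is_installed and the second module name are made up for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# "json" ships with Python; the second name is a made-up placeholder.
for name in ["json", "package_that_does_not_exist_xyz"]:
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")

# In a notebook cell, a missing package is typically installed with the
# IPython magic `%pip install <package>` before it is imported.
```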

Already familiar with Jupyter notebooks?

Learn how to run a notebook in Terra by working through the Terra (GCP) Quickstart part 3: Notebooks.

How to run a notebook on Terra

1. Click on the notebook name in the Analyses tab.

2. In the preview pane, choose Open or Playground Mode.
To avoid overwriting the changes of a collaborator working in the same notebook, Terra will only allow you to edit a shared notebook in "Playground" mode.

3. Set up your Jupyter Cloud Environment (optional)
If you haven't run an interactive analysis before, you will need to configure the Cloud Environment that Terra will create for you to run your notebook. The default Jupyter environment in the Jupyter Cloud Environment pane is sufficient for many analysis applications.

You can customize the installed packages and libraries, compute resources, and storage (Persistent Disk) of your virtual machine by selecting the Customize button at the bottom of the Jupyter Cloud Environment pane. To learn more, see Starting and customizing your Jupyter app.

4. You'll see a Jupyter logo with a blue dot in the sidebar, indicating that the Jupyter Cloud Environment is being created. Click Open again to start the notebook. Once the VM is running, you'll be able to work in the notebook.

Four ways to make your own notebook

1. Clone the workspace with a notebook you'd like to explore

Cloning a workspace will automatically create copies of all of the notebooks in that workspace, allowing you to quickly duplicate the notebook analysis in your own secure sandbox.

2. Copy an individual notebook to your own workspace

This is especially useful when you only want to use a few notebooks in public workspaces that include a large number of sample notebooks.

3. Create a new notebook from scratch

If you are an experienced notebook user, you may want to create a notebook from scratch.

1. Go to the Analyses tab.

2. Click the Start button.

3. Select the Jupyter application in the right pane.

4. In the new pane, name the new notebook and select a Billing project and Authorization Domain (optional).

4. Upload a template notebook from outside the Terra platform

This is similar to creating a new notebook. Make sure the file you upload is in JSON format and saved with the .ipynb extension.
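If you are unsure whether a file qualifies, a quick local check can help. The sketch below is an assumption-laden approximation, not Terra's own validation: it only tests that the text parses as JSON and has the top-level "cells" and "nbformat" keys that notebook files carry. The function name looks_like_notebook is hypothetical.

```python
import json


def looks_like_notebook(text):
    """Rough check: is this text JSON with the top-level keys a notebook has?"""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(data, dict)
        and isinstance(data.get("cells"), list)
        and isinstance(data.get("nbformat"), int)
    )


# A minimal, empty notebook document versus a plain Python script.
minimal = json.dumps({"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 4})
print(looks_like_notebook(minimal))
print(looks_like_notebook("print('hello')"))
```

A plain script renamed to .ipynb fails this check because it is not JSON, which is the most common reason an uploaded template will not open.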

Code and markdown cells in a notebook

Cells are the building blocks of a notebook. Each cell has a "type" (Code/Markdown/Raw NBConvert) that determines how the notebook will interpret the contents of the cell.

The cell type can be seen or changed either in the Cell > Cell Type drop-down menu or on the right of the toolbar of function shortcuts located just below the menu bar.

How to run a code cell

There are three ways to run a code cell in a notebook:

1. Select the cell and press Shift + Enter on your keyboard (your keyboard may say "return" instead of "enter").

2. Click the Run icon in the menu bar.

3. Use the appropriate command from the Cell dropdown menu.

How does a code cell run?

When you run a code cell, the notebook's kernel (e.g., Python or R) interprets the code, executes it on the virtual machine that the notebook is running on, and returns the results for display in the notebook. As the command runs, its output log appears right below the code cell (see the screenshot below).

If the code in the cell doesn't match the kernel's language, the kernel will return an error. If the code is correct but specifies no outputs, the code will run and the result of the computation will be stored in the notebook's Cloud Environment until the kernel is restarted. In that case, you will know the code was successfully executed by noting the number in the square brackets [ ] to the left of the cell.
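For instance, here is a minimal illustration in Python, written as two notebook cells (the variable name and values are made up). The first cell computes a value without displaying anything; the second shows that the value is still available afterwards.

```python
# Cell 1: an assignment produces no visible output,
# but the value stays in the kernel's memory.
total = sum(range(1, 11))

# Cell 2: a later cell can still use that value.
# In a notebook, ending a cell with the bare expression `total`
# would also display it; print() makes the output explicit here.
print(total)
```

After a kernel restart, `total` would no longer be defined until the first cell is run again.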

How to know if your code cell has executed

When you launch a notebook for the first time, the square brackets to the left of each code cell are empty [ ], indicating these cells have not yet been run during this session. An asterisk inside the brackets [*] means the cell is running. Once the command has been executed, the asterisk will be replaced by an integer representing the number of commands executed since the kernel started. There is nothing to stop you from executing the same cell multiple times, as in the screenshot above.

If you clear the outputs by going to the Cell > All Outputs drop-down menu and selecting "Clear", the integers will all be replaced by empty brackets again. However, the integer count will only reset to zero if you restart the kernel.
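The counting behavior described above can be modeled with a small toy class. This is purely an illustration of the rules in this section, not how Jupyter is implemented; the class and method names are invented, and None stands in for an empty label [ ].

```python
class ToyNotebook:
    """Illustrative model of execution counts; not Jupyter's implementation."""

    def __init__(self, n_cells):
        self.counter = 0                 # commands run since the kernel started
        self.labels = [None] * n_cells   # None stands for an empty label

    def run(self, index):
        # Every run increments the kernel-wide counter, even for a rerun cell.
        self.counter += 1
        self.labels[index] = self.counter

    def clear_outputs(self):
        # Clearing outputs empties the labels but keeps the counter.
        self.labels = [None] * len(self.labels)

    def restart_kernel(self):
        # Only a restart resets the counter.
        self.counter = 0


nb = ToyNotebook(2)
nb.run(0)
nb.run(1)
nb.run(0)            # rerunning the first cell gives it the newest number
print(nb.labels)     # [3, 2]

nb.clear_outputs()
nb.run(1)
print(nb.labels)     # [None, 4]

nb.restart_kernel()
nb.run(0)
print(nb.labels)     # [1, 4]
```

The last line shows why labels can look out of order: a number left over from before the restart can sit next to a fresh one.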

How to edit content in a markdown cell

To edit the content of a markdown cell, double-click the cell. See the clip below for an example of the expected behavior. Markdown is a lightweight plain-text formatting language. The syntax controls how the text appears: bold/italic, header size, section enumeration, etc.

Markdown cells do not interpret code
A Markdown-type cell only renders Markdown syntax. Running a Markdown-type cell that contains Python code, for example, will result in the code being reproduced as plain text rather than executed.

Two examples of Markdown syntax

Italicizing text by placing a single asterisk on either side (*italics* => italics)
Setting header size by starting a line with one or more hash marks (#)

A more comprehensive listing of Markdown syntax can be found in this helpful Markdown Cheatsheet.

Next steps and additional resources

To learn more about notebooks and how to customize your Jupyter Cloud Environment in Terra, see Starting and customizing your Jupyter app.
The Terra Notebooks Playground workspace contains Jupyter Notebooks that allow you to explore notebook functionality. These include both R and Python setup notebooks, and template notebooks for accessing and analyzing different kinds of data in different formats and cloud locations.
The BioData Catalyst Collection includes notebooks that are helpful for BioData Catalyst users in Terra. The notebooks in this collection can be copied to a different workspace to aid users in their individual research needs.