Jupyter notebooks are an open-source analysis environment where you can visualize and analyze data in real time to gain insight into study data. You can import data stored in the cloud, including processed genomics, phenotype, and transcriptomics data, and analyze it with custom or pre-built libraries using R or Python. Terra uses an integrated, open-source Jupyter Notebooks environment to streamline sharing and collaboration and to enable reproducibility.
If you're completely new to Jupyter notebooks, check out the Jupyter 101 primer workspace. The notebook in this workspace explains:
- Why notebooks are used in biomedical research
- The relationship between the notebook and the workspace
- Jupyter Notebook basics: How to use a notebook, install packages, and import modules
- Common libraries for data analysis and popular tutorial notebooks
If you are already familiar with Jupyter notebooks, check out the Terra Notebooks Playground, a set of Jupyter notebooks that let you experiment with this functionality. It includes R and Python setup notebooks as well as template notebooks for accessing and analyzing data.
1. Jupyter notebooks
Shareable, web-based Jupyter notebooks contain code cells interspersed with rich text commentary in markdown cells. Markdown is a text formatting language; adding documentation ensures that collaborators understand each step. Any analysis that can be done with Python or R code can be transferred to an interactive notebook. The combination of code and documentation makes Jupyter notebooks ideal for both collaboration and reproducibility.
Find a notebook in a public workspace you'd like to play around with? Clone yourself a copy. Need to develop/refine an analysis in tandem with distant collaborators? You can invite them to your workspace, and work directly in the same notebook with them.
2. Five ways to create/copy/open a notebook on Terra
2.1. Click on a notebook in one of your workspaces
Be very careful when choosing this type of interaction in a shared workspace. If someone has shared a master version of a notebook with you, changes you make to the code will be saved in the master copy.
2.2. Clone a workspace that contains a notebook you'd like to explore
Cloning a workspace will automatically create copies of all of the notebooks contained within that workspace, allowing users to quickly duplicate code and tools in a secure sandbox for their own use.
2.3. Clone an individual notebook to your own workspace
This is especially useful for public workspaces that contain a large number of sample notebooks, only a few of which may be useful to any given project.
2.4. Create a new notebook from scratch
2.5. Upload a ready-made notebook
This is similar to creating a new notebook; you only have to make sure that what you are uploading is a JSON file saved with the
3. Running a cell
Cells are the building blocks of a notebook. Each cell has a "cell type" (Code/Markdown/Raw NBConvert) that determines how the cluster will interpret the instructions in the cell.
The cell type can be seen or changed either in the Cell>Cell Type dropdown menu, or on the right of the toolbar of function shortcuts located just below the menu bar.
To run a cell, click the 'Run' button in the shortcut toolbar, press Shift + Enter, or select the appropriate command from the Cell dropdown menu.
To edit the content of a cell, double click on the cell you wish to edit.
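Because a notebook file is JSON, the cell types described above are visible in the file itself. The sketch below builds a minimal notebook as a Python dictionary, assuming the version 4 notebook file layout; the cell contents are made up for illustration and do not come from any Terra notebook.

```python
import json

# A minimal notebook, assuming the version 4 notebook file layout.
# All cell contents here are illustrative placeholders.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {   # Markdown cell: rendered as formatted text when run
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis notes\n", "Some *italic* commentary."],
        },
        {   # Code cell: sent to the selected kernel when run
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not yet run, shown as [ ]
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
        {   # Raw NBConvert cell: stored as-is, not executed
            "cell_type": "raw",
            "metadata": {},
            "source": ["left untouched"],
        },
    ],
}

# Round-trip through JSON text, as would happen when saving and reloading
serialized = json.dumps(notebook, indent=1)
cell_types = [cell["cell_type"] for cell in json.loads(serialized)["cells"]]
print(cell_types)
```

Writing `serialized` to a file would give a notebook whose three cells show up with the Markdown, Code, and Raw NBConvert types respectively.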
3.1. Markdown Cells
The clip below shows an example of editing a Markdown-type cell. Markdown is a lightweight plain-text formatting language whose syntax controls things like bold/italic text, header size, and section enumeration. Running a Markdown cell causes the cluster to interpret only the Markdown syntax in the cell. Markdown cells do not interpret the code of the selected kernel: running a Markdown-type cell that contains, for example, Python code simply reproduces that code as plain text.
The clip below shows two examples of Markdown syntax:
- Italicizing using a single asterisk on either side (*italics* => italics)
- Modifying header size using hash symbols (# for the largest header, ## for the next size down, and so on)
A more comprehensive listing of Markdown syntax can be found in this helpful Markdown Cheatsheet.
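To make the two syntax examples above concrete, here is a toy converter written in Python. It is only an illustration of what a Markdown renderer does with asterisks and hash symbols; it is not the renderer Jupyter uses, it handles only single-asterisk italics and leading hash symbols, and the function name is hypothetical.

```python
import re


def render_toy_markdown(line: str) -> str:
    """Convert one line of a tiny Markdown subset to HTML (illustration only)."""
    # One to six leading hash symbols followed by a space mark a header
    heading = re.match(r"^(#{1,6})\s+(.*)$", line)
    if heading:
        tag = f"h{len(heading.group(1))}"
        text = heading.group(2)
    else:
        tag, text = "p", line
    # A single asterisk on either side marks italics
    text = re.sub(r"\*(.+?)\*", r"<em>\1</em>", text)
    return f"<{tag}>{text}</{tag}>"


print(render_toy_markdown("*italics*"))
print(render_toy_markdown("## Section title"))
# Python code in a Markdown cell is not executed, only shown as text
print(render_toy_markdown("print(1 + 1)"))
```

The last call shows the point made earlier in this section: code placed in a Markdown cell passes through as plain text instead of being run.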
3.2. Code Cells
When the cell type is set to "code", running that cell will cause the cluster to interpret the contents of the cell using the interpreter of the selected kernel (e.g. Python, R). If the code in the cell does not match the kernel's language, the cluster will return an error. If the code is correct but produces no output, the code will still run and the result of the computation will be stored in the cluster (at least until the kernel is restarted). Even when no output is displayed, you can tell whether a cell was executed by checking the square brackets [ ] to the left of the cell.
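As a small illustration with a Python kernel, the first cell below displays nothing but leaves its result in memory, and the second cell makes it visible. The variable names and values are made up for this example.

```python
# Cell 1: assignments display nothing, but the values stay in the kernel's memory
sample_counts = [12, 7, 30]
total = sum(sample_counts)

# Cell 2: a later cell can use the stored value; print() makes it visible
print(total)
```

If the kernel were restarted between the two cells, the second cell would fail because `total` would no longer be defined.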
When a user launches a notebook for the first time, the square brackets are empty, indicating that these cells have not yet been run during this cluster session. Running a cell will cause this indicator to change: an asterisk inside the brackets [*] indicates the cell is running. After a given cell has been executed, the asterisk will be replaced by an integer representing the number of cell executions the cluster has performed since the kernel last started. There is nothing to stop you from executing the same cell multiple times, as shown below.
If you clear the outputs by going to the dropdown menu Cell>All Outputs and selecting "Clear", the integers in the brackets will all be replaced by empty brackets again, but the count itself will only reset to zero if you restart the kernel.
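The counter behavior described above can be modeled with a few lines of Python. This is a toy model for building intuition, not how Jupyter or Terra implement it; the class and method names are hypothetical.

```python
class ToyKernel:
    """Toy model of the execution counter shown in the square brackets."""

    def __init__(self):
        self.execution_count = 0  # executions since the kernel last started
        self.labels = {}          # cell name -> integer shown in its brackets

    def run(self, cell_name):
        # Every execution of any cell increments the shared counter
        self.execution_count += 1
        self.labels[cell_name] = self.execution_count

    def clear_all_outputs(self):
        # Brackets become empty again, but the counter is left alone
        for cell_name in self.labels:
            self.labels[cell_name] = None

    def restart(self):
        # Only a kernel restart resets the counter
        self.execution_count = 0

    def label(self, cell_name):
        number = self.labels.get(cell_name)
        return "[ ]" if number is None else f"[{number}]"


kernel = ToyKernel()
kernel.run("A")
kernel.run("B")
kernel.run("A")  # running the same cell again is allowed
after_runs = (kernel.label("A"), kernel.label("B"))

kernel.clear_all_outputs()
after_clear = (kernel.label("A"), kernel.label("B"))

kernel.run("B")  # the count continues from where it left off
after_rerun = kernel.label("B")

kernel.restart()
kernel.run("A")  # the count starts over after a restart
after_restart = kernel.label("A")

print(after_runs, after_clear, after_rerun, after_restart)
```

In this model, cell A shows [3] and cell B shows [2] after the first three runs, both show [ ] after clearing, B shows [4] when rerun, and A shows [1] after the restart.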