Jupyter notebooks are an open-source analysis environment where you can gain insight into study data in real time with interactive analysis and visuals. You can import data, including processed genomics, phenotype, and transcriptomics data stored in the cloud, and analyze it with custom or pre-built libraries using R or Python.
Notebooks combine analysis methods and findings in a single place, in a form that's straightforward to distribute. A logical evolution of the traditional scientific paper, Jupyter notebooks dramatically shorten the path between reading how an analysis was done and actually being able to reproduce it. It's difficult to overstate how powerful this concept is and what an impact notebooks can have on the reusability and reproducibility of findings in the computational sciences.
Terra's integrated, open-source Jupyter Notebooks environment is accessible to newcomers and enables portability and reproducibility. Read on to learn more about Jupyter notebooks and how to use them in Terra.
Jupyter notebooks 101
Jupyter Notebook is an open-source app that runs in a browser. Notebooks contain rich-text commentary as well as code cells, and they can execute any Python- or R-based commands, including GATK, on real data in real time.
When you send collaborators a copy of a notebook whose code cells have been run, they can view your results embedded in the .ipynb file.
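To make the idea of embedded results concrete, here is a minimal sketch in Python. It assumes the standard nbformat version 4 JSON layout for notebook files; the notebook content and the helper name `saved_results` are made up for illustration and are not part of Terra or Jupyter.

```python
import json

# A hand-written, minimal notebook in the nbformat v4 JSON layout.
# The cell contents are illustrative only.
notebook_json = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Mean read depth"]},
        {
            "cell_type": "code",
            "execution_count": 1,
            "metadata": {},
            "source": ["sum([30, 32, 28]) / 3"],
            "outputs": [
                {
                    "output_type": "execute_result",
                    "execution_count": 1,
                    "metadata": {},
                    "data": {"text/plain": ["30.0"]},
                }
            ],
        },
    ],
})


def saved_results(notebook_text):
    """Return (source, plain-text result) pairs for code cells run before saving.

    Hypothetical helper: it only looks at outputs that carry a "text/plain"
    entry and ignores other output kinds such as printed streams or images.
    """
    notebook = json.loads(notebook_text)
    results = []
    for cell in notebook["cells"]:
        if cell["cell_type"] != "code":
            continue
        for output in cell.get("outputs", []):
            text = output.get("data", {}).get("text/plain")
            if text is not None:
                results.append(("".join(cell["source"]), "".join(text)))
    return results


print(saved_results(notebook_json))
# [('sum([30, 32, 28]) / 3', '30.0')]
```

Because the result is stored next to the code that produced it, a collaborator can read it without rerunning anything.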
Find a notebook in a public Terra workspace you'd like to play around with? Clone yourself a copy. Need to develop/refine an analysis in tandem with distant collaborators? You can invite them to your workspace, and work directly in the same notebook with them.
New to Jupyter notebooks?
Check out the Intro to Jupyter notebook in the Terra-Notebooks-QuickStart workspace. It covers:
- Why notebooks are used in biomedical research
- The relationship between the notebook and the workspace
- Jupyter Notebook basics: How to use a notebook, install packages, and import modules
- Common data analysis libraries and popular tutorial notebooks
Already familiar with Jupyter notebooks?
The additional notebooks in the QuickStart offer hands-on practice using notebooks for several different data use-cases. You'll learn how to:
- Browse 1000 Genomes data in the Data Library and define a subset of data (cohort) for analysis
- Import the cohort from the Terra Data Library to the workspace
- Set up a virtual cloud environment to run the notebook
- Analyze the cohort of data interactively
How to run a notebook on Terra
1. Open the notebook in the "Notebooks" tab by clicking on the card or notebook name.
2. Choose "Edit" or "Playground" mode
To avoid overwriting the work of a collaborator in the same notebook, Terra will only allow you to edit a shared notebook in "Playground" mode.
3. Set up your virtual cloud environment
If you have not run an interactive analysis before, you will need to configure the cloud environment that Terra will create for you to run your notebook. You can customize the installed packages and libraries, as well as the compute resources of your virtual machine. To learn more, see Understanding and adjusting your Cloud Environment.
Four ways to make (clone) your own notebook from a template
1. Clone the workspace with a notebook you'd like to explore
Cloning a workspace will automatically create copies of all of the notebooks in that workspace, allowing you to quickly duplicate the notebook analysis in your own secure sandbox.
2. Clone an individual notebook to your own workspace
This is especially useful when you only want to use a few notebooks in public workspaces that include a large number of sample notebooks.
3. Create a new notebook from scratch
If you are an experienced notebook user, you may want to create a notebook from scratch.
4. Upload a template notebook from outside the Terra platform
This is similar to creating a new notebook. You'll need to make sure to upload a JSON file format saved with the
Code and markdown cells in a notebook
Cells are the building blocks of a notebook. Each cell has a "type" (Code/Markdown/Raw NBConvert) that determines how the application compute will interpret the instructions in the cell.
The cell type can be seen or changed either in the Cell>Cell Type dropdown menu (see below) or on the right of the toolbar of function shortcuts located just below the menu bar.
How to run a code cell
1. Select the cell and press Shift + Enter on your keyboard (your keyboard may say "return" instead of "enter").
2. Click the "Run" icon in the menu bar.
3. Use the appropriate command from the Cell dropdown menu.
How does a code cell run?
When you run a code cell, the application compute kernel (e.g. Python or R) interprets the code, passes the resulting instructions to the operating system of the machine the notebook is running on, and retrieves the results to display them in the notebook. As the command runs, its output log appears right below the code cell (see the screenshot below).
If the code in the cell doesn't match the kernel's language, the application compute will return an error. If the code is correct but produces no output, it will still run, and the result of the computation will be stored in the notebook cloud environment until the kernel is restarted. In that case, you can tell the code executed successfully from the number in the square brackets [ ] to the left of the cell.
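As a small illustration of a cell that runs without showing output, consider the Python sketch below. The comments mark where one notebook cell would end and the next begin; the variable names and values are hypothetical.

```python
# Cell 1: assignments produce no visible output,
# but the values stay in the kernel's memory.
depths = [30, 32, 28]
mean_depth = sum(depths) / len(depths)

# Cell 2: a later cell can reuse the stored value;
# print() makes the result visible below the cell.
print(f"Mean depth: {mean_depth:.1f}")
# Mean depth: 30.0
```

If the kernel were restarted between the two cells, the second one would fail with a NameError until the first was run again.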
How to know if your code cell has executed
When you launch a notebook for the first time, the square brackets to the left of each code cell are empty, indicating these cells have not yet been run during this session. An asterisk inside the brackets [*] means the cell is running. Once the command has been executed, the asterisk will be replaced by an integer representing the number of commands executed since the kernel started. There is nothing to stop you from executing the same cell multiple times, as in the screenshot above.
If you clear the outputs by going to the dropdown menu Cell>All Outputs and selecting "Clear", the integers in the brackets will all be replaced by empty brackets again. However, the integer count will only reset to zero if you restart the kernel.
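The counting behavior described above can be sketched with a toy model in Python. The class `ToyNotebook` and its methods are hypothetical and exist only to mirror the rules in the text; this is not how Jupyter is implemented.

```python
class ToyNotebook:
    """Toy model of the number shown in [ ] next to each code cell."""

    def __init__(self, n_cells):
        self.kernel_count = 0            # commands run since the kernel started
        self.labels = [None] * n_cells   # None stands for empty brackets [ ]

    def run(self, index):
        # Every run increments the shared counter, even for a repeated cell.
        self.kernel_count += 1
        self.labels[index] = self.kernel_count

    def clear_all_outputs(self):
        # Brackets become empty, but the kernel's counter is untouched.
        self.labels = [None] * len(self.labels)

    def restart_kernel(self):
        # Only a restart resets the counter.
        self.kernel_count = 0


nb = ToyNotebook(2)
nb.run(0)
nb.run(0)   # running the same cell twice is allowed
nb.run(1)
print(nb.labels)   # [2, 3]

nb.clear_all_outputs()
nb.run(1)
print(nb.labels)   # [None, 4]

nb.restart_kernel()
nb.run(0)
print(nb.labels)   # [1, 4]
```

The second printout shows that clearing outputs empties the brackets while the count carries on from where it was; only the restart makes numbering begin again at 1.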
How to edit content in a markdown cell
To edit the content of a markdown cell, double-click the cell. See the clip below for an example of the expected behavior. Markdown is a lightweight plain-text formatting language. The syntax controls how the text appears: bold/italic, header size, section enumeration, etc.
Markdown cells do not interpret code
The VM can only interpret Markdown-based syntax in a markdown type cell. Running a Markdown-type cell that contains Python code, for example, will result in the code being reproduced as plain text.
Two examples of Markdown syntax
- Italicizing using a single asterisk on either side (*italics* => italics)
- Modifying header size using hash symbols at the start of a line (# for the largest header, ## for the next size down, and so on)
A more comprehensive listing of Markdown syntax can be found in this helpful Markdown Cheatsheet.
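To show what the two rules above do to text, here is a deliberately tiny Python sketch that converts them to HTML. The function `render_toy_markdown` is hypothetical and handles only these two rules; the renderer built into Jupyter supports far more syntax.

```python
import re


def render_toy_markdown(line):
    """Convert one line using only two Markdown rules: '#' headers and *italics*."""
    # *text* becomes emphasized text.
    line = re.sub(r"\*([^*]+)\*", r"<em>\1</em>", line)
    # One to six leading '#' characters followed by a space set the header level.
    header = re.match(r"(#{1,6}) (.*)", line)
    if header:
        level = len(header.group(1))
        return f"<h{level}>{header.group(2)}</h{level}>"
    return line


print(render_toy_markdown("# Results"))          # <h1>Results</h1>
print(render_toy_markdown("### Methods"))        # <h3>Methods</h3>
print(render_toy_markdown("a *small* effect"))   # a <em>small</em> effect
```

More hash symbols give a smaller header, which is why "# Results" maps to the largest level and "### Methods" to the third.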
Next steps and additional resources
To learn more about notebooks and how to customize your notebook cloud environment in Terra, see Understanding and adjusting your Cloud Environment.
The Terra Notebooks Playground workspace contains Jupyter Notebooks that allow you to explore notebook functionality. These include both R and Python setup notebooks, and template notebooks for accessing and analyzing different kinds of data in different formats and cloud locations.
The BioData Catalyst Collection includes notebooks that are helpful for BioData Catalyst users in Terra. The notebooks in this collection can be copied to a different workspace to support users' individual research needs.