Interactive statistics and visualization with Jupyter notebooks
Jupyter notebooks are an open-source analysis environment where you can visualize and analyze data in real time to gain insight into study data. Import data, including processed genomics, phenotype, and transcriptomics data stored in the cloud, and analyze it with custom or pre-built libraries in R or Python.
The basic idea is to combine analysis methods and findings in a single place, in a form that anyone can easily distribute. In a way this is a logical evolution of the traditional scientific paper, with a key advantage: it dramatically shortens the path between reading how an analysis was done and actually reproducing it. This can have a large impact on the reusability and reproducibility of findings in the computational sciences.
Terra's integrated, open-source Jupyter notebook environment is accessible to newcomers and is built to support portability and reproducibility. Read on to learn more about Jupyter notebooks and how to use them in Terra.
Contents
- Jupyter notebooks 101
- Five ways to create/copy/open a notebook on Terra
- Code and markdown cells in a notebook
3.1. Editing Markdown Cells in a Jupyter notebook
3.2. Code Cells in a Jupyter notebook
- Next steps
1. Jupyter notebooks 101
Jupyter notebooks are documents that run in an open-source, browser-based app. They contain rich-text commentary that briefly explains what's going on, as well as code cells with fully functional tool commands, which you can execute to run any Python or R-based commands, including GATK, on real data. When you share a copy of a notebook whose code cells have been run, your collaborators can view your results embedded within the document.
Find a notebook in a public workspace you'd like to play around with? Clone yourself a copy. Need to develop/refine an analysis in tandem with distant collaborators? You can invite them to your workspace, and work directly in the same notebook with them.
If you're completely new to Jupyter notebooks
Check out the optional Intro to Jupyter notebook in the Terra-Notebooks-QuickStart workspace, which explains:
- Why notebooks are used in biomedical research
- The relationship between the notebook and the workspace
- Jupyter Notebook basics: How to use a notebook, install packages, and import modules
- Common libraries for data analysis and popular tutorial notebooks
If you are already familiar with Jupyter notebooks
The exercise notebooks in the QuickStart offer hands-on practice using notebooks for interactive analysis in Terra. You'll learn how to:
- Browse 1000 Genomes data in the Data Library and define a subset of the data (a cohort) for analysis
- Import the cohort from the Terra Data Library to the workspace
- Set up a Jupyter notebook virtual application to analyze the data
- Analyze the cohort of data in an interactive Jupyter notebook
When you're more comfortable with notebooks in Terra, the Terra Notebooks Playground workspace contains Jupyter notebooks for experimenting with this functionality, including R and Python setup notebooks and template notebooks for accessing and analyzing data.
2. Five ways to create/copy/open a notebook on Terra
2.1. Click on a notebook in one of your workspaces
Be very careful when choosing this type of interaction in a shared workspace. If someone has shared a master version of a notebook with you, any changes you make to the code in "Edit" mode will be saved to the master copy.
2.2. Clone a workspace that contains a notebook you'd like to explore
Cloning a workspace automatically creates copies of all the notebooks in that workspace, so you can quickly duplicate the notebook analysis in your own secure sandbox:
2.3. Clone an individual notebook to your own workspace
This is especially useful when you only want to use a few notebooks in public workspaces that include a large number of sample notebooks:
2.4. Create a new notebook from scratch
If you are an experienced notebook user, you may want to create a clean notebook from scratch:
2.5. Upload a ready-made notebook from outside the Terra platform
This is similar to creating a new notebook; you only need to make sure that the file you upload is in JSON format and saved with the .ipynb extension:
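If you want to sanity-check a file before uploading it, the sketch below uses only the Python standard library to confirm that the file has the .ipynb extension and parses as notebook JSON. It assumes the nbformat version 4 layout (top-level `cells`, `metadata`, `nbformat` and `nbformat_minor` keys). `looks_like_notebook` is a hypothetical helper, and the check is a rough heuristic, not the validation Terra itself performs.

```python
import json
import sys
from pathlib import Path

# Top-level keys of a notebook file, assuming the nbformat v4 layout.
REQUIRED_KEYS = {"cells", "metadata", "nbformat", "nbformat_minor"}


def looks_like_notebook(path):
    """Hypothetical helper: True if `path` ends in .ipynb and parses as v4-style notebook JSON."""
    path = Path(path)
    if path.suffix != ".ipynb":
        return False
    try:
        data = json.loads(path.read_text(encoding="utf-8"))
    except (OSError, UnicodeDecodeError, json.JSONDecodeError):
        # Missing or unreadable file, or content that is not valid JSON.
        return False
    return (
        isinstance(data, dict)
        and REQUIRED_KEYS.issubset(data)
        and isinstance(data["cells"], list)
    )


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_notebook.py my_analysis.ipynb
    for name in sys.argv[1:]:
        print(name, "OK" if looks_like_notebook(name) else "not a valid notebook")
```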
3. Code and markdown cells in a notebook
Cells are the building blocks of a notebook. Each cell has a "cell type" (Code/Markdown/Raw NBConvert) that determines how the application compute will interpret the instructions in the cell.
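In the saved .ipynb file, each cell's type is stored in a `cell_type` field whose value is "code", "markdown" or "raw" (the Raw NBConvert type). The sketch below builds a small notebook in memory with one cell of each type and counts them. The cell contents are made up for illustration, the layout assumes nbformat version 4, and `count_cell_types` is a hypothetical helper.

```python
import json
from collections import Counter

# A minimal, hand-built notebook (nbformat v4 layout) with one cell of each type.
# The cell contents are illustrative only.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My analysis\n", "Some *notes*."]},
        {"cell_type": "code", "metadata": {}, "execution_count": None,
         "outputs": [], "source": ["x = 1 + 1"]},
        {"cell_type": "raw", "metadata": {}, "source": ["passed through untouched"]},
    ],
}


def count_cell_types(nb):
    """Return a Counter mapping each cell_type to the number of cells of that type."""
    return Counter(cell["cell_type"] for cell in nb["cells"])


# Round-trip through JSON text to mimic reading a saved .ipynb file.
loaded = json.loads(json.dumps(notebook))
print(count_cell_types(loaded))
```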
You can view or change the cell type either in the Cell > Cell Type drop-down menu or in the drop-down on the right side of the toolbar of function shortcuts, just below the menu bar:
To run the contents of a code cell, or to render a Markdown cell, click on the cell and then either 1) press Shift + Enter on your keyboard (your keyboard may say "Return" instead of "Enter"), 2) click the "Run" icon in the toolbar, or 3) select the appropriate command from the Cell drop-down menu:
3.1. Editing Markdown Cells in a Jupyter notebook
To edit the content of a Markdown cell, double-click the cell. See the clip below for an example of the expected behavior. Markdown is a lightweight plain-text formatting language; its syntax controls things like bold or italic text, header size, and numbered lists. The application compute interprets only Markdown syntax in these cells: if you run a Markdown cell that contains Python code, for example, that code is simply displayed as plain text, because Markdown cells are not sent to the selected kernel.
The clip below shows two examples of Markdown syntax:
- Italicizing using a single asterisk on either side (*italics* => italics)
- Changing header size using hash symbols (#)
A more comprehensive list of Markdown syntax can be found in this helpful Markdown Cheatsheet.
3.2. Code Cells in a Jupyter notebook
When you run a code cell, the kernel running on your application compute (for example, a Python or R kernel) interprets the code, executes it on the machine that the notebook is running on, and sends the results back to be displayed in the notebook. As the command runs, its output log appears right below the code cell (see the screenshot below):
If the code in the cell doesn't match the kernel's language, the application compute will return an error. If the code is correct but produces no output, it still runs, and its results (for example, any variables it assigns) are kept in the kernel's memory in your cloud environment until the kernel is restarted. In that case, you can tell the code was executed successfully by the number that appears in the square brackets [ ] to the left of the cell.
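As an illustration in a Python kernel: the first cell below only assigns a variable, so it displays nothing, but the value stays in the kernel's memory and a later cell can use it. The `# Cell 1` and `# Cell 2` comments are just markers showing where one notebook cell would end and the next would begin; the variable names are made up for the example.

```python
# Cell 1: assigns a variable; nothing is displayed, but a number appears in [ ].
total = sum(range(1, 11))

# Cell 2: the kernel still remembers `total`, so this cell can use it.
# In a notebook, a bare expression on a cell's last line (e.g. `mean`) is
# displayed automatically; print() works both inside and outside a notebook.
mean = total / 10
print(mean)
```

Restarting the kernel clears `total` and `mean` from memory, so Cell 2 would fail with a `NameError` until Cell 1 is run again.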
How to know if your code cell has run
When you open a notebook for the first time, the square brackets are empty [ ], indicating that these cells have not yet been run during this session. When you run a cell, this changes. An asterisk inside the brackets [*] means the cell is running. After the cell has finished executing, the asterisk is replaced by an integer indicating the order in which the cell was executed since the kernel last started (for example, [5] means it was the fifth cell execution in this session). There is nothing to stop you from executing the same cell multiple times, as in the screenshot above.
If you clear the outputs by going to the Cell > All Outputs drop-down menu and selecting "Clear", the numbered brackets are all replaced by empty brackets again. However, the execution count itself resets only when you restart the kernel; until then, the next cell you run continues from the previous count.
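The number in the brackets is saved in the .ipynb file as each code cell's `execution_count`, which is `null` (None in Python) for cells that have not been run or whose outputs were cleared. The sketch below lists the recorded count for every code cell using only the standard library. `run_status` is a hypothetical helper, the layout assumes nbformat version 4, and the example notebook text is made up.

```python
import json


def run_status(nb):
    """Yield (cell_index, execution_count) for every code cell in a v4 notebook dict.

    execution_count is None for cells that have not been run (or whose
    outputs were cleared); otherwise it is the integer shown in [ ].
    """
    for index, cell in enumerate(nb["cells"]):
        if cell["cell_type"] == "code":
            yield index, cell.get("execution_count")


# Illustrative notebook: cell 1 ran first ([1]), then cell 2 was run twice,
# so it now shows [3]; cell 3 has never been run.
example = json.loads("""
{
  "nbformat": 4, "nbformat_minor": 4, "metadata": {},
  "cells": [
    {"cell_type": "markdown", "metadata": {}, "source": "# Notes"},
    {"cell_type": "code", "metadata": {}, "execution_count": 1, "outputs": [], "source": "a = 1"},
    {"cell_type": "code", "metadata": {}, "execution_count": 3, "outputs": [], "source": "b = a + 1"},
    {"cell_type": "code", "metadata": {}, "execution_count": null, "outputs": [], "source": "c = b * 2"}
  ]
}
""")

for index, count in run_status(example):
    label = "[ ]" if count is None else f"[{count}]"
    print(f"cell {index}: {label}")
```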
Next steps
To learn more about notebooks, and how to customize your notebook cloud environment in Terra, see this article.