The Terra Notebooks Quickstart tutorial teaches the basics of customizing and launching an interactive Jupyter analysis on Terra. Below you'll find the learning objectives, the time and cost to complete each part, and step-by-step instructions.
What is a Jupyter notebook?
A Jupyter Notebook is an interactive analysis tool that includes code cells for manipulating and visualizing data in real time as well as documentation to make it easier to share and reproduce your analysis.
Quickstart learning objectives + time and cost to complete
What you will learn
1) How to create a Jupyter Cloud Environment to run your notebook
2) How to work in and run a notebook
3) How to copy data from different cloud locations into the Cloud Environment for analysis
- Workspace storage (i.e. Google bucket)
- Data table
- The Terra Data Library
- Public data in Google Cloud BigQuery
How much will it cost? How long will it take?
The tutorial has an interactive tour (5-10 minutes), a Jupyter 101 tutorial notebook (10-20 minutes), and four optional demo notebooks (about 5 minutes each). The tour and tutorial notebook take 15-30 minutes; with all four demos, the entire Quickstart takes roughly 35-50 minutes and will cost less than $1.00 in GCP compute charges.
Notebooks Quickstart Overview
A Jupyter notebook is an application that runs in a virtual Cloud Environment, which includes a virtual machine (VM), software and packages, and a detachable Persistent Disk for storage. The notebook includes executable code (in either Python or R) that lets you interact with your data in real time. Because documentation explains each step, you don't have to be a coding expert to run a notebook analysis.
Follow the instructions below to complete the four steps of the Quickstart.
Notebooks in this workspace
Jupyter Notebooks 101 is an interactive tutorial if you're new to Jupyter. If you're already familiar with Jupyter notebooks, you can skip to one of the optional data demo notebooks after taking the interactive Jupyter setup walkthrough.
Jupyter Notebooks 101
If you're not familiar with Jupyter notebooks, this notebook gives a general introduction (or refresher) to basics such as inserting a Markdown or code cell and running code.
Option 1: Data stored in a workspace data table
If your primary data or metadata are organized in a data table, you will first need to bring them into the storage for the VM running your notebook (i.e., the Cloud Environment Persistent Disk, or PD). This notebook first installs the additional libraries and packages you need. Then it brings phenotypic data from the subject table into the Cloud Environment PD so you can generate some plots.
Option 2: Data in workspace storage (Google bucket)
If your primary data or metadata are in workspace storage (Google bucket), you will first need to bring them into the storage for the VM running your notebook (i.e., the Cloud Environment Persistent Disk). This notebook first installs the additional libraries and packages you need. Then it copies phenotypic data from a CSV file in a public bucket to your workspace storage, because cloned copies of a workspace do not include data from the original workspace storage (Google bucket). Finally, it copies the data to the Cloud Environment VM so you can generate some plots.
Option 3: Data from the Terra Data Library (i.e. consortia)
For this option, you'll bring 1,000 Genomes cohort data from the Terra Data Library into the Cloud Environment Persistent Disk for analysis in a notebook. A couple of plotting functions stand in for a real analysis. Note that this notebook requires two additional preliminary steps: selecting a custom cohort of data with a data explorer and importing the cohort to the workspace. Alternatively, you can use the cohort already included in the workspace (in the Data tab).
Option 4: Public data in BigQuery
Explore examples of two additional ways to access data in the cloud (unstructured data in a Google bucket and tabular data in BigQuery) in a notebook.
Step 1: Make your own copy of the Quickstart workspace
The Terra-Notebooks-Quickstart workspace is read-only. For hands-on practice, you'll need to be able to spin up a Cloud Environment. Making your own copy of the workspace makes you an owner, which allows you to do that. Once you're in your own copy of the workspace, you'll be ready to set up and run your own interactive analysis in a notebook.
1. Start by clicking the round circle with three dots in the upper right-hand corner and select "Clone" from the dropdown menu. Then follow the directions below to complete the form:
Clone workspace form
2. Give your copy a memorable name
It may help to write down the name of your workspace.
3. Choose your billing project
Note that this can be a billing project that uses your "getting started" credits from Google Cloud! Don’t worry, you’ll have plenty left over when you’ve completed the Quickstart exercises.
4. Do not select an Authorization Domain; these are only required when using restricted-access data.
5. Click the “Clone Workspace” button to make your own copy
Step 2: Take the interactive Cloud Environment setup tour
Components in the Jupyter Cloud Environment
- The virtual machine (VM) that runs the notebook code
- Pre-installed software and dependencies
- Dedicated Persistent Disk storage
All of these components can be customized in Terra! The Quickstart workspace includes an interactive walkthrough to guide you in setting up your Jupyter Cloud Environment.
2.1. Click on the Analyses tab of your copy of the Notebooks Quickstart workspace.
2.2. The tour will start automatically. It only runs once (if you want to see it again, you'll need to make another copy of the workspace). It walks you through the Cloud Environment setup needed to run a notebook.
2.3. Follow the instructions in green.
What happens when I open a notebook for the first time?
When you first open a notebook in a workspace, Terra creates your Cloud Environment (VM or cluster). This can take 5-10 minutes. During this time, don’t refresh the page or try to resume the notebook.
During creation, you will see a read-only notebook copy and a blue dot under the Jupyter logo in the sidebar that indicates Terra is creating the virtual environment.
When you open a notebook in your workspace again, startup is faster because Terra only needs to resume the existing application compute rather than create it.
Step 3: Explore the Jupyter Notebooks 101 tutorial
Once you've set up your Cloud Environment, you can dive into this tutorial notebook to learn more about:
- Why notebooks are used in biomedical research
- The relationship between the notebook and the workspace
- Jupyter Notebook basics: how to use a notebook, install packages, and import modules
- Common libraries in data analysis and popular tutorial notebooks
3.1. Click on “Jupyter-Notebooks-101” in the Analyses tab or Notebooks tab of your clone of the Quickstart workspace.
3.2. Spin up a Cloud Environment VM (default settings).
3.3. Click on the Open button at the top of the Preview so you can edit and run the code in the notebook.
To learn more about "Playground" mode, see this article
3.4. Wait for the Cloud Environment VM to start. Note: this can take 2-3 minutes the first time you create a notebook virtual environment in a workspace.
3.5. Skim the read-only view for an overview of what's in the notebook.
Once the Cloud Environment is up and running, follow the instructions to explore the tutorial.
3.6. Run the first code cell: click in the cell and then click Run in the menu at the top to execute the code in that cell. Note: you can also use the shortcut Shift + Return to run a cell.
3.7. Wait for the asterisk in the brackets at the left of the cell ([*]) to turn into a number (for example, [1]), which indicates that the code in the cell has finished executing.
3.8. Click in the next cell and then click Run to execute the code in the next cell. Wait for execution to complete and review the results.
3.9. Repeat steps 3.6-3.8 until you've executed the code in all cells of the notebook. Make sure to read the documentation so you understand the point of each code block. Note that you don't have to be able to code to run a notebook analysis, as long as your notebook has enough documentation!
3.10. When you've run all the code cells, pause the Jupyter app by clicking the Jupyter logo in the sidebar and clicking on the pause icon.
3.11. Close the notebook by clicking the green x in the top right.
Step 4 (optional): Run a demo analysis
To analyze data in a notebook, you first need to bring it into the Cloud Environment storage (Persistent Disk). Four optional notebooks show how to bring data from four different places into a notebook for analysis. Most of these can be run as stand-alone tutorials.
Option 1: Data in a data table
In this notebook you'll bring phenotypic data from the subject data table to the notebook VM. Plotting functions (included) demonstrate the kind of analysis you might then do on this kind of data.
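As a rough illustration of what bringing a data table into the notebook involves, here is a minimal Python sketch that parses a table exported as a TSV (for example, one downloaded from the Data tab) and computes a simple summary. It assumes the first header cell follows the `entity:<table>_id` convention used in Terra table TSVs (e.g. `entity:subject_id`). The helper names and the `age` and `height_cm` columns are hypothetical; this is not the code the Option 1 notebook uses.

```python
import csv
import io
from statistics import mean


def load_data_table_tsv(tsv_text):
    """Parse a data-table TSV export into (id_column, rows).

    Assumes the first header cell looks like `entity:<table>_id`,
    e.g. `entity:subject_id`. Each row is a dict keyed by column name.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if not reader.fieldnames:
        raise ValueError("TSV has no header row")
    id_column = reader.fieldnames[0]
    if not id_column.startswith("entity:"):
        raise ValueError(f"unexpected ID column header: {id_column!r}")
    return id_column, list(reader)


def mean_of_column(rows, column):
    """Mean of a numeric column, skipping blank or missing cells."""
    values = [float(row[column]) for row in rows if (row.get(column) or "").strip()]
    return mean(values) if values else None


# Hypothetical three-subject table; the age and height_cm columns are made up.
example = (
    "entity:subject_id\tage\theight_cm\n"
    "S1\t34\t170\n"
    "S2\t\t182\n"
    "S3\t50\t165\n"
)
id_col, rows = load_data_table_tsv(example)
print(id_col, len(rows))            # entity:subject_id 3
print(mean_of_column(rows, "age"))  # 42.0 (S2 has no age and is skipped)
```

Skipping blank cells rather than treating them as zero keeps missing phenotype values from skewing the summary.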
Option 2: Data in the workspace bucket
In this notebook you'll bring in the same phenotypic data - this time stored in a CSV file in the workspace bucket - to the notebook VM. Plotting functions (included) demonstrate the kind of analysis you might then do on this data.
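Copying a file from the workspace bucket to the VM's disk usually comes down to one `gsutil cp` call. The Python sketch below builds that command and only runs it when `dry_run=False`. It assumes the `gsutil` CLI is available on the Cloud Environment and that the bucket URL is exposed in the `WORKSPACE_BUCKET` environment variable, as Terra's Jupyter environments typically do; verify both on your own VM. The helper names, bucket, and file names are hypothetical.

```python
import os
import shlex
import subprocess


def bucket_copy_command(object_path, local_dir=".", bucket=None):
    """Build a `gsutil cp` command copying one object from a bucket to local disk.

    Falls back to the WORKSPACE_BUCKET environment variable (an assumption
    about the Terra VM) when `bucket` is not given.
    """
    bucket = bucket or os.environ.get("WORKSPACE_BUCKET")
    if not bucket:
        raise ValueError("no bucket given and WORKSPACE_BUCKET is not set")
    if not bucket.startswith("gs://"):
        bucket = "gs://" + bucket
    source = f"{bucket.rstrip('/')}/{object_path.lstrip('/')}"
    return ["gsutil", "cp", source, local_dir]


def copy_from_bucket(object_path, local_dir=".", bucket=None, dry_run=True):
    """Print the copy command; execute it only when dry_run is False."""
    cmd = bucket_copy_command(object_path, local_dir, bucket)
    print(shlex.join(cmd))
    if not dry_run:
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmd


if __name__ == "__main__":
    # Dry run with a hypothetical bucket and file: prints the command only.
    copy_from_bucket("phenotypes.csv", bucket="gs://my-example-bucket")
```

Defaulting to a dry run lets you check the source and destination paths before anything is copied, which is useful when the same cell is rerun.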
Option 3: Data from the Terra Data Library
Many consortia make data stored in external repositories available in the Terra Data Library. To demonstrate how to analyze such data, in this notebook you will use a data explorer to create a custom cohort of 1,000 Genomes data from the Terra Data Library.
Before running the notebook, you will need to complete two steps:
1. Choose and import a custom cohort of 1,000 Genomes data from the Terra Data Library
2. Export the cohort to your workspace
See step-by-step instructions here.
This diagram illustrates the platforms and tools you will be using.
Option 4: Public data in the cloud
Explore two additional ways to access data in the cloud (unstructured data in a Google bucket and tabular data in BigQuery) in a notebook.
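For the BigQuery half of this option, the Python sketch below shows the general shape of querying public tabular data from a notebook. The Option 4 notebook's own query isn't reproduced here, so the sketch uses `bigquery-public-data.samples.shakespeare`, a public BigQuery sample table, purely as an illustration. It assumes the `google-cloud-bigquery` client library is installed on the VM; the helper names are hypothetical.

```python
def build_word_count_query(limit=5):
    """Return SQL totaling words per work in a public BigQuery sample table."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT corpus, SUM(word_count) AS total_words\n"
        "FROM `bigquery-public-data.samples.shakespeare`\n"
        "GROUP BY corpus\n"
        "ORDER BY total_words DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql, project=None):
    """Run `sql` with the google-cloud-bigquery client and return a DataFrame.

    `project` is the Google Cloud project billed for the query.
    """
    # Imported inside the function so the rest of this file loads without it.
    from google.cloud import bigquery

    client = bigquery.Client(project=project)
    return client.query(sql).to_dataframe()
```

In a notebook you might then call `run_query(build_word_count_query(), project=os.environ.get("GOOGLE_PROJECT"))`. That the workspace's project is exposed as `GOOGLE_PROJECT` is an assumption to check on your VM, and `to_dataframe()` needs pandas (and, in recent client versions, `db-dtypes`) installed.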
Additional notebooks resources
To learn more about interactive statistics and visualization with Jupyter notebooks, see the Terra notebooks documentation.
To learn more about how to customize the Cloud Environment, see Your interactive analysis app VM (Cloud Environment).
For notebooks-focused featured workspaces, see the Showcase Library.