Get hands-on practice setting up and running a Jupyter Notebook in an Azure Cloud Environment VM. This tutorial is the last of three that feature a mock study of the correlation between heights and grades for a cohort of 7th, 8th, and 9th graders. In this tutorial, you'll learn how to run an interactive JupyterLab analysis to import and compare data from a small cohort and the full dataset.
Importing and plotting data generated in a workflow analysis is the last step in discovering whether there is a correlation between student height and grades.
Jupyter tutorial details
The Jupyter Quickstart is intended to familiarize you with interactive analysis on Terra on Azure. In this tutorial, you'll run a JupyterLab analysis to plot height versus average GPA for students in the mock study.
Prerequisite
Before starting the Jupyter Quickstart, you should complete the Data Tables Quickstart, which introduces the mock study data and analysis tools. The Workflows Quickstart is optional but recommended.
If you don't work through the Workflows Quickstart, you will need to download the completed student table from the Intro to Terra workspace and upload it to your own copy of the workspace (instructions are included below).
After working through the Jupyter Quickstart exercises, you will know
- How to set up a Jupyter Cloud Environment in Terra on Azure
- How to open and run a Jupyter Notebook
- How to import primary data from a data table into JupyterLab for analysis
- How to visualize study data and compare results from small and large datasets
Estimated time and cost to complete
You should be able to complete the Quickstart tutorial in half an hour. Running the tutorial will cost less than $0.25 (Azure Cloud data storage and VM costs). However, there are additional infrastructure costs (see below).
Additional requirements and costs
You will need to have an Azure-subscription-backed Terra Billing project and your own copy of the Quickstart workspace to complete the tutorial.
Making a workspace will incur additional infrastructure costs (typically ~$5/day). See Overview: Costs and Billing (Azure) for more details.
Notebooks Quickstart flow
New to Jupyter notebooks?
While it is possible to set up the Azure Cloud Environment and run a notebook without any prior knowledge of Jupyter Notebooks, it may be useful to read the Jupyter 101 notebook to learn the basics:
- How to use a notebook
- How to install packages
- How to import data
- Why notebooks are useful in biomedical research
Part 1: Set up Jupyter Cloud Environment
The interactive JupyterLab app runs in a customizable Azure Cloud Environment VM that you will need to set up and launch the first time you run a notebook in the workspace.
Step-by-step instructions
If you didn't do the Workflows Quickstart
You will need to update the data table before running your notebook analysis.
1. Go to the read-only Intro to Terra workspace.
2. Click the file icon in the right sidebar to reveal the workspace Cloud storage directory.
3. Click the completed-student-table.tsv link.
4. Click the blue Download button and save the TSV to your local machine.
5. In your own workspace Data page, click the Import Data button and Upload TSV.
6. Name the table student and upload the saved local file.
Note that this will overwrite the existing student table, but that's OK!
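Before uploading, it can help to confirm that the downloaded file is a well-formed TSV. The following is a minimal sketch using only the Python standard library. The function name `summarize_tsv` and the sample column names (`student_id`, `height_cm`, `gpa`) are hypothetical stand-ins for illustration, not the real table's schema.

```python
import csv
import io


def summarize_tsv(tsv_text):
    """Return (column_names, row_count) for tab-separated text."""
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    if not rows:
        raise ValueError("TSV is empty")
    header, data = rows[0], rows[1:]
    # Every data row should have as many fields as the header row
    for row_number, row in enumerate(data, start=1):
        if len(row) != len(header):
            raise ValueError(
                f"data row {row_number}: expected {len(header)} fields, "
                f"found {len(row)}"
            )
    return header, len(data)


# Hypothetical stand-in for the contents of completed-student-table.tsv;
# these column names are illustrative, not the real table's.
sample = "student_id\theight_cm\tgpa\ns1\t150\t3.1\ns2\t162\t3.4\n"
columns, n_rows = summarize_tsv(sample)
print(columns, n_rows)
```

To check the real download, read the saved file into a string (for example with `open(path, newline="").read()`, where `path` is wherever you saved it) and pass that string to `summarize_tsv`.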
1.1. Start in the Analyses tab of your copy of the Intro to Terra Quickstart workspace.
1.2. Click the cloud icon in the right sidebar.
1.3. In the Cloud Environment Details pane, click the gear icon (Environment settings) under the Jupyter logo. This will surface the Jupyter Cloud Environment default pane (below).
1.4. Click the Create button to start a Jupyter Cloud Environment with the default settings.
What to expect
Once you click Create, it will take 10-15 minutes for the Jupyter Cloud Environment to start. During this time, Terra is requesting and setting up the Azure resources for the virtual machine that will run the notebook.
Note on cost when running a notebook
Billing for your Cloud Environment begins when it is created and continues until you pause or delete it, regardless of whether the VM is running any computations. Terra's autopause, set to 30 minutes by default, helps prevent runaway costs.
Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) on the same Cloud Environment.
Note on billing for Detachable Persistent Disks
When you delete your Cloud Environment, you can choose to keep your Detachable Persistent Disk. If you do, you will incur a charge of $2.00/month (50 GB disk).
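As a rough illustration of what keeping the disk costs over time, the sketch below works from the quoted figure of $2.00/month for a 50 GB disk. The function `estimate_disk_cost` is hypothetical, and scaling the cost linearly with disk size is an assumption made here for illustration; actual Azure disk pricing may be tiered differently.

```python
# Figures quoted in the note above
QUOTED_MONTHLY_COST_USD = 2.00
QUOTED_DISK_SIZE_GB = 50


def estimate_disk_cost(months, disk_size_gb=QUOTED_DISK_SIZE_GB):
    """Estimate the cost in USD of keeping a persistent disk.

    Assumes cost scales linearly with disk size, which is a
    simplification and may not match actual pricing tiers.
    """
    rate_per_gb_month = QUOTED_MONTHLY_COST_USD / QUOTED_DISK_SIZE_GB
    return round(rate_per_gb_month * disk_size_gb * months, 2)


# Keeping the default 50 GB disk for six months
print(estimate_disk_cost(6))
```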
Note that your workspace Cloud Environment is yours and yours alone
No one else, not even collaborators in the same workspace, can view or access data generated in a notebook and stored on the Cloud Environment persistent disk. The reason for this is security. Your credentials are stored on the VM, which cannot be shared with other users.
Part 2: Run the notebook "Analyze data from a table"
Once your Cloud Environment is running, you'll see a green dot just below the Jupyter logo in the right sidebar. You can now dive into this tutorial notebook to answer the burning question of whether and how height influences GPA (for a cohort of middle schoolers).
Step-by-step instructions
2.1. Click on “Analyze-data-from-a-table” in the Analyses tab of your copy of the Quickstart workspace.
2.2. Click on the Open button at the top of the Preview so you can edit and run the code in the notebook.
2.3. The notebook is intended to be self-guided. Read through the documentation, clicking into each code cell to run it (use "shift-enter" as a keyboard shortcut).
Thought Questions
Looking at the graph of the eight-student subset, what seems to be the relationship between height and cumulative GPA in the subset cohort? Does this relationship seem reasonable?
- It seems from this plot that GPA increases linearly with height (taller students get better grades). This seems unlikely, but it is a sample set of only eight students...
How does your graph change based on plotting the full dataset? Is this graph more expected? What did this exercise show you about the importance of sample size in data?
- The larger sample size gives a clearer picture of the relationship (or lack thereof) between height and GPA.
This simple analysis example shows the basic steps to run a notebook analysis in Terra on Azure and also demonstrates the importance of large sample sizes for boosting confidence in your study results.
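To see why a subset of eight students can suggest a relationship that the full dataset does not support, the sketch below simulates the effect. It is not the notebook's code and does not use the tutorial's data: it generates a synthetic cohort in which height and GPA are independent, then measures how much the Pearson correlation coefficient varies across random subsets of 8 versus 500 students. The function names and all numeric parameters are assumptions chosen for illustration.

```python
import math
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spread_of_r(heights, gpas, sample_size, trials, rng):
    """Standard deviation of r across repeated random subsets."""
    indices = range(len(heights))
    results = []
    for _ in range(trials):
        chosen = rng.sample(indices, sample_size)
        results.append(
            pearson_r([heights[i] for i in chosen], [gpas[i] for i in chosen])
        )
    return statistics.stdev(results)


rng = random.Random(0)  # fixed seed so reruns give the same numbers
# Synthetic cohort: height and GPA are drawn independently, so the true
# correlation is zero. These are made-up values, not the tutorial's data.
heights = [rng.gauss(160, 8) for _ in range(1000)]
gpas = [rng.gauss(3.0, 0.5) for _ in range(1000)]

small_spread = spread_of_r(heights, gpas, sample_size=8, trials=200, rng=rng)
large_spread = spread_of_r(heights, gpas, sample_size=500, trials=200, rng=rng)
print(f"spread of r with 8 students:   {small_spread:.2f}")
print(f"spread of r with 500 students: {large_spread:.2f}")
```

A wide spread for the small subsets means that any single group of eight students can show a strong apparent correlation purely by chance, while the estimates from larger samples cluster near the true value.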
🎉 🎉 Congratulations! You've finished the three Terra on Azure Quickstart tutorials 🎉 🎉
Part 3 (optional): Run the Jupyter 101 notebook
If you're new to Jupyter Notebooks, you might find it helpful to run the Jupyter 101 notebook included in the Quickstart workspace. It has lots of useful information as well as hands-on exercises to jump-start your notebook analysis journey.
Next Steps
If you've completed the three Terra on Azure Quickstart tutorials, you should understand enough of the basics in Terra to get started with some work of your own.
See Terra on Azure Featured Workspaces in the Terra Showcase Library for curated workspaces that highlight a range of scientific use cases.
Dive into this tutorial notebook to learn more about
- Why notebooks are used in biomedical research
- The relationship between the notebook and the workspace
- Jupyter Notebook basics: how to use a notebook, install packages, and import modules
- Common libraries in data analysis and popular tutorial notebooks