Hands-on practice setting up and running a Jupyter notebook in a cloud environment VM. This workspace is the last of the three tutorials that feature a mock study of the correlation between heights and grades for a cohort of 7th, 8th, and 9th graders. In this tutorial, you'll learn how to run an interactive Jupyter analysis to import and compare data from a small cohort and the full dataset.
Importing and plotting data generated in a workflow analysis is the last step to discover if there is a correlation between student height and grades.
Quickstart overview
T101 Notebooks 101 Learning Objectives
The T101 Quickstarts are a series of three tutorial workspaces featuring a mock study of the relationship between a student's height and their grade point average. The mock study is a little silly, but in doing it you'll learn how to use functionality typical for many bioinformatics investigations. In this tutorial, you'll plot height versus average GPA for students in the study to learn how to run Jupyter Notebook in Terra.
The Notebooks Quickstart is intended to help you become familiar with interactive analysis on Terra. Although the focus is a Jupyter analysis, many of the steps for setting up a cloud environment are similar when running Galaxy or RStudio.
After working through the quickstart exercises, you will know:
- How to set up a Jupyter Cloud Environment in Terra
- How to open and run a Jupyter Notebook
- How to import primary data from a data table into a notebook for analysis
- How to visualize study data and compare results from small and large datasets
Estimated time and cost requirements
You should be able to complete the Quickstart tutorial in half an hour or less. Running the tutorial will cost less than $0.25 (Google Cloud data storage and VM costs).
Additional requirements
You will need to have a Terra Billing project and your own copy of the workspace to complete the tutorial.
Notebooks Quickstart flow
There are three steps to complete the notebook quickstart.
Part 1: Set up a Jupyter Cloud Environment
Part 2: Import data and run the Analyze-student-table-data notebook
Part 3: (optional) Run the Jupyter 101 notebook
New to Jupyter notebooks? While it is possible to set up the Jupyter cloud environment and run a notebook without any prior knowledge of Jupyter Notebooks, it may be useful to read the Jupyter 101 notebook to learn the basics:
- How to use a notebook
- How to install packages
- How to import data
- Why notebooks are useful in biomedical research
First: Make your own copy of the Notebooks Quickstart workspace
The T101-Notebooks-Quickstart is “Read only”. For hands-on practice, you'll need to be able to upload data to workspace storage, which has a cost. Making your own copy of the workspace gives you that. If you haven't already done so, make your own copy of this workspace by following the directions below.
Start by clicking the circle with three dots in the upper right-hand corner and select "Clone" from the dropdown menu:
- Rename your copy something memorable. It may help to write down the name of your workspace.
- Choose your billing project. Note that this can be free credits! Don’t worry, you’ll have plenty left over when you’ve completed the Quickstart exercises.
- Do not select an Authorization Domain, since these are only required when using restricted-access data.
- Click the “Clone Workspace” button to make your own copy.
T101 Notebooks Quickstart - step-by-step guide
Once you're in your own copy of the workspace, you can get hands-on to learn about analyzing data in a notebook!
Part 1: Set up Jupyter Cloud Environment
The interactive Jupyter app runs in a fully customizable Cloud Environment VM that you will need to set up and launch the first time you run a notebook in the workspace.
Step-by-step instructions
1.1. Start in the Analyses tab of your workspace.
1.2. Click the cloud icon in the right sidebar.
1.3. In the Cloud Environment Details pane, click the gear icon (Environment settings) under the Jupyter logo. This will surface the Jupyter Cloud Environment default pane (below).
1.4. Click the Create button to start a Jupyter Cloud Environment with the default settings.
What to expect
Once you click Create, it will take a few minutes for the Jupyter Cloud Environment to start. During this time, Terra is requesting and setting up the Google resources for the virtual machine that will run the notebook.
You can also get to the (Jupyter) Cloud Environment pane by clicking the notebook name.
Note on billing when running a notebook
Billing for your Cloud Environment begins when your Cloud Environment is created and continues until you pause or delete it, regardless of whether the VM is running any computations.
Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) on the same Cloud Environment.
Note on billing for Detachable Persistent Disks
When you delete your Cloud Environment, you can choose to keep your Detachable Persistent Disk. If you do, you will incur a charge of $2.00/month (50 GB disk).
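The figure above implies a storage rate of $0.04 per GB per month ($2.00 / 50 GB). As a rough sketch, assuming that rate scales linearly with disk size (an assumption made for illustration only; actual Google Cloud pricing depends on disk type and region), the cost of keeping a larger disk could be estimated like this:

```python
# Rate implied by the figure quoted above: $2.00 per month for a 50 GB disk.
# Treating it as a flat per-GB rate is an assumption made for illustration.
IMPLIED_RATE_USD_PER_GB_MONTH = 2.00 / 50


def monthly_disk_cost(size_gb):
    """Estimate the monthly cost (USD) of keeping a persistent disk."""
    return round(size_gb * IMPLIED_RATE_USD_PER_GB_MONTH, 2)


for size_gb in (50, 100, 200):
    print(f"{size_gb} GB disk: about ${monthly_disk_cost(size_gb):.2f} per month")
```

Check the current pricing for your own disk type before relying on an estimate like this.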
Note that your workspace Cloud Environment is yours and yours alone. No one else, even collaborators in the same workspace, can view or access data generated in a notebook and stored in the Cloud Environment persistent disk. The reason for this is security. We store your Google credentials on the Google VM, which cannot be shared with other users.
Part 2: Run the notebook Analyze student data in a table
Once your Cloud Environment is running, you'll see a green dot just below the Jupyter logo in the right sidebar. You can now dive into this tutorial notebook to answer the burning question of whether and how height influences GPA (for a cohort of middle-schoolers).
Step-by-step instructions
2.1. Click on “Analyze-student-data-in-a-table” in the Analyses tab of your clone of the QuickStart workspace.
2.2. Click on the Open button at the top of the Preview so you can edit and run the code in the notebook.
To learn more about "Playground" mode, see this article
2.3. Skim the read-only view for an overview of what's in the notebook.
2.4. Run the first code cell: click in the cell and then click Run from the menu at the top to execute the code in that cell. Note - you can also use the shortcut “shift” + “return” to run a cell.
2.5. Wait for the * at the left of the cell to turn into a number, i.e. [*] --> [4], which indicates that the code in the cell has finished executing.
2.6. Click in the next cell and then click Run to execute the code in that cell. Wait for execution to complete and review the results.
2.7. Repeat step 2.6 until you've executed the code in all cells in the notebook. Make sure to read the documentation to understand the point of each code block. Note that good documentation means you don't have to be able to code to run a notebook analysis!
2.8. When you've run all the code cells, pause the Jupyter cloud environment by clicking the Jupyter logo in the sidebar and clicking on the pause icon.
2.9. Close the notebook by clicking the green x in the top right.
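To give a feel for what the notebook's code cells do, here is a minimal sketch of reading student rows from tab-separated text and averaging GPA per height. It is not the notebook's actual code: the rows are made up, the column names (`student_id`, `height_in`, `gpa`) are hypothetical, and it uses only the Python standard library rather than whatever libraries the notebook itself loads.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Made-up rows standing in for a table exported as tab-separated text.
# The column names (student_id, height_in, gpa) are hypothetical.
TABLE_TSV = """student_id\theight_in\tgpa
s01\t58\t3.1
s02\t60\t2.8
s03\t60\t3.4
s04\t63\t3.0
s05\t63\t3.6
"""


def average_gpa_by_height(tsv_text):
    """Return {height: mean GPA} for a tab-separated table of students."""
    gpas = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        gpas[int(row["height_in"])].append(float(row["gpa"]))
    # Sort by height so the result is ready to plot left to right.
    return {height: mean(values) for height, values in sorted(gpas.items())}


for height, avg in average_gpa_by_height(TABLE_TSV).items():
    print(f"{height} in: average GPA {avg:.2f}")
```

The resulting height-to-average mapping is the kind of summary that would then be handed to a plotting library to draw height versus average GPA.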
Thought Questions
Looking at the graph, what is the relationship between height and cumulative GPA in the subset cohort? Does this relationship seem reasonable?
How does your graph change when you plot the full dataset? Is this graph closer to what you expected? What did this exercise show you about the importance of sample size in data?
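If you want to explore the sample-size question outside the tutorial data, the sketch below may help. It uses synthetic values, not the study data: heights and GPAs are generated independently, so the true correlation is zero by construction. The cohort sizes (5000 and 10) and the distribution parameters are arbitrary choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic, made-up data: height and GPA are drawn independently,
# so any correlation that shows up is due to chance alone.
heights = [rng.gauss(160, 10) for _ in range(5000)]  # cm
gpas = [min(4.0, max(0.0, rng.gauss(3.0, 0.5))) for _ in range(5000)]

r_full = pearson(heights, gpas)

# Correlations observed in 200 randomly chosen cohorts of 10 students each.
small_rs = []
for _ in range(200):
    idx = rng.sample(range(5000), 10)
    small_rs.append(pearson([heights[i] for i in idx], [gpas[i] for i in idx]))

print(f"full dataset (n=5000): r = {r_full:.3f}")
print(f"small cohorts (n=10): r ranges from {min(small_rs):.2f} to {max(small_rs):.2f}")
```

The spread of correlation values across the small cohorts, compared with the single value for the full dataset, shows how easily a small sample can suggest a relationship that is not really there.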
Part 3 (optional): Explore the Jupyter 101 tutorial notebook
If you aren't familiar with Jupyter notebooks, you can try out this primer at your own pace.
Dive into this tutorial notebook to learn more about
- Why notebooks are used in biomedical research
- The relationship between the notebook and the workspace
- Jupyter Notebook basics: how to use a notebook, install packages, and import modules
- Common libraries in data analysis and popular tutorial notebooks
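Before importing a library in a notebook, it can be handy to check whether it is already available in your Cloud Environment. The sketch below shows one way to do that with the Python standard library; the package names checked are illustrative examples, not a statement of what the Terra environment ships with.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Libraries commonly used for data analysis; this list is illustrative only.
for name in ("numpy", "pandas", "matplotlib"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

In a Jupyter notebook cell, a missing package can typically be installed with the `%pip install` magic followed by the package name.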