Terra (GCP) Quickstart 3: Notebooks

Allie Cliffe

Hands-on practice setting up and running a Jupyter notebook in a Cloud Environment VM. This is the last of three tutorials that feature a mock study of the correlation between height and grades for a cohort of 7th, 8th, and 9th graders. In part 3 of the Quickstart, you'll run an interactive Jupyter analysis to import and compare data from a small cohort and from the full dataset.

Importing and plotting the data generated in the workflow analysis is the last step in discovering whether there is a correlation between student height and grades.

Prerequisite

You should have already completed the Data Tables and Workflows tutorials. You will work in your copy of the Quickstart workspace to get hands-on practice analyzing data in a notebook.

Notebooks tutorial learning objectives

In this tutorial, you'll plot height versus average GPA for the students in the study and answer the study's central question: does a student's height affect their grade point average?

Part 3 of the quickstart is intended to help you become familiar with interactive analysis on Terra. Although the focus is a Jupyter analysis, many of the steps for setting up a cloud environment are similar when running Galaxy or RStudio.  

After working through the Quickstart exercises, you will know how to:

  1. Set up a Jupyter Cloud Environment in Terra 
  2. Open and run a Jupyter Notebook
  3. Import primary data from a data table into a notebook for analysis
  4. Visualize study data and compare results from small and large datasets

Estimated time and cost requirements

You should be able to complete part 3 of the Quickstart in half an hour or less. Running the tutorial will cost less than $0.25 (Google Cloud data storage and VM costs).

Additional requirements
You should have completed part 1 (data tables) and part 2 (workflows) of the quickstart in your own workspace.

Three steps to complete the notebook quickstart.
[Figure: Notebooks Quickstart tutorial flow]

Part 1: Set up a Jupyter Cloud Environment

Part 2: Import data and run the Analyze-student-data-in-a-table notebook

Part 3: (optional) Run the Jupyter 101 notebook

New to Jupyter notebooks?

While it is possible to set up the Jupyter Cloud Environment and run a notebook without any prior knowledge of Jupyter Notebooks, it may be useful to read the Jupyter 101 notebook to learn the basics:
- How to use a notebook
- How to install packages
- How to import data
- Why notebooks are useful in biomedical research

T101 Notebooks Quickstart - step-by-step guide

Work in your own copy of the Quickstart workspace

Now that you're familiar with the mock study data and have done some initial processing of the raw data, you're ready to view the results. You'll get hands-on practice with interactive analysis in Terra by running a notebook that plots height versus grades for both the subset and the full dataset of middle school students.


1. Set up Jupyter Cloud Environment

The interactive Jupyter app runs in a fully customizable Cloud Environment VM that you will need to set up and launch the first time you run a notebook in the workspace. 

Step-by-step instructions

1.1. Start in the Analyses tab of your workspace.

1.2. Click the cloud icon in the right sidebar.
[Screenshot: Cloud Environment icon in the right sidebar of the Analyses tab]

1.3. In the Cloud Environment Details pane, click the gear icon (Environment settings) under the Jupyter logo. This will surface the Jupyter Cloud Environment default pane (below).
[Screenshot: Jupyter Cloud Environment settings in the Analyses tab]

1.4. Click the Create button to start a Jupyter Cloud Environment with the default settings.
[Screenshot: default Jupyter Cloud Environment pane]

What to expect

Once you click Create, it will take a few minutes for the Jupyter Cloud Environment to start. During this time, Terra is requesting and setting up the Google resources for the virtual machine that will run the notebook. 

You can also reach the Jupyter Cloud Environment pane by clicking the notebook name.

Note on billing when running a notebook

Billing for your Cloud Environment begins when your Cloud Environment is created and continues until you pause or delete it, regardless of whether the VM is running any computations.

Every time you open a notebook, a new Jupyter kernel is created. If you have multiple notebooks open and running in a single workspace, they will all consume resources (memory and CPU) on the same Cloud Environment.

Note on billing for Detachable Persistent Disks

When you delete your Cloud Environment, you can choose to keep your Detachable Persistent Disk. If you do, you will incur a charge of $2.00/month (50 GB disk).
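As a rough illustration of how that charge scales, the sketch below prorates the figure quoted above ($2.00/month for a 50 GB disk, which works out to $0.04 per GB-month). It assumes the cost scales linearly with both disk size and the number of days the disk is kept, and it uses a flat 30-day month; these are simplifying assumptions, and actual charges follow Google Cloud's persistent disk pricing.

```python
# Rate taken from the article: $2.00/month for a 50 GB persistent disk.
MONTHLY_COST_USD = 2.00
REFERENCE_DISK_GB = 50
# Assumption: a flat 30-day month for prorating partial months.
DAYS_PER_MONTH = 30


def estimate_disk_cost(days_kept: float, disk_gb: float = REFERENCE_DISK_GB) -> float:
    """Estimate the cost (USD) of keeping a detached persistent disk.

    Assumes cost is linear in both disk size and time kept; this is a
    simplification, not Google Cloud's actual billing formula.
    """
    if days_kept < 0 or disk_gb < 0:
        raise ValueError("days_kept and disk_gb must be non-negative")
    per_gb_month = MONTHLY_COST_USD / REFERENCE_DISK_GB  # $0.04 per GB-month
    return round(per_gb_month * disk_gb * days_kept / DAYS_PER_MONTH, 2)


if __name__ == "__main__":
    # Keeping a 50 GB disk for two weeks after deleting the Cloud Environment.
    print(estimate_disk_cost(14))  # 0.93
```

Under these assumptions, keeping the disk for a full month costs the quoted $2.00, and doubling the disk size doubles the estimate.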

Note that your workspace Cloud Environment is yours and yours alone

No one else, not even collaborators in the same workspace, can view or access data generated in a notebook and stored on your Cloud Environment persistent disk. The reason for this is security: we store your Google credentials on the Google VM, which cannot be shared with other users.

2. Run the Analyze-student-data-in-a-table notebook

Once your Cloud Environment is running, you'll see a green dot just below the Jupyter logo in the right sidebar. You can now dive into this tutorial notebook to answer the burning question of whether, and how, height influences GPA for a cohort of middle schoolers.

Step-by-step instructions

2.1. Click “Analyze-student-data-in-a-table” in the Analyses tab of your clone of the Quickstart workspace.

2.2. Skim the read-only preview for an overview of what's in the notebook.

2.3. Click the Open button at the top of the preview so you can edit and run the code in the notebook.

[Screenshot: Open button at the top of the notebook preview]

To learn more about "Playground" mode, see this article.

2.4. Run the first code cell: click in the cell, then click Run in the menu at the top to execute the code in that cell. You can also press Shift + Return to run a cell.

2.5. Wait for the asterisk to the left of the cell to turn into a number (for example, [*] changes to [4]). The number indicates that the code in the cell has finished running; check the output below the cell for any error messages.

2.6. Click in the next cell and then click Run to execute the code in that cell. Wait for execution to complete and review the results.

2.7. Repeat step 2.6 until you've executed the code in every cell of the notebook. Be sure to read the documentation to understand the purpose of each code block. Note that good documentation means you don't have to know how to code to run a notebook analysis!

2.8. When you've run all the code cells, pause the Jupyter Cloud Environment by clicking the Jupyter logo in the sidebar and then clicking the pause icon.

2.9. Close the notebook by clicking the green x in the top right.
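One of this tutorial's objectives is importing data from a data table into a notebook. The notebook has its own code for this, which this article doesn't describe, so the following is only a hedged, self-contained sketch of the general idea: it parses a data table exported as a TSV, where the first column holds each row's ID under a header such as `entity:student_id`. The table name `student`, the `height` and `gpa` columns, the `load_height_gpa` helper, and the sample rows are all hypothetical placeholders, not the Quickstart's actual schema or data.

```python
import csv
import io
from typing import List, Tuple

# Hypothetical export of a "student" data table. Column names and values are
# placeholders for illustration only, not the Quickstart's real data.
SAMPLE_TSV = (
    "entity:student_id\theight\tgpa\n"
    "student_01\t150\t3.1\n"
    "student_02\t162\t3.4\n"
)


def load_height_gpa(tsv_text: str) -> List[Tuple[str, float, float]]:
    """Parse a data-table TSV into (student_id, height, gpa) tuples."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    if not reader.fieldnames:
        raise ValueError("TSV has no header row")
    # The first column holds each row's ID, e.g. "entity:student_id".
    id_column = reader.fieldnames[0]
    rows = []
    for record in reader:
        rows.append((record[id_column], float(record["height"]), float(record["gpa"])))
    return rows


if __name__ == "__main__":
    for student_id, height, gpa in load_height_gpa(SAMPLE_TSV):
        print(student_id, height, gpa)
```

Reading the ID column by position rather than by name keeps the sketch working whatever the table is called; how the real notebook retrieves the table from the workspace is deliberately left out here.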

Thought Questions

Looking at the graph, what is the relationship between height and cumulative GPA in the subset cohort? Does this relationship seem reasonable? 

How does your graph change based on plotting the full dataset? Is this graph more expected? What did this exercise show you about the importance of sample size in data?
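One way to put a number on what the plots show is Pearson's correlation coefficient r, which ranges from -1 (perfect negative linear relationship) through 0 (no linear relationship) to 1 (perfect positive linear relationship). This article doesn't say whether the notebook computes r, so the sketch below is an illustrative, standard-library-only implementation; `pearson_r` is a hypothetical helper, not part of the notebook. Calling it once on the subset and once on the full dataset is one way to see how much the estimate can move when only a few students are included.

```python
import math
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples.

    r = sum((x - mean_x) * (y - mean_y))
        / sqrt(sum((x - mean_x)**2) * sum((y - mean_y)**2))
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Deviations from each sample's mean.
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("r is undefined when either sample has zero variance")
    return covariance / spread


if __name__ == "__main__":
    print(pearson_r([1, 2, 3], [2, 4, 6]))  # 1.0 (perfectly linear)
```

With heights and GPAs held as parallel lists, compute r for the subset and for all students and compare the two values; with only a handful of points, a single unusual student can pull r far from the value seen in the larger sample.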

3. (optional) Explore the Jupyter 101 tutorial notebook

If you aren't familiar with Jupyter notebooks, you can try out this primer at your own pace.

Dive into this tutorial notebook to learn more about:

  • Why notebooks are used in biomedical research
  • The relationship between the notebook and the workspace
  • Jupyter Notebook basics: how to use a notebook, install packages, and import modules
  • Common libraries in data analysis and popular tutorial notebooks
