Terra on Azure - Data Tables Quickstart Guide

User Ed

If you're interested in using Terra on Azure, please email terra-enterprise@broadinstitute.org.

Learn how to organize and manage data tables in your workspace by completing the Data Tables Quickstart. Workspace tables can make your life easier by helping you manage data in one central place - including files from different cloud storage locations. 

If you're new to working in the cloud, see Overview: Data in Terra on Azure for a useful conceptual overview.  

Quickstart Overview

We all know the best way to learn how to use a new platform is to use it. The Intro to Terra on Azure Quickstart walks you through a mock study to get hands-on practice with basic Terra functionality without spending a lot of time or money. 

Your mission, should you choose to accept it, is to discover if a student's height affects their grade. 
The mock study is a little silly, but in doing it, you'll learn how to work in Terra. The hands-on portions highlight functionality typically used for many bioinformatics investigations. Hopefully, you'll have a bit of fun, too! At the end, we’ll point you to some more realistic examples to try out what you’ve learned.

About the mock study

This is the first in a series of three Quickstart tutorials that walk through a completely fake study of the correlation between height and grades for a cohort of 7th, 8th, and 9th graders. As with any work on Terra, you'll do all the work in a dedicated project workspace, which you can share with collaborators. 

Study steps 

  1. Explore data (Data Tables Quickstart)
    Survey data includes heights and subject grades for 86 students in the mock study 
  2. Process data (Workflows Quickstart)
    Calculate the average GPA of each student in the study 
  3. Visualize data (Jupyter Quickstart)
    The notebook plots height versus average GPA, so you can draw your conclusions from the data  

Data tables tutorial overview

Learning objectives

The data tables quickstart is intended to help you understand data tables in your workspace - how to use them to store and organize input data and how to add data in a table to your Terra workspace.

After working through the exercises in the data tables quickstart, you will know how to:

  1. Organize and manage data in a data table in your Terra workspace
  2. Add a data table to your workspace (optional)
  3. Import realistic biomedical data tables from a public workspace (optional)

Data tables Quickstart Flow

Terra-on-Azure_Data-Tables-Quickstart-flow_Diagram.png

Estimated time and cost to complete
You should be able to complete the Quickstart tutorial in half an hour. Running the tutorial will cost less than $0.25 (Azure Cloud data storage and VM costs).

Additional costs and requirements
You will need to have an Azure-subscription-backed Terra Billing project and your own copy of the Quickstart workspace to complete the tutorial.

Making a workspace will incur additional infrastructure costs (typically ~$5/day). See Overview: Costs and Billing (Azure) for more details.

Make your own copy of the tutorial workspace

The Intro to Terra Quickstart workspace is “Read-only”. You'll need to be able to upload data to workspace storage, which has a cost. Making your own copy of the Data-Tables-Quickstart workspace gives you that power. If you haven't already done so, you'll need to make your own copy of this workspace following the directions below.

  1. Click the circle with three dots in the upper right-hand corner and select Clone from the dropdown menu.
    ToA-Share-workspace_Screenshot.png

    ToA-clone-workspace-modal_Screenshot.png

  2. Rename your copy to something memorable.
    It may help to write down the name of your workspace.

  3. Choose your Azure-subscription-backed Terra billing project.

  4. Click the Clone Workspace button to make your own copy.

     

What to expect

Once you're in your own copy of the workspace, you can get hands-on to learn about data tables!

Part 1. Explore data in the student table

Tables are like spreadsheets built into your workspace. Part 1 of the tutorial is a guided tour of how to organize data in tables in Terra. 

For a conceptual overview of how tables can help organize data in the cloud, see Managing data in the cloud (data tables in Terra). 

Data tables in your workspace copy

When you create a clone or a new workspace, Terra will automatically launch the cloud infrastructure to power data tables. 

When data tables are ready

Once data tables are launched, you’ll see the active Import Data button in the top left section of the Data page and a student (86) table in your clone.

1.1. Open and examine the data table

1. Open your copy of the Quickstart Data Tables workspace.

2. Click on the Data tab. From the left-hand side, click on the student (86) table to see the full table.

ToA-Data-Tables-Quickstart_student-table_Screenshot.png

3. Take a minute to look over the information in the columns and rows.

Thought questions

For background, don't forget to read the section About the mock study above. 

  • Answer: Each row in the student table is a different student.

    Rows represent separate entities of data
    Each row is a unique input (an "entity"). You can define and name your table entity by the input data you have.

    For example, each row in a sample table would be a distinct sample. Each row in a subject table would be data about a subject in a study. 

  • Each column is a different piece of information corresponding to the student.

    Tip: the first column in a table is always the unique ID for the entity in that row. In the data tables quickstart, the first column is the unique student ID.

    The unique ID is often non-human readable (a random combination of numbers and letters). To make it easier for a human to understand, there is often a column with a human-friendly ID (like the student's first name). Additional columns store the primary data (the GPAs for three different subjects, plus the student's height and grade level). There are also some columns of useful information - like the units used for the height measurement.

    What data can be stored in a table?
    Like spreadsheets, tables are flexible: they can contain whatever data you have, with as many rows and columns as you need. For example, a sample table could include information about the samples (like a column for the study name or accession number, or the date the sample was taken) as well as a link to the genomic data files in Azure blob cloud storage.

    It doesn't hurt to add additional columns of anything you want to remember about that entity.

  • The student table includes both primary data (heights and GPAs that will be used in the study) and metadata (additional useful data, in this case). If you wanted to reference a file stored in cloud storage (maybe each student's picture?), you would include a link right in the table.

    Primary data versus metadata
    Primary data in a Terra data table is the sort that would traditionally be in a CSV. Examples are phenotypic data, i.e., a subject’s clinical information such as disease symptoms, lab results, or demographic data, including age, ethnicity, and gender.

    Metadata is data about data. This could be links to genomic data files or other information about the genomic data files (file size, the date they were created, experimental process). Metadata is how Terra "stores" large genomic files in a table. The data files are physically located in the workspace or external cloud storage. The table keeps track of the links to the files, so you can use the data no matter where it's actually located.
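To make the layout described above concrete, here is a minimal Python sketch of a student-style table held as tab-separated text. Only the `<table-name>_id` form of the first column comes from this guide; the other column names, the two sample rows, and the photo URLs are invented for illustration and are not the actual contents of the Quickstart table.

```python
import csv
import io

# Hypothetical rows: only the "<table-name>_id" first column follows the
# convention described in this guide; every other name and value is made up.
SAMPLE_TSV = (
    "student_id\tname\theight\theight_units\tphoto\n"
    "a1b2c3\tAda\t152\tcm\thttps://example.blob.core.windows.net/photos/ada.png\n"
    "d4e5f6\tBen\t160\tcm\thttps://example.blob.core.windows.net/photos/ben.png\n"
)


def read_table(tsv_text):
    """Parse TSV text into (id_column_name, list of row dicts)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    rows = list(reader)
    id_column = reader.fieldnames[0]  # the first column holds the unique ID
    return id_column, rows


def ids_are_unique(id_column, rows):
    """Each row is one entity, so no ID should appear twice."""
    ids = [row[id_column] for row in rows]
    return len(ids) == len(set(ids))


id_column, rows = read_table(SAMPLE_TSV)
print(id_column, len(rows), ids_are_unique(id_column, rows))
```

In this sketch the `height` column plays the role of primary data, `height_units` is supporting information, and `photo` is a link to a file that lives in cloud storage rather than in the table itself.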

1.2. Sort data and customize your table view

You can sort data in ascending or descending order by column. Click on the three-vertical-dot action icon at the top right of any column.

You can also make your columns smaller by dragging the dots to the right of the column header. 
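The same kind of sort can be reproduced outside Terra on an exported TSV. The following is a minimal sketch, assuming hypothetical column names and values. Values read from a TSV are text, so a numeric column needs converting before it sorts in numeric order.

```python
import csv
import io

# Hypothetical data; the real table's columns and values may differ.
SAMPLE_TSV = "student_id\tname\theight\nid3\tCy\t98\nid1\tAda\t152\nid2\tBen\t160\n"


def sort_rows(tsv_text, column, descending=False, numeric=False):
    """Return the table rows sorted by one column."""
    rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
    if numeric:
        key = lambda row: float(row[column])  # compare as numbers
    else:
        key = lambda row: row[column]  # compare as text
    return sorted(rows, key=key, reverse=descending)


by_name = sort_rows(SAMPLE_TSV, "name")
tallest_first = sort_rows(SAMPLE_TSV, "height", descending=True, numeric=True)
print([row["name"] for row in by_name])
print([row["name"] for row in tallest_first])
```

Without `numeric=True`, the height column would be compared character by character, so "98" would sort above "160".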

Part 2 (optional): Add/modify a data table 

For hands-on practice manipulating data tables, you'll create a new table with just eight students. In this exercise, you'll learn how to make your own data table and import it into your workspace, as well as how to manipulate data tables (outside of Terra, for now).

Step-by-step instructions to make a sm-student table

2.1. Sort the student table alphabetically by their ID by clicking the arrow in the ID column. Make sure it's sorted in ascending order (i.e. 'A' at the top...).  

2.2. Download the table by clicking on the three-dot action icon to the right of the student table name (in the left-hand panel) and selecting Download TSV.

2.3. Open the TSV locally in your favorite spreadsheet editor.

Workspace data tables are just TSV files
Once you export a table from your workspace, you can modify it in a spreadsheet editor to add rows of data or columns of metadata, then upload the modified table to your workspace. The only requirement is that the first column is the unique ID for whatever entity is in the table and the header must be of the form <your-table-name>_id (with _id required).

2.4. Delete all but the first eight rows of students and save the file as tab-delimited text or tab-separated values (Terra recognizes both). 

2.5. Go back to the Data page and click Import Data > Upload TSV

2.6. In the import data popup, enter the table name sm-student and upload the TSV file you just modified. 
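Steps 2.1 through 2.4 can also be scripted instead of done by hand in a spreadsheet editor. The sketch below is one possible approach using only Python's standard library. The function name and the file names in the usage note are hypothetical, and the header check only mirrors the `_id` requirement described above. Python sorts text by character code, which may not match the order shown in the Terra table view for every ID.

```python
import csv


def make_small_table(src_path, dest_path, keep=8):
    """Copy the header plus the first `keep` rows (sorted by ID) to a new TSV."""
    with open(src_path, newline="", encoding="utf-8") as src:
        reader = csv.reader(src, delimiter="\t")
        header = next(reader)
        rows = [row for row in reader if row]  # skip blank lines

    # This guide says the first column header must end in "_id".
    if not header[0].endswith("_id"):
        raise ValueError(f"first column {header[0]!r} does not end in '_id'")

    rows.sort(key=lambda row: row[0])  # ascending by unique ID (step 2.1)
    kept = rows[:keep]  # first eight rows by default (step 2.4)

    with open(dest_path, "w", newline="", encoding="utf-8") as dest:
        writer = csv.writer(dest, delimiter="\t", lineterminator="\n")
        writer.writerow(header)  # header row is left unchanged
        writer.writerows(kept)
    return len(kept)
```

For example, `make_small_table("student.tsv", "sm-student.tsv")` would write a tab-separated file that you could then upload as described in steps 2.5 and 2.6.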

What to expect

Notice that there is a new table in the workspace, sm-student. Open this table to see what it contains. 

Part 3 (optional): Import realistic data to a workspace 

Now that you've worked with the Quickstart mock data, you hopefully understand a bit more about data tables in your workspace and why they’re useful when working in the cloud in Terra. Now let's explore more realistic data tables (in the Featured Workspaces Library) and walk through how to add one to your workspace.

Find data

To see a more relevant example of a data table in Terra, pick a workspace from the Featured Workspaces Library. You can filter (on the left) by scientific use case or experimental strategy.

Once you've found a workspace that interests you, follow the steps below to import one of its tables into your workspace to explore.

Step-by-step instructions to copy data from another workspace

3.1. Go to the Data page. 

3.2. Click on the three vertical dots to the right of the table you want to import. 


3.3. Click Download TSV and save the file to local storage.

3.4. Go back to the Data page of your workspace (the destination for the table), click the Import Data button at the top left, and then Import TSV.

3.5. Fill in the form with the table name, select the TSV on your local machine, and click the Start Import Job button.

Where are files in a table stored?
Data tables include links to files in Azure Cloud storage. If you imported a data table that references large data files in the cloud, you will notice that the table in your workspace links to the same source as the original data table. In other words, the data in your table is not actually in your workspace cloud storage at all! That's OK, because Terra can use the links to pull the data for analysis wherever it actually resides.

Copying tables does not copy data files
Copying tables (metadata) from another workspace will not import any linked files into your workspace bucket. File paths (i.e., URLs) in the imported table may refer to files in the source workspace storage (i.e., Azure Cloud blob storage). If that workspace storage is deleted, your data table will no longer refer to an existing file path.
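One way to see where an imported table's files actually live is to scan the downloaded TSV for URL-valued cells and list the storage hosts they point to. This is a minimal sketch under stated assumptions: the table contents, storage account name, and container below are invented, and the function name is hypothetical. Azure blob URLs generally have the form https://<account>.blob.core.windows.net/<container>/<blob>, so the host identifies the storage account.

```python
import csv
import io
from urllib.parse import urlparse

# Hypothetical table: the storage account, container, and file names are invented.
SAMPLE_TSV = (
    "sample_id\tstudy\tcram\n"
    "s1\tdemo\thttps://sourceaccount.blob.core.windows.net/container/s1.cram\n"
    "s2\tdemo\thttps://sourceaccount.blob.core.windows.net/container/s2.cram\n"
)


def linked_storage_hosts(tsv_text):
    """Map each column that contains URLs to the set of hosts it points at."""
    hosts = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # ignore missing or malformed cells
            parsed = urlparse(value)
            if parsed.scheme in ("http", "https") and parsed.netloc:
                hosts.setdefault(column, set()).add(parsed.netloc)
    return hosts


print(linked_storage_hosts(SAMPLE_TSV))
```

If every host listed belongs to the source workspace's storage, the files were not copied, and the links will stop resolving if that storage is deleted.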

Takeaway and next steps (analyze the data)

After completing the data tables tutorial, you should know/understand:

  • Data tables are your own relational database, associated with your Terra workspace. They work a lot like spreadsheets.
  • How to modify/organize data in a table.

Next: Quickstart part 2 - Run a workflow on the data (optional)

Now that you know how to organize and manage data in a table, let’s try running an analysis!

In the Terra on Azure Workflows tutorial, you'll first run a preconfigured workflow to get the GPA (averaged over all three subjects) for a subset of eight students. Then you'll set up a workflow from scratch to run on all 86 students.

Quickstart part 3: Plot the results in a notebook

If you don’t need to learn how to set up and run workflows, you can skip right to the Terra on Azure Jupyter Quickstart to learn how to set up and run an interactive analysis to visualize data.
