Terra (GCP) Quickstart 1: Data tables

User Ed

Workspace tables can make your life easier by helping you manage data - including files from different cloud storage locations - in one place. Part 1 of the Terra Quickstart will give you hands-on practice organizing and accessing data with workspace data tables.

If you're new to working in the cloud, see Understanding data in the cloud for a useful conceptual overview of data in Terra.  

About the Quickstart mock study

The Terra Quickstart is a set of three tutorials that walk through a fake (made up by User Ed!) study. Each tutorial covers Terra functionality often used in an analysis journey in Terra. As with a real study, you will do all the tutorial exercises in a single Terra workspace.

Explore data > Process raw data > Visualize results

Question to explore: Is there a correlation between height and grades for a cohort of 7th, 8th, and 9th graders? 

  1. Part 1: Data tables: Explore survey data (height plus subject grades for language arts, math, and science) for 86 students in the study
  2. Part 2: Workflows: Run a workflow to calculate the average GPA of each student in the study
  3. Part 3: Notebooks: Run a Jupyter notebook to plot height versus GPA

Your mission, should you choose to accept it, is to discover if there is a correlation between student height and grades by the end of all three Quickstart parts. Yes, the mock study is a little silly, but in doing it you'll learn how to use functionality typical for many bioinformatics investigations.
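The mission question comes down to measuring how strongly two columns (height and GPA) move together. As an illustration only (the Quickstart does its own plotting in the Part 3 notebook, and this is not that notebook's code), here is a minimal sketch of Pearson's correlation coefficient using just the Python standard library. The heights and GPAs below are made up and are not the study data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances; the 1/n factors cancel in the ratio below
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return cov / math.sqrt(var_x * var_y)


# Made-up heights and GPAs for five hypothetical students (not the study data)
heights = [150, 155, 160, 165, 170]
gpas = [3.1, 2.8, 3.5, 3.0, 3.3]
print(f"r = {pearson_r(heights, gpas):.3f}")
```

A value of r near +1 or -1 suggests a strong linear relationship; a value near 0 suggests little or none.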

Data tables tutorial learning objectives

The data tables quickstart is intended to help you become familiar with data tables in your workspace - how to use them to store and modify data in the cloud right in your workspace. 

After working through the exercises in the quickstart, you will know how to:

  1. Organize and manage data in a data table in your Terra workspace
  2. Edit an existing table
  3. Upload a table (TSV) 
  4. Import realistic biomedical data tables from a public workspace

Three steps to complete the Data tables Quickstart


  1. Explore and manipulate data in the student table (preloaded)
  2. Make a subset of students (a new student_set table)
  3. (optional) Import/explore a more realistic example data table for a use case that interests you

Estimated time and cost to complete

You should be able to complete the Quickstart tutorial in half an hour. Running the tutorial will cost less than $0.25 (Google Cloud data storage and VM costs).

Additional requirements and costs

You will need to have a Terra Billing project and your own copy of the Quickstart workspace to complete the tutorial.

First: Make your own copy of the Quickstart workspace

The T101-Data-Tables-Quickstart featured workspace is “Read only”. For hands-on practice, you need to be able to upload data to workspace storage, which has a cost. Making your own copy of the Data-Tables-Quickstart workspace gives you that ability. If you haven't already done so, follow the directions below to make your own copy of this workspace.

Start by clicking the round icon with three dots in the upper right-hand corner, then select Clone from the dropdown menu. In the clone workspace dialog:

  1. Rename your copy something memorable.
    It may help to write down the name of your workspace.
  2. Choose your billing project.
    Note that this can be free credits! Don’t worry, you’ll have plenty left over when you’ve completed the Quickstart exercises.
  3. Do not select an Authorization Domain; these are only required when using restricted-access data.
  4. Click the “Clone Workspace” button to make your own copy.

Once you're in your own copy of the workspace, you can get hands-on to learn about data tables!

Walkthrough demo instructions

1. Explore data in the student table

Tables are like spreadsheets built into your workspace. This first exercise is a guided tour of how to organize and manipulate data in Terra tables, much like you would in your favorite spreadsheet editor.

For a conceptual overview of how tables can help organize data in the cloud, see Managing data in the cloud with tables. 

Step-by-step instructions

The exercises below guide you through a number of spreadsheet-like actions you can take to manage and manipulate data tables right in your workspace (without downloading the table TSV and editing in a spreadsheet editor). 

1.1. Open and examine the data table

1. Open your copy of the Terra on GCP Quickstart workspace.

2. Click on the Data tab. From the left-hand side, click on the student table to see the full table.

3. Take a minute to look over the information in the columns and rows.

Thought questions

For background, don't forget to read the section About the Quickstart mock study above.

  • Answer: Each row in the student table is a different student.

    Rows represent separate entities of data

    Each row is a unique input (an "entity"). You can define and name your table's entity type based on the input data you have.

    If you were working with a sample table, each row would be a distinct sample.

  • Each column is a different piece of information corresponding to the student.

    Tip: the first column in a table is always the unique ID for the entity in that row. In the data tables quickstart, the first column is the unique Student ID.

    The unique ID is often not human-readable (a random combination of numbers and letters). To make the table easier for a human to understand, there is often a column with a human-friendly ID; in this case, a column with the student's first name. Additional columns store the primary data (the GPAs for three different subjects, plus the student's height and grade level). There are also some columns of useful information, like the units used for the height measurement.

    What data can be stored in a table?

    Like spreadsheets, tables are flexible enough to contain whatever data you have, with as many rows and columns as you need. For example, a sample table could include information about the samples (such as the study name, accession number, or the date the sample was taken) as well as a link to the genomic data stored in a Google bucket.

    It doesn't hurt to add additional columns of anything you want to remember about that entity. You can always hide columns in your workspace!

  • The student table includes both primary data (heights and GPAs that will be used in the study) and metadata (additional useful information, such as the units of the height measurement).

    Primary data versus metadata

    Primary data in a Terra data table is the sort that would traditionally be in a CSV. Examples are phenotypic data, i.e., a subject’s clinical information such as disease symptoms, lab results, or demographic data, including age, ethnicity, and gender.

    Metadata is data about data. This could be links to genomic data files, or other information about the genomic data files (file size, the date they were created, experimental process). This is how Terra "stores" large genomic files in a table. The data files are physically located in workspace or external cloud storage. The table keeps track of the links to the files, and lets you use the data no matter where it's actually located.
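To make the row, column, and unique-ID structure concrete, here is a minimal sketch that serializes a tiny student table as a TSV, the format you use when uploading a table. It assumes Terra's load-file convention that the first column header is `entity:<table>_id` (here `entity:student_id`); check Terra's documentation on formatting load TSVs before relying on it. Apart from `student_id`, the column names, IDs, and values are hypothetical.

```python
import csv
import io

# Hypothetical rows: only student_id mirrors a column named in this tutorial
rows = [
    {"student_id": "AB123", "first_name": "Dulce", "height": 152, "height_units": "cm"},
    {"student_id": "CD456", "first_name": "Hrdika", "height": 160, "height_units": "cm"},
]


def to_table_tsv(table_name, rows):
    """Serialize rows as a TSV whose first column holds the unique entity ID."""
    id_col = f"{table_name}_id"
    other_cols = [c for c in rows[0] if c != id_col]
    ids = [r[id_col] for r in rows]
    if len(ids) != len(set(ids)):
        raise ValueError("entity IDs in the first column must be unique")
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumed Terra load-file convention: first header is entity:<table>_id
    writer.writerow([f"entity:{id_col}"] + other_cols)
    for r in rows:
        writer.writerow([r[id_col]] + [r[c] for c in other_cols])
    return buf.getvalue()


print(to_table_tsv("student", rows))
```

The uniqueness check reflects the rule that the first column identifies each entity; every other column is free-form, which is why tables can hold primary data and metadata side by side.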

1.2. Customize your table view

To customize what you see in your workspace while still keeping all the information, you can hide or move columns. In this exercise, you'll hide the Height_units column (you don’t need this column to complete your mission) and move the Grade column.

Hide and move columns

1. Click on the gear icon on the top menu to open the Settings.

2. Uncheck the Height_units column to hide it.

3. Using the pixelated bar next to the Grade column, drag and drop it below the Height_units column.

4. Click Done.

You can also save the view, for example to share it with colleagues.

1.3. Sort and search data

Need to find a particular student but don't want to scroll through the entire table? You can search within a single table or between all tables in the workspace using the search fields.

You can also sort data in ascending or descending order by column. 

Search within this table

1. Try looking for Hrdika in the table using the Search field at the top right of the table. 

Sort by column values

2. Click the blue arrow at the top of the first column to sort the student table by ascending or descending order of the student IDs.
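Searching and sorting in the table view work much like filtering and ordering a list of rows. The sketch below is only an analogy, not how Terra implements search; it assumes a case-insensitive substring match across all columns, and the rows are hypothetical.

```python
# Hypothetical rows standing in for the student table
rows = [
    {"student_id": "ZX901", "first_name": "Hrdika"},
    {"student_id": "AB123", "first_name": "Dulce"},
    {"student_id": "MN456", "first_name": "Sam"},
]


def search(rows, term):
    """Return rows where any value contains the term (case-insensitive)."""
    term = term.lower()
    return [r for r in rows if any(term in str(v).lower() for v in r.values())]


def sort_by(rows, column, descending=False):
    """Return a new list of rows ordered by one column's values."""
    return sorted(rows, key=lambda r: r[column], reverse=descending)


print(search(rows, "hrd"))
print([r["student_id"] for r in sort_by(rows, "student_id")])
print([r["student_id"] for r in sort_by(rows, "student_id", descending=True)])
```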

1.4. Change a single field of data

Change a student's name

1. Click on the student named Dulce in the first-name column.

2. Click on the pencil icon to open the edit value box.

3. Change Dulce’s name to Sabrosa, then click Save changes.

1.5. Add or delete data

Add a row (i.e. a new student)

1. Add a row by clicking the pencil icon next to Edit, then selecting the Add row option.

2. Add a new student ID (ZZ637) and fill in the values for the new student you are adding (First name, GPA_language arts, GPA maths, GPA_science, Grade, Height, Height units), then click Add.

You can make up these values! 

Delete the row you just created

3. Select the checkbox for that row, then click Edit (pencil icon above the table) and choose Delete selected rows.

Add a column of data

1. Click on the pencil icon next to Edit, then select the Add column option.

2. Enter the column name and the type of value the column will hold.

For example, to add the students’ last names, you would enter Last-name in the Column name field and select the value type “string”.

You can make up a column of metadata about the student!

Delete or clear a column

3. Delete or clear the column you just created by clicking the three-dot action icon in the column header and selecting the appropriate option.
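Each of the edits above maps onto a simple operation on rows and columns. Here is a minimal sketch in plain Python, offered as an analogy rather than Terra's implementation; the column names, the name "Newt", and the `student_id` key are illustrative assumptions. Each function returns a new list instead of modifying the old one, which keeps every step easy to inspect.

```python
def set_value(rows, entity_id, column, value, id_col="student_id"):
    """Edit a single field, like clicking the pencil icon on one cell."""
    return [{**r, column: value} if r[id_col] == entity_id else r for r in rows]


def add_row(rows, new_row, id_col="student_id"):
    """Append a row; the ID in the first column must stay unique."""
    if any(r[id_col] == new_row[id_col] for r in rows):
        raise ValueError(f"duplicate {id_col}: {new_row[id_col]}")
    return rows + [new_row]


def delete_rows(rows, entity_ids, id_col="student_id"):
    """Drop every row whose ID is in entity_ids."""
    to_drop = set(entity_ids)
    return [r for r in rows if r[id_col] not in to_drop]


def add_column(rows, name, default=""):
    """Add a column to every row, filled with a default (empty) value."""
    return [{**r, name: default} for r in rows]


def clear_column(rows, name):
    """Keep the column but blank out its values."""
    return [{**r, name: ""} if name in r else r for r in rows]


def delete_column(rows, name):
    """Remove the column from every row."""
    return [{k: v for k, v in r.items() if k != name} for r in rows]


rows = [{"student_id": "AB123", "first_name": "Dulce"}]
rows = set_value(rows, "AB123", "first_name", "Sabrosa")
rows = add_row(rows, {"student_id": "ZZ637", "first_name": "Newt"})
rows = delete_rows(rows, ["ZZ637"])
rows = add_column(rows, "last_name")
rows = delete_column(rows, "last_name")
print(rows)
```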

For a comprehensive list of things you can do in a workspace data table, see the Organizing data with tables section. 

2: Make a subset of the student data

There are times when you may want to run an analysis on the same subset of the data many times (when testing a workflow, for example). Selecting the right rows of data manually every time is error-prone and tedious, but Terra has a way to define particular subsets, which you'll learn about below.

Step-by-step instructions to make a student_set table

2.1. Sort the student table alphabetically by student ID by clicking the arrow in the student_id column. Make sure it's sorted in ascending order (i.e., A at the top).

2.2. Check off the top eight rows of students.

2.3. Click the Edit button (pencil icon) above the table entries.

2.4. Choose Save selection as set from the menu.

2.5. Give your set a name and click Save.

What to expect

Notice that there is a new table in the workspace, a student_set table. Open this table to see what it contains. 

  • The student_set table includes one row and two columns.

    Each row is a unique set (subset of the student data)

    Since you've only made a single set, there's currently only one row.

    Columns

    The first column is the unique student set ID (the name you gave your set when you created it). The second column is an array that includes all the students in the set. Note that the values in the student_set table reference the unique student_ids in the student table.

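A set row does not copy the student data; it only stores references to student IDs. The sketch below mimics the steps above (sort IDs ascending, take the top eight) and writes the result in the shape of a set "membership" load file. The header `membership:<table>_set_id` followed by the member table name is an assumption based on Terra's TSV load-file convention; check the Terra documentation before relying on it. The IDs and the set name are hypothetical.

```python
import csv
import io


def first_n_sorted(student_ids, n=8):
    """Mimic sorting the student table by ID (ascending) and checking the top n rows."""
    return sorted(student_ids)[:n]


def to_membership_tsv(set_id, member_ids, table="student"):
    """Write one line per (set, member) pair; the set row then lists members as an array."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Assumed Terra load-file header for set tables: membership:<table>_set_id, <table>
    writer.writerow([f"membership:{table}_set_id", table])
    for member in member_ids:
        writer.writerow([set_id, member])
    return buf.getvalue()


# 86 hypothetical student IDs, deliberately listed out of order
all_ids = [f"S{i:03d}" for i in range(86, 0, -1)]
cohort = first_n_sorted(all_ids)
print(to_membership_tsv("first_eight", cohort))
```

Because the set only holds IDs, editing a student's row in the student table is reflected wherever that student is referenced.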

Additional tables in a workspace

Besides the student and student_set (input data tables), there are two other special kinds of tables in the Workspace data page: Reference Data (in the middle of the left-hand column), and Workspace Data (under Reference Data).

  • Reference data - You can add a variety of pre-loaded reference data here, for easy reference.
  • Workspace Data - This special table is for keeping workspace-level files and variables that you might use for analyses across different inputs. Examples include Docker files, CSVs stored in external buckets, or reference data not available in the pre-configured options.

We won’t be using these in the data tables quickstart, but you’ll get to work with them when you move on to the Workflows Quickstart tutorial.

3 (optional): Import (realistic) data to a workspace 

Now that you've worked with the T101 mock data, you hopefully understand a bit more about data tables in your workspace and why they’re useful when working in the cloud in Terra.

Now let's explore more realistic data tables and walk through how to add one to your workspace. To see a more relevant example of a data table in Terra, pick a workspace from the Featured Workspaces Library. You can filter (on the left) by scientific use case or experimental strategy. Once you've found a workspace that interests you, follow the steps below to copy one of its data tables to your workspace to explore.

Step-by-step instructions to copy data from another workspace

1. In the workspace you chose, go to the Data page.

2. Click on the three vertical dots to the right of the table to import. 


3. Click Export to workspace.

4. Choose your workspace from the dropdown and click the blue Copy button.

Takeaway and next steps (analyze the data)

After completing the Quickstart, you should know or understand:

  • Data tables in Terra work like spreadsheets
  • How to modify/organize data in a table

Quickstart part 2: Run a workflow on the data (optional)

Now that you know how to organize and manage data in a table, let’s try running an analysis!

In the Workflows-Quickstart, you will run a workflow to calculate each student's average GPA across all three subjects, both for the 8-person cohort (the student_set you created) and for the more complete dataset of 86 students.
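The per-student calculation the workflow performs can be sketched in a few lines. This is not the Quickstart workflow itself; it assumes an unweighted mean of the three subject GPAs, and the column names are hypothetical.

```python
# Hypothetical column names for the three subject GPAs
SUBJECT_COLUMNS = ("gpa_language_arts", "gpa_math", "gpa_science")


def average_gpa(row, subjects=SUBJECT_COLUMNS):
    """Unweighted mean of one student's subject GPAs."""
    values = [float(row[s]) for s in subjects]
    return sum(values) / len(values)


student = {"student_id": "AB123", "gpa_language_arts": 3.0, "gpa_math": 4.0, "gpa_science": 2.0}
print(average_gpa(student))  # prints 3.0
```

Running the same function over every row of a table (or every member of a set) is the kind of repetitive, per-entity job that workflows automate.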

Quickstart part 3: Plot the results in a notebook

If you don’t need to learn how to set up and run workflows, you can skip right to the Notebooks-Quickstart to learn how to set up and run an interactive analysis to visualize data.
