Workspace tables can make your life easier by helping you manage data - including files from different cloud storage locations - in one place. The Data Tables Quickstart gives hands-on practice organizing and accessing data with workspace data tables.
If you're new to working in the cloud, see Understanding data in the cloud for a useful conceptual overview of data in Terra.
Quickstart overview
Data tables tutorial learning objectives
The data tables quickstart is intended to help you become familiar with data tables in your workspace - what they are and how to manipulate them in Terra.
After working through the exercises in the quickstart, you will know how to
- Organize and manage data in a data table in your Terra workspace
- Add data and metadata to an existing table
- Organize the same data in different ways (single entity tables and arrays)
- Import realistic biomedical data tables from a public workspace
Data tables Quickstart Flow
Three parts to the Data Tables Quickstart
- Explore and manipulate the student table
- Make a subset of students (a student_set table)
- (optional) Import/explore a more realistic example data table for a use-case that interests you
Estimated time and cost to complete
You should be able to complete the Quickstart tutorial in half an hour. Running the tutorial will cost less than $0.25 (Google Cloud data storage and VM costs).
Additional requirements
You will need to have a Terra Billing project and your own copy of the Quickstart workspace to complete the tutorial.
About the mock study in the T101 quickstarts
This is the first in a series of three Quickstarts that walk through a completely fake study of the correlation between height and grades for a cohort of 7th, 8th, and 9th graders.
Your mission, should you choose to accept it, is to discover if and how a student's height affects their grade.
The mock study is a little silly, but in doing it you'll learn how to use functionality typical for many bioinformatics investigations. And hopefully have a bit of fun! At the end, we’ll point you to some more realistic examples to try out what you’ve learned.
Study steps
- Explore data - in this case, survey data from 86 students in the study (Data tables quickstart)
- Run a workflow - you'll calculate the average GPA of each student in the study (Workflows quickstart)
- Run a Jupyter notebook - the notebook plots height versus GPA, so you can draw your conclusions from the data (Notebooks quickstart)
First: Make your own copy of the Data Quickstart workspace
The T101-Data-Tables-Quickstart featured workspace is “Read only”. For hands-on practice, you'll need to be able to upload data to workspace storage, which has a cost. Making your own copy of the Data-Tables-Quickstart workspace gives you that power. If you haven't already made a copy, follow the directions below.
Start by clicking on the circle with three dots at the upper right-hand corner and select "Clone" from the dropdown menu:
- Rename your copy something memorable
It may help to write down the name of your workspace
- Choose your billing project
Note that this can be free credits! Don’t worry, you’ll have plenty left over when you’ve completed the Quickstart exercises.
- Do not select an Authorization Domain, since these are only required when using restricted-access data
- Click the “Clone Workspace” button to make your own copy
Once you're in your own copy of the workspace, you can get hands-on to learn about data tables!
Walkthrough demo instructions
Part 1. Explore data in the student table
Tables are like spreadsheets built into your workspace. Part 1 of the tutorial is a guided tour of how to organize and manipulate data in tables in Terra much like you would in your favorite spreadsheet editor.
For a conceptual overview of how tables can help organize data in the cloud, see Managing data in the cloud with tables.
Step-by-step instructions
The exercises below guide you through a number of spreadsheet-like actions you can take to manage and manipulate data tables right in your workspace (without downloading the table TSV and editing in a spreadsheet editor).
1.1. Open and examine the data table
1. Open your copy of the Quickstart Data Tables workspace.
2. Click on the Data tab. From the left hand side, click on the student table to see the full table.
3. Take a minute to look over the information in the columns and rows.
Thought questions
For background, don't forget to read the section About the mock study in the T101 quickstarts above.
- Answer: Each row in the student table is a different student.
Rows represent separate entities of data
Each row is a unique input (an "entity"). You can define and name your table entity by the input data you have.
If you were working with a sample table, each row would be a distinct sample.
- Answer: Each column is a different piece of information corresponding to the student.
Tip: the first column in a table is always the unique ID for the entity in that row. In the data tables quickstart, the first column is the unique Student ID.
What data can be stored in a table?
Like spreadsheets, tables are flexible and can contain whatever data you have, with as many rows and columns as you need. For example, a sample table could include information about the samples (a column for the study name or accession number, the date the sample was taken) as well as a link to the genomic data stored in a Google bucket.
It doesn't hurt to add additional columns of anything you want to remember about that entity. You can always hide columns in your workspace!
- Answer: The student table includes both primary data (heights and GPAs that will be used in the study) and metadata (additional useful data, in this case).
Primary data versus metadata
Primary data in a Terra data table is the sort that would traditionally be in a CSV. Examples are phenotypic data, i.e., a subject’s clinical information such as disease symptoms, lab results, or demographic data, including age, ethnicity, and gender.
Metadata is data about data. This could be links to genomic data files, or other information about the genomic data files (file size, the date they were created, experimental process). This is how Terra "stores" large genomic files in a table. The data files are physically located in workspace or external cloud storage. The table keeps track of the links to the files, and lets you use the data no matter where it's actually located.
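The table structure described above (one row per entity, one column per piece of information, a unique ID in the first column) can be pictured with a minimal Python sketch. The column names other than the student ID, all the values, and the `load_table` helper are invented for illustration. The `entity:student_id` header is assumed here to match the convention used in table TSV files; treat it as an assumption rather than a specification.

```python
import csv
import io

# Hypothetical excerpt of a student table in tab-separated form.
# The header convention and all values are assumptions for illustration.
STUDENT_TSV = (
    "entity:student_id\tfirst_name\tgrade\theight\theight_units\n"
    "AA101\tDulce\t7\t150\tcm\n"
    "BB202\tHrdika\t8\t158\tcm\n"
    "CC303\tSam\t9\t165\tcm\n"
)


def load_table(tsv_text):
    """Return (id_column, rows), where rows maps each unique ID to its record."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_column = reader.fieldnames[0]  # the first column holds the unique ID
    rows = {}
    for record in reader:
        entity_id = record[id_column]
        if entity_id in rows:
            # IDs must be unique: each row is a separate entity.
            raise ValueError(f"duplicate ID: {entity_id}")
        rows[entity_id] = record
    return id_column, rows


id_column, students = load_table(STUDENT_TSV)
print(id_column)                         # entity:student_id
print(sorted(students))                  # ['AA101', 'BB202', 'CC303']
print(students["BB202"]["first_name"])   # Hrdika
```

Keying the rows by ID is a design choice of this sketch: it makes the "first column is the unique ID" rule something the code checks rather than assumes.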
1.2. Customize your table view
To customize what you see in your workspace while still keeping all the information, you can hide or move certain columns. In this exercise you'll hide the Height_units column (you don’t need this column to complete your mission).
Hide the Height_units column
1. Click on the gear icon on the top menu to open the Settings.
2. Uncheck the Height_units column to hide it.
3. Using the pixelated bar, drag the Grade column and drop it below the Height_units column.
4. Click Done.
Note that you can save the view, to share with colleagues, for example.
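Hiding or moving columns changes only what is displayed, not what is stored. A minimal Python sketch of that idea follows; the rows, values, and the `make_view` helper are hypothetical and are not part of Terra.

```python
# Hypothetical rows; the column names echo those mentioned in the text,
# the values are invented.
rows = [
    {"student_id": "AA101", "Grade": 7, "Height": 150, "Height_units": "cm"},
    {"student_id": "BB202", "Grade": 8, "Height": 158, "Height_units": "cm"},
]


def make_view(rows, visible_columns):
    """Project each row onto the visible columns, in the given order."""
    return [{name: row[name] for name in visible_columns} for row in rows]


# Hide Height_units and show Grade after Height.
view = make_view(rows, ["student_id", "Height", "Grade"])

print(list(view[0]))               # ['student_id', 'Height', 'Grade']
print("Height_units" in rows[0])   # True: the underlying data is untouched
```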
1.3. Sort and search data
Need to find a particular student but don't want to scroll through the entire table? You can search within a single table or across all tables in the workspace using the search fields.
You can also sort data in ascending or descending order by column.
Search within this table
1. Try looking for Hrdika in the table using the Search field at the top right of the table.
Sort by column values
2. Click the blue arrow at the top of the first column to sort the student table by ascending or descending order of the student IDs.
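For readers who think in code, searching and sorting can be sketched in a few lines of Python. The rows are invented, and the case-insensitive substring match is an assumption of this sketch; the text above does not say how the Search field matches values.

```python
# Hypothetical rows; names and IDs are invented for illustration.
students = [
    {"student_id": "CC303", "first_name": "Sam"},
    {"student_id": "AA101", "first_name": "Dulce"},
    {"student_id": "BB202", "first_name": "Hrdika"},
]


def search(rows, text):
    """Return rows in which any value contains `text` (case-insensitive)."""
    needle = text.lower()
    return [
        row for row in rows
        if any(needle in str(value).lower() for value in row.values())
    ]


def sort_by(rows, column, descending=False):
    """Return a new list of rows ordered by the values in `column`."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


matches = search(students, "hrdika")
ascending = sort_by(students, "student_id")
descending = sort_by(students, "student_id", descending=True)

print([row["student_id"] for row in matches])     # ['BB202']
print([row["student_id"] for row in ascending])   # ['AA101', 'BB202', 'CC303']
print([row["student_id"] for row in descending])  # ['CC303', 'BB202', 'AA101']
```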
1.4. Change a single field of data
Change a student's name
1. Click on the student named Dulce in the first-name column.
2. Click on the pencil icon to open the edit value box.
3. Change Dulce’s name to Sabrosa, then click Save changes.
1.5. Add or delete data
Add a row (i.e. a new student)
1. Add a row by clicking the pencil icon next to Edit, then select the Add row option.
2. You will need to add a new student ID (ZZ637) and fill in the values for the new student you are adding (First name, GPA_language arts, GPA maths, GPA_science, Grade, Height, Height units), then click Add.
You can make up these values!
Delete the row you just created
3. Select the checkbox for that row, then click Edit (pencil icon above the table) and Delete selected rows.
Add a column of data
1. Click on the pencil icon next to Edit, then select the Add column option.
2. Enter the column name and the value type for the column.
For example, if you wanted to add the students’ last names, you would enter Last-name in the Column name field and select value type “string”.
You can make up a column of metadata about the student!
Delete or clear a column
3. Delete or clear the column you just created by clicking on the three dot action icon in the column header and selecting the appropriate option.
For a comprehensive list of things you can do in a workspace data table, see the Organizing data with tables section.
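The edits practiced in sections 1.4 and 1.5 (change a field, add and delete a row, add and delete a column) correspond to simple operations on a table keyed by unique ID. The Python sketch below is only an illustration: the helper functions and the values are hypothetical, and it does not call Terra.

```python
# Hypothetical in-memory table keyed by student ID; values are invented.
table = {
    "AA101": {"first_name": "Dulce", "grade": 7},
    "BB202": {"first_name": "Hrdika", "grade": 8},
}


def add_row(table, entity_id, values):
    """Add a new entity; the ID must not already exist."""
    if entity_id in table:
        raise ValueError(f"ID already in table: {entity_id}")
    table[entity_id] = dict(values)


def delete_rows(table, entity_ids):
    """Remove the selected entities."""
    for entity_id in entity_ids:
        del table[entity_id]


def add_column(table, name, default=""):
    """Add a column to every row, filled with a default value."""
    for record in table.values():
        record[name] = default


def delete_column(table, name):
    """Remove a column from every row."""
    for record in table.values():
        record.pop(name, None)


table["AA101"]["first_name"] = "Sabrosa"   # 1.4: change a single field
add_row(table, "ZZ637", {"first_name": "Made-up", "grade": 9})
add_column(table, "Last-name")
delete_rows(table, ["ZZ637"])
delete_column(table, "Last-name")

print(sorted(table))                  # ['AA101', 'BB202']
print(table["AA101"]["first_name"])   # Sabrosa
```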
Part 2: Make a subset of the student data
There are many times when you may want to run an analysis on the same subset of the dataset many times (when testing a workflow, for example). Selecting manually every time is error-prone and tedious, but Terra has a way to define particular subsets.
Step-by-step instructions to make a student_set table
2.1. Sort the student table alphabetically by student ID by clicking the arrow in the student_id column. Make sure it's sorted in ascending order (i.e. A at the top...).
2.2. Check off the top eight rows of students.
2.3. Click the Edit button (pencil icon) above the table entries.
2.4. Choose Save selection as set from the menu.
2.5. Give your set a name and click save.
What to expect
Notice that there is a new table in the workspace, a student_set table. Open this table to see what it contains.
The student_set table includes one row and two columns.
Each row is a unique set (subset of the student data)
Since you've only made a single set, there's currently only one row.
Columns
The first column is the unique student set ID (the name you gave your set when you created it). The second column is an array that includes all the students in the set. Note that the values in the student_set table reference the unique student_ids in the student table.
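The relationship between the two tables can be sketched in Python: a set row stores only the IDs of its members, and the student data is looked up in the student table when needed. The IDs, the set name `first_eight`, and both helper functions are hypothetical.

```python
# Hypothetical student table keyed by ID; only the IDs matter here.
students = {f"S{n:02d}": {"grade": 7 + n % 3} for n in range(1, 13)}


def make_set(table, set_name, count):
    """Save the first `count` IDs (ascending order) under `set_name`."""
    members = sorted(table)[:count]
    return {set_name: members}


def resolve(table, set_table, set_name):
    """Look up the full student record for every ID in the set."""
    return [table[entity_id] for entity_id in set_table[set_name]]


student_set = make_set(students, "first_eight", 8)

print(len(student_set))                   # 1 (one row per set)
print(student_set["first_eight"][0])      # S01
print(student_set["first_eight"][-1])     # S08
print(len(resolve(students, student_set, "first_eight")))  # 8
```

Because the set holds references rather than copies, an edit to a student's row in the student table is visible to anything that reads that student through the set.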
Additional tables in a workspace
Besides the student and student_set (input data tables), there are two other special kinds of tables in the Workspace data page: Reference Data (in the middle of the left-hand column), and Workspace Data (under Reference Data).
- Reference data - You can add a variety of pre-loaded reference data here, for easy reference.
- Workspace Data - This special table is for keeping workspace-level files and variables that you might use for analyses across different inputs. Examples include Docker files, CSVs stored in external buckets, or reference data not available in the pre-configured options.
We won’t be using these in the data tables quickstart, but you’ll get to work with them when you move on to the Notebooks and Workflows Quickstarts.
Part 3 (optional): Import (realistic) data to a workspace
To see a more relevant example of a data table in Terra, pick a workspace from the Featured Workspaces Library. You can filter (on the left) by scientific use case or experimental strategy. Once you've found a workspace that interests you, follow the steps below to import it to your workspace to explore.
Step-by-step instructions to copy data from another workspace
1. Go to the Data page of the workspace you chose.
2. Click on the three vertical dots to the right of the table you want to import.
3. Click Export to workspace.
4. Choose your workspace from the dropdown and click the blue Copy button.
Takeaway and next steps (analyze the data)
After completing the Quickstart, you should understand
- That data tables in Terra work like spreadsheets
- How to modify and organize data in a table
Run a workflow on the data (optional)
Now that you know how to organize and manage data in a table, let’s try running an analysis!
In the T101-Workflows-Quickstart you will run a workflow to get the average GPA (over all three subjects) for students in the 8-person cohort (the student table) as well as for students in a more complete dataset of 86 students.
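The arithmetic behind averaging a GPA over three subjects is simple enough to sketch in Python. This is not the workflow used in the Workflows Quickstart, which is not shown here; the column names and the values are assumptions for illustration.

```python
from statistics import mean

# Hypothetical column names and values for one student.
GPA_COLUMNS = ("GPA_language_arts", "GPA_maths", "GPA_science")
student = {"GPA_language_arts": 3.0, "GPA_maths": 3.5, "GPA_science": 4.0}


def overall_gpa(record):
    """Average the GPA over all subject columns."""
    return mean(record[column] for column in GPA_COLUMNS)


print(overall_gpa(student))  # 3.5, i.e. (3.0 + 3.5 + 4.0) / 3
```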
Plot the results in a notebook
If you don’t need to learn how to set up and run workflows, you can skip right to the T101-Notebooks-Quickstart to learn how to set up and run an interactive analysis to visualize data.