As your data evolves and you compute new results, your data tables will also change. Learn how to track these changes with data table versioning and identify columns generated by a specific workflow submission.
Overview: Versioning a data table
Why track a data table version?
- To track which version of your data was used in an analysis
This is important if you need to go back and repeat an analysis on the same data.
- To share data with a collaborator
When sharing data with collaborators, versioning allows you to track what was shared. It also allows you to track whether the data have changed since you last shared them.
- To back up data
Avoid accidentally overwriting your data by periodically saving a new version of your data table. If you later overwrite the data (for example, by uploading a TSV with the same primary key as one that is already present in a table, or by writing outputs of a workflow to a data table), you can roll back your changes and return to the prior version.
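The overwrite scenario in the last item can be illustrated with a small toy model. This is only a sketch, not how Terra stores tables: the `upload_rows` helper, the `sample_id` key, and the file names are hypothetical, and the choice to keep columns that the upload does not mention is an assumption.

```python
import copy

def upload_rows(table, rows, key):
    """Toy model of a TSV upload: rows are matched on their primary key."""
    for row in rows:
        # Assumption: values for an existing key are overwritten column by column.
        table.setdefault(row[key], {}).update(row)
    return table

# Hypothetical "sample" table, stored as {primary key: row}.
table = {"s1": {"sample_id": "s1", "bam": "original.bam"}}

# Saving a version is modeled here as a deep copy taken before the upload.
saved_version = copy.deepcopy(table)

# Uploading a row with the same primary key replaces the stored value.
upload_rows(table, [{"sample_id": "s1", "bam": "replaced.bam"}], key="sample_id")

print(table["s1"]["bam"])          # replaced.bam
print(saved_version["s1"]["bam"])  # original.bam
```

Without the saved copy, the original value would no longer be recoverable from the table itself.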
How to version a table in Terra
1. Locate the table that you would like to version in the left-hand panel of your workspace's Data tab.
2. Click the menu icon (circle with three dots) to the right of the table's name and select Save version.
3. Name the version in the Description box and click save.
If your table also has a related set table, you can include the set table in the saved version by checking the box under “include set tables?”. To learn more about set tables, read When and how to use a set table for a workflow.
What to expect
Once the table version is saved, you will see it appear in the version history listed below the table.
Additionally, you will find zip files for each table version in your workspace bucket, in a folder called .data-table-versions in your workspace's file explorer.
Download a .zip file locally and uncompress it to access the tables included in the version, which are stored in JSON format.
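Once a version archive has been downloaded, its contents can also be inspected programmatically. The sketch below uses only the Python standard library and assumes the archive contains one or more files ending in `.json`. The layout inside the archive and the structure of each JSON document are not described here, so the function simply returns whatever it parses. The function name and the path in the usage comment are hypothetical.

```python
import json
import zipfile

def load_version_tables(zip_path):
    """Return {member name: parsed JSON} for every .json file in the archive."""
    tables = {}
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            if name.endswith(".json"):
                with archive.open(name) as handle:
                    tables[name] = json.load(handle)
    return tables

# Usage with a hypothetical local file name:
# tables = load_version_tables("sample_version.zip")
# for name, content in tables.items():
#     print(name, type(content))
```

Printing the type of each parsed document is a reasonable first step before relying on any particular field.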
Hide the version history to reduce clutter
The version history can clutter your table experience. You can hide the version history by using the menu (circle with three dots).
How to restore a previous version of a table
1. Locate the table in the left-hand panel of your workspace’s Data tab.
2. Click on the table's menu (circle with three dots) and select Show version history. You will see a list of previously-saved table versions.
3. Click on the hyperlinked date associated with the version that you want to restore or delete.
4. Select Import to restore the selected version.
The version you imported will now appear in the left-hand panel at the same level as the other tables. It will be named [original table name]_[version timestamp]. For example, a version of the sample table saved on September 10th, 2024 at 7:02:36 PM UTC would be named sample_2024-09-10_19-02-36.
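The naming pattern can be reproduced with a short helper, which is handy when looking for an imported version by name. This is a sketch: the function name is hypothetical, and the timestamp format (date and 24-hour time separated by an underscore, in UTC) is inferred from the single example name above.

```python
from datetime import datetime, timezone

def version_table_name(table_name, saved_at):
    """Build '[original table name]_[version timestamp]' from a timezone-aware datetime."""
    stamp = saved_at.astimezone(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")
    return f"{table_name}_{stamp}"

saved_at = datetime(2024, 9, 10, 19, 2, 36, tzinfo=timezone.utc)
print(version_table_name("sample", saved_at))  # sample_2024-09-10_19-02-36
```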
Once you've imported the specific version you need, you can analyze its contents with a workflow or interactive analysis — follow the same procedure as for any other table.
How to delete a saved version of a table
1. Locate the table in the left-hand panel of your workspace's Data tab.
2. Click on the table's menu (circle with three dots) and select Show version history. You will see a list of previously-saved table versions.
3. Click on the hyperlinked date associated with the version that you want to delete.
4. In the screen that appears, select Delete. The version will be deleted from the version history and you will no longer be able to access it.
Overview: Data provenance
You can currently see your data’s provenance—or how it was generated—for data table columns that were output by a workflow.
When you might want data provenance
- To understand how a piece of data was created (e.g., which pipeline was used to calculate it).
- To understand if the column was recently updated by a workflow.
How to view a data table column's provenance
1. Select the data table from the left-hand panel in your workspace’s Data tab.
2. Go to a column in your data table and click the column menu icon (circle with three dots). Select Show Provenance from the drop-down menu.
3. A pop-up window will appear, showing which workflow submissions added data to this column.
Provenance limitations
Provenance is only available for columns, not for individual cells. For example, the provenance for the alignment_summary_metrics_file column lists all workflow runs that added data to that column.
Provenance is only available for the last 25 workflows run on a column in your data table. If you try to see the provenance for a column that was altered more than 25 workflow submissions ago, you will see a message like this:
"No provenance information available for column reported_sex. It was not configured as an output in any of the last 25 workflows run on this data type."
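The 25-submission window can be pictured with a toy model. This is only an illustration of the limit described above, not Terra's implementation: the submission records, their `id` and `output_columns` fields, and the function name are all hypothetical.

```python
PROVENANCE_WINDOW = 25  # limit stated in the text above

def column_provenance(submissions, column):
    """Return ids of recent submissions that wrote to `column`.

    `submissions` is ordered oldest to newest; only the last
    PROVENANCE_WINDOW entries are considered.
    """
    recent = submissions[-PROVENANCE_WINDOW:]
    return [s["id"] for s in recent if column in s["output_columns"]]

# One old submission wrote reported_sex; 25 later ones wrote a different column.
history = [{"id": "sub-0", "output_columns": {"reported_sex"}}]
history += [
    {"id": f"sub-{i}", "output_columns": {"alignment_summary_metrics_file"}}
    for i in range(1, 26)
]

print(column_provenance(history, "reported_sex"))                         # []
print(len(column_provenance(history, "alignment_summary_metrics_file")))  # 25
```

In this model, the empty result for reported_sex corresponds to the "No provenance information available" message: the column was written, but too long ago to fall inside the window.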