How to track your data table's version and provenance

Leyla Tarhan

As your data evolves and you compute new results, your data tables will also change. Learn how to track these changes with data table versioning and how to identify which columns were generated by a specific workflow submission.

Overview: Versioning a data table

Why track a data table version?

  • To track which version of your data was used in an analysis
    This is important if you need to go back and repeat an analysis on the same data.
  • To share data with a collaborator
    Versioning lets you record exactly what you shared with collaborators and check whether the data have changed since you last shared them.
  • To back up data
    Protect against accidental overwrites by periodically saving a new version of your data table. If you later overwrite the data, for example by uploading a TSV with the same primary key as one already present in the table or by writing workflow outputs to the table, you can roll back your changes and return to the saved version.

How to version a table in Terra

1. Locate the table that you would like to version in the left-hand panel of your workspace's Data tab. 

2. Click the menu icon (circle with three dots) to the right of the table's name and select Save version.

Screenshot of the tables in the Data tab of an example workspace. An orange box highlights the menu icon (circle with three dots) for the 'sample' table. Another orange box highlights the 'Save version' option in the table menu that appears when someone clicks on the table's menu icon.

3. Name the version in the Description box and click Save.

If your table also has a related set table, you can include the set table in the saved version by checking the box under “include set tables?”. To learn more about set tables, read When and how to use a set table for a workflow.

Screenshot of the menu used to save a version of a table. The Description field contains a name that will help you track the table's version. An orange box highlights the checkbox used to include associated set tables in the version. This box only appears if an associated set table already exists for the table you are versioning.

What to expect

Once the table version is saved, you will see it appear in the version history listed below the table.

Screenshot showing the version history for an example table.

Additionally, you will find a .zip file for each table version in your workspace bucket, in a folder called .data-table-versions that you can browse in your workspace's file explorer.

Screenshot showing the downloadable .zip files for a specific table version in the workspace's file explorer.

Download a .zip file locally and unzip it to access the tables included in the version in JSON format.
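If you prefer to inspect a downloaded version programmatically rather than unzipping it by hand, the sketch below opens the archive and loads every JSON file it contains. It assumes only what this article states, that the archive holds the versioned tables as JSON files; the function name load_version_tables and the choice to key results by file name are illustrative, and the exact file names and JSON layout inside Terra's archives may differ.

```python
import json
import zipfile
from pathlib import PurePosixPath


def load_version_tables(zip_source):
    """Load every JSON file in a downloaded table-version archive (illustrative).

    zip_source may be a file path or a binary file-like object.
    Returns a dict mapping each JSON file's base name (without the
    .json extension) to its parsed contents.
    """
    tables = {}
    with zipfile.ZipFile(zip_source) as archive:
        for member in archive.namelist():
            # Skip directories and any non-JSON files in the archive.
            if not member.endswith(".json"):
                continue
            # Zip member names always use forward slashes, so parse them as POSIX paths.
            with archive.open(member) as handle:
                tables[PurePosixPath(member).stem] = json.load(handle)
    return tables
```

For example, load_version_tables("downloaded_version.zip") returns a dictionary you can explore with ordinary Python tools; the file name here is a placeholder for whichever archive you downloaded.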

Hide the version history to reduce clutter
The version history can clutter your view of the table list. To hide it, open the table's menu (circle with three dots) and choose the option to hide the version history.
Screenshot showing how to hide a table's version history. An orange box highlights an example table's menu icon (circle with three dots inside). Another orange box highlights the menu option to hide version history.

How to restore a previous version of a table

1. Locate the table in the left-hand panel of your workspace’s Data tab.

2. Click on the table's menu (circle with three dots) and select Show version history. You will see a list of previously-saved table versions.

Screenshot showing how to reveal a table's version history. An orange box highlights the table menu icon in the shape of a circle with three dots inside. Another orange box highlights the 'show version history' option in the menu that appears after clicking the table menu icon.

3. Click on the hyperlinked date associated with the version that you want to restore or delete.

4. Select Import to restore the selected version.

Screenshot showing how to import a previously saved version of a data table. An orange box highlights the timestamp for the version in the table's version history. Clicking this timestamp brings up options to either import or delete the version. Another orange box highlights the 'import' option.

The version you imported will now appear in the left-hand panel at the same level as the other tables. It will be named [original table name]_[version timestamp]. For example, a version of the sample table saved on September 10, 2024 at 7:02:36 PM UTC (19:02:36) would be named sample_2024-09-10_19-02-36.

Once you've imported the specific version you need, you can analyze its contents with a workflow or an interactive analysis, following the same procedure as for any other table.
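The naming pattern above can be reproduced in a few lines, for instance if a script needs to predict which imported table name to reference. This is a sketch that assumes the timestamp is the version's save time in UTC, formatted as YYYY-MM-DD_HH-MM-SS on a 24-hour clock as in the example name sample_2024-09-10_19-02-36. Terra generates the real name itself; the helper imported_version_name is hypothetical.

```python
from datetime import datetime, timezone


def imported_version_name(table_name: str, saved_at: datetime) -> str:
    """Build an [original table name]_[version timestamp] name (hypothetical helper).

    saved_at should be timezone-aware; it is converted to UTC before formatting.
    """
    # Use a 24-hour clock in UTC, matching the example name in this article.
    stamp = saved_at.astimezone(timezone.utc).strftime("%Y-%m-%d_%H-%M-%S")
    return f"{table_name}_{stamp}"


saved = datetime(2024, 9, 10, 19, 2, 36, tzinfo=timezone.utc)
print(imported_version_name("sample", saved))  # sample_2024-09-10_19-02-36
```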

How to delete a saved version of a table

1. Locate the table in the left-hand panel of your workspace's Data tab.

2. Click on the table's menu (circle with three dots) and select Show version history. You will see a list of previously saved table versions.

3. Click on the hyperlinked date associated with the version that you want to delete.

4. In the screen that appears, select Delete. The version will be deleted from the version history and you will no longer be able to access it.

Screenshot showing how to delete a saved version of a table. An orange box highlights the hyperlinked timestamp for the version in the table's version history. Clicking this link brings up an option to either import or delete the version. Another orange box highlights the 'delete' option.

Overview: Data provenance

Currently, you can see your data's provenance (how it was generated) only for data table columns that were output by a workflow.

When you might want data provenance

  • To understand how a piece of data was created (e.g., which pipeline was used to calculate it).
  • To understand if the column was recently updated by a workflow.

How to view a data table column's provenance

1. Select the data table from the left-hand panel in your workspace’s Data tab.

2. Go to a column in your data table and click the column menu icon (circle with three dots). Select Show Provenance from the drop-down menu.
Screenshot showing how to show a data table column's provenance. An orange box highlights the column's menu icon (circle with three dots inside) and another orange box highlights the 'show provenance' option in the resulting menu.
3. A pop-up window will appear, showing which workflow submissions added data to this column.

Screenshot of the provenance for an example data table column.

Provenance limitations
Provenance is only available for columns, not for individual cells. For example, the provenance shown below lists all workflow runs that added data to the alignment_summary_metrics_file column.
Screenshot of the provenance for an example data table column.

Provenance is only available for the last 25 workflows run on your data table. If you try to see the provenance for a column that was last written by a workflow more than 25 workflow submissions ago, you will see a message like this:

"No provenance information available for column reported_sex. It was not configured as an output in any of the last 25 workflows run on this data type."
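To make the 25-workflow limit concrete, here is a conceptual model of the lookup the message describes: take the most recent submissions on a table and keep only those that configured the column as an output. This is not Terra's implementation. The Submission record, its fields, and the function column_provenance are hypothetical, chosen only to show why older writes drop out of view.

```python
from dataclasses import dataclass, field
from typing import List

PROVENANCE_WINDOW = 25  # limit stated in this article


@dataclass
class Submission:
    # Hypothetical record of one workflow submission on a table.
    submission_id: str
    output_columns: List[str] = field(default_factory=list)


def column_provenance(submissions, column, window=PROVENANCE_WINDOW):
    """Return IDs of recent submissions that wrote `column` (conceptual model).

    `submissions` is assumed to be ordered oldest to newest; only the
    last `window` entries are considered, mirroring the 25-workflow limit.
    """
    # Guard window <= 0, because a [-0:] slice would return the whole list.
    recent = submissions[-window:] if window > 0 else []
    return [s.submission_id for s in recent if column in s.output_columns]


# 30 submissions: only the oldest one wrote reported_sex.
history = [Submission("sub-0", ["reported_sex"])] + [
    Submission(f"sub-{i}", ["alignment_summary_metrics_file"]) for i in range(1, 30)
]
print(column_provenance(history, "reported_sex"))  # []
print(column_provenance(history, "alignment_summary_metrics_file")[:2])  # ['sub-5', 'sub-6']
```

In this model, reported_sex returns no results because its only writer falls outside the 25 most recent submissions, which corresponds to the message shown above.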
