Data table column names (imported data)

Allie Cliffe

All data table attributes created by handing off data from TDR, PIC-SURE, or Gen3 to Terra include a namespace prefix, such as pfb: or tdr:, in the data table attribute header. This document explains what to expect, and why, when importing data to Terra.

Compliance when using controlled-access (PFB) data

PFB imports of controlled-access data (including from external repositories) to Terra on Google workspaces will now add the Additional Security Monitoring policy to the workspace.

See the BioData Catalyst PFB documentation on GitHub for an overview of the Portable Format for Biomedical Data (PFB) file type.

Data imported to Terra (TDR, Gen3, PIC-SURE)

To increase interoperability, attribute names for some data exported directly to a workspace data table include a namespace prefix. This formatting applies to all data table attributes created by handing off data from TDR, Gen3, or PIC-SURE to Terra after October 21, 2020. It affects the attribute name (i.e., the data table column header).

Example subject data table (Gen3, PIC-SURE)

subject_id    pfb:consent_codes    pfb:participant_id    pfb:project_id
00207d77-     open (1 item)        NA19669               tutorial-synthetic_data

Example sample data table (TDR)

sample_id     tdr:data_format    tdr:participant_id    pfb:ga4gh_drs_uri
000075d7d-    VCF                NA19669               drs://dg.4503:dg.4503/00148a0e-

Greater interoperability with namespaces

This change supports the NIH Cloud Platform Interoperability (NCPI) effort. The pfb namespace prefix identifies attributes imported via the Portable Format for Biomedical Data (PFB), while the tdr namespace prefix identifies attributes imported from TDR. Using a namespace prefix prevents name conflicts and can reduce confusion when data comes from multiple sources.
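As a rough illustration of how the prefix keeps attribute names distinct, the sketch below splits a column header into its namespace and attribute name. The helper name split_namespace is hypothetical and not part of Terra or FISS; it only assumes that the prefix is everything before the first colon, as in the example tables above.

```python
def split_namespace(header):
    """Split a data table column header into (namespace, attribute name).

    Hypothetical helper: assumes the namespace is the text before the
    first colon, as in "pfb:consent_codes" or "tdr:data_format".
    """
    namespace, sep, name = header.partition(":")
    if not sep:
        # No colon: an un-prefixed column such as the entity ID column.
        return None, header
    return namespace, name


# Headers taken from the example subject data table above.
headers = ["subject_id", "pfb:consent_codes", "pfb:participant_id", "pfb:project_id"]
parsed = [split_namespace(h) for h in headers]
```

With this split, a tdr:participant_id column and a pfb:participant_id column remain distinguishable even though the attribute name is the same.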

The Portable Format for Biomedical Data (PFB) namespace
The Portable Format for Biomedical Data (PFB) - developed at the University of Chicago Center for Translational Data Science as part of their ongoing partnership with the Data Commons - is an efficient and portable way to serialize complex data. It is used by multiple institutions and programs to exchange biomedical data, and it is the exchange format of the current NIH Cloud Platform Interoperability effort.

NEW! (2025) PFB imports from PIC-SURE to Terra

You can now export your selected participant-level cohort from BioData Catalyst Powered by PIC-SURE to a Terra workspace. The data will be displayed as two tables: a data table and a data dictionary table.

What’s in the data table?

The data table will be labeled with the prefix “pic_sure_patients_” and show the participant-level data from PIC-SURE. The columns of this table are the variables, which are labeled as PIC-SURE concept paths.

What’s in the data dictionary?

The data dictionary table will be labeled with the prefix “pic_sure_data_dictionary_” and will contain information about the variables that have been exported in the data table.
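If you work with the list of table names in a script or notebook, the two prefixes described above are enough to tell the tables apart. The following is a minimal sketch; the function name and the example table names are made up for illustration, and only the two prefixes come from this article.

```python
# Prefixes as described in this article.
DATA_PREFIX = "pic_sure_patients_"
DICTIONARY_PREFIX = "pic_sure_data_dictionary_"


def classify_pic_sure_table(table_name):
    """Return "data", "data dictionary", or None for any other table."""
    if table_name.startswith(DATA_PREFIX):
        return "data"
    if table_name.startswith(DICTIONARY_PREFIX):
        return "data dictionary"
    return None


# Hypothetical table names; the real suffix is whatever the export assigns.
tables = ["pic_sure_patients_example", "pic_sure_data_dictionary_example", "sample"]
kinds = {name: classify_pic_sure_table(name) for name in tables}
```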

Additional PIC-SURE resources

How to use data with namespaces in an analysis

Most of the work of implementing namespaces in Terra - such as making sure workflows and notebooks recognize the prefix - happens behind the scenes. The only difference you will see when running workflows or interactive analyses is the new attribute name (as it appears in column headings in data tables in a Terra workspace) when referencing data attributes.

Example: Terra workflow inputs configuration

If you're running a workflow on data imported to a data table (for example, from Gen3), you must include the namespace prefix (pfb: or tdr:) in the attribute name when configuring workflow inputs.

[Screenshot: workflow configuration with pfb: prefix]
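As a sketch of what the input value looks like, the helper below builds an attribute reference with the prefix included. It assumes the this.<attribute> form that Terra workflow inputs use to reference a column of the root entity table; the helper name input_expression is hypothetical and exists only for illustration.

```python
def input_expression(attribute, namespace=None):
    """Build a workflow input reference such as "this.pfb:ga4gh_drs_uri".

    Assumption: inputs reference data table columns as "this.<column>",
    and a namespaced column keeps its "<namespace>:" prefix.
    """
    column = f"{namespace}:{attribute}" if namespace else attribute
    return f"this.{column}"


drs_input = input_expression("ga4gh_drs_uri", namespace="pfb")
format_input = input_expression("data_format", namespace="tdr")
```

The point is simply that the value typed into the inputs form must match the column heading exactly, colon included.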

Example: Reading data from a Terra data table using FISS

FireCloud Service Selector (FISS) is a Python module for making API (Application Programming Interface) calls from a notebook to the workspace. Namespace support affects how you name attributes when using FISS to read them from a data table: the attribute name must include the prefix.

Command (notice the colon after pfb!)

from firecloud import api as fapi
response = fapi.get_entities_tsv(BILLING_PROJECT, WORKSPACE, "sample", ["pfb:submitter_id"], model="flexible")
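Once the call returns, the body is tab-separated text, so the namespaced column can be read back by its full header. The sketch below parses such text with Python's csv module. The example rows and the exact header layout are assumptions made up for illustration; in a notebook you would pass response.text instead of example_tsv.

```python
import csv
import io


def column_values(tsv_text, column):
    """Return the values of one column from tab-separated text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # The column key includes the namespace prefix, e.g. "pfb:submitter_id".
    return [row[column] for row in reader]


# Hypothetical TSV content standing in for response.text.
example_tsv = (
    "entity:sample_id\tpfb:submitter_id\n"
    "s1\tSUB-001\n"
    "s2\tSUB-002\n"
)
submitter_ids = column_values(example_tsv, "pfb:submitter_id")
```

Looking up the column as "submitter_id" without the prefix would raise a KeyError here, which mirrors the requirement to include the prefix in the FISS call itself.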

Future namespace benefits

Looking forward: Common PFB Attributes

To better support interoperability between systems - i.e., working with data from multiple sources and datasets - the NCPI has defined a small set of PFB Common Attributes to be used consistently across systems. The Gen3 team is in the process of adding these attributes to the Gen3 BioData Catalyst data model. In some cases, a common attribute represents data that is already in the Gen3 BioData Catalyst data model under a different name. In such cases, both the existing name and the new common name will be present, with the same value. A list of the NCPI PFB Common Attributes is available here.

Namespaces in Terra

Where and how the namespace feature evolves depends on feedback from you, our users.

One possibility is to allow a more specific and descriptive namespace value than “pfb”: the user could specify the namespace value in the Terra import form as part of the hand-off process.

Another option is to make the namespace value the name of the program/portal from which the data came (“bdcat”, “anvil”, etc.) or the name of a specific data model and version, etc. This may help to identify the data when working with it, and could potentially facilitate advanced use cases such as having data from multiple programs/portals and data models in the same Terra workspace, with the data for each namespaced appropriately.

Please let us know what you think, or if you have questions or would like help with migration.
