Create a BioData Catalyst cohort with PIC-SURE to analyze in Terra

Gabriella Senior

Learn how to create a patient cohort in PIC-SURE and export it to Terra for analysis. This guide includes validation steps to help ensure data integrity and compatibility between the two systems.

Example case

The steps below demonstrate how to create a cohort in PIC-SURE that integrates participant IDs and variables from the Framingham Heart Study (phs000007) and the NHLBI TOPMed Framingham Heart Study (phs000974).

Step 1: Access the PIC-SURE platform

1.1. Start at the PIC-SURE Login Page.

1.2. Select the Login with eRA Commons button.
[Screenshot: PIC-SURE login page]

1.3. Enter your eRA Commons credentials (on the NIH website) and click Sign in.
[Screenshot: eRA Commons login]

1.4. You'll be redirected back to PIC-SURE. Click Explore (at the top of the page).
[Screenshot: Explore]

Step 2: Select the dataset

2.1. Go to the dataset panel in the left column, where you can filter by dataset.

2.2. Locate and select the desired dataset. This example uses biolincc_digitalis (tutorial-biolincc_digitalis), one of the public datasets available on BDC and a good option for testing. A checkmark appears next to the dataset name once it is selected.

2.3. To ensure the dataset is correctly loaded for exploration, confirm the dataset name and the total number of variables in the main viewing panel (circled in the screenshot below).
[Screenshot: dataset name and total number of variables]

Step 3: Filter variables within the dataset

Variables for each study are listed in the main viewing panel.

3.1. Identify variables of interest. In this example, we’re focusing on the row labeled AGE.

3.2. For more details, click the variable name.
[Screenshot: expanded AGE variable]

3.3. To set filter criteria, click the filter icon (a funnel in the Actions column on the right) for the variable of interest, highlighted in the screenshot below.
[Screenshot: filtering by AGE, with the funnel icon highlighted in orange]

3.4. Enter the minimum and maximum values for your cohort in the Min and Max fields. The example below filters for participants between 25 and 35 years old.
[Screenshot: filtering by AGE between 25 and 35 years]

3.5. Confirm your entries by clicking the "+" button (to the right of the Min and Max fields). The dataset is then filtered to include only participants within the specified range.

3.6. Check the summary on the right-hand side for the total number of filtered participants (circled in orange in the screenshot below).
[Screenshot: 77 filtered participants, with the Prepare for Analysis button highlighted]

3.7. Click the Prepare for Analysis button (a circle with a + sign on the right-hand side) to prepare the filtered data for export.
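To make the Min/Max filter concrete, the sketch below applies the same kind of range check to a handful of made-up participant records. It assumes both bounds are inclusive (a participant aged exactly 25 or 35 is kept) and that records without a value are excluded; neither behavior is confirmed here for PIC-SURE, so compare your own results against the participant count PIC-SURE reports.

```python
from collections.abc import Iterable


def filter_by_range(records: Iterable[dict], variable: str,
                    minimum: float, maximum: float) -> list[dict]:
    """Keep records whose value for `variable` lies in [minimum, maximum].

    Assumption: bounds are inclusive, and records missing the variable
    are dropped rather than kept.
    """
    kept = []
    for record in records:
        value = record.get(variable)
        if value is not None and minimum <= value <= maximum:
            kept.append(record)
    return kept


# Hypothetical sample data for illustration only.
participants = [
    {"id": "P1", "AGE": 24},
    {"id": "P2", "AGE": 25},
    {"id": "P3", "AGE": 31},
    {"id": "P4", "AGE": 35},
    {"id": "P5", "AGE": 36},
    {"id": "P6"},  # no AGE recorded
]

cohort = filter_by_range(participants, "AGE", 25, 35)
print([p["id"] for p in cohort])  # ['P2', 'P3', 'P4']
```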

What to expect

Next, you'll be taken to a page for exporting the data.

Step 4: Review cohort details

4.1. On the Review Cohort Details page, review the number of participants, variables, and data points (circled in the screenshot below). 
[Screenshot: Review Cohort Details page, with the summary and Next button highlighted]

4.2. Verify the included variables. For example, make sure that AGE appears as a selected variable in the cohort details table and that its filter shows the correct range.

4.3. Click the Next button to continue.

Step 5: Choose export format

5.1. Select the format to export the data. For instance, choose Export as PFB (Portable Format for Biomedical Data).

5.2. After selecting the format, click Next to move to the next step.
[Screenshot: Export data for research analysis, with Export as PFB selected]

Step 6: Save the dataset ID

6.1. Enter a unique, descriptive name for your dataset in the Save Dataset ID form.
[Screenshot: Save Dataset ID form with the dataset name AGE-25-to-35 and the dataset ID circled]

6.2. Make a note of the dataset ID (generated automatically by the system and circled in the screenshot above) for future reference.

6.3. Click Next to finalize the dataset creation.
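Because you will need the dataset ID from step 6.2 later, it can help to keep a small local log that pairs each dataset name with its ID. This is a minimal sketch, not a PIC-SURE feature: the log file name `picsure_datasets.jsonl`, the entry fields, and the placeholder ID are all hypothetical.

```python
import json
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def record_dataset(log_path: Path, name: str, dataset_id: str) -> dict:
    """Append a {name, dataset_id, saved_at} entry to a JSON Lines log."""
    entry = {
        "name": name,
        "dataset_id": dataset_id,
        "saved_at": datetime.now(timezone.utc).isoformat(),
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


def lookup_dataset_id(log_path: Path, name: str) -> Optional[str]:
    """Return the most recently logged dataset ID for `name`, or None."""
    found = None
    if log_path.exists():
        for line in log_path.read_text(encoding="utf-8").splitlines():
            if not line.strip():
                continue  # skip blank lines
            entry = json.loads(line)
            if entry["name"] == name:
                found = entry["dataset_id"]
    return found


# Hypothetical log file; paste the real ID shown on the Save Dataset ID form.
log = Path("picsure_datasets.jsonl")
record_dataset(log, "AGE-25-to-35", "<dataset-id-from-PIC-SURE>")
print(lookup_dataset_id(log, "AGE-25-to-35"))
```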

Step 7: Export data for analysis

7.1. To export the dataset to Terra for further analysis, select the Export to Terra button.

Downloading locally (optional): If required, you can download the dataset in PFB format to your own machine by selecting Download as PFB.

Note that this option transfers data through the BioData Catalyst security boundary, which may or may not be supported by your Data Use Agreement(s), Limitation(s), or your Institutional Review Board policies and guidelines. As a BioData Catalyst user, you are solely responsible for adhering to the terms of these policies.

7.2. Once the destination is selected, click Done to start the data export.

7.3. Choose Create a new workspace under Destination of the prepared data.

What to expect

This opens a popup where you configure the workspace settings.

Step 8: Create Terra workspace

Fill in all the fields in the Create a New Workspace popup. 
[Screenshot: Create a New Workspace popup in Terra]

8.1. Assign a unique workspace name.

8.2. Select the billing project from the dropdown, which lists all Terra billing projects you have access to. All cloud costs for data storage and analysis in the workspace will be billed to this project.

8.3. Choose the bucket location for workspace storage. The default location is us-central1 (Iowa).

8.4. (Optional) Check the box to enable security monitoring, if required.

8.5. Click the Create Workspace button to complete the setup.
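Before submitting the form, you may want to sanity-check the workspace name. The sketch below assumes Terra accepts letters, digits, underscores, hyphens, and spaces, and that names must be unique within the selected billing project; treat both as assumptions and rely on the form's own validation messages if they differ. The set of existing names is made up.

```python
import re

# Assumed naming rule: letters, digits, underscores, hyphens and spaces.
# This is an approximation; the Terra form's own validation is authoritative.
WORKSPACE_NAME_PATTERN = re.compile(r"[A-Za-z0-9_\- ]+")


def check_workspace_name(name: str, existing_names: set[str]) -> list[str]:
    """Return a list of problems with `name`; an empty list means it looks OK."""
    problems = []
    if not name.strip():
        problems.append("name is empty")
    elif not WORKSPACE_NAME_PATTERN.fullmatch(name):
        problems.append("name contains unsupported characters")
    if name in existing_names:
        problems.append("name is already used in this billing project")
    return problems


# Hypothetical names already present in the selected billing project.
existing = {"framingham-age-cohort", "digitalis_test"}

print(check_workspace_name("digitalis_AGE_25-35", existing))  # []
print(check_workspace_name("digitalis/AGE 25-35", existing))
print(check_workspace_name("digitalis_test", existing))
```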

Step 9: Monitor data import

9.1. Navigate to the Data tab within the workspace where you'll see a Data import in progress message (top right).
[Screenshot: importing data from PIC-SURE, with the import-in-progress alert highlighted]

9.2. Allow the import process to finish before proceeding to make sure all data is properly loaded into the workspace.

What to expect

Once the import is complete, you'll see a green Data imported successfully notification (top left).
[Screenshot: Data imported successfully alert]

9.3. To make sure the tables are visible, reload the workspace Data page using your browser's refresh button.

Step 10: View and explore imported data tables

10.1. You can view the imported data tables under the Tables section of the Data page (left column). For example, the screenshot below shows the two tables exported to the workspace: pic_sure_data_dictionary and pic_sure_patient.
[Screenshot: imported dataset, pic_sure_patient table]

The pic_sure_data_dictionary table contains information about the variables included in the export, while the pic_sure_patient table contains the participant-level data.
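One quick consistency check is that every variable column in pic_sure_patient has a matching entry in pic_sure_data_dictionary. This sketch assumes you have both tables as tab-separated text (for example, downloaded from the Data page); the column names it uses (`pic_sure_patient_id`, `variable_name`) and the sample tables are hypothetical, so substitute the headers you actually see.

```python
import csv
import io


def undocumented_columns(patient_tsv: str, dictionary_tsv: str,
                         dictionary_name_column: str,
                         id_columns: set[str]) -> set[str]:
    """Return patient-table columns with no row in the data dictionary.

    `dictionary_name_column` and `id_columns` describe an assumed table
    layout; change them to match the headers in your own export.
    """
    # The first row of the patient table is its header (the column names).
    patient_header = next(csv.reader(io.StringIO(patient_tsv), delimiter="\t"))
    # Collect every variable name listed in the dictionary table.
    dictionary_rows = csv.DictReader(io.StringIO(dictionary_tsv), delimiter="\t")
    documented = {row[dictionary_name_column] for row in dictionary_rows}
    return set(patient_header) - documented - id_columns


# Invented miniature tables with hypothetical column names.
patients = "pic_sure_patient_id\tAGE\tSEX\nP2\t25\tF\nP3\t31\tM\n"
dictionary = "pic_sure_data_dictionary_id\tvariable_name\n1\tAGE\n"

print(undocumented_columns(patients, dictionary, "variable_name",
                           {"pic_sure_patient_id"}))  # {'SEX'}
```

To run it on real exports, pass the file contents (for example, the result of `Path(...).read_text()` on each downloaded file) in place of the sample strings.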

10.2. Click a table name in the left column to view its records and variables, and confirm that the data matches your selected filters.
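The check in step 10.2 can also be scripted. The sketch below compares a tab-separated copy of the patient table with the filter you set in PIC-SURE: it confirms that the row count matches the participant total shown in the PIC-SURE summary and that every value falls inside the Min/Max range. It assumes inclusive bounds and one row per participant; the sample data and the `pic_sure_patient_id` column name are invented.

```python
import csv
import io


def validate_cohort(patient_tsv: str, variable: str, minimum: float,
                    maximum: float, expected_count: int) -> list[str]:
    """Compare an exported patient table against the PIC-SURE filter.

    Returns a list of problems; an empty list means every check passed.
    Assumes inclusive bounds and exactly one row per participant.
    """
    rows = list(csv.DictReader(io.StringIO(patient_tsv), delimiter="\t"))
    problems = []
    if len(rows) != expected_count:
        problems.append(f"expected {expected_count} rows, found {len(rows)}")
    if rows and variable not in rows[0]:
        problems.append(f"column {variable!r} is missing")
        return problems
    for index, row in enumerate(rows, start=1):
        raw = (row.get(variable) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            problems.append(f"row {index}: {variable} value {raw!r} is not numeric")
            continue
        if not minimum <= value <= maximum:
            problems.append(
                f"row {index}: {variable}={value:g} is outside "
                f"[{minimum:g}, {maximum:g}]"
            )
    return problems


# Invented three-row table; the third participant falls outside 25-35.
patients = "pic_sure_patient_id\tAGE\nP2\t25\nP3\t31\nP7\t40\n"
print(validate_cohort(patients, "AGE", 25, 35, expected_count=3))
# ['row 3: AGE=40 is outside [25, 35]']
```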

Additional examples

1. Handling Long Variable Descriptions

  • Variable ID: F1_PC2_3
  • Details: This variable represents a weighted scale of socioeconomic factors such as education, income, and home value. Missing values result from incomplete data in any contributing variable. Data sources include Census 2000 (2004 exams) and ACS 2005–2009 (2005 exams).
  • Length Metrics:
    • With spaces: 470 characters
    • Without spaces: 394 characters
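The two length figures differ only in whether space characters are counted. A minimal sketch of both measurements follows; it removes only the ASCII space character for the "without spaces" figure, which is an assumption because the rule behind the numbers above is not stated. The sample string is made up.

```python
def length_metrics(text: str) -> dict[str, int]:
    """Character counts with and without space characters.

    Assumption: "without spaces" removes only the ASCII space ' ';
    tabs and newlines, if any, are still counted.
    """
    return {
        "with_spaces": len(text),
        "without_spaces": len(text.replace(" ", "")),
    }


sample = "Census 2000 and ACS 2005-2009"  # short made-up sample
print(length_metrics(sample))  # {'with_spaces': 29, 'without_spaces': 25}
```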

2. Managing Extended Variable Names

  • Variable ID: noxisiess_nreas_enrollment_home_polysomnography_with_ess_and_is
  • Length: 63 characters
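A name like this can be measured, and compared against whatever length limit a downstream tool imposes, with a few lines of Python. The 60-character limit below is hypothetical and chosen only so that the example flags this name.

```python
VARIABLE_ID = "noxisiess_nreas_enrollment_home_polysomnography_with_ess_and_is"


def names_over_limit(names: list[str], max_length: int) -> list[tuple[str, int]]:
    """Return (name, length) for every name longer than `max_length`."""
    return [(name, len(name)) for name in names if len(name) > max_length]


print(len(VARIABLE_ID))  # 63
# Hypothetical 60-character limit, used only to show a flagged name.
print(names_over_limit([VARIABLE_ID, "AGE"], max_length=60))
```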

3. Open-Ended Questions with Lengthy Responses

  • Variable Name: lfcoma1
  • Variable Accession: phv00090159
  • Description: Text responses provide detailed comments regarding measurement quality or issues related to liver attenuation.
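Open-ended responses can contain tabs, line breaks, or quotation marks, any of which can break a naively written tab-separated file. The sketch below uses Python's standard `csv` module (not anything PIC-SURE or Terra is documented to use) to write such a value with quoting and confirm that it reads back unchanged. The response text is invented.

```python
import csv
import io


def tsv_round_trip(rows: list[list[str]]) -> list[list[str]]:
    """Write rows as quoted TSV, read them back, and return what was parsed."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL quotes any field containing a tab, quote, or line break.
    writer = csv.writer(buffer, delimiter="\t", quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    buffer.seek(0)
    return list(csv.reader(buffer, delimiter="\t"))


# Made-up open-ended response containing a tab, a newline and quotes.
rows = [
    ["lfcoma1"],
    ['Scan repeated:\tpatient moved.\nNote: "attenuation" unclear.'],
]
assert tsv_round_trip(rows) == rows
print("free-text value survived the TSV round trip")
```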

