Polygenic Risk Score (Azure tutorial)

Allie Cliffe

If you're interested in using Terra on Azure, please email terra-enterprise@broadinstitute.org.

1. Clone the template featured workspace

1.1. Go to the PRS calculation Featured Workspace. Click on the three-dot menu at the top-right of the workspace and select Clone.

PRS-calc-FW_1.1.Clone-workspace.png

1.2. Fill in a Workspace name, select your Terra billing project, and click the CLONE WORKSPACE button. In this example, the name includes the author’s initials and the date so the clone is easy to identify.

PRS-calc-FW_1.2.Clone-this-worspace-screenshot.png

What to expect

GenomicScreening_1.3-loading-clone.png

1.3. After a few minutes, Terra will finish creating your workspace. To continue, navigate to your cloned workspace (Main menu > Workspaces > Polygenic Risk Score).

2. Examine the workspace data table

2.1. Click on the DATA tab in your clone.

PRS-calc-FW_2.1.Data-tab-highlighted-on-home-page.png

2.2. The workspace comes with a data table called cohorts. Click on the name to examine the table.

PRS-calc-FW_2.2.Screenshot-of-tables-in-data-page-pointing-to-the-cohorts-link.png

2.3. This table contains sample data to use when running the workflow as a template. 

PRS-calc-FW_2.3.Screnshot-of-expanded-cohort-table-with-one-row-and-two-columns.png

What’s in the cohorts table?
The table has one row of data, labeled test_cohort, with the input file (example.vcf.gz) in the vcf column.

Where is the sample input data located?
The input VCF is stored in the featured workspace cloud storage, but it could be in any external Azure cloud storage that Terra has permission to access. The table contains a link to the file, which is used as the input for the PolygenicRiskScore workflow.
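As a rough illustration of what such a link contains, Azure Blob Storage URLs generally follow the pattern https://<account>.blob.core.windows.net/<container>/<path>. The sketch below splits a URL of that shape into its parts. The account and container names are made up for illustration, and the helper name split_blob_url is hypothetical; the actual link in your table will differ.

```python
from urllib.parse import urlparse


def split_blob_url(url):
    """Split an Azure Blob Storage URL into (account, container, blob path)."""
    parsed = urlparse(url)
    host_suffix = ".blob.core.windows.net"
    if parsed.scheme != "https" or not parsed.netloc.endswith(host_suffix):
        raise ValueError(f"not an Azure Blob Storage URL: {url}")
    # The storage account is the host name without the service suffix.
    account = parsed.netloc[: -len(host_suffix)]
    # The first path segment is the container; the rest is the blob path.
    container, _, blob_path = parsed.path.lstrip("/").partition("/")
    return account, container, blob_path


# Hypothetical URL: the account and container names are invented for this example.
example_url = "https://exampleaccount.blob.core.windows.net/example-container/inputs/example.vcf.gz"
account, container, blob_path = split_blob_url(example_url)
print(account, container, blob_path)
```

Reading a link this way can help confirm which storage account and container a file lives in before you check access permissions.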

3. Run the PolygenicRiskScore workflow

Workflows are orchestrated in Terra by the workflows application, Cromwell. In a new workspace, you will need to launch Cromwell. You only need to do this once, the first time you set up or run a workflow in the workspace.

3.1. Go to the Workflows tab.

3.2. Click the blue Launch Workflows App button.


The Workflows app may take 5 to 15 minutes to launch.
GWAS-on-Azure_3.2.Launch-workflows-app-screenshot.png

When workflows are ready

After a few minutes, you will see the workflows menu on the left-hand side and the PolygenicRiskScore workflow from the original workspace in the center under Workflows in this workspace.

Screenshot 2024-05-31 at 9.43.35 AM.png

3.3. Click the blue Configure button for the PolygenicRiskScore workflow.

3.4. The data table should already be set to the cohorts table. If it is not already selected, choose it from the dropdown menu.
3.5. Check the box to select the test_cohort row (the only row in the table).

Screenshot 2024-05-31 at 9.47.25 AM.png

3.6. The required Inputs and Outputs have already been configured for you. You can click on these tabs to see what is in each workflow attribute, and what columns will be generated for the outputs.

Inputs

Screenshot 2024-05-31 at 9.50.08 AM.png

Input sources: Input data versus reference files
Note that the source for the cohort input data is the table (Fetch from Data Table in the dropdown menu). This is input data associated with the cohort you are analyzing, and it all comes from the same row in the root data table.

Workspace-level input files, such as reference files, do not vary with the data chosen. These files are stored in the workspace cloud storage and specified by their full cloud URI path. For these inputs, the input source in the dropdown menu is Type a value.
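The difference between the two input sources can be pictured with a small model. This is only a sketch under stated assumptions: the input names, the second table row, and the reference URI are hypothetical, and the function resolve_inputs is an illustration rather than anything Terra or Cromwell exposes.

```python
# Hypothetical model of the two input sources described above.
# Input names and the reference URI are illustrative, not the workflow's real configuration.
FETCH_FROM_DATA_TABLE = "Fetch from Data Table"
TYPE_A_VALUE = "Type a value"

input_config = {
    "vcf": {"source": FETCH_FROM_DATA_TABLE, "column": "vcf"},
    "reference_file": {
        "source": TYPE_A_VALUE,
        "value": "https://<account>.blob.core.windows.net/<container>/reference.txt",
    },
}


def resolve_inputs(config, row):
    """Return concrete input values for one selected data table row."""
    resolved = {}
    for name, spec in config.items():
        if spec["source"] == FETCH_FROM_DATA_TABLE:
            resolved[name] = row[spec["column"]]  # varies with the selected row
        elif spec["source"] == TYPE_A_VALUE:
            resolved[name] = spec["value"]  # identical for every row
        else:
            raise ValueError(f"unknown input source: {spec['source']}")
    return resolved


rows = [
    {"cohort_id": "test_cohort", "vcf": "example.vcf.gz"},
    {"cohort_id": "another_cohort", "vcf": "another.vcf.gz"},  # hypothetical second row
]
resolved = [resolve_inputs(input_config, row) for row in rows]
for item in resolved:
    print(item)
```

In this model, the vcf value changes with the selected row while the reference file stays the same, which mirrors the distinction between table-driven inputs and typed values.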

Outputs

Screenshot 2024-05-31 at 9.50.29 AM.png

3.7. When you’re ready to run the workflow, click the blue SUBMIT button to open the Send submission form.

3.8. You can add a comment for this submission if you wish, then click SUBMIT again.

Screenshot 2024-07-16 at 12.41.32 PM.png

3.9. The first time you run a workflow in your workspace, Cromwell will take some time to launch. The workflow will submit automatically once Cromwell is up and running, and you’ll be redirected to the Submission details page.

Screenshot 2024-07-15 at 9.41.32 AM.png

3.10. You can monitor your submission progress at any time by clicking the Submission history option on the left side of the Workflows page. Clicking the submission name will give you more information.

Screenshot 2024-07-15 at 10.55.08 AM.png

3.11. Once the submission is complete, you’ll see the status change to Success. You can click on the submission name for more details.

4. Find and explore results

What to expect and where to find generated data

The workflow generates two new files, raw_scores and sites_scored. The output files will be stored in the workspace cloud storage and Terra will write links to the files in the original data table you examined earlier.
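This tutorial does not describe how the workflow computes its scores. As general background, a raw polygenic score is commonly calculated as the sum, over scored variants, of an effect weight multiplied by the number of effect alleles a sample carries. The sketch below shows that arithmetic with made-up variants and weights; it is not the workflow's implementation, and the pairing of a score with a list of scored sites only loosely mirrors the two output files.

```python
# Made-up effect weights per variant ID (illustrative values only).
weights = {"var1": 0.5, "var2": -0.25, "var3": 1.0}

# Made-up effect-allele dosages (0, 1, or 2 copies) for one sample.
# var3 is absent from the sample data, so it cannot be scored.
dosages = {"var1": 2, "var2": 1}


def raw_score(weights, dosages):
    """Return (score, sites), summing weight * dosage over variants present in both inputs."""
    sites = sorted(v for v in weights if v in dosages)
    score = sum(weights[v] * dosages[v] for v in sites)
    return score, sites


score, sites = raw_score(weights, dosages)
print(score, sites)
```

Here only var1 and var2 contribute, because var3 has a weight but no genotype for the sample.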

4.1. Click on the DATA tab again to examine the updated cohorts table.

4.2. Click to open the cohorts table and scroll to the right to see two new columns of data: raw_scores and sites_scored. These cells include links to the output files generated by the workflow.

Screenshot 2024-07-15 at 11.10.06 AM.png

4.3. Clicking on the link in the raw_scores column shows where the file is stored in your workspace cloud storage and lets you download the output file to local storage.

PRS-calc-FW_4.2_Hover-over-test_cohort_sscore-in-raw_scores-column-in-cohorts-table.png

PRS-calc-FW_4.3.Screenshot-of-test_cohort.sscore-file-details-popup.png
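Once downloaded, the output file can be inspected locally. The tutorial does not describe the file layout, so the sketch below assumes tab-separated text with a header row. The column names IID and SCORE, the sample values, and the local path are placeholders, not the real contents of test_cohort.sscore.

```python
import csv
import io


def read_scores(handle):
    """Read tab-separated text with a header row into a list of dictionaries."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Stand-in for a downloaded file; column names and values are placeholders.
sample = io.StringIO("IID\tSCORE\nsample_1\t0.75\nsample_2\t-0.10\n")
records = read_scores(sample)
for record in records:
    print(record["IID"], float(record["SCORE"]))

# For a real download, open the file instead (the path is hypothetical):
# with open("test_cohort.sscore", newline="") as handle:
#     records = read_scores(handle)
```

Check the header row of your own file first and adjust the column names and delimiter to match.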

 
