How much does my workflow cost?

Beri

Executing jobs on the cloud can be a little scary if you don't know how much you're spending. For peace of mind that you're not going over a project budget, read on for two ways to understand the cost of an executed workflow.

Workflow cost reporting: Built-in versus notebooks

Cost report availability
Results from Terra's built-in cost report aren't available until some hours after the workflow has completed. It's best to allow a day to pass before checking in on the cost.

Notebooks allow cost reporting for workflows in progress
You can run the cost-estimating notebook at any point during workflow execution, and it will list how much the workflow has cost thus far. You can use this as an early indicator of whether you may exceed your budget.

Notebooks give more granular cost details
Unlike Terra's cost report, which only provides the total job cost, the notebook estimates task-level and instance-level costs (i.e. the cost of running a particular task within a workflow). This is useful when you want to know which task(s) are responsible for most of the cost.

The notebook gives an estimate; built-in reporting is a final cost
However, the notebook's estimate is usually fairly accurate (see the benchmark below).

Option 1: Terra's built-in cost reporting

To help make cloud computing costs more transparent, you can have Terra display the cost for each executed workflow (including failed and aborted workflows) in the Job History. In order to access spend reporting, you will configure "spend reporting" for your Terra Billing Project. Most users will also need to do a one-time setup on GCP, to allow Terra access to GCP's cost reports. 

How accurate is the cost report?
Terra's built-in cost reports come directly from Google, so they show the actual cost of a given workflow (i.e. not an estimate). They are generated by accessing Google's Billing Repository, and can take several hours to log.

Step 1. Set up access (once per Cloud Billing account)

If you are not using a Broad Institute Google Cloud Billing account, you will need to grant Terra access to cost reports in the GCP console to enable workflow spend reporting.

Broad GCP Billing account users can skip these steps. 

Not able to follow these directions? You must have “Owner” or "Admin" permission on the GCP Billing account. If you are not able to follow the directions below, or do not see the options in the screenshots (or they are greyed out), it is likely because you do not have sufficient permissions on the GCP Billing account. You will need to ask the owner for admin privileges to enable spend reporting.

This may be the case, for example, if you are using a third-party reseller such as Onix.

General user setup 

1.1. Navigate to the billing account management page in Google Cloud Console.

1.2. Click on the name of the billing account you would like to enable billing exports for. If you have more than one, you will need to repeat for each one associated with a Terra Billing Project.

1.3. Click on Billing export in the left sidebar. 
Set-up-cost-reporting-3_Billing-export_Screen_shot.png

1.4. Under Standard usage cost, click the Edit Settings button.
Set-up-spend-report_Edit-billing-export-settings_Screen_shot.png

1.5. Next you will configure/create the BigQuery dataset to export the billing data to. 

Note that you must have an active Google project - not a Terra Billing project - tied to this billing account for this step. 

The Billing project you select cannot be a Terra-generated workspace project. To check, make sure the ID in the dropdown is not of the format <Terra-Billing-project-name>--<workspace-name>. If the project name has that formatting, it was created by Terra and will not work for this step.

In that case, you will need to create a GCP-native project first. You can find step-by-step instructions here.

If you don't have a project, Google will prompt you to create one. 
Set-up-cost-reporting_5-Set-up-project-prompt_Screen_shot.png

1.6. Create the project.
(if you already have an existing Google project, you may skip this step).
Set-up-cost-reporting-6_Create-new-project_Screen_shot.png

1.7. Select the Google project. This will be used to host the billing export BigQuery dataset (it cannot be a workspace project created by Terra - see 1.5 above).

1.8. From the dropdown menu, select (or create) the BigQuery dataset to store the billing export data.
Set-up-spend-report_Select-BigQuery-Dataset_Screen_shot.png
If you don't already have a dataset in this Google project, Google will prompt you to create one: From this menu, click on Create new dataset as shown above (1.8), fill in this form and select the Create dataset button at the bottom.

Set-up-spend-report_Create-BigQuery-dataset_Screen_shot.png

1.9. Click Save.

1.10. To view the BigQuery dataset, click on the link (the name of your BigQuery dataset) in the BigQuery export tab, to the right of Dataset name.
Set-up-cost-reporting-10-Select-link-to-view-BigQuery-dataset.png


1.11. From the dataset tab, copy the Google project ID (to the left of the colon) and BigQuery dataset name (to the right of the colon) to a safe place. You will need them in the next step (2.4) in Terra.
Set-up-spend-report_Find-Project-and-dataset-names_Screen_shot.png

The format is Google project ID : BigQuery dataset name (separated by a colon)

Set-up-spend-reporting_Google-project-ID-Dataset-name_Screen_shot.png

1.12. Hover over the person icon to the right of the project and dataset name and click Share dataset.
Set-up-spend-report_Share-dataset_Screen_shot.png
If this option is greyed out, it is most likely because you don't have the right permission. You will need to ask the owner of the Google Cloud Billing account to grant you permission to share a BigQuery dataset. 

1.13. Type spend-reporting@terra.bio into the Add principals field and select BigQuery Data Viewer from the Select a role dropdown. This grants Terra permission to access the dataset you've just set up.

1.14. Click Add (to the right of the dropdown), then Done (at the bottom of the form).

Google BigQuery billing exports are now configured for this Cloud Billing account. You can confirm by expanding the menu under BigQuery Data Viewer (below).

Set-up-cost-reporting-12_Grant-terra-permission_Screen_shot.png
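The two checks in steps 1.5 and 1.11 are easy to get wrong by hand, so here is a minimal Python sketch of both. It assumes the value is copied exactly in the Google project ID : BigQuery dataset name format, and that Terra-generated workspace projects always contain a double hyphen, as described in step 1.5. The function names and the sample values are hypothetical; they are not part of Terra or GCP.

```python
from __future__ import annotations


def looks_terra_generated(project_id: str) -> bool:
    """Return True if the ID has the <billing-project>--<workspace> shape from step 1.5."""
    left, sep, right = project_id.partition("--")
    return bool(sep) and bool(left) and bool(right)


def split_dataset_reference(reference: str) -> tuple[str, str]:
    """Split 'project-id:dataset_name' (step 1.11) into (project ID, dataset name)."""
    project_id, sep, dataset_name = reference.strip().partition(":")
    project_id, dataset_name = project_id.strip(), dataset_name.strip()
    if not sep or not project_id or not dataset_name:
        raise ValueError(f"expected 'project-id:dataset_name', got {reference!r}")
    if looks_terra_generated(project_id):
        # Assumption: a double hyphen means the project was created by Terra.
        raise ValueError(f"{project_id!r} looks like a Terra-generated workspace project")
    return project_id, dataset_name
```

For example, split_dataset_reference("my-gcp-project:billing_export") returns ("my-gcp-project", "billing_export"), which are the two values to enter in step 2.4.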

The next step will be to configure the workflow spend report.

STRIDES users setup

1.1. Navigate to the billing account management page in Google Cloud Console.

1.2. Click on the name of the STRIDES billing account you would like to enable billing exports for. It will have the form NIH.NHLBI.BDC.Cohort#.Fellow.00#.

1.3. Click on "Billing export" in the left sidebar. 
Set-up-cost-reporting-3_Billing-export_Screen_shot.png

1.4. Under Standard usage cost, click the "Edit Settings" button.
Set-up-spend-report_Edit-billing-export-settings_Screen_shot.png

1.6. From the dropdown menu under Dataset ID, select the Billing dataset. This is the pre-configured BigQuery dataset to store the billing export data.

1.7. Click Save.

1.8. To view the BigQuery dataset, click on the Billing link in the BigQuery export tab, to the right of Dataset name under Standard usage cost.

Set-up-spend-report_STRIDES_Dataset-in-BigQuery-Export_Screen_shot.png
1.9. Copy the Google project ID to a safe place. You will need it in the next step (2.4) in Terra.
Set-up-spend-report-STRIDES_Google-project-ID_Screen_shot.png

1.10. Hover over the person icon to the right of the project and dataset name and click Share dataset.
Set-up-spend-report_STRIDES_Share-dataset_Screen_shot.png

1.11. Type spend-reporting@terra.bio into the Add principals field and select BigQuery Data Viewer from the Select a role dropdown. This grants Terra permission to access the dataset you've just set up.

1.12. Click Add (to the right of the dropdown), then Done (at the bottom of the form).

Google BigQuery billing exports are now configured for the STRIDES Cloud billing account. You can confirm by expanding the menu under BigQuery Data Viewer (below).

Set-up-cost-reporting-12_Grant-terra-permission_Screen_shot.png

TIP: If you have just completed Step 1, we recommend waiting several hours before starting Step 2 so that billable activity can be recorded in BigQuery. Otherwise, you may receive an error that the dataset cannot be found.

Step 2. Configure workflow spend report

Now that you have set up the export of spend reports on GCP, you will configure spend reporting on the Terra side. You will only need to do this once per Terra Billing project.

2.1. Go to the Billing page by first clicking your name and selecting Billing from the main navigation menu (top left of any page in Terra).
Workflow-spend-report_Configure-Billing_Screen_shot.png

2.2. Select the Terra Billing project associated with the workspace where you're running your workflow analysis.

Note that you need to be an owner to follow these steps. You'll know you're the owner if you see the Terra billing project listed under Owned by You in the top left column. 
Workflow-spend-report_Billing-project-to-configure_Screen_shot.png

2.3. Click the pencil icon beside Workflow Spend Report Configuration to edit. 
Workflow-spend-report_Edit-workflow-spend-configuration_Screen_shot.png

2.4. Fill in the Dataset Project Name (Project ID) and Dataset Name from GCP console (step 1.11 for general users or step 1.9 for STRIDES - above).
Workflow-spend-report_Configure-workflow-spend-form_Screen_shot.png

2.5. Click the OK button to save.

You will not get a confirmation message, but as long as you don't get an error message, your configuration should be saved.

If you get an error that looks like this:
Set-up-spend-report_Error-updating-spend-report-configurations.png

it is because there is currently no data in the dataset.

To remedy this, try the following.

1. Run a small workflow in the workspace.

2. Wait 2-3 hours and follow steps 2.1 - 2.5 again. 

How to find built-in workflow cost reporting

1. Navigate to the Job History page of the workspace.

Built-in-spend-reporting_Job-History-tab_Screen_shot.png
This page includes all workflow submissions for the workspace.

2. Click the submission of interest in the far left column.

Built-in-spend-reporting_Click-submission-for-details_Screen_shot.png

3. If your spend reporting has been correctly set up, you will find the Total Run Cost at the top right corner. Note that the spend report can take up to 24 hours to appear in Terra, as GCP cost reports have some delay.

Built-in-spend-reporting_Total-run-cost_Screen_shot.png
If your submission included more than one execution, each will be listed separately under "Run Cost".

Option 2: Workflow cost estimate via Jupyter Notebooks 

You can estimate the cost of running your workflow with a Workflow Cost Estimator notebook created for BioData Catalyst Powered by Terra (available in the biodata-catalyst/BioData Catalyst Collection workspace). Follow the instructions below to find and run the notebook. You can also find the Python code in the BioData Catalyst Git repository, DataBiosphere/bdcat_notebooks.

Step 1. Import the notebook to your workspace

1.1. Navigate to the Notebooks page of the biodata-catalyst/BioData Catalyst Collection workspace. You will need the Workflow Cost Estimator notebook.

1.2. Click the three-dot icon to the left of the Workflow Cost Estimator notebook.
Built-in-spend-reporting_Workflow-Cost-Estimator-notebook_Screen_shot.png

1.3. Select Copy to another workspace.
Screen_Shot_2021-01-26_at_16.13.46.png

1.4. Import the notebook to the workspace where you ran or are running the workflow by entering your workspace name in the Destination field. Then click Copy.
Screen_Shot_2021-01-26_at_16.14.52.png

Step 2. Run the notebook

2.1. Click on the Workflow Cost Estimator notebook (in the Notebook page of your workspace).

2.2. Open the notebook in either Edit or Playground mode.

2.3. Run each cell in the notebook. The notebook itself describes what each cell does and how to use it. For a description of how to run a notebook in Terra, see Interactive statistics and visualization with Jupyter notebooks, or the Interactive Jupyter notebooks video.

What to expect

The notebook uses FireCloud Service Selector (FISS) to request information on all the submitted jobs associated with the workspace.

The notebook will list all the submission IDs on the screen, and you'll choose which submission to process for cost estimates. The notebook will then use FISS to obtain metadata about that submission, such as how many VMs were used, the number of CPUs used, and the duration for each VM. A cost formula uses this information to calculate the cost of running the workflow. The cost formula is based on GCP's price estimate per resource.
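As a rough illustration of that kind of formula, the sketch below multiplies each VM's resources by its runtime and adds a prorated disk charge. It is not the notebook's actual code: the TaskRun fields, the RATES values, and the flat preemptible discount are placeholders chosen for this example. Real GCP prices vary by region and machine type and would need to be substituted.

```python
from dataclasses import dataclass

# Placeholder rates in USD. These are NOT real GCP prices (assumption for illustration).
RATES = {
    "cpu_per_hour": 0.03,
    "mem_gb_per_hour": 0.004,
    "disk_gb_per_month": 0.04,
}
PREEMPTIBLE_DISCOUNT = 0.8  # assumed fraction taken off CPU and memory cost
HOURS_PER_MONTH = 730


@dataclass
class TaskRun:
    """One VM used by one task (hypothetical shape, not the FISS metadata schema)."""
    name: str
    cpus: int
    mem_gb: float
    disk_gb: float
    hours: float
    preemptible: bool = False


def estimate_task_cost(run: TaskRun) -> float:
    """Estimate the cost of a single VM from its resources and runtime."""
    compute = run.hours * (
        run.cpus * RATES["cpu_per_hour"] + run.mem_gb * RATES["mem_gb_per_hour"]
    )
    if run.preemptible:
        compute *= 1 - PREEMPTIBLE_DISCOUNT
    # Disk is priced per month, so prorate it by the hours the VM was up.
    disk = run.disk_gb * RATES["disk_gb_per_month"] * run.hours / HOURS_PER_MONTH
    return compute + disk


def estimate_workflow_cost(runs):
    """Return (cost per task name, total cost) for a list of TaskRun records."""
    per_task = {}
    for run in runs:
        per_task[run.name] = per_task.get(run.name, 0.0) + estimate_task_cost(run)
    return per_task, sum(per_task.values())
```

Grouping the result by task name is what makes it possible to see which task accounts for most of the total, which a single run-level figure cannot show.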

How accurate are notebook-generated workflow cost estimates?

These cost estimates do not come directly from Google billing. Instead, the notebook calculates a cost estimate based on metadata from Terra. The estimates are (usually) very close to the real cost, though they could be slightly off. Below are descriptions of what the calculations account for, and of how the notebook results differed from Terra's built-in results in a benchmark.

Included GCP costs

Note that estimates will be lower than actual costs, because the notebook cost formula does not account for all possible GCP resources. The table below lists each parameter available in a WDL runtime block (i.e. what type of GCP resource is used for a task) and whether it's included in the cost formula.

WDL Runtime Parameter | Accounted for in Formula?
CPU/GPU* | Yes
Memory | Yes
Preemptibles | Yes
Disk | Yes
Data Egress | No
noAddress (rarely used) | No
cpuPlatform (rarely used) | No
zones (rarely used) | No

* Currently, the cost formula assumes that all instances are of the N1 type (listed here), which is the least expensive type of CPU instance, even if GPUs are being used.

Benchmark: Terra cost report versus notebook cost estimates

The two spend report options were benchmarked by running the Cram-to-Bam workflow N times on different CRAM samples. The notebook cost estimates and built-in cost reporting showed an average difference of $0.02 per sample/run.

Note that when running larger sample sets, the total difference will be larger, as minor per-sample differences accumulate.

CRAM_to_BAM - Sample Number | Terra Cost Report | Notebook Cost Estimator
1 | $0.22 | $0.23
50 | $14.87 | $13.95
100 | $30.39 | $26.59
200 | $55.62 | $53.41
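The per-sample figure can be checked against the table with a few lines of Python. This is only one plausible way to arrive at it (the mean of the absolute per-sample differences across the four rows); the article does not say exactly how the $0.02 was computed, and the variable and function names are made up for this sketch.

```python
# (samples, Terra cost report in USD, notebook estimate in USD), copied from the table
BENCHMARK = [
    (1, 0.22, 0.23),
    (50, 14.87, 13.95),
    (100, 30.39, 26.59),
    (200, 55.62, 53.41),
]


def per_sample_differences(rows):
    """Absolute difference between the two reports, divided by the sample count."""
    return [abs(terra - notebook) / samples for samples, terra, notebook in rows]


def mean(values):
    return sum(values) / len(values)


diffs = per_sample_differences(BENCHMARK)
average = mean(diffs)

for (samples, _, _), diff in zip(BENCHMARK, diffs):
    print(f"{samples:>4} samples: ${diff:.3f} per sample")
print(f"average: ${average:.2f} per sample")
```

Under this reading, the average rounds to $0.02 per sample, in line with the figure quoted above, while the absolute gap grows from about one cent for a single sample to a few dollars for the larger sets.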

 

Screen_Shot_2021-02-16_at_16.57.46.png

 


Comments

7 comments

  • Comment author
    Kamil Slowikowski

    Could I please ask if you might consider sharing a specific example of how to retrieve the data for a specific Terra workspace?

    For example, you included a screenshot of a CSV file. Could you please take screenshots of all of the steps that led to that point?

    1
  • Comment author
    Andrew Davidson

    I happened to stumble across this document today by accident. Any idea why stuff like this is not sent out in the newsletter or 

    Given terra is still in beta knowing a new vignette has been posted or that some part of the documentation has changed is really important.

    The purpose of writing documentation is to reduce support costs. If we do not know about changes we all just loose time and money

    0
  • Comment author
    Andrew Davidson
    • Edited

    How is this related to the 'Workflow Spend Report Configuration:' on https://app.terra.bio/#billing?

    0
  • Comment author
    Geraldine Van der Auwera

    Hi Andrew Davidson, that's absolutely a fair point. We've had this particular item in our blog/newsletter backlog for a while and finally got to it this week; I'm putting the finishing touches to the blog post right now and it should come out tomorrow if all goes well. Going forward we're going to try to report on these kinds of things more consistently as they come out.

    Note that you can get notified when new documentation articles are posted by clicking the blue "Follow" button near the top of each category page (such as this one). Annoyingly you have to do this for each category of the knowledge base, as opposed to just being able to subscribe to the whole thing in one click, which I believe is a limitation of the ZenDesk software, but at least this way you can get notifications about categories of documentation that you care about. 

    Regarding your second question, the 'Workflow Spend Report Configuration' item on https://app.terra.bio/#billing gives you a way to specify which BigQuery dataset to associate with a given billing project from within Terra rather than by going through the Google Cloud console (which is described in the "Grant Terra Billing project access" folding section). This approach is described under the folding section titled "Configure workflow spend report in UX" (which should probably be UI rather than UX, sorry about that typo). 

    I hope this helps, let us know if you have any thoughts on how we could make the docs clearer and more helpful. 

    0
  • Comment author
    Shyamsundar Ravishankar

    Thanks for this article it was very easy to follow and both methods worked great! 

    I wanted to confirm for Option 2 cost estimation, is the currency USD or does it automatically change to the local currency the workspace / billing account is in? Our Workspaces and compute instances are in the Australian region. 

    I would think Option 1's built-in reporting would be AUD since it is getting it directly from the billing account. 

    Thanks,

    Shyam

     

     

    0
  • Comment author
    Andrew Davidson
    • Edited

    I had trouble following the 1-time general instructions. I do not know anything about gcp administration. 

    I had trouble creating a new project. There are instructions on https://support.terra.bio/hc/en-us/articles/360051229072-Accessing-GCP-features-that-are-not-in-the-Terra-UI- 

    I also had trouble with step 1.12. The shares icon was not present. When I tried again the next day it magically appeared. 

    Once I got past the one-time set the setting up the report was easy

     

     

     

    0
  • Comment author
    Brendan Reardon

    We've literally been asking for this for years and I am so happy to see it finally in production. I enabled this last night and it is working smoothly. Thank you! 

    0
