How much did a workflow cost?
Executing jobs on the cloud can be frightening when you don't know how much you're spending. Here we'll go over ways to view the cost of an executed workflow, to give you peace of mind that you're not exceeding your projected budget.
This article will review two methods by which the cost of a workflow can be retrieved. One is through Terra's built-in cost report and the other is a Jupyter Notebook. Before going into the details of each it's important to know a few things:
- For the time being, Terra's cost report is only available for workspaces under a Broad Institute billing account (not available to all users). The Notebook estimate tool is available to all users.
- Results from the cost report aren't available until some hours after the workflow has completed. It's best to allow a day to pass before checking the cost.
- A large advantage of the Notebook is that you can run it at any point during workflow execution, and it will list how much the workflow has cost so far. This gives you an early indicator of whether you've gone past your goal while the workflow is still running. Even if you have access to Terra's cost report, this alone makes the notebook worth using as well.
- Another advantage of the notebook is the detail of its results. Unlike Terra's cost report, the notebook provides task-level and instance-level costs, so it can give you the cost of running a particular task within a workflow. This is useful when you want to know which part of a workflow accounts for the majority of its cost.
Contents
Terra's built-in cost reporting
- How accurate are the results?
Calculate the cost estimate via Jupyter notebooks
- How to use the Workflow Cost Estimator notebook?
- Import the notebook to your workspace
- Run the notebook
- How does the Notebook work?
- How accurate are the results?
Terra's built-in cost reporting (for Broad Institute workspaces)
Terra includes features to make cloud computing cost transparent. One way it does this is by recording the cost for each executed workflow and making it visible in the Job History. These cost estimates are gathered by accessing Google's Billing Repository and obtaining the costs associated with a submitted job.
1. To view the cost for a workflow, navigate to the Job History tab of your workspace.
2. This tab contains all the executed workflow submissions in the workspace. Click on one of the rows under the column "Submission" to view further details about a specific submission.
3. This page contains a detailed view of the job submission. At the top right corner will be the "Total Run Cost" which is the total cost of the job submission. If your submission contains more than one execution then the cost of each execution will be listed in the table below under the column "Run Cost".
What if you don't have access to the cost report? We'll show you another method to approximate the cost in the next section.
How accurate are the results?
Calculate the cost estimate via Jupyter Notebooks
Another way to obtain the cost of running your workflow is by using the Workflow Cost Estimator notebook created by the BioData Catalyst team. This Python notebook is stored in their Git repository DataBiosphere/bdcat_notebooks as a Python script. However, the easiest way to use the notebook is to obtain it from their biodata-catalyst/BioData Catalyst Collection workspace, as described below.
How to use the Workflow Cost Estimator notebook
- Navigate to the biodata-catalyst/BioData Catalyst Collection workspace, and head to the Notebooks tab.
- In the Notebooks tab, click on the three-dot icon to the left of the Workflow Cost Estimator notebook, then click on "Copy to another workspace". A small window will appear.
- In the small window, import the notebook to your workspace by entering your workspace name in the Destination field, then click "Copy".
- The Workflow Cost Estimator notebook will now be in the Notebooks tab of your workspace. Run the notebook on a default (Python) Terra notebook environment.
- Once the notebook is open (in either "edit" or "playground" mode), run through each cell in the notebook. The notebook itself describes what each cell does and how to use the notebook.
How does the Notebook work?
How accurate are the results?
What's being recorded?
The notebook cost formula does not account for all possible resources offered by GCP. The table below lists the parameters available in a WDL runtime block (which decides what type of GCP resource is used for a task) and whether each is included in the cost formula. Because not all of the parameters are accounted for in the formula, expect the notebook's estimate to be lower, on average, than the actual cost.
WDL Runtime Parameters | Accounted for in Formula |
CPU/GPU* | Yes |
Memory | Yes |
Preemptables | Yes |
Disk | Yes |
Data Egress | No |
noAddress (rarely used) | No |
cpuPlatform (rarely used) | No |
zones (rarely used) | No |
*Currently, the cost formula assumes all instances are of the N1 machine type, which uses the least expensive type of CPU instance, even if GPUs are being used.
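To make the idea of a cost formula concrete, here is a minimal sketch of how a per-task estimate could combine the four accounted-for parameters (CPU, memory, preemptible status, and disk). This is not the notebook's actual code: the function name `estimate_task_cost`, the `Rates` class, and every rate below are hypothetical placeholders chosen for illustration, not real GCP prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical unit prices in USD (placeholders, NOT real GCP prices)."""
    cpu_hour: float       # per vCPU per hour
    mem_gb_hour: float    # per GB of memory per hour
    disk_gb_month: float  # per GB of persistent disk per month


# Assumed example rates; look up current pricing before relying on any numbers.
ON_DEMAND = Rates(cpu_hour=0.03, mem_gb_hour=0.004, disk_gb_month=0.04)
PREEMPTIBLE = Rates(cpu_hour=0.007, mem_gb_hour=0.001, disk_gb_month=0.04)

HOURS_PER_MONTH = 730.0  # common approximation for converting monthly disk rates


def estimate_task_cost(cpus, memory_gb, disk_gb, hours, preemptible=False):
    """Estimate the cost of one task from its runtime attributes and duration.

    Egress, noAddress, cpuPlatform, and zones are deliberately ignored,
    mirroring the parameters marked "No" in the table above.
    """
    rates = PREEMPTIBLE if preemptible else ON_DEMAND
    compute = (cpus * rates.cpu_hour + memory_gb * rates.mem_gb_hour) * hours
    disk = disk_gb * rates.disk_gb_month * hours / HOURS_PER_MONTH
    return compute + disk


# Example: a 4-CPU, 16 GB task with a 100 GB disk that ran for 2 hours.
on_demand_cost = estimate_task_cost(4, 16, 100, 2)
preemptible_cost = estimate_task_cost(4, 16, 100, 2, preemptible=True)
print(f"on-demand: ${on_demand_cost:.4f}, preemptible: ${preemptible_cost:.4f}")
```

Because anything outside these four inputs contributes nothing to the result, a formula of this shape can only undercount relative to the real bill, which is consistent with the lower estimates described above.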
Benchmark comparison of Terra's cost report and the BioData Catalyst team's notebook-based cost estimator
The notebook cost estimates and Terra's cost report were compared in a benchmark. The benchmark involved running the Cram-to-Bam workflow on sample sets of different sizes. The comparison showed an average difference of $0.02 per sample/run between Terra's cost report and the Notebook. This minor difference accumulates when running larger sample sets, producing a larger total cost difference.
CRAM_to_BAM - Sample Number | Terra Cost Report | Notebook Cost Estimator |
1 | $0.22 | $0.23 |
50 | $14.87 | $13.95 |
100 | $30.39 | $26.59 |
200 | $55.62 | $53.41 |
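The per-sample difference can be recomputed from the table above. The sketch below assumes the difference is taken as the absolute gap between the two totals divided by the number of samples, then averaged across the four rows; the article does not state how its figure was derived, so treat this as one plausible reading.

```python
# Rows from the benchmark table: (samples, Terra cost report, notebook estimate).
benchmark = [
    (1, 0.22, 0.23),
    (50, 14.87, 13.95),
    (100, 30.39, 26.59),
    (200, 55.62, 53.41),
]


def per_sample_difference(samples, terra_cost, notebook_cost):
    """Absolute gap between the two totals, spread evenly over the samples."""
    return abs(terra_cost - notebook_cost) / samples


diffs = [per_sample_difference(*row) for row in benchmark]
mean_diff = sum(diffs) / len(diffs)

for (samples, terra, notebook), diff in zip(benchmark, diffs):
    total_gap = abs(terra - notebook)
    print(f"{samples:>4} samples: total gap ${total_gap:.2f}, per-sample gap ${diff:.3f}")
print(f"mean per-sample gap: ${mean_diff:.3f}")
```

Under this reading the mean rounds to the $0.02 quoted above, while the total gap grows to a few dollars for the larger sample sets.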
Comments
1 comment
Could I please ask if you might consider sharing a specific example of how to retrieve the data for a specific Terra workspace?
For example, you included a screenshot of a CSV file. Could you please take screenshots of all of the steps that led to that point?