Cost estimating notebook (deprecated)

Allie Cliffe

If you need granular (task-level) cost reporting, or the cost of a workflow that is still running, you can estimate it with a Workflow Cost Estimator notebook. Follow the instructions below to find the Python code for the notebook. Note that this notebook is currently broken and not supported. It is offered only as an example of a possible alternative approach for those comfortable with coding.

Cost-estimate notebook overview

Notebooks allow cost reporting for workflows in progress

You can run the cost-estimating notebook at any point during workflow execution, and it will list how much the workflow has cost so far. You can use this as an early indicator of whether you may exceed your budget.

Notebooks give more granular cost details

Unlike Terra's cost report, which only provides the total job cost, the notebook estimates task-level and instance-level costs (i.e. the cost of running a particular task within a workflow). This is useful when you want to know what task(s) are responsible for most of the cost.

The notebook gives an estimate, while built-in reporting gives the final cost. However, the estimate is usually fairly accurate (see below).

Some additional guidance

Originally created for BioData Catalyst Powered by Terra, the notebook is archived on GitHub at https://github.com/DataBiosphere/featured-notebooks/tree/master/attic. You will need the Workflow Cost Estimator notebook.

The archived version includes the Python code, which you can run in a notebook in Terra. Note that this notebook is deprecated: it is broken and no longer supported. It is intended only as an example of a notebook approach.

What to expect

The notebook uses FireCloud Service Selector (FISS) to request information on all the submitted jobs associated with the workspace.

The notebook will list all the submission IDs on the screen, and you'll choose which submission to process for cost estimates. The notebook will then use FISS to obtain metadata about that submission, such as how many VMs were used, the number of CPUs used, and the duration for each VM. A cost formula uses this information to calculate the cost of running the workflow. The cost formula is based on GCP's price estimate per resource.
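The kind of cost formula described above can be sketched as follows. This is a minimal illustration, not the notebook's actual code: the `VmUsage` fields, the task names, and every rate in `RATES` are hypothetical placeholders (not GCP's published prices), and the 730-hour month is a common approximation for converting monthly disk prices to hourly ones.

```python
from collections import defaultdict
from dataclasses import dataclass

# HYPOTHETICAL hourly rates in USD. These are placeholders, not GCP's
# published prices; substitute current rates for your region.
RATES = {
    "cpu_hour": {"standard": 0.03, "preemptible": 0.007},
    "mem_gb_hour": {"standard": 0.004, "preemptible": 0.001},
    "disk_gb_month": 0.04,
}
HOURS_PER_MONTH = 730  # approximation used to turn a monthly disk price into an hourly one


@dataclass
class VmUsage:
    """Resources used by one VM, as they might be read from submission metadata."""
    task: str
    cpus: int
    memory_gb: float
    disk_gb: float
    hours: float
    preemptible: bool = False


def estimate_vm_cost(vm, rates=RATES):
    """Estimate the cost of one VM from its CPU, memory, disk, and run time."""
    tier = "preemptible" if vm.preemptible else "standard"
    cpu = vm.cpus * rates["cpu_hour"][tier] * vm.hours
    memory = vm.memory_gb * rates["mem_gb_hour"][tier] * vm.hours
    disk = vm.disk_gb * rates["disk_gb_month"] / HOURS_PER_MONTH * vm.hours
    return cpu + memory + disk


def estimate_cost_by_task(vms):
    """Sum the per-VM estimates for each task name."""
    totals = defaultdict(float)
    for vm in vms:
        totals[vm.task] += estimate_vm_cost(vm)
    return dict(totals)


# Hypothetical usage records for two tasks in one workflow.
vms = [
    VmUsage("CramToBam", cpus=2, memory_gb=8, disk_gb=100, hours=2.0),
    VmUsage("ValidateBam", cpus=1, memory_gb=4, disk_gb=50, hours=0.5, preemptible=True),
]
by_task = estimate_cost_by_task(vms)

# Print the most expensive tasks first, then the workflow total.
for task, cost in sorted(by_task.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{task}: ${cost:.4f}")
print(f"Total: ${sum(by_task.values()):.4f}")
```

Grouping by task name is what makes the estimate task-level: the same per-VM formula is applied to every VM, and the totals show which tasks account for most of the cost.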

How accurate are notebook-generated workflow cost estimates?

These cost estimates do not come directly from Google billing. Instead, the notebook calculates a cost estimate based on metadata from Terra. The estimates are usually very close to the real cost, though they could be slightly off. Below are descriptions of what is accounted for in the calculations, and of the difference between the notebook results and Terra's built-in results in a benchmark.

Included GCP costs

Note that estimates tend to be lower than actual costs, because the notebook cost formula does not account for all possible GCP resources. The table below lists each parameter available in a WDL runtime block (i.e. what type of GCP resource is used for a task) and whether it is included in the cost formula.

WDL Runtime Parameter        Accounted for in Formula?
CPU/GPU*                     Yes
Memory                       Yes
Preemptibles                 Yes
Disk                         Yes
Data Egress                  No
noAddress (rarely used)      No
cpuPlatform (rarely used)    No
zones (rarely used)          No


* Currently, the cost formula assumes all instances are of type N1, which uses the least expensive type of CPU instance, even if GPUs are being used in the workflow execution.

Benchmark: Terra cost report versus notebook cost estimates

The two spend report options were benchmarked by running the Cram-to-Bam workflow N times on different CRAM samples. The notebook cost estimates and built-in cost reporting showed an average difference of $0.02 per sample/run.

Note that when running larger sample sets, there will be larger total differences as minor per-sample differences accumulate.

CRAM_to_BAM Sample Number    Terra Cost Report    Notebook Cost Estimator
1                            $0.22                $0.23
50                           $14.87               $13.95
100                          $30.39               $26.59
200                          $55.62               $53.41
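As a quick check of the per-sample figure, the arithmetic can be reproduced from the four benchmark rows. This sketch assumes the average is the mean of the per-sample gaps across the rows; the article does not state how the figure was actually derived.

```python
# (samples, Terra cost report in USD, notebook estimate in USD), copied from the benchmark rows
benchmark = [
    (1, 0.22, 0.23),
    (50, 14.87, 13.95),
    (100, 30.39, 26.59),
    (200, 55.62, 53.41),
]


def per_sample_difference(samples, terra_cost, notebook_cost):
    """Absolute gap between the two reports, divided by the number of samples."""
    return abs(terra_cost - notebook_cost) / samples


diffs = [per_sample_difference(*row) for row in benchmark]
mean_diff = sum(diffs) / len(diffs)

for (samples, terra, notebook), diff in zip(benchmark, diffs):
    total_gap = abs(terra - notebook)
    print(f"{samples:>3} samples: total gap ${total_gap:.2f}, per sample ${diff:.3f}")
print(f"Mean per-sample difference: ${mean_diff:.2f}")
```

Under that assumption the mean rounds to $0.02, matching the figure quoted above, while the total gap grows from one cent for a single sample to a few dollars for the larger runs.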

Screenshot comparing the Terra built-in cost report (green) with the cost-estimating notebook values (orange). The two costs are nearly equal when processing a single sample with the cram-to-bam workflow. For 200 samples, the built-in cost report is about $58 and the cost-estimating notebook value is about $56.

 
