What is going on behind the curtain when you hit "Launch Analysis" in the Terra UI? This article outlines the components working in the backend - such as what Cromwell does versus what the Google Pipelines API does - to help you understand where and why you might see lag (slow runtimes) or failures. This information is especially useful if you're learning to develop WDLs, or troubleshooting or optimizing large-scale batch workflow analyses on Terra.
For a deeper dive into the backend - and a story of how Terra engineers examined and tweaked the underlying structures to improve performance - see the blog post Smarter workflow launching reduces latency and improves user experience.
Under the hood, quite a lot is happening when you specify input data and a workflow to process it, hit the "Run analysis" button, and then hit "Launch" in the confirmation pane!
What you see in the workspace
Terra automatically takes you to the Job History page where you can view the status of your workflow(s) and monitor how the work is progressing (note that you may need to refresh the browser window to update the status).
When all workflows in a submission are complete, Rawls updates the submission's overall status in the Job History page.
Three subsystems plus the Google Cloud Platform backend work together to make the magic happen.
- Terra User Interface (UI: what you see when you access Terra in a browser)
- Rawls (internal component that creates submissions and sends their workflows to Cromwell)
- Cromwell (a Workflow Management System geared towards scientific workflows)
- Google Cloud Compute Engine
Diagram of how the Terra UI, Rawls, Cromwell and Google Cloud work together.
What's going on in the backend
Various system components kick into gear to ensure that submissions of one or many workflows get properly assembled and, when that’s done, that each individual task is properly dispatched to the Google Compute Engine for execution.
- The UI sends instructions to the Rawls subsystem to create a submission.
A submission could be one workflow (if you're feeding all the input to a single workflow instance) or more than one (if you're running the same workflow on multiple inputs separately)
- Rawls submits individual launch requests to Cromwell.
- Six Cromwell runners pick up workflows from the queue and interpret their WDL code across all tasks, defining the jobs (specific command lines) that need to be run on the cloud backend.
- The six Cromwell runners submit the jobs to the Google Cloud Life Sciences API (i.e. PAPI), which is responsible for provisioning the Google Compute Engine virtual machines (VMs) and executing the jobs on those VMs.
- As jobs finish, Cromwell writes the corresponding logs and output files to cloud storage. Once all jobs have finished, the workflow terminates, Rawls detects the termination, updates the workflow status and writes output metadata to the data table.
Optimizing for diverse use-cases
Terra's workflow management and execution system was designed to meet the needs of a wide range of audiences. It aims to be:
- Efficient and consistent for large programs like All of Us
- Smooth and reliable for smaller research groups
- Low latency and nimble for method developers testing and iterating
Reducing delay by batching requests
To improve efficiency, Rawls submits requests to Cromwell (step 2) in batches of 50, rather than one at a time.
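As a rough illustration of the batching idea, the sketch below splits a list of workflow IDs into groups of 50. The function name `batches` and the `wf-…` identifiers are hypothetical and are not part of Rawls; only the batch size of 50 comes from this article.

```python
from typing import Iterator, List, Sequence

BATCH_SIZE = 50  # batch size stated in this article


def batches(workflow_ids: Sequence[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive groups of at most `size` workflow IDs."""
    for start in range(0, len(workflow_ids), size):
        yield list(workflow_ids[start:start + size])


# Hypothetical submission of 120 workflows.
workflow_ids = [f"wf-{i}" for i in range(120)]
batch_sizes = [len(batch) for batch in batches(workflow_ids)]
print(batch_sizes)  # [50, 50, 20]
```

With batching, a 120-workflow submission needs three requests rather than 120.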
Improving performance for submissions with parallelism
Terra can submit a maximum of 28,800 jobs on your behalf, if you have workflows assigned to all six Cromwell runners. To distribute the demand evenly across all six runners, Cromwell runners pick up one workflow at a time. This dramatically increases the number of concurrent jobs (for workflows that generate many jobs), given the per runner cap.
This especially benefits workflows that generate lots of concurrent jobs (for example, WDLs with a lot of scatter parallelism).
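To see why spreading workflows across runners matters, here is a simplified model that assumes each runner's 4,800-job cap applies to the combined jobs of the workflows it holds. The function `concurrent_jobs` and the example workload are illustrative assumptions, not Cromwell code.

```python
from typing import List

RUNNERS = 6
JOBS_PER_RUNNER = 4_800  # per-runner cap stated in this article


def concurrent_jobs(assignment: List[int], jobs_per_workflow: int) -> int:
    """Total jobs that can run at once, given the runner index of each workflow."""
    demand = [0] * RUNNERS
    for runner in assignment:
        demand[runner] += jobs_per_workflow
    # Each runner can run at most JOBS_PER_RUNNER jobs, however many are requested.
    return sum(min(d, JOBS_PER_RUNNER) for d in demand)


# Hypothetical workload: six workflows that each scatter into 10,000 jobs.
workflows = 6
jobs_each = 10_000

all_on_one_runner = [0] * workflows
one_per_runner = [i % RUNNERS for i in range(workflows)]

print(concurrent_jobs(all_on_one_runner, jobs_each))  # 4800
print(concurrent_jobs(one_per_runner, jobs_each))     # 28800
```

In this model, the same six workflows run six times as many jobs concurrently when each lands on a different runner.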
A prioritization algorithm for all job sizes
To balance the needs of researchers and groups with different job sizes and platform priorities (e.g. to let small users iterate quickly while maintaining hands-off reliability for large users), Cromwell uses a priority algorithm.
- Identify all projects with workflows waiting to run
- Sort the projects by number of currently running workflows, lowest first
- Start one workflow from the project at the top of the list; when projects are tied, select the one that started a workflow least recently
- Wait for the next clock tick
- Return to the first step
Small users benefit, since they will no longer have to wait in line for the genome center's thousands of workflows to complete before starting.
And large users will rarely notice, because the small users are small. It's also an elegant way to bring workflow counts of multiple large users launching large submissions at the same time into equilibrium.
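The prioritization loop described above can be sketched as a small simulation. This is a minimal model and not Cromwell's implementation: the `Project` class, its fields, and the project names are hypothetical, and running workflows never finish here, to keep the example short.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Project:
    name: str
    queued: int                # workflows waiting to start
    running: int = 0           # workflows currently running
    last_start_tick: int = -1  # tick of the most recent start (-1 = never)


def pick_next(projects: List[Project]) -> Optional[Project]:
    """Pick the project with the fewest running workflows.

    Ties go to the project that started a workflow least recently.
    """
    candidates = [p for p in projects if p.queued > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda p: (p.running, p.last_start_tick))


def simulate(projects: List[Project], ticks: int) -> List[str]:
    """Start one workflow per clock tick and return the order of starts."""
    order = []
    for tick in range(ticks):
        chosen = pick_next(projects)
        if chosen is None:
            break
        chosen.queued -= 1
        chosen.running += 1
        chosen.last_start_tick = tick
        order.append(chosen.name)
    return order


projects = [
    Project("genome-center", queued=1000, running=3),
    Project("small-lab", queued=2),
]
start_order = simulate(projects, ticks=6)
print(start_order)
# ['small-lab', 'small-lab', 'genome-center', 'genome-center',
#  'genome-center', 'genome-center']
```

The small project starts both of its workflows right away instead of waiting behind the large project's queue, and the large project loses only two ticks.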
Where workflows get stalled
There are several points in this process that are subject to slowdowns, including quota limits. Read each section below to understand what's happening - and expected bottlenecks - at each stage.
Terra -> Cromwell (status: submitted in Cromwell)
Rawls takes the workflow specified in the WDL and asks Cromwell to run it.
Possible sources of lag
Rawls supports up to 3,000 concurrent workflows per user (across all submissions). If there are many user-submitted jobs, especially from the same billing project, your submission will remain in "Submitted" as the Cromwell engine works its way through the queue.
Cromwell -> Google PAPI (status: submitted)
Cromwell is responsible for managing the sequence of the tasks/jobs, and asks the Google Pipelines API (PAPI) to launch each task in the workflow when the inputs become available. Cromwell uses a "constellation" of six runner instances - each with a quota of 4,800 jobs per workspace - working together to perform various tasks, distribute load, and provide increased uptime with redundancy.
Possible sources of lag
- If you are using preemptible machines, there could be a delay while a preempted machine is restarted. Note that you are not charged for this time.
- Cromwell can handle 28,800 jobs at a time (4,800 for each of six runners). Jobs over and above this hard limit will wait until room frees up.
Google Pipelines API (PAPI) sets up VM (status: submitted)
PAPI starts one virtual machine (VM) per task and provides the inputs. The WDL specifies what the task should do, the environment to do it in (the Docker image), and which outputs to collect when it is done. Each VM’s requirements (memory, disk space, number of CPUs) can be specified in the task. Once the task is done, PAPI will shut down the VM.
Possible sources of lag
- It can take a longer time to set up a more complex machine. You won't be charged as the machine is set up, only when it is running.
- It can take time to localize large data files to the VM disk. This time has a GCP charge.
Submit and execute WDL on GCP (status: running)
The Docker image required for each task will be pulled to the virtual machine along with any inputs from Google buckets. When the output is produced, it will be put in the Google bucket of the workspace where the analysis was launched. Links to the outputs will be written back to the workspace data table.
Scenario: Large numbers of workflows, each with large jobs
Consider the million-job scenario: you submit 1,000 workflows, each of which launches 1,000 jobs. Because 1,000 workflows easily fit within the Rawls quota of 3,000 concurrent workflows, all of the workflows immediately start running in Cromwell, which proceeds to launch 6 x 4,800 = 28,800 jobs across the six runners.
What could cause this to run slow?
That job capacity is enough for 29 of the 1,000 workflows to make progress, while the remaining 971 non-progressing workflows simply sit there waiting for their jobs to run. However, all 1,000 workflows are marked “Running” in the Terra UI, so you have no way to know what to expect and may quickly start worrying that something is wrong.
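The figures in this scenario follow from a short calculation, shown here as a sketch. The variable names are illustrative; the limits are the ones quoted in this article.

```python
import math

RUNNERS = 6
JOBS_PER_RUNNER = 4_800

workflows = 1_000
jobs_per_workflow = 1_000

# Total jobs Cromwell can run at once across all six runners.
job_capacity = RUNNERS * JOBS_PER_RUNNER

# Workflows with at least one job running: 28 get all 1,000 of their jobs,
# and a 29th gets the remaining 800 slots.
progressing = min(workflows, math.ceil(job_capacity / jobs_per_workflow))
waiting = workflows - progressing

print(job_capacity, progressing, waiting)  # 28800 29 971
```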
How will you know?
If Cromwell determines that a workflow will not be able to start due to job capacity limitations, it holds the workflow in a separate queue with a clear “Queued in Cromwell” status indicator until capacity becomes available.
Terra UI and corresponding backend functions
Here's another diagram of the relationship between what you see in Terra and what is happening behind the scenes.
Glossary of useful terms
If you're new to bioinformatics, and especially if you're new to cloud computing, you may find lots of unfamiliar terms. Understanding the ones below in particular is useful when discussing what Terra is doing behind-the-scenes.
See a more comprehensive list at Glossary of terms related to cloud-based genomics.
Read (or re-read) Terra architecture and where your files live in it.
- Cromwell - Backend service that processes workflows and sends their jobs (tasks) to the cloud.
- Docker container - A lightweight, standalone, executable package of software that includes everything needed to run an application on a virtual machine. A Docker container can specify the operating system, runtime configuration, and all necessary dependencies needed for a given application. Packaging these components together is called “containerizing” them, and it’s an effective way of making VM configurations shareable and making analyses that depend on consistent configurations reproducible.
- JSON - An open-standard file and data interchange format that uses human-readable text to store and transmit data objects, including attribute–value pairs and arrays.
- Job - Also known as a task or VM. Technically, a task is the element of a WDL that defines a command-line script plus a Docker image that will execute on the cloud backend. Once launched, a job is an instance of a task running on a VM. In practice, the terms are used interchangeably.
- Google Cloud Platform APIs are how Terra communicates with the Google Cloud execution engine (i.e. VMs).
- Submission - A collection of one or more workflows that start simultaneously. Selecting the “Run Analysis” button in Terra launches a new submission.
- Terra - The user-facing platform comprising a user interface, backend services, and API.
- Rawls - For purposes of this document, the backend service that creates submissions and sends their workflows to Cromwell.
- Virtual machine (VM) - A virtual construct that is functionally equivalent to a computer - complete with processing power and storage capacity - whose technical specifications are determined by what a user requests, rather than by the hardware where the computation and storage actually take place. This is what makes cloud computing so flexible: creating a virtual machine is just like setting up a new computer, but the power and configuration are determined by whatever you choose when you create it, and you can create, delete, modify, and replace these virtual machines on demand.
- Workflow - A collection of one or more tasks defined by a WDL file, specifying the order, inputs, and outputs.
- WDL - Workflow Description Language is a community-driven programming language stewarded by the OpenWDL community (openwdl.org). It's designed for describing data-intensive computational workflows, with a focus on accessibility for scientists without deep programming expertise. As with CWL, portability is a key factor in its design; what differentiates WDL from CWL is that WDL is designed to be more human-readable, whereas CWL is primarily optimized for being machine-readable.