What is going on behind the curtain when you hit "Launch Analysis" on a Terra workflow? This article outlines the components working in the backend to help you understand where and why you might encounter lags (slow runtimes) or failures. This information is especially useful if you're learning to develop WDLs, or troubleshooting and optimizing large-scale batch workflows.
Under the hood, quite a lot is happening when you run a workflow!
What you see in the workspace
Once you've launched a workflow, Terra automatically takes you to the Job History page where you can view the status of your workflow(s) and monitor how the work is progressing. Note that you may need to refresh the browser window to update the status.
When all workflows in a submission are complete, the submission's overall status updates in the Job History page.
Three subsystems - plus the Google Cloud Platform backend - work together to make the magic happen.
- Terra User Interface (UI: what you see when you access Terra in a browser)
- Rawls (internal component that creates submissions and sends their workflows to Cromwell)
- Cromwell (a Workflow Management System geared towards scientific workflows
- Google Cloud Compute Engine
Diagram of how the Terra UI, Rawls, Cromwell and Google Cloud work together.
What's going on in the backend
Various system components kick into gear to ensure that submissions of one or many workflows gets properly assembled and, when that’s done, that each individual task is properly dispatched to the Google Compute Engine for execution.
- Terra's UI sends instructions to the Rawls subsystem to create a submission.
A submission could be one workflow (if you're feeding all the input to a single workflow instance) or more than one (if you're running the same workflow on multiple inputs separately)
- Rawls submits individual launch requests to Cromwell.
- Six Cromwell runners pick up workflows from the queue and interpret their WDL code across all tasks, defining the jobs (specific command lines) that need to be run on the cloud backend.
- Six Cromwell runners submit the jobs to the Google Cloud Life Sciences API (i.e. PAPI), which is responsible for provisioning the Google Cloud Engine virtual machines (VMs) and executing the jobs on those VMs.
- As jobs finish, Cromwell writes the corresponding logs and output files to cloud storage. Once all jobs have finished, the workflow terminates. Rawls detects the termination, updates the workflow status and writes output metadata to the data table.
For a deeper dive into the back end - and a story of how Terra engineers examined and tweaked the underlying structures to improve performance - see the blog post Smarter workflow launching reduces latency and improves user experience.
Steps where workflows may move slowly
Terra's workflow management and execution system was designed to meet the needs of a wide range of audiences. It aims to be:
- Efficient and consistent for large research groups, like All of Us
- Smooth and reliable for smaller research groups
- Low latency and nimble for method developers who are testing and iterating on their tools
Terra accomplishes this by intelligently distributing its computational resources among its active workflow submissions. As a result, your workflows might run more slowly during some steps, and more quickly during others. Read each section below to understand what's happening at each of these bottlenecks, and how they help the overall workflow experience run more smoothly.
Rawls takes the workflow specified in the WDL and asks Cromwell to run it.
Possible sources of lag
- Rawls supports up to 3,000 concurrent workflows per user (across all submissions). If there are many user-submitted jobs, especially from the same billing project, your submission's status will remain "Submitted" while the Cromwell engine works its way through the queue.
Cromwell is responsible for managing the sequence of the tasks (jobs), and asks the Google Pipelines API (PAPI) to launch each task in the workflow when its inputs become available. Cromwell uses a "constellation" of six runner instances - each with a quota of 4,800 jobs per workspace - working together to perform various tasks, distribute load, and provide increased uptime with redundancy.
Possible sources of lag
- If you are using preemptible machines, there could be delay if you are preempted while a preempted machine is restarted. Note that you are not charged for this time.
- Cromwell can handle 28,800 jobs at a time (4,800 for each of six runners). Jobs over and above this hard limit will wait until room frees up to dispatch them.
PAPI starts a virtual machine (VM) for each task, and provides the inputs; the WDL specifies what it should do, the environment to do it in (the Docker image), and requests the outputs when it is done. Each VM's requirements - such as RAM, disk space, memory size, and number of CPUs - can be specified in the task. Once the task is done, PAPI will shut down the VM.
Possible sources of lag
- It can take longer to set up a more complex VM. You won't be charged while the machine is set up, only when it is running.
- It can take time to localize large data files to the VM disk. This time has a GCP charge.
The Docker required for each task will be pulled to the virtual machine along with any inputs from Google buckets. The task's outputs will be saved in the Google bucket of the workspace from which the workflow was launched. If the workflow's inputs and outputs were configured using a data table, links to these outputs will be written back to the table.
Possible sources of lag
- This is likely to be caused by large numbers of workflows, which each have large jobs. Consider the million-job scenario: you submit 1,000 workflows, each of which launches 1,000 jobs. Because 1,000 workflows easily fit within the Rawls quota of 3,000 concurrent workflows, all of the workflows immediately start running in Cromwell, which proceeds to launch 6 x 4,800 = 28,800 jobs across the six runners. That job capacity is enough for 29 of the 1,000 workflows to make progress, while the remaining 971 workflows simply sit there waiting for their jobs to run.
Monitoring individual workflows in a submission
How do you know if something is wrong with one of these 1,000 workflows?
You can see the breakdown of each workflow's status in your workspace's Job History tab by clicking on an individual submission. If Cromwell determines that a workflow in the submission will not be able to start due to job capacity limitations, it holds the workflow in a separate queue with a clear “Queued in Cromwell” status indicator until capacity becomes available.
For more information on how to monitor the status of a workflow submission, see Job History overview (monitoring workflows).
Glossary of useful terms
If you're new to bioinformatics, and especially if you're new to cloud computing, you may find lots of unfamiliar terms in this article. Understanding the ones below is useful when discussing what Terra is doing behind-the-scenes. Click to expand the definitions.
See a more comprehensive list at Glossary of terms related to cloud-based genomics.
For more information on the architecture used to run workflows, see Terra architecture and where your files live in it.
- Backend service that processes workflows and sends their jobs (tasks) to the cloud.
A lightweight, standalone, executable package of software that includes everything needed to run an application on a virtual machine. A Docker container can specify the operating system, runtime configuration, and all necessary dependencies needed for a given application. Packaging these components together is called “containerizing” them, and it’s an effective way of making VM configurations shareable and making the analyses that depend on keeping these configurations consistent reproducible.
A JSON is an open-standard file and data interchange format that uses human-readable text to store and transmit data objects - including attribute–value pairs and arrays.
- Also known as a task. Technically, a task is the element of a WDL that defines a command-line script plus a Docker image that will execute on the cloud backend. Once launched, a job is an instance of a task running on a VM. In practice, the terms are used interchangeably.
- Google Cloud Platform APIs are how Terra communicates with the Google Cloud execution engine (i.e. VMs).
- A collection of one or more workflows that start simultaneously. Selecting the “Run Analysis” button in Terra launches a new submission.
- The user-facing platform comprising a user interface, backend services, and API.
- For the purposes of this document, Rawls is the backend service that creates submissions and sends their workflows to Cromwell.
- A virtual machine (aka VM) is a virtual construct that is functionally equivalent to a computer - complete with processing power and storage capacity - whose technical specifications are determined by what a user requests, rather than by the hardware where the computation and storage actually take place. This is actually what makes cloud computing so flexible - when you create a virtual machine it's just like setting up a new computer, but the power and configuration is determined by whatever you choose when you're creating that machine, and you can create, delete, modify, and replace these virtual machines on-demand.
- A collection of one or more tasks defined by a WDL file, specifying the order, inputs, and outputs.
WDL is a community-driven programming language stewarded by the community at openWDLorg. It's designed for describing data-intensive computational workflows, and is accessible for scientists without deep programming expertise. Similarly to CWL, portability is a key factor in its design. Compared to CWL, WDL is designed to be more human-readable, whereas CWL is primarily optimized for machine-readability.
Optimizing for diverse use-cases
Terra uses several methods to run workflows efficiently and reliably for both large and small research groups. Read on for more in-depth information on these methods.
To improve efficiency, Rawls submits requests to Cromwell (step 2) in batches of 50, rather than one at a time.
Cromwell's runners operate in parallel. This dramatically increases the number of jobs that can run concurrently, allowing Terra to submit up to 28,800 jobs on your behalf. This especially benefits workflows that generate lots of concurrent jobs (for example, WDLs with a lot of scatter parallelism).
To balance the needs of researchers and groups with different job sizes and platform priorities, Cromwell uses a priority algorithm:
- Identify all projects with workflows to run.
- Sort projects by the lowest number of currently-running workflows.
- Start one workflow from that group.
- When groups are tied, select the group that started a workflow first.
- Wait for the next clock tick.
- Repeat this process.
Small users benefit from this prioritization, since they will no longer have to wait in line for a large genome center to complete thousands of workflows before starting theirs. As a result, these users can iterate on their workflows quickly.
And large users will rarely notice, because the small users are small and therefore don't significantly delay their jobs. It's also an elegant way to handle simultaneous large submissions from multiple large users.