Overview: How the workflow system works

Allie Cliffe
  • Updated

What is going on behind the curtain when you hit "Launch Analysis" on a Terra workflow? This article outlines the components working in the backend to help you understand where and why you might encounter lags (slow runtimes) or failures. This information is especially useful if you're learning to develop WDLs, or troubleshooting and optimizing large-scale batch workflows.

Back-end overview

Under the hood, quite a lot is happening when you run a workflow! 

What you see in the workspace

Once you've launched a workflow, Terra automatically takes you to the Job History page where you can view the status of your workflow(s) and monitor how the work is progressing. Note that you may need to refresh the browser window to update the status.

When all workflows in a submission are complete, the submission's overall status updates in the Job History page. 

Terra subsystems

Three subsystems - plus the Google Cloud Platform backend - work together to make the magic happen. 

  • Terra User Interface (UI: what you see when you access Terra in a browser)
  • Rawls (internal component that creates submissions and sends their workflows to Cromwell)
  • Cromwell (a Workflow Management System geared towards scientific workflows
  • Google Cloud Compute Engine

Diagram of how the Terra UI, Rawls, Cromwell and Google Cloud work together to run a workflow. A researcher launches a workflow from the Terra User Interface. Then, the Rawls sub-system creates a workflow submission, which it sends to the Cromwell execution engine. Cromwell creates individual jobs to run the workflow's commands, which it dispatches job by job to the Google Cloud Compute Engine.
Diagram of how the Terra UI, Rawls, Cromwell and Google Cloud work together.

What's going on in the backend

Various system components kick into gear to ensure that submissions of one or many workflows gets properly assembled and, when that’s done, that each individual task is properly dispatched to the Google Compute Engine for execution.

  1. Terra's UI sends instructions to the Rawls subsystem to create a submission.
    A submission could be one workflow (if you're feeding all the input to a single workflow instance) or more than one (if you're running the same workflow on multiple inputs separately)
  2. Rawls submits individual launch requests to Cromwell.
  3. Six Cromwell runners pick up workflows from the queue and interpret their WDL code across all tasks, defining the jobs (specific command lines) that need to be run on the cloud backend. 
    Diagram of how the Cromwell executation engine creates jobs for the Google Cloud Compute Engine. Cromwell generates a list of jobs (one per command in the workflow). These jobs are sent to the Cloud Compute Engine via six runners.
  4. Six Cromwell runners submit the jobs to the Google Cloud Life Sciences API (i.e. PAPI), which is responsible for provisioning the Google Cloud Engine virtual machines (VMs) and executing the jobs on those VMs. 
  5. As jobs finish, Cromwell writes the corresponding logs and output files to cloud storage. Once all jobs have finished, the workflow terminates. Rawls detects the termination, updates the workflow status and writes output metadata to the data table.

For a deeper dive into the back end - and a story of how Terra engineers examined and tweaked the underlying structures to improve performance - see the blog post Smarter workflow launching reduces latency and improves user experience.

Steps where workflows may move slowly

Terra's workflow management and execution system was designed to meet the needs of a wide range of audiences. It aims to be:

  • Efficient and consistent for large research groups, like All of Us
  • Smooth and reliable for smaller research groups
  • Low latency and nimble for method developers who are testing and iterating on their tools

Terra accomplishes this by intelligently distributing its computational resources among its active workflow submissions. As a result, your workflows might run more slowly during some steps, and more quickly during others. Read each section below to understand what's happening at each of these bottlenecks, and how they help the overall workflow experience run more smoothly.

  • What's happening

    Rawls takes the workflow specified in the WDL and asks Cromwell to run it

    Possible sources of lag

    • Rawls supports up to 3,000 concurrent workflows per user (across all submissions). If there are many user-submitted jobs, especially from the same billing project, your submission's status will remain "Submitted" while the Cromwell engine works its way through the queue.
  • What's happening

    Cromwell is responsible for managing the sequence of the tasks (jobs), and asks the Google Pipelines API (PAPI) to launch each task in the workflow when its inputs become available. Cromwell uses a "constellation" of six runner instances - each with a quota of 4,800 jobs per workspace - working together to perform various tasks, distribute load, and provide increased uptime with redundancy.

    Possible sources of lag

    • If you are using preemptible machines, there could be delay if you are preempted while a preempted machine is restarted. Note that you are not charged for this time
    • Cromwell can handle 28,800 jobs at a time (4,800 for each of six runners). Jobs over and above this hard limit will wait until room frees up to dispatch them. 
  • What's happening

    PAPI starts a virtual machine (VM) for each task, and provides the inputs; the WDL specifies what it should do, the environment to do it in (the Docker image), and requests the outputs when it is done. Each VM's requirements - such as RAM, disk space, memory size, and number of CPUs - can be specified in the task. Once the task is done, PAPI will shut down the VM.

    Possible sources of lag

    • It can take longer to set up a more complex VM. You won't be charged while the machine is set up, only when it is running.
    • It can take time to localize large data files to the VM disk. This time has a GCP charge.
  • What's happening

    The Docker required for each task will be pulled to the virtual machine along with any inputs from Google buckets. The task's outputs will be saved in the Google bucket of the workspace from which the workflow was launched. If the workflow's inputs and outputs were configured using a data table, links to these outputs will be written back to the table.

    Possible sources of lag

    • This is likely to be caused by large numbers of workflows, which each have large jobs. Consider the million-job scenario: you submit 1,000 workflows, each of which launches 1,000 jobs. Because 1,000 workflows easily fit within the Rawls quota of 3,000 concurrent workflows, all of the workflows immediately start running in Cromwell, which proceeds to launch 6 x 4,800 = 28,800 jobs across the six runners. That job capacity is enough for 29 of the 1,000 workflows to make progress, while the remaining 971 workflows simply sit there waiting for their jobs to run.

    Monitoring individual workflows in a submission

    How do you know if something is wrong with one of these 1,000 workflows?

    You can see the breakdown of each workflow's status in your workspace's Job History tab by clicking on an individual submission. If Cromwell determines that a workflow in the submission will not be able to start due to job capacity limitations, it holds the workflow in a separate queue with a clear “Queued in Cromwell” status indicator until capacity becomes available.

    Screenshot showing the 'Calls' menu, which displays the status for the individual workflows in an example submission. In this example, some jobs are held in a separate queue while waiting for other jobs to run. The status message indicates that 344 jobs are running, 4216 have succeeded, and 442 have an 'unexpected status (QueuedInCromwell)'.

For more information on how to monitor the status of a workflow submission, see Job History overview (monitoring workflows).

Glossary of useful terms

If you're new to bioinformatics, and especially if you're new to cloud computing, you may find lots of unfamiliar terms in this article. Understanding the ones below is useful when discussing what Terra is doing behind-the-scenes. Click to expand the definitions.

See a more comprehensive list at Glossary of terms related to cloud-based genomics.

For more information on the architecture used to run workflows, see Terra architecture and where your files live in it.

  • Backend service that processes workflows and sends their jobs (tasks) to the cloud.
  • A lightweight, standalone, executable package of software that includes everything needed to run an application on a virtual machine. A Docker container can specify the operating system, runtime configuration, and all necessary dependencies needed for a given application. Packaging these components together is called “containerizing” them, and it’s an effective way of making VM configurations shareable and making the analyses that depend on keeping these configurations consistent reproducible.

  • A JSON is an open-standard file and data interchange format that uses human-readable text to store and transmit data objects - including attribute–value pairs and arrays.

  • Also known as a task. Technically, a task is the element of a WDL that defines a command-line script plus a Docker image that will execute on the cloud backend. Once launched, a job is an instance of a task running on a VM. In practice, the terms are used interchangeably.
  • Google Cloud Platform APIs are how Terra communicates with the Google Cloud execution engine (i.e. VMs).
  • A collection of one or more workflows that start simultaneously. Selecting the “Run Analysis” button in Terra launches a new submission.
  • The user-facing platform comprising a user interface, backend services, and API.
  • For the purposes of this document, Rawls is the backend service that creates submissions and sends their workflows to Cromwell.
  • A virtual machine (aka VM) is a virtual construct that is functionally equivalent to a computer - complete with processing power and storage capacity - whose technical specifications are determined by what a user requests, rather than by the hardware where the computation and storage actually take place. This is actually what makes cloud computing so flexible - when you create a virtual machine it's just like setting up a new computer, but the power and configuration is determined by whatever you choose when you're creating that machine, and you can create, delete, modify, and replace these virtual machines on-demand.
  • A collection of one or more tasks defined by a WDL file, specifying the order, inputs, and outputs.
  • WDL is a community-driven programming language stewarded by the community at openWDLorg. It's designed for describing data-intensive computational workflows, and is accessible for scientists without deep programming expertise. Similarly to CWL, portability is a key factor in its design. Compared to CWL, WDL is designed to be more human-readable, whereas CWL is primarily optimized for machine-readability.

Optimizing for diverse use-cases

Terra uses several methods to run workflows efficiently and reliably for both large and small research groups. Read on for more in-depth information on these methods.

  • To improve efficiency, Rawls submits requests to Cromwell (step 2) in batches of 50, rather than one at a time. 

    Diagram showing how Rawls batches jobs to submit them to Cromwell in an efficient way. This batching significantly speeds workflows' runtime. For example, a batch of 50 jobs might take 35 seconds to run; in contrast, running these jobs without batching might take 24 seconds per job!

  • Cromwell's runners operate in parallel. This dramatically increases the number of jobs that can run concurrently, allowing Terra to submit up to 28,800 jobs on your behalf. This especially benefits workflows that generate lots of concurrent jobs (for example, WDLs with a lot of scatter parallelism). 

    Diagram comparing the workflow efficiency when Cromwell runners pick up jobs in batches versus one job at a time. The left-hand side of the diagram illustrates what happens when the runner pick up workflow job lists in batches: one runner might pick up many job lists, while the other runners have none. This means that the job execution will be limited by the maximum number of jobs that each runner can submit at a time (4,800). The right-hand side of the diagram illustrates the method that Terra uses. The jobs are evenly distributed between the 6 runners. This means that more jobs can be executed at once (4,800 per runner, or 28,800 total).

  • To balance the needs of researchers and groups with different job sizes and platform priorities, Cromwell uses a priority algorithm: 

    1. Identify all projects with workflows to run.
    2. Sort projects by the lowest number of currently-running workflows.
    3. Start one workflow from that group.
    4. When groups are tied, select the group that started a workflow first.
    5. Wait for the next clock tick.
    6. Repeat this process.

    Small users benefit from this prioritization, since they will no longer have to wait in line for a large genome center to complete thousands of workflows before starting theirs. As a result, these users can iterate on their workflows quickly.

    And large users will rarely notice, because the small users are small and therefore don't significantly delay their jobs. It's also an elegant way to handle simultaneous large submissions from multiple large users. 

Was this article helpful?

0 out of 0 found this helpful

Comments

0 comments

Please sign in to leave a comment.