Job History overview (monitoring workflows)


The Job History tab is your workspace operations dashboard, where you can check the status of past and current workflow submissions, drill down to see what’s going on (i.e., troubleshoot), and find direct links to all input and output files. This article walks through the functionality you need to know.

Job History reporting structure

Workflow information is organized in a hierarchy from submissions to tasks. As you click into each level, you get increasingly granular details.

Submission: a collection of workflows submitted in one batch
Workflow: a particular run of a workflow/method on a specific dataset
Task: the lowest level of analysis reporting, representing individual calls/jobs made during workflow execution
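
If you ever want to walk this same hierarchy outside the UI, the sketch below shows one possible approach using the FISS Python package (firecloud.api). The namespace, workspace, and submission ID are placeholders, and the JSON field names (workflows, calls, executionStatus) are assumptions based on typical Terra/Cromwell metadata, so treat it as a starting point rather than a definitive recipe.

from firecloud import api as fapi  # FISS package: pip install firecloud

NAMESPACE = "my-billing-project"   # placeholder: your billing project
WORKSPACE = "my-workspace"         # placeholder: your workspace name
SUBMISSION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

# Submission level: one record for the whole batch you launched
submission = fapi.get_submission(NAMESPACE, WORKSPACE, SUBMISSION_ID).json()
print("submission status:", submission.get("status"))

# Workflow level: one record per entity the submission ran on
for wf in submission.get("workflows", []):
    wf_id = wf.get("workflowId")
    print("workflow", wf_id, "status:", wf.get("status"))
    if not wf_id:
        continue  # workflows that never started may have no ID

    # Task level: the individual calls made during workflow execution
    metadata = fapi.get_workflow_metadata(
        NAMESPACE, WORKSPACE, SUBMISSION_ID, wf_id
    ).json()
    for call_name, attempts in metadata.get("calls", {}).items():
        for attempt in attempts:
            print("  task", call_name, "status:", attempt.get("executionStatus"))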

Workspace submissions (Job History top level)

[Screenshot: the Job History tab listing workspace submissions]

Here is a list of all submissions, along with their current status (i.e., queued, submitted, running, completed, or failed), and links to further information. Each submission is one row, no matter how many entities you ran on in that submission.

A note about deleting information in Job History: It's possible to filter the list, but you can't delete any submissions. Similarly, it's not possible to delete workflows within a submission.

For guidance about deleting files that belong to past submissions, please see the forum.

Submission status

Workflow submissions can be in the following states, which will be listed (and updated in real time) in the top-level Job History page. 

Queued, Launching or Submitted

In these states, the workflows are being handed off from Terra to Cromwell (see Overview: How the workflow system works).

Running

When running, the commands specified in the WDL script are being executed on virtual machines.

Aborting and Aborted

These statuses display for submissions, workflows, and tasks when you request that a workflow be aborted. They are not pictured here.

Succeeded

When all the tasks reach Done successfully, the workflow is updated to Succeeded, and the Job History page shows the submission as Done.
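
To watch these statuses without keeping the Job History page open, you could also poll the submission programmatically. The sketch below is a minimal example assuming the FISS Python package (firecloud.api); the identifiers are placeholders, and the set of terminal status strings is an assumption you may need to adjust.

import time

from firecloud import api as fapi  # FISS package: pip install firecloud

NAMESPACE = "my-billing-project"   # placeholder
WORKSPACE = "my-workspace"         # placeholder
SUBMISSION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

# Assumption: these are the terminal submission states; adjust if your
# API version reports different strings.
TERMINAL_STATES = {"Done", "Aborted"}

while True:
    submission = fapi.get_submission(NAMESPACE, WORKSPACE, SUBMISSION_ID).json()
    status = submission.get("status")
    print("submission status:", status)
    if status in TERMINAL_STATES:
        break
    time.sleep(60)  # check once a minute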

Workflow-level details (Job History submission page)

Clicking on a particular submission opens the next level in the Job History. At the top of the page is information about the submission as a whole.

[Screenshot: the Job History submission page]

Submission information

Workflow information

The submission page lists each workflow within the submission and its status (failed, queued, etc.). If you ran on two samples, for example, there would be two rows, one for each sample.
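
If you'd rather pull this per-entity view programmatically, a minimal sketch using the FISS package (firecloud.api) is shown below. The identifiers are placeholders, and the workflowEntity field name and layout are assumptions based on typical submission metadata.

from firecloud import api as fapi  # FISS package: pip install firecloud

NAMESPACE = "my-billing-project"   # placeholder
WORKSPACE = "my-workspace"         # placeholder
SUBMISSION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

submission = fapi.get_submission(NAMESPACE, WORKSPACE, SUBMISSION_ID).json()

# One line per workflow: the entity it ran on and its current status
for wf in submission.get("workflows", []):
    entity = wf.get("workflowEntity", {})  # assumption: field name and layout
    print(entity.get("entityType", "?"), entity.get("entityName", "?"),
          "->", wf.get("status"))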

Task-level details (Job Manager or Workflow Dashboard)

From the submission page you can access task-level details by selecting one of the three icons at the right of the particular workflow you're interested in.

[Screenshot of the Workflow submission page of an example failed workflow run, annotated with an orange box highlighting the links to the Job Manager, Workflow Dashboard, and Execution Directory for the workflow.]
Job Manager | Workflow Dashboard | Execution Directory

If you don't see these icons: If your job failed because it never started (e.g., if Terra could not find your input files to localize), you won't see these options.

Job Manager

Clicking the icon at left opens the Job Manager, your go-to location for a more thorough breakdown of your workflow. Here you can find information about each individual task in the workflow, including

  1. Failure messages
  2. Log files, links to Google Cloud execution directories, and compute details

[Screenshot: the Job Manager page]

[Screenshot: links available in the Job Manager]
Backend task log | Execution directory | Compute details

Note: The Job Manager will open in a new tab and is outside of your workspace. 

If Job Manager won't load: Job Manager may fail to load if your job produced huge amounts of metadata. In these cases, skip to the Workflow Dashboard (described below).
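
If the Job Manager is unavailable, another option is to pull a workflow's failure messages straight from its Cromwell metadata. Below is a minimal sketch assuming the FISS package (firecloud.api); the identifiers are placeholders, and the failures/causedBy field names are assumptions based on typical Cromwell metadata.

from firecloud import api as fapi  # FISS package: pip install firecloud

NAMESPACE = "my-billing-project"   # placeholder
WORKSPACE = "my-workspace"         # placeholder
SUBMISSION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
SUBMISSION_WORKFLOW_ID = "11111111-1111-1111-1111-111111111111"  # placeholder

# Note: for workflows with very large metadata this request can be slow.
metadata = fapi.get_workflow_metadata(
    NAMESPACE, WORKSPACE, SUBMISSION_ID, SUBMISSION_WORKFLOW_ID
).json()

def print_failures(failures, indent=0):
    """Recursively print failure messages and their nested causes."""
    for failure in failures or []:
        print(" " * indent + failure.get("message", "<no message>"))
        print_failures(failure.get("causedBy"), indent + 2)  # assumption: field name

print_failures(metadata.get("failures"))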

Backend task log

If it's not immediately obvious what failed, the best sources of information are often log files. These files are generated by Cromwell when executing any task and are placed in the task's folder along with its output. In Terra, we add quick links to these files to make troubleshooting easier.

The backend task log gives a step-by-step report of actions during the execution of the task. These details include information about Docker setup, localization (the step of copying files from your Google bucket into the Docker container), stdout from tools run within the command block of the task, and finally, the delocalization and Docker shutdown steps.

You can also view this log in the Google Cloud console by clicking the link at the bottom.

[Screenshot: a backend task log]

If your log stopped abruptly: Some log files seem to stop abruptly, without ever reaching the delocalization stage. This is almost certainly because the task ran out of memory. We recommend retrying with more memory to see if your job gets further. See Out Of Memory Retry to learn how to configure your workflow to immediately retry certain tasks if the only error was running out of memory.
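
If you'd rather inspect a backend log outside the browser, the sketch below downloads it from the workspace bucket with the google-cloud-storage client and checks whether it reached the delocalization stage. The bucket name and object path are placeholders (copy the real path from the Execution Directory link), and the "Delocalization" marker string is an assumption about the log's wording.

from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "fc-your-workspace-bucket"  # placeholder: your workspace bucket
# Placeholder: copy the real object path from the Execution Directory link
LOG_PATH = "path/copied/from/the/execution-directory/MyTask.log"

client = storage.Client()
log_text = client.bucket(BUCKET_NAME).blob(LOG_PATH).download_as_text()

# Assumption: the backend log mentions "Delocalization" once outputs are
# being copied back; the exact wording can vary by backend version.
if "Delocalization" in log_text:
    print("Log reached the delocalization stage.")
else:
    print("Log stops before delocalization -- the task may have run out of memory.")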

Execution directory

Clicking on this icon will redirect you to the exact folder/directory in your workspace Google bucket where you can find your stderr, stdout, and backend logs. From there, you can open those files to view their contents or download them. If your task generates outputs, this is where you will find them as well. 

[Screenshot: a task's execution directory in the workspace bucket]

1. taskname.log

A log file tracking the events that occurred while performing the task, such as downloading the Docker image, localizing files, etc. This is the same log mentioned in the previous section. Occasionally a workflow will fail without stderr and stdout files, leaving you with only a task log.

2. stderr and stdout

Standard Error (stderr)
A file containing error messages produced by the commands executed in the task. A good place to start for a failed task, as many common task-level errors are indicated in the stderr file.

Standard Output (stdout)
A file containing log outputs generated by commands in the task. Not all commands generate log outputs and so this file may be empty.
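
You can also browse the same execution directory programmatically. The sketch below is a minimal example using the google-cloud-storage client; the bucket name and directory prefix are placeholders you would copy from the Execution Directory link.

from google.cloud import storage  # pip install google-cloud-storage

BUCKET_NAME = "fc-your-workspace-bucket"  # placeholder: your workspace bucket
# Placeholder: copy the real prefix from the Execution Directory link
EXECUTION_DIR = "path/copied/from/the/execution-directory/"

client = storage.Client()
bucket = client.bucket(BUCKET_NAME)

# List everything the task wrote: logs, stderr/stdout, and any outputs
for blob in bucket.list_blobs(prefix=EXECUTION_DIR):
    print(blob.name)

# Print stderr, often the quickest pointer to what failed
stderr_blob = bucket.blob(EXECUTION_DIR + "stderr")
if stderr_blob.exists():
    print(stderr_blob.download_as_text())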

Compute details

This section displays information on the workflow at the Google Pipelines worker level, including timestamps for the execution of worker tasks and virtual machine (VM) configuration information. You can use this section to understand or validate the configuration of your worker VM (memory, disk size, machine type, etc.). You can also check this section if you suspect your workflow failed due to a transient Google issue.

This information is available for 42 days from when the pipeline (VM) started; after that, it is no longer accessible. This is a Google lifecycle policy, and there is no workaround to retrieve the data after 42 days.
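
As a complement to the Compute Details view, you can read the runtime attributes (memory, disks, CPU, etc.) recorded for each task in the Cromwell metadata. The sketch below assumes the FISS package (firecloud.api); the identifiers are placeholders, and the runtimeAttributes field name is an assumption about the metadata layout.

from firecloud import api as fapi  # FISS package: pip install firecloud

NAMESPACE = "my-billing-project"   # placeholder
WORKSPACE = "my-workspace"         # placeholder
SUBMISSION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
SUBMISSION_WORKFLOW_ID = "11111111-1111-1111-1111-111111111111"  # placeholder

metadata = fapi.get_workflow_metadata(
    NAMESPACE, WORKSPACE, SUBMISSION_ID, SUBMISSION_WORKFLOW_ID
).json()

# Print the runtime attributes recorded for each task attempt.
# "runtimeAttributes" is an assumption about the metadata layout.
for call_name, attempts in metadata.get("calls", {}).items():
    for attempt in attempts:
        print(call_name, attempt.get("runtimeAttributes", {}))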

[Screenshot: the taskname.log file viewed in the UI]

Workflow Dashboard

The Workflow Dashboard includes some of the details found in the Job Manager, but as part of your workspace:

  1. Error (failure) messages
  2. Links to the Job Manager and the Execution directory 

[Screenshot of the Workflow Dashboard for an example workflow, annotated with orange boxes highlighting error messages and links to the Job Manager, Execution Directory, and execution log.]

Execution directory (icon at right)

The Execution directory, which opens in the Google Cloud console, includes a wealth of detail on the API side of things.

[Screenshot: the execution directory in the Google Cloud console]

For more information about what goes on under the hood, see What happens when you launch a workflow? 

 

 
