Job History overview (monitoring workflows)

Allie Hajian

The Job History tab is your workspace operations dashboard, where you can check the status of past and current workflow submissions, drill down to see what’s going on (i.e. troubleshoot), and find direct links to all input and output files. This article walks through the functions you will find. 

Job History reporting structure

Workflow information is organized in a hierarchy from submissions to tasks. As you click into each level, you will get increasingly granular details.

Submission: a collection of workflows submitted in one batch
----
Workflow: a particular run of a workflow/method on a specific dataset
----
Task: the lowest level of analysis reporting, representing individual calls/jobs made during workflow execution
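
To make the hierarchy concrete, here is a minimal sketch of the three levels as nested Python data classes. This is only an illustration, not Terra's actual data model: the class names, field names, and the sample and task names are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    """Lowest level: one call/job made during workflow execution."""
    name: str
    status: str


@dataclass
class Workflow:
    """One run of a workflow/method on a specific dataset (e.g. one sample)."""
    entity: str
    tasks: List[Task] = field(default_factory=list)


@dataclass
class Submission:
    """A collection of workflows submitted in one batch."""
    submission_id: str
    workflows: List[Workflow] = field(default_factory=list)


# Hypothetical example: one submission run on two samples.
submission = Submission(
    submission_id="example-submission",
    workflows=[
        Workflow("sample_1", [Task("align", "Done"), Task("call_variants", "Done")]),
        Workflow("sample_2", [Task("align", "Done"), Task("call_variants", "Failed")]),
    ],
)

# Each level down exposes more rows: 1 submission, 2 workflows, 4 tasks.
workflow_rows = len(submission.workflows)
task_count = sum(len(w.tasks) for w in submission.workflows)
print(workflow_rows, task_count)
```

Clicking from one level of Job History to the next corresponds to moving from `Submission` to `Workflow` to `Task` in this sketch.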

Workspace submissions (Job History top level)

Job-History_Screen_shot.png

Here you'll find a list of all submissions, along with their current status (e.g. queued, submitted, running, completed or failed), and links to further information. Each submission is one row, no matter how many entities you run on in that submission.

A note about deleting information in Job History

It is possible to filter the list, but you cannot delete any submissions. Similarly, it is not possible to delete workflows within a submission.

For guidance about deleting files that belong to past submissions, please see the forum.

Submission status

Workflow submissions can be in the following states, which will be listed (and updated in real time) in the top-level Job History page. 

Queued, Launching or Submitted

In these states, the workflows are being handed off from Terra to Cromwell. See Overview: How the workflow system works.

Running

When running, the commands specified in the WDL script are being executed on virtual machines.

Aborted and aborting

These statuses display for analysis submissions, workflows, and tasks after you request that a workflow be aborted. They are not pictured here.

Succeeded

When all the tasks reach Done successfully, the workflow will be updated to Succeeded, and the Job History page will show the submission as Done.
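
The relationship between task and workflow status can be sketched as a small roll-up function. The status labels come from this article, but the function itself is a hypothetical illustration, and the handling of failed or still-running tasks is an assumption rather than a description of how Terra or Cromwell computes status.

```python
from typing import Iterable


def workflow_status(task_statuses: Iterable[str]) -> str:
    """Hypothetical roll-up of task statuses into a workflow status."""
    statuses = list(task_statuses)
    # From the article: the workflow is Succeeded once every task is Done.
    if statuses and all(s == "Done" for s in statuses):
        return "Succeeded"
    # Assumption for illustration: any failed task marks the workflow Failed.
    if any(s == "Failed" for s in statuses):
        return "Failed"
    # Assumption for illustration: otherwise the workflow is still in progress.
    return "Running"


print(workflow_status(["Done", "Done"]))
print(workflow_status(["Done", "Running"]))
```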

Workflow-level details (Job History submission page)

Clicking on a particular submission opens the next level in the Job History. At the top of the page is information about the submission as a whole. 

Job-History-submission-page_Screen_shot.png

Submission information

Workflow information

The submission page lists each workflow within the submission and its status (failed, queued, etc.). If you ran on two samples, for example, there would be two rows, one for each sample.

Task-level details (Job Manager or Workflow Dashboard)

From the submission page you can access task-level details by selecting one of the three icons at the right of the particular workflow you're interested in.

Troubleshooting_Job-History-submision_Screen_shot.png
Task-level-details-links_Sreen_shot.png
Job Manager | Workflow Dashboard | Execution Directory

If you don't see these icons

If your job failed because it never started (if Terra could not find your input files to localize, for example), you won't see these options.

Job manager 

Clicking the icon at left will open the Job Manager, your go-to location for a more thorough breakdown of your workflow. Here you can find information about each individual task in the workflow, including:

  1. Failure messages
  2. Log files, links to Google Cloud execution directories, and compute details

Troubleshooting_Job_Manager_Screen_shot.png

Job-Manager_Links_Screen_shot.png
Backend task log | Execution directory | Compute details

Note that the Job Manager will open in a new tab and is outside of your workspace. 

If Job Manager won’t load

Job Manager may fail to load if your job produced huge amounts of metadata. In these cases, skip to the Workflow Dashboard (described below).

Backend task log

If it's not immediately obvious what failed, the best sources of information are often log files. These files are generated by Cromwell when executing any task and are placed in the task's folder along with its output. In Terra, we add quick links to these files to make troubleshooting easier.

The backend task log gives a step-by-step report of actions during the execution of the task. These details include information about Docker setup, localization (the step of copying files from your Google bucket into the Docker container), stdout from tools run within the command block of the task, and finally, the delocalization and Docker shutdown steps.

You can also see this log in the Google Cloud Platform console by clicking the link at the bottom.
Troubleshooting-Backend-log_Screen_shot.png

If your log stopped abruptly

Some log files seem to stop abruptly, before reaching the delocalization stage. This is almost certainly because the task has run out of memory. We recommend retrying with more memory to see if your job gets farther. See Out Of Memory Retry to learn how to configure your workflow to automatically retry certain tasks when the only error was running out of memory.
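
If you have downloaded a backend task log, a quick way to check whether it reached the delocalization stage is to search the text for a mention of that stage. The sketch below is only an illustration: the function name, the `marker` substring, and the sample log excerpts are all hypothetical, so check a real log from one of your tasks to see what wording actually appears.

```python
def log_reached_delocalization(log_text: str, marker: str = "delocaliz") -> bool:
    """Return True if the log text mentions the delocalization stage.

    The default marker is an assumed substring, not a documented log format.
    """
    return marker.lower() in log_text.lower()


# Made-up log excerpts for illustration only.
complete_log = "Localizing inputs\nRunning user command\nDelocalizing outputs\n"
truncated_log = "Localizing inputs\nRunning user command\n"

print(log_reached_delocalization(complete_log))
print(log_reached_delocalization(truncated_log))
```

A log that never mentions delocalization is a candidate for the out-of-memory explanation above, although this check alone does not prove it.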

Execution directory

Clicking on this icon will redirect you to the exact folder/directory in your workspace Google bucket where you can find your stderr, stdout, and backend logs. From there, you can open those files to view their contents or download them. If your task generates outputs, this is where you will find them as well. 

Troubleshooting_Execution-directory_Screen_shot.png

1. taskname.log

A log file tracking the events that occurred in performing the task, such as downloading Docker, localizing files, etc. This is the same log mentioned in the previous section. Occasionally a workflow will fail without stderr and stdout files, leaving you with only a task log.

2. stderr and stdout

Standard Error (stderr)
A file containing error messages produced by the commands executed in the task. This is a good place to start for a failed task, as many common task-level errors are indicated in the stderr file.

Standard Output (stdout)
A file containing log output generated by commands in the task. Not all commands generate log output, so this file may be empty.
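
The advice above suggests an order for reading these files when a task fails. Here is a minimal sketch of that triage, assuming you have a list of the file names in the execution directory. The function name and the example file names are hypothetical, and placing stdout second is an assumption; the article only says to start with stderr and that sometimes only the task log exists.

```python
from typing import List, Optional

# stderr first (per the article); stdout second is an assumption.
TRIAGE_ORDER = ["stderr", "stdout"]


def first_file_to_check(filenames: List[str]) -> Optional[str]:
    """Pick the first file to open when troubleshooting a failed task."""
    for preferred in TRIAGE_ORDER:
        if preferred in filenames:
            return preferred
    # Fall back to the task log (taskname.log) if that is all there is.
    for name in filenames:
        if name.endswith(".log"):
            return name
    return None


print(first_file_to_check(["align.log", "stderr", "stdout"]))
print(first_file_to_check(["align.log"]))
```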

Compute details

This section displays information on the workflow at the Google Pipelines worker level, including timestamps for the execution of worker tasks and VM configuration information. You can use this section to understand or validate the configuration of your worker VM (memory, disk size, machine type, etc.). You can also check this section if you suspect your workflow failed due to a transient Google issue.
Troubleshooting-TaskName-log-in-UI_Screen_shot.png

Workflow Dashboard

The Workflow Dashboard shows some of the same details as the Job Manager, but inside your workspace:

  1. Error (failure) messages
  2. Links to the Job Manager and the Execution directory

Troublehooting_Workflow-Dashboard_Screen_shot.png

Execution directory (icon at right)

The Execution directory, which opens in the Google Cloud Platform console, includes a wealth of details on the API side of things.

Troubleshooting_Execution-directory-1_Screen_shot.png

For more information about what goes on under the hood, see What happens when you launch a workflow? 

 

 
