Comments

9 comments

  • Jason Cerrato

    Hi Afshin Akbarzadeh,

    Thank you for reporting this issue. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org? The Share option is in the three-dots menu at the top-right of your workspace.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.

    Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.

    Kind regards,

    Jason

  • Afshin Akbarzadeh

    Hi Jason,
    Done.
    WS: terra-billing-datester/afshin_ws_00

    Workflow: MfConversion

    Submission IDs I had problems with:

    01dcb9d0-5f25-47b2-9e4a-1a9933389818
    58fab8d6-90cf-468d-9feb-f3b279eb7fc4
    ec944a62-60a2-4109-b25f-68ca009c4a8d
    7fa3f478-739b-4450-8dba-156225ccea46
    592b99a6-8c5e-4bac-923e-1f5873f674f2
    8580b41c-4e81-467a-9afe-4fc869bc162d

     

  • Jason Cerrato

    Hi Afshin,

    Thank you for that information, and for sharing the workspace. We'll take a look and get back to you as soon as we can!

    Kind regards,

    Jason

  • Jason Cerrato

    Hey Afshin,

    We were able to confirm that the Workflow Dashboard and Job Manager are not displaying correctly due to the amount of metadata generated by your workflow. Do you expect your workflow to generate so much metadata? Looking at the inputs alone, we see a count of 4,200,716 inputs for two of the workflow IDs. Are you expecting to see over 4 million inputs for your job?

    Is there anything specific we can help you understand about your submissions? For example, do you want to know

    • why a workflow or a call failed?
    • why the metadata is so large?
    • something else?

    Kind regards,

    Jason

  • Afshin Akbarzadeh

    This is a DICOM image conversion task, and I need to convert over 2,000,000 instances. My main reason for using Terra was to scale up the process. My questions:
    1) The last job took too long, so I aborted it yesterday. However, the job state has been stuck in "Aborting" for over 24 hours. Submission ID: 01dcb9d0-5f25-47b2-9e4a-1a9933389818

    2) I also want to know why that job (Submission ID: 01dcb9d0-5f25-47b2-9e4a-1a9933389818) took so long that I decided to abort it. The job should have finished in about 2 hours, but it ran for a day, and after I aborted it, it took an additional day to abort. Why is this so?

    3) Some of the tasks failed with an error about not being able to evaluate outputs, even though the outputs were fine and accurate. I want to know why this error was raised.

    4) Is there any limit on the number of inputs or tasks for Terra workflows?

  • Jason Cerrato

    Hi Afshin,

    Thank you for detailing those questions. I'll be happy to bring them to our engineers to get some answers.

    One thing an engineer noticed is that your WDL employs a nested scatter.

    Nested scatters can produce quite a ballooning of metadata. If you reframe the workflow to avoid nesting the scatter, you will vastly reduce how much metadata it produces. They wrote up a possible alternative you can consider implementing:

    scatter (p in zip(range(length(inputs)), range(length(inputs)))) {
        Int i = p.left
        Int j = p.right
        Array[File] series_files = inputs[i][j].SERIES_PATH
    }

    scatter (i in range(length(inputs))) {
        File json_file = input_series.json[i]

        call test_task {
            input:
                series_file_list = series_files,
                json_file = json_file
        }
    }

    I'll get back to you with answers as soon as I can!

    Kind regards,

    Jason

  • Afshin Akbarzadeh

    I believe the zip (dot product) approach is not going to work for my case. I need a nested loop, or at least the cross(Array[X], Array[Y]) function, which is going to generate the same amount of metadata.
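
    For illustration only, with toy values (not my real inputs): cross produces the full Cartesian product, so the number of shards, and therefore the metadata, matches a nested loop.

        # Toy illustration: cross yields length(x) * length(y) pairs.
        Array[Pair[Int, String]] pairs = cross([1, 2], ["a", "b"])
        # pairs == [(1, "a"), (1, "b"), (2, "a"), (2, "b")]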

    Afshin

  • Afshin Akbarzadeh

    Also worth mentioning: the Dockstore workflow was not updated with my latest code. It is now updated.

    Afshin

  • Jason Cerrato

    Hey Afshin,

    Thanks for your patience here. Our engineer agrees that cross would have been the better choice! They would also like to let you know that employing a non-nested scatter would almost certainly produce less metadata because of the nuances of how Cromwell (the workflow engine) handles nested scatters. Nested scatter subworkflows result in Cromwell printing out the entire array's worth of metadata as inputs for every index in the scatter. This results in a lot of metadata duplication, so avoiding nested scatters where possible will definitely cut down on the amount of resulting metadata.

    The engineers also noticed that your workflow doesn't have an outputs section, which results in Cromwell re-reporting all of the inputs and outputs of the nested scatters as 3-dimensional arrays in the outputs section. This is also contributing to the size of your workflow metadata.

    So to summarize, these are our recommendations for a smoother workflow experience:

    1. Use the cross method to remove the need for a nested scatter. The total amount of work being done will be the same, but Cromwell won't need to report all of its intermediate values out as metadata (see the sketch after this list).
    2. Add an outputs section so that intermediate output values are also not reported.
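
    For reference, here is a minimal sketch of what that could look like. The workflow, task, and input names (mf_conversion, test_task, series_lists, json_files) are placeholders rather than your actual WDL, and it assumes you want every series list paired with every JSON config; adapt the pairing to your real data layout.

        version 1.0

        workflow mf_conversion {
            input {
                Array[Array[File]] series_lists
                Array[File] json_files
            }

            # A single flat scatter over every (series list, JSON) pair
            # instead of a nested scatter.
            scatter (pair in cross(series_lists, json_files)) {
                call test_task {
                    input:
                        series_file_list = pair.left,
                        json_file = pair.right
                }
            }

            # Explicit outputs so Cromwell only reports what is needed,
            # rather than re-reporting every intermediate value.
            output {
                Array[File] converted = test_task.out
            }
        }

        task test_task {
            input {
                Array[File] series_file_list
                File json_file
            }
            command <<<
                # Placeholder conversion step.
                echo "converting ~{json_file}" > converted.txt
            >>>
            output {
                File out = "converted.txt"
            }
            runtime {
                docker: "ubuntu:20.04"
            }
        }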

    As a side note, our Cromwell team has made better handling of workflows with large amounts of metadata a major priority for the next quarter.

    We've also fixed the status of submission 58fab8d6-90cf-468d-9feb-f3b279eb7fc4, which was stuck in Aborting!

    If we can help with anything else, please let us know!

    Kind regards,

    Jason
