Comments

9 comments

  • Jason Cerrato

    Hi Afshin Akbarzadeh,

    Thank you for reporting this issue. Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org? The Share option is in the three-dots menu at the top-right of your workspace.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.

    Let us know the workspace name, as well as the relevant submission and workflow IDs. We’ll be happy to take a closer look as soon as we can.

    Kind regards,

    Jason

  • Afshin Akbarzadeh

    Hi Jason,
    Done.
    WS: terra-billing-datester/afshin_ws_00

    Workflow: MfConversion

    Submission IDs I had problems with:

    01dcb9d0-5f25-47b2-9e4a-1a9933389818
    58fab8d6-90cf-468d-9feb-f3b279eb7fc4
    ec944a62-60a2-4109-b25f-68ca009c4a8d
    7fa3f478-739b-4450-8dba-156225ccea46
    592b99a6-8c5e-4bac-923e-1f5873f674f2
    8580b41c-4e81-467a-9afe-4fc869bc162d

     

  • Jason Cerrato

    Hi Afshin,

    Thank you for that information, and for sharing the workspace. We'll take a look and get back to you as soon as we can!

    Kind regards,

    Jason

  • Jason Cerrato

    Hey Afshin,

    We were able to confirm that the Workflow Dashboard and Job Manager are not displaying correctly due to the amount of metadata generated by your workflow. Do you expect your workflow to generate so much metadata? Looking at the inputs alone, we see a count of 4,200,716 inputs for two of the workflow IDs. Are you expecting to see over 4 million inputs for your job?

    Is there anything specific we can help you understand about your submissions? For example, do you want to know

    • why a workflow or a call failed?
    • why the metadata is so large?
    • something else?

    Kind regards,

    Jason

  • Afshin Akbarzadeh

    This is a DICOM image conversion task, and I need to convert over 2,000,000 instances. My main reason for using Terra was to scale up the process. My questions:
    1) The last job took too long, so I aborted it yesterday. However, the job state has been stuck in "Aborting" for over 24 hours. Submission ID: 01dcb9d0-5f25-47b2-9e4a-1a9933389818

    2) I also want to know why that job (Submission ID: 01dcb9d0-5f25-47b2-9e4a-1a9933389818) took so long that I decided to abort it. The job should have finished in about 2 hours, but it ran for a day, and after I aborted it, it took an additional day to abort. Why is this so?

    3) Some of the tasks failed with an error about not being able to evaluate outputs, even though the outputs were fine and accurate. I want to know why this error was raised.

    4) Is there any limit on the number of inputs or tasks for Terra workflows?

  • Jason Cerrato

    Hi Afshin,

    Thank you for detailing those questions. I'll be happy to bring them to our engineers to get some answers.

    One thing an engineer noticed is that your WDL employs a nested scatter.

    Nested scatters can produce quite a ballooning of metadata. If you reframe the workflow to avoid nesting the scatter, you will vastly reduce how much metadata it produces. They wrote up a possible alternative you can consider implementing:

    scatter (p in zip(range(length(inputs)), range(length(inputs)))) {
        Int i = p.left
        Int j = p.right
        Array[File] series_files = inputs[i][j].SERIES_PATH
    }

    scatter (i in range(length(inputs))) {
        File json_file = input_series.json[i]

        call test_task {
            input:
                series_file_list = series_files,
                json_file = json_file
        }
    }

    I'll get back to you with answers as soon as I can!

    Kind regards,

    Jason

  • Afshin Akbarzadeh

    I believe the zip (dot product) approach is not going to work for my case. I need a nested loop, or at least the cross(Array[X], Array[Y]) function, which is going to generate the same amount of metadata.
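
    For illustration only, with toy values (not my real inputs): cross produces the full Cartesian product, so the number of shards, and therefore the metadata, matches a nested loop.

        # Toy illustration: cross yields length(x) * length(y) pairs.
        Array[Pair[Int, String]] pairs = cross([1, 2], ["a", "b"])
        # pairs == [(1, "a"), (1, "b"), (2, "a"), (2, "b")]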

    Afshin

  • Afshin Akbarzadeh

    Also worth mentioning: the Dockstore workflow was not updated with my latest code. It is now updated.

    Afshin

  • Jason Cerrato

    Hey Afshin,

    Thanks for your patience here. Our engineer agrees that cross would have been the better choice! They would also like to let you know that employing a non-nested scatter would almost certainly produce less metadata because of the nuances of how Cromwell (the workflow engine) handles nested scatters. Nested scatter subworkflows result in Cromwell printing out the entire array's worth of metadata as inputs for every index in the scatter. This results in a lot of metadata duplication, so avoiding nested scatters where possible will definitely cut down on the amount of resulting metadata.

    The engineers also noticed that your workflow doesn't have an outputs section, which results in Cromwell re-reporting all of the inputs and outputs of the nested scatters as 3-dimensional arrays in the outputs section. This is also contributing to the size of your workflow metadata.

    So to summarize, these are our recommendations for a smoother workflow experience:

    1. Use the cross method to remove the need for a nested scatter. The total amount of work being done will be the same, but Cromwell won't need to report all of its intermediate values out as metadata (see the sketch after this list).
    2. Add an outputs section so that intermediate output values are also not reported.
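
    For reference, here is a minimal sketch of what that could look like. The workflow, task, and input names (mf_conversion, test_task, series_lists, json_files) are placeholders rather than your actual WDL, and it assumes you want every series list paired with every JSON config; adapt the pairing to your real data layout.

        version 1.0

        workflow mf_conversion {
            input {
                Array[Array[File]] series_lists
                Array[File] json_files
            }

            # A single flat scatter over every (series list, JSON) pair
            # instead of a nested scatter.
            scatter (pair in cross(series_lists, json_files)) {
                call test_task {
                    input:
                        series_file_list = pair.left,
                        json_file = pair.right
                }
            }

            # Explicit outputs so Cromwell only reports what is needed,
            # rather than re-reporting every intermediate value.
            output {
                Array[File] converted = test_task.out
            }
        }

        task test_task {
            input {
                Array[File] series_file_list
                File json_file
            }
            command <<<
                # Placeholder conversion step.
                echo "converting ~{json_file}" > converted.txt
            >>>
            output {
                File out = "converted.txt"
            }
            runtime {
                docker: "ubuntu:20.04"
            }
        }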

    As a side note, our Cromwell team has made better handling of workflows with large amounts of metadata a major priority for the next quarter.

    We've also fixed the status of submission 58fab8d6-90cf-468d-9feb-f3b279eb7fc4, which was stuck in Aborting!

    If we can help with anything else, please let us know!

    Kind regards,

    Jason
