Need Help?

Search our documentation and community forum

Terra is a cloud-native platform for biomedical researchers to access data, run analysis tools, and collaborate.
Terra powers important scientific projects like FireCloud, AnVIL, and BioData Catalyst. Learn more.

Terra sometimes misreports workflow times

Comments

7 comments

  • Avatar
    Jason Cerrato

    Hi Aisling O'Farrell,

    Thanks for writing in—we'll be happy to take a closer look at this.

    Can you share the workspace where you are seeing this issue with GROUP_FireCloud-Support@firecloud.org by clicking the Share button in your workspace? The Share option is in the three-dots menu at the top-right.

    1. Add GROUP_FireCloud-Support@firecloud.org to the User email field and press enter on your keyboard.
    2. Click Save.


    Let us know the workspace name or share a link. Please also let us know the submission IDs, workflow IDs, and tasks that correspond to the questionable time entries. We''ll investigate as soon as we can.

    Kind regards,

    Jason

     

    0
    Comment actions Permalink
  • Avatar
    Aisling O'Farrell

    Alrighty, that account has been added. Like I mentioned earlier it has controlled access data so it has an auth domain. The workspace is named Aligner Cost Estimator and the workflow submission in question is 5b3f2466-8267-4f0f-a760-2ee72570e01f.

    0
    Comment actions Permalink
  • Avatar
    Jason Cerrato

    Hi Aisling,

    I had missed that point about it being controlled-access. Can you provide the workflow ID(s) associated with these strange costs and point to which IDs/tasks correspond to which odd value?

    Jason

    0
    Comment actions Permalink
  • Avatar
    Aisling O'Farrell

    Submission ID: 5b3f2466-8267-4f0f-a760-2ee72570e01f

    78e5da68-7b30-4a77-8be6-b1df9aa8a6a9

    In Pre-Align, a disk is preempted immediately (weird but not a big concern). On the second attempt, also on a preeempt, it is reported that the run was successful and it took 0h and 0m. However, the log indicates it took quite a bit longer.

    2020/11/18 20:34:13 Starting container setup.

    [...]

    2020/11/19 18:37:02 Done delocalization.

    9b37305a-1501-4fd7-b7ff-d308827b74d6

    Similar story to the above except the preempts failed thrice and the reported 0h 0m successful run was on a nonpreempt.

    2020/11/18 15:44:45 Starting container setup.

    [...]

    2020/11/19 15:20:53 Done delocalization.

    8b6e9afc-b222-474f-9c48-b3ca4c19bb4a

    This one is different than the other two in that it seems like all steps are misreported, not just the pre-align step. Here, pre-align is reported as 0h 0m but the log shows:

    2020/11/17 20:38:16 Starting container setup.

    [...]

    2020/11/18 19:57:39 Done delocalization.

    The align step doesn't have a complete log in Terra's UI. But based on the parts that are visible ([main] Real time: 126.273 sec; CPU: 1444.153 sec and the fact it took a few minutes to delocalize) indicates it shouldn't be 0h 0m. However since the final cost of this submission is low, it is possible it was indeed unusually quick.

    The post-align step is also incorrect, but in a different way. Rather than reporting 0h 0m it reports 7h 49m. However, the correct time should be closer to 22 hours.

    2020/11/19 07:08:47 Starting container setup.

    [...]

    2020/11/20 04:51:31 Done delocalization.

    6ab77bf2-2780-4735-bbfb-53b9fb70631a

    This one is similar to the one immediately above -- it seems every estimate, for every task, is underreported.

     

    There might be more but these are the four out of ten that stuck out to me and my team.

     

    0
    Comment actions Permalink
  • Avatar
    Jason Cerrato

    Hi Aisling,

    Perfect, thank you so much for this detailed breakdown. One last ask since we can't actually see your workspace—can you send us screenshots for these four cases where you're seeing the runtime showing as a different number than expected?

    We've already validated your first case and are looking to do the rest. Having screenshots to assist will be helpful for when we bring this to our engineers for a deeper investigation.

    Kind regards,

    Jason

    0
    Comment actions Permalink
  • Avatar
    Aisling O'Farrell

    78e5da68-7b30-4a77-8be6-b1df9aa8a6a9

    9b37305a-1501-4fd7-b7ff-d308827b74d6

    8b6e9afc-b222-474f-9c48-b3ca4c19bb4a

    6ab77bf2-2780-4735-bbfb-53b9fb70631a

    Also -- all of these are run on full-sized CRAM files from the same TOPMed project, so it seems unlikely this is an artefact of running on unharmonized or downsampled data.

    0
    Comment actions Permalink
  • Avatar
    Samantha (she/her)

    Hi Aisling O'Farrell,

     

    Thanks for reporting this. I've created a bug ticket for our engineering team to further investigate the issue.

     

    Best,

    Samantha

    0
    Comment actions Permalink

Please sign in to leave a comment.

Powered by Zendesk