
Best practices for running large data sets


  • Jason Cerrato

    Hi Andrew,

    Thank you for writing in. We'll take a look at your questions and get back to you as soon as we can!

    Kind regards,

    Jason

  • Jason Cerrato

    Hi Andy,

    Here are some answers for your questions:

    1. Running large submissions does not affect other Terra users' Google Cloud resources, since those are defined by your billing project. Large submissions might have a minor impact on job queueing time, but our system is designed so that one submission doesn't hold up other submissions for long, if at all.
    2. As long as you haven't changed the method configuration for your workflow, you will see a "Relaunch Failures" button on your job history page, which lets you automatically kick off a resubmission of the failed jobs.
    3. Our development team is aware of the need for a notification system for workflow submissions. Unfortunately, we do not yet have one in place. I'm happy to follow up on this thread if I hear that it's been built!
    4. Yes, that's the ideal scenario! You can expect close to linear scaling, assuming nothing goes wrong: no preemptible failures, no data access issues, no resource configuration problems. Of course, the real world is rarely so smooth. You'll likely see some variation based on how your machine utilizes its memory, how often it fails due to preemption, and so on. The most important thing may be to make sure your workflow has no spots where it can get caught in an infinite loop under certain conditions, as this could result in a huge bill. If the workflow is set to fail gracefully at the appropriate times, all data access is set up in advance, the data is similar in size and form across your workflows, and you've run tests to get a good sense of what to expect, you should be in good shape.
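    To make the "close to linear" expectation in point 4 concrete, here is a minimal Python sketch of how you might extrapolate from a few test runs to a full submission. This is only an illustration, not a Terra feature: the function name, the `retry_overhead` parameter, and the pilot cost figures are all hypothetical, and it assumes your data is similar in size and form across workflows.

```python
from statistics import mean


def estimate_submission_cost(pilot_costs_usd, n_workflows, retry_overhead=0.0):
    """Linearly extrapolate total cost from per-workflow pilot costs.

    pilot_costs_usd: cost of each test workflow you already ran (hypothetical data).
    n_workflows:     number of workflows in the large submission.
    retry_overhead:  assumed fractional extra cost from preemption retries,
                     e.g. 0.1 for 10% (a guess you supply, not a measured value).
    Returns (low, expected, high) in USD, using the cheapest, average, and
    most expensive pilot run as the per-workflow cost.
    """
    if not pilot_costs_usd:
        raise ValueError("need at least one pilot run")
    if n_workflows < 0 or retry_overhead < 0:
        raise ValueError("n_workflows and retry_overhead must be non-negative")

    scale = n_workflows * (1.0 + retry_overhead)
    low = min(pilot_costs_usd) * scale
    expected = mean(pilot_costs_usd) * scale
    high = max(pilot_costs_usd) * scale
    return low, expected, high


if __name__ == "__main__":
    # Hypothetical numbers: three test workflows, scaling to 1000 workflows.
    low, expected, high = estimate_submission_cost([1.0, 1.2, 0.8], 1000, retry_overhead=0.1)
    print(f"expected ${expected:.2f} (range ${low:.2f} to ${high:.2f})")
```

    If the spread between the low and high figures is wide, that suggests your test runs are not yet representative and a few more tests are worthwhile before launching the full submission.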

    Here is some general guidance about scaling you may want to read through before launching your bigger submissions: https://support.terra.bio/hc/en-us/articles/360059028911-Scaling-your-workflow-submissions

    If you have any questions, please let us know!

    Kind regards,

    Jason

  • Andrew Davidson

    Thanks Jason

     

