Tuning Apache Spark in Runtimes

Comments

3 comments

  • Jason Cerrato

    Hi Dan Spagnolo,

    Thank you for your inquiry. You can change the default Spark configuration by calling the createRuntime API at https://notebooks.firecloud.org/#/runtimes/createRuntime with a request body like:

    {
      "runtimeConfig": {
        "cloudService": "dataproc",
        "properties": {
          "spark.executor.cores": "4"
        }
      },
      "label": {
        "saturnAutoCreated": "true"
      }
    }

    You would first need to authenticate your account, then click Try it out and add the custom properties before finally selecting Execute.

    You can alternatively create a custom runtime by following the directions in this article: Docker tutorial: Custom runtime environments for Jupyter Notebooks

    If you have questions about any of this, please let me know!

    Kind regards,

    Jason

  • Dan Spagnolo

    Hi Jason Cerrato. This is still a bit over my head, and I'm not sure how to follow your instructions.

    What do you mean by authenticate your account? I am clicking the "authorize" button but I'm not sure what to do next.

    I understand what would go into the googleProject parameter but how do I figure out the runtimeName that goes into the name parameter? As far as I can see there is no name I can associate with the Notebook runtime.

    Once I modify the request body and execute, will the Terra environment I am running immediately have the changes I need in the Spark configuration, or do I need to do anything else?

    I was looking into creating my own Spark context and using that to initialize hail (rather than let hl.init() do it for me) but your suggestion seems more straightforward.

  • Jason Cerrato

    Hi Dan,

    Happy to provide some additional details here.

    First, click Authorize on the page, check the three boxes, then press the Authorize button at the bottom.

    Next, supply your Terra billing project name as the googleProject parameter. The name parameter (the runtime name) can be anything you want, so long as it meets the naming requirements.

    After you add the runtime configuration you want, click Execute. After approximately 2-5 minutes, you should see an active runtime when you visit a workspace in the billing project you provided.

    The example in my earlier comment was slightly in error: the property key needs the spark: prefix, and the field name is labels rather than label. Please use this instead:

    {
      "runtimeConfig": {
        "cloudService": "dataproc",
        "properties": {
          "spark:spark.executor.cores": "4"
        }
      },
      "labels": {
        "saturnAutoCreated": "true"
      }
    }
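
If you would rather script this than use the Swagger page, the request body above can be built and wrapped in an HTTP request along these lines. This is only a rough sketch under some assumptions: the REST path in CREATE_RUNTIME_PATH is a guess at what the createRuntime operation maps to and should be confirmed on the Swagger page, the helper names build_runtime_body and build_create_runtime_request are made up for illustration, and the project name, runtime name, and token in the comments are placeholders. The sketch only prepares the request; it does not send anything.

```python
import json
import urllib.request

# Assumption: host taken from the Swagger link in this thread.
LEO_BASE_URL = "https://notebooks.firecloud.org"
# Assumption: REST path behind the createRuntime operation; confirm it on the Swagger page.
CREATE_RUNTIME_PATH = "/api/google/v1/runtimes/{google_project}/{runtime_name}"


def build_runtime_body(spark_properties):
    """Build the request body, adding the "spark:" prefix where it is missing."""
    properties = {}
    for key, value in spark_properties.items():
        prefixed = key if key.startswith("spark:") else "spark:" + key
        properties[prefixed] = str(value)  # property values are sent as strings
    return {
        "runtimeConfig": {"cloudService": "dataproc", "properties": properties},
        "labels": {"saturnAutoCreated": "true"},
    }


def build_create_runtime_request(google_project, runtime_name, body, access_token):
    """Prepare (but do not send) the POST request that would create the runtime."""
    url = LEO_BASE_URL + CREATE_RUNTIME_PATH.format(
        google_project=google_project, runtime_name=runtime_name
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": "Bearer " + access_token,
            "Content-Type": "application/json",
        },
    )


body = build_runtime_body({"spark.executor.cores": 4})
print(json.dumps(body, indent=2))
# To actually send it (placeholder project, runtime name, and token):
# request = build_create_runtime_request("my-billing-project", "my-spark-runtime", body, token)
# urllib.request.urlopen(request)
```

Keeping the body construction separate from the request makes it easy to print and compare the JSON against the example above before anything is submitted.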

    If you have any questions, please let me know.

    Kind regards,

    Jason
