Running concurrent notebooks from a single Workspace under separate tmux sessions

Comments

12 comments

  • Anika Das

    Hi Konstantin, 

    Thank you for writing in about this issue! It seems like the runs are failing because tmux isn't yet fully supported in Terra cloud environments. I'd be happy to file tmux support as a feature request with the appropriate engineering team and follow up with you if that support gets built. 

    Please let us know if there is anything else we can help you with!

    Kind Regards, 
    Anika 

  • Konstantin Bobkov

    Thank you very much for the quick response, Anika! Do you think that using nohup instead of tmux would work?

  • Anika Das

    Hi Konstantin, 

    I'm not certain about the functionality of nohup in Terra, but I will look into it and let you know what I find!

    Kind Regards,

    Anika

  • Anika Das

    Hi Konstantin, 

    Based on this article, https://www.maketecheasier.com/nohup-and-uses/ , it sounds like nohup is a command that keeps a process running in the terminal even if the user logs out. The user can run any process they want, but the terminal needs to stay open in the browser while they're using it.

    You mentioned you are launching your notebooks from the terminal. Is there a reason you are doing this, rather than launching them from the Notebooks tab?

    Kind Regards, 

    Anika

  • Konstantin Bobkov

    Hi Anika,

    Thank you very much for looking into this. I'm running from a terminal because I need to analyze thousands of input files and have concurrent runs on different sets of data. In the terminal, the notebook output gets redirected to STDOUT and does not clog up the browser screen. I'm currently using tmux to run a single notebook in the background. Running in the background from tmux is not the issue; the issue arises when I launch a second notebook from another tmux session in the same terminal. That's when I get messages like:

    2021-06-16 15:56:00 YarnScheduler: WARN: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
    2021-06-16 15:56:15 YarnScheduler: WARN: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

    So, the question is: does nohup make it possible to run multiple notebooks from the same terminal without them competing for available resources? Does my question make sense?

    Thank you again!

    Best regards,

    Konstantin
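
The workflow described above (several notebooks running in the background, each with its output kept out of the browser) could be sketched from Python roughly as follows. This is only an illustration under assumptions: the thread does not say how the notebooks are executed, so the use of `jupyter nbconvert --execute` is assumed, and `nbconvert_command`, `launch_detached`, and the file names are hypothetical. Starting each run in its own session is similar in spirit to nohup, not identical to it, and it does nothing to stop two runs from competing for Spark resources.

```python
import subprocess
from typing import List


def nbconvert_command(notebook: str, output: str) -> List[str]:
    """Build a headless-execution command for one notebook (assumes nbconvert is installed)."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute", notebook,
        "--output", output,
    ]


def launch_detached(command: List[str], log_path: str) -> subprocess.Popen:
    """Start `command` in its own session, appending stdout and stderr to a log file."""
    log_file = open(log_path, "ab")
    try:
        return subprocess.Popen(
            command,
            stdin=subprocess.DEVNULL,
            stdout=log_file,
            stderr=subprocess.STDOUT,
            start_new_session=True,  # new session, so the run is not tied to the launching terminal
        )
    finally:
        log_file.close()  # the child process keeps its own copy of the descriptor


# Hypothetical usage, one log file per data set:
# runs = [
#     launch_detached(nbconvert_command("batch_a.ipynb", "batch_a.out.ipynb"), "batch_a.log"),
#     launch_detached(nbconvert_command("batch_b.ipynb", "batch_b.out.ipynb"), "batch_b.log"),
# ]
```

Each call returns immediately, so the notebooks run side by side and their output lands in separate log files.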

  • Anika Das

    Hi Konstantin,

    We had our engineers take a look, and they gave it a try themselves. They tried running multiple basic notebooks in the terminal concurrently via tmux/nohup and it seemed to work. They suspect your issue has more to do with concurrently running Hail. Even if you open 2 notebooks in separate tabs and do:

        import hail as hl
        hl.init()

    it crashes because each notebook kernel tries to spin up a JVM SparkContext which is sized based on the VM's total available memory.

    Is there some way to structure your analysis so only 1 Hail context is started at a time but other things run concurrently in the background?

    Note: it's possible to run bash scripts in the background from notebook cells like:

        %%script bash --bg --out script_out
        sleep 1000
        echo hi!

    Hope that helps! Please let us know if you have any further questions!
     
    Kind Regards,
    Anika
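
One possible way to keep only one Hail context alive at a time while other work runs concurrently is to guard the Hail section of each notebook with a shared lock file. The sketch below is an assumption-laden illustration, not something tested on Terra: `exclusive_slot` and `LOCK_PATH` are hypothetical names, and it relies on the standard-library `fcntl.flock` call, so it only works on Unix-like systems and only if every notebook process sees the same file.

```python
import fcntl
import os
from contextlib import contextmanager

# Hypothetical lock location; any path visible to every notebook process would do.
LOCK_PATH = "/tmp/hail_context.lock"


@contextmanager
def exclusive_slot(lock_path=LOCK_PATH, blocking=True):
    """Hold an exclusive advisory lock on `lock_path` for the duration of the `with` block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        flags = fcntl.LOCK_EX if blocking else fcntl.LOCK_EX | fcntl.LOCK_NB
        # Waits here while another holder has the slot; with blocking=False it
        # raises BlockingIOError instead of waiting.
        fcntl.flock(fd, flags)
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


# Hypothetical usage inside each notebook:
# with exclusive_slot():
#     import hail as hl
#     hl.init()
#     ...  # Hail-dependent steps
#     hl.stop()
# Non-Hail steps stay outside the `with` block and can overlap freely.
```

The lock is advisory, so it only helps if every notebook that starts Hail goes through the same guard.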

  • Konstantin Bobkov

    Hi Anika,

    Bingo!

    I can confirm that I can run two notebooks concurrently via tmux as long as only one of them is using hail. Since our last exchange, we have split the analysis so that hail is only used by one of the notebooks. It would be great if one could run hail concurrently from two notebooks, but I don't know whether this is possible in principle.

    Thank you very much again for looking into this!

    Best regards,

    Konstantin

  • Anika Das

    Hi Konstantin, 

    Yay! We're so glad that worked for you! According to our engineers, running hail concurrently should be possible in principle. The issue is that each time hl.init() runs, it tries to allocate a SparkContext, and there aren't enough resources on the VM to run 2 SparkContexts concurrently. It may be possible to manually instantiate a SparkContext with fewer resources and pass it in with hl.init(sc=mySparkContext).

    This hasn't been tested though :)

    Please let us know if you have further questions!

    Kind Regards, 

    Anika
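
A rough sketch of what "a SparkContext with fewer resources" might look like is given below. Like the suggestion itself, it is untested, and several things are assumptions: the even split of memory and cores, the `reserve_gb` headroom, and the helper names `spark_settings_for_share` and `init_hail_with_settings` are all hypothetical. The property names (`spark.driver.memory`, `spark.executor.memory`, `spark.executor.cores`) are standard Spark settings, but which of them matter depends on whether Spark runs locally or on a cluster, driver memory may need to be fixed before the JVM starts, and a user-supplied context may need additional Hail-specific Spark settings that are not shown here.

```python
def spark_settings_for_share(total_memory_gb, total_cores, n_contexts, reserve_gb=2):
    """Split memory and cores evenly across `n_contexts` Spark contexts (illustrative rule only)."""
    if n_contexts < 1:
        raise ValueError("n_contexts must be at least 1")
    memory_gb = (total_memory_gb - reserve_gb) // n_contexts
    cores = total_cores // n_contexts
    if memory_gb < 1 or cores < 1:
        raise ValueError("not enough resources to split across that many contexts")
    return {
        "spark.driver.memory": f"{memory_gb}g",
        "spark.executor.memory": f"{memory_gb}g",
        "spark.executor.cores": str(cores),
    }


def init_hail_with_settings(settings):
    """Untested sketch: build a SparkContext from `settings` and hand it to Hail."""
    # Imported lazily so the helper above can be used without Hail or PySpark installed.
    import hail as hl
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setAll(list(settings.items()))
    sc = SparkContext(conf=conf)
    hl.init(sc=sc)
    return sc


# Hypothetical usage on a 16 GB, 8-core VM shared by two notebooks:
# sc = init_hail_with_settings(spark_settings_for_share(16, 8, n_contexts=2))
```

Keeping the arithmetic separate from the Hail call makes it easy to inspect the proposed settings before starting a context.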

  • Konstantin Bobkov

    Hi Anika,

    Thank you very much for the suggestion! I'll definitely look into this and see if it works.

    Best regards,

    Konstantin

  • Kumar Thurimella

    Dear Konstantin and Anika,

    With nohup or tmux, am I able to keep the environment running and leave cells in my notebook running even if I close the browser window? If I wanted to run something overnight, I would like to do it through my notebook while still being able to close my laptop lid. I am essentially following the steps here:

    https://stackoverflow.com/questions/47331050/how-to-run-jupyter-notebook-in-the-background-no-need-to-keep-one-terminal-for

    Kumar

  • Samantha (she/her)

    Hi Kumar Thurimella,

    Unfortunately, closing the browser or your laptop will stop the VM. If you are using a notebook/RStudio/terminal, or other interactive analysis application where you create a cloud environment, you need to keep the session active (hence, interactive analysis).

    It sounds like a better option for you would be to run a batch job in Cromwell within Terra. You would just need to represent the script as a WDL. If you're new to running workflows on Terra, I would suggest taking a look at our support documentation to help you get started.

    Please let me know if you have any questions.

    Best,
    Samantha

  • Samantha (she/her)

    One thing to add - if you are okay with keeping your browser/lid open, you can actually change the autopause time so the VM doesn't stop overnight. See Adjusting autopause for Cloud Environments using Swagger for more information.
