Data transfer between Notebook and the rest of the workspace

Post author
Sehyun Oh

Hi! I'm trying to use a notebook interactively with the data and tools in my workspace, and I'm running into some issues.

I uploaded my file through Terra (Data > Other Data > Files > notebook (directory) > + (upload button)) and tried to access it from the notebook, but the notebook couldn't find the file. (I could still see the file with `gsutil ls`.)

So I followed the instructions on how to set up environment variables in a notebook (https://broadinstitute.zendesk.com/hc/en-us/articles/360026639112-New-Environmental-Variables-for-Jupyter-Notebooks) to load my file into the notebook, but neither `$WORKSPACE_NAME` nor `$WORKSPACE_BUCKET` was defined.

What worked in the end was copying the file to where it already is: I ran `gsutil cp` with the gs:// URL of the file and the working directory, which are practically the same (?) and both hard-coded. Even though the notebook finally recognizes the file, this seems inefficient and not flexible or scalable...

Is there any better way to do this? Thanks.

Comments


  • Comment author
    Sushma Chaluvadi

    If I understand correctly, you are trying to read a file that lives in your workspace Google bucket into your notebook. Here are the steps I use to do that.


    For example, I list all the files in my workspace Google bucket and then use gsutil cp to copy one of them to the /home/jupyter-user folder on the VM that exists whenever you start a cluster.
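
    The list-then-copy step could be sketched in a notebook cell roughly like this. It is only a sketch under assumptions: the placeholder bucket name and the helper names are hypothetical, and `WORKSPACE_BUCKET` is read from the environment only if it happens to be defined (otherwise the hard-coded placeholder is used).

```python
import os
import subprocess

# Assumption: WORKSPACE_BUCKET may or may not be set in your notebook
# environment; the fallback below is a hypothetical placeholder.
DEFAULT_BUCKET = "gs://your-workspace-bucket"
LOCAL_DIR = "/home/jupyter-user"


def workspace_bucket():
    """Return the bucket URL from the environment, or the placeholder."""
    return os.environ.get("WORKSPACE_BUCKET", DEFAULT_BUCKET).rstrip("/")


def list_command(bucket):
    """Build the gsutil command that lists every object in the bucket."""
    return ["gsutil", "ls", f"{bucket}/**"]


def copy_in_command(bucket, object_path, local_dir=LOCAL_DIR):
    """Build the gsutil command that copies one object to the notebook VM."""
    return ["gsutil", "cp", f"{bucket}/{object_path.lstrip('/')}", local_dir]


def run(command):
    """Run a gsutil command and return its standard output as text."""
    result = subprocess.run(command, check=True, capture_output=True, text=True)
    return result.stdout
```

    In a cell you would then call something like `print(run(list_command(workspace_bucket())))`, pick a path from the listing, and pass it to `run(copy_in_command(workspace_bucket(), "path/in/bucket/my_file.csv"))`, where the file path is a made-up example.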

     

    To copy a file referenced in my data model, I click the metadata link, copy the bucket path (the gs:// URL) of the file, and repeat the gsutil cp step to copy it to /home/jupyter-user.




    If you see permission errors at this step, the file you are trying to access may not be publicly accessible, or the bucket itself may not be. In that case you need to identify the owner of the bucket and ask them to grant you permissions on it. If the bucket is yours but is external to the workspace bucket, you will need to add your PROXY group to the bucket permissions.

    To find your PROXY group:
    Open the hamburger menu > Profile. The PROXY group is listed in the Profile section.



    The PROXY group should then be added to the bucket's permissions.
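
    One possible way to add the group from the command line is `gsutil iam ch`. This is a minimal sketch and not necessarily the route shown originally: the group email and bucket name you pass in are your own values, the helper names are hypothetical, and the `objectViewer` role is an assumption that read-only access is enough.

```python
import subprocess


def grant_read_command(proxy_group_email, bucket):
    """Build a gsutil command giving the group read access to objects."""
    # Assumption: objectViewer (read-only) is enough for copying files out
    # of the bucket; pick a different role if you also need to write.
    binding = f"group:{proxy_group_email}:objectViewer"
    return ["gsutil", "iam", "ch", binding, bucket]


def grant_read(proxy_group_email, bucket):
    """Run the command; requires permission to edit the bucket's IAM policy."""
    subprocess.run(grant_read_command(proxy_group_email, bucket), check=True)
```

    For example, `grant_read("your-proxy-group@example.org", "gs://your-external-bucket")`, with both arguments replaced by the PROXY group from your Profile page and the real bucket URL.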



    Please let me know if you have further questions!

  • Comment author
    Sehyun Oh

    Thanks Sushma! I think this is practically what I did too. :) So, now my understanding is that the only way to move data in and out of the notebook is using 'gsutil cp', right?

  • Comment author
    Sushma Chaluvadi

    Yes, I believe that if you want to load files into your notebook from a bucket (workspace-associated or external), gsutil is the way to go.
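
    The reverse direction, moving a file out of the notebook VM and back into a bucket, would use the same `gsutil cp` with the arguments swapped. A rough sketch, assuming a hypothetical `notebooks` folder in the bucket and made-up helper names:

```python
import posixpath
import subprocess


def copy_out_command(local_path, bucket, prefix="notebooks"):
    """Build the gsutil command that uploads a local file to the bucket."""
    # Keep only the file name so the object lands directly under the prefix.
    file_name = posixpath.basename(local_path)
    destination = f"{bucket.rstrip('/')}/{prefix.strip('/')}/{file_name}"
    return ["gsutil", "cp", local_path, destination]


def copy_out(local_path, bucket, prefix="notebooks"):
    """Run the upload; raises CalledProcessError if gsutil fails."""
    subprocess.run(copy_out_command(local_path, bucket, prefix), check=True)
```

    For instance, `copy_out("/home/jupyter-user/results.csv", "gs://your-workspace-bucket")` would attempt to upload the (hypothetical) results file to `gs://your-workspace-bucket/notebooks/results.csv`.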

