Copying notebook output to a Google bucket

Liz Kiernan

Data generated in a Jupyter Notebook is stored in your virtual Cloud Environment and is inaccessible from outside it, even for collaborators working in the same workspace! To transfer data generated within a notebook to more permanent, accessible storage, follow steps 1 and 2 below, using the code that matches your notebook's kernel (Python or R).

You must have permission to upload to and download from a Google bucket! This includes the workspace bucket (you must be a workspace owner or writer).

Step 1. Set environment variables

Setting the environment variables lets the notebook grab values such as the workspace name and Google bucket directly. This makes notebooks cleaner and more flexible, because you don't have to hardcode these values. Use the syntax below:

  • For a Python kernel notebook, use the following code.

    import os
    bucket = os.environ['WORKSPACE_BUCKET']

  • For a notebook with an R kernel, use the following code.

    bucket <- Sys.getenv('WORKSPACE_BUCKET')

Rerun after pausing the Cloud Environment virtual machine (VM)

Note: You need to rerun the code cells that set the environment variables (step 1) after pausing the notebook Cloud Environment. This is because the workspace variables are part of the Cloud Environment (and not the virtual disk associated with the cluster), and they go away when the Cloud Environment is paused.
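Because the variable has to be set again after a pause, it can help to fail with a clear message when it is missing. The following is a minimal Python sketch, not part of the original instructions: the helper name `get_workspace_bucket` and its error text are hypothetical, and it assumes only that the bucket path is exposed as `WORKSPACE_BUCKET`, as in step 1.

```python
import os

def get_workspace_bucket(env=None):
    """Return the workspace bucket path, or raise a clear error if it is unset.

    `env` defaults to the real process environment; passing a dict instead
    is only for illustration and testing (hypothetical helper).
    """
    env = os.environ if env is None else env
    bucket = env.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError(
            "WORKSPACE_BUCKET is not set; rerun the step 1 cell after the "
            "Cloud Environment has restarted."
        )
    return bucket
```

In a notebook cell, `bucket = get_workspace_bucket()` could then stand in for the direct `os.environ` lookup.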

Step 2. Save output files to a bucket with bash commands

Note: The workspace bucket is a Google bucket, so bash commands in the notebooks that read from or write to it need to be preceded by "gsutil" (for example, "gsutil cp" rather than "cp").

These commands will work only if you have first run the step 1 code to set the environment variables. Once you execute these cells, the data files should be visible in the workspace bucket.

To save all generated files after the notebook runs, use the commands below. If you want to copy individual files, you can replace `*` with the file name to copy.

  • For a Python kernel notebook, use the following code.

    # Copy all files in the notebook directory into the bucket
    !gsutil cp ./* $bucket

    # Run list command to see if the files are in the bucket
    !gsutil ls $bucket

  • For a notebook with an R kernel, use the following code.

    # Copy all files generated in the notebook into the bucket
    system(paste0("gsutil cp ./* ",bucket),intern=TRUE)

    # Run list command to see if the files are in the bucket
    system(paste0("gsutil ls ",bucket),intern=TRUE)
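For copying an individual file rather than everything, the same `gsutil cp` command can also be called from plain Python instead of the `!` shell syntax. This is a hedged sketch only: the function names `build_copy_command` and `copy_file_to_bucket` are hypothetical, `results.csv` is a made-up file name, and it assumes `gsutil` is available on the notebook's PATH and that `bucket` was set in step 1.

```python
import subprocess

def build_copy_command(filename, bucket):
    """Build the gsutil argument list for copying one local file to the bucket."""
    return ["gsutil", "cp", filename, bucket]

def copy_file_to_bucket(filename, bucket):
    """Run the copy; raises subprocess.CalledProcessError if gsutil exits with an error."""
    subprocess.run(build_copy_command(filename, bucket), check=True)

# Example call (hypothetical file name), assuming `bucket` from step 1:
# copy_file_to_bucket("results.csv", bucket)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces, and `check=True` surfaces a failed copy as an exception rather than letting it pass silently.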
