Copying notebook output to a Google bucket

Liz Kiernan

Data generated in a Jupyter Notebook lives on the notebook's Cloud Environment (your virtual "laptop in the cloud") and is not accessible from outside it, even to collaborators working in the same workspace. To transfer data generated within a notebook to more permanent, accessible storage, follow steps 1 and 2 below, using the code for your notebook's kernel (Python or R).

You must have permission to upload to and download from a Google Bucket. This includes the workspace bucket (you must be a workspace owner or writer).

Step 1. Set environment variables

Setting the environment variables lets the notebook grab variables such as the workspace name and Google Bucket ID directly. This makes cleaner and more flexible notebooks that don't require you to hardcode these variables. Use the syntax below.

  • For a Python kernel notebook, use the following code.

    import os
    bucket = os.environ['WORKSPACE_BUCKET']

  • For a notebook with an R kernel, use the following code.

    bucket <- Sys.getenv('WORKSPACE_BUCKET')

Rerun after pausing the Cloud Environment virtual machine (VM)

Note: If you pause the Cloud Environment (the laptop in the cloud running your notebook), you need to rerun the code cells that set the environment variables (step 1). The workspace variables are part of the running Cloud Environment, not the virtual disk associated with the cluster, so they are lost when the VM is paused.
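If a later cell fails after a pause, it can be hard to tell that the missing variable is the cause. The following is a minimal Python sketch, not part of the official instructions, that reads WORKSPACE_BUCKET and raises a descriptive error when it is missing. The helper name get_workspace_bucket and its env parameter are hypothetical and exist only for illustration.

```python
import os


def get_workspace_bucket(env=None):
    """Return the workspace bucket URL, or raise a clear error if it is unset.

    `env` is a hypothetical parameter that defaults to os.environ; it lets
    you pass a plain dict when trying the helper outside a notebook.
    """
    env = os.environ if env is None else env
    bucket = env.get("WORKSPACE_BUCKET")
    if not bucket:
        raise RuntimeError(
            "WORKSPACE_BUCKET is not set. Check that the Cloud Environment "
            "is running, then rerun this cell."
        )
    return bucket


# In a notebook cell you would typically call:
# bucket = get_workspace_bucket()
```

Calling the helper at the top of the notebook, in place of the bare os.environ lookup, gives the same bucket value but fails with a readable message instead of a KeyError.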

Step 2. Save output files to workspace storage with bash commands

Note: The workspace cloud storage is a Google Bucket, so shell commands that read from or write to it must use the gcloud storage command-line tool (for example, gcloud storage cp rather than cp).

What to expect

These commands work only if you have already run the step 1 code to set the environment variables in the current session. Once you execute these cells, the data files should be visible in the workspace bucket.

To save all generated files after the notebook runs, use the commands below. To copy individual files, replace * with the file name to copy.

  • For a Python kernel notebook, use the following code.

    # Copy all files in the notebook's working directory into the bucket
    !gcloud storage cp ./* $bucket

    # List the bucket contents to confirm the files were copied
    !gcloud storage ls $bucket

  • For a notebook with an R kernel, use the following code.

    # Copy all files in the notebook's working directory into the bucket
    system(paste0("gcloud storage cp ./* ", bucket), intern=TRUE)

    # List the bucket contents to confirm the files were copied
    system(paste0("gcloud storage ls ", bucket), intern=TRUE)
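To copy an individual file from Python without the ! shell prefix, one option is to call the same gcloud storage cp command through the standard subprocess module. This is a sketch under stated assumptions rather than the documented method: the function names build_copy_command and copy_file_to_bucket are hypothetical, and results.csv is a placeholder file name.

```python
import subprocess


def build_copy_command(file_name, bucket):
    """Build the argument list for copying one local file into the bucket."""
    # The arguments are passed as a list (no shell), so wildcards such as
    # ./* are NOT expanded here; pass one concrete file name.
    return ["gcloud", "storage", "cp", file_name, bucket]


def copy_file_to_bucket(file_name, bucket):
    """Run the copy and raise CalledProcessError if gcloud reports a failure."""
    return subprocess.run(
        build_copy_command(file_name, bucket),
        check=True,            # fail loudly instead of silently continuing
        capture_output=True,   # keep gcloud's messages for inspection
        text=True,
    )


# Example call in a notebook cell, after step 1 has defined `bucket`:
# copy_file_to_bucket("results.csv", bucket)
```

Because check=True is set, a permissions problem or a missing file surfaces as a Python exception, which is easier to notice than an error message buried in cell output.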
