Copying notebook output to a Google bucket

Liz Kiernan

Data generated by running a Jupyter notebook is stored in your virtual Cloud Environment and is inaccessible from outside it, including to collaborators working in the same workspace. To transfer data generated within a notebook to more permanent storage, follow steps 1 and 2 below. Each step gives the syntax for both the Python kernel and the R kernel.

Note that you will need to rerun the code cells that set the environment variables (step 1) each time you resume a paused Cloud Environment. The workspace variables are part of the running Cloud Environment, not the virtual disk associated with it, so they are lost when the Cloud Environment is paused.

1. Set environment variables

Setting the environment variables lets the notebook pick up values such as the workspace name and Google bucket directly. This makes for cleaner, more flexible notebooks because you don't have to hardcode these values. Use the syntax below:

Python kernel 

import os

BILLING_PROJECT_ID = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE = os.environ['WORKSPACE_NAME']
bucket = os.environ['WORKSPACE_BUCKET']

R kernel 

project <- Sys.getenv('WORKSPACE_NAMESPACE')
workspace <- Sys.getenv('WORKSPACE_NAME')
bucket <- Sys.getenv('WORKSPACE_BUCKET')
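
Because the variables are lost when the Cloud Environment is paused, it can help to check that they are present before running step 2. The following is a minimal sketch for a Python kernel, not part of the original instructions: the helper name read_workspace_variables is hypothetical, and it assumes the three variable names used above.

```python
import os

# The three variables read in step 1.
REQUIRED_VARIABLES = ("WORKSPACE_NAMESPACE", "WORKSPACE_NAME", "WORKSPACE_BUCKET")


def read_workspace_variables(environ=os.environ):
    """Return the workspace variables as a dict, or fail with a clear message.

    `environ` defaults to the real environment; passing a plain dict makes
    the function easy to try out without a running Cloud Environment.
    """
    # Treat unset and empty variables the same way.
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

In a notebook cell, `bucket = read_workspace_variables()["WORKSPACE_BUCKET"]` would then either give you the bucket or stop with a message naming what is missing.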


2. Save output files to a bucket with bash commands

Note: the workspace bucket is a Google bucket, so file commands that act on it (such as cp and ls) must be run through gsutil, for example "gsutil cp" or "gsutil ls".

These commands will only work if you have run the commands above to set the environment variables. Once you execute these cells, the data files should be visible in the workspace bucket.

To save all generated files after the notebook runs, use the commands below. If you want to copy individual files, you can replace `*` with the file name to copy.

Python kernel 

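# Copy all files in the notebook's current working directory into the bucket
# (subdirectories are skipped unless you add the -r flag to cp)
!gsutil cp ./* $bucket

# List the bucket contents to confirm the files were copied
!gsutil ls $bucket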

R kernel

# Copy all files in the notebook's current working directory into the bucket
# (subdirectories are skipped unless you add the -r flag to cp)
system(paste0("gsutil cp ./* ", bucket), intern = TRUE)

# List the bucket contents to confirm the files were copied
system(paste0("gsutil ls ", bucket), intern = TRUE)
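
If you prefer to copy a chosen set of files from Python code rather than with a `!` shell line, one option is to build the gsutil command as a list and run it with subprocess. This is a rough sketch under stated assumptions: the helper names, the "notebook-outputs" folder, and the example file names are all hypothetical, and it assumes `bucket` holds a gs:// URL as set in step 1.

```python
import subprocess


def build_copy_command(files, bucket, subdir="notebook-outputs"):
    """Build the gsutil argument list that copies `files` into bucket/subdir/."""
    if not files:
        raise ValueError("No files to copy")
    # The trailing slash makes gsutil treat the destination as a folder
    # rather than as a single object name.
    destination = f"{bucket.rstrip('/')}/{subdir.strip('/')}/"
    return ["gsutil", "cp", *files, destination]


def copy_outputs(files, bucket, subdir="notebook-outputs"):
    """Run the copy; raises CalledProcessError if gsutil reports an error."""
    subprocess.run(build_copy_command(files, bucket, subdir), check=True)


# Example call in a notebook cell, after step 1 has set `bucket`
# (the file names are placeholders):
# copy_outputs(["results.csv", "plot.png"], bucket)
```

Keeping command construction separate from execution means you can print the command and inspect it before anything is copied.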
