Analyzing data from a workspace bucket in a notebook

Allie Hajian

The virtual machine running a Jupyter notebook has its own storage, separate from the workspace bucket. To analyze data from your workspace bucket in a notebook, you first need to copy the data to the virtual machine's disk. Below are the two steps, with exact code for both Python and R notebook kernels, for copying from the workspace bucket to the notebook disk.

1. Set environment variables in a Jupyter notebook

Reading the environment variables lets the notebook pick up values such as the workspace name and Google bucket directly. This makes notebooks cleaner and more flexible, because you don't have to hardcode these values. Use the syntax below, exactly as it's written.

Python kernel

import os

BILLING_PROJECT_ID = os.environ['WORKSPACE_NAMESPACE']
WORKSPACE = os.environ['WORKSPACE_NAME']
bucket = os.environ['WORKSPACE_BUCKET']
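
If a notebook might also run somewhere these variables are not defined, a defensive lookup can give a clearer message than a bare KeyError. The following is a minimal sketch, not part of the original instructions; the helper name `require_env` is hypothetical.

```python
import os

def require_env(name):
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        # Missing or empty: fail early with a message that names the variable
        raise RuntimeError(
            f"Environment variable {name} is not set; "
            "is this notebook running in a workspace cloud environment?"
        )
    return value
```

For example, `bucket = require_env('WORKSPACE_BUCKET')` behaves like the line above when the variable is set, and stops with an explanatory error when it is not.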

R kernel

project <- Sys.getenv('WORKSPACE_NAMESPACE')
workspace <- Sys.getenv('WORKSPACE_NAME')
bucket <- Sys.getenv('WORKSPACE_BUCKET')

2. Copy files in a workspace bucket to a notebook with bash commands

Note: the workspace bucket is a Google Cloud Storage bucket, so commands that read from it must use gsutil rather than plain bash commands. In a Python notebook cell, prefix the command with ! (for example, !gsutil). Once you execute these cells, the data files should be visible on the notebook disk.

These commands will only work if you have run the commands above to set the environment variables. 

To copy all files from the workspace bucket to the notebook disk, use the commands below. To copy an individual file, replace `*` with the file name.

You can also add the -R flag to recursively copy everything in the file tree from the point you specify (example: gsutil cp -R gs://bucket-name/bucket_directory/ .).

Python kernel

# Copy all files from the workspace bucket to the notebook disk
!gsutil cp $bucket/* .

# Run list command to see if file is in the notebook disk
!ls
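
The single-file and recursive variants described above can also be run from plain Python instead of the `!` shortcut, which makes a failed copy raise an error rather than pass silently. This is a rough sketch under assumptions: the function names `build_copy_command` and `copy_from_bucket` are hypothetical, and gsutil is assumed to be on the PATH, as it is when the `!gsutil` cell above works.

```python
import subprocess

def build_copy_command(bucket, pattern="*", dest=".", recursive=False):
    """Build the gsutil argument list for copying from a bucket to local disk."""
    cmd = ["gsutil", "cp"]
    if recursive:
        cmd.append("-R")  # copy the whole tree under the source path
    # Join bucket URL and file pattern with exactly one slash
    cmd.append(f"{bucket.rstrip('/')}/{pattern}")
    cmd.append(dest)
    return cmd

def copy_from_bucket(bucket, pattern="*", dest=".", recursive=False):
    """Run the copy; raises CalledProcessError if gsutil exits non-zero."""
    subprocess.run(build_copy_command(bucket, pattern, dest, recursive), check=True)
```

With `bucket` set as in step 1, `copy_from_bucket(bucket)` mirrors the cell above, and `copy_from_bucket(bucket, "data.csv")` copies one file (`data.csv` is a placeholder name).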

R kernel

# Copy all files from the workspace bucket to the notebook disk
system(paste0("gsutil cp ", bucket, "/* ."), intern = TRUE)

# Run list command to see if the files are on the notebook disk
system("ls", intern = TRUE)
