Python version mismatch in notebook cluster

June 05, 2019 19:21
2 comments

Hi-

I have written a Jupyter notebook for a demo workspace in Terra that uses Hail. The notebook runs without error locally but fails in Terra. Below are the script and the error output:

from firecloud import fiss
import pandas as pd
import os
import io
import numpy as np
import hail as hl

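# Billing project, read from the notebook environment.
BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
# WORKSPACE (the workspace name) is not defined in this excerpt; it must be set before this line.
samples = pd.read_csv(io.StringIO(fiss.fapi.get_entities_tsv(BILLING_PROJECT_ID, WORKSPACE, "sample").text), sep='\t')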

samples = hl.Table.from_pandas(samples, key='sample')

Hail version: 0.2.11-daed180b84d8
Error summary: PythonException: Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/worker.py", line 124, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.

I don't know what to make of this error. Any input would be appreciated!

Also, this workspace is owned by dsp-comms-dev and has no sensitive data. Access can be granted if need be.

Thanks!

-Tim

Comments


Adelaide Rhodes
- June 06, 2019 13:11
Hi Tim -

This might be resolved by setting the environment variables named in the error (PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON) through a script when you start up the notebook runtime.

That script has to be in a public bucket location.

You could go to your FW bucket and make that one file public through the Google Cloud console.
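
For illustration only, here is a minimal sketch of the same idea done from inside the notebook instead of from a startup script. This is a swapped-in approach, not the startup-script mechanism described above. The helper name set_pyspark_python is hypothetical, and using sys.executable assumes the same interpreter path exists on every worker node, which may not hold on a Terra cluster. It has to run before Hail or Spark is initialized.

```python
import os
import sys


def set_pyspark_python(interpreter=sys.executable, environ=os.environ):
    """Point both PySpark settings at the same interpreter (hypothetical helper).

    PYSPARK_PYTHON is read when the Spark context is created, so this must
    run before Hail/Spark starts. PYSPARK_DRIVER_PYTHON is normally read when
    the driver is launched, so setting it here is mainly for consistency.
    """
    environ["PYSPARK_PYTHON"] = interpreter
    environ["PYSPARK_DRIVER_PYTHON"] = interpreter
    return interpreter


# Assumption: the driver's interpreter path is also valid on the workers.
set_pyspark_python()
```

If the workers use a different path for Python 3, pass that path explicitly, for example set_pyspark_python("/path/to/python3"), where the path is a placeholder.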

I hope this helps.

Adelaide

tmajarian
- June 06, 2019 13:29
Changing the kernel from Python3 to PySpark3 fixed this issue.
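
For anyone checking their own setup, a small diagnostic sketch like the one below can be run in a notebook cell to see which Python version the driver (the kernel) is using and what the two PySpark variables are set to. The function name describe_python_environment is hypothetical, and this only inspects the driver side; it does not query the workers.

```python
import os
import sys


def describe_python_environment(environ=os.environ):
    """Report the driver's Python and the PySpark interpreter settings."""
    return {
        # Same "major.minor" format that the PySpark error message uses.
        "driver_version": "%d.%d" % sys.version_info[:2],
        "driver_executable": sys.executable,
        # None means the variable is not set in this environment.
        "PYSPARK_PYTHON": environ.get("PYSPARK_PYTHON"),
        "PYSPARK_DRIVER_PYTHON": environ.get("PYSPARK_DRIVER_PYTHON"),
    }


for name, value in describe_python_environment().items():
    print(name, "=", value)
```

Running it under each kernel and comparing the output should show whether the two kernels differ in interpreter or in these settings.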
