Python version mismatch in notebook cluster
Hi-
I have written a Jupyter notebook for a demo workspace in Terra that uses Hail. The notebook runs without error locally but fails in Terra. The script and the error output are below:
from firecloud import fiss
import pandas as pd
import os
import io
import numpy as np
import hail as hl
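BILLING_PROJECT_ID = os.environ['GOOGLE_PROJECT']
# WORKSPACE was not defined in the posted snippet; reading it from the
# WORKSPACE_NAME environment variable is an assumption.
WORKSPACE = os.environ['WORKSPACE_NAME']
# Download the workspace's "sample" entity table as TSV and load it into pandas.
samples = pd.read_csv(io.StringIO(fiss.fapi.get_entities_tsv(BILLING_PROJECT_ID, WORKSPACE, "sample").text), sep='\t')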
samples = (
    hl
    .Table
    .from_pandas(
        samples,
        key='sample'
    )
)
Hail version: 0.2.11-daed180b84d8
Error summary: PythonException: Traceback (most recent call last):
File "/usr/lib/spark/python/pyspark/worker.py", line 124, in main
("%d.%d" % sys.version_info[:2], version))
Exception: Python in worker has different version 2.7 than that in driver 3.6, PySpark cannot run with different minor versions.Please check environment variables PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON are correctly set.
I don't know what to make of this error. Any input would be appreciated!
Also, this workspace is owned by dsp-comms-dev and has no sensitive data. Access can be granted if need be.
Thanks!
-Tim
Comments
Hi Tim -
This might be resolved by setting the environment variables named in the error (PYSPARK_PYTHON and PYSPARK_DRIVER_PYTHON) with a script when you start up the notebook runtime.
The script has to be in a public bucket location.
You could go to your FW bucket and make that one file public through the Google console.
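If a runtime script is not convenient, a minimal sketch of the same idea from inside the notebook is below. It assumes that the variables are set before Hail or Spark is initialized, and that an interpreter named `python3` exists on every worker; the helper name `pin_pyspark_python` is hypothetical and only for illustration.

```python
import os


def pin_pyspark_python(interpreter="python3", environ=os.environ):
    """Point the PySpark driver and workers at the same interpreter name.

    Assumption: this runs before the Spark context is created, and
    `interpreter` resolves to the same Python minor version on every node.
    """
    environ["PYSPARK_PYTHON"] = interpreter
    environ["PYSPARK_DRIVER_PYTHON"] = interpreter
    return {
        "PYSPARK_PYTHON": environ["PYSPARK_PYTHON"],
        "PYSPARK_DRIVER_PYTHON": environ["PYSPARK_DRIVER_PYTHON"],
    }


settings = pin_pyspark_python()
print(settings)
```

Import and initialize Hail only after this cell has run. Whether this is enough depends on how the cluster was provisioned, so treat it as something to try rather than a guaranteed fix.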
I hope this helps.
By the way, check out our contest at the link below our signature. I think your lab might be interested!
Adelaide
Do you have a treasured workflow or favorite notebook? Enter it in the Terra Open Science Contest for a chance to win a trip to BOSC 2019!
Changing the kernel from Python3 to PySpark3 fixed this issue.
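To check which interpreter a given kernel actually runs as the driver, a small sketch like the following can be run in a notebook cell. The helper name `describe_driver` is hypothetical; it only reads the standard `sys` and `os` modules and does not touch Spark.

```python
import os
import sys


def describe_driver(environ=os.environ):
    """Report the driver's Python and the PySpark interpreter variables."""
    return {
        "executable": sys.executable,
        # Same "major.minor" format that appears in the PySpark error message.
        "version": "%d.%d" % sys.version_info[:2],
        "PYSPARK_PYTHON": environ.get("PYSPARK_PYTHON", "<unset>"),
        "PYSPARK_DRIVER_PYTHON": environ.get("PYSPARK_DRIVER_PYTHON", "<unset>"),
    }


info = describe_driver()
for name, value in info.items():
    print(name, "=", value)
```

Running it under each kernel shows the driver version that PySpark compares against the workers' version.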