pyspark will pick one of the Python versions installed on the machine. In my case, I have Python 3, 2.7 and 2.6 installed on my machine, and pyspark was picking Python 3 by default. To change the Python version used by pyspark, set the following environment variable and then run pyspark.

export PYSPARK_PYTHON=python2.6
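
To confirm which interpreter pyspark actually picked up, a quick sanity check (assuming pyspark is on your PATH) is to print sys.version inside the pyspark shell:

pyspark
>>> import sys
>>> print(sys.version)   # shows the version of the Python interpreter pyspark is running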

Similarly, we can configure any version of Python with pyspark. Ensure that python2.6, or whichever interpreter you specify, is actually installed and available on the PATH.
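
If you want the setting to persist instead of exporting it in every session, one option (a sketch, assuming a standard Spark layout under $SPARK_HOME) is to verify the interpreter exists and add the same export to conf/spark-env.sh, which Spark sources on startup:

which python2.6                      # confirm the interpreter is on the PATH
echo 'export PYSPARK_PYTHON=python2.6' >> $SPARK_HOME/conf/spark-env.sh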
