You can install either OpenJDK 8 or the Oracle Java 8 JDK.
NOTE: Java must be installed on all nodes.
Download jdk-8u211-linux-x64.tar.gz
from this link: https://download.oracle.com/otn/java/jdk/8u211-b12/478a62b7d4e34b78b671c754eaaf38ab/jdk-8u211-linux-x64.tar.gz (an Oracle account login is required to download)
tar -xvf jdk-8u211-linux-x64.tar.gz
sudo mkdir -p /usr/lib/jvm
sudo mv ./jdk1.8.0_211 /usr/lib/jvm/
sudo update-alternatives --install "/usr/bin/java" "java" "/usr/lib/jvm/jdk1.8.0_211/bin/java" 1
sudo update-alternatives --install "/usr/bin/javac" "javac" "/usr/lib/jvm/jdk1.8.0_211/bin/javac" 1
sudo update-alternatives --install "/usr/bin/javaws" "javaws" "/usr/lib/jvm/jdk1.8.0_211/bin/javaws" 1
sudo chmod a+x /usr/bin/java
sudo chmod a+x /usr/bin/javac
sudo chmod a+x /usr/bin/javaws
sudo chown -R root:root /usr/lib/jvm/jdk1.8.0_211
# export PATH=$PATH:/usr/lib/jvm/jdk1.8.0_211/bin/
sudo update-alternatives --config java
java -version
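To confirm the alternatives switch worked, the `java -version` banner can be checked for a 1.8 release. A minimal sketch; the helper name `check_java8` is made up for illustration:

```shell
# check_java8 BANNER: succeed if a "java -version" banner reports a 1.8 JVM
check_java8() {
  printf '%s\n' "$1" | grep -q '"1\.8\.'
}

if check_java8 "$(java -version 2>&1)"; then
  echo "Java 8 is the active java"
else
  echo "active java is not 1.8 - re-run: sudo update-alternatives --config java"
fi
```

If another JDK is active, re-run the `update-alternatives --config java` step above and select the 1.8.0_211 entry.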
# sudo apt install openjdk-8-jre-headless
# https://stackoverflow.com/questions/50064646/py4j-protocol-py4jjavaerror-occurred-while-calling-zorg-apache-spark-api-python/50098044
sudo apt-get install software-properties-common
sudo apt-get update
sudo apt-get install openjdk-11-jdk
# NOTE: the webupd8team/java PPA has been discontinued; openjdk-11-jdk is
# available from the default Ubuntu repositories. Also note that Spark 2.2.x
# only supports Java 8, so use Java 11 only with a newer Spark release.
java -version
NOTE: this step must be done on all nodes
sudo rm -rf /opt/spark
wget https://archive.apache.org/dist/spark/spark-2.2.0/spark-2.2.0-bin-hadoop2.7.tgz
tar xvf spark-2.2.0-bin-hadoop2.7.tgz
sudo mv spark-2.2.0-bin-hadoop2.7 /opt/spark
vim ~/.bashrc
# add below lines
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin
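After saving ~/.bashrc, the new environment can be applied to the current shell and sanity-checked; a minimal sketch, assuming Spark was unpacked to /opt/spark as above:

```shell
# apply the new environment in the current shell and verify it
export SPARK_HOME=/opt/spark
export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

echo "$SPARK_HOME"                         # prints /opt/spark
echo "$PATH" | grep -o '/opt/spark/bin'    # confirms bin is on PATH
```

In a new login shell, `source ~/.bashrc` (or simply re-login) has the same effect.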
YOUR_IP=167.71.204.181 # replace with your node's external IP
SPARK_LOCAL_IP=${YOUR_IP} SPARK_MASTER_HOST=${YOUR_IP} start-master.sh
SPARK_LOCAL_IP=${YOUR_IP} start-slave.sh spark://${YOUR_IP}:7077
# check master is on
ss -tunelp | grep 8080
# check slave is on
ss -tunelp | grep 8081
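The two `ss` checks above can be wrapped in a small helper so both web UI ports are checked at once; the function name `port_up` is illustrative:

```shell
# port_up PORT: succeed if something is listening on the given TCP port
port_up() {
  ss -tln 2>/dev/null | grep -q ":$1 "
}

port_up 8080 && echo "master web UI (8080) is up" || echo "master web UI (8080) is down"
port_up 8081 && echo "worker web UI (8081) is up" || echo "worker web UI (8081) is down"
```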
To stop the master and the slave, run:
stop-slave.sh
stop-master.sh
For example, assume a three-node cluster:
master: 192.168.205.10
slave1: 192.168.205.11
slave2: 192.168.205.12
On master, edit hosts file:
sudo vim /etc/hosts
# add below lines
192.168.205.10 master
192.168.205.11 slave1
192.168.205.12 slave2
# reboot if needed for the changes to take effect
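Once the entries are in /etc/hosts, each alias can be checked with getent; a quick sketch, with an illustrative helper name:

```shell
# check_hosts NAME...: report whether each hostname resolves
check_hosts() {
  for h in "$@"; do
    if getent hosts "$h" >/dev/null; then
      echo "$h resolves"
    else
      echo "$h does NOT resolve - check /etc/hosts"
    fi
  done
}

check_hosts master slave1 slave2
```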
We need to add the SSH public key of the master
node to all nodes (including the master itself).
On all nodes
sudo apt-get install openssh-server openssh-client
On master
ssh-keygen -t rsa -P ""
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Copy the master's public key to all nodes (using the hostnames declared in /etc/hosts):
ssh-copy-id user@master
ssh-copy-id user@slave1
ssh-copy-id user@slave2
Check
ssh master
ssh slave1
ssh slave2
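Passwordless login to every node can also be verified non-interactively: with BatchMode, ssh fails instead of prompting for a password. A sketch with an illustrative helper name:

```shell
# ssh_check HOST...: report whether passwordless SSH works for each host
ssh_check() {
  for h in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$h" true 2>/dev/null; then
      echo "$h: passwordless SSH ok"
    else
      echo "$h: FAILED (password still required or host unreachable)"
    fi
  done
}

ssh_check master slave1 slave2
```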
On master, config spark env
cd /opt/spark/conf
cp spark-env.sh.template spark-env.sh
sudo vim spark-env.sh
# add below lines
export SPARK_MASTER_HOST='<MASTER-IP>'
export JAVA_HOME=<Path_of_JAVA_installation>
# example in this instruction
export SPARK_MASTER_HOST=192.168.205.10
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_211
On master, add worker
cd /opt/spark/conf
sudo vim slaves
# add the lines below (the same hostnames you declared in /etc/hosts)
master
slave1
slave2
On master
Start spark:
cd /opt/spark
./sbin/start-all.sh
Stop spark:
./sbin/stop-all.sh
On master, verify the daemons are running:
jps
Access the web UI:
http://<MASTER-IP>:8080/ (in this example, http://192.168.205.10:8080/)
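With the cluster up, the bundled SparkPi example makes a good end-to-end smoke test. A sketch, assuming the spark-2.2.0-bin-hadoop2.7 build (Scala 2.11 jar name) and the example master IP used in this guide:

```shell
# sparkpi_smoke_test: submit the SparkPi example to the standalone master
sparkpi_smoke_test() {
  # jar name assumes the spark-2.2.0-bin-hadoop2.7 build; adjust for your version
  spark-submit \
    --class org.apache.spark.examples.SparkPi \
    --master spark://192.168.205.10:7077 \
    "$SPARK_HOME/examples/jars/spark-examples_2.11-2.2.0.jar" 10
}

# run it once the master and workers are up:
# sparkpi_smoke_test
```

On success, the driver output includes a "Pi is roughly ..." line, and the job appears as finished in the master web UI.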