hui61/hadoop-spark-docker


Run Hadoop, Spark, and Hive within Docker Containers

Environment: macOS Ventura 13.5

Machine: MacBook Pro (M1, 2021)

1. Download resource files

Move hadoop-3.3.1-aarch64.tar.gz, jdk-8u301-linux-aarch64.tar.gz, scala-2.12.14.tgz, spark-3.2.1-bin-hadoop3.2.tgz, and pyspark-3.4.1.tar.gz into the resources folder.
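
Before building, it can help to confirm that all five archives are actually in place. A minimal sketch (archive names are taken from the list above; the check_resources helper itself is hypothetical):

```shell
# check_resources DIR — report any of the expected archives missing from DIR.
check_resources() {
  dir=$1
  missing=0
  for f in hadoop-3.3.1-aarch64.tar.gz \
           jdk-8u301-linux-aarch64.tar.gz \
           scala-2.12.14.tgz \
           spark-3.2.1-bin-hadoop3.2.tgz \
           pyspark-3.4.1.tar.gz; do
    if [ ! -f "$dir/$f" ]; then
      echo "missing: $dir/$f"
      missing=1
    fi
  done
  if [ "$missing" -eq 0 ]; then
    echo "all resource files present"
  fi
}
# usage: check_resources resources
```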

2. Build the Docker image
docker build -f Dockerfile -t puppets/hadoop:1.1 .
3. Create the hadoop bridge network
sudo docker network create --driver=bridge hadoop
4. Start the containers
sudo ./start-container.sh

Output:

start hadoop-master container...
start hadoop-slave1 container...
start hadoop-slave2 container...
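
If the script output differs from the above, the running containers can be checked against docker ps. A small sketch (container names come from the start-container.sh output above; the check_containers helper is hypothetical):

```shell
# check_containers RUNNING — RUNNING is the output of:
#   docker ps --format '{{.Names}}'
# Reports the status of each of the three expected containers.
check_containers() {
  running=$1
  for name in hadoop-master hadoop-slave1 hadoop-slave2; do
    if printf '%s\n' "$running" | grep -qx "$name"; then
      echo "$name: running"
    else
      echo "$name: NOT running"
    fi
  done
}
# usage: check_containers "$(docker ps --format '{{.Names}}')"
```
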
5. Start Hadoop
docker exec -it hadoop-master bash
./start-hadoop.sh

Because YARN is configured on the hadoop-slave2 node, Hadoop must also be started there:

docker exec -it hadoop-slave2 bash
./start-hadoop.sh
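
After both start-hadoop.sh runs, jps inside each container shows which daemons came up. A sketch for parsing that listing (the has_daemon helper is hypothetical; apart from YARN living on hadoop-slave2 as noted above, the daemon layout is an assumption):

```shell
# has_daemon NAME JPS_OUTPUT — succeed if NAME appears in a jps listing.
# jps prints one "<pid> <ClassName>" pair per line.
has_daemon() {
  printf '%s\n' "$2" | awk '{print $2}' | grep -qx "$1"
}
# usage, assuming the NameNode runs on hadoop-master and, per the note
# above, the ResourceManager on hadoop-slave2:
#   has_daemon NameNode "$(docker exec hadoop-master jps)" && echo "NameNode up"
#   has_daemon ResourceManager "$(docker exec hadoop-slave2 jps)" && echo "RM up"
```
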
6. Update the MySQL password
./update-mysql-password.sh
7. Run WordCount

Run the job on the master node:

./run-wordcount.sh 3.3.1

Output:

input file1.txt:
Hello Hadoop

input file2.txt:
Hello Docker

wordcount output:
Docker    1
Hadoop    1
Hello    2
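
The expected counts can be sanity-checked without the cluster by running the same word count over the two input lines with plain shell:

```shell
# Reproduce the WordCount result locally: split on spaces, sort, count
# duplicates. Input lines are the file1.txt/file2.txt contents shown above.
counts=$(printf 'Hello Hadoop\nHello Docker\n' \
  | tr ' ' '\n' | sort | uniq -c \
  | awk '{print $2 "\t" $1}')
echo "$counts"
# Prints:
# Docker  1
# Hadoop  1
# Hello   2
```
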
8. Start Hive

Initialize the Hive metastore schema in MySQL first:
schematool -initSchema -dbType mysql
9. WebUI
