Big data playground: Hadoop + Hive + HBase + Spark

Base Docker image with just the essentials: Hadoop, Hive, HBase, and Spark.

Software

Usage

Take a look at this repo to see how I use it as a part of a Docker Compose cluster.
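
For reference, bringing such a cluster up typically looks like this (a sketch; the service name below is an assumption, not taken from that repo's compose file):

docker compose up -d             # start the whole cluster in the background
docker compose ps                # check that every service is up
docker compose logs -f namenode  # follow one service's logs (service name assumed)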

User and password in the Alpine containers: 123456

SSH is auto-configured across the Hadoop cluster

HBase starts automatically once ZooKeeper and the HDFS NameNode are up
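
A rough illustration of that ordering (not the image's actual entrypoint; it assumes a zoo1 host and that ZooKeeper's four-letter-word commands are enabled):

until echo ruok | nc -w 2 zoo1 2181 | grep -q imok; do sleep 2; done  # wait for ZooKeeper
until hdfs dfsadmin -report >/dev/null 2>&1; do sleep 2; done         # wait for HDFS
start-hbase.sh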

Hive JDBC port is exposed to the host:

  • URI: jdbc:hive2://localhost:10000
  • Driver: org.apache.hive.jdbc.HiveDriver (org.apache.hive:hive-jdbc:3.1.2)
  • User and password: unused.
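
For a quick smoke test from the host (assuming Beeline is installed locally):

beeline -u jdbc:hive2://localhost:10000 -e "SHOW DATABASES;"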

Scripts

build_env_base_image.sh builds the env image, which contains the environment variables and component version information.

build_app_base_image.sh builds the app image, which is the one used in the Docker Compose cluster.

Use both build scripts like this:

sh build_env_base_image.sh your_version
sh build_app_base_image.sh your_version

rm_none_images.sh removes dangling <none>-tagged images produced during development. Use docker image ls to see which images have been created.
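
The cleanup it performs is roughly equivalent to the usual dangling-image one-liner (a sketch, not the script's exact contents):

docker rmi $(docker images -f "dangling=true" -q)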

tar-source-files/file_list.txt lists the local packages that may be used; refer to it when building your image.

get_hadoop_container_id.sh helps when you want to find the IDs of the running Hadoop containers.
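
It is roughly equivalent to filtering docker ps by name (the name filter here is an assumption):

docker ps --filter "name=hadoop" --format "{{.ID}}  {{.Names}}"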

The /scripts/ssh_auto_configer directory contains the SSH configuration scripts:

auto_ssh.sh           # single script that auto-configures SSH from several parameters
eliminate_pub_key.sh
id_rsa_gen.sh
ping_test.sh
sshd_restart.start    # auto-starts sshd on Alpine; may not run
ssh_service.sh        # aggregated script that drives SSH auto-configuration
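
Underneath, these scripts implement the standard passwordless-SSH setup between nodes; a minimal sketch, assuming ssh-copy-id is available and with hypothetical host names:

[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa  # generate a key pair once
for host in hadoop-master hadoop-worker1 hadoop-worker2; do       # assumed host names
  ssh-copy-id -o StrictHostKeyChecking=no "$host"                 # distribute the public key
done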

./scripts/format_hdfs.sh resolves the "inconsistent clusterID" error that is caused by formatting HDFS incorrectly.
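
The usual fix behind such a script looks roughly like this (a sketch; the data directory is an assumption, check hdfs-site.xml for the real path):

stop-dfs.sh                   # stop all HDFS daemons
rm -rf /opt/hadoop/data/*     # clear stale DataNode/NameNode data (assumed path)
hdfs namenode -format -force  # reformat, generating a fresh clusterID
start-dfs.sh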

Configuration files are stored in the .conf directory; make sure to check this directory before building the image.

Special Instructions:

HBase startup requires a ZooKeeper cluster, which is configured in hbase-site.xml. The default is as follows:

  <property>
    <!-- Specify the ZooKeeper addresses, separated by commas. Note that in
         cluster mode, HBase must not include the ZooKeeper port here. -->
    <name>hbase.zookeeper.quorum</name>
    <value>zoo1,zoo2,zoo3</value>
  </property>
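
To verify that every quorum member is reachable before HBase starts, something like the following works (assuming ZooKeeper's four-letter-word commands are enabled):

for zk in zoo1 zoo2 zoo3; do
  echo stat | nc -w 2 "$zk" 2181 | head -n 1  # prints the ZooKeeper version line
done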

Version compatibility notes

Maintaining

TODO

  • Upgrade Spark to 3.0
  • Once upgraded, enable Spark-Hive integration.
