Mini-submarine

This is a Docker image built for Submarine development and quick-start testing.

Please note: do not use this image in a production environment. It is for testing purposes only.

Start mini-submarine

Use the image we provide

docker pull apache/submarine:mini-0.3.0

Create image by yourself

You may need a VPN if your network access is restricted.

1. Clone the source code of Submarine

git clone https://github.com/apache/submarine.git

2. Build Submarine

cd ./submarine
mvn clean install package -DskipTests

3. Build the mini-submarine image

You can download the following three archives in advance and put them into "submarine/dev-support/mini-submarine/" before building: zookeeper-3.4.14.tar.gz, hadoop-2.9.2.tar.gz, and spark-2.4.4-bin-hadoop2.7.tgz.

cd submarine/dev-support/mini-submarine/
./build_mini-submarine.sh
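
Before running the build script, it can be handy to confirm that the pre-downloaded archives are actually in place. The following is a minimal sketch: `missing_archives` is a hypothetical helper (it is not part of the Submarine repository) that prints the name of each of the three archives listed above that is absent from a given directory.

```shell
# missing_archives DIR: print each expected archive that is not present in DIR.
# Hypothetical helper for illustration; the file names come from the step above.
missing_archives() {
  for f in zookeeper-3.4.14.tar.gz hadoop-2.9.2.tar.gz spark-2.4.4-bin-hadoop2.7.tgz; do
    # Report the archive only when it is not a regular file in DIR.
    [ -f "$1/$f" ] || printf '%s\n' "$f"
  done
}
```

For example, `missing_archives submarine/dev-support/mini-submarine` prints nothing when all three archives have been downloaded.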

Package an Existing Release Candidate

When doing a release, the release manager might need to package the candidate artifacts in this Docker image and publish the image candidate for a vote. In this scenario, we can do the following:

Put the Submarine candidate artifacts into a folder such as "~/releases/submarine-release".

$ ls $release_candidates_path
submarine-dist-0.3.0-hadoop-2.9.tar.gz        submarine-dist-0.3.0-src.tar.gz.asc
submarine-dist-0.3.0-hadoop-2.9.tar.gz.asc    submarine-dist-0.3.0-src.tar.gz.sha512
submarine-dist-0.3.0-hadoop-2.9.tar.gz.sha512 submarine-dist-0.3.0-src.tar.gz
export submarine_version=0.3.0
export release_candidates_path=~/releases/submarine-release 
./build_mini-submarine.sh
#docker run -it -h submarine-dev --net=bridge --privileged -P local/mini-submarine:0.3.0 /bin/bash
docker tag local/mini-submarine:0.3.0 apache/mini-submarine:0.3.0-RC0
docker push apache/mini-submarine:0.3.0-RC0

In the container, we can verify that the Submarine jar version is the expected 0.3.0. Then we can upload this image with an "RC" tag for a vote.

Run mini-submarine image

docker run -it -h submarine-dev --name mini-submarine --net=bridge --privileged -P local/mini-submarine:0.4.0-SNAPSHOT /bin/bash

# In the container, bootstrap HDFS and YARN as the root user
/tmp/hadoop-config/bootstrap.sh

# Two commands to check whether YARN and HDFS are running as expected
yarn node -list -showDetails

If you pull the image directly, please replace "local/mini-submarine:0.4.0-SNAPSHOT" with "apache/submarine:mini-0.3.0".

You should see info like this:

Total Nodes:1
         Node-Id      Node-State	Node-Http-Address	Number-of-Running-Containers
submarine-dev:35949         RUNNING	submarine-dev:8042                            0
Detailed Node Information :
  Configured Resources : <memory:8192, vCores:16, nvidia.com/gpu: 1>
  Allocated Resources : <memory:0, vCores:0>
  Resource Utilization by Node : PMem:4144 MB, VMem:4189 MB, VCores:0.25308025
  Resource Utilization by Containers : PMem:0 MB, VMem:0 MB, VCores:0.0
  Node-Labels :

hdfs dfs -ls /user

drwxr-xr-x - yarn supergroup 0 2019-07-22 07:59 /user/yarn
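
The node listing above can also be checked from a script rather than by eye. The sketch below assumes the column layout shown in the sample output, where the second whitespace-separated field of a node row is its state; `count_running_nodes` is a hypothetical helper name.

```shell
# count_running_nodes: read `yarn node -list -showDetails` output on stdin
# and print how many node rows report the RUNNING state.
# Assumes the node state is the second whitespace-separated field of a row.
count_running_nodes() {
  awk '$2 == "RUNNING" { n++ } END { print n + 0 }'
}

# Intended usage inside the container:
#   yarn node -list -showDetails | count_running_nodes
```

In this single-node image, any count other than 1 suggests that the bootstrap script did not bring YARN up correctly.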

Run workbench server

  1. Set up the MySQL/MariaDB server

Because MySQL and MariaDB are licensed under the GPL, the image does not include their binaries. You need to run the following script manually to install the database.

/tmp/hadoop-config/setup-mysql.sh

You can then log in to the database by running mysql -uroot.

  2. Start the Submarine server

su yarn
/opt/submarine-current/bin/submarine-daemon.sh start getMysqlJar

  3. Log in to the Submarine workbench

On your host machine, run the following command to get the access URL of the Submarine workbench running in Docker.

echo "http://localhost:$(docker inspect --format='{{(index (index .NetworkSettings.Ports "8080/tcp") 0).HostPort}}' mini-submarine)"

Open the URL returned by the command (for example, http://localhost:32819) in a browser. The username and initial password of the workbench are both admin.

Run a submarine job

Switch to user yarn

su yarn

Navigate to submarine example directory

cd /home/yarn/submarine/

Run a mnist TF job with submarine + TonY runtime

# run TF 1 distributed training job 
./run_submarine_mnist_tony.sh

# run TF 2 distributed training job
./run_submarine_mnist_tf2_tony.sh

When run_submarine_mnist_tony.sh is executed, the MNIST data is downloaded from the default URL (google mnist). If that URL is inaccessible, you can use the "-d" parameter to specify a custom URL. For example, if you are in mainland China, you can use the following command:

./run_submarine_mnist_tony.sh -d http://yann.lecun.com/exdb/mnist/

Run a mnist TF job via submarine server

The Submarine server is designed to manage the job lifecycle. Clients can submit job parameters or a YAML file to the Submarine server instead of submitting jobs directly themselves, and the server handles the rest of the work.

Set submarine.server.rpc.enabled to true in the file /opt/submarine-current/conf/submarine-site

  <property>
    <name>submarine.server.rpc.enabled</name>
    <value>true</value>
    <description>Run jobs using rpc server.</description>
  </property>
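
To confirm that the setting took effect before submitting a job, the value can be read back from the configuration file. This is a line-oriented sketch and not a real XML parser: it assumes that `<name>` and `<value>` each sit on their own line, as in the snippet above, and `property_value` is a hypothetical helper name.

```shell
# property_value NAME FILE: print the <value> that follows <name>NAME</name>.
# Assumes one element per line; use a proper XML tool for anything more complex.
property_value() {
  awk -v name="$1" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && match($0, /<value>.*<\/value>/) {
      # Drop the 7-character "<value>" prefix and the 8-character "</value>" suffix.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$2"
}
```

Calling `property_value submarine.server.rpc.enabled` with the path of the edited configuration file should print `true` once the property has been added.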

Run the following command to submit a job via submarine server

./run_submarine_mnist_tony_rpc.sh

Try your own submarine program

Run a container with your source code mounted. You can also use "docker cp" to copy it into an existing running container.

  1. docker run -it -h submarine-dev --net=bridge --privileged -v pathToMyScript.py:/home/yarn/submarine/myScript.py local/hadoop-docker:submarine /bin/bash

  2. Refer to run_submarine_mnist_tony.sh and adapt it to launch your script.

  3. Try to run it. Since this is a single-node environment, keep in mind that the workers could conflict with each other. For instance, the mnist_distributed.py example has a workaround for the conflict that occurs when two workers use the same "data_dir" to download the data set.

Update Submarine Version

You can follow the steps below to deploy your own modified and compiled Submarine package to the mini-submarine container.

Build Submarine

cd submarine-project-dir/
mvn clean install package -DskipTests

Copy submarine jar to mini-submarine container

docker cp submarine-all/target/submarine-all-<SUBMARINE_VERSION>-hadoop-<HADOOP_VERSION>.jar <container-id>:/tmp/

Modify environment variables

cd /home/yarn/submarine
vi run_customized_submarine-all_mnist.sh

# Need to modify environment variables based on hadoop and submarine version numbers
SUBMARINE_VERSION=<submarine-version-number>
HADOOP_VERSION=<hadoop-version-number> # default 2.9
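
The two variables have to match the jar that was copied into the container, since the file name embeds both version numbers. As a sketch of that relationship, the hypothetical helper `jar_path` below rebuilds the expected file name from the naming pattern used in the "docker cp" step. That the script looks for the jar under /tmp is an assumption based on the copy destination above.

```shell
# jar_path DIR SUBMARINE_VERSION HADOOP_VERSION: print the expected jar location.
# The naming pattern mirrors the file copied with "docker cp" above.
jar_path() {
  printf '%s/submarine-all-%s-hadoop-%s.jar\n' "$1" "$2" "$3"
}

# Example check inside the container (the version values are placeholders):
#   [ -f "$(jar_path /tmp 0.3.0 2.9)" ] || echo "jar not found, check the versions"
```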

Test submarine jar package in container

cd /home/yarn/submarine
./run_customized_submarine-all_mnist.sh

Debug Submarine

When using mini-submarine, you can debug the Submarine client, applicationMaster, and executor for troubleshooting.

Debug submarine client

Run the following command to start mini-submarine.

docker run -it -P -h submarine-dev --net=bridge --expose=8000 --privileged local/mini-submarine:0.4.0-SNAPSHOT /bin/bash

Debug submarine client with the parameter "--debug"

./run_submarine_mnist_tony.sh --debug

Port 8000 is used inside mini-submarine. You need to find the debug port mapping between mini-submarine and the host on which mini-submarine runs.

docker port <SUBMARINE_CONTAINER_ID>

For example, we can get some info like this

8000/tcp -> 0.0.0.0:32804

Then port 32804 can be used for remote debugging.
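
When the lookup has to be repeated for several containers, the host port can be extracted from the `docker port` output automatically. The sketch below assumes the `PORT/tcp -> ADDRESS:HOSTPORT` line format shown in the example above; `host_port_for` is a hypothetical helper name.

```shell
# host_port_for PORT: read `docker port` output on stdin and print the host
# port mapped to container port PORT/tcp (first match only).
host_port_for() {
  awk -v want="$1/tcp" '
    $1 == want {
      # The third field looks like 0.0.0.0:32804; keep the part after the last colon.
      n = split($3, parts, ":")
      print parts[n]
      exit
    }
  '
}

# Intended usage on the host:
#   docker port <SUBMARINE_CONTAINER_ID> | host_port_for 8000
```

The printed port is the one to enter in the remote debugging configuration of your IDE, together with the host name of the machine running Docker.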

Debug submarine job applicationMaster

Run the following command to start mini-submarine.

docker run -it -P -h submarine-dev --net=bridge --expose=8001 --privileged local/mini-submarine:0.4.0-SNAPSHOT /bin/bash

Add the following configuration in the file /usr/local/hadoop/etc/hadoop/tony.xml.

<property>
  <name>tony.task.am.jvm.opts</name>
  <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8001</value>
</property>

You can use run_submarine_mnist_tony.sh to submit a job. Port 8001 is used for AM debugging in mini-submarine. The debug port mapping can be obtained in the same way as shown in "Debug submarine client".

Debug submarine job executor

Run the following command to start mini-submarine.

docker run -it -P -h submarine-dev --net=bridge --expose=8002 --privileged local/mini-submarine:0.4.0-SNAPSHOT /bin/bash

Add the following configuration in the file /usr/local/hadoop/etc/hadoop/tony.xml.

<property>
  <name>tony.task.executor.jvm.opts</name>
  <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=8002</value>
</property>

Port 8002 is used for executor debugging in mini-submarine. To avoid port conflicts, you need to use only one executor, which means the submarine job parameters should look like this:

--num_workers 1 \
--num_ps 0 \

You can get the debug port mapping in the same way as shown in "Debug submarine client".
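
The applicationMaster and executor settings differ only in the role segment of the property name and in the port number. As an illustration of that pattern, the hypothetical helper `jdwp_property` below prints the tony.xml fragment for a given role ("am" or "executor") and debug port. It is a convenience sketch, not a tool shipped with Submarine or TonY.

```shell
# jdwp_property ROLE PORT: print a tony.xml property that makes the JVM of the
# given role (am or executor) wait for a debugger on PORT.
jdwp_property() {
  printf '<property>\n  <name>tony.task.%s.jvm.opts</name>\n' "$1"
  printf '  <value>-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=%s</value>\n</property>\n' "$2"
}

# Example: print the executor fragment used in this section.
jdwp_property executor 8002
```

The port passed here should match the one exposed with "--expose" when the container was started.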

Run a distributedShell job with docker container

You can also run a distributedShell job in mini-submarine.

cd && ./yarn-ds-docker.sh

Run a spark job

Spark jobs are supported as well.

cd && cd spark-script && ./run_spark.sh

Question and answer

  1. Submarine package name error

    Because the package name in Submarine 0.3.0 and higher changed from apache.hadoop.yarn.submarine to apache.submarine, you need to set the runtime class in the /usr/local/hadoop/etc/hadoop/submarine-site.xml file.

    <configuration>
       <property>
         <name>submarine.runtime.class</name>
         <value>org.apache.submarine.server.submitter.yarn.YarnRuntimeFactory</value>
       </property>
    </configuration>