Add instructions to build a docker for GraphStorm-wholegraph on AWS #475

Closed
wants to merge 1 commit into from

classicsong
Contributor

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

RUN mkdir -p ${SSHDIR}
RUN ssh-keygen -t rsa -f ${SSHDIR}/id_rsa -N ''
RUN cp ${SSHDIR}/id_rsa.pub ${SSHDIR}/authorized_keys

Contributor

We need to modify /root/.ssh/config too.
RUN touch /root/.ssh/config;echo -e "Host *\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n Port ${SSH_PORT}" > /root/.ssh/config
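One caveat with the suggestion above: Docker runs `RUN` steps with `/bin/sh -c` by default, and on some base images that shell's built-in `echo` does not understand `-e`, so the literal text `-e` can end up in the file. A minimal sketch of a `printf`-based alternative follows; the helper name `write_ssh_config` and its arguments are hypothetical, chosen only for illustration.

```shell
# Sketch: write an SSH client config using printf instead of `echo -e`.
# write_ssh_config is a hypothetical helper, not part of this PR.
#   $1 = directory that holds the config (e.g. the image's SSH directory)
#   $2 = SSH port to use for all hosts
write_ssh_config() {
    mkdir -p "$1"
    # printf interprets \n in its format string in every POSIX shell.
    printf 'Host *\n    StrictHostKeyChecking no\n    UserKnownHostsFile=/dev/null\n    Port %s\n' "$2" > "$1/config"
    # Keep the config private to its owner.
    chmod 600 "$1/config"
}
```

In a Dockerfile, the same `printf ... > /root/.ssh/config` command could be placed directly in a single `RUN` step, with `${SSH_PORT}` as the port argument.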

&& make && make install

ENV PATH "/opt/amazon/efa/bin:$PATH"

Contributor

@isratnisa isratnisa Sep 26, 2023

We need to install the NCCL tests (Step 8) to verify the EFA + NCCL setup.
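
For reference, building the NCCL tests typically looks like the sketch below. It assumes the upstream NVIDIA/nccl-tests repository and the `make` variables documented in its README. The helper name `build_nccl_tests` is hypothetical, and the three install prefixes are placeholders the caller must supply, not values taken from this PR.

```shell
# Sketch only: build NVIDIA's nccl-tests with MPI support.
# build_nccl_tests is a hypothetical helper, not part of this PR.
# Arguments (all supplied by the caller):
#   $1 = MPI install prefix, $2 = CUDA install prefix, $3 = NCCL install prefix
build_nccl_tests() {
    if [ "$#" -ne 3 ]; then
        echo "usage: build_nccl_tests MPI_HOME CUDA_HOME NCCL_HOME" >&2
        return 2
    fi
    # Clone the test suite, then build it against the given installations.
    git clone https://github.com/NVIDIA/nccl-tests.git &&
    make -C nccl-tests MPI=1 MPI_HOME="$1" CUDA_HOME="$2" NCCL_HOME="$3"
}
```

Once built, a binary such as `nccl-tests/build/all_reduce_perf -b 8 -e 1G -f 2 -g 1` can be launched (under `mpirun` for multi-node runs) to exercise the EFA + NCCL path.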

RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libnccl2_2.15.1-1+cuda11.8_amd64.deb
RUN dpkg -i libnccl2_2.15.1-1+cuda11.8_amd64.deb
RUN wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/libnccl-dev_2.15.1-1+cuda11.8_amd64.deb
RUN dpkg -i libnccl-dev_2.15.1-1+cuda11.8_amd64.deb
Contributor

@isratnisa isratnisa Sep 26, 2023

We can also follow step 6 from the EC2 doc to install NCCL.

ENV dev_type=GPU
# Install DGL GPU version
RUN pip3 install dgl==1.0.4+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html && rm -rf /root/.cache

Contributor

We need to fix the Python installation. By default, the Docker image uses conda.
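
One way to confirm which interpreter a `RUN pip3 install ...` step would pick up is to inspect the paths on `PATH` inside the image. The sketch below is a heuristic only: both helper names are hypothetical, and it simply assumes that a conda-managed interpreter has `conda` somewhere in its path.

```shell
# Heuristic sketch; is_conda_python and report_python are hypothetical helpers.
# Returns 0 if the given interpreter path looks like it lives under a conda prefix.
is_conda_python() {
    case "$1" in
        *conda*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print which python3 and pip3 the current PATH resolves to.
report_python() {
    py="$(command -v python3 || true)"
    pip="$(command -v pip3 || true)"
    echo "python3: ${py:-not found}"
    echo "pip3: ${pip:-not found}"
    if [ -n "$py" ] && is_conda_python "$py"; then
        echo "python3 resolves to a conda environment"
    fi
}
```

Calling `report_python` in a throwaway `RUN` step (or in an interactive container) shows whether packages such as DGL would be installed into conda or into the system Python.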

Contributor

Not sure if this is related to the Python installation, but I had to add:

pip install --no-cache-dir boto3 'h5py>=2.10.0' scipy tqdm 'pyarrow>=3' 'transformers==4.28.1' pandas pylint scikit-learn ogb psutil

@classicsong classicsong added the draft label (only to be used by dev team - skips CI for small changes) Oct 10, 2023
@classicsong classicsong removed the 0.2.1 label Nov 14, 2023
@classicsong classicsong deleted the wholegraph-docker branch February 16, 2024 03:53
2 participants