[Doc] Add Doc String for graphstorm.dataloading.dataloading #471

Closed
wants to merge 26 commits into from
41b36cf
add doc string for dataloaders
GentleZhu Sep 24, 2023
44aced7
Fix the logging (#472)
zheng-da Sep 24, 2023
4405ec4
Enable hdf5 embed writing (#463)
HouyuZhang1007 Sep 25, 2023
732194d
precision_recall need set return_proba=True (#451)
CongWeilin Sep 25, 2023
3f7b81a
Support load conf for min/max transformation. (#473)
classicsong Sep 25, 2023
72bbe9a
[Bug Fix] Disable storing embeddings when inferring on ec/er task (#477)
jalencato Sep 25, 2023
603d481
[GSProcessing] Improve handling of configs with unknown version. (#480)
thvasilo Sep 25, 2023
776ad95
Updated MAG example with GLEM instructions (#474)
wangz10 Sep 26, 2023
c7606a6
Fix a bug in restoring models in the inference script (#482)
zheng-da Sep 26, 2023
a0d7e8b
[Doc] Reorg API Documentation part-1 (#460)
zhjwy9343 Sep 26, 2023
1905464
Enable nccl for embedding save (#484)
classicsong Sep 27, 2023
3228b97
Use zfill to make sure the saved embeddings or prediction results are…
classicsong Sep 27, 2023
1a26d12
[Bug Fix] Change on EC test (#491)
jalencato Sep 27, 2023
dbf92e7
fix comments
GentleZhu Sep 27, 2023
7bce306
[BugFix] Handle the case when user provides test_mask but do not want…
classicsong Sep 27, 2023
5a4cb98
Fix the problem of caching twice (#493)
zheng-da Sep 27, 2023
b19bdcf
[GSProcessing] Name reverse edges as `dst:relation-rev:src` (#490)
thvasilo Sep 27, 2023
58262c6
Fix lint does not work with astroid==3.0 error. (#496)
classicsong Sep 27, 2023
74c99ec
[Wholegraph] Build docker image for WholeGraph-GraphStorm (#485)
isratnisa Sep 27, 2023
4c825d0
[Bug Fix] Add test to EC (#497)
jalencato Sep 27, 2023
87aa152
[GSProcessing] Add GSProcessing documentation (#467)
thvasilo Sep 27, 2023
06f5c96
[Doc] Add Doc String for graphstorm.dataloading.dataset (#470)
GentleZhu Sep 27, 2023
e66cfcd
add doc string for dataloaders
GentleZhu Sep 24, 2023
09b7b97
fix comments
GentleZhu Sep 27, 2023
0e95964
Merge branch 'dataloader-doc' of https://github.com/awslabs/graphstor…
GentleZhu Sep 28, 2023
a31f359
add line before and after code::
GentleZhu Sep 28, 2023
2 changes: 2 additions & 0 deletions .github/workflow_scripts/lint_check.sh
@@ -4,6 +4,8 @@ cd ../../
set -ex

python3 -m pip install --upgrade prospector pip
pip3 uninstall -y astroid
yes | pip3 install astroid==2.15.7
FORCE_CUDA=1 python3 -m pip install -e '.[test]' --no-build-isolation
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/data/*.py
pylint --rcfile=./tests/lint/pylintrc ./python/graphstorm/dataloading/
40 changes: 40 additions & 0 deletions docker/build_docker_wholegraph.sh
@@ -0,0 +1,40 @@
#!/bin/bash

# process argument 1: graphstorm home folder
if [ -z "$1" ]; then
    echo "Please provide the GraphStorm home folder that the GraphStorm code is cloned to."
    echo "For example, ./build_docker_wholegraph.sh /graph-storm/"
    exit 1
else
    GSF_HOME="$1"
fi

# process argument 2: docker image name, default is graphstorm-wholegraph
if [ -z "$2" ]; then
    IMAGE_NAME="graphstorm-wholegraph"
else
    IMAGE_NAME="$2"
fi

# process argument 3: image's tag name, default is local
if [ -z "$3" ]; then
    TAG="local"
else
    TAG="$3"
fi

# Copy scripts and tool code to the docker folder
mkdir -p "$GSF_HOME/docker/code"
cp -r "$GSF_HOME/python" "$GSF_HOME/docker/code/python"
cp -r "$GSF_HOME/inference_scripts" "$GSF_HOME/docker/code/inference_scripts"
cp -r "$GSF_HOME/tools" "$GSF_HOME/docker/code/tools"
cp -r "$GSF_HOME/training_scripts" "$GSF_HOME/docker/code/training_scripts"

# Build OSS docker for EC2 instances that can pull ECR docker images
DOCKER_FULLNAME="${IMAGE_NAME}:${TAG}"

echo "Build a local docker image ${DOCKER_FULLNAME}"
docker build --no-cache -f "$GSF_HOME/docker/wholegraph/Dockerfile" . -t "$DOCKER_FULLNAME"

# remove the temporary code folder
rm -rf "$GSF_HOME/docker/code"
60 changes: 60 additions & 0 deletions docker/wholegraph/Dockerfile
@@ -0,0 +1,60 @@
FROM nvcr.io/nvidia/dgl:23.07-py3

#################################################
## Install EFA installer
ARG EFA_INSTALLER_VERSION=latest
RUN cd $HOME \
    && curl -O https://efa-installer.amazonaws.com/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz \
    && tar -xf $HOME/aws-efa-installer-${EFA_INSTALLER_VERSION}.tar.gz \
    && cd aws-efa-installer \
    && apt-get update \
    && apt-get install -y libhwloc-dev \
    && ./efa_installer.sh -y -g -d --skip-kmod --skip-limit-conf --no-verify \
    && rm -rf /var/lib/apt/lists/*

###################################################
## Install AWS-OFI-NCCL plugin
ARG AWS_OFI_NCCL_VERSION=v1.7.1-aws
RUN git clone https://github.com/aws/aws-ofi-nccl.git /opt/aws-ofi-nccl \
    && cd /opt/aws-ofi-nccl \
    && git checkout ${AWS_OFI_NCCL_VERSION} \
    && ./autogen.sh \
    && ./configure --prefix=/opt/aws-ofi-nccl/ \
       --with-libfabric=/opt/amazon/efa/ \
       --with-cuda=/usr/local/cuda \
    && make && make install

ENV PATH "/opt/amazon/efa/bin:$PATH"

# Install WholeGraph
COPY wholegraph/install_wholegraph.sh install_wholegraph.sh
RUN bash install_wholegraph.sh

# Install GraphStorm
RUN pip install --no-cache-dir boto3 'h5py>=2.10.0' scipy tqdm 'pyarrow>=3' 'transformers==4.28.1' pandas pylint scikit-learn ogb psutil
RUN git clone https://github.com/awslabs/graphstorm

# Increase nofile limit
RUN echo "root soft nofile 1048576" >> /etc/security/limits.conf \
    && echo "root hard nofile 1048576" >> /etc/security/limits.conf

# Make EFA NCCL plugin the default plugin
RUN sed -i '/nccl_rdma_sharp_plugin/d' /etc/ld.so.conf.d/hpcx.conf \
    && echo "/opt/aws-ofi-nccl/lib" >> /etc/ld.so.conf.d/hpcx.conf \
    && ldconfig

# Set up SSH
RUN apt-get update && apt-get install -y openssh-client openssh-server && rm -rf /var/lib/apt/lists/*
ENV SSH_PORT=2222
RUN cat /etc/ssh/sshd_config > /tmp/sshd_config && \
    sed "0,/^#Port 22/s//Port ${SSH_PORT}/" /tmp/sshd_config > /etc/ssh/sshd_config
ENV HOME=/root
ENV SSHDIR $HOME/.ssh
RUN mkdir -p ${SSHDIR}
RUN ssh-keygen -t rsa -f ${SSHDIR}/id_rsa -N ''
RUN cp ${SSHDIR}/id_rsa.pub ${SSHDIR}/authorized_keys
RUN touch /root/.ssh/config;echo -e "Host *\n StrictHostKeyChecking no\n UserKnownHostsFile=/dev/null\n Port ${SSH_PORT}" > /root/.ssh/config
EXPOSE 2222
RUN mkdir /run/sshd

CMD ["/usr/sbin/sshd", "-D"]
25 changes: 25 additions & 0 deletions docker/wholegraph/install_wholegraph.sh
@@ -0,0 +1,25 @@
#!/bin/bash
git clone https://github.com/fmtlib/fmt.git /opt/fmt
cd /opt/fmt
git checkout 9.1.0
mkdir build && cd build
cmake -DCMAKE_POSITION_INDEPENDENT_CODE=TRUE ..
make
make install

git clone https://github.com/gabime/spdlog.git /opt/spdlog
cd /opt/spdlog && mkdir build && cd build
cmake .. && make -j
cp libspdlog.a /usr/lib/libspdlog.a
export PYTHON=/usr/bin/python

cd /opt/rapids/
git clone https://github.com/rapidsai/wholegraph.git -b branch-23.08
cd /opt/rapids/wholegraph/
pip install scikit-build
export WHOLEGRAPH_CMAKE_CUDA_ARCHITECTURES="70-real;80-real;90"
# fix a bug in CMakeLists.txt when building pylibwholegraph
old="import sysconfig; print(sysconfig.get_config_var('BINLIBDEST'))"
string="import sysconfig; print(\"%s/%s\" % (sysconfig.get_config_var(\"LIBDIR\"), sysconfig.get_config_var(\"INSTSONAME\")))"
sed -i "s|$old|$string|" /opt/rapids/wholegraph/python/pylibwholegraph/CMakeLists.txt
bash build.sh libwholegraph pylibwholegraph -v
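
The `sed` patch in this script swaps one `sysconfig` expression for another inside pylibwholegraph's `CMakeLists.txt`. To see what each expression evaluates to on a given interpreter, a minimal check could look like the following. It assumes a POSIX CPython build; on other platforms these config variables may be `None`. The helper names `old_expr` and `new_expr` are made up for this illustration.

```python
import sysconfig


def old_expr() -> str:
    """Value printed by the expression that the sed patch removes."""
    return str(sysconfig.get_config_var("BINLIBDEST"))


def new_expr() -> str:
    """Value printed by the expression that the sed patch inserts."""
    return "%s/%s" % (
        sysconfig.get_config_var("LIBDIR"),
        sysconfig.get_config_var("INSTSONAME"),
    )


if __name__ == "__main__":
    # Compare the two values to see why the build needs the second one.
    print("old:", old_expr())
    print("new:", new_expr())
```

On a typical Linux build the first value is a directory of the standard library, while the second is a path that ends in the interpreter's shared-library file name, which is presumably what the build expects to link against.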
10 changes: 10 additions & 0 deletions docs/source/_templates/dataloadertemplate.rst
@@ -0,0 +1,10 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :show-inheritance:
    :special-members: __iter__, __next__
10 changes: 10 additions & 0 deletions docs/source/_templates/datasettemplate.rst
@@ -0,0 +1,10 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :show-inheritance:
    :members: prepare_data, get_node_feats, get_edge_feats, get_labels
11 changes: 11 additions & 0 deletions docs/source/_templates/inferencetemplate.rst
@@ -0,0 +1,11 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :show-inheritance:
    :members: setup_device, setup_evaluator, evaluator, device, infer

10 changes: 10 additions & 0 deletions docs/source/_templates/modeltemplate.rst
@@ -0,0 +1,10 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :show-inheritance:
    :members: forward, save_model, restore_model, predict, create_optimizer
12 changes: 12 additions & 0 deletions docs/source/_templates/trainertemplate.rst
@@ -0,0 +1,12 @@
.. role:: hidden
    :class: hidden-section
.. currentmodule:: {{ module }}


{{ name | underline}}

.. autoclass:: {{ name }}
    :show-inheritance:
    :members: setup_device, setup_evaluator, save_model, remove_saved_model, save_topk_models,
              get_best_model_path, restore_model, evaluator, optimizer, device, fit, eval

62 changes: 0 additions & 62 deletions docs/source/api/graphstorm.customized.rst

This file was deleted.

27 changes: 23 additions & 4 deletions docs/source/api/graphstorm.dataloading.rst
@@ -3,29 +3,48 @@
graphstorm.dataloading
==========================

GraphStorm dataloading module includes a set of graph datasets and dataloaders for different
GraphStorm dataloading module includes a set of graph DataSets and DataLoaders for different
graph machine learning tasks.

If users would like to customize DataLoaders, please extend those classes in the
:ref:`Base DataLoaders <basedataloaders>` section and customize their abstract methods.

.. currentmodule:: graphstorm.dataloading

.. _basedataloaders:

Base DataLoaders
-------------------

.. autosummary::
    :toctree: ../generated/
    :nosignatures:
    :template: dataloadertemplate.rst

    GSgnnNodeDataLoaderBase
    GSgnnEdgeDataLoaderBase
    GSgnnLinkPredictionDataLoaderBase

DataSets
------------

.. autosummary::
    :toctree: ../generated/
    :nosignatures:
    :template: classtemplate.rst
    :template: datasettemplate.rst

    GSgnnNodeTrainData
    GSgnnNodeInferData
    GSgnnEdgeTrainData
    GSgnnEdgeInferData

Dataloaders
DataLoaders
------------

.. autosummary::
    :toctree: ../generated/
    :nosignatures:
    :template: classtemplate.rst
    :template: dataloadertemplate.rst

    GSgnnNodeDataLoader
    GSgnnEdgeDataLoader
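
The customization advice in this file (extend a base DataLoader and fill in its abstract methods), together with the `__iter__` and `__next__` special members that `dataloadertemplate.rst` documents, suggests an iterator-style contract. The sketch below illustrates that pattern only. `BaseLoaderSketch` and `ListNodeLoader` are hypothetical stand-ins, not GraphStorm classes; the constructor arguments and abstract methods of the real base classes should be taken from their docstrings.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Sequence


class BaseLoaderSketch(ABC):
    """Hypothetical stand-in for a base dataloader (not a GraphStorm class)."""

    @abstractmethod
    def __iter__(self) -> Iterator[List[int]]:
        """Reset internal state and return an iterator over mini-batches."""

    @abstractmethod
    def __next__(self) -> List[int]:
        """Return the next mini-batch or raise StopIteration."""


class ListNodeLoader(BaseLoaderSketch):
    """Yields fixed-size batches of node IDs from an in-memory sequence."""

    def __init__(self, node_ids: Sequence[int], batch_size: int) -> None:
        if batch_size <= 0:
            raise ValueError("batch_size must be positive")
        self._node_ids = list(node_ids)
        self._batch_size = batch_size
        self._pos = 0

    def __iter__(self) -> "ListNodeLoader":
        self._pos = 0  # restart from the first batch on every epoch
        return self

    def __next__(self) -> List[int]:
        if self._pos >= len(self._node_ids):
            raise StopIteration
        batch = self._node_ids[self._pos:self._pos + self._batch_size]
        self._pos += self._batch_size
        return batch


if __name__ == "__main__":
    loader = ListNodeLoader(range(5), batch_size=2)
    for batch in loader:
        print(batch)
```

Resetting the cursor in `__iter__` lets the same loader object be reused across epochs, which is the usual reason to implement both special methods on one class rather than returning a generator.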
38 changes: 38 additions & 0 deletions docs/source/api/graphstorm.eval.rst
@@ -0,0 +1,38 @@
.. _apieval:

graphstorm.eval
=======================

GraphStorm provides built-in evaluation methods for different Graph Machine
Learning (GML) tasks.

If users want to implement customized evaluators or evaluation methods, a best practice is to
extend base evaluators, i.e., the ``GSgnnInstanceEvaluator`` class for node or edge prediction
tasks, and ``GSgnnLPEvaluator`` for link prediction tasks, and then implement the abstract methods.

.. currentmodule:: graphstorm.eval

Base Evaluators
----------------

.. autosummary::
    :toctree: ../generated/
    :nosignatures:
    :template: evaltemplate.rst

    GSgnnInstanceEvaluator
    GSgnnLPEvaluator

Evaluators
-----------

.. autosummary::
    :toctree: ../generated/
    :nosignatures:
    :template: evaltemplate.rst

    GSgnnLPEvaluator
    GSgnnMrrLPEvaluator
    GSgnnPerEtypeMrrLPEvaluator
    GSgnnAccEvaluator
    GSgnnRegressionEvaluator
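
The recommendation in this file (extend a base evaluator, then implement its abstract methods) can be illustrated in plain Python. `InstanceEvaluatorSketch`, its `compute_score` method, and `AccuracyEvaluator` are hypothetical names chosen for this sketch. They do not reproduce the interface of `GSgnnInstanceEvaluator`, whose actual abstract methods should be read from its docstring.

```python
from abc import ABC, abstractmethod
from typing import Dict, Sequence


class InstanceEvaluatorSketch(ABC):
    """Hypothetical stand-in for a base evaluator (not a GraphStorm class)."""

    @abstractmethod
    def compute_score(
        self, preds: Sequence[int], labels: Sequence[int]
    ) -> Dict[str, float]:
        """Return a mapping from metric name to metric value."""


class AccuracyEvaluator(InstanceEvaluatorSketch):
    """Computes plain classification accuracy over predicted class IDs."""

    def compute_score(
        self, preds: Sequence[int], labels: Sequence[int]
    ) -> Dict[str, float]:
        if len(preds) != len(labels):
            raise ValueError("preds and labels must have the same length")
        if len(labels) == 0:
            return {"accuracy": 0.0}  # convention chosen for this sketch
        correct = sum(int(p == y) for p, y in zip(preds, labels))
        return {"accuracy": correct / len(labels)}


if __name__ == "__main__":
    evaluator = AccuracyEvaluator()
    print(evaluator.compute_score([1, 0, 1, 1], [1, 0, 0, 1]))
```

Returning a dictionary keyed by metric name keeps the sketch open to evaluators that report several metrics at once; whether GraphStorm's evaluators use the same return shape is not stated in this diff.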
20 changes: 0 additions & 20 deletions docs/source/api/graphstorm.evaluator.rst

This file was deleted.

@@ -1,6 +1,6 @@
.. _apiinferrer:
.. _apiinference:

graphstorm.inferrer
graphstorm.inference
====================

GraphStorm inferrers assemble the distributed inference pipeline for different tasks.
@@ -13,7 +13,7 @@ graphstorm.inferrer
.. autosummary::
    :toctree: ../generated/
    :nosignatures:
    :template: classtemplate.rst
    :template: inferencetemplate.rst

    GSgnnLinkPredictionInferrer
    GSgnnNodePredictionInferrer