[GSProcessing] Add GSProcessing documentation (awslabs#467)
*Issue #, if available:*

*Description of changes:*

* Add documentation for GSProcessing

By submitting this pull request, I confirm that you can use, modify,
copy, and redistribute this contribution, under the terms of your
choice.

---------

Co-authored-by: xiang song(charlie.song) <[email protected]>
thvasilo and classicsong committed Sep 28, 2023
1 parent f001ef4 commit afe8d26
Showing 9 changed files with 1,480 additions and 0 deletions.
20 changes: 20 additions & 0 deletions graphstorm-processing/docs/Makefile
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
	@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
	@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions graphstorm-processing/docs/make.bat
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
53 changes: 53 additions & 0 deletions graphstorm-processing/docs/source/conf.py
@@ -0,0 +1,53 @@
# pylint: skip-file
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = 'graphstorm-processing'
copyright = '2023, AGML Team'
author = 'AGML Team, Amazon'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
230 changes: 230 additions & 0 deletions graphstorm-processing/docs/source/developer/developer-guide.rst
@@ -0,0 +1,230 @@
Developer Guide
---------------

The project is set up using ``poetry`` to make it easier for developers to
jump into the project.

The steps we recommend are:

Install JDK 8 or 11
~~~~~~~~~~~~~~~~~~~

PySpark requires a compatible Java installation to run, so
you will need to ensure that your active JDK is either
Java 8 or 11.
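
To confirm which Java version is active before installing anything, a small
check like the one below may help. This is a minimal sketch, not part of
GSProcessing: the helper names ``parse_java_major`` and ``is_supported_java``
are hypothetical, and it assumes the version banner has the usual
``version "11.0.20"`` or legacy ``version "1.8.0_381"`` shape.

.. code-block:: python

    import re
    import shutil
    import subprocess
    from typing import Optional

    # Major versions this guide names as compatible with PySpark.
    SUPPORTED_MAJORS = {8, 11}


    def parse_java_major(version_output: str) -> Optional[int]:
        """Extract the major Java version from a ``java -version`` banner."""
        match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
        if match is None:
            return None
        first = int(match.group(1))
        # Legacy scheme: Java 8 reports itself as 1.8.x.
        if first == 1 and match.group(2) is not None:
            return int(match.group(2))
        return first


    def is_supported_java(version_output: str) -> bool:
        """Return True when the banner reports Java 8 or 11."""
        return parse_java_major(version_output) in SUPPORTED_MAJORS


    if __name__ == "__main__":
        java = shutil.which("java")
        if java is None:
            print("No java executable found on PATH")
        else:
            # `java -version` writes its banner to stderr, not stdout.
            banner = subprocess.run(
                [java, "-version"], capture_output=True, text=True, check=False
            ).stderr
            print("major version:", parse_java_major(banner))
            print("supported by this guide:", is_supported_java(banner))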

On macOS you can do this using ``brew``:

.. code-block:: bash

    brew install openjdk@11

On Linux it will depend on your distribution's package
manager. For Ubuntu you can use:

.. code-block:: bash

    sudo apt install openjdk-11-jdk

On Amazon Linux 2 you can use:

.. code-block:: bash

    sudo yum install java-11-amazon-corretto-headless
    sudo yum install java-11-amazon-corretto-devel

Install ``pyenv``
~~~~~~~~~~~~~~~~~

``pyenv`` is a tool to manage multiple Python version installations. It
can be installed through the installer below on a Linux machine:

.. code-block:: bash

    curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash

or use ``brew`` on a Mac:

.. code-block:: bash

    brew update
    brew install pyenv

For more info on ``pyenv`` see `its documentation <https://github.com/pyenv/pyenv>`_.

Create a Python 3.9 env and activate it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use Python 3.9 in our images so this most closely resembles the
execution environment on our Docker images that will be used for distributed
training.

.. code-block:: bash

    pyenv install 3.9
    pyenv global 3.9

.. note::

    We recommend not mixing up ``conda`` and ``pyenv``. When developing for
    this project, run ``conda deactivate`` until no ``conda``
    env is active (not even ``base``) and rely on ``pyenv`` and ``poetry`` to handle
    dependencies.

Install ``poetry``
~~~~~~~~~~~~~~~~~~

``poetry`` is a dependency and build management system for Python. To install it
use:

.. code-block:: bash

    curl -sSL https://install.python-poetry.org | python3 -

Install dependencies through ``poetry``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we are ready to install our dependencies through ``poetry``.

We have split the project dependencies into the “main” dependencies that
``poetry`` installs by default, and the ``dev`` dependency group that
contains the dependencies that are only needed to develop the library.

**On a POSIX system** (tested on Ubuntu, CentOS, macOS) run:

.. code-block:: bash

    # Install all dependencies into local .venv
    poetry install --with dev

Once all dependencies are installed you should be able to run the unit
tests for the project and continue with development using:

.. code-block:: bash

    poetry run pytest ./graphstorm-processing/tests

You can also activate and use the virtual environment using:

.. code-block:: bash

    poetry shell
    # We're now using the graphstorm-processing-py3.9 env so we can just run
    pytest ./graphstorm-processing/tests

To learn more about ``poetry`` see its `documentation <https://python-poetry.org/docs/basic-usage/>`_.

Use ``black`` to format code [optional]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use `black <https://black.readthedocs.io/en/stable/index.html>`_ to
format code in this project. ``black`` is an opinionated formatter that
helps speed up development and code reviews. It is included in our
``dev`` dependencies so it will be installed along with the other dev
dependencies.

To use ``black`` in the project you can run (from the project's root,
same level as ``pyproject.toml``):

.. code-block:: bash

    # From the project's root directory, graphstorm-processing, run:
    black .

To get a preview of the changes ``black`` would make you can use:

.. code-block:: bash

    black . --diff --color

You can add auto-formatting with ``black`` to VSCode using the `Black
Formatter <https://marketplace.visualstudio.com/items?itemName=ms-python.black-formatter>`__
extension.


Use mypy and pylint to lint code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We include the ``mypy`` and ``pylint`` linters as dependencies under the ``dev`` group
of dependencies. These linters perform static checks on your code and
can be used in a complementary manner.

We recommend `using VSCode and enabling the mypy linter <https://code.visualstudio.com/docs/python/linting#_general-settings>`_
to get in-editor annotations.

You can also lint the project code through:

.. code-block:: bash

    poetry run mypy ./graphstorm_processing

To learn more about ``mypy`` and how it can help development
`see its documentation <https://mypy.readthedocs.io/en/stable/>`_.


Our goal is to minimize ``mypy`` errors as much as possible for the
project. New code should be linted and not introduce additional mypy
errors. When necessary it's OK to use ``type: ignore`` to silence
``mypy`` errors inline, but this should be used sparingly.
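
As an illustration of using ``type: ignore`` sparingly, the sketch below scopes
the comment to a single ``mypy`` error code on one line, so other errors on that
line are still reported. The function ``lookup`` and its ``calls`` attribute are
hypothetical and exist only for this example; error-code scoping is a general
``mypy`` feature, not a stated GSProcessing requirement.

.. code-block:: python

    from typing import Dict

    _CACHE: Dict[str, int] = {"a": 1}


    def lookup(key: str) -> int:
        """Return the cached value for key, or -1 when it is missing."""
        return _CACHE.get(key, -1)


    # mypy rejects attributes set on a plain function ("attr-defined"),
    # although this is valid at runtime. Silence only that error code.
    lookup.calls = 0  # type: ignore[attr-defined]


    def counted_lookup(key: str) -> int:
        """Call lookup and record the call on the function attribute."""
        lookup.calls += 1  # type: ignore[attr-defined]
        return lookup(key)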

As a project, GraphStorm requires a 10/10 pylint score, so
ensure your code conforms to the expectation by running

.. code-block:: bash

    pylint --rcfile=/path/to/graphstorm/tests/lint/pylintrc

on your code before commits. To make this easier we include
a pre-commit hook below.

Use a pre-commit hook to ensure ``black`` and ``pylint`` runs before commits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To make code formatting and ``pylint`` checks easier for graphstorm-processing
developers, we recommend using a pre-commit hook.

We include ``pre-commit`` in the project's ``dev`` dependencies, so once
you have activated the project's venv (``poetry shell``) you can just
create a file named ``.pre-commit-config.yaml`` with the following contents:

.. code-block:: yaml

    # .pre-commit-config.yaml
    repos:
    - repo: https://github.com/psf/black
      rev: 23.7.0
      hooks:
        - id: black
          language_version: python3.9
          files: 'graphstorm_processing\/.*\.pyi?$|tests\/.*\.pyi?$|scripts\/.*\.pyi?$'
          exclude: 'python\/.*\.pyi'
    - repo: local
      hooks:
        - id: pylint
          name: pylint
          entry: pylint
          language: system
          types: [python]
          args:
            [
              "--rcfile=./tests/lint/pylintrc"
            ]

And then run:

.. code-block:: bash

    pre-commit install

which will install the ``black`` and ``pylint`` hooks into your local repository and
ensure they run before every commit.
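
To see which paths the ``black`` hook in the config above would touch, the
sketch below replays its ``files`` and ``exclude`` patterns. It assumes
``pre-commit`` applies both with ``re.search`` against repo-relative paths,
keeping a path when ``files`` matches and ``exclude`` does not. The helper
``black_hook_applies`` and the example paths are hypothetical.

.. code-block:: python

    import re

    # Patterns copied verbatim from the black hook in .pre-commit-config.yaml.
    FILES = re.compile(
        r"graphstorm_processing\/.*\.pyi?$|tests\/.*\.pyi?$|scripts\/.*\.pyi?$"
    )
    EXCLUDE = re.compile(r"python\/.*\.pyi")


    def black_hook_applies(path: str) -> bool:
        """Return True when the black hook would run on this repo-relative path."""
        return bool(FILES.search(path)) and not EXCLUDE.search(path)


    if __name__ == "__main__":
        # Hypothetical paths, chosen only to exercise each pattern.
        for candidate in (
            "graphstorm_processing/config/parser.py",
            "tests/test_parser.py",
            "docs/source/conf.py",
            "python/tests/stubs.pyi",
        ):
            print(candidate, "->", black_hook_applies(candidate))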

.. note::

    The pre-commit hook will also apply to all commits you make to the root
    GraphStorm repository. Since GraphStorm doesn't use ``black``, you might
    want to remove the hooks. You can do so from the root repo
    using ``rm -rf .git/hooks``.

    Both projects use ``pylint`` to check Python files so we'd still recommend using
    that hook even if you're doing development for both GSProcessing and GraphStorm.