Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GSProcessing] Add documentation #467

Merged
merged 3 commits into from
Sep 27, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
20 changes: 20 additions & 0 deletions graphstorm-processing/docs/Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,20 @@
# Minimal makefile for Sphinx documentation
#

# You can set these variables from the command line, and also
# from the environment for the first two.
SPHINXOPTS ?=
SPHINXBUILD ?= sphinx-build
SOURCEDIR = source
BUILDDIR = build

# Put it first so that "make" without argument is like "make help".
help:
@$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)

.PHONY: help Makefile

# Catch-all target: route all unknown targets to Sphinx using the new
# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS).
%: Makefile
@$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O)
35 changes: 35 additions & 0 deletions graphstorm-processing/docs/make.bat
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
@ECHO OFF

pushd %~dp0

REM Command file for Sphinx documentation

if "%SPHINXBUILD%" == "" (
set SPHINXBUILD=sphinx-build
)
set SOURCEDIR=source
set BUILDDIR=build

if "%1" == "" goto help

%SPHINXBUILD% >NUL 2>NUL
if errorlevel 9009 (
echo.
echo.The 'sphinx-build' command was not found. Make sure you have Sphinx
echo.installed, then set the SPHINXBUILD environment variable to point
echo.to the full path of the 'sphinx-build' executable. Alternatively you
echo.may add the Sphinx directory to PATH.
echo.
echo.If you don't have Sphinx installed, grab it from
echo.http://sphinx-doc.org/
exit /b 1
)

%SPHINXBUILD% -M %1 %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%
goto end

:help
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
53 changes: 53 additions & 0 deletions graphstorm-processing/docs/source/conf.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,53 @@
# pylint: skip-file
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = 'graphstorm-processing'
copyright = '2023, AGML Team'
author = 'AGML Team, Amazon'


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
]

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = []


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
230 changes: 230 additions & 0 deletions graphstorm-processing/docs/source/developer/developer-guide.rst
Original file line number Diff line number Diff line change
@@ -0,0 +1,230 @@
Developer Guide
---------------

The project is set up using ``poetry`` to make easier for developers to
jump into the project.

The steps we recommend are:

Install JDK 8, 11
~~~~~~~~~~~~~~~~~

PySpark requires a compatible Java installation to run, so
you will need to ensure your active JDK is using either
Java 8 or 11.
thvasilo marked this conversation as resolved.
Show resolved Hide resolved

On MacOS you can do this using ``brew``:
thvasilo marked this conversation as resolved.
Show resolved Hide resolved

.. code-block:: bash

brew install openjdk@11

On Linux it will depend on your distribution's package
manager. For Ubuntu you can use:

.. code-block:: bash

sudo apt install openjdk-11-jdk

On Amazon Linux 2 you can use:

.. code-block:: bash

sudo yum install java-11-amazon-corretto-headless
sudo yum install java-11-amazon-corretto-devel

Install ``pyenv``
~~~~~~~~~~~~~

``pyenv`` is a tool to manage multiple Python version installations. It
can be installed through the installer below on a Linux machine:

.. code-block:: bash

curl -L https://github.com/pyenv/pyenv-installer/raw/master/bin/pyenv-installer | bash

or use ``brew`` on a Mac:
thvasilo marked this conversation as resolved.
Show resolved Hide resolved

.. code-block:: bash

brew update
brew install pyenv

For more info on ``pyenv`` see `its documentation. <https://github.com/pyenv/pyenv>`

Create a Python 3.9 env and activate it.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use Python 3.9 in our images so this most closely resembles the
execution environment on our Docker images that will be used for distributed
training.

.. code-block:: bash

pyenv install 3.9
pyenv global 3.9

..

Note: We recommend not mixing up ``conda`` and ``pyenv``. When developing for
this project, simply ``conda deactivate`` until there's no ``conda``
env active (even ``base``) and just rely on ``pyenv`` and ``poetry`` to handle
dependencies.

Install ``poetry``
~~~~~~~~~~~~~~

``poetry`` is a dependency and build management system for Python. To install it
use:

.. code-block:: bash

curl -sSL https://install.python-poetry.org | python3 -

Install dependencies through ``poetry``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now we are ready to install our dependencies through ``poetry``.

We have split the project dependencies into the “main” dependencies that
``poetry`` installs by default, and the ``dev`` dependency group that
installs that dependencies that are only needed to develop the library.

**On a POSIX system** (tested on Ubuntu, CentOS, MacOS) run:

.. code-block:: bash

# Install all dependencies into local .venv
poetry install --with dev
thvasilo marked this conversation as resolved.
Show resolved Hide resolved

Once all dependencies are installed you should be able to run the unit
tests for the project and continue with development using:

.. code-block:: bash

poetry run pytest ./graphstorm-processing/tests

You can also activate and use the virtual environment using:

.. code-block:: bash

poetry shell
# We're now using the graphstorm-processing-py3.9 env so we can just run
pytest ./graphstorm-processing/tests

To learn more about ``poetry`` see its `documentation <https://python-poetry.org/docs/basic-usage/>`_

Use ``black`` to format code [optional]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We use `black <https://black.readthedocs.io/en/stable/index.html>`_ to
format code in this project. ``black`` is an opinionated formatter that
helps speed up development and code reviews. It is included in our
``dev`` dependencies so it will be installed along with the other dev
dependencies.

To use ``black`` in the project you can run (from the project's root,
same level as ``pyproject.toml``)

.. code-block:: bash

# From the project's root directory, graphstorm-processing run:
black .

To get a preview of the changes ``black`` would make you can use:

.. code-block:: bash

black . --diff --color

You can auto-formatting with ``black`` to VSCode using the `Black
Formatter <https://marketplace.visualstudio.com/items?itemName=ms-python.black-formatter>`__


Use mypy and pylint to lint code
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We include the ``mypy`` and ``pylint`` linters as a dependency under the ``dev`` group
of dependencies. These linters perform static checks on your code and
can be used in a complimentary manner.

We recommend `using VSCode and enabling the mypy linter <https://code.visualstudio.com/docs/python/linting#_general-settings>`_
to get in-editor annotations.

You can also lint the project code through:

.. code-block:: bash

poetry run mypy ./graphstorm_processing

To learn more about ``mypy`` and how it can help development
`see its documentation <https://mypy.readthedocs.io/en/stable/>`_.


Our goal is to minimize ``mypy`` errors as much as possible for the
project. New code should be linted and not introduce additional mypy
errors. When necessary it's OK to use ``type: ignore`` to silence
``mypy`` errors inline, but this should be used sparingly.

As a project, GraphStorm requires a 10/10 pylint score, so
ensure your code conforms to the expectation by running

.. code-block:: bash

pylint --rcfile=/path/to/graphstorm/tests/lint/pylintrc

on your code before commits. To make this easier we include
a pre-commit hook below.

Use a pre-commit hook to ensure ``black`` and ``pylint`` runs before commits
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

To make code formatting and ``pylint`` checks easier for graphstorm-processing
developers, we recommend using a pre-commit hook.

We include ``pre-commit`` in the project's ``dev`` dependencies, so once
you have activated the project's venv (``poetry shell``) you can just
create a file named ``.pre-commit-config.yaml`` with the following contents:

.. code-block:: yaml

# .pre-commit-config.yaml
repos:
- repo: https://github.com/psf/black
rev: 23.7.0
hooks:
- id: black
language_version: python3.9
files: 'graphstorm_processing\/.*\.pyi?$|tests\/.*\.pyi?$|scripts\/.*\.pyi?$'
exclude: 'python\/.*\.pyi'
- repo: local
hooks:
- id: pylint
name: pylint
entry: pylint
language: system
types: [python]
args:
[
"--rcfile=./tests/lint/pylintrc"
]


And then run:

.. code-block:: bash

pre-commit install

which will install the ``black`` and ``pylin`` hooks into your local repository and
ensure it runs before every commit.

.. note::

The pre-commit hook will also apply to all commits you make to the root
GraphStorm repository. Since that Graphstorm doesn't use ``black``, you might
want to remove the hooks. You can do so from the root repo
using ``rm -rf .git/hooks``.

Both projects use ``pylint`` to check Python files so we'd still recommend using
that hook even if you're doing development for both GSProcessing and GraphStorm.
Loading