Merge pull request #9 from deeprob-org/issue-#3-#6
PR for documentation issues #3 and #6
gengala authored Oct 13, 2021
2 parents f8a5bb2 + f170bbd commit 99a3079
Showing 14 changed files with 169 additions and 82 deletions.
24 changes: 15 additions & 9 deletions Makefile
@@ -4,23 +4,29 @@ COVERAGE = coverage
SETUP_SOURCE = setup.py

COVERAGE_FLAGS = --source deeprob
UNITTEST_FLAGS = --verbose
UNITTEST_FLAGS = --verbose --start-directory test

.PHONY: pip_clean
.PHONY: clean

show_coverage: unit_tests
$(COVERAGE) report -m
# Print Coverage information on stdout
coverage_cli: unit_tests
$(COVERAGE) report

# Run Unit Tests
unit_tests:
$(COVERAGE) run $(COVERAGE_FLAGS) -m $(UNITTEST) $(UNITTEST_FLAGS)

pip_package: $(SETUP_SOURCE)
$(PYTHON) $< sdist bdist_wheel
$(COVERAGE) run $(COVERAGE_FLAGS) -m $(UNITTEST) discover $(UNITTEST_FLAGS)

# Upload the PIP package
pip_upload: pip_package
$(PYTHON) -m twine upload dist/*

pip_clean:
# Build the PIP package
pip_package: $(SETUP_SOURCE)
$(PYTHON) $< sdist bdist_wheel

# Clean files and directories
clean:
rm -rf .coverage
rm -rf dist
rm -rf build
rm -rf deeprob_kit.egg-info
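For reference, the updated targets expand to roughly the following commands, assuming the folded variable definitions are `PYTHON = python` and `UNITTEST = unittest` (they are not visible in this hunk):
```shell
# unit_tests: run the test suite under coverage, discovering tests in the test/ directory
coverage run --source deeprob -m unittest discover --verbose --start-directory test

# coverage_cli: print coverage information on stdout
coverage report

# pip_package and pip_upload: build and publish the PIP package
python setup.py sdist bdist_wheel
python -m twine upload dist/*
```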
109 changes: 43 additions & 66 deletions README.md
@@ -4,40 +4,48 @@
![Logo](docs/deeprob-logo.svg)

## Abstract
DeeProb-kit is a Python library that implements deep probabilistic models such as various kinds of
**Sum-Product Networks**, **Normalizing Flows** and their possible combinations for probabilistic inference.
Some models are implemented using **PyTorch** for fast training and inference on GPUs.

**DeeProb-kit** is a general-purpose Python library that implements several deep probabilistic models,
with an effort to unify and standardize the experiments across most of them.
Implementing different models in a single library makes it easy to combine them,
as is common practice in deep learning research.
The library consists of a collection of deep probabilistic models such as various kinds of
*Sum-Product Networks*, *Normalizing Flows* and their possible combinations for density estimation.
Some models are implemented using *PyTorch* for fast training and inference on GPUs.

## Features

- Inference algorithms for SPNs. <sup>[1](#r1) [4](#r4)</sup>
- Learning algorithms for SPN structure. <sup>[1](#r1) [2](#r2) [3](#r3) [4](#r4)</sup>
- Chow-Liu Trees (CLT) as SPN leaves. <sup>[11](#r11) [12](#r12)</sup>
- Batch Expectation-Maximization (EM) for SPNs with arbitrary leaves. <sup>[13](#r13) [14](#r14)</sup>
- Learning algorithms for SPN structure. <sup>[1](#r1) [2](#r2) [3](#r3) [4](#r4) [5](#r5)</sup>
- Chow-Liu Trees (CLT) as SPN leaves. <sup>[12](#r12) [13](#r13)</sup>
- Batch Expectation-Maximization (EM) for SPNs with arbitrary leaves. <sup>[14](#r14) [15](#r15)</sup>
- Structural marginalization and pruning algorithms for SPNs.
- High-order moments computation for SPNs.
- JSON I/O operations for SPNs and CLTs. <sup>[4](#r4)</sup>
- Plotting operations based on NetworkX for SPNs and CLTs. <sup>[4](#r4)</sup>
- Randomized And Tensorized SPNs (RAT-SPNs) using PyTorch. <sup>[5](#r5)</sup>
- Masked Autoregressive Flows (MAFs) using PyTorch. <sup>[6](#r6)</sup>
- Real Non-Volume-Preserving (RealNVP) and Non-linear Independent Component Estimation (NICE) flows. <sup>[7](#r7) [8](#r8)</sup>
- Deep Generalized Convolutional SPNs (DGC-SPNs) using PyTorch. <sup>[10](#r10)</sup>
- Randomized And Tensorized SPNs (RAT-SPNs) using PyTorch. <sup>[6](#r6)</sup>
- Masked Autoregressive Flows (MAFs) using PyTorch. <sup>[7](#r7)</sup>
- Real Non-Volume-Preserving (RealNVP) and Non-linear Independent Component Estimation (NICE) flows. <sup>[8](#r8) [9](#r9)</sup>
- Deep Generalized Convolutional SPNs (DGC-SPNs) using PyTorch. <sup>[11](#r11)</sup>

The collection of implemented models is summarized in the following table.
The supported data dimensionality for each model is shown in the **Input Dimensionality** column.
Moreover, the **Supervised** column indicates which models are suitable for supervised learning tasks,
The supported data dimensionality for each model is shown in the *Input Dimensionality* column.
Moreover, the *Supervised* column indicates which models are suitable for supervised learning tasks,
in addition to density estimation.

| Model      | Description                                         | Input Dimensionality | Supervised |
|------------|-----------------------------------------------------|:--------------------:|:----------:|
| Binary-CLT | Binary Chow-Liu Tree (CLT)                          |          D           |     ❌     |
| SPN        | Vanilla Sum-Product Network, using LearnSPN         |          D           |     ❌     |
| XPC        | Random Probabilistic Circuits, using LearnXPC       |          D           |     ❌     |
| RAT-SPN    | Randomized and Tensorized Sum-Product Network       |          D           |     ✔️     |
| DGC-SPN    | Deep Generalized Convolutional Sum-Product Network  | (1, D, D); (3, D, D) |     ✔️     |
| MAF        | Masked Autoregressive Flow                          |          D           |     ❌     |
| NICE       | Non-linear Independent Components Estimation Flow   | (1, H, W); (3, H, W) |     ❌     |
| RealNVP    | Real-valued Non-Volume-Preserving Flow              | (1, H, W); (3, H, W) |     ❌     |

## Installation & Documentation
## Installation

The library can be installed either from the PIP repository or from source.
```shell
# Install from PIP repository
pip install deeprob-kit
# Install from `main` git branch
pip install -e git+https://github.com/deeprob-org/deeprob-kit.git@main#egg=deeprob-kit
```
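A quick sanity check after installation (a minimal sketch; the top-level module name `deeprob` is taken from the coverage flags in the Makefile above):
```shell
# Verify that the package can be imported
python -c "import deeprob; print('deeprob-kit imported')"
```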
The documentation is generated automatically by Sphinx (with Read-the-Docs theme), and it's hosted using GitHub Pages
at [deeprob-kit]().

## Datasets and Experiments
A collection of 29 binary datasets, most of which are used in the Probabilistic Circuits literature,
can be found at [UCLA-StarAI-Binary-Datasets](https://github.com/UCLA-StarAI/Density-Estimation-Datasets).

Moreover, a collection of 5 continuous datasets, commonly used in works on Normalizing Flows,
can be found at [MAF-Continuous-Datasets](https://zenodo.org/record/1161203#.Wmtf_XVl8eN).

After downloading them, the datasets must be stored in the `experiments/datasets` directory in order to run the experiments
(and unit tests).
The experiment scripts are available in the `experiments` directory and can be launched from the command line
by specifying the dataset and hyper-parameters.

## Code Examples
A collection of code examples can be found in the `examples` directory.
However, the examples are not intended to produce state-of-the-art results,
but only to present the library.

The following table contains a description of each example and a code complexity rating ranging from one to three stars.
The **Complexity** column is a rough measure of how many features of the library are used, as well as
the expected time required to run the script.

| Example | Description | Complexity |
|----------------------|-----------------------------------------------------------------------------------|:----------:|
| naive_model.py | Learn, evaluate and print statistics about a naive factorized model. | ⭐ |
| spn_plot.py | Instantiate, prune, marginalize and plot some SPNs. | ⭐ |
| clt_plot.py | Learn a Binary CLT and plot it. | ⭐ |
| spn_moments.py | Instantiate and compute moment statistics about the random variables. | ⭐ |
| sklearn_interface.py | Learn and evaluate a SPN using the scikit-learn interface. | ⭐ |
| spn_custom_leaf.py | Learn, evaluate and serialize a SPN with a user-defined leaf distribution. | ⭐ |
| clt_to_spn.py | Learn a Binary CLT, convert it to a structured decomposable SPN and plot it. | ⭐ |
| spn_clt_em.py | Instantiate a SPN with Binary CLTs, apply EM algorithm and sample some data. | ⭐⭐ |
| clt_queries.py | Learn a Binary CLT, plot it, run some queries and sample some data. | ⭐⭐ |
| ratspn_mnist.py | Train and evaluate a RAT-SPN on MNIST. | ⭐⭐ |
| dgcspn_olivetti.py | Train, evaluate and complete some images with DGC-SPN on Olivetti-Faces. | ⭐⭐ |
| dgcspn_mnist.py | Train and evaluate a DGC-SPN on MNIST. | ⭐⭐ |
| nvp1d_moons.py | Train and evaluate a 1D RealNVP on Moons dataset. | ⭐⭐ |
| maf_cifar10.py | Train and evaluate a MAF on CIFAR10. | ⭐⭐⭐ |
| nvp2d_mnist.py | Train and evaluate a 2D RealNVP on MNIST. | ⭐⭐⭐ |
| nvp2d_cifar10.py | Train and evaluate a 2D RealNVP on CIFAR10. | ⭐⭐⭐ |
| spn_latent_mnist.py | Train and evaluate a SPN on MNIST using the features extracted by an autoencoder. | ⭐⭐⭐ |

## Project Directories

The documentation is generated automatically by Sphinx using sources stored in the [docs](docs) directory.

A collection of code examples and experiments can be found in the [examples](examples) and [experiments](experiments)
directories, respectively.
Moreover, benchmark code can be found in the [benchmark](benchmark) directory.

## Related Repositories

- [SPFlow](https://github.com/SPFlow/SPFlow)
- [RAT-SPN](https://github.com/cambridge-mlg/RAT-SPN)
- [Random-PC](https://github.com/gengala/Random-Probabilistic-Circuits)
@@ -109,30 +83,33 @@ the expected time required to run the script.

<b id="r4">4.</b> Molina, Vergari et al. [*SPFLOW : An easy and extensible library for deep probabilistic learning using Sum-Product Networks*][MolinaVergari2019]. CoRR (2019).

<b id="r5">5.</b> Peharz et al. [*Probabilistic Deep Learning using Random Sum-Product Networks*][Peharz2020a]. UAI (2020).
<b id="r5">5.</b> Di Mauro et al. [*Sum-Product Network structure learning by efficient product nodes discovery*][DiMauro2018]. AIxIA (2018).

<b id="r6">6.</b> Peharz et al. [*Probabilistic Deep Learning using Random Sum-Product Networks*][Peharz2020a]. UAI (2020).

<b id="r6">6.</b> Papamakarios et al. [*Masked Autoregressive Flow for Density Estimation*][Papamakarios2017]. NeurIPS (2017).
<b id="r7">7.</b> Papamakarios et al. [*Masked Autoregressive Flow for Density Estimation*][Papamakarios2017]. NeurIPS (2017).

<b id="r7">7.</b> Dinh et al. [*Density Estimation using RealNVP*][Dinh2017]. ICLR (2017).
<b id="r8">8.</b> Dinh et al. [*Density Estimation using RealNVP*][Dinh2017]. ICLR (2017).

<b id="r8">8.</b> Dinh et al. [*NICE: Non-linear Independent Components Estimation*][Dinh2015]. ICLR (2015).
<b id="r9">9.</b> Dinh et al. [*NICE: Non-linear Independent Components Estimation*][Dinh2015]. ICLR (2015).

<b id="r9">9.</b> Papamakarios, Nalisnick et al. [*Normalizing Flows for Probabilistic Modeling and Inference*][PapamakariosNalisnick2021]. JMLR (2021).
<b id="r10">10.</b> Papamakarios, Nalisnick et al. [*Normalizing Flows for Probabilistic Modeling and Inference*][PapamakariosNalisnick2021]. JMLR (2021).

<b id="r10">10.</b> Van de Wolfshaar and Pronobis. [*Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations*][VanWolfshaarPronobis2020]. PGM (2020).
<b id="r11">11.</b> Van de Wolfshaar and Pronobis. [*Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations*][VanWolfshaarPronobis2020]. PGM (2020).

<b id="r11">11.</b> Rahman et al. [*Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees*][Rahman2014]. ECML-PKDD (2014).
<b id="r12">12.</b> Rahman et al. [*Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees*][Rahman2014]. ECML-PKDD (2014).

<b id="r12">12.</b> Di Mauro, Gala et al. [*Random Probabilistic Circuits*][DiMauroGala2021]. UAI (2021).
<b id="r13">13.</b> Di Mauro, Gala et al. [*Random Probabilistic Circuits*][DiMauroGala2021]. UAI (2021).

<b id="r13">13.</b> Desana and Schnörr. [*Learning Arbitrary Sum-Product Network Leaves with Expectation-Maximization*][DesanaSchnörr2016]. CoRR (2016).
<b id="r14">14.</b> Desana and Schnörr. [*Learning Arbitrary Sum-Product Network Leaves with Expectation-Maximization*][DesanaSchnörr2016]. CoRR (2016).

<b id="r14">14.</b> Peharz et al. [*Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits*][Peharz2020b]. ICML (2020).
<b id="r15">15.</b> Peharz et al. [*Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits*][Peharz2020b]. ICML (2020).

[Peharz2015]: http://proceedings.mlr.press/v38/peharz15.pdf
[PoonDomingos2011]: https://arxiv.org/pdf/1202.3732.pdf
[MolinaVergari2018]: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16865/16619
[MolinaVergari2019]: https://arxiv.org/pdf/1901.03704.pdf
[DiMauro2018]: http://www.di.uniba.it/~ndm/pubs/dimauro18ia.pdf
[Peharz2020a]: http://proceedings.mlr.press/v115/peharz20a/peharz20a.pdf
[Papamakarios2017]: https://proceedings.neurips.cc/paper/2017/file/6c1da886822c67822bcf3679d04369fa-Paper.pdf
[Dinh2017]: https://arxiv.org/pdf/1605.08803v3.pdf
15 changes: 15 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,15 @@
## Benchmark

The `benchmark` directory contains benchmark scripts for the models and algorithms implemented in the library.
All the scripts can be launched from the command line.
Every script prints an estimate of the time required to run an algorithm (expressed in milliseconds).

The following table contains a description of the benchmark scripts and the external libraries used for comparison.
Please install the packages in `requirements.txt` to be able to run the scripts.

| Benchmark | Description | Compared Libraries |
|----------------------|------------------------------------------------------------------------------|:-------------------:|
| clt_queries.py | Benchmark on Binary Chow-Liu Trees (CLTs): learning, inference and sampling. | [*SPFlow*][SPFlow] |
| spn_queries.py | Benchmark on Sum-Product Networks (SPNs): inference and sampling. | [*SPFlow*][SPFlow] |

[SPFlow]: https://github.com/SPFlow/SPFlow
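For instance (a hypothetical invocation, assuming the scripts take no mandatory arguments; check each script's `--help` for the actual flags):
```shell
# Install the benchmark dependencies (e.g. SPFlow) first
pip install -r requirements.txt

# Run the CLT and SPN benchmarks; each prints timings in milliseconds
python clt_queries.py
python spn_queries.py
```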
19 changes: 19 additions & 0 deletions docs/README.md
@@ -0,0 +1,19 @@
## Documentation

The documentation is generated automatically by Sphinx, using sources stored in the `docs` directory
(with a slightly modified [*Read-the-Docs*](https://readthedocs.org/) theme).
Sooner or later we will also make it available online, probably hosted using GitHub Pages.

If you wish to build the documentation yourself, you will need to install the dependencies listed in `requirements.txt`
and run the Makefile targets as follows:
```bash
# Clean existing documentation (optional)
make clean

# Build source code API documentation
make sphinx_api

# Build HTML documentation
make sphinx_html
```
The output HTML documentation can be found inside the `_build/html` directory.
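For instance, a full rebuild might look like this (a sketch assuming the commands are run from the `docs` directory, where `requirements.txt` and the Makefile live):
```bash
# Install the documentation dependencies
pip install -r requirements.txt

# Rebuild the API sources and the HTML output from scratch
make clean
make sphinx_api
make sphinx_html
```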
6 changes: 3 additions & 3 deletions docs/conf.py
Expand Up @@ -4,9 +4,9 @@

# -- Project information -----------------------------------------------------
project = 'DeeProb-kit'
author = 'Lorenzo Loconte'
author = 'Lorenzo Loconte, Gennaro Gala'
copyright = '2021, {}'.format(author)
release = version = '0.6.4'
release = version = '1.0.0'

# -- General configuration ---------------------------------------------------
extensions = [
Expand All @@ -18,7 +18,7 @@
'sphinx_rtd_theme'
]
source_suffix = ['.rst', '.md']
exclude_patterns = ['api/modules.rst']
exclude_patterns = ['api/modules.rst', 'README.md']

# -- Options for HTML output -------------------------------------------------
html_static_path = ['_static']
3 changes: 0 additions & 3 deletions docs/home.md

This file was deleted.

6 changes: 5 additions & 1 deletion docs/index.rst
@@ -5,7 +5,11 @@ A Python library for Deep Probabilistic Modeling
:maxdepth: 1
:caption: Read Me

home.md
markdown/home.md
markdown/docs.md
markdown/examples.md
markdown/experiments.md
markdown/benchmark.md

.. toctree::
:maxdepth: 5
2 changes: 2 additions & 0 deletions docs/markdown/benchmark.md
@@ -0,0 +1,2 @@
```{include} ../../benchmark/README.md
```
2 changes: 2 additions & 0 deletions docs/markdown/docs.md
@@ -0,0 +1,2 @@
```{include} ../../docs/README.md
```
2 changes: 2 additions & 0 deletions docs/markdown/examples.md
@@ -0,0 +1,2 @@
```{include} ../../examples/README.md
```
2 changes: 2 additions & 0 deletions docs/markdown/experiments.md
@@ -0,0 +1,2 @@
```{include} ../../experiments/README.md
```
3 changes: 3 additions & 0 deletions docs/markdown/home.md
@@ -0,0 +1,3 @@
```{include} ../../README.md
:relative-images:
```
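These stubs use the MyST parser's `{include}` directive to pull the top-level READMEs into the Sphinx build, which is presumably why `README.md` is added to `exclude_patterns` in `conf.py` above. A further page would follow the same two-line pattern, e.g. a hypothetical `docs/markdown/contributing.md` (which would also need an entry in the `index.rst` toctree):
````markdown
```{include} ../../CONTRIBUTING.md
```
````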
31 changes: 31 additions & 0 deletions examples/README.md
@@ -0,0 +1,31 @@
## Code Examples

A collection of code examples can be found in the `examples` directory.
In order to run the code examples, it is necessary to clone the repository.
However, additional datasets are not required.
Note that the given examples are not intended to produce state-of-the-art results,
but only to present the library.

The following table contains a description of each example and a code complexity rating ranging from one to three stars.
The *Complexity* column is a rough measure of how many features of the library are used, as well as
the expected time required to run the script.

| Example | Description | Complexity |
|----------------------|-----------------------------------------------------------------------------------|:----------:|
| naive_model.py | Learn, evaluate and print statistics about a naive factorized model. | ⭐ |
| spn_plot.py | Instantiate, prune, marginalize and plot some SPNs. | ⭐ |
| clt_plot.py | Learn a Binary CLT and plot it. | ⭐ |
| spn_moments.py | Instantiate and compute moment statistics about the random variables. | ⭐ |
| sklearn_interface.py | Learn and evaluate a SPN using the scikit-learn interface. | ⭐ |
| spn_custom_leaf.py | Learn, evaluate and serialize a SPN with a user-defined leaf distribution. | ⭐ |
| clt_to_spn.py | Learn a Binary CLT, convert it to a structured decomposable SPN and plot it. | ⭐ |
| spn_clt_em.py | Instantiate a SPN with Binary CLTs, apply EM algorithm and sample some data. | ⭐⭐ |
| clt_queries.py | Learn a Binary CLT, plot it, run some queries and sample some data. | ⭐⭐ |
| ratspn_mnist.py | Train and evaluate a RAT-SPN on MNIST. | ⭐⭐ |
| dgcspn_olivetti.py | Train, evaluate and complete some images with DGC-SPN on Olivetti-Faces. | ⭐⭐ |
| dgcspn_mnist.py | Train and evaluate a DGC-SPN on MNIST. | ⭐⭐ |
| nvp1d_moons.py | Train and evaluate a 1D RealNVP on Moons dataset. | ⭐⭐ |
| maf_cifar10.py | Train and evaluate a MAF on CIFAR10. | ⭐⭐⭐ |
| nvp2d_mnist.py | Train and evaluate a 2D RealNVP on MNIST. | ⭐⭐⭐ |
| nvp2d_cifar10.py | Train and evaluate a 2D RealNVP on CIFAR10. | ⭐⭐⭐ |
| spn_latent_mnist.py | Train and evaluate a SPN on MNIST using the features extracted by an autoencoder. | ⭐⭐⭐ |
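After cloning the repository, each example runs as a standalone script (a hypothetical invocation; as noted above, no additional datasets are required):
```shell
git clone https://github.com/deeprob-org/deeprob-kit.git
cd deeprob-kit
python examples/naive_model.py
```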