Merge pull request #9 from deeprob-org/issue-#3-#6
PR for documentation issues #3 and #6
gengala authored Oct 13, 2021
2 parents f8a5bb2 + f170bbd commit 99a3079
Showing 14 changed files with 169 additions and 82 deletions.
24 changes: 15 additions & 9 deletions Makefile
@@ -4,23 +4,29 @@ COVERAGE = coverage
SETUP_SOURCE = setup.py

COVERAGE_FLAGS = --source deeprob
UNITTEST_FLAGS = --verbose
UNITTEST_FLAGS = --verbose --start-directory test

.PHONY: pip_clean
.PHONY: clean

show_coverage: unit_tests
$(COVERAGE) report -m
# Print Coverage information on stdout
coverage_cli: unit_tests
$(COVERAGE) report

# Run Unit Tests
unit_tests:
$(COVERAGE) run $(COVERAGE_FLAGS) -m $(UNITTEST) $(UNITTEST_FLAGS)

pip_package: $(SETUP_SOURCE)
$(PYTHON) $< sdist bdist_wheel
$(COVERAGE) run $(COVERAGE_FLAGS) -m $(UNITTEST) discover $(UNITTEST_FLAGS)

# Upload the PIP package
pip_upload: pip_package
$(PYTHON) -m twine upload dist/*

pip_clean:
# Build the PIP package
pip_package: $(SETUP_SOURCE)
$(PYTHON) $< sdist bdist_wheel

# Clean files and directories
clean:
rm -rf .coverage
rm -rf dist
rm -rf build
rm -rf deeprob_kit.egg-info
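For reference, the updated targets expand to roughly the following commands, assuming the folded variable definitions are `PYTHON = python` and `UNITTEST = unittest` (they are not visible in this hunk):
```shell
# unit_tests: run the test suite under coverage, discovering tests in the test/ directory
coverage run --source deeprob -m unittest discover --verbose --start-directory test

# coverage_cli: print coverage information on stdout
coverage report

# pip_package and pip_upload: build and publish the PIP package
python setup.py sdist bdist_wheel
python -m twine upload dist/*
```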
109 changes: 43 additions & 66 deletions README.md
@@ -4,40 +4,48 @@
![Logo](docs/deeprob-logo.svg)

## Abstract
DeeProb-kit is a Python library that implements deep probabilistic models such as various kinds of
**Sum-Product Networks**, **Normalizing Flows** and their possible combinations for probabilistic inference.
Some models are implemented using **PyTorch** for fast training and inference on GPUs.

**DeeProb-kit** is a general-purpose Python library that implements several deep probabilistic models,
with an effort to unify and standardize the experiments across most of them.
Implementing different models in a single library makes it easy to combine them,
as is common practice in deep learning research.
The library consists of a collection of deep probabilistic models such as various kinds of
*Sum-Product Networks*, *Normalizing Flows* and their possible combinations for density estimation.
Some models are implemented using *PyTorch* for fast training and inference on GPUs.

## Features

- Inference algorithms for SPNs. <sup>[1](#r1) [4](#r4)</sup>
- Learning algorithms for SPN structure. <sup>[1](#r1) [2](#r2) [3](#r3) [4](#r4)</sup>
- Chow-Liu Trees (CLT) as SPN leaves. <sup>[11](#r11) [12](#r12)</sup>
- Batch Expectation-Maximization (EM) for SPNs with arbitrary leaves. <sup>[13](#r13) [14](#r14)</sup>
- Learning algorithms for SPN structure. <sup>[1](#r1) [2](#r2) [3](#r3) [4](#r4) [5](#r5)</sup>
- Chow-Liu Trees (CLT) as SPN leaves. <sup>[12](#r12) [13](#r13)</sup>
- Batch Expectation-Maximization (EM) for SPNs with arbitrary leaves. <sup>[14](#r14) [15](#r15)</sup>
- Structural marginalization and pruning algorithms for SPNs.
- High-order moments computation for SPNs.
- JSON I/O operations for SPNs and CLTs. <sup>[4](#r4)</sup>
- Plotting operations based on NetworkX for SPNs and CLTs. <sup>[4](#r4)</sup>
- Randomized And Tensorized SPNs (RAT-SPNs) using PyTorch. <sup>[5](#r5)</sup>
- Masked Autoregressive Flows (MAFs) using PyTorch. <sup>[6](#r6)</sup>
- Real Non-Volume-Preserving (RealNVP) and Non-linear Independent Component Estimation (NICE) flows. <sup>[7](#r7) [8](#r8)</sup>
- Deep Generalized Convolutional SPNs (DGC-SPNs) using PyTorch. <sup>[10](#r10)</sup>
- Randomized And Tensorized SPNs (RAT-SPNs) using PyTorch. <sup>[6](#r6)</sup>
- Masked Autoregressive Flows (MAFs) using PyTorch. <sup>[7](#r7)</sup>
- Real Non-Volume-Preserving (RealNVP) and Non-linear Independent Component Estimation (NICE) flows. <sup>[8](#r8) [9](#r9)</sup>
- Deep Generalized Convolutional SPNs (DGC-SPNs) using PyTorch. <sup>[11](#r11)</sup>

The collection of implemented models is summarized in the following table.
The supported data dimensionality for each model is shown in the **Input Dimensionality** column.
Moreover, the **Supervised** column indicates which models are suitable for supervised learning tasks,
The supported data dimensionality for each model is shown in the *Input Dimensionality* column.
Moreover, the *Supervised* column indicates which models are suitable for supervised learning tasks,
in addition to density estimation.

| Model      | Description                                         | Input Dimensionality | Supervised |
|------------|-----------------------------------------------------|:--------------------:|:----------:|
| Binary-CLT | Binary Chow-Liu Tree (CLT)                          |          D           |     ❌     |
| SPN        | Vanilla Sum-Product Network, using LearnSPN         |          D           |     ❌     |
| XPC        | Random Probabilistic Circuits, using LearnXPC       |          D           |     ❌     |
| RAT-SPN    | Randomized and Tensorized Sum-Product Network       |          D           |     ✔️     |
| DGC-SPN    | Deep Generalized Convolutional Sum-Product Network  | (1, D, D); (3, D, D) |     ✔️     |
| MAF        | Masked Autoregressive Flow                          |          D           |     ❌     |
| NICE       | Non-linear Independent Components Estimation Flow   | (1, H, W); (3, H, W) |     ❌     |
| RealNVP    | Real-valued Non-Volume-Preserving Flow              | (1, H, W); (3, H, W) |     ❌     |

## Installation & Documentation
## Installation

The library can be installed either from the PIP repository or from source.
```shell
# Install from PIP repository
pip install deeprob-kit
# Install from `main` git branch
pip install -e git+https://github.com/deeprob-org/deeprob-kit.git@main#egg=deeprob-kit
```
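A quick sanity check after installation (a minimal sketch; the top-level module name `deeprob` is taken from the coverage flags in the Makefile above):
```shell
# Verify that the package can be imported
python -c "import deeprob; print('deeprob-kit imported')"
```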
The documentation is generated automatically by Sphinx (with Read-the-Docs theme), and it's hosted using GitHub Pages
at [deeprob-kit]().

## Datasets and Experiments
A collection of 29 binary datasets, most of which are used in the Probabilistic Circuits literature,
can be found at [UCLA-StarAI-Binary-Datasets](https://github.com/UCLA-StarAI/Density-Estimation-Datasets).

Moreover, a collection of 5 continuous datasets, commonly used in works on Normalizing Flows,
can be found at [MAF-Continuous-Datasets](https://zenodo.org/record/1161203#.Wmtf_XVl8eN).

After downloading them, the datasets must be stored in the `experiments/datasets` directory in order to run the experiments
(and unit tests).
The experiment scripts are available in the `experiments` directory and can be launched from the command line
by specifying the dataset and hyper-parameters.

## Code Examples
A collection of code examples can be found in the `examples` directory.
However, the examples are not intended to produce state-of-the-art results,
but only to present the library.

The following table contains a description of each example and a code complexity rating ranging from one to three stars.
The **Complexity** column is a rough measure of how many features of the library are used, as well as
the expected time required to run the script.

| Example | Description | Complexity |
|----------------------|-----------------------------------------------------------------------------------|:----------:|
| naive_model.py | Learn, evaluate and print statistics about a naive factorized model. | ⭐ |
| spn_plot.py | Instantiate, prune, marginalize and plot some SPNs. | ⭐ |
| clt_plot.py | Learn a Binary CLT and plot it. | ⭐ |
| spn_moments.py | Instantiate and compute moment statistics about the random variables. | ⭐ |
| sklearn_interface.py | Learn and evaluate a SPN using the scikit-learn interface. | ⭐ |
| spn_custom_leaf.py | Learn, evaluate and serialize a SPN with a user-defined leaf distribution. | ⭐ |
| clt_to_spn.py | Learn a Binary CLT, convert it to a structured decomposable SPN and plot it. | ⭐ |
| spn_clt_em.py | Instantiate a SPN with Binary CLTs, apply EM algorithm and sample some data. | ⭐⭐ |
| clt_queries.py | Learn a Binary CLT, plot it, run some queries and sample some data. | ⭐⭐ |
| ratspn_mnist.py | Train and evaluate a RAT-SPN on MNIST. | ⭐⭐ |
| dgcspn_olivetti.py | Train, evaluate and complete some images with DGC-SPN on Olivetti-Faces. | ⭐⭐ |
| dgcspn_mnist.py | Train and evaluate a DGC-SPN on MNIST. | ⭐⭐ |
| nvp1d_moons.py | Train and evaluate a 1D RealNVP on Moons dataset. | ⭐⭐ |
| maf_cifar10.py | Train and evaluate a MAF on CIFAR10. | ⭐⭐⭐ |
| nvp2d_mnist.py | Train and evaluate a 2D RealNVP on MNIST. | ⭐⭐⭐ |
| nvp2d_cifar10.py | Train and evaluate a 2D RealNVP on CIFAR10. | ⭐⭐⭐ |
| spn_latent_mnist.py | Train and evaluate a SPN on MNIST using the features extracted by an autoencoder. | ⭐⭐⭐ |

## Project Directories

The documentation is generated automatically by Sphinx using sources stored in the [docs](docs) directory.

A collection of code examples and experiments can be found in the [examples](examples) and [experiments](experiments)
directories, respectively.
Moreover, benchmark code can be found in the [benchmark](benchmark) directory.

## Related Repositories

- [SPFlow](https://github.com/SPFlow/SPFlow)
- [RAT-SPN](https://github.com/cambridge-mlg/RAT-SPN)
- [Random-PC](https://github.com/gengala/Random-Probabilistic-Circuits)
@@ -109,30 +83,33 @@ the expected time required to run the script.

<b id="r4">4.</b> Molina, Vergari et al. [*SPFLOW : An easy and extensible library for deep probabilistic learning using Sum-Product Networks*][MolinaVergari2019]. CoRR (2019).

<b id="r5">5.</b> Peharz et al. [*Probabilistic Deep Learning using Random Sum-Product Networks*][Peharz2020a]. UAI (2020).
<b id="r5">5.</b> Di Mauro et al. [*Sum-Product Network structure learning by efficient product nodes discovery*][DiMauro2018]. AIxIA (2018).

<b id="r6">6.</b> Peharz et al. [*Probabilistic Deep Learning using Random Sum-Product Networks*][Peharz2020a]. UAI (2020).

<b id="r6">6.</b> Papamakarios et al. [*Masked Autoregressive Flow for Density Estimation*][Papamakarios2017]. NeurIPS (2017).
<b id="r7">7.</b> Papamakarios et al. [*Masked Autoregressive Flow for Density Estimation*][Papamakarios2017]. NeurIPS (2017).

<b id="r7">7.</b> Dinh et al. [*Density Estimation using RealNVP*][Dinh2017]. ICLR (2017).
<b id="r8">8.</b> Dinh et al. [*Density Estimation using RealNVP*][Dinh2017]. ICLR (2017).

<b id="r8">8.</b> Dinh et al. [*NICE: Non-linear Independent Components Estimation*][Dinh2015]. ICLR (2015).
<b id="r9">9.</b> Dinh et al. [*NICE: Non-linear Independent Components Estimation*][Dinh2015]. ICLR (2015).

<b id="r9">9.</b> Papamakarios, Nalisnick et al. [*Normalizing Flows for Probabilistic Modeling and Inference*][PapamakariosNalisnick2021]. JMLR (2021).
<b id="r10">10.</b> Papamakarios, Nalisnick et al. [*Normalizing Flows for Probabilistic Modeling and Inference*][PapamakariosNalisnick2021]. JMLR (2021).

<b id="r10">10.</b> Van de Wolfshaar and Pronobis. [*Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations*][VanWolfshaarPronobis2020]. PGM (2020).
<b id="r11">11.</b> Van de Wolfshaar and Pronobis. [*Deep Generalized Convolutional Sum-Product Networks for Probabilistic Image Representations*][VanWolfshaarPronobis2020]. PGM (2020).

<b id="r11">11.</b> Rahman et al. [*Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees*][Rahman2014]. ECML-PKDD (2014).
<b id="r12">12.</b> Rahman et al. [*Cutset Networks: A Simple, Tractable, and Scalable Approach for Improving the Accuracy of Chow-Liu Trees*][Rahman2014]. ECML-PKDD (2014).

<b id="r12">12.</b> Di Mauro, Gala et al. [*Random Probabilistic Circuits*][DiMauroGala2021]. UAI (2021).
<b id="r13">13.</b> Di Mauro, Gala et al. [*Random Probabilistic Circuits*][DiMauroGala2021]. UAI (2021).

<b id="r13">13.</b> Desana and Schnörr. [*Learning Arbitrary Sum-Product Network Leaves with Expectation-Maximization*][DesanaSchnörr2016]. CoRR (2016).
<b id="r14">14.</b> Desana and Schnörr. [*Learning Arbitrary Sum-Product Network Leaves with Expectation-Maximization*][DesanaSchnörr2016]. CoRR (2016).

<b id="r14">14.</b> Peharz et al. [*Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits*][Peharz2020b]. ICML (2020).
<b id="r15">15.</b> Peharz et al. [*Einsum Networks: Fast and Scalable Learning of Tractable Probabilistic Circuits*][Peharz2020b]. ICML (2020).

[Peharz2015]: http://proceedings.mlr.press/v38/peharz15.pdf
[PoonDomingos2011]: https://arxiv.org/pdf/1202.3732.pdf
[MolinaVergari2018]: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16865/16619
[MolinaVergari2019]: https://arxiv.org/pdf/1901.03704.pdf
[DiMauro2018]: http://www.di.uniba.it/~ndm/pubs/dimauro18ia.pdf
[Peharz2020a]: http://proceedings.mlr.press/v115/peharz20a/peharz20a.pdf
[Papamakarios2017]: https://proceedings.neurips.cc/paper/2017/file/6c1da886822c67822bcf3679d04369fa-Paper.pdf
[Dinh2017]: https://arxiv.org/pdf/1605.08803v3.pdf
15 changes: 15 additions & 0 deletions benchmark/README.md
@@ -0,0 +1,15 @@
## Benchmark

The `benchmark` directory contains benchmark scripts for the models and algorithms implemented in the library.
All the scripts can be launched from the command line.
Every script prints an estimate of the time required to run an algorithm (expressed in milliseconds).

The following table contains a description of the benchmark scripts and the external libraries used for comparison.
Please install the packages in `requirements.txt` to be able to run the scripts.

| Benchmark | Description | Compared Libraries |
|----------------------|------------------------------------------------------------------------------|:-------------------:|
| clt_queries.py | Benchmark on Binary Chow-Liu Trees (CLTs): learning, inference and sampling. | [*SPFlow*][SPFlow] |
| spn_queries.py | Benchmark on Sum-Product Networks (SPNs): inference and sampling. | [*SPFlow*][SPFlow] |

[SPFlow]: https://github.com/SPFlow/SPFlow
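For instance (a hypothetical invocation, assuming the scripts take no mandatory arguments; check each script's `--help` for the actual flags):
```shell
# Install the benchmark dependencies (e.g. SPFlow) first
pip install -r requirements.txt

# Run the CLT and SPN benchmarks; each prints timings in milliseconds
python clt_queries.py
python spn_queries.py
```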
19 changes: 19 additions & 0 deletions docs/README.md
@@ -0,0 +1,19 @@
## Documentation

The documentation is generated automatically by Sphinx, using sources stored in the `docs` directory
(with a slightly modified [*Read-the-Docs*](https://readthedocs.org/) theme).
Sooner or later we will also make it available online, probably hosted using GitHub Pages.

If you wish to build the documentation yourself, you will need to install the dependencies listed in `requirements.txt`
and run the Makefile targets as follows:
```bash
# Clean existing documentation (optional)
make clean

# Build source code API documentation
make sphinx_api

# Build HTML documentation
make sphinx_html
```
The output HTML documentation can be found inside the `_build/html` directory.
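For instance, a full rebuild might look like this (a sketch assuming the commands are run from the `docs` directory, where `requirements.txt` and the Makefile live):
```bash
# Install the documentation dependencies
pip install -r requirements.txt

# Rebuild the API sources and the HTML output from scratch
make clean
make sphinx_api
make sphinx_html
```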
6 changes: 3 additions & 3 deletions docs/conf.py
Expand Up @@ -4,9 +4,9 @@

# -- Project information -----------------------------------------------------
project = 'DeeProb-kit'
author = 'Lorenzo Loconte'
author = 'Lorenzo Loconte, Gennaro Gala'
copyright = '2021, {}'.format(author)
release = version = '0.6.4'
release = version = '1.0.0'

# -- General configuration ---------------------------------------------------
extensions = [
Expand All @@ -18,7 +18,7 @@
'sphinx_rtd_theme'
]
source_suffix = ['.rst', '.md']
exclude_patterns = ['api/modules.rst']
exclude_patterns = ['api/modules.rst', 'README.md']

# -- Options for HTML output -------------------------------------------------
html_static_path = ['_static']
3 changes: 0 additions & 3 deletions docs/home.md

This file was deleted.

6 changes: 5 additions & 1 deletion docs/index.rst
@@ -5,7 +5,11 @@ A Python library for Deep Probabilistic Modeling
:maxdepth: 1
:caption: Read Me

home.md
markdown/home.md
markdown/docs.md
markdown/examples.md
markdown/experiments.md
markdown/benchmark.md

.. toctree::
:maxdepth: 5
2 changes: 2 additions & 0 deletions docs/markdown/benchmark.md
@@ -0,0 +1,2 @@
```{include} ../../benchmark/README.md
```
2 changes: 2 additions & 0 deletions docs/markdown/docs.md
@@ -0,0 +1,2 @@
```{include} ../../docs/README.md
```
2 changes: 2 additions & 0 deletions docs/markdown/examples.md
@@ -0,0 +1,2 @@
```{include} ../../examples/README.md
```
2 changes: 2 additions & 0 deletions docs/markdown/experiments.md
@@ -0,0 +1,2 @@
```{include} ../../experiments/README.md
```
3 changes: 3 additions & 0 deletions docs/markdown/home.md
@@ -0,0 +1,3 @@
```{include} ../../README.md
:relative-images:
```
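These stubs use the MyST parser's `{include}` directive to pull the top-level READMEs into the Sphinx build, which is presumably why `README.md` is added to `exclude_patterns` in `conf.py` above. A further page would follow the same two-line pattern, e.g. a hypothetical `docs/markdown/contributing.md` (which would also need an entry in the `index.rst` toctree):
````markdown
```{include} ../../CONTRIBUTING.md
```
````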
31 changes: 31 additions & 0 deletions examples/README.md
@@ -0,0 +1,31 @@
## Code Examples

A collection of code examples can be found in the `examples` directory.
In order to run the code examples, it is necessary to clone the repository.
However, additional datasets are not required.
Note that the given examples are not intended to produce state-of-the-art results,
but only to present the library.

The following table contains a description of each example and a code complexity rating ranging from one to three stars.
The *Complexity* column is a rough measure of how many features of the library are used, as well as
the expected time required to run the script.

| Example | Description | Complexity |
|----------------------|-----------------------------------------------------------------------------------|:----------:|
| naive_model.py | Learn, evaluate and print statistics about a naive factorized model. | ⭐ |
| spn_plot.py | Instantiate, prune, marginalize and plot some SPNs. | ⭐ |
| clt_plot.py | Learn a Binary CLT and plot it. | ⭐ |
| spn_moments.py | Instantiate and compute moment statistics about the random variables. | ⭐ |
| sklearn_interface.py | Learn and evaluate a SPN using the scikit-learn interface. | ⭐ |
| spn_custom_leaf.py | Learn, evaluate and serialize a SPN with a user-defined leaf distribution. | ⭐ |
| clt_to_spn.py | Learn a Binary CLT, convert it to a structured decomposable SPN and plot it. | ⭐ |
| spn_clt_em.py | Instantiate a SPN with Binary CLTs, apply EM algorithm and sample some data. | ⭐⭐ |
| clt_queries.py | Learn a Binary CLT, plot it, run some queries and sample some data. | ⭐⭐ |
| ratspn_mnist.py | Train and evaluate a RAT-SPN on MNIST. | ⭐⭐ |
| dgcspn_olivetti.py | Train, evaluate and complete some images with DGC-SPN on Olivetti-Faces. | ⭐⭐ |
| dgcspn_mnist.py | Train and evaluate a DGC-SPN on MNIST. | ⭐⭐ |
| nvp1d_moons.py | Train and evaluate a 1D RealNVP on Moons dataset. | ⭐⭐ |
| maf_cifar10.py | Train and evaluate a MAF on CIFAR10. | ⭐⭐⭐ |
| nvp2d_mnist.py | Train and evaluate a 2D RealNVP on MNIST. | ⭐⭐⭐ |
| nvp2d_cifar10.py | Train and evaluate a 2D RealNVP on CIFAR10. | ⭐⭐⭐ |
| spn_latent_mnist.py | Train and evaluate a SPN on MNIST using the features extracted by an autoencoder. | ⭐⭐⭐ |
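After cloning the repository, each example runs as a standalone script (a hypothetical invocation; as noted above, no additional datasets are required):
```shell
git clone https://github.com/deeprob-org/deeprob-kit.git
cd deeprob-kit
python examples/naive_model.py
```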