Setting the consistency of atoms and coordinates in function definitions (#16)

* Setting the consistency of atoms and coordinates for representations
* Added cov test setup
* Changed assert to raise exception
* Remove Python2 dependencies
* Cleaned f90 headers
* Added cites and changed interface to be cite-consistent (#17)
* Bump version to minor, show breaking interface
qmlcode · Nov 12, 2024 · 43503af
1 parent bbfac9b
commit 43503af

Showing 40 changed files with 413 additions and 1,318 deletions.
diff --git a/.coveragerc b/.coveragerc
@@ -0,0 +1,5 @@
+[report]
+exclude_also =
+    def __repr__
+    raise ValueError
+    raise NotImplementedError
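Note on the ``.coveragerc`` above: as far as the coverage.py documentation describes it, each ``exclude_also`` entry is a regular expression searched against source lines, and matching lines are left out of the report. The sketch below only approximates that line-matching step with the standard ``re`` module; the ``is_excluded`` helper and the sample source are hypothetical, and real coverage.py additionally excludes the whole block when the matched line introduces one (for example ``def __repr__``).

```python
import re

# Patterns copied from the exclude_also list in the .coveragerc above.
EXCLUDE_ALSO = [
    r"def __repr__",
    r"raise ValueError",
    r"raise NotImplementedError",
]


def is_excluded(line: str) -> bool:
    """Hypothetical helper: True if any pattern is found anywhere in the line."""
    return any(re.search(pattern, line) for pattern in EXCLUDE_ALSO)


# Sample source, made up for illustration only.
source = """\
def check(n, m):
    if n != m:
        raise ValueError("shape mismatch")
    return n
"""

# Collect the lines that the patterns would mark as excluded.
excluded = [line.strip() for line in source.splitlines() if is_excluded(line)]
print(excluded)
```

Under this reading, only the ``raise`` line is excluded; the ``if`` line that guards it still counts toward coverage.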
diff --git a/Makefile b/Makefile
@@ -16,7 +16,7 @@ all: env
 env:
 	${mamba} env create -f ./environment_dev.yaml -p ./env --quiet
 	${python} -m pre_commit install
-	# ${python} -m pip install -e .
+	${python} -m pip install -e .
 
 ./.git/hooks/pre-commit:
 	${python} -m pre_commit install
@@ -34,7 +34,7 @@ types:
 	${python} -m monkeytype list-modules | grep ${pkg} | parallel -j${j} "${python} -m monkeytype apply {} > /dev/null && echo {}"
 
 cov:
-	${python} -m pytest -vrs --cov=${pkg} --cov-report html tests
+	${python} -m pytest --cov=${pkg} --cov-config .coveragerc --cov-report html tests
 
 compile:
 	${python} _compile.py

diff --git a/README.rst b/README.rst
@@ -1,6 +1,6 @@
-====
-What
-====
+===============
+What is qmllib?
+===============
 
 ``qmllib`` is a Python/Fortran toolkit for representation of molecules and solids
 for machine learning of properties of molecules and solids. The library is not
@@ -10,7 +10,7 @@ the goal is to provide usable and efficient implementations of concepts such as
 representations and kernels.
 
 ==============
-QML or QMLLib?
+QML or qmllib?
 ==============
 
 ``qmllib`` represents the core library functionality derived from the original
@@ -19,9 +19,10 @@ applications, but without the high-level abstraction, for example SKLearn.
 
 This package is and should stay free-function design oriented.
 
-Breaking changes from ``qml``:
+If you are moving from ``qml`` to ``qmllib``, note that there are breaking
+changes to the interface to make it more consistent with both argument orders
+and function naming.
 
-* FCHL representations callable interface to be consistent with other representations (e.i. atoms, coordinates)
 
 ==============
 How to install
@@ -52,6 +53,7 @@ or if you want a specific feature branch
 
     pip install git+https://github.com/qmlcode/qmllib@feature_branch
 
+
 =================
 How to contribute
 =================
@@ -73,27 +75,72 @@ You know have a conda environment in `./env` and are ready to run
 
 happy developing
 
+
 ==========
 How to use
 ==========
 
-.. code-block:: python
-
-    raise NotImplementedError
+Notebook examples are coming. For now, see test files in ``tests/*``.
 
 ===========
 How to cite
 ===========
 
-.. code-block:: python
-
-    raise NotImplementedError
-
-=========
-What TODO
-=========
-
-* Setup ifort flags
-* Setup based on FCC env variable or --global-option flags
-* Find MKL from env (for example conda)
-* Find what numpy has been linked too (lapack or mkl)
+Please cite the representation that you are using accordingly.
+
+- | **Implementation**
+  Toolkit for Quantum Chemistry Machine Learning,
+  https://github.com/qmlcode/qmllib, <version or git commit>
+
+- | **FCHL19** ``generate_fchl19``
+  FCHL revisited: Faster and more accurate quantum machine learning,
+  Christensen, Bratholm, Faber, Lilienfeld,
+  J. Chem. Phys. 152, 044107 (2020),
+  https://doi.org/10.1063/1.5126701
+
+- | **FCHL18** ``generate_fchl18``
+  Alchemical and structural distribution based representation for universal quantum machine learning,
+  Faber, Christensen, Huang, Lilienfeld,
+  J. Chem. Phys. 148, 241717 (2018),
+  https://doi.org/10.1063/1.5020710
+
+- | **Columb Matrix** ``generate_columnb_matrix_*``
+  Fast and Accurate Modeling of Molecular Atomization Energies with Machine Learning,
+  Rupp, Tkatchenko, Müller, Lilienfeld,
+  Phys. Rev. Lett. 108, 058301 (2012)
+  DOI: https://doi.org/10.1103/PhysRevLett.108.058301
+
+- | **Bag of Bonds (BoB)** ``generate_bob``
+  Assessment and Validation of Machine Learning Methods for Predicting Molecular Atomization Energies,
+  Hansen, Montavon, Biegler, Fazli, Rupp, Scheffler, Lilienfeld, Tkatchenko, Müller,
+  J. Chem. Theory Comput. 2013, 9, 8, 3404–3419
+  https://doi.org/10.1021/ct400195d
+
+- | **SLATM** ``generate_slatm``
+  Understanding molecular representations in machine learning: The role of uniqueness and target similarity,
+  Huang, Lilienfeld,
+  J. Chem. Phys. 145, 161102 (2016)
+  https://doi.org/10.1063/1.4964627
+
+- | **ACSF** ``generate_acsf``
+  Atom-centered symmetry functions for constructing high-dimensional neural network potentials,
+  Behler,
+  J Chem Phys 21;134(7):074106 (2011)
+  https://doi.org/10.1063/1.3553717
+
+- | **AARAD** ``generate_aarad``
+  Alchemical and structural distribution based representation for universal quantum machine learning,
+  Faber, Christensen, Huang, Lilienfeld,
+  J. Chem. Phys. 148, 241717 (2018),
+  https://doi.org/10.1063/1.5020710
+
+
+===================
+What is left to do?
+===================
+
+- Compile based on ``FCC`` env variable
+- if ``ifort`` find the right flags
+- Find MKL from env (for example conda)
+- Find what numpy has been linked too (lapack or mkl)
+- Notebook examples
diff --git a/environment_dev.yaml b/environment_dev.yaml
@@ -14,6 +14,7 @@ dependencies:
     - pre-commit
     - pyarrow
     - pytest
+    - pytest-cov
     - scikit-learn
     - scipy
     # build

diff --git a/src/qmllib/kernels/fdistance.f90 b/src/qmllib/kernels/fdistance.f90
@@ -1,4 +1,3 @@
-
 subroutine fmanhattan_distance(A, B, D)
 
    implicit none

diff --git a/src/qmllib/kernels/fgradient_kernels.f90 b/src/qmllib/kernels/fgradient_kernels.f90
@@ -1,25 +1,3 @@
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
 subroutine fglobal_kernel(x1, x2, q1, q2, n1, n2, nm1, nm2, sigma, kernel)
 
     implicit none

diff --git a/src/qmllib/kernels/fkernels.f90 b/src/qmllib/kernels/fkernels.f90
@@ -1,4 +1,3 @@
-
 subroutine fget_local_kernels_gaussian(q1, q2, n1, n2, sigmas, &
         & nm1, nm2, nsigmas, kernels)
 

diff --git a/src/qmllib/kernels/fkpca.f90 b/src/qmllib/kernels/fkpca.f90
@@ -1,4 +1,3 @@
-
 subroutine fkpca(k, n, centering, kpca)
 
    implicit none

diff --git a/src/qmllib/kernels/fkwasserstein.f90 b/src/qmllib/kernels/fkwasserstein.f90
@@ -1,4 +1,3 @@
-
 module searchtools
 
    implicit none

diff --git a/src/qmllib/kernels/gradient_kernels.py b/src/qmllib/kernels/gradient_kernels.py
@@ -59,12 +59,10 @@ def get_global_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("Error: List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -114,12 +112,10 @@ def get_local_kernels(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("Error: List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("Error: List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -176,12 +172,10 @@ def get_local_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -228,9 +222,8 @@ def get_local_symmetric_kernels(X1: ndarray, Q1: List[List[int]], SIGMAS: List[f
 
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("Error: List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     for i, q in enumerate(Q1):
@@ -275,9 +268,8 @@ def get_local_symmetric_kernel(
 
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("Error: List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     for i, q in enumerate(Q1):
@@ -329,12 +321,10 @@ def get_atomic_local_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -394,12 +384,10 @@ def get_atomic_local_gradient_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -475,12 +463,10 @@ def get_local_gradient_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -552,12 +538,10 @@ def get_gdml_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -627,9 +611,8 @@ def get_symmetric_gdml_kernel(
 
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
 
@@ -692,12 +675,10 @@ def get_gp_kernel(
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
     N2 = np.array([len(Q) for Q in Q2], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
-    assert (
-        N2.shape[0] == X2.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
+    if not (N2.shape[0] == X2.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
     Q2_input = np.zeros((max(N2), X2.shape[0]), dtype=np.int32)
@@ -765,9 +746,8 @@ def get_symmetric_gp_kernel(
 
     N1 = np.array([len(Q) for Q in Q1], dtype=np.int32)
 
-    assert (
-        N1.shape[0] == X1.shape[0]
-    ), "Error: List of charges does not match shape of representations"
+    if not (N1.shape[0] == X1.shape[0]):
+        raise ValueError("List of charges does not match shape of representations")
 
     Q1_input = np.zeros((max(N1), X1.shape[0]), dtype=np.int32)
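Note on the assert-to-exception change in ``gradient_kernels.py``: ``assert`` statements are removed when Python runs with ``-O``, so input validation written as an assert can silently disappear, whereas an explicit ``raise ValueError`` always runs. The sketch below shows the same check in isolation; ``check_charges_match`` is a hypothetical helper that is not part of ``qmllib``, and it uses plain lists instead of NumPy arrays to stay self-contained.

```python
from typing import Sequence


def check_charges_match(n_representations: int, charges: Sequence[Sequence[int]]) -> None:
    """Hypothetical helper: require one charge list per representation."""
    if len(charges) != n_representations:
        raise ValueError("List of charges does not match shape of representations")


# Two molecules (water and methane) with two representations: passes silently.
check_charges_match(2, [[8, 1, 1], [6, 1, 1, 1, 1]])

# Two representations but only one charge list: raises ValueError.
try:
    check_charges_match(2, [[8, 1, 1]])
    raised = False
except ValueError as error:
    raised = True
    message = str(error)
```

A caller can now catch ``ValueError`` specifically, which an ``AssertionError`` from the old code would not have allowed in a meaningful way.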