A brief guide for PyDMD developers on how to leverage GLAPPO for new and old PyDMD codes.
Be careful on the argument on which you call build_linalg_module()
. It may happen that some NumPy arrays
are created en passant to be used as arguments for more complicated functions. These are not good candidates
for build_linalg_module()
, as they clearly do not convey information about user preferences on array typing.
Always check that the user is providing appropriate array pairs/triplets in PyDMD entrypoints (e.g. fit()
).
linalg.py
provides some utility functions (is_array(X)
, assert_same_linalg_type(X,*args)
) to facilitate writing
this kind of checks.
No need to check the output of internal functions like DMDBase._optimal_dmd_matrices()
. This clutters the
code and provides no additional value, our PRs are carefully reviewed by developers from the core team of
PyDMD.
Test new and old code for all new possibilities introduced by GLAPPO, for instance tensorized training. There are
many examples in tests
.
GLAPPO enables batched DMD, namely applying the same DMD operator to multiple datasets in one highly optimized call.
In order to support batching, make sure you index shapes starting from the last one, e.g. shape[-2]
instead of
shape[0]
to identify the space dimension, or shape[-1]
instead of shape[1]
to identify the time dimension).
Also, all calls to .T
should be dropped in favor of .swapaxes(-1, -2)
. You find tons of examples in classes which
already support batched training.
In this paragraph we're going to look at some snippets from the diff of the GLAPPification of SubspaceDMD
.
During the development I recommend to keep the
GLAPPO interface
close to you.
-import numpy as np
-
from .dmdbase import DMDBase
from .dmdoperator import DMDOperator
+from .utils import compute_svd
+from pydmd.linalg import build_linalg_module, is_array
from .snapshots import Snapshots
GLAPPO defines its own linear algebra functions, therefore you should not need NumPy. Simple
indexes arrays (for example) generated with np.arange()
are an exception, but you should
make sure not to mix NumPy and GLAPPO. It's better to try to remove all explicit references to
a linear algebra package (and GLAPPO provides its own arange
method).
def compute_operator(self, Yp, Yf):
[...]
+ n = Yp.shape[-2] // 2
- n = Yp.shape[0] // 2
Using .shape[-2]
instead of .shape[0]
is critical, since during tensorized training
Yp
is expected to have 3 dimensions. The first dimensions is commonly assumed to be the
batch dimension, therefore the other axes (space and time) are shifted one position to the right.
+ linalg_module = build_linalg_module(Yp)
- Uq, _, _ = reducedsvd(O)
+ O = linalg_module.multi_dot((Yf, Vp, Vp.swapaxes(-1, -2).conj()))
linalg_module
is the implementation of the GLAPPO interface for the array type represented by
the input. It contains all the available methods, in the specific flavor for the desired
linear algebra backend.
- Uq1, Uq2 = Uq[:n, :r], Uq[n:, :r]
+ Uq1 = Uq[..., :n, :r]
+ Uq2 = Uq[..., n:, :r]
Same as for the shape
, it's important to use the leading ...
in order to be generic
on the number of axes before the space/time trailing axes. The change above supports 2
axes as well as 3, since the (optional) leading batch dimension is matched by the three
dots.
- M = Uq2.dot(V) * np.reciprocal(S)
+ M = linalg_module.dot(Uq2, V) / (S[:, None] if Yp.ndim == 3 else S)
np.reciprocal
is not part of GLAPPO, therefore we use the standard/
operator;- In case we're performing a tensorized training (
Yp.ndim == 3
) we need to make sure that the divider is broadcasted properly against the tensor on the left side of the operator. Tests are your friend here: just run a "fake" tensorized training (i.e. batch dimension = 1) and they're going to fail if you got the position ofNone
wrong.
+ elif is_array(self._rescale_mode):
GLAPPO has its own suite of utility methods to work with generic matrices, you can look it up here.
+ scaling_factors = linalg_module.to(
+ self.as_array, self._rescale_mode
+ )
linalg_module.to(reference, *args)
takes a reference
and converts all args
to the same linear
algebra backend. It also makes sure that all the arrays are on the same device as the reference
(e.g. CPU/GPU for PyTorch).
- high_dimensional_eigenvectors = M.dot(W) * np.reciprocal(
- self.eigenvalues
+ high_dimensional_eigenvectors = linalg_module.dot(M, W) / (
+ self.eigenvalues[:, None] if M.ndim == 3 else self.eigenvalues
)
- def fit(self, X):
+ def fit(self, X, batch=False):
[...]
self._reset()
- self._snapshots_holder = Snapshots(X)
+ self._snapshots_holder = Snapshots(X, batch=batch)
Batch training requires a flag to be set to True
. This is needed because the
default behavior of PyDMD for ndim > 2
arrays is to flatten all the axes except
the last one.
- n_samples = self.snapshots.shape[1]
- Y0 = self.snapshots[:, :-3]
- Y1 = self.snapshots[:, 1:-2]
- Y2 = self.snapshots[:, 2:-1]
- Y3 = self.snapshots[:, 3:]
+ n_samples = self.snapshots.shape[-1]
+ Y0 = self.snapshots[..., :-3]
+ Y1 = self.snapshots[..., 1:-2]
+ Y2 = self.snapshots[..., 2:-1]
+ Y3 = self.snapshots[..., 3:]
Generic dimension indexing, same as above.
- Yp = np.vstack((Y0, Y1))
- Yf = np.vstack((Y2, Y3))
+ linalg_module = build_linalg_module(X)
+ Yp = linalg_module.cat((Y0, Y1), axis=-2)
+ Yf = linalg_module.cat((Y2, Y3), axis=-2)
GLAPPO provides its own method cat()
instead of h/v/stack
. This method takes
an axis
parameter which you can use to select the axis to stack on. In this case
we're stacking on the space dimension.
Due to the strong requirements of torch.mul
and torch.linalg.multi_dot
, the implementation of these
two functions in pytorch_linalg.py
forces a cast to the biggest complex type found in the argumnets.
We decided to take this path instead of placing the burden on user/implementors since for some algorithms
it's hard to control consistently whether the output is complex or real (e.g. torch.linalg.eig
) and casts
will happen internally quite often. This damages memory efficiency and performance, but ensures correct
results. It will be subject of investigation if we receive complains from our users.
This kind of casts is logged, in order to get the logs enable the INFO
logging level:
import logging
logging.basicConfig(level=logging.DEBUG)