Merge branch 'main' into adapt-models

frostedoyster committed Nov 21, 2024
2 parents 364bd9b + 444fb72 commit f598aab

Showing 27 changed files with 195 additions and 68 deletions.
13 changes: 7 additions & 6 deletions docs/src/advanced-concepts/fitting-generic-targets.rst
@@ -67,7 +67,7 @@ like this:
type:
cartesian:
rank: 1
num_properties: 10
num_subtargets: 10
The crucial fields here are:

@@ -77,12 +77,13 @@ The crucial fields here are:
a Cartesian vector. The ``rank`` field specifies the rank of the target. For
Cartesian vectors, the rank is 1. Other possibilities for the ``type`` are
``scalar`` (for a scalar target) and ``spherical`` (for a spherical tensor).
- ``num_properties``: This field specifies the number of independent properties in the
target that need to be learned. They are treated as entirely equivalent by models in
- ``num_subtargets``: This field specifies the number of sub-targets that need to be
learned as part of this target. They are treated as entirely equivalent by models in
metatrain and will often be represented as outputs of the same neural network layer.
A common use case for this field is when you are learning a discretization of a
continuous target, such as the grid points of a band structure. In this case, there
are 10 properties.
continuous target, such as the grid points of a band structure. In the example
above, there are 10 sub-targets. In ``metatensor``, these correspond to the number
of ``properties`` of the target.

A few more words should be spent on ``spherical`` targets. These should be made of a
certain number of irreducible spherical tensors. For example, if you are learning a
@@ -103,7 +104,7 @@ the target section would look like this:
irreps:
- {o3_lambda: 0, o3_sigma: 1}
- {o3_lambda: 2, o3_sigma: 1}
num_properties: 10
num_subtargets: 10
where ``o3_lambda`` specifies the L value of the spherical tensor and ``o3_sigma`` its
parity with respect to inversion (1 for proper tensors, -1 for pseudo-tensors).
2 changes: 1 addition & 1 deletion docs/src/advanced-concepts/index.rst
@@ -13,4 +13,4 @@ such as output naming, auxiliary outputs, and wrapper models.
multi-gpu
auto-restarting
fine-tuning
fitting-generic-targets
preparing-generic-targets
112 changes: 112 additions & 0 deletions docs/src/advanced-concepts/preparing-generic-targets.rst
@@ -0,0 +1,112 @@
Preparing generic targets for reading by metatrain
==================================================

Besides energy-like targets, the library also supports reading (and training on)
more generic targets.

Input file
----------

In order to read a generic target, you will have to specify its layout in the input
file. Suppose you want to learn a target named ``mtt::my_target``, which is
represented as a set of 10 independent per-atom 3D Cartesian vectors (we need to
learn 3x10 values for each atom). The ``target`` section in the input file should
look like this:

.. code-block:: yaml

    targets:
      mtt::my_target:
        read_from: dataset.xyz
        key: my_target
        quantity: ""
        unit: ""
        per_atom: True
        type:
          cartesian:
            rank: 1
        num_subtargets: 10

The crucial fields here are:

- ``per_atom``: This field should be set to ``True`` if the target is a per-atom
property. Otherwise, it should be set to ``False``.
- ``type``: This field specifies the type of the target. In this case, the target is
a Cartesian vector. The ``rank`` field specifies the rank of the target. For
Cartesian vectors, the rank is 1. Other possibilities for the ``type`` are
``scalar`` (for a scalar target) and ``spherical`` (for a spherical tensor).
- ``num_subtargets``: This field specifies the number of sub-targets that need to be
learned as part of this target. They are treated as entirely equivalent by models in
metatrain and will often be represented as outputs of the same neural network layer.
A common use case for this field is when you are learning a discretization of a
continuous target, such as the grid points of a band structure. In the example
above, there are 10 sub-targets. In ``metatensor``, these correspond to the number
of ``properties`` of the target.

A few more words should be spent on ``spherical`` targets. These should be made of a
certain number of irreducible spherical tensors. For example, if you are learning a
property that can be decomposed into two proper spherical tensors with L=0 and L=2,
the target section would look like this:

.. code-block:: yaml

    targets:
      mtt::my_target:
        quantity: ""
        read_from: dataset.xyz
        key: energy
        unit: ""
        per_atom: True
        type:
          spherical:
            irreps:
              - {o3_lambda: 0, o3_sigma: 1}
              - {o3_lambda: 2, o3_sigma: 1}
        num_subtargets: 10

where ``o3_lambda`` specifies the L value of the spherical tensor and ``o3_sigma`` its
parity with respect to inversion (1 for proper tensors, -1 for pseudo-tensors).

Preparing your targets -- ASE
-----------------------------

If you are using the ASE readers to read your targets, you will have to save them
either in the ``.info`` (if the target is per structure, i.e. not per atom) or in the
``.arrays`` (if the target is per atom) attributes of the ASE atoms object. Then you can
dump the atoms object to a file using ``ase.io.write``.

The ASE reader will automatically try to reshape the target data to the format expected
given the target section in the input file. In case your target data is invalid, an
error will be raised.
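
For example, a per-atom Cartesian target like the one above could be prepared as
follows. This is a minimal sketch, not taken from the repository: the water-molecule
structure, the hypothetical per-structure target ``my_scalar``, and the flattened
3x10 = 30 columns per atom are illustrative assumptions that rely on the automatic
reshaping described above.

.. code-block:: python

    import ase
    import ase.io
    import numpy as np

    # A single water molecule as a stand-in structure (illustrative only)
    atoms = ase.Atoms(
        "H2O", positions=[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
    )

    # Per-atom target: one row per atom, stored in `.arrays`. The 10 Cartesian
    # vectors per atom are flattened to 30 columns here.
    atoms.arrays["my_target"] = np.zeros((len(atoms), 30))

    # A per-structure (not per-atom) target would go into `.info` instead
    atoms.info["my_scalar"] = np.zeros(10)

    ase.io.write("dataset.xyz", atoms)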

Reading targets with more than one spherical tensor is not supported by the ASE reader.
In that case, you should use the metatensor reader.

Preparing your targets -- metatensor
------------------------------------

If you are using the metatensor readers to read your targets, you will have to save them
as a ``metatensor.torch.TensorMap`` object with ``metatensor.torch.TensorMap.save()``
into a file with the ``.npz`` extension.

The metatensor reader will verify that the target data in the input files corresponds to
the metadata in the provided ``TensorMap`` objects. In case of a mismatch, errors will
be raised.

In particular:

- if the target is per atom, the samples should have the [``system``, ``atom``] names,
otherwise the [``system``] name.
- if the target is a ``scalar``, only one ``TensorBlock`` should be present, the keys
of the ``TensorMap`` should be a ``Labels.single()`` object, and there should be no
components.
- if the target is a ``cartesian`` tensor, only one ``TensorBlock`` should be present,
  the keys of the ``TensorMap`` should be a ``Labels.single()`` object, and there
  should be one component with name ``xyz`` for a rank-1 tensor, or components named
  ``xyz_1``, ``xyz_2``, etc. for higher-rank tensors.
- if the target is a ``spherical`` tensor, the ``TensorMap`` can contain multiple
  ``TensorBlock`` objects, each corresponding to one irreducible spherical tensor. The
  keys of the ``TensorMap`` should have the ``o3_lambda`` and ``o3_sigma`` names,
  corresponding to the values provided in the input file, and each ``TensorBlock``
  should have one component label, with name ``o3_mu`` and values going from -L to L.
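
As an illustration of the requirements above, a minimal sketch for a per-atom
``scalar`` target with a single sub-target could look as follows (the values, the
two-atom system, and the file name ``my_target.npz`` are illustrative, not part of
the library):

.. code-block:: python

    import torch
    from metatensor.torch import Labels, TensorBlock, TensorMap

    # One block: per-atom scalar values with shape (n_samples, n_subtargets)
    block = TensorBlock(
        values=torch.tensor([[1.0], [2.0]], dtype=torch.float64),
        samples=Labels(
            names=["system", "atom"],
            values=torch.tensor([[0, 0], [0, 1]], dtype=torch.int32),
        ),
        components=[],  # scalar targets have no components
        properties=Labels.range("properties", 1),
    )

    # Scalar targets use a single block, with `Labels.single()` as the keys
    tensor_map = TensorMap(keys=Labels.single(), blocks=[block])
    tensor_map.save("my_target.npz")
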
2 changes: 1 addition & 1 deletion docs/src/architectures/soap-bpnn.rst
@@ -9,7 +9,7 @@ SOAP-BPNN

This is a Behler-Parrinello neural network :footcite:p:`behler_generalized_2007`
using features based on the Smooth overlap of atomic positions (SOAP)
:footcite:p:`bartok_representing_2013`. The SOAP features are calculated wit `rascaline
:footcite:p:`bartok_representing_2013`. The SOAP features are calculated with `rascaline
<https://luthaf.fr/rascaline/latest/index.html>`_.

Installation
2 changes: 1 addition & 1 deletion examples/programmatic/llpr/llpr.py
@@ -66,7 +66,7 @@
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": False,
"stress": False,
"virial": False,
6 changes: 3 additions & 3 deletions examples/programmatic/llpr_forces/force_llpr.py
@@ -29,7 +29,7 @@
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": {
"read_from": "train.xyz",
"file_format": ".xyz",
@@ -58,7 +58,7 @@
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": {
"read_from": "valid.xyz",
"file_format": ".xyz",
@@ -87,7 +87,7 @@
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": {
"read_from": "test.xyz",
"file_format": ".xyz",
@@ -86,7 +86,7 @@ def test_regression_train():
"unit": "eV",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": False,
"stress": False,
"virial": False,
2 changes: 1 addition & 1 deletion src/metatrain/experimental/gap/tests/test_errors.py
@@ -39,7 +39,7 @@ def test_ethanol_regression_train_and_invariance():
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": {
"read_from": DATASET_ETHANOL_PATH,
"reader": "ase",
4 changes: 2 additions & 2 deletions src/metatrain/experimental/gap/tests/test_regression.py
@@ -52,7 +52,7 @@ def test_regression_train_and_invariance():
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": False,
"stress": False,
"virial": False,
@@ -127,7 +127,7 @@ def test_ethanol_regression_train_and_invariance():
"key": "energy",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": {
"read_from": DATASET_ETHANOL_PATH,
"reader": "ase",
2 changes: 1 addition & 1 deletion src/metatrain/experimental/gap/tests/test_torchscript.py
@@ -29,7 +29,7 @@ def test_torchscript():
"unit": "kcal/mol",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": False,
"stress": False,
"virial": False,
16 changes: 15 additions & 1 deletion src/metatrain/experimental/pet/trainer.py
@@ -117,7 +117,7 @@ def train(
checkpoint_path = None

########################################
# STARTNG THE PURE PET TRAINING SCRIPT #
# STARTING THE PURE PET TRAINING SCRIPT #
########################################

logging.info("Initializing PET training...")
@@ -165,6 +165,20 @@ def train(
f"CUDA is deterministic: {FITTING_SCHEME.CUDA_DETERMINISTIC}"
)

st = """
Legend: LR -> Learning Rate
MAE -> Mean Absolute Error
RMSE -> Root Mean Square Error
V-E-MAE/at -> MAE of the Energy per atom on the Validation set
V-E-RMSE/at -> RMSE of the Energy per atom on the Validation set
V-F-MAE -> MAE of the Forces on the Validation set
V-F-RMSE -> RMSE of the Forces on the Validation set
T-E-MAE/at -> MAE of the Energy per atom on the Training set
T-E-RMSE/at -> RMSE of the Energy per atom on the Training set
T-F-MAE -> MAE of the Forces on the Training set
T-F-RMSE -> RMSE of the Forces on the Training set
Units of the Energy and Forces are the same units given in input"""
training_configuration_log += st
logging.info(training_configuration_log)

set_reproducibility(
@@ -41,7 +41,7 @@ def test_continue(monkeypatch, tmp_path):
"unit": "eV",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": False,
"stress": False,
"virial": False,
@@ -255,7 +255,7 @@ def test_vector_output(per_atom):
"type": {
"spherical": {"irreps": [{"o3_lambda": 1, "o3_sigma": 1}]}
},
"num_properties": 100,
"num_subtargets": 100,
"per_atom": per_atom,
}
)
@@ -73,7 +73,7 @@ def test_regression_train():
"unit": "eV",
"type": "scalar",
"per_atom": False,
"num_properties": 1,
"num_subtargets": 1,
"forces": False,
"stress": False,
"virial": False,
2 changes: 1 addition & 1 deletion src/metatrain/share/schema-dataset.json
@@ -135,7 +135,7 @@
"per_atom": {
"type": "boolean"
},
"num_properties": {
"num_subtargets": {
"type": "integer"
},
"type": {
2 changes: 1 addition & 1 deletion src/metatrain/utils/data/readers/readers.py
@@ -133,7 +133,7 @@ def read_targets(
is_energy = (
(target["quantity"] == "energy")
and (not target["per_atom"])
and target["num_properties"] == 1
and target["num_subtargets"] == 1
and target["type"] == "scalar"
)
energy_or_generic = "energy" if is_energy else "generic"
12 changes: 6 additions & 6 deletions src/metatrain/utils/data/target_info.py
@@ -278,13 +278,13 @@ def _get_scalar_target_info(target: DictConfig) -> TargetInfo:

block = TensorBlock(
# float64: otherwise metatensor can't serialize
values=torch.empty(0, target["num_properties"], dtype=torch.float64),
values=torch.empty(0, target["num_subtargets"], dtype=torch.float64),
samples=Labels(
names=sample_names,
values=torch.empty((0, len(sample_names)), dtype=torch.int32),
),
components=[],
properties=Labels.range("properties", target["num_properties"]),
properties=Labels.range("properties", target["num_subtargets"]),
)
layout = TensorMap(
keys=Labels.single(),
@@ -321,15 +321,15 @@ def _get_cartesian_target_info(target: DictConfig) -> TargetInfo:
block = TensorBlock(
# float64: otherwise metatensor can't serialize
values=torch.empty(
[0] + [3] * len(components) + [target["num_properties"]],
[0] + [3] * len(components) + [target["num_subtargets"]],
dtype=torch.float64,
),
samples=Labels(
names=sample_names,
values=torch.empty((0, len(sample_names)), dtype=torch.int32),
),
components=components,
properties=Labels.range("properties", target["num_properties"]),
properties=Labels.range("properties", target["num_subtargets"]),
)
layout = TensorMap(
keys=Labels.single(),
@@ -366,15 +366,15 @@ def _get_spherical_target_info(target: DictConfig) -> TargetInfo:
values=torch.empty(
0,
2 * irrep["o3_lambda"] + 1,
target["num_properties"],
target["num_subtargets"],
dtype=torch.float64,
),
samples=Labels(
names=sample_names,
values=torch.empty((0, len(sample_names)), dtype=torch.int32),
),
components=components,
properties=Labels.range("properties", target["num_properties"]),
properties=Labels.range("properties", target["num_subtargets"]),
)
keys.append([irrep["o3_lambda"], irrep["o3_sigma"]])
blocks.append(block)
2 changes: 1 addition & 1 deletion src/metatrain/utils/omegaconf.py
@@ -98,7 +98,7 @@ def _resolve_single_str(config: str) -> DictConfig:
"unit": None,
"per_atom": False,
"type": "scalar",
"num_properties": 1,
"num_subtargets": 1,
}
)

2 changes: 1 addition & 1 deletion tests/resources/test.yaml
@@ -15,7 +15,7 @@ training_set:
quantity: force
key: forces
per_atom: true
num_properties: 3
num_subtargets: 3

test_set: 0.5
validation_set: 0.1